Empirical Investigation on Practical Robustness of Keystroke Recognition Using WiFi Sensing for Future IoT Applications

Wang, Haoming; Sharma, Aryan; Mishra, Deepak; Seneviratne, Aruna; Ambikairajah, Eliathamby

doi:10.3390/fi17070288


Empirical Investigation on Practical Robustness of Keystroke Recognition Using WiFi Sensing for Future IoT Applications^†

by Haoming Wang ¹, Aryan Sharma ¹,², Deepak Mishra ¹,*, Aruna Seneviratne ¹ and Eliathamby Ambikairajah ¹

¹ School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia

² Cyber Security Cooperative Research Centre, Kingston, ACT 2600, Australia

* Author to whom correspondence should be addressed.

† This paper is an extended version of our paper published in the IEEE Global Communication Conference (GLOBECOM) Workshops, Kuala Lumpur, Malaysia, 8–12 December 2023.

Future Internet 2025, 17(7), 288; https://doi.org/10.3390/fi17070288

Submission received: 1 May 2025 / Revised: 19 June 2025 / Accepted: 23 June 2025 / Published: 27 June 2025

(This article belongs to the Special Issue Efficient and Secure Wireless Communications and Networking: Architecture and Applications)


Abstract

The widespread use of WiFi Internet-of-Things (IoT) devices has rendered them valuable tools for detecting information about the physical environment. Recent studies have demonstrated that WiFi Channel State Information (CSI) can detect physical events like movement, occupancy increases, and gestures. This paper empirically investigates the conditions under which WiFi sensing technology remains effective for keystroke detection. To assess whether this technology raises privacy concerns, experiments are conducted using commodity hardware to measure the accuracy of WiFi CSI in detecting keys pressed on a keyboard. Our results show that, in an ideal setting with a robotic arm, the position of a specific key can be predicted with 99% accuracy using a simple machine learning classifier. Furthermore, human finger localisation over a key and actual key-press recognition are also achieved, with reduced accuracies of 94% and 89%, respectively. Moreover, our detailed investigation reveals that, to ensure high accuracy, the gap distance between each test object must be substantial, while the size of the test group should be limited. Finally, we show that WiFi sensing technology has limitations in small-scale gesture recognition for generic settings where proper device positioning is crucial. Specifically, detecting keyed words achieves an overall accuracy of 94% for the forefinger and 87% for multiple fingers when only the right hand is used. Accuracy drops to 56% when using both hands. We conclude that WiFi sensing is effective in controlled indoor environments, but that it is limited by the device location and the limited granularity of sensing objects.

Keywords: WiFi sensing; Channel State Information; test bed; wireless IoT devices; machine learning; keystroke detection


1. Introduction

1.1. Background

As Internet-of-Things (IoT) technology continues to advance, communication signals are increasingly present in our environment. Recent studies have shown that these signals can also be used for sensing purposes [1,2,3,4]. For instance, the Orthogonal Frequency Division Multiplexing (OFDM) WiFi signals of the IEEE 802.11n protocol can be analysed to characterise the channel through which they travel. This information, called the CSI signature, is sensitive to any changes in the channel, from macroscopic changes such as fall detection, occupancy counting, and flame detection [5,6,7], to microscopic changes such as human heartbeats and breathing [8,9]. WiFi sensing is particularly interesting for detecting personal electronic device inputs such as phone passwords [10] and keystrokes [11]. Note that the privacy-preserving mechanisms for WiFi sensing differ from those of blockchain-based IoT networks, as the nature of signal-based inference poses distinct challenges and calls for distinct mitigation strategies [12].

In this paper, we focus on wireless keystroke recognition [11]. Prior results show that typing patterns and sentences can be mined from WiFi signals, whose sensing resolution is sufficient to cover small devices. Previously, a CSI-based sensing structure built on a low-cost commercial off-the-shelf device (Raspberry Pi) was developed, which required intricate models and a thorough comprehension of the capabilities and limitations of WiFi-based keystroke recognition. Furthermore, accurate hand location and motion detection are necessary for this application. This paper investigates the efficiency of these IoT-enabled WiFi sensing technologies for keystroke detection using IoT CSI data analytics and embedded software. Our investigation involves building an experimental test bed to quantify the implications of this technology for the security and privacy of sensitive data that may be compromised by the sensing capabilities of future IoT applications and services.

1.2. Motivation and Contributions

Prior research has established that environmental changes and device topology can affect WiFi sensing technology [13]. However, there is a need to explore the limits on the robustness of this technology in accurately detecting small-scale movements such as keystrokes.

While prior work has achieved keystroke recognition under ideal setups [11], this study takes a different approach by focusing on the fundamental robustness of WiFi CSI sensing under controlled but increasingly challenging conditions. We emphasise that our findings are context-specific and should not be interpreted as demonstrating generalisable performance.

Keystroke recognition using wireless sensing offers promising applications in human–computer interaction, authentication, and health monitoring. However, it also raises serious privacy concerns. The ability to infer typed input passively, without user consent or awareness, poses potential threats to user privacy, especially in environments like homes or workplaces. Motivated by this dual-use potential, this work not only explores the technical capabilities of keystroke sensing using commodity WiFi and backscatter systems but also assesses scenarios where privacy risks are mitigated by signal degradation, such as during natural two-hand typing. By doing so, we aim to contribute to both technical advancement and responsible deployment of wireless sensing technologies. While accuracy drops during natural two-handed typing, offering some privacy protection, high recognition accuracy persists in single-finger scenarios like ATM or PIN entry. This underscores the need for privacy safeguards tailored to specific use cases and threat models.

We aim to evaluate the reliability and durability of machine learning-enabled WiFi sensing technology under different real-world circumstances and obstacles. This assessment will help us understand the potential privacy threats of WiFi sensing and determine the essential machine learning techniques required to meet its real-time and sustainability requirements. Moreover, our study aims to investigate the accuracy of WiFi sensing for keystroke detection applications and assess the need for mandatory sensing standards in privacy-conscious designs. Through our empirical investigation, we would like to advance knowledge and deepen our understanding of whether WiFi sensing-based keystrokes and keyed data detection can pose any privacy threats in different practical scenarios. All performance evaluations are based on real-world experimental measurements conducted on our deployed system. Overall, we make the following five contributions.

We demonstrate that commercial off-the-shelf WiFi devices can be effectively used for small-scale gesture and keystroke recognition through a detailed, fully experimental evaluation. This promotes the practicality, reproducibility, and accessibility of WiFi sensing for broader real-world applications.
We propose a baseline performance study using a two-jointed robotic arm to simulate ideal, highly repeatable typing scenarios. This provides an upper bound for recognition accuracy by minimising human-induced variability, serving as a reference for future robustness studies.
Building upon the robotic benchmark, we introduce a human hand to localise keystrokes, thereby introducing realistic motor variability. This step bridges the gap between idealised and real-world conditions, offering insights into the system robustness under natural hand movements.
We empirically investigate how key factors, such as inter-key distance, number of keys in the classification set, and Tx-Rx separation, affect the spatial resolution and performance of WiFi-based keystroke recognition systems.
We conduct a final set of experiments simulating varied, realistic typing scenarios including single-finger, multi-finger, and both-hand inputs. These experiments reflect practical user behaviours and provide evidence of the system’s potential as a countermeasure against WiFi-based keystroke inference attacks.

The paper is structured as follows: Section 2 presents a summary of the state-of-the-art in WiFi sensing and its various applications. Section 3 delves into the WiFi sensing system model and the underlying concepts of keystroke detection. In Section 4, we demonstrate a preliminary test bed for WiFi-based keystroke detection, initially validating the feasibility of CSI-based keystroke detection using a robotic arm. Section 5 extends the system to detect a hovering hand, followed by real keystroke detection experiments in Section 6. These experiments reveal the spatial resolution limitations of keystroke detection. In Section 6, we also examine the impact of key spacing and the number of keys being recognised on detection accuracy. Our aim is to identify factors that influence machine learning performance and to suggest directions for future work. Building on the evaluation of keystroke detection, Section 7 explores the feasibility of name detection. Finally, Section 10 investigates how the geometric placement of the transmitter and receiver affects overall accuracy.

2. State of the Art

2.1. WiFi Sensing and Localisation

WiFi sensing is predicated on the propagation of WiFi signals through physical environments [2,3]. These signals incur multipath propagation effects such as scattering, reflection, refraction, and fading when wireless signals are sent from a transmitter to a receiver [14]. These multipath effects can be encapsulated by the CSI, which is a complex-valued metric that describes the total attenuation and phase delay incurred on the wireless signal as it propagates between Tx and Rx. For a WiFi system with OFDM, when a static object is placed in the Line of Sight (LoS) of the wireless signal channel, it can affect each subcarrier uniquely, and we can use these CSI time series patterns for further data processing. Recent work has capitalised on this metric and applied machine learning (ML) in order to infer qualities about the environment based on changes in CSI [3,15].

One such application of CSI is localisation, in which users can be tracked as they move through indoor environments [16]. Currently, CSI-based indoor human localisation and tracking is achieved mainly based on Multiple-Input–Multiple-Output (MIMO) systems and the usage of the CSI phase, which, for instance, includes phase offset calibration [17], MUSIC-based phase calibration [18], phase sanitization [19], and so on. Whilst current endeavours in the literature are successful in localising large objects in the channel [20], it is less clear whether the position of smaller entities can be discerned.

2.2. DSP on Gesture Recognition

Beyond localisation, another application of WiFi sensing receiving tremendous attention is gesture recognition [2]. This application leverages the small-scale ripples created in the CSI time series by temporal variations in the channel [2]. Through conventional processing tools such as low-pass averaging filters and Principal Component Analysis (PCA), these variations can be further emphasised within the CSI time series [11]. Raw CSI typically contains outliers and noise, which can significantly reduce sensing performance and must therefore be removed first. Prior work has shown that effective outlier removal techniques include the moving average filter [21], Hampel filter [22], median filter [23], and phase removal techniques like linear regression [24]. Noise reduction is typically followed by frequency decomposition techniques such as the Discrete Wavelet Transform (DWT) and Fast Fourier Transform (FFT) to extract gesture-related features from the CSI time series data [2,11,15]. One interesting example of this is the extraction of keystroke information from CSI as users type on keyboards in the presence of WiFi networks. It was shown that the mentioned filtering and feature extraction techniques could uncover artifacts of human typing in WiFi CSI [11]. With intelligent ML frameworks, these artifacts were used to correctly identify which key was pressed [11]. While the paper successfully recognised all 26 keys with an overall accuracy of 82.78%, it classified each key individually and used a fixed device positioning and gap distance. Consequently, there is a lack of insight into how well the technology scales, since the effects of these factors remain an open research question, which prevents the widespread adoption of such technologies in commercial environments.

2.3. Existing Technologies for Keystroke Detection

WiFi sensing technology for keystroke detection raises significant privacy concerns due to its ability to infer sensitive information without direct interaction. Zhang et al. [25] introduced “Widar”, a system that used commodity WiFi devices to detect human activities, including keystrokes, by analysing CSI. However, Widar’s performance was highly sensitive to environmental changes such as furniture rearrangement or user orientation, which could degrade accuracy significantly in real-world scenarios. Building upon this, Chen et al. developed “WiKey”, which achieved over 97.5% detection accuracy for keystrokes using a TP-Link router and a Lenovo laptop [11]. Despite its high accuracy, WiKey required strict line-of-sight placement and stable signal environments, making it less practical in dynamic or cluttered settings. Later, Hu’s team proposed “WiKI-Eve”, a system that eavesdropped on keystrokes using beamforming feedback information from smartphones, achieving up to 65.8% accuracy in password inference [26]. The drawback of WiKI-Eve lay in its relatively low accuracy for complex keystroke sequences and its reliance on specific hardware configurations to access beamforming data, which may limit scalability. Besides these applications, a recent system called “Wi-Crack” was introduced [27], which combined amplitude, phase, and their differences in CSI to enhance keystroke recognition accuracy on smartphones. While Wi-Crack improved signal feature extraction, it remained vulnerable to cross-device variability and still struggled in environments with multiple users or movement noise. Similarly, in 2024, Shen et al. proposed “WiPass” [28], a 1D-CNN-based system for numerical keypad inputs on smartphones, demonstrating the feasibility of keystroke inference via WiFi signals. Although WiPass showed promise, its deep learning model was heavily dependent on training data collected in static environments, raising concerns about robustness across different settings and users. Additionally, the model training process required considerable computation and may not be feasible for real-time applications without specialised hardware. Although all of these technologies have been shown to have high accuracy, the literature shows a lack of research on the spatial resolution of keystroke detection.

2.4. Deep Learning Approaches in CSI-Based Sensing

Recent research has applied deep learning (DL) techniques to enhance performance across various sensing tasks. DL models like Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and hybrid CNN–LSTM architectures can effectively learn spatial–temporal patterns from CSI data. SenseFi [4], for instance, provides a benchmark dataset and DL framework that outperforms traditional methods. Li et al. [16] leveraged a CNN and LSTM for CSI-based indoor localisation, while Wang et al. [17] and Gong and Liu [18] proposed Fresnel-based and sparse recovery models, respectively. Zheng et al. [19] and Feng et al. [20] further contributed by incorporating orientation-aware features and compiling DL-focused surveys. These methods also extend to gesture recognition and biometric authentication. However, high resource demands limit deployment on lightweight IoT devices, underscoring the need for efficient model design.

2.5. Privacy Attacks via Wireless Sensing

CSI-based sensing also introduces privacy risks. Prior work demonstrates that wireless signals can be exploited to infer sensitive user behaviour. Li et al. [10] revealed password inference using public WiFi CSI. WiKI-Eve [26] used beamforming feedback for keystroke monitoring, and WiPass [28] employed CNNs for smartphone keypad recognition. Systems like WiKey [11] and Wi-Crack [27] show that small signal perturbations can leak keystroke information, even in cluttered environments. These techniques achieve high accuracy in ideal settings but degrade with real-world variability. Our study contributes by showing a significant drop in keystroke sensing accuracy under realistic two-hand typing, revealing a natural privacy safeguard and informing more responsible sensing system design.

3. System Description

In this section, we outline the fundamental principles of WiFi sensing and CSI-based keystroke detection.

3.1. CSI WiFi Sensing Model

WiFi CSI-based sensing is predicated on the propagation of packets through sensing subjects, which the transmission model can express as [29]:

$$Y(f, t) = H(f, t) \times X(f, t) + N(f, t) \tag{1}$$

where, for a specific time instant $t$ and frequency $f$, $Y(f, t)$ is the received wireless signal, $X(f, t)$ is the transmitted signal, $H(f, t)$ represents the CSI, and $N(f, t)$ is noise. In modern protocols such as IEEE 802.11n and its successors, the OFDM technique is used to modulate data onto multiple frequency bins called subcarriers. Each subcarrier is a frequency-selective channel, which gives us the CSI readings. Every CSI reading has a unique sensitivity to the channel due to signal propagation in the channel. CSI is acquired as a complex-valued channel gain at time $t$. This value provides insights into the attenuation and delay that the channel introduces on the subcarriers operating at various frequencies. It is presumed that the physical medium encountered by these electromagnetic waves at different frequencies will affect them in distinct ways. Consequently, analysing their responses allows us to infer the characteristics of the signal’s path.
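As a minimal illustration of Equation (1), the sketch below recovers a per-subcarrier CSI estimate by dividing each received symbol by the known transmitted symbol, and then reads off the attenuation and phase shift. The three channel values, the pilot symbols, the noise-free assumption, and the helper name `estimate_csi` are all illustrative assumptions; this is not how the Nexmon firmware is implemented.

```python
import cmath

def estimate_csi(received, transmitted):
    """Per-subcarrier CSI estimate H = Y / X (noise term N is ignored)."""
    return [y / x for y, x in zip(received, transmitted)]

# Hypothetical channel for three subcarriers: (attenuation, phase shift in rad).
true_h = [cmath.rect(0.8, 0.3), cmath.rect(0.5, -1.2), cmath.rect(0.9, 2.0)]
pilots = [1 + 0j, 0 + 1j, -1 + 0j]                    # known transmitted symbols X
received = [h * x for h, x in zip(true_h, pilots)]    # Y = H * X with N = 0

csi = estimate_csi(received, pilots)
amplitudes = [abs(h) for h in csi]         # attenuation introduced by the channel
phases = [cmath.phase(h) for h in csi]     # phase shift introduced by the channel
```

With noise present, the same division would return the true CSI plus a perturbation, which is one reason the later processing stages filter the amplitude series before classification.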

In Figure 1, we illustrate the generalised WiFi sensing architecture. The passive sensing and estimation of CSI are based on open-source software such as Nexmon [30], which allows Network Interface Cards (NICs) to extract CSI from received packets. The Tx propagates WiFi packets through the sensing medium, which contains hands performing gestures such as waving, moving, and touching. The act of performing a particular movement causes a change in the wireless signal as mentioned at the beginning of Section 2, consequently altering the amplitude of the ping packets acquired by the Rx. The sensitivity of this device is remarkably high, making it susceptible to even the slightest alterations in its surrounding environment [31]. Consequently, any minor changes in the surroundings, along with the burst noise caused by hardware [32], introduce unwanted disturbances to the channel. The firmware in the Rx includes the Nexmon CSI extraction tool [30], which calculates CSI for received packets; the CSI is then processed before being fed into an ML algorithm to predict physical gestures. The CSI can be expressed using the following equation [14]:

$$H(f, t) = \sum_{n=1}^{N} a_{n}(t)\, e^{-j 2 \pi f \tau_{n}(t)} \tag{2}$$

In this equation, $N$ is the number of propagation paths, $a_{n}(t)$ is the amplitude attenuation factor of the $n$-th path, and $\tau_{n}(t)$ is its propagation delay. The movement and location changes of test objects or people impact the CSI amplitude attenuation and phase change.
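To make the multipath sum in Equation (2) concrete, the following sketch evaluates the CSI amplitude across 52 subcarriers for a hypothetical two-path scene, before and after the reflecting object moves by 2 cm. The path lengths, attenuation factors, subcarrier index layout (−26 to 26 without the centre index), and the 2.417 GHz centre frequency (channel 2 of the 2.4 GHz band) are assumptions chosen for illustration, not measured values.

```python
import cmath
import math

def multipath_csi(freq_hz, paths):
    """Sum a * exp(-j*2*pi*f*tau) over (attenuation, delay) pairs, one per path."""
    return sum(a * cmath.exp(-2j * math.pi * freq_hz * tau) for a, tau in paths)

C = 299_792_458.0  # speed of light in m/s

# Hypothetical static scene: a 1.0 m direct path plus one 1.6 m reflection.
static_paths = [(1.0, 1.0 / C), (0.4, 1.6 / C)]
# The same scene after the reflector (for example a finger) moves by 2 cm.
moved_paths = [(1.0, 1.0 / C), (0.4, 1.62 / C)]

# 52 subcarriers spaced 312.5 kHz apart around the assumed centre frequency.
subcarriers = [2.417e9 + k * 312.5e3 for k in range(-26, 27) if k != 0]

static_amp = [abs(multipath_csi(f, static_paths)) for f in subcarriers]
moved_amp = [abs(multipath_csi(f, moved_paths)) for f in subcarriers]
max_change = max(abs(a - b) for a, b in zip(static_amp, moved_amp))
```

Even a centimetre-scale change in one reflected path shifts its phase by a sizeable fraction of a cycle at 2.4 GHz, so the summed amplitude changes noticeably, which is the effect the sensing pipeline relies on.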

3.2. Keystroke Recognition

The underlying principle behind keystroke and gesture recognition is that a typing finger incurs changes in the transmitted signal. This is dependent on both the location and the movement of the finger, as depicted below.

In Figure 2, we illustrate a side view of the keystroke recognition system model, in which a hand can be seen moving as it presses a key. In this model, there are 2 contributing factors to the multipath propagation environment.

Firstly, the static channel multipath profile will be defined by the position of the hand in 3D space, as demonstrated by the $x, y, z$ plane in Figure 2. When typing on a keyboard, the signals between the Tx and Rx encounter distinct reflection and attenuation patterns due to the hand’s position over the keys in 3D space.
In the diagram shown in Figure 2, when the hand presses a key, the signal paths reflected by the object are dynamically altered. This leads to a ripple effect in the CSI time series data, with each key press causing changes in the multipath environment between the initial profile (dashed line), the pressed profile (solid line), and back to the initial profile.

In this paper, we iteratively validate the viability of a keystroke recognition model. We begin by demonstrating the effectiveness of the static CSI profile for predicting hand positioning, followed by the use of dynamic CSI profiles for classifying the keystrokes.

4. Robotic Arm Position Detection

To establish the best possible model for predicting hand positioning over a keyboard, we used a controllable robotic arm. This model allowed us to understand the ceiling on accuracy achievable for real case recognition, since the human hand is always less accurate than a robot arm, which can reliably control the position above a keyboard without any shaking or jittering. This section describes the structure of the robotic arm, along with the Proportional–Integral–Derivative (PID) control system used to regulate its position over the keyboard.

4.1. Robotic Arm Control

The robotic arm model we used was manufactured by Quanser, as shown in Figure 3. It is a 2-Degree-of-Freedom (DOF) robotic arm that allows for rotational control at the shoulder and elbow joints. We introduced the Inverse Kinematic Equation (IKE) in 2D space, which was used to compute the control angles needed for each part of the robotic arm, in order to position it over each key on the keyboard. The angles were computed by:

$$\theta_{1} = \tan^{-1}\left(\frac{y}{x}\right) - \tan^{-1}\left(\frac{l_{2} \sin(\theta_{2})}{l_{1} + l_{2} \cos(\theta_{2})}\right) \tag{3}$$

$$\theta_{2} = \cos^{-1}\left(\frac{x^{2} + y^{2} - l_{1}^{2} - l_{2}^{2}}{2 l_{1} l_{2}}\right) \tag{4}$$

where $(x, y)$ is the target coordinate, $\theta_{1}$ and $l_{1}$ are the angle and length of the shoulder link, and $\theta_{2}$ and $l_{2}$ are the angle and length of the elbow link. The Quanser robot was controlled using Simulink and the process model depicted in Figure 4. Here, the calculated joint angles were scaled into control voltage inputs, which were fed into a PID control loop. The control loop ensured that the final position of the robotic arm was identical between trials, allowing us to perform a highly repeatable experiment for localisation.
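The inverse kinematics in Equations (3) and (4) can be sketched in a few lines and checked against the forward kinematics of a planar two-link arm. The link lengths and target coordinate below are hypothetical and are not the dimensions of the Quanser arm; `math.atan2` is used in place of a plain arctangent so that the quadrant of the target is handled correctly.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Return (theta1, theta2) in radians for a planar 2-DOF arm reaching (x, y)."""
    cos_theta2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if not -1.0 <= cos_theta2 <= 1.0:
        raise ValueError("target is outside the reachable workspace")
    theta2 = math.acos(cos_theta2)                       # Equation (4)
    theta1 = math.atan2(y, x) - math.atan2(              # Equation (3)
        l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

def forward_kinematics(theta1, theta2, l1, l2):
    """Position of the arm tip for the given shoulder and elbow angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Hypothetical link lengths (m) and key position (m).
L1, L2 = 0.30, 0.25
target = (0.35, 0.20)
theta1, theta2 = two_link_ik(*target, L1, L2)
reached = forward_kinematics(theta1, theta2, L1, L2)   # should match target
```

Feeding the computed angles back through the forward kinematics is a convenient sanity check before converting them into control inputs.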

The PID control system was characterised by 3 design factors that affected the speed of the robotic arm as it moved, as well as its transient response characteristics. To ensure a repeatable position between trials, we designed PID parameters to ensure no steady-state error in position and an over-damped transient response to prevent shaking of the experimental apparatus. The PID coefficients are detailed in Table 1.
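The two design goals stated above, zero steady-state error and an over-damped transient, can be illustrated with a small simulation. The sketch below is only a stand-in: the first-order plant, the time constant, and the gains are hypothetical and are neither the Quanser arm dynamics nor the coefficients of Table 1.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, dt=0.01, duration=30.0, tau=1.0):
    """Simulate a PID loop around a first-order plant  tau * dx/dt = u - x."""
    position, integral, previous = 0.0, 0.0, 0.0
    trajectory = []
    for _ in range(int(duration / dt)):
        error = setpoint - position
        integral += error * dt
        # Derivative acts on the measurement to avoid a kick at the setpoint step.
        derivative = (position - previous) / dt
        control = kp * error + ki * integral - kd * derivative
        previous = position
        position += dt * (control - position) / tau    # forward-Euler plant update
        trajectory.append(position)
    return trajectory

# Hypothetical gains chosen so that the closed-loop poles are real (over-damped).
trajectory = simulate_pid(kp=2.0, ki=1.0, kd=0.05)
steady_state_error = abs(1.0 - trajectory[-1])   # integral action drives this to ~0
overshoot = max(trajectory) - 1.0                # stays at or below zero if no overshoot
```

The integral term removes the residual offset, while keeping the poles real means the position approaches the setpoint without oscillating, which is the behaviour wanted to avoid shaking the apparatus.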

4.2. Experimental Setup

The experiment took place in a commercial office space where there were regular communication signals and interference from other WiFi devices.

In Figure 5, we depict the experimental setup for our robotic arm position detection experiment. For all experiments shown in this paper, we used a single antenna. The Tx and Rx devices were set up on either side of the robotic arm, 1.02 m apart. These devices were Raspberry Pi 4B units that operated on channel 2 of the 2.4 GHz WiFi spectrum. We selected the 2.4 GHz spectrum according to the 802.11n WiFi standard for all our experiments in this work to ensure a fair comparison with previous studies [11]. CSI is inherently very sensitive to changes in the environment and to the layout of the Tx-Rx pair. Hence, the robotic arm and devices were controlled remotely while all other equipment stayed in a constant position throughout the experiment in order to minimise interference in the channel.

The Tx propagated packets into the channel at a rate of 1300 Hz via one antenna with a 2 dBi gain. We chose this high packet rate to maximise the CSI sampling rate and allow for better sensing resolution. Each sample contained CSI values for 52 independent subcarriers. We collected CSI for 10 s with the robotic hand programmed to hover over each alphabetic key (A–Z) on a standard keyboard. This experiment involved all 26 alphabetic keys. For each key, 3 independent trials were conducted, each lasting 10 s. In total, CSI data were collected for 780 s across all trials and keys, corresponding to over 1 million CSI frames sampled at 1300 Hz.

We repeated this 3 times, with two sets of data used for training and the third for validation.
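The dataset size quoted above follows directly from the acquisition parameters. The short calculation below reproduces it, under the assumption that no packets are lost, so that every second yields exactly 1300 frames.

```python
KEYS = 26             # alphabetic keys A-Z
TRIALS_PER_KEY = 3    # two used for training, one for validation
TRIAL_SECONDS = 10
PACKET_RATE_HZ = 1300
SUBCARRIERS = 52

total_seconds = KEYS * TRIALS_PER_KEY * TRIAL_SECONDS    # 780 s
total_frames = total_seconds * PACKET_RATE_HZ            # 1,014,000 frames
frames_per_trial = TRIAL_SECONDS * PACKET_RATE_HZ        # 13,000 frames
training_frames = total_frames * 2 // 3                  # 676,000 frames
amplitude_values = total_frames * SUBCARRIERS            # 52,728,000 values
```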

4.3. Machine Learning Model

We evaluate the captured CSI data using a set of statistical features that have proven successful at profiling static channels in recent work [33]. The amplitude data of the received raw CSI were first passed into the digital signal processing framework. This process began with a Hampel filter [22] to eliminate irrelevant measurements and outliers caused by the transmission and reception hardware. Subsequently, a lowpass Butterworth filter [29] was employed to eliminate both the noise generated by the movement of the human body and the high-frequency Additive White Gaussian Noise (AWGN). To ensure data smoothness, the moving average method was applied to each sample, followed by the computation of statistical estimators such as standard deviation, mean, variance, and kurtosis across the CSI amplitude values for all 52 subcarriers. The resulting feature set was then fed into a linear Support Vector Machine (SVM) classifier, which performed multi-class classification to determine which of the 26 alphabetical keys the arm was positioned over.
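A minimal sketch of the outlier-removal, smoothing, and feature-extraction stages for a single subcarrier is given below. The window sizes, the three-sigma threshold, and the example trace are assumptions, since the text does not state the parameters used; the Butterworth lowpass stage and the SVM classifier are omitted to keep the example dependency-free.

```python
import statistics

def hampel_filter(series, half_window=3, n_sigmas=3.0):
    """Replace points further than n_sigmas scaled MADs from the local median."""
    cleaned = list(series)
    for i in range(len(series)):
        window = series[max(0, i - half_window): i + half_window + 1]
        median = statistics.median(window)
        mad = statistics.median(abs(v - median) for v in window)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        if abs(series[i] - median) > n_sigmas * 1.4826 * mad:
            cleaned[i] = median
    return cleaned

def moving_average(series, window=5):
    """Trailing moving average; early samples average over what is available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def statistical_features(series):
    """Mean, standard deviation, variance and kurtosis of one subcarrier."""
    mean = statistics.fmean(series)
    variance = statistics.pvariance(series, mu=mean)
    fourth_moment = statistics.fmean((v - mean) ** 4 for v in series)
    kurtosis = fourth_moment / variance ** 2 if variance > 0 else 0.0
    return [mean, variance ** 0.5, variance, kurtosis]

# Hypothetical amplitude trace for one subcarrier with a single hardware spike.
trace = [10.0, 10.2, 9.9, 10.1, 40.0, 10.0, 9.8, 10.1, 10.2, 9.9]
despiked = hampel_filter(trace)                      # the 40.0 spike is replaced
features = statistical_features(moving_average(despiked))
```

Repeating `statistical_features` for each of the 52 subcarriers and concatenating the results would give one feature vector per trial, which is the kind of input a linear classifier expects.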

4.4. Result Visualization

In addition to the original CSI amplitude time series, we also employed the Probability Mass Function (PMF) to illustrate the outcome. The PMF’s credibility has been established by previous studies, affirming its suitability for advanced data analysis and categorisation. In our initial experiments, we utilised the PMF to exhibit the resilience of CSI in the presence of minor object shadowing, highlighting its distinctive shape characteristics. By definition [34], the PMF can be expressed as:

$$F_{|H(f, t)|}(x; f) = \frac{\#\left(x_{\mathrm{low}} < |H(f, t)| \le x_{\mathrm{up}}\right)}{\text{Number of CSI amplitude frames}} \tag{5}$$

where $F_{|H(f, t)|}(x; f)$ is the PMF of the CSI amplitude at a certain carrier frequency $f$ and time instant $t$. The numerator denotes the number of CSI ping packets whose amplitude falls into the bin $(x_{\mathrm{low}}, x_{\mathrm{up}}]$, which is divided by the total number of CSI amplitude frames to give the probability. Moreover, we also applied machine learning structures to cross-validate our observation using numeric accuracy from the classifier.
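The PMF in Equation (5) amounts to a normalised histogram of CSI amplitudes for one subcarrier. The sketch below counts frames per half-open bin and divides by the number of frames; the amplitude values and bin edges are hypothetical.

```python
def amplitude_pmf(amplitudes, bin_edges):
    """Fraction of CSI amplitude frames falling in each bin (x_low, x_up]."""
    counts = [0] * (len(bin_edges) - 1)
    for value in amplitudes:
        for i in range(len(counts)):
            if bin_edges[i] < value <= bin_edges[i + 1]:
                counts[i] += 1
                break
    return [count / len(amplitudes) for count in counts]

# Hypothetical amplitude frames for one subcarrier and four unit-width bins.
frames = [0.5, 1.2, 1.7, 1.9, 2.0, 2.4, 3.1, 3.3]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
pmf = amplitude_pmf(frames, edges)   # one probability per bin
```

Stacking one such PMF per subcarrier gives a subcarrier-by-amplitude-bin grid in which each cell holds a probability.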

4.5. Results

To qualitatively assess how CSI varied across key locations, we computed and plotted the PMFs of CSI amplitude distributions for eight distinct keys. As shown in Figure 6, each plot maps WiFi subcarriers (x-axis) against amplitude bins (y-axis), with colour intensity indicating the concentration of CSI values. While the PMFs shared a general shape, typically with three local maxima and two minima, the amplitude distributions differed noticeably between keys, particularly for spatially distant ones such as L and Z. This variation was attributed to unique WiFi signal propagation and distortion caused by the hand hovering over each key. In contrast, adjacent keys (e.g., M vs. N, E vs. D) produced more similar PMFs, suggesting reduced spatial resolution at finer scales. Figure 6e–h highlight this overlap, and Table 2 summarises key visual features and placement context. These findings indicated that CSI distributions were location-dependent and supported our motivation to apply the machine learning framework in Section 4.3, with classification results reported in Table 3 and Table 4.

The $f_{1}$ scores for our classifier, which predicts the location of a robotic arm above each key on a keyboard, are presented in Table 3 and Table 4. The $f_{1}$ score is the harmonic mean of the precision and recall scores, with a higher score indicating a better-quality classifier. The classifier achieved a high overall accuracy, with an average $f_{1}$ score of 0.99. Some letters, such as A, F, G and L, performed slightly worse, with an $f_{1}$ score of 0.98. These results demonstrate the ability of WiFi sensing devices to detect minute changes in location within the channel. Overall, the outcome suggests that CSI can be used to predict the location of an arm in an ideal scenario with no trembling and highly repeatable positioning.
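For readers less familiar with the metric, the per-key score can be computed from true and predicted labels as sketched below. The label lists are made up for illustration and are not data from the experiment.

```python
def f1_score(true_labels, predicted_labels, key):
    """Harmonic mean of precision and recall for one key class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == key and p == key for t, p in pairs)   # correctly predicted key
    fp = sum(t != key and p == key for t, p in pairs)   # other keys predicted as key
    fn = sum(t == key and p != key for t, p in pairs)   # key predicted as other keys
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions for three keys, with one "A" sample mistaken for "F".
truth     = ["A", "A", "A", "F", "F", "F", "L", "L", "L"]
predicted = ["A", "A", "F", "F", "F", "F", "L", "L", "L"]
scores = {key: f1_score(truth, predicted, key) for key in "AFL"}
```

A single confusion lowers the score of both classes involved: the missed key loses recall and the wrongly predicted key loses precision.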

In summary, this section establishes a performance upper bound for keystroke localisation using a highly controlled robotic arm. With an average $f_{1}$ score of 0.99 across 26 keys, the results confirm that WiFi CSI can capture fine-grained spatial variations when subject motion is consistent and repeatable. This forms a benchmark for subsequent comparisons with more variable human movements.

5. Hand Position Detection

After showcasing our ability to predict the locations accurately using an ideal robot arm, we took it a step further and conducted a more intricate experiment with a real human hand. This experiment posed challenges due to the natural shaking and trembling of the human hand, differences in hand sizes, and variability in positioning between each trial.

5.1. Experimental Setup

Once more, we utilised Raspberry Pi 4B devices to transmit and receive packets on the 2.4 GHz WiFi spectrum. For this experiment, to enhance precision and explore the feasibility of shadowing small objects, we initially employed a shorter distance: the devices were placed 0.68 m apart with a keyboard in between, as depicted in Figure 7. Furthermore, to maximise the likelihood of detecting differences in CSI amplitude, we extended the observation duration from 10 s to 15 s. The experimental subject (depicted in the image) therefore held their finger above each alphabetical key (A–Z) for a longer period. This amounted to 78 trials (3 trials × 26 keys), conducted with a single human participant. The participant was instructed to keep their finger steady above each key to reduce variance. All devices were still controlled remotely. During that time, CSI was sampled at a rate of 1300 Hz by the Rx, which was equipped with the Nexmon [30] CSI extraction tool. Of the three trials for each key, two were used for training and the third for validation.

5.2. Results

In Table 4, we present the results of our classifier when predicting the position of a human finger over a keyboard. The overall accuracy in that case was 94%, down from the 99% accuracy achieved for the robotic arm earlier. Some classes, such as C, G, I and J, still performed very well, with an $f_{1}$ score of 1.00. The lowest classification performance was observed for the A, E, W, and S key classes, which are all located in the upper left corner of the standard keyboard. We attribute the degradation in performance for a human hand compared to the robotic arm to the increased variability in hand position between trials. Whereas the control system described in Section 4 was able to guarantee a repeatable position for all trials, this was not the case for the human hand, which trembled and swayed involuntarily.
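For readers less familiar with the per-class $f_{1}$ score, the following minimal sketch shows how it can be derived from a confusion matrix. The three-key matrix is a hypothetical toy example, not data from Table 4, and the function name is ours.

```python
def per_class_f1(confusion, labels):
    """Per-class F1 where confusion[i][j] counts true class i predicted as class j."""
    scores = {}
    n = len(labels)
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed instances of class k
        fp = sum(confusion[r][k] for r in range(n)) - tp  # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical counts for three keys (illustration only).
toy = [[9, 1, 0],
       [2, 8, 0],
       [0, 0, 10]]
f1 = per_class_f1(toy, ["A", "S", "C"])
for key, score in f1.items():
    print(key, round(score, 3))
```

In this toy case, the well-separated key "C" reaches a score of 1.00, while the two neighbouring keys lose precision and recall to each other.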

This section demonstrates that CSI-based localisation remains effective with real humans, achieving 94% accuracy. However, increased variability due to involuntary movements introduces minor degradation, highlighting the sensitivity of the system to natural human inconsistencies.

6. Keystroke Recognition

Having demonstrated that the position over a keyboard can be predicted, we experimentally extended the model to include the pressing movement.

6.1. Data Processing and ML Framework

We used a method similar to that of [11] to capture keystroke information from CSI, as shown in Figure 8. First, we processed the CSI time series data for all subcarriers using a lowpass filter with a cutoff frequency of 5 Hz. Then, we applied PCA to the filtered time series vectors to eliminate noise from the wireless channel. The 2nd, 3rd, and 4th PCA components were selected for further analysis due to the large DC gain on the first component. Next, to make the shape feature easier to extract, we applied a moving average with a window size of 1500 samples, which was approximately the number of samples for a single keystroke, to further remove interference and noise. We then applied a discrete wavelet transform (DWT) to each component to compute shape features for distinct keystrokes. To take advantage of having 3 PCA components, we created a 3-layer combined K-Nearest-Neighbour (KNN) classifier with K = 5, using Roger Jang’s machine learning toolbox [35]. The program took training and testing data for each component from the previous stage and calculated their optimal alignment distances using dynamic time warping (DTW). Finally, we used distances from the 1st and 2nd components as coordinates in the x–y plane for classification and fed them into the KNN classifier to obtain the predicted class.
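The first three stages of this pipeline (low-pass filtering, PCA, and moving-average smoothing) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the filter design is not specified in the text, so a simple FFT brick-wall filter stands in for it, PCA is computed by SVD on mean-centred data, and all function names are ours.

```python
import numpy as np


def lowpass_fft(x, fs, cutoff_hz):
    """Zero all frequency bins above cutoff_hz (assumed stand-in for the low-pass filter)."""
    spectrum = np.fft.rfft(x, axis=0)
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)


def pca_components(x, keep=(1, 2, 3)):
    """Project onto principal components; keep holds zero-based indices (2nd to 4th)."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[list(keep)].T


def moving_average(x, window):
    """Smooth each column with a boxcar window of the given length."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(col, kernel, mode="same") for col in x.T])


def preprocess(csi, fs=1300, cutoff_hz=5.0, window=1500):
    """csi has shape (time_samples, subcarriers); returns (time_samples, 3)."""
    filtered = lowpass_fft(csi, fs, cutoff_hz)
    components = pca_components(filtered)
    return moving_average(components, window)


# Synthetic stand-in for CSI amplitudes: 2 s of data on 8 subcarriers.
rng = np.random.default_rng(0)
demo = rng.normal(size=(2600, 8))
print(preprocess(demo).shape)
```

The DWT shape features and the DTW-based KNN classification would operate on the three smoothed components returned by `preprocess`.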

6.2. Preliminary Experimental Setup for Keystroke Recognition

The experiment was conducted with the same setup as in Figure 7. Two human participants performed 150 keystrokes per key (100 for training, 50 for testing in different trials), totalling 1500 samples for each person and 3000 across both. Each press was separated by at least 1 s to ensure signal clarity. Data were sampled at 1300 Hz by the Rx on the 5.216 GHz band, yielding over 3.9 million CSI samples. Previous research has demonstrated that 5 GHz outperforms 2.4 GHz owing to its larger number of subcarriers [36]. Because keystroke detection, and later name detection, involves motion, it became imperative for us to employ a more robust hardware configuration to guarantee enhanced precision, which could be achieved by expanding the radio frequency range for WiFi. Hence, from this experiment onwards, we replaced our 2.4 GHz channel with a 5.216 GHz channel, which contained 234 subcarriers and hence could give us more accurate data.

6.3. Preliminary Results for Five-Key Recognition

The CSI was fed as an input to the data processing framework detailed above, and we obtained the following result for a single user. In Figure 9, we present a confusion matrix for our keystroke classification algorithm as it predicts the key pressed by our user. The rows represent true classes, and the columns represent predicted classes. Hence, the leading diagonal represents the true positive rate for each class. TPR denotes the true positive rate, and FNR denotes the false negative rate.
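As a reading aid for such a confusion matrix, the sketch below computes the overall accuracy and the per-class TPR and FNR from raw counts. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not the values in Figure 9.

```python
def summarise_confusion(confusion):
    """Rows are true classes, columns are predicted classes (raw counts)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    tpr = [row[i] / sum(row) for i, row in enumerate(confusion)]
    fnr = [1.0 - rate for rate in tpr]
    return correct / total, tpr, fnr


# Hypothetical counts for three keys with 50 test presses each.
toy = [[45, 5, 0],
       [10, 40, 0],
       [0, 0, 50]]
accuracy, tpr, fnr = summarise_confusion(toy)
print(f"accuracy={accuracy:.2f}", [round(r, 2) for r in tpr], [round(r, 2) for r in fnr])
```

A column whose off-diagonal entries are all zero, like the third column here, means that no other class was ever mistaken for that key.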

The overall accuracy of the classifier was 89.04%. Z performed the worst of the five classes with a true positive rate of 73.8%, with H following closely at 76.2%. Investigating this more closely, the letter Z was misclassified as H 16.70% of the time. This result may reflect the difficulty for the classifier in drawing clear boundaries around the H key, which sits in the middle of the keyboard. Furthermore, no other keystroke was misclassified as P, as shown by the zeros in the column for the predicted class P. This is consistent with the P key being the most spatially separated from all other keys. When cross-evaluating each set of CSI with the validation set from the other experimental subject, we observed the overall true positive rate drop by 5%. Overall, we derived the following key outcomes from this experiment:

We can maintain a high accuracy of keystroke recognition with a single antenna system deploying a low-complexity ML framework, significantly reducing hardware costs compared to prior work.
The cross-validation result highlights the susceptibility of keystroke recognition systems to changes in personnel.
Detecting keystrokes incurs an accuracy degradation compared to our prior results on hand fingerprints. This motivates the need for more sophisticated sensing algorithms to detect finer movements.

Overall, our system achieved 89% accuracy for a five-key recognition task using low-complexity ML with a single antenna. Keys spatially close or centred (e.g., Z and H) showed more misclassifications, emphasising the spatial limitations of CSI.

6.4. Spatial Resolution Investigation

6.4.1. Impact of Gap Between Keys

After demonstrating the high accuracy achieved for the initial set of five keys, we expanded our investigation to a broader range of alphabetic keys. Our focus shifted towards examining the minimum gap between keys that the system under investigation could resolve. Specifically, we calculated the average gap between the previously tested keys Q, Z, H, P, and M to be approximately 7.5 cm. As in the preliminary experiment, each subsequent experiment involved the experimental subject performing 100 keystrokes for training and 50 for testing. All hardware setups remained consistent throughout the experiments.

To gradually decrease the gap width, we devised the following process. In Figure 10, we present the keys used for test 2. We reused “Q”, “Z”, “P”, “H”, and “M” and introduced two additional keys, “R” and “V”. These two keys have similar gap distances to “Q”, “Z”, “P”, and “M”, but slightly reduce the average gap distance. In test 3, we examined keys that were one key away from each other, for which we chose “Q”, “E”, “T”, “U”, “O”, “Z”, “C”, “B”, and “M”, as shown in Figure 11. This was followed in test 4 by the concept of “neighbourhood keys”, which referred to keys not in immediate proximity to each other regarding their positional arrangement, as visualised in Figure 12. To further reduce the gap between keys, in test 5, we added a set of five keys in the keyboard’s middle row to those of test 2, as shown in Figure 13. The keys used for each test are circled in red in each sub-figure, and details are summarised in Table 5.

6.4.2. Effect of the Number of Keys Being Recognised

In addition to studying the resolution, we examined how the number of tested keys affected the accuracy. To investigate this, we conducted nine experiments in which the gap distance was kept constant while the number of tested keys varied. Each test is highlighted in Figure 14. In the first preliminary test, we selected the five keys enclosed within the yellow box. Subsequent tests gradually increased the number of keys in the test group, eventually encompassing all alphabetic keys; for each test, the newly added keys are framed by a box whose colour is identified in the legend.

6.5. Experimental Results on Keys’ Gap and Count

6.5.1. Effect of Resolution

The findings from the tests above and the preliminary experiment are presented in Figure 15. A 2D line plot was generated to illustrate the ML accuracy against the average gap distance. On the x-axis, we represent the average gap distance between each typed key during the experiment, ranging from 2 cm to 8 cm. The y-axis then represents the overall accuracy of the confusion matrix as previously described. The plot demonstrates a positive correlation between accuracy and gap distance.

The plot starts with a gap of 2 cm, which is the distance between two adjacent keys on the keyboard. Accuracy at this point is only 16%, indicating poor performance. Beyond that point, the accuracy grows much faster, reaching 32% for 2.2 cm and 43% for 2.45 cm. Following this, the accuracy continues to increase, but at a lower rate, as we increase the gap distance to 6.3 cm, where it reaches 83%. Finally, when we repeat the experiment using the five keys with an average gap distance of 8 cm, the reproduced accuracy reaches 90%, which aligns with our initial findings. This outcome indicates that the effectiveness of these WiFi sensing devices relies on the gap between the spatial points being tested. Consequently, based on this illustration of spatial resolution, accuracy increases with the average gap when utilising WiFi sensing for keystroke detection while keeping the Tx-Rx distance constant. Our analysis reveals a significant drop in machine learning accuracy when the average gap distance is reduced below 5.7 cm. This issue is linked to the inherent wavelength of the 5 GHz WiFi signal, which is roughly 6 cm. As a result, when the separation between keys falls within that range, the system struggles to distinguish between them due to the limited spatial resolution.
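The wavelength figure can be checked directly from the free-space relation λ = c / f. The short sketch below evaluates it for the 5.216 GHz channel used here and, for comparison, for 2.4 GHz; the function name is ours.

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # exact value in vacuum


def wavelength_cm(frequency_hz):
    """Free-space wavelength in centimetres for a carrier frequency in hertz."""
    return SPEED_OF_LIGHT_M_S / frequency_hz * 100


print(round(wavelength_cm(5.216e9), 2))  # 5.75
print(round(wavelength_cm(2.4e9), 2))    # 12.49
```

The 5.216 GHz wavelength of about 5.75 cm is close to the 5.7 cm gap below which the accuracy was observed to fall sharply.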

6.5.2. Effect on Number of Keys Counted

For the second set of experiments, we plotted how keystroke recognition accuracy changed with the number of keys from which we extracted CSI and passed it to the data processing and machine learning framework. The relationship between accuracy and test group size is shown in Figure 16. The blue curve shows the trend in accuracy according to the number of keys being tested, and the result of each test is marked with a red point. We found that the accuracy had a strong negative correlation with the group size. The overall accuracy for five keys was 83%, which indicated that under the smallest gap distance, even with the same number of tested keys, the system’s accuracy was still reduced by around 5%. There was a steep drop in accuracy when the group size was between 5 and 12 keys, with 66% for 7 keys, 54% for 9 keys, and another 15% drop for 12 keys (53% to 38%). The decline slowed after 12 keys, and the system still achieved 14% for the full set of alphabetic keys.
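One way to put these figures in context is to compare them with the accuracy expected from random guessing, which is 1/N for N equally likely keys. The sketch below does this for a subset of the accuracies quoted above; it assumes balanced classes, which is consistent with the equal number of test keystrokes per key.

```python
# Accuracy (%) by number of keys, as quoted in the text above.
REPORTED = {5: 83, 7: 66, 12: 38, 26: 14}


def chance_level_percent(num_keys):
    """Expected accuracy (%) of uniform random guessing among num_keys balanced classes."""
    return 100.0 / num_keys


for n, acc in REPORTED.items():
    chance = chance_level_percent(n)
    print(f"{n:2d} keys: {acc}% reported, {chance:.1f}% chance, {acc / chance:.1f}x chance")
```

Even for the full alphabet, the reported accuracy remains several times higher than chance, although it is far too low for reliable keystroke inference.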

This result can be explained by spatial resolution. Since the spatial resolution was the same in that case, the shape features of adjacent keys were similar and more challenging to distinguish than those of two keys far from each other. We initially tested keys “Q”, “W”, “E”, “A”, and “S”, and we found that “W” had more errors since all other keys were around it, and the features were roughly the same, which introduced more uncertainty. The same phenomenon occurred as further keys were added. When we used 26 keys, each key, except those on the outer area, interacted with others. Thus, it became harder for the machine learning classifier to draw decision boundaries for these keys and to distinguish the classes from each other. In Figure 17 and Figure 18, we present the Voronoi diagrams for five- and nine-key keystroke detection derived using the KNN classifier. The x-axis and y-axis represent the DTW distance between the first and third components of the shape feature. Figure 17 clearly illustrates that the elements corresponding to each individual key form distinct clusters. This separation makes it straightforward to define decision boundaries between the keys. However, in Figure 18, the addition of four more keys to the test group results in a more cluttered diagram. The elements begin to overlap, causing the clusters to blend and making the decision boundaries less distinct and more difficult to define.
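To make the classification step concrete, the following sketch implements a textbook dynamic time warping distance and a majority-vote KNN in a two-dimensional distance plane. It is an illustration under assumptions rather than the toolbox implementation used in the experiments: the toy coordinates are hypothetical, and how a keystroke's distances are turned into plane coordinates (here, simply given as points) is not specified in the text.

```python
from collections import Counter
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        curr = [math.inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of the three allowed predecessor alignments.
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def knn_predict(train_points, train_labels, query, k=5):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# DTW tolerates a stretched copy of the same shape.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))

# Hypothetical points in the distance plane for two well-separated keys.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["Q", "Q", "Q", "P", "P", "P"]
print(knn_predict(points, labels, (0.5, 0.5), k=5))
```

When clusters overlap, as in the nine-key case, the k nearest neighbours of a query contain a mix of labels and the vote becomes unreliable, which is the effect visible in the cluttered Voronoi diagram.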

The findings from the spatial resolution tests underscore a key constraint for real-world deployment of CSI-based keystroke detection systems. Specifically, the system’s accuracy drops significantly when the average gap between keys approaches or falls below the 5–6 cm range, which is common on standard compact keyboards. This limits the system’s effectiveness for realistic typing tasks involving densely packed keys or rapid key transitions. Furthermore, the classifier’s reduced performance as the number of detectable keys increases highlights a scalability issue, wherein distinguishing between many adjacent keys becomes infeasible due to overlapping CSI features. These limitations must be considered when applying this technology to scenarios such as password theft prevention, user behaviour analysis, or gesture-based interaction in commercial IoT environments.

These findings reinforce that spatial resolution and the number of keys significantly impact system performance. The recognition accuracy sharply drops when key spacing is below 6 cm or when the number of keys increases beyond nine, highlighting the limits of WiFi-based keystroke sensing granularity.

6.6. Deep Learning-Based Line Evaluation

To further evaluate the robustness of WiFi CSI-based keystroke recognition, we implemented a deep learning model using a one-dimensional Convolutional Neural Network (1D CNN) trained directly on the raw CSI time series. This section benchmarks the deep learning approach against traditional feature-based methods, including PCA, DWT, and DTW-KNN, as well as classical classifiers such as SVM and Random Forest.

6.6.1. CNN Model Architecture and Data Processing

The 1D CNN model processed raw CSI amplitude data in the form of time series sequences. Each sample was represented as a two-dimensional input of time steps across multiple subcarriers. The architecture began with a Conv1D layer of 64 filters (kernel size 5) to extract local temporal features, followed by a MaxPooling layer (pool size 2) to reduce dimensionality. A second Conv1D layer with 128 filters (kernel size 3) and another MaxPooling layer further refined the features. The output was flattened and passed through a dense layer with 128 ReLU-activated neurons, followed by a Dropout layer (rate 0.5) to prevent overfitting. The final dense layer with softmax activation produced class probabilities. The model was trained using the Adam optimiser and categorical cross-entropy loss, with early stopping based on validation loss and an 80:20 train–validation split.
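The layer sequence above can be traced numerically to see how the tensor shape and parameter count evolve. The sketch below is plain Python rather than a deep learning framework, and several quantities are assumptions not stated in the text: an input window of 1300 time steps (1 s at 1300 Hz) across 234 subcarriers, unpadded convolutions with stride 1, and five output classes.

```python
def conv1d(length, channels, filters, kernel):
    """Unpadded, stride-1 convolution: (new_length, new_channels, parameter_count)."""
    return length - kernel + 1, filters, (kernel * channels + 1) * filters


def maxpool1d(length, channels, pool):
    """Non-overlapping max pooling has no trainable parameters."""
    return length // pool, channels, 0


def dense(inputs, units):
    """Fully connected layer: (units, parameter_count) including biases."""
    return units, (inputs + 1) * units


def trace_model(time_steps, subcarriers, num_classes):
    total = 0
    length, ch, p = conv1d(time_steps, subcarriers, 64, 5)
    total += p
    length, ch, _ = maxpool1d(length, ch, 2)
    length, ch, p = conv1d(length, ch, 128, 3)
    total += p
    length, ch, _ = maxpool1d(length, ch, 2)
    flat = length * ch                 # Flatten
    units, p = dense(flat, 128)        # ReLU; Dropout(0.5) adds no parameters
    total += p
    _, p = dense(units, num_classes)   # softmax output
    total += p
    return flat, total


flat, total = trace_model(time_steps=1300, subcarriers=234, num_classes=5)
print(flat, total)
```

Under these assumptions, almost all of the parameters sit in the first dense layer after flattening, which is where a lightweight variant would most plausibly save memory.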

6.6.2. Experimental Result

We trained and evaluated the CNN on the same five-key datasets (Q, Z, H, P, and M) used in Section 6.3. Performance was evaluated using accuracy, precision, recall, $f_{1}$ score, Receiver Operating Characteristic (ROC) curves, and Area Under the Curve (AUC) values. The CNN achieved an average accuracy of 92.1% with an average $f_{1}$ score of 0.917. The average AUC was 0.964. We also generated ROC curves to provide more granular insights, as shown in Figure 19.
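For a multi-class problem, an average AUC is commonly obtained by treating each class in a one-vs-rest fashion and averaging the results. The sketch below computes AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one; the averaging scheme is an assumption, since the text does not state which one was used, and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auc(prob_rows, true_labels, classes):
    """One-vs-rest AUC per class, averaged with equal weight."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        binary = [1 if t == cls else 0 for t in true_labels]
        aucs.append(roc_auc(scores, binary))
    return sum(aucs) / len(aucs)


# Hypothetical classifier scores for a single class against the rest.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```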

This result demonstrated that the CNN was highly capable of distinguishing keystrokes, particularly when the spatial separation was clear. While adjacent keys like “Z” and “H” introduced some ambiguity, the overall separability was robust. ROC curves for all classes showed strong positive predictive capacity with minimal overlap.

6.6.3. Comparison with Traditional Models

We then benchmarked the proposed method with other models, as summarised in Table 6. Our findings revealed that deep learning models, particularly 1D CNNs, provided superior performance and robustness over traditional signal processing pipelines for keystroke classification using WiFi CSI. The CNN benefited from its ability to learn hierarchical temporal features directly from raw input, reducing dependency on manual preprocessing. Although more computationally intensive, these models show strong potential in scenarios where robustness to noise is essential. However, interpretability and real-time deployment challenges remain. Future work may consider lightweight architectures or hybrid approaches that combine handcrafted features with neural networks for improved efficiency and interpretability.

7. WiFi Sensing-Based Keyed Name Detection

After confirming the effectiveness of keystroke detection, we took a step further by enhancing the model to include both pressing movement and hand movement in different directions. Specifically, we investigated the practical robustness of wirelessly detecting keyed names, i.e., names typed by pressing keys on a computer keyboard with single or multiple fingers. This included incorporating the impact of the movement of a single or both hands horizontally and vertically.

7.1. Experimental Setup

We gathered CSI information for four names, each ranging from four to seven letters, from a sole experimental participant. To test the robustness of our system, we performed the following tests:

Typing using the forefinger of the right hand.
Typing with the right hand but using multiple fingers. The participant was free to use whichever fingers they found most comfortable for typing.
Typing with both hands using multiple fingers.

To reduce the influence of chance, for every name, we conducted 60 training trials and 30 testing trials, with a three-second gap between each trial. This resulted in 360 trials in total across four names and three typing styles. Each trial spanned several keystrokes (4–7 per name), and the average typing speed was approximately one key per second, yielding approximately 40,000–60,000 CSI samples per name per typing style. The testing trials collected in the first test were also used for later tests for variable control. The distance between the transmitter and receiver (Tx-Rx) was maintained at 1 m. The data were processed using the machine learning framework outlined in Section 6, and the environment was unchanged during the experiment.

7.2. Empirical Results

7.2.1. Single Hand, Single Finger

Figure 20 shows the CSI time series for five trials of typing the name “Haoming” using a single finger. Each trial displayed a consistent shape with distinct keystroke features, clearly separated by red dashed lines. These observations indicate that sequential key presses for a given name produce consistent waveform signatures, making them suitable for classification.

Using the CSI time series as input features, our classifier achieved high accuracy in distinguishing among four names. As shown in Table 7, the true positive rate (TPR) for “Haoming” and “Deepak” was 100%, while “Aryan” and “Aruna” achieved 86.7%. The misclassification between “Aryan” and “Aruna” was attributed to their identical length and nearly identical key sequences, making their waveform shapes difficult to differentiate.

7.2.2. Single Hand, Multiple Fingers

When typing with multiple fingers of a single hand, the CSI time series still exhibited clear segmentation, but the shape features became less distinct due to overlapping hand motion. While “Haoming” and “Deepak” maintained 100% TPR, the performance for “Aryan” and “Aruna” declined to 73.3%, as shown in Table 7. This degradation was due to increased inter-finger interference, which altered the consistency of the waveform patterns.

7.2.3. Both Hands, Multiple Fingers

With both hands involved, accuracy significantly decreased across all names due to complex motion dynamics and signal interference. The TPR for “Aruna” dropped to 46.7%, and “Haoming” saw a decline of 43.3%. These results indicate that full-hand typing introduces substantial variability in CSI features, making reliable name classification infeasible under such conditions. However, this limitation enhances privacy, as keystroke-based user recognition becomes less effective in typical typing scenarios. In conclusion, name detection via WiFi CSI is highly effective when typing is performed with one finger and one hand (93.9% accuracy), but becomes less accurate when multiple fingers or both hands are used.

These results demonstrate that keystroke classification accuracy is significantly reduced during natural two-handed typing, which limits the effectiveness of the system in realistic conditions. This reduction in accuracy suggests that current methods are less likely to pose a strong privacy risk in practice. Future work will explore targeted attack scenarios, such as password or PIN inference under practical typing behaviour, and investigate potential countermeasures, including controlled radio frequency noise injection. We acknowledge that the dataset used for name detection experiments is relatively small, comprising only four distinct names typed by a single participant. This limited dataset was intentionally chosen to serve as a preliminary proof of concept to evaluate the feasibility of CSI-based name inference. However, we recognise that a broader dataset including a greater variety of names, participants, and typing styles is essential for further validating the robustness of the system. Expanding the dataset and evaluating performance under more diverse real-world conditions will be an important direction for future research.

8. Practical Deployment Considerations

While this study focused on evaluating the technical robustness of WiFi CSI-based keystroke recognition, it is equally important to reflect on its viability as an IoT solution. Below, we outline key factors including hardware cost, power consumption, integration feasibility, and a comparative perspective against other sensing modalities.

8.1. Hardware Cost and Integration

Our experiments used commodity hardware, specifically Raspberry Pi 4B units, which cost approximately USD 50 each. These devices support CSI extraction through open-source tools such as Nexmon and require no modification to the physical infrastructure or to user behaviour. In future implementations, lower-cost microcontrollers such as the ESP32 (costing under USD 10) could also be leveraged, enabling further reductions in cost and energy footprint. Since these platforms are already common in IoT ecosystems, integrating our system would not require any additional setup or maintenance.

8.2. Power Consumption and Energy Profile

The Raspberry Pi 4B consumes approximately 3–6 watts under continuous operation. Our sensing setup used a fixed packet transmission rate of 1300 Hz to maintain high resolution. However, this rate can be adjusted to reduce power consumption. Although we did not explicitly optimise energy usage in this work, several future directions can enhance deployment efficiency:

Reducing the transmission frequency during idle periods.
Dynamically adjusting sensing parameters based on context.
Offloading computation to energy-efficient edge processors.
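As a rough illustration of the first direction, the sketch below estimates daily energy use for continuous operation and for a duty-cycled schedule. The linear duty-cycle model, the 25% duty cycle, and the 2 W idle power are hypothetical values chosen for illustration only; they are not measurements from this work.

```python
def daily_energy_wh(active_power_w, duty_cycle=1.0, idle_power_w=0.0):
    """Energy over 24 h when sensing for a fraction duty_cycle of the time.

    Assumes power switches instantly between two fixed levels, which is a
    simplification of real device behaviour.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return 24.0 * (duty_cycle * active_power_w + (1.0 - duty_cycle) * idle_power_w)


# Continuous operation at the 3-6 W range quoted above.
print(daily_energy_wh(3.0), daily_energy_wh(6.0))

# Hypothetical schedule: sensing 25% of the time at 3 W, idling at 2 W otherwise.
print(daily_energy_wh(3.0, duty_cycle=0.25, idle_power_w=2.0))
```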

8.3. Comparison with Alternative Modalities

In Table 8, we provide a high-level comparison of WiFi CSI-based sensing with other commonly used sensing modalities.

Compared to vision-based or physical contact sensors, WiFi CSI offers a practical trade-off between cost, privacy, and setup simplicity, though its performance is more sensitive to spatial and environmental dynamics.

9. Detection over Longer Tx-Rx Distances

Having demonstrated that keystroke recognition depended on the hand’s location and movement, we proceeded with a final investigation into the importance of device placement. We identified that prior work on gesture recognition relied on high-proximity WiFi devices, which may not be realisable in a commercial application. Hence, we iteratively increased the sensor device separation and examined its influence on keystroke recognition accuracy.

Experiment Setup and Results

We conducted the final experiment with the setup of the devices as in Figure 7 with the aim of examining the impact of the sensing range. We introduced an additional independent variable, changing the separation between Tx and Rx from 1 m to 3.5 m. The CSI was sampled at 1300 Hz by the Rx and processed with the DWT/KNN architecture described in Section 6.

In Figure 21, we present a plot of the keystroke recognition accuracy as a function of the Tx-Rx distance. The x-axis represents the distance between Tx and Rx, varying between 1 m and 3.5 m. The y-axis represents the accuracy of a keystroke recognition system set up with said device separation. Once again, key details are summarised in Table 9.

We observed that the accuracy was strongly negatively correlated with the device separation. Firstly, the accuracy with 1 m separation was 88%, a result similar to that in Section 6. Beyond this, there was a large drop in accuracy as the distance increased from 1 m to 2 m, falling by about 8 percentage points to 80% at 1.5 m and by a similar amount to 72.5% at 2 m. After that, we further increased the distance from 2 m to 3.5 m. Within 2–3 m, average accuracy still dropped but by a much smaller amount (from 72.5% to 68%). When the distance was more than 3 m, the accuracy dropped significantly again, which indicates that the system is best suited to short-range indoor sensing scenarios. This result demonstrates that the efficacy of fine-grained gesture recognition using CSI hinges on the proximity to the sensing devices.
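The change in slope can be made explicit by computing the accuracy lost per metre between consecutive reported distances. The sketch below uses only the figures quoted in the text; the accuracy beyond 3 m is not quoted there, so it is omitted.

```python
# (Tx-Rx distance in metres, accuracy in %) as quoted in the text.
REPORTED_ACCURACY = [(1.0, 88.0), (1.5, 80.0), (2.0, 72.5), (3.0, 68.0)]


def drop_per_metre(points):
    """Percentage points of accuracy lost per metre between consecutive distances."""
    rates = []
    for (d0, a0), (d1, a1) in zip(points, points[1:]):
        rates.append(((d0, d1), (a0 - a1) / (d1 - d0)))
    return rates


for (start, end), rate in drop_per_metre(REPORTED_ACCURACY):
    print(f"{start}-{end} m: {rate:.1f} percentage points per metre")
```

The loss is roughly 15 to 16 percentage points per metre between 1 m and 2 m, compared with 4.5 between 2 m and 3 m.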

The results show that keystroke recognition is highly dependent on device proximity. Accuracy drops significantly as Tx-Rx distance exceeds 2 m, supporting the conclusion that WiFi sensing for fine-grained gestures is best suited for close-range indoor environments.

10. Concluding Remarks

In this paper, we investigated the robustness and readiness of WiFi sensing-based keystroke detection technology. We first developed a robotic arm control system and established a benchmark for localisation using WiFi CSI, with an accuracy of 99%. Next, we demonstrated that this ability for accurate localisation was maintained even when faced with the real-world challenges posed by a human hand, where the accuracy was 94%. Following this, we showed an overall accuracy of 89% in predicting keystrokes among five keys. Based on this, we further examined the effect of varying the distance between test keys and the size of the test group. We found that a smaller gap between keys or a larger group size reduced the overall accuracy of the constructed ML framework. The accuracy for predicting keystrokes decreased by 5% when the system was evaluated with an unseen hand. Building on keystroke detection, we experimented with name detection using a single finger of one hand, multiple fingers of one hand, and multiple fingers of both hands. The overall accuracy for these three scenarios was 93.875%, 86.665%, and 55.835%, respectively; when typing with two hands, WiFi sensing remained possible but names could not be detected accurately, which can protect users from privacy attacks. Further, we noticed a drop in performance as the sensing device separation increased. Thus, our iterative empirical analysis testing the robustness of WiFi sensing technology highlights the need for advanced signal processing techniques for extracting fine-grained keystroke features from CSI to maintain accuracy in more generic cases with larger separation between the IoT devices.

The current study was conducted in a controlled indoor environment, in line with prior literature, with a limited number of participants and fixed device placements. While these constraints ensured repeatability and allowed for the isolation of key performance factors, we acknowledge that they do not fully capture the complexity of real-world deployments. Future work will expand this evaluation to more realistic usage settings, incorporating dynamic human activity, variable room geometries, and broader user diversity to assess the scalability and generalisation of CSI-based keystroke sensing in practical deployments. Scenarios involving non-line-of-sight placement and concurrent WiFi transmissions were beyond the current scope but are noted as important directions for future investigation, and we plan to incorporate such scenarios into an extended test bed to better assess the robustness of our approach. Future work can also explore system-level metrics such as real-time latency, scalability under concurrent access, and performance in multi-user environments to assess practical deployment feasibility.

While our system demonstrates the feasibility of keystroke inference using WiFi CSI, the recognition performance degraded significantly under realistic two-hand typing conditions. This suggests a natural privacy safeguard in practical settings, limiting the risk of covert surveillance. Importantly, while our study evaluated practical challenges under varied but controlled conditions, our aim was to explore the operational boundaries of CSI-based keystroke detection rather than generalisability across users. Broader real-world robustness and scalability remain open research directions. We believe that highlighting such boundaries is essential for guiding future research toward both impactful and ethically responsible wireless sensing applications.

Author Contributions

Conceptualization, All; methodology, All; software, H.W. and A.S. (Aryan Sharma); validation, H.W. and D.M.; formal analysis, All; investigation, All; resources, All; data curation, H.W. and A.S. (Aryan Sharma); writing—original draft preparation, H.W. and A.S. (Aryan Sharma); writing—review and editing, D.M., A.S. (Aruna Seneviratne) and E.A. (Eliathamby Ambikairajah); visualization, All; supervision, D.M., A.S. (Aruna Seneviratne) and E.A. (Eliathamby Ambikairajah); project administration, D.M., A.S. (Aruna Seneviratne) and E.A. (Eliathamby Ambikairajah); funding acquisition, D.M., A.S. (Aruna Seneviratne) and E.A. (Eliathamby Ambikairajah). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Australian Research Council (ARC) Discovery Early Career Researcher Award (DE230101391), and in part by the National Intelligence and Security Discovery Research Grant (NI240100147), funded by the Australian Office of National Intelligence (ONI). Aryan Sharma’s participation is supported by the Cyber Security Cooperative Research Centre Limited, whose activities are partially funded by the Australian Government’s Cooperative Research Centres Programme.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Adopted WiFi sensing system architecture.

Figure 2. Dependence on location and movement.

Figure 3. 2-DOF Quanser robotic arm.

Figure 4. System architecture for robotic arm position control.

Figure 5. Experimental setup.

Figure 6. PMFs for different keys.

Figure 7. Experimental setup.

Figure 8. Data processing and ML framework for keystroke detection.

Figure 9. Confusion matrix for keystroke recognition.

Figure 10. Seven-key keymap.

Figure 11. Nine-key keymap.

Figure 12. Nine-neighbourhood-key keymap.

Figure 13. Fourteen-key keymap.

Figure 14. Areas of keys for different tests.

Figure 15. Accuracy change with average gap distance.

Figure 16. Accuracy change with number of keyboard keys evaluated.

Figure 17. ML boundary for 5 keys.

Figure 18. ML boundary for 9 keys.

Figure 19. ROC curves for CNN keystroke classifier.

Figure 20. Raw CSI time series for 5 trials of the name “Haoming”.

Figure 21. ML accuracy change with Tx-Rx distance.

Table 1. Robotic arm control system’s PID parameters.

Parameter	Shoulder	Elbow
$K_{P}$	15	15
$K_{I}$	2.5	0.5
$K_{D}$	5	0.55
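
To make the role of the gains in Table 1 concrete, the following is a minimal sketch of a textbook discrete PID controller in parallel form. It is not the authors' implementation: the class name `PID`, the 10 ms control period `DT`, and the example position error are all assumptions introduced for illustration, and only the gain values are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller in parallel form (illustrative only)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current position error."""
        self.integral += error * dt                   # rectangular integration
        derivative = (error - self.prev_error) / dt   # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Gains copied from Table 1; one controller per joint.
shoulder = PID(kp=15, ki=2.5, kd=5)
elbow = PID(kp=15, ki=0.5, kd=0.55)

DT = 0.01  # hypothetical 10 ms control period, not taken from the paper
u_shoulder = shoulder.step(error=0.1, dt=DT)  # hypothetical 0.1 rad position error
u_elbow = elbow.step(error=0.1, dt=DT)
```

With identical proportional gains, the two joints differ only in how strongly the integral and derivative terms act. In this sketch the larger shoulder derivative gain dominates the first control step after a setpoint change.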

Table 2. Summary of PMF patterns for selected keys.

Key	PMF Shape Characteristics	Key Placement	Visual Distinction
L	3 peaks, high variation	Bottom-right	Clear
R	2 strong peaks, symmetric	Top-right	Moderate
B	Smooth distribution	Bottom-center	Moderate
Z	Diffuse, less distinct peaks	Bottom-left	Distinct
M	Clustered, mid-range	Bottom-right	Subtle
N	Similar to M	Right of M	Overlapping
E	Sharp central peak	Top-left	Moderate
D	Slight spread, left-shifted	Mid-left	Slightly distinct

Table 3. $f_{1}$ scores for robotic arm location detection.

Key	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z
$f_{1}$ score	$0.98$	$0.99$	$0.99$	$1.00$	$1.00$	$0.98$	$0.98$	$1.00$	$1.00$	$1.00$	$1.00$	$0.98$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$0.99$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$

Table 4. $f_{1}$ scores for human arm location detection.

Key	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z
$f_{1}$ score	$0.96$	$0.99$	$1.00$	$0.99$	$0.66$	$0.99$	$1.00$	$1.00$	$1.00$	$1.00$	$0.98$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$1.00$	$0.96$	$1.00$	$1.00$	$0.99$	$0.00$	$1.00$	$1.00$	$1.00$
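
As a reading aid for Tables 3 and 4, the sketch below averages the per-key $f_{1}$ scores and lists the keys that fall below a cut-off. The score lists are copied from the two tables. The helper names and the 0.9 cut-off are assumptions chosen for illustration and do not appear in the study.

```python
from statistics import mean

KEYS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Per-key f1 scores copied from Table 3 (robotic arm) and Table 4 (human arm).
ROBOT_F1 = [0.98, 0.99, 0.99, 1.00, 1.00, 0.98, 0.98, 1.00, 1.00, 1.00, 1.00, 0.98, 1.00,
            1.00, 1.00, 1.00, 1.00, 1.00, 0.99, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
HUMAN_F1 = [0.96, 0.99, 1.00, 0.99, 0.66, 0.99, 1.00, 1.00, 1.00, 1.00, 0.98, 1.00, 1.00,
            1.00, 1.00, 1.00, 1.00, 1.00, 0.96, 1.00, 1.00, 0.99, 0.00, 1.00, 1.00, 1.00]


def macro_f1(scores):
    """Unweighted mean of the per-key f1 scores."""
    return mean(scores)


def weak_keys(scores, threshold):
    """Keys whose f1 score is strictly below the threshold."""
    return [key for key, score in zip(KEYS, scores) if score < threshold]


THRESHOLD = 0.9  # hypothetical cut-off, chosen only for this example

robot_macro = macro_f1(ROBOT_F1)
human_macro = macro_f1(HUMAN_F1)
robot_weak = weak_keys(ROBOT_F1, THRESHOLD)
human_weak = weak_keys(HUMAN_F1, THRESHOLD)

print(f"robotic arm: macro f1 = {robot_macro:.3f}, weak keys = {robot_weak}")
print(f"human arm:   macro f1 = {human_macro:.3f}, weak keys = {human_weak}")
```

Under these assumptions the unweighted averages are about 0.996 for the robotic arm and 0.943 for the human arm. The gap comes almost entirely from keys E and W in Table 4.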

Table 5. Summary of keystroke detection tests.

Test No.	Number of Keys	Keys Tested	Average Gap Distance (cm)
1	5	QZHPM	8
2	7	QZRVPMH	6.3
3	9	QZECTBUMO	4
4	9	QSCFTHNKO	2.45
5	14	QAZEDCTGBUJMOL	2.2
6	26	All alphabetic keys	2

Table 6. Comparison of different ML algorithms.

Model	Accuracy (%)	$f_{1}$ Score	Avg AUC
PCA + DTW + KNN	89	0.884	0.923
PCA + SVM	87.2	0.865	0.905
DWT + RF	86.5	0.849	0.891
Raw CSI + CNN	92.1	0.917	0.964

Table 7. True positive rates for name detection under different typing conditions.

Name	Single Finger (%)	Single Hand (Multi-Finger) (%)	Both Hands (%)
Haoming	100.0	100.0	56.7
Deepak	100.0	100.0	63.3
Aryan	86.7	73.3	56.7
Aruna	86.7	73.3	46.7

Table 8. Comparison of the proposed method with other methods.

Modality	Power Use	Hardware Cost	Privacy	Integration Complexity
This work	Low	Low	High	Low
Optical cameras	High	Medium	Low	Medium
Floor sensors	Medium	High	Low	Low
RFID (backscatter)	Very low (tags)	Medium	High	Medium

Table 9. Keystroke detection accuracy at different Tx-Rx distances.

Tx-Rx Distance (m)	Accuracy (%)
1	88
1.5	80
2	72.5
2.5	70
3	68
3.5	53
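
One way to summarise the trend in Table 9 is an ordinary least-squares line through the six measurements. The sketch below is an illustration only: the linear model is an assumption made here, not a claim of the study, and the function name `least_squares_line` is hypothetical. The data points are copied from the table.

```python
# Tx-Rx distances (m) and accuracies (%) copied from Table 9.
DISTANCE_M = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ACCURACY_PCT = [88.0, 80.0, 72.5, 70.0, 68.0, 53.0]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = least_squares_line(DISTANCE_M, ACCURACY_PCT)
print(f"accuracy changes by about {slope:.1f} percentage points per metre")
```

For these six points the fitted slope is about -12.2 percentage points per metre. The drop between 3 m and 3.5 m is steeper than the fitted line, so the trend should not be extrapolated beyond the measured range.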

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
