Author Contributions
Conceptualization, Z.Z., M.P., G.C. and N.D.; methodology, Z.Z., M.P., G.C. and N.D.; software, Z.Z. and M.P.; validation, Z.Z., M.P., G.C. and N.D.; formal analysis, Z.Z. and M.P.; investigation, Z.Z., M.P., G.C. and N.D.; resources, Z.Z. and M.P.; data curation, Z.Z. and M.P.; writing—original draft preparation, Z.Z., M.P. and G.C.; writing—review and editing, Z.Z., M.P., G.C. and N.D.; visualization, Z.Z., M.P., G.C. and N.D.; supervision, M.P., G.C. and N.D.; project administration, M.P., G.C. and N.D.; funding acquisition, N.D. All authors have read and agreed to the published version of the manuscript.
Figure 1.
End-to-end pipeline for client-side continuous authentication.
Figure 2.
Timing primitives for consecutive keystrokes: , , , and . can be negative under key overlap.
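As a rough illustration, timing primitives like these can be computed from the key-down and key-up timestamps of two consecutive keys. The sketch below assumes the conventional hold (H), down-down (DD), up-down (UD), and up-up (UU) definitions. The primitive names and the `KeyEvent` type are hypothetical and may not match the notation in Figure 2. When the keys overlap, UD becomes negative.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    press: float    # key-down timestamp in seconds
    release: float  # key-up timestamp in seconds


def digraph_primitives(k1: KeyEvent, k2: KeyEvent) -> dict:
    """Conventional timing primitives for a consecutive key pair (k1 typed before k2)."""
    return {
        "H": k1.release - k1.press,     # hold (dwell) time of the first key
        "DD": k2.press - k1.press,      # down-down latency
        "UD": k2.press - k1.release,    # up-down (flight) time; negative if k2 goes down before k1 is released
        "UU": k2.release - k1.release,  # up-up latency
    }


# Overlapping keystrokes: "h" is pressed while "t" is still held down.
p = digraph_primitives(KeyEvent("t", 0.000, 0.120), KeyEvent("h", 0.090, 0.200))
```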
Figure 3.
Fixed-text: distribution of per-user AUC on S2.
Figure 4.
Fixed-text: HTER on S2 under transferred personal vs. pooled-S1 global threshold.
Figure 5.
Fixed-text: distribution of on S2.
Figure 6.
Fixed-text: per-user AUC on S2 across models.
Figure 7.
Fixed-text: per-user HTER on S2 at each model’s transferred EER threshold.
Figure 8.
Fixed-text: per-user on S2 (threshold selected on S1).
Figure 9.
Fixed-text: scaler-only warm-up effect on S2 (HTER, ).
Figure 10.
Fixed-text: distribution of scaler-only warm-up impact on S2, .
Figure 11.
Free-text: distribution of per-user AUC on S2 under the default windowing setting (, ).
Figure 12.
Free-text thresholding comparison on S2 under the default windowing setting (, ). The panels compare personal thresholds with the pooled-S1 global threshold for HTER, FAR, and FRR. All thresholds are selected on S1 and transferred unchanged to S2.
Figure 13.
Free-text model comparison under the default windowing setting (, ). The panels show the per-user S2 distributions of AUC, HTER at the transferred S1-selected EER threshold, and FRR at the S1-selected FAR = 5% operating point. LR = Logistic Regression, GNB = GaussianNB, DT = DecisionTree, and RF = RandomForest.
Figure 14.
Free-text: scaler-only warm-up effect on S2 (HTER, ).
Figure 15.
Free-text: population median HTER on S2 vs. warm-up length K.
Figure 16.
Free-text: per-user distributions across K on S2.
Figure 17.
Offline-to-client workflow for client-side scoring and the browser replay demo.
Figure 18.
Browser replay score-stream visualization for fixed-text repetitions (top) and free-text windows (bottom). Each point is scored using the exported model.json and scaler.json artifacts under the same client-side inference contract used by the demo. The horizontal threshold line denotes the fixed S1-selected operating point transferred to the replay stream. Scores above the threshold are interpreted as target-like under this operating point, while scores below the threshold are treated as non-genuine decisions. Prompt markers illustrate the optional demo-only consecutive-low-score policy and are not part of the offline S1→S2 evaluation metrics.
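The optional consecutive-low-score prompt policy can be sketched as a run-length counter over the replayed score stream. The run length `k = 3`, the reset-after-prompt behavior, and the function name `prompt_indices` are assumptions for illustration, because the demo's actual policy parameters are not stated here. Like the prompt markers in Figure 18, this logic plays no role in the offline S1→S2 metrics.

```python
def prompt_indices(scores, threshold, k=3):
    """Indices at which a re-authentication prompt fires after k consecutive below-threshold scores."""
    prompts, run = [], 0
    for i, s in enumerate(scores):
        run = run + 1 if s < threshold else 0  # count consecutive non-genuine decisions
        if run == k:
            prompts.append(i)
            run = 0  # hypothetical choice: start a fresh count after each prompt
    return prompts


stream = [0.9, 0.2, 0.3, 0.1, 0.8, 0.2, 0.1, 0.3, 0.2, 0.1, 0.05]
fired = prompt_indices(stream, threshold=0.5, k=3)  # [3, 7, 10]
```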
Table 1.
Quantitative highlights with comparability and deployment flags.
| Work | Dataset/Setting | Text | Decision Unit | Strict S1 → S2 Thr? | Client Evidence? | Reported Metric(s) and Representative Number(s) |
|---|---|---|---|---|---|---|
| Comparing anomaly detectors (2009) [18] | 51 subjects, password benchmark | Fixed | password attempt | No | N/A | Top detectors: EER 9.6–10.2%. |
| Real-time free-text verification (2011) [32] | 55 users, real-time verification | Free | m trials | Partial | N/A | : eFAR 0.73%, iFAR 0.66%, FRR 9.48%; : eFAR 2.61%, iFAR 2.02%, FRR 1.84%. |
| Practical free-text evaluation (2017) [33] | Sliding-window free-text evaluation | Free | 1-min window | Partial | N/A | FAR 1%, FRR 11.5% (1-min window); also reports CV: FAR 0.98%, FRR 11.85%. |
| Fast free-text authentication (2020) [35] | Clarkson II/Buffalo (public) | Free | 100/200 DD digraphs | No | N/A | EER: Clarkson II 9.7% (100), 7.8% (200); Buffalo 5.3% (100), 3.0% (200). |
| CNN+RNN free-text CA (2020) [37] | Two public datasets | Free | fixed-length sequence | No | N/A | Best FRR (2.07%, 6.61%); best FAR (3.26%, 5.31%); best EER (2.67%, 5.97%). |
| TypeNet (2020) [36] | Aalto, large-scale | Free | 50 keystrokes/seq | No | N/A | With 1K test users: EER 4.8%. |
| DNN ensemble identification (2021) [38] | Integrated dataset; identification setting | Mixed | identification | No | N/A | Accuracy up to 0.997 in identification. |
| CA methodology audit (2017) [25] | Methodology audit | N/A | N/A | N/A | N/A | Shows common choices can underestimate error rates by 63% and 81%. |
| KeyRecs dataset release (2023) [52] | Dataset release | Both | dataset specification | N/A | N/A | fixed-text.csv: 19,773 × 50; free-text.csv: 562,584 × 9; 99 participants. |
| This paper | KeyRecs, strict protocol | Both | Fixed: repetition; Free: 30 digraphs | Yes | Yes | Fixed: HTER 19.4%, FRR@FAR = 5% 43.8% (S2, threshold from S1); Free: HTER 18.8%, FRR@FAR = 5% 54.8% (S2, operating point from S1). |
Table 2.
Fixed-text repetition-level feature representation.
| Component | Definition | Count |
|---|---|---|
| Decision unit | One complete prompted fixed-text repetition; each repetition is one verification sample. | 1 sample |
| Raw source | KeyRecs fixed-text repetition-level table, with one row identified by participant, session, and repetition. | – |
| Excluded columns | participant, session, and repetition; these are used only for labels, S1/S2 splitting, and bookkeeping, not as model inputs. | 3 columns |
| Retained features | All remaining numeric repetition-level timing descriptors, kept in a fixed dataset order. | 47 features |
| Aggregation | None. Fixed-text samples are already complete repetitions, so no sliding-window aggregation is used. | – |
| Standardization | A pooled-S1 StandardScaler is fitted on the 47-dimensional S1 feature vectors and reused for S1/S2 scoring. | 47 parameters per mean/scale vector |
| Deployment contract | The same ordered 47-dimensional vector is used by the LR verifier and the exported model.json and scaler.json artifacts. | 47 inputs |
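The pooled-S1 standardization contract in Table 2 can be sketched with scikit-learn's `StandardScaler` and `LogisticRegression`. The data are synthetic stand-ins, and `max_iter=1000` and the target-versus-other-users labeling are assumptions. The point of the sketch is that the scaler and the classifier are fitted on S1 only, while S2 is only transformed and scored.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
D = 47  # fixed-text repetition-level features (Table 2)

# Hypothetical synthetic stand-ins for S1/S2 repetition-level feature vectors.
X_s1 = rng.normal(size=(400, D))
y_s1 = (rng.random(400) < 0.25).astype(int)  # 1 = target user, 0 = other users
X_s1[y_s1 == 1] += 0.5                       # shift the target class so it is learnable
X_s2 = rng.normal(size=(200, D))

# Pooled-S1 scaler: statistics come from S1 only and are reused unchanged for S2.
scaler = StandardScaler().fit(X_s1)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_s1), y_s1)

# S2 is only transformed and scored, never used for fitting.
s2_scores = clf.predict_proba(scaler.transform(X_s2))[:, 1]
```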
Table 3.
Free-text window-level feature set (19 dimensions).
| Group | Definition | Count |
|---|---|---|
| Basic/robust stats | For each primitive in : mean, std, median, MAD | 16 |
| Derived summaries | ; long-pause ratio ( s); typing-rate proxy (kps) | 3 |
| Total | | 19 |
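The 16 basic/robust statistics in Table 3 can be sketched for one window as shown below. Several choices here are assumptions: each window is supplied as a mapping from primitive name to an array of per-digraph values, the sketch uses the population standard deviation and the unscaled MAD, and features are ordered alphabetically by primitive name. The three derived summaries are omitted because their exact definitions, such as the long-pause cutoff, are not recoverable here.

```python
import numpy as np


def robust_window_stats(primitives):
    """Mean, std, median, and MAD of each timing primitive in one window, in a fixed order."""
    feats = []
    for name in sorted(primitives):  # deterministic feature order
        x = np.asarray(primitives[name], dtype=float)
        med = np.median(x)
        mad = np.median(np.abs(x - med))  # unscaled median absolute deviation
        feats.extend([x.mean(), x.std(), med, mad])  # x.std() is the population std (ddof=0)
    return np.array(feats)


# Hypothetical 30-digraph window with four primitives (names are assumptions).
rng = np.random.default_rng(0)
window = {name: rng.normal(0.12, 0.03, size=30) for name in ("DD", "H", "UD", "UU")}
v = robust_window_stats(window)  # 4 primitives x 4 statistics = 16 values
```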
Table 4.
Fixed-text cross-session baseline (S2) and threshold strategy comparison.
| Metric (S1-Select/S2-Evaluate) | Mean | Median |
|---|---|---|
| AUC | 0.894679 | 0.918053 |
| HTER (personal threshold) | 0.194290 | 0.179172 |
| HTER (pooled-S1 global threshold) | 0.191233 | 0.173217 |
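The S1-select/S2-evaluate protocol behind Table 4 can be sketched in two steps. First, choose an empirical EER threshold from one user's S1 genuine and impostor scores. Then compute HTER = (FAR + FRR)/2 on S2 using that threshold unchanged. Two details are assumptions: scores ≥ threshold count as accepts, and the selected candidate is the one with the smallest |FAR − FRR|. A pooled-S1 global threshold would presumably come from running the same selection on S1 scores pooled across users.

```python
import numpy as np


def far_frr(genuine, impostor, thr):
    """FAR = share of impostor scores accepted (>= thr); FRR = share of genuine scores rejected (< thr)."""
    return float(np.mean(np.asarray(impostor) >= thr)), float(np.mean(np.asarray(genuine) < thr))


def eer_threshold(genuine_s1, impostor_s1):
    """Candidate S1 threshold that minimizes |FAR - FRR| (an empirical EER point)."""
    candidates = np.unique(np.concatenate([genuine_s1, impostor_s1]))
    gaps = [abs(far - frr) for far, frr in (far_frr(genuine_s1, impostor_s1, t) for t in candidates)]
    return float(candidates[int(np.argmin(gaps))])


def hter(genuine, impostor, thr):
    far, frr = far_frr(genuine, impostor, thr)
    return (far + frr) / 2


# Hypothetical scores for one user: the threshold is chosen on S1 and applied unchanged to S2.
thr = eer_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
s2_hter = hter([0.85, 0.65, 0.50, 0.55], [0.20, 0.62, 0.40, 0.30], thr)
```

On this toy data the S1 EER threshold is 0.6. At that threshold, S2 shows one accepted impostor (FAR = 0.25) and two rejected genuine samples (FRR = 0.5), so HTER = 0.375.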
Table 5.
Fixed-text: lightweight model comparison under cross-session evaluation.
| Model | AUC Mean | AUC Median | HTER Mean | HTER Median | FRR@FAR = 5% Mean | FRR@FAR = 5% Median |
|---|---|---|---|---|---|---|
| DecisionTree (depth = 8) | 0.7243 | 0.7221 | 0.2758 | 0.2779 | 0.5443 | 0.5400 |
| GaussianNB | 0.8495 | 0.8768 | 0.1957 | 0.1799 | 0.6345 | 0.7400 |
| LR (baseline) | 0.8947 | 0.9181 | 0.1943 | 0.1792 | 0.4382 | 0.4100 |
| LinearSVM | 0.8933 | 0.9213 | 0.1924 | 0.1767 | 0.4351 | 0.4100 |
| RandomForest (200, depth = 10) | 0.9343 | 0.9680 | 0.3764 | 0.3964 | 0.6125 | 0.7400 |
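The lightweight baselines in Table 5 map onto scikit-learn estimators roughly as shown below. Only the tree depths and the tree count come from the table. The following are assumptions: `max_iter`, `random_state`, the `LinearSVC` defaults, and the use of the SVM's `decision_function` margin as its score (since `LinearSVC` has no `predict_proba`). The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Depth and tree count from Table 5; max_iter and random_state are assumptions.
MODELS = {
    "LR": LogisticRegression(max_iter=1000),
    "LinearSVM": LinearSVC(),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=200, max_depth=10, random_state=0),
}


def verification_score(model, X):
    """Continuous score for thresholding: P(target) when available, otherwise the SVM margin."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)


# Hypothetical synthetic S1 training data and S2 evaluation data.
rng = np.random.default_rng(1)
X_s1, X_s2 = rng.normal(size=(300, 47)), rng.normal(size=(100, 47))
y_s1 = (rng.random(300) < 0.3).astype(int)
X_s1[y_s1 == 1] += 0.5

s2_scores = {name: verification_score(m.fit(X_s1, y_s1), X_s2) for name, m in MODELS.items()}
```

Margins and probabilities live on different scales, so each model's threshold has to be selected on that model's own S1 scores before it is transferred to S2.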
Table 6.
Free-text: cross-session baseline under default windowing ().
| Metric (S1-Select/S2-Evaluate) | Mean | Median |
|---|---|---|
| AUC (S2) | 0.884 | 0.899 |
| HTER (personal EER threshold) | 0.188 | 0.167 |
| HTER (pooled-S1 global threshold) | 0.186 | 0.164 |
| FRR@FAR = 5% (S1 operating point) | 0.548 | 0.604 |
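The FRR@FAR = 5% entry can be sketched in two steps. First, pick the smallest S1 threshold whose impostor acceptance rate does not exceed 5%. Then measure the genuine rejection rate on S2 at that unchanged threshold. Restricting the candidate thresholds to observed S1 impostor scores (plus +∞) is an assumption, and the function names are illustrative.

```python
import numpy as np


def threshold_at_far(impostor_s1, target_far=0.05):
    """Smallest candidate threshold on S1 impostor scores with FAR = P(score >= thr) <= target_far."""
    imp = np.sort(np.asarray(impostor_s1, dtype=float))
    candidates = np.append(imp, np.inf)  # np.inf accepts nothing, so FAR = 0 is always reachable
    fars = np.array([np.mean(imp >= t) for t in candidates])
    return float(candidates[np.argmax(fars <= target_far)])  # first qualifying candidate in ascending order


def frr_at(genuine_s2, thr):
    """FRR on S2 at a threshold transferred unchanged from S1."""
    return float(np.mean(np.asarray(genuine_s2, dtype=float) < thr))


thr = threshold_at_far(np.linspace(0.0, 0.99, 100))  # hypothetical S1 impostor scores
frr = frr_at([0.97, 0.90, 0.99, 0.50], thr)          # hypothetical S2 genuine scores
```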
Table 7.
Free-text: lightweight model comparison under cross-session evaluation ().
| Model | AUC (Mean/Median) | HTER@Transferred EER Threshold (Mean/Median) | FRR@FAR = 5% (Mean/Median) |
|---|---|---|---|
| LR | 0.884/0.899 | 0.188/0.167 | 0.548/0.604 |
| LinearSVM | 0.885/0.905 | 0.183/0.165 | 0.523/0.542 |
| GaussianNB | 0.870/0.897 | 0.187/0.166 | 0.529/0.486 |
| DecisionTree (depth = 8) | 0.835/0.859 | 0.186/0.159 | 0.556/0.480 |
| RandomForest (200, depth = 10) | 0.926/0.950 | 0.233/0.212 | 0.381/0.355 |
Table 8.
Browser-side JavaScript scoring latency for the exported client-side verifier. The measured path includes StandardScaler normalization, Logistic Regression scoring, sigmoid conversion, and threshold comparison. Timing starts after replayed feature vectors are available, and excludes offline training, S1 threshold selection, raw-event capture, UI rendering, and battery measurements.
| Device | Browser | Track | d | Median μs/Sample | p95 μs/Sample |
|---|---|---|---|---|---|
| Windows laptop | Chrome 147 | Fixed-text | 47 | 0.083 | 0.087 |
| Windows laptop | Chrome 147 | Free-text | 19 | 0.057 | 0.063 |
| Windows laptop | Edge 148 | Fixed-text | 47 | 0.101 | 0.114 |
| Windows laptop | Edge 148 | Free-text | 19 | 0.085 | 0.114 |
| Android phone | Chrome 148 | Fixed-text | 47 | 0.146 | 0.154 |
| Android phone | Chrome 148 | Free-text | 19 | 0.088 | 0.098 |
| Android phone | Firefox 150 | Fixed-text | 47 | 0.286 | 0.298 |
| Android phone | Firefox 150 | Free-text | 19 | 0.170 | 0.178 |
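The measured path in Table 8 consists of standardization, LR logit, sigmoid, and threshold comparison on precomputed feature vectors. It can be mirrored in Python as shown below, reporting median and p95 per-sample latency over repeated passes. The parameters are random placeholders, and `statistics.quantiles(..., n=20)[-1]` is assumed as the p95 estimator. Python timings are not comparable to the browser JavaScript numbers in the table.

```python
import math
import random
import statistics
import time


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decide(x, mean, scale, coef, intercept, threshold):
    """Measured path: standardize, LR logit, sigmoid, threshold comparison."""
    z = intercept + sum(w * (xi - m) / s for xi, m, s, w in zip(x, mean, scale, coef))
    return sigmoid(z) >= threshold


def latency_us_per_sample(samples, params, repeats=200):
    """Median and p95 of per-sample latency (microseconds) over repeated passes."""
    per_sample = []
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        for x in samples:
            decide(x, *params)
        per_sample.append((time.perf_counter_ns() - t0) / 1_000 / len(samples))
    return statistics.median(per_sample), statistics.quantiles(per_sample, n=20)[-1]


# Hypothetical parameters for d = 19 (free-text); feature vectors are assumed precomputed.
d = 19
rng = random.Random(0)
params = ([0.0] * d, [1.0] * d, [rng.gauss(0, 1) for _ in range(d)], 0.0, 0.5)
samples = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(100)]
median_us, p95_us = latency_us_per_sample(samples, params)
```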
Table 9.
Paired user-level uncertainty analysis for personal versus pooled-S1 global thresholding. Here Δ = HTER(global) − HTER(personal), computed per user on S2; values close to zero indicate that the two thresholding strategies behave similarly under S1→S2 transfer. Negative values indicate lower HTER under the pooled-S1 global threshold. The confidence interval is a non-parametric bootstrap 95% interval, and p is from a two-sided Wilcoxon signed-rank test.
| Track | Comparison | n | Metric | Mean Δ | Median Δ | 95% CI/p |
|---|---|---|---|---|---|---|
| Fixed-text | Global–personal threshold | 99 | HTER | | | /0.0378 |
| Free-text | Global–personal threshold | 98 | HTER | | | /0.2276 |
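The paired statistics in this table can be sketched using a percentile bootstrap over users together with SciPy's `wilcoxon`. Three choices are assumptions: bootstrapping the mean Δ rather than the median, using 10,000 resamples, and the synthetic per-user HTERs. Passing model and LR HTERs to the same function would cover model-versus-LR comparisons.

```python
import numpy as np
from scipy.stats import wilcoxon


def paired_delta_summary(hter_a, hter_b, n_boot=10_000, seed=0):
    """Per-user delta = hter_a - hter_b: mean, median, bootstrap 95% CI of the mean, Wilcoxon p."""
    a, b = np.asarray(hter_a, dtype=float), np.asarray(hter_b, dtype=float)
    delta = a - b
    rng = np.random.default_rng(seed)
    # Resample users with replacement and recompute the mean paired difference.
    idx = rng.integers(0, len(delta), size=(n_boot, len(delta)))
    lo, hi = np.percentile(delta[idx].mean(axis=1), [2.5, 97.5])
    p = wilcoxon(a, b, alternative="two-sided").pvalue
    return float(delta.mean()), float(np.median(delta)), (float(lo), float(hi)), float(p)


# Hypothetical per-user HTERs for 99 users (not the paper's data).
rng = np.random.default_rng(2)
hter_personal = rng.uniform(0.05, 0.4, size=99)
hter_global = hter_personal + rng.normal(-0.003, 0.02, size=99)
mean_d, median_d, ci, p = paired_delta_summary(hter_global, hter_personal)
```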
Table 10.
Paired user-level uncertainty analysis for lightweight model comparisons against Logistic Regression. Here Δ = HTER(model) − HTER(LR), computed per user at the transferred S1-selected EER threshold on S2; positive values indicate higher transferred-threshold error than LR. The confidence interval is a non-parametric bootstrap 95% interval, and p is from a two-sided Wilcoxon signed-rank test. DecisionTree and RandomForest correspond to the lightweight tree baselines described in the experimental setup.
| Track | Model vs. LR | n | Metric | Mean Δ | Median Δ | 95% CI/p |
|---|---|---|---|---|---|---|
| Fixed-text | LinearSVM–LR | 99 | HTER | | | /0.0863 |
| Fixed-text | GaussianNB–LR | 99 | HTER | | | /0.7376 |
| Fixed-text | DecisionTree–LR | 99 | HTER | | | / |
| Fixed-text | RandomForest–LR | 99 | HTER | | | / |
| Free-text | LinearSVM–LR | 98 | HTER | | | /0.0020 |
| Free-text | GaussianNB–LR | 98 | HTER | | | /0.8220 |
| Free-text | DecisionTree–LR | 98 | HTER | | | /0.7458 |
| Free-text | RandomForest–LR | 98 | HTER | | | / |
Table 11.
Minimal artifacts used by the browser replay demo. Only model.json and scaler.json are required for client-side scoring.
| File | Inference | Format | Purpose |
|---|---|---|---|
| model.json | Yes | JSON | Feature order; LR coefficients/intercept; optional S1-selected threshold hint. |
| scaler.json | Yes | JSON | StandardScaler parameters (mean_, scale_) for runtime standardization. |
| demo_metrics_target_user.json | No | JSON | Demo-only reference metrics (sanity checks; integration metadata). |
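The following is a minimal sketch of exporting and consuming the two inference artifacts. The `mean_`/`scale_` fields follow the table. The `model.json` keys (`feature_order`, `coef`, `intercept`, `threshold`) and both helper functions are hypothetical. The round trip checks that scoring from the JSON files reproduces scikit-learn's `predict_proba`.

```python
import json
import math
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def export_artifacts(clf, scaler, feature_names, out_dir, threshold=None):
    """Write model.json (feature order, LR weights, optional threshold hint) and scaler.json."""
    model = {
        "feature_order": list(feature_names),
        "coef": clf.coef_.ravel().tolist(),
        "intercept": float(clf.intercept_[0]),
    }
    if threshold is not None:
        model["threshold"] = float(threshold)  # optional S1-selected hint
    scaler_params = {"mean_": scaler.mean_.tolist(), "scale_": scaler.scale_.tolist()}
    Path(out_dir, "model.json").write_text(json.dumps(model))
    Path(out_dir, "scaler.json").write_text(json.dumps(scaler_params))


def score_from_json(x, model, scaler_params):
    """Client-side contract: standardize in feature order, compute the LR logit, apply the sigmoid."""
    z = model["intercept"] + sum(
        w * (xi - m) / s
        for xi, w, m, s in zip(x, model["coef"], scaler_params["mean_"], scaler_params["scale_"])
    )
    return 1.0 / (1.0 + math.exp(-z))


# Round trip on hypothetical synthetic data with 19 free-text features.
rng = np.random.default_rng(3)
X, y = rng.normal(size=(200, 19)), (rng.random(200) < 0.3).astype(int)
scaler = StandardScaler().fit(X)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)
with tempfile.TemporaryDirectory() as tmp:
    export_artifacts(clf, scaler, [f"f{i}" for i in range(19)], tmp, threshold=0.5)
    model = json.loads(Path(tmp, "model.json").read_text())
    scaler_params = json.loads(Path(tmp, "scaler.json").read_text())

js_like = score_from_json(X[0], model, scaler_params)
sk = clf.predict_proba(scaler.transform(X[:1]))[0, 1]
```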