Article

Mouse Data Protection in Image-Based User Authentication Using Two-Dimensional Generative Adversarial Networks: Based on a WM_INPUT Message Approach

1 Interdisciplinary Program of Information & Protection, Mokpo National University, Muan 58554, Republic of Korea
2 School of Computer Science and Engineering, Information Security Major, Mokpo National University, Muan 58554, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 292; https://doi.org/10.3390/electronics15020292
Submission received: 19 November 2025 / Revised: 3 January 2026 / Accepted: 5 January 2026 / Published: 9 January 2026

Abstract

With the rapid evolution of computing technologies and the increased proliferation of online services, secure remote user authentication methods have become essential. Among these methods, password-based authentication remains dominant due to its straightforward implementation and ease of use. Nevertheless, password-based systems are particularly prone to credential theft from keylogging attacks, making user passwords easily compromised. To address these risks, image-based authentication methods were developed, allowing users to enter passwords through mouse clicks rather than keyboard input, thereby reducing vulnerabilities associated with conventional password entry. However, subsequent studies have shown that mouse movement and click information can still be obtained using APIs such as the GetCursorPos() function or WM_INPUT message, thus undermining the intended security benefits of image-based authentication. In response, various defense strategies have sought to inject artificial or random mouse data through functions such as SetCursorPos() or by utilizing the WM_INPUT message, in an effort to disguise authentic user input. Despite these defenses, recent machine learning-based attacks have demonstrated that such naïve bogus input can be distinguished from legitimate mouse data with up to 99% classification accuracy, resulting in substantial exposure of actual user actions. To address this, a technique leveraging Generative Adversarial Networks (GAN) was introduced to produce artificial mouse data closely mimicking genuine user input, which has been shown to reduce the attack success rate by roughly 37%, offering enhanced protection for mouse-driven authentication systems. 
This article seeks to advance GAN-based mouse data protection by integrating multiple adversarial generative models and conducting a comprehensive evaluation of their effectiveness with respect to data processing techniques, feature selection, generation intervals, and model-specific performance differences. Our experimental findings reveal that the enhanced approach reduces attack success rates by up to 48%, marking an 11% performance gain over previous mouse data protection approaches, and providing stronger empirical support that our method offers superior protection for user authentication data compared to prior techniques.

1. Introduction

With the advancement of computing technology and the widespread deployment of online services, Internet usage has increased significantly [1]. Users now routinely utilize web-based platforms, including e-commerce sites for purchasing goods and financial services for asset management, to accomplish a variety of tasks. Within these environments, secure remote user authentication methods are critical, with password-based authentication remaining the most prevalent technique. Although this method is straightforward to implement and has continued to be used extensively since the early days of the Internet, it exhibits a significant vulnerability: user passwords can be readily compromised through attacks such as keylogging, resulting in persistent security risks. To address these vulnerabilities, image-based authentication techniques have been developed.
Image-based authentication commonly uses mechanisms like virtual keypads, where users enter authentication information using mouse clicks instead of the keyboard. Since input is not directly provided via the keyboard, these schemes are intrinsically more resistant to keystroke logging attacks that intercept keyboard data [2]. Nevertheless, subsequent studies have revealed that mouse movement and click data may be recorded using APIs such as the GetCursorPos() function or the WM_INPUT message, which enables attackers to reconstruct the user’s input and thereby undermines the security benefits of image-based authentication techniques [3,4].
To counter these risks, a number of mouse data protection mechanisms have been designed that utilize the SetCursorPos() function or the WM_INPUT message to safeguard against such attacks [5,6]. The fundamental principle of these strategies is to inject artificially generated bogus mouse data into the event stream so that an attacker collecting mouse data via the GetCursorPos() function or WM_INPUT messages cannot reliably differentiate authentic user input within the entire set of recorded mouse data. Consequently, even if mouse data is compromised, the inability to separate genuine from bogus mouse data is intended to prevent the attacker from accurately extracting actual authentication credentials.
Faced with these enhanced defenses, attackers have started utilizing more advanced attack strategies. Recent studies indicate that, in contexts where such protection measures are deployed, machine learning–driven attack techniques can be trained to distinguish between synthetically generated bogus mouse data and actual mouse data [7]. These classification models have reached accuracy rates as high as 99%, thereby restoring the attacker’s ability to accurately identify genuine mouse data and undermining current mouse data protection strategies. To address this vulnerability, a defense mechanism based on GAN has been proposed [8]. This strategy creates bogus mouse data that closely imitates real user behavior, substantially impairing the success rate of machine learning–based classification attacks—experimentally reducing the attack success rate, defined as the classification accuracy for genuine input, from 99% to approximately 37%.
Building on these research developments, this article seeks to further fortify the resilience of GAN-based mouse data protection methodologies. Specifically, the study extends previous investigations by evaluating a range of GAN architectures, including those previously reported, and by systematically assessing how various model selections impact defense efficacy against classification-based mouse data inference attacks.
The core innovations and contributions of this article are presented below:
  • A WM_INPUT message–level mouse data protection method is put forward, using GAN-generated bogus mouse trajectories. In environments where attackers use machine learning to extract genuine mouse data, we conduct empirical comparisons across several GAN models to determine which achieves the greatest reduction in attack success rates. The outcomes indicate that our approach better secures user authentication data by refining the defensive generation of mouse event data.
  • We develop a two-dimensional GAN–based framework for generating plausible bogus mouse data, representing an advancement over earlier studies that focused predominantly on the CTGAN model. Notably, we explore the CopulaGAN model, the conditional GAN (CGAN) model, and the WGAN-GP model for their capability to produce highly realistic bogus trajectories, underscoring both methodological innovation and architectural diversity relative to previous work.
  • We show that the proposed defense method offers greater effectiveness compared to established mouse data obfuscation approaches. Whereas earlier GAN-based solutions achieved a reduction in attack success rates to around 37%, our experimental results reveal that the optimal GAN configuration lowers the attack success rate by 48% relative to the baseline, an improvement of 11% over the previous highest-performing defense. This demonstrates that our strategy provides a more resilient protection framework for mouse-driven user authentication data.
The remainder of this article is structured as follows. Section 2 reviews the background and examines related works concerning mouse data–based attacks and defense methods. Section 3 introduces the proposed GAN-based mouse data protection method and explains the experimental setup. Section 4 details and evaluates the experimental outcomes from various perspectives, including data processing strategies, feature definition, data generation intervals, and analysis of performance impact. Section 5 presents the conclusions of the article.

2. Background and Related Works

This section examines attack and defense approaches targeting mouse data in image-based user authentication methods. A key previous study on the WM_INPUT-based mouse data attack technique [4] showed that an attacker can capture mouse coordinates by intercepting WM_INPUT messages sent from the operating system to the application. As depicted in Figure 1, this method allows adversaries to reconstruct the sequence of mouse interactions during image-based authentication and eventually recover the user’s password on a legitimate service.
Under this attack scenario, a malicious program initially registers as a handler for WM_INPUT messages with the operating system to monitor all WM_INPUT traffic. After registration, the handler receives continuous streams of raw mouse input from the user’s mouse device. By recording all observed coordinates, the attacker can reconstruct the path of mouse actions and analyze user interaction behaviors on virtual keypads or graphical password systems. This process enables the attacker to deduce authentication information, such as image-based passwords, even without observing any keystrokes.
To mitigate such threats, multiple studies have introduced mouse data protection strategies that generate and insert random bogus mouse data to disrupt the precise identification of true mouse data [6]. In these methods, both authentic and artificially generated mouse coordinates are combined in the recorded event stream. Although all events remain accessible to the attacker, the objective is to make it difficult to accurately distinguish legitimate mouse input from decoy data, complicating the reconstruction of actual authentication inputs. Nonetheless, further studies have proposed machine learning–based attacks designed to compromise these defense techniques [7], with the experimental findings summarized in Table 1.
In that study [7], the attacker develops classifiers using datasets that comprise both genuine and bogus mouse data collected within attack scenarios. Table 1 presents the most effective models under several configurations, considering various bogus-data generation intervals and different combinations of features. The dataset was assembled by altering the injection periods of bogus mouse data and by selecting features such as elapsed time, the current X and Y coordinates, and the stepwise distances between consecutive X and Y coordinates. The findings indicate that utilizing all specified features typically results in optimal classification performance across nearly all generation intervals, with classification accuracy reaching up to 99%. This demonstrates that the machine learning–supported attack can nearly perfectly differentiate genuine mouse data from synthetically produced bogus mouse movements, thereby rendering the original random-injection defense ineffective.
To respond to this advanced attack strategy, a GAN-based mouse data protection method was introduced in [8]. Instead of injecting randomly generated bogus mouse movements, this method applies a GAN model to create bogus mouse trajectories that closely mimic genuine mouse activity. The outcomes of the performance evaluation from a defensive perspective are illustrated in Figure 2.
Figure 2 demonstrates the defensive efficacy of this technique by comparing classification outcomes for datasets produced at distinct generation intervals, such as 50 ms (minimal effect) and 500 ms (substantial effect). With the standard random generation method, all assessed machine learning models, except logistic regression, sustained high classification accuracy, reliably distinguishing genuine from bogus mouse data. Conversely, employing CTGAN, a specific GAN model, resulted in a notable decrease in several machine learning models’ classification capabilities. The attack success rate dropped by as much as 37%, signifying that, once enough genuine data points cannot be confidently identified, attackers face significant challenges reconstructing complete mouse trajectories and recovering the target password. These findings confirm the promise of GAN-based solutions as a more resilient measure for safeguarding mouse data in image-based authentication applications.
Ref. [9] extends prior acoustic side-channel attack studies that primarily focused on keyboard devices and keystroke data [10] by introducing mouse data attack techniques based on the potential leakage of sensitive information through subtle sounds generated during mouse operation. As countermeasures against such attacks, attack detection and defense techniques have been investigated by implementing and analyzing diverse side-channel-based attack scenarios, including optical, acoustic, electromagnetic, and vibration channels, with the goal of mitigating side-channel-based mouse data attacks. In Ref. [11], various IoT devices and equipment were utilized to collect sensor data in real time, followed by environment-specific noise reduction and feature extraction. The results indicate that, in optical environments, high detection performance can be achieved by employing machine learning models that classify normal versus abnormal signals using detection logic such as a sensor-box-based approach. Similar sensor-box-based detection frameworks may also enable effective detection in electromagnetic and acoustic environments. Nevertheless, practical limitations remain, including sensitivity to environmental noise and the need to account for false positives in real-time settings.
Behavior-based mouse data attack techniques have been primarily studied in the context of exploiting mouse data in web bots, along with corresponding countermeasures. Ref. [12] evaluated, through experiments, the effectiveness of deep reinforcement learning in circumventing existing behavior-based bot detection methods, with the objective of neutralizing a range of web bot detection techniques. Ref. [13] introduced an AI bot that generates human-like mouse movements and experimentally verified whether the generated mouse trajectories are nearly indistinguishable from those of real users. To counter such behavior-based mouse data attacks, ref. [14] proposed BeCAPTCHA-Mouse for human identification. This study collected real mouse movements as well as various types of bot trajectories and employed a self-constructed BeCAPTCHA-Mouse Benchmark database. The dataset includes mouse positions, click events, and timestamps. For analysis, multiple models, such as SVM, Random Forest, KNN, MLP, LSTM, and GRU, were employed and their performance was compared. Moreover, ref. [15] proposed a method to improve detection performance by visually transforming mouse movements and leveraging deep learning models. While prior studies largely relied on time-series analyses, this study is novel in that it distinguished bot and human behaviors using image-based representations. Mouse behavior data, such as webpage clicks, drags, and cursor movements, were collected from real users, and the approach was evaluated through experiments using CNN and RNN models.
For ease of reading, we give abbreviations of terms used in this paper, as shown in Table 2.

3. Proposed Mouse Data Protection Technique

In previous studies, the CTGAN model was utilized to generate synthetic mouse data that closely mimics authentic user trajectories. By integrating these realistic synthetic mouse data with actual mouse data, the defense method introduced ambiguity for attackers, and its efficacy was validated through empirical studies [16,17]. Expanding upon this approach, the current article seeks to further mitigate the risk of user authentication information exposure by generating even more lifelike synthetic mouse data. Accordingly, we examine several GAN models, including CTGAN, and develop a comprehensive mouse data protection method that exploits these models to enhance resilience against mouse data inference attacks. In this section, we outline the proposed defense methodology and describe the construction of datasets used for empirical evaluation.

3.1. Methodology of the Proposed Mouse Data Defense Technique

The mouse data protection approach we propose extends earlier CTGAN-based solutions by explicitly capturing the relationship between authentic and synthetic mouse trajectories. Consider A1, A2, …, An as the sequence of authentic mouse coordinates gathered from user interactions while the defense is deployed. From this genuine data, we train generative models to output synthetic mouse coordinates F1, F2, …, Fn designed to replicate the fundamental statistical and behavioral characteristics of the authentic mouse data. During the authentication process, an attacker monitoring mouse data encounters not just the authentic sequence {Ai}, but a composite stream such as A1, F1, F2, A2, …, An, Fn, where authentic and synthetic mouse data are interspersed. To extract the user’s authentication information, the attacker might employ ML-based classification algorithms in an effort to separate authentic from synthetic data; nevertheless, the adversarially trained generators significantly impair the efficacy of such classifiers, thereby lowering the attack’s probability of success. The central objective of this article is to further diminish attacker effectiveness through comprehensive evaluation of multiple GAN models to determine the configuration that achieves optimal defensive strength. Figure 3 illustrates an overview of the proposed defense methodology.
To implement this defense technique, authentic mouse movement data are initially gathered from users to train defensive generators. Each dataset contains the X and Y coordinates along with the associated elapsed time. Using these features, we utilize several GAN architectures, including CopulaGAN, CGAN, and WGAN-GP, as well as the baseline CTGAN, to produce high-fidelity bogus mouse trajectories. The generated bogus mouse data are subsequently injected into the system through WM_INPUT message events and are queued alongside genuine mouse data. During user interaction with an image-based authentication interface, any malware capturing mouse data via WM_INPUT messages inevitably acquires a composite dataset with both genuine { A i } and bogus { F i } samples. Since attackers lack prior knowledge of which events are GAN-generated and have no dependable side information for classification, it becomes considerably more challenging to identify the genuine mouse coordinates related to the authentication sequence. This mixed-stream obfuscation framework is the foundational concept of the proposed mouse data defense method.
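The mixed-stream idea above can be sketched in a few lines of Python; the event tuples, coordinates, and function name below are illustrative placeholders, not taken from the paper's implementation or dataset.

```python
def interleave_streams(genuine, bogus):
    """Merge genuine and GAN-generated bogus mouse events into the
    single stream an attacker hooking WM_INPUT would observe.
    Each event is an (elapsed_ms, x, y) tuple; ordering purely by
    time gives the attacker no structural cue about which is which."""
    return sorted(genuine + bogus, key=lambda event: event[0])

# Illustrative events only (not from the paper's dataset).
genuine = [(0, 100, 200), (120, 110, 205), (240, 130, 210)]
bogus = [(50, 102, 198), (180, 125, 207)]
stream = interleave_streams(genuine, bogus)
```

Here `stream` corresponds to the composite sequence such as A1, F1, A2, F2, A3: without side information, the attacker must fall back on statistical classification to separate the two classes.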

3.2. Dataset Configuration for Experimental Evaluation

To validate the proposed defense approach, we set up our experimental procedure as a six-stage pipeline: (1) feature extraction, (2) bogus data generation, (3) data preprocessing, (4) dataset configuration, (5) model training, and (6) classification evaluation. For consistency with previous studies, we utilize the earlier collected mouse datasets under identical experimental settings and apply our GAN-based defense mechanism to these datasets.

3.2.1. Feature Extraction Step

During the feature extraction phase, we employ the same set of features as used in the previous machine learning–based mouse data attack studies, ensuring a fair comparison. In particular, we use elapsed time, current X coordinate, current Y coordinate, and the distances between previous and current X and Y coordinates. Based on these, three feature sets are defined:
F1: elapsed time, current X coordinate, current Y coordinate
F2: elapsed time, |ΔX|, |ΔY| (distances between consecutive X and Y coordinates)
F3: elapsed time, current X coordinate, current Y coordinate, |ΔX|, |ΔY|
where |ΔX| and |ΔY| represent the absolute differences between consecutive coordinates. For binary classification tasks, authentic mouse data are assigned the label 1, and bogus mouse data are assigned the label 0.
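As a concrete illustration, the three feature sets can be derived from raw (elapsed time, X, Y) events as follows; the function name and tuple layout are our own, not from the original implementation.

```python
def extract_features(events, feature_set="F3"):
    """Derive the paper's feature sets from (elapsed_ms, x, y) events.

    F1: elapsed time, current X, current Y
    F2: elapsed time, |dX|, |dY| between consecutive events
    F3: union of F1 and F2
    """
    rows = []
    prev_x, prev_y = events[0][1], events[0][2]
    for t, x, y in events:
        dx, dy = abs(x - prev_x), abs(y - prev_y)
        if feature_set == "F1":
            rows.append((t, x, y))
        elif feature_set == "F2":
            rows.append((t, dx, dy))
        else:  # F3
            rows.append((t, x, y, dx, dy))
        prev_x, prev_y = x, y
    return rows
```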

3.2.2. Bogus Data Generation Step

In this stage, we employ several GAN-based methods, including CopulaGAN [18,19], CGAN [20], and WGAN-GP [21], to synthesize bogus mouse data that mimic authentic mouse trajectories more closely than the uniformly random or heuristic noise used in prior defenses. The sample distributions generated by each GAN model are illustrated in Figure 4, Figure 5 and Figure 6.
For this study, a total of 70,314 mouse-trajectory samples were collected; a single researcher operated the device according to typical, everyday usage patterns. In these figures, blue points indicate genuine mouse data while red points indicate bogus mouse data; the x-axis represents the X coordinate and the y-axis denotes elapsed time. By visual assessment, the bogus mouse data generated by GAN-based models demonstrate a high degree of similarity to the distributions of genuine mouse data. To ensure reproducibility, the experimental environment was established using the PyTorch (2.6.0) framework on a high-performance computing system equipped with an Intel Core i9-13900 CPU and an NVIDIA GeForce RTX 4090 GPU. As detailed in the hyperparameter settings, we utilized the Adam optimizer with a learning rate of 0.00005, and the batch size was set to 32 for CGAN and 64 for WGAN-GP. Under these conditions, for each dataset configuration, we trained the GAN models across different epochs (50, 100, and 200) to investigate how training progression influences the authenticity and utility of the generated bogus data.
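The reported training settings can be collected into a small configuration sketch. The Adam betas and gradient-penalty weight are marked as assumed: they are common defaults for these architectures, not values reported in this article.

```python
# Training settings stated in the text; fields marked "assumed"
# are illustrative defaults, not reported values.
GAN_TRAIN_CONFIG = {
    "framework": "PyTorch 2.6.0",
    "optimizer": "Adam",
    "learning_rate": 5e-5,            # 0.00005, as reported
    "batch_size": {"CGAN": 32, "WGAN-GP": 64},
    "epochs_grid": [50, 100, 200],    # training lengths compared
    "adam_betas": (0.5, 0.9),         # assumed, not reported
    "wgan_gp_lambda": 10.0,           # assumed penalty weight, not reported
}
```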
As illustrated in Figure 7, real-time WM_INPUT data capture and fake WM_INPUT event injection operate as follows. Based on the captured actual mouse coordinates (R1, R2, …, Rn), the generative adversarial networks produce bogus mouse coordinates (F1, F2, …, Fn) that closely resemble actual mouse movements. These bogus coordinates are then injected into the system message queue by triggering corresponding WM_INPUT events. As a result, an attacker collects an interleaved sequence containing both actual and bogus data (e.g., R1, F1, F2, R2, …, Rn, Fn). The attacker subsequently attempts to steal the user’s actual mouse data by applying machine learning-based classification to the collected sequence. However, because the injected bogus mouse data is highly similar to actual data, the proposed method hinders reliable classification between the two classes, thereby reducing the attacker’s success rate.

3.2.3. Data Preprocessing Step

At the data preprocessing stage, normalization is performed on the features defined in Step 1, utilizing mouse data previously collected. To ensure stable and efficient model training, MinMaxScaler() [22] is employed to scale features within the [0, 1] range, as illustrated in Equation (1):
x′ = (x − xmin) / (xmax − xmin)
where xmin and xmax represent the minimum and maximum values for a feature, respectively. This normalization process reduces feature scale variance and enhances the convergence stability and accuracy of GAN-based algorithms.
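Equation (1) amounts to the following per-feature transformation, shown here as a minimal stand-in for scikit-learn's MinMaxScaler() with its default [0, 1] range; the constant-feature guard is our own addition.

```python
def min_max_scale(values):
    """Equation (1): map each value into [0, 1] using the feature's
    observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid division by zero (our convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```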

3.2.4. Dataset Configuration Step

In the dataset construction phase, the outputs from Steps 1–3 are integrated to create datasets for training and evaluation in subsequent classification experiments outlined in Step 5 and Step 6. Four bogus mouse data generation intervals (50 ms, 100 ms, 250 ms, and 500 ms) are considered to produce multiple dataset variants. The comprehensive dataset configuration is outlined in Table 3.
Table 3 details that the count of genuine mouse data refers to authentic mouse events collected from user input devices, whereas the count of bogus mouse data refers to samples produced by GAN-based generation. All constructed datasets maintain a balanced 5:5 ratio between genuine and bogus samples. The dataset sample sizes are as follows: n − 1: 32,006; n − 2: 28,018; n − 3: 24,010; n − 4: 20,564; n − 5: 16,022; n − 6: 12,000; and n − 7: 8008.
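The balanced 5:5 assembly and labeling described above, together with the 80:20 split used later in Step 5, might be sketched as follows; the function name and fixed seed are illustrative choices, not details from the original study.

```python
import random

def build_dataset(genuine_rows, bogus_rows, seed=42):
    """Label genuine rows 1 and bogus rows 0, trim to a balanced
    5:5 ratio, shuffle deterministically, and split 80:20 into
    train/test subsets."""
    n = min(len(genuine_rows), len(bogus_rows))
    data = [(row, 1) for row in genuine_rows[:n]] + \
           [(row, 0) for row in bogus_rows[:n]]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * 0.8)
    return data[:cut], data[cut:]
```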
For each dataset configuration, we conduct hyperparameter optimization of the classification models applied in the attack scenario to comprehensively assess the defense capability of the proposed method [23]. Table 4 summarizes the hyperparameter search space for each model.
Specifically:
K-Nearest Neighbors (KNN): metric is set to euclidean; n_neighbors ∈ {3, 5, 7, 10, 15, 20}
Logistic Regression: regularization parameter C ∈ {0.001, 0.1, 1, 10, 100, 1000}; max_iter ∈ {100, 200, 300}
Decision Tree: max_depth ∈ {3, 5, 7, 10, 15, 20}
Random Forest: min_samples_leaf = 5; min_samples_split = 5; max_depth ∈ {10, 20, 30}; n_estimators ∈ {100, 200, 300}
Gradient Boosting: learning_rate ∈ {0.01, 0.05, 0.1, 0.2, 0.3}; max_depth ∈ {3, 5, 7, 9}; n_estimators ∈ {100, 200, 300}
Multi-Layer Perceptron (MLP): alpha ∈ {0.0001, 0.001, 0.01, 0.1}; learning_rate = adaptive; max_iter ∈ {300, 500, 1000}
These configurations ensure an appropriate optimization of the attack models, leading to a conservative and rigorous assessment of our defense.
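The grid searches in Table 4 amount to exhaustively scoring every parameter combination and keeping the best one. A minimal model-agnostic sketch (our own helper, not the paper's code) is:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every combination in `param_grid`
    (a dict mapping name -> list of candidate values) and return
    the best-scoring configuration and its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `score_fn` would train the attack classifier with the given parameters and return its validation accuracy, mirroring how each model in Table 4 is tuned before the defense is evaluated against it.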

3.2.5. Model Training Step

At the training stage, an identical set of machine learning algorithms as used in prior research—KNN, Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and MLP—are trained on every constructed dataset. In all experiments, the dataset is split into training and test subsets following an 80:20 ratio. The test subsets are solely reserved for performance evaluation of the defense in Step 6, ensuring that the results accurately represent the generalization capability of the attack models.

3.2.6. Classification Evaluation

In this study, the attack success rate is defined in terms of the classification accuracy of the attacker’s model. To enable a meaningful comparison with prior defense approaches, we report the performance evaluation results as a percentage reduction, calculated as the relative decrease in accuracy with respect to the baseline performance.
In the concluding stage, defensive performance is quantified by assessing the ability of the trained classifiers to differentiate between genuine and bogus mouse data within the held-out test data. We employ standard metrics—Accuracy, Precision, Recall, F1-Score, and AUC [24]—to evaluate classification outcomes. A greater reduction in these metrics, indicative of weaker attacker performance, signifies a stronger defense, as it implies increased difficulty in identifying genuine mouse trajectories within the composite mouse data. The model and parameterization that result in the greatest decline of attack classification capability are considered the most robust GAN-based mouse data protection strategy.
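The headline metric can be made concrete with a small sketch: attacker accuracy plus its relative reduction against the undefended baseline, per the definition in this subsection. The function names are ours.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly classified events; under the threat
    model, this is the attack success rate."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_acc, defended_acc):
    """Relative decrease in attacker accuracy versus the undefended
    baseline, expressed as a percentage."""
    return (baseline_acc - defended_acc) / baseline_acc * 100
```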

4. Experimental Results

This section presents an evaluation of the proposed mouse data protection technique from four analytical perspectives. First, the impact of varying data processing strategies on classification effectiveness is studied. Second, the influence of different feature definitions is explored. Third, the optimal dataset configurations identified from prior analyses are evaluated, with focus on different data generation interval settings. Lastly, variations in attack effectiveness are assessed relative to different GAN models. Across these analyses, the GAN model achieving the highest level of defensive performance is determined.

4.1. Performance Evaluation According to Different Data Processing Strategies

The first experiment explores the effect of data normalization on the efficacy of the defense technique presented. We directly compare the performance of GAN models trained with non-normalized (raw) data to those trained with normalized data. For each data generation interval, the classification accuracy of the attacker’s models is measured across all constructed datasets, and Figure 8 provides an overview of the experimental outcomes.
The findings demonstrate that training GAN models, including CopulaGAN, CGAN, and WGAN-GP, on normalized data substantially reduces the classification performance of machine learning–based attacks across every evaluated dataset. In contrast, when these models are trained on non-normalized data, most configurations experience a lesser decrease in attack performance, and in the CGAN case, normalization does not lead to a notable reduction in classifier effectiveness. This suggests that generating synthetic mouse data using normalized feature datasets is more successful at impairing the attacker’s classifiers. Therefore, we recommend normalization-based preprocessing as the most effective data processing strategy for GAN-based mouse data protection.

4.2. Performance Evaluation According to Feature Definitions

The second experiment assesses the influence of various feature definitions on the defense technique’s performance. We evaluate three different feature sets.
For each feature set (F1, F2, and F3), we train the GAN models and corresponding attacker classifiers, followed by a detailed analysis of the resulting degradation in classifier performance. The results are illustrated in Figure 9, Figure 10 and Figure 11.
The results demonstrate that feature set F2 results in the most significant decrease in attack performance in several scenarios. Nevertheless, F2 is constructed solely from elapsed time and inter-point distances, excluding the absolute X and Y coordinates. This lack of absolute positional data complicates the accurate modeling and realistic synthesis of mixed trajectories when both genuine and bogus mouse data are present, and also indicates that these features were initially designed to facilitate classification from the attacker’s perspective, rather than robust obfuscation from the defender’s perspective. As a result, F2 is not regarded as a suitable foundation for a principled defense strategy in our feature configuration, leading us to concentrate our detailed analysis on F1 and F3.
When F1 and F3 are utilized, the observed reduction in attack effectiveness is as follows. For CopulaGAN, the attack accuracy decreases by approximately 9% to 30% with F1, and by 12% to 41% with F3. In the case of CGAN, the reduction spans from 9% to 31% with F1 and 13% to 44% with F3. For WGAN-GP, the decline ranges from 9% to 29% with F1 and 12% to 39% with F3. In summary, the experimental results reveal that F3, which incorporates elapsed time, absolute X and Y coordinates, and differences between consecutive coordinates, consistently produces the greatest reduction in attack performance across all models. Therefore, in the subsequent experiments, F3 is chosen as the main feature configuration for assessing the impact of optimal dataset selection and model-specific performance.

4.3. Performance Evaluation According to Data Generation Intervals

The third evaluation assesses the effectiveness of data generation intervals determined through earlier analyses of data processing methods and feature selection. In this evaluation, (i) datasets produced using normalized features and (ii) the F3 feature set, which includes elapsed time, current X coordinate, current Y coordinate, and distances between sequential X and Y coordinates, are used. Utilizing these settings, we assess the average performance reduction observed for the attacker’s classifiers, with the outcomes displayed in Figure 12, Figure 13 and Figure 14.
The results demonstrate that, for all data generation intervals—50 ms, 100 ms, 250 ms, and 500 ms—the optimal configurations for each GAN model are achieved when CopulaGAN, CGAN, and WGAN-GP are each trained for 200 epochs. In the macro-averaged evaluation metrics, the 50 ms generation interval achieves the most substantial reduction in attack performance overall, while the 250 ms interval shows the least reduction. However, for the Gradient Boosting model, the decrease in classification accuracy is limited to about 10% across all tested configurations, indicating this model is comparatively less impacted by the addition of bogus data.
Among the GAN models evaluated, CGAN exhibits the highest defensive effectiveness. Notably, at a 50 ms generation interval, CGAN decreases the KNN model’s performance by about 44%, and at a 250 ms interval, by about 43%. These findings indicate that the use of normalized data, the F3 feature set, and well-tuned GAN training—particularly CGAN at 200 epochs with short generation intervals—offers an effective approach for reducing the success rate of machine learning–based mouse data inference attacks.

4.4. Performance Impact Analysis According to GAN Models

For the final evaluation, we assess the impact of each GAN model by comparing the relative changes in attack performance between this study and previous work. This assessment considers datasets created using (i) normalized features and (ii) the F3 feature set (elapsed time, current X coordinate, current Y coordinate, and distances between consecutive coordinates), and examines performance trends across different data generation intervals. For each GAN model, we analyze the datasets that yield the highest and lowest reductions in attack performance, observed at the 50 ms and 250 ms generation intervals, respectively. The results are summarized in Table 5, Table 6 and Table 7.
To measure the defensive impact, we calculate the percentage change in key evaluation metrics—Accuracy, Precision, Recall, F1-Score, and AUC—for each configuration by comparing the baseline attack performance from previous studies to the performance observed when applying our GAN-based defense technique. This comparison is conducted separately for CopulaGAN, CGAN, and WGAN-GP. In line with previous findings, macro-averaged metrics show that datasets generated using a 50 ms interval lead to the most substantial reduction in attacker performance, while datasets with a 250 ms interval yield the least reduction. For each GAN model, instead of assessing all possible datasets, we focus on representative datasets that produce the greatest reduction in attack performance.
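The percentage-change computation described here reduces to a simple relative difference per metric. The sketch below uses illustrative numbers in the style of Table 5 (KNN on dataset 1-1 at a 50 ms interval); the function name is hypothetical.

```python
def pct_change(baseline, defended):
    """Relative change (%) of each attack metric after the defense is applied;
    negative values mean the attacker's performance dropped."""
    return {m: round((defended[m] - baseline[m]) / baseline[m] * 100, 1)
            for m in baseline}

# Illustrative values in the style of Table 5 (KNN, dataset 1-1 / 50 ms).
baseline = {"accuracy": 0.994, "precision": 0.990, "recall": 0.999,
            "f1": 0.994, "auc": 0.998}
defended = {"accuracy": 0.562, "precision": 0.568, "recall": 0.562,
            "f1": 0.565, "auc": 0.583}
delta = pct_change(baseline, defended)  # accuracy change is about -43.5%
```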
For CopulaGAN, among the datasets generated with the most impactful interval (50 ms), dataset 1-1 is chosen. In this selected dataset, the highest performance degradation is observed for both the MLP and KNN classifiers: Accuracy is reduced by up to 44% (MLP), Precision by 42% (KNN and MLP), Recall by 53% (MLP), F1-Score by 48% (MLP), and AUC by 43% (MLP). For the interval with the lowest impact (250 ms), dataset 3-3 is chosen. Even here, significant degradation persists: Accuracy decreases by 46% (MLP), Precision by 47% (MLP), Recall by 42% (KNN), F1-Score by 44% (KNN and MLP), and AUC by 44% (MLP).
For CGAN, among the 50 ms datasets, dataset 1-4 is selected as the most impactful scenario. In this case, the most considerable reductions are again found for MLP and KNN: Accuracy is reduced by 48% (MLP), Precision by 49% (MLP), Recall by 48% (KNN), F1-Score by 48% (KNN), and AUC by 47% (MLP). At the 250 ms interval, dataset 3-1 is chosen. Here, KNN experiences the highest degradation, with reductions in Accuracy and Precision by 45%, Recall by 47%, F1-Score by 46%, and AUC by 43%.
For WGAN-GP, within the 50 ms interval, dataset 1-2 is identified as the representative for the highest-impact group. The MLP model shows the greatest reduction: both Accuracy and Precision fall by 48%, Recall drops by 62%, F1-Score decreases by 56%, and AUC by 49%. For the 250 ms interval, dataset 3-2 is selected. In this case, MLP and KNN both experience notable decreases: Accuracy and Precision decline by 48% (MLP), Recall by 44% (KNN), F1-Score by 45% (KNN and MLP), and AUC by 48% (MLP).

4.5. Statistical Analysis

Table 8 reports the t-statistics, 95% confidence intervals, and p-values used to validate performance differences across models and datasets, and Table 9 presents a statistical divergence analysis (KL and JS divergence) between actual and GAN-generated bogus mouse data.
The results indicate that the WGAN-GP and CopulaGAN models generate bogus mouse data that is most similar to actual user mouse data. In particular, CopulaGAN achieves higher similarity in terms of spatial features, whereas WGAN-GP shows superior similarity with respect to temporal features. Although CGAN shows the largest divergence, suggesting relatively lower similarity to actual data, it nevertheless achieves, together with WGAN-GP, a substantial decrease in attack success rate in the attack evaluation. Overall, the proposed approach can reduce the attack success rate by up to 48%, even when an attacker collects both actual and injected bogus mouse data and applies machine learning-based classification models. Moreover, this reduction is greater than that achieved by the baseline setting, demonstrating the improved effectiveness of our defense. This outcome is attributable to the fact that image-based authentication procedures typically generate a large volume of mouse movement data; consequently, the injected bogus mouse data introduces additional ambiguity into the attacker’s classification models, making it more difficult to reliably extract genuine mouse data.
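The KL and JS divergences reported in Table 9 can be computed from binned feature histograms roughly as follows. This is a simplified sketch (zero-probability bins are skipped rather than smoothed), and the example histograms are invented for illustration.

```python
import math

def kl(p, q):
    """KL divergence D(P||Q) for discrete distributions over shared bins.
    Zero-probability bins are skipped here for simplicity (no smoothing)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0 and qi > 0)

def js(p, q):
    """Jensen-Shannon divergence: symmetric average of KL against the mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Invented histograms over the same bins (e.g., binned X coordinates).
actual = [0.10, 0.40, 0.40, 0.10]
bogus = [0.12, 0.38, 0.42, 0.08]
```

Unlike KL, JS is symmetric, which is why Table 9 reports KL in both directions but only a single JS value per feature.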

5. Conclusions

This article introduced an enhanced WM_INPUT message–based mouse data protection strategy for image-based user authentication, utilizing various GAN models to significantly diminish the effectiveness of machine learning–driven mouse data attacks. To thoroughly evaluate the defense’s effectiveness, we performed an extensive assessment across different data processing approaches, feature definitions, and data generation intervals, using standard classification metrics—Accuracy, Precision, Recall, F1-Score, and AUC—to quantify attacker performance.
Our experimental results indicate that the most effective configuration is obtained when bogus mouse data are generated using the CGAN model trained on normalized features derived from the F3 feature set, which comprises elapsed time, current X and Y coordinates, and the distances between consecutive coordinates. With this configuration, the injected bogus mouse trajectories produce the largest reduction in the attacker’s classifier performance. Compared with previous studies, where a CTGAN-based defense lowered the attack accuracy by up to 37%, our CGAN- and WGAN-GP–based defenses achieve a reduction of up to 48%, an improvement of roughly 11 percentage points. These results provide empirical evidence that carefully designed GAN-based mouse data generation markedly enhances the robustness of image-based authentication against sophisticated machine learning–driven mouse data attacks.
In summary, the findings demonstrate that using CGAN and WGAN-GP to generate realistic bogus mouse data through the WM_INPUT message offers a viable and robust method for protecting image-based authentication information. For future research, we intend to explore three-dimensional GAN-based models that capture spatiotemporal mouse dynamics and broader behavioral context, aiming to further narrow the gap between genuine and bogus mouse trajectories and to strengthen resistance against increasingly advanced attack models. We also plan to address practical implementation challenges, including the optimization of data injection intervals and computational overhead, to facilitate real-world deployment.

Author Contributions

Conceptualization, J.K. and K.L.; methodology, J.K. and K.L.; software, J.K.; validation, J.K. and K.L.; data curation, J.K.; writing—original draft, J.K. and K.L.; writing—review and editing, K.L.; supervision, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was mainly supported by the Technology Innovation Program (RS-2025-02317682, Shipyard-Partner Digital Production Cooperation Cloud Server Construction and Platform Development) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) (50%). This research was also supported by the Regional Innovation System & Education (RISE) program through the Jeollanamdo RISE center, funded by the Ministry of Education (MOE) and Jeollanamdo, Republic of Korea (2025-RISE-14-001) (50%).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This study was conducted as a research project commissioned by the Korean Association of Cybersecurity Studies (Project title: ‘2025 Cyber Security Papers Contest’, Project period: August 2025~September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Mouse data attack scenario using WM_INPUT message.
Figure 2. Overview of performance evaluation of the GAN-enhanced mouse data defense technique.
Figure 3. Mouse data defense technique methodology using GAN.
Figure 4. Visualization of the distribution of bogus data generation in the X coordinate of the CopulaGAN model.
Figure 5. Visualization of the distribution of bogus data generation in the X coordinate of the CGAN model.
Figure 6. Visualization of the distribution of bogus data generation in the X coordinate of the WGAN-GP model.
Figure 7. Overview of the real-time capture of WM_INPUT data and injection of fake WM_INPUT events.
Figure 8. Performance evaluation results under different data processing strategies. (a–c) correspond to the CopulaGAN, CGAN, and WGAN-GP models, respectively. The x-axis denotes the generation cycle of mouse data, and the y-axis denotes classification performance.
Figure 9. Performance evaluation according to feature definitions of the CopulaGAN model. The x-axis denotes the features corresponding to F1, F2, and F3, and the y-axis denotes classification performance.
Figure 10. Performance evaluation according to feature definitions of the CGAN model. The x-axis denotes the features corresponding to F1, F2, and F3, and the y-axis denotes classification performance.
Figure 11. Performance evaluation according to feature definitions of the WGAN-GP model. The x-axis denotes the features corresponding to F1, F2, and F3, and the y-axis denotes classification performance.
Figure 12. Performance evaluation results according to data generation intervals of the CopulaGAN model. The x-axis denotes the generation interval of mouse data, and the y-axis denotes classification performance.
Figure 13. Performance evaluation results according to data generation intervals of the CGAN model. The x-axis denotes the generation interval of mouse data, and the y-axis denotes classification performance.
Figure 14. Performance evaluation results according to data generation intervals of the WGAN-GP model. The x-axis denotes the generation interval of mouse data, and the y-axis denotes classification performance.
Table 1. Summary of experimental results of mouse data attack technique using machine learning models.

| Dataset | Generation Interval (ms) | Feature | Model | Accuracy (Attack Success Rate) |
|---|---|---|---|---|
| Dataset 4–6 | 500 | ALL | Gradient Boosting, Random Forest | 1.000 |
| Dataset 1–6 | 50 | ALL | KNN | 1.000 |
| Dataset 1–7 | 50 | ALL | Gradient Boosting, Decision Tree | 1.000 |
| Dataset 4–6 | 500 | F2 | Gradient Boosting | 1.000 |
| Dataset 2–6 | 500 | F2 | Random Forest | 0.998 |

ALL: Elapsed time, current X and Y coordinates, distances between consecutive X and Y coordinates. F2: Elapsed time, distances between successive X and Y coordinates.
Table 2. List of abbreviations used in this paper.

| Abbreviation | Description |
|---|---|
| GAN | Generative Adversarial Networks |
| CTGAN | Conditional Tabular Generative Adversarial Networks |
| CGAN | Conditional Generative Adversarial Networks |
| CopulaGAN | Copula Generative Adversarial Networks |
| WGAN-GP | Wasserstein Generative Adversarial Networks with Gradient Penalty |
| AUC | Area Under Curve |
| KNN | K-Nearest Neighbors |
| MLP | Multi-Layer Perceptron |
Table 3. Overall dataset configuration reconstructed using GAN model outputs.

| Dataset | Generation Interval | Mixed Mouse Movements | Genuine Mouse Movements | Bogus Mouse Movements | Ratio |
|---|---|---|---|---|---|
| 1–1 | 50 ms | 32,006 | 16,003 | 16,003 | 5:5 |
| 1–2 | 50 ms | 28,018 | 14,009 | 14,009 | 5:5 |
| 1–3 | 50 ms | 24,010 | 12,005 | 12,005 | 5:5 |
| 1–4 | 50 ms | 20,564 | 10,282 | 10,282 | 5:5 |
| 1–5 | 50 ms | 16,022 | 8011 | 8011 | 5:5 |
| 1–6 | 50 ms | 12,000 | 6000 | 6000 | 5:5 |
| 1–7 | 50 ms | 8008 | 4004 | 4004 | 5:5 |
| 2–1 | 100 ms | 32,006 | 16,003 | 16,003 | 5:5 |
| 2–2 | 100 ms | 28,018 | 14,009 | 14,009 | 5:5 |
| 2–3 | 100 ms | 24,010 | 12,005 | 12,005 | 5:5 |
| 2–4 | 100 ms | 20,564 | 10,282 | 10,282 | 5:5 |
| 2–5 | 100 ms | 16,022 | 8011 | 8011 | 5:5 |
| 2–6 | 100 ms | 12,000 | 6000 | 6000 | 5:5 |
| 2–7 | 100 ms | 8008 | 4004 | 4004 | 5:5 |
| 3–1 | 250 ms | 32,006 | 16,003 | 16,003 | 5:5 |
| 3–2 | 250 ms | 28,018 | 14,009 | 14,009 | 5:5 |
| 3–3 | 250 ms | 24,010 | 12,005 | 12,005 | 5:5 |
| 3–4 | 250 ms | 20,564 | 10,282 | 10,282 | 5:5 |
| 3–5 | 250 ms | 16,022 | 8011 | 8011 | 5:5 |
| 3–6 | 250 ms | 12,000 | 6000 | 6000 | 5:5 |
| 3–7 | 250 ms | 8008 | 4004 | 4004 | 5:5 |
| 4–1 | 500 ms | 32,006 | 16,003 | 16,003 | 5:5 |
| 4–2 | 500 ms | 28,018 | 14,009 | 14,009 | 5:5 |
| 4–3 | 500 ms | 24,010 | 12,005 | 12,005 | 5:5 |
| 4–4 | 500 ms | 20,564 | 10,282 | 10,282 | 5:5 |
| 4–5 | 500 ms | 16,022 | 8011 | 8011 | 5:5 |
| 4–6 | 500 ms | 12,000 | 6000 | 6000 | 5:5 |
| 4–7 | 500 ms | 8008 | 4004 | 4004 | 5:5 |
Table 4. Hyperparameter search space of all classification models.

| Model | Hyperparameter Search Space |
|---|---|
| KNN | metric: euclidean, n_neighbors: 3~20 |
| Logistic Regression | C: 0.001~1000, max_iter: 100~300 |
| Decision Tree | max_depth: 3~20 |
| Random Forest | max_depth: 10~30, min_samples_leaf: 5, min_samples_split: 5, n_estimators: 100~300 |
| Gradient Boosting | learning_rate: 0.01~0.3, max_depth: 3~9, n_estimators: 100~300 |
| MLP | alpha: 0.0001~0.1, learning_rate: adaptive, max_iter: 300~1000 |
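The search space in Table 4 can be expanded into concrete candidate configurations before fitting each classifier. The sketch below shows only the KNN entry with a few sampled values; the concrete grid points are illustrative, since the table specifies ranges (e.g., n_neighbors: 3~20) rather than exact lists.

```python
from itertools import product

# Sampled values drawn from the KNN ranges in Table 4 (illustrative grid points).
search_space = {
    "n_neighbors": [3, 5, 10, 20],   # KNN: 3~20
    "metric": ["euclidean"],
}

def expand(space):
    """Enumerate every hyperparameter combination in the search space."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]

candidates = expand(search_space)  # one dict per candidate configuration
```

Each candidate dict can then be passed to the corresponding classifier constructor, with the best configuration chosen by validation performance.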
Table 5. Summary of performance evaluation results according to the relative performance change of the CopulaGAN model.

| Da/GI | Model | Data | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|---|
| 1–1 / 50 ms | KNN | RD | 0.994 (−43%) | 0.990 (−42%) | 0.999 (−44%) | 0.994 (−43%) | 0.998 (−42%) |
| | | GAN | 0.562 | 0.568 | 0.562 | 0.565 | 0.583 |
| | LR | RD | 0.845 (−33%) | 0.841 (−32%) | 0.856 (−38%) | 0.849 (−35%) | 0.930 (−41%) |
| | | GAN | 0.515 | 0.523 | 0.474 | 0.497 | 0.522 |
| | DT | RD | 0.993 (−21%) | 0.994 (−22%) | 0.993 (−19%) | 0.993 (−21%) | 0.993 (−18%) |
| | | GAN | 0.780 | 0.771 | 0.802 | 0.786 | 0.818 |
| | RF | RD | 0.994 (−24%) | 0.993 (−25%) | 0.995 (−22%) | 0.994 (−23%) | 1.000 (−16%) |
| | | GAN | 0.754 | 0.745 | 0.780 | 0.762 | 0.837 |
| | GB | RD | 0.998 (−11%) | 0.998 (−13%) | 0.998 (−8%) | 0.998 (−10%) | 1.000 (−5%) |
| | | GAN | 0.891 | 0.872 | 0.920 | 0.895 | 0.953 |
| | MLP | RD | 0.986 (−44%) | 0.986 (−42%) | 0.987 (−53%) | 0.987 (−48%) | 0.998 (−43%) |
| | | GAN | 0.550 | 0.567 | 0.460 | 0.508 | 0.568 |
| 3–3 / 250 ms | KNN | RD | 0.998 (−44%) | 0.997 (−45%) | 0.999 (−42%) | 0.998 (−44%) | 0.999 (−42%) |
| | | GAN | 0.561 | 0.550 | 0.575 | 0.562 | 0.578 |
| | LR | RD | 0.740 (−22%) | 0.757 (−25%) | 0.721 (−13%) | 0.738 (−19%) | 0.825 (−30%) |
| | | GAN | 0.517 | 0.506 | 0.593 | 0.546 | 0.521 |
| | DT | RD | 0.996 (−20%) | 0.996 (−21%) | 0.996 (−18%) | 0.996 (−20%) | 0.995 (−16%) |
| | | GAN | 0.801 | 0.785 | 0.818 | 0.801 | 0.831 |
| | RF | RD | 0.996 (−21%) | 0.995 (−24%) | 0.998 (−15%) | 0.997 (−20%) | 1.000 (−13%) |
| | | GAN | 0.790 | 0.753 | 0.850 | 0.798 | 0.872 |
| | GB | RD | 0.998 (−10%) | 0.999 (−14%) | 0.997 (−6%) | 0.998 (−10%) | 0.999 (−4%) |
| | | GAN | 0.896 | 0.859 | 0.941 | 0.898 | 0.958 |
| | MLP | RD | 0.997 (−46%) | 0.996 (−47%) | 0.998 (−41%) | 0.997 (−44%) | 1.000 (−44%) |
| | | GAN | 0.540 | 0.527 | 0.590 | 0.557 | 0.565 |

Da/GI: Dataset/Generation Interval, LR: Logistic Regression, DT: Decision Tree, RF: Random Forest, GB: Gradient Boosting, RD: Random Dataset, GAN: GAN dataset. Values in parentheses denote the increase/decrease rate relative to the baseline.
Table 6. Summary of performance evaluation results according to the relative performance change of the CGAN model.

| Da/GI | Model | Data | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|---|
| 1–4 / 50 ms | KNN | RD | 0.994 (−47%) | 0.990 (−48%) | 0.999 (−48%) | 0.994 (−48%) | 0.998 (−46%) |
| | | GAN | 0.528 | 0.513 | 0.523 | 0.518 | 0.542 |
| | LR | RD | 0.845 (−36%) | 0.841 (−36%) | 0.856 (+6%) | 0.849 (−22%) | 0.930 (−41%) |
| | | GAN | 0.488 | 0.485 | 0.914 | 0.633 | 0.520 |
| | DT | RD | 0.993 (−24%) | 0.994 (−26%) | 0.993 (−23%) | 0.993 (−24%) | 0.993 (−21%) |
| | | GAN | 0.755 | 0.738 | 0.768 | 0.752 | 0.786 |
| | RF | RD | 0.994 (−25%) | 0.993 (−27%) | 0.995 (−25%) | 0.994 (−26%) | 1.000 (−18%) |
| | | GAN | 0.741 | 0.727 | 0.743 | 0.735 | 0.825 |
| | GB | RD | 0.998 (−16%) | 0.998 (−18%) | 0.998 (−14%) | 0.998 (−16%) | 1.000 (−8%) |
| | | GAN | 0.840 | 0.819 | 0.861 | 0.839 | 0.920 |
| | MLP | RD | 0.986 (−48%) | 0.986 (−49%) | 0.987 (−28%) | 0.987 (−41%) | 0.998 (−47%) |
| | | GAN | 0.505 | 0.492 | 0.704 | 0.580 | 0.528 |
| 3–1 / 250 ms | KNN | RD | 0.998 (−45%) | 0.997 (−45%) | 0.999 (−47%) | 0.998 (−46%) | 0.999 (−43%) |
| | | GAN | 0.549 | 0.549 | 0.529 | 0.539 | 0.567 |
| | LR | RD | 0.740 (−24%) | 0.757 (−26%) | 0.721 (−20%) | 0.738 (−23%) | 0.825 (−32%) |
| | | GAN | 0.500 | 0.498 | 0.524 | 0.511 | 0.503 |
| | DT | RD | 0.996 (−20%) | 0.996 (−22%) | 0.996 (−17%) | 0.996 (−20%) | 0.995 (−15%) |
| | | GAN | 0.797 | 0.780 | 0.824 | 0.801 | 0.846 |
| | RF | RD | 0.996 (−21%) | 0.995 (−22%) | 0.998 (−18%) | 0.997 (−20%) | 1.000 (−13%) |
| | | GAN | 0.791 | 0.775 | 0.815 | 0.795 | 0.873 |
| | GB | RD | 0.998 (−10%) | 0.999 (−12%) | 0.997 (−7%) | 0.998 (−10%) | 0.999 (−4%) |
| | | GAN | 0.901 | 0.877 | 0.931 | 0.903 | 0.963 |
| | MLP | RD | 0.997 (−43%) | 0.996 (−43%) | 0.998 (−41%) | 0.997 (−42%) | 1.000 (−41%) |
| | | GAN | 0.570 | 0.565 | 0.587 | 0.576 | 0.595 |

Da/GI: Dataset/Generation Interval, LR: Logistic Regression, DT: Decision Tree, RF: Random Forest, GB: Gradient Boosting, RD: Random Dataset, GAN: GAN dataset. Values in parentheses denote the increase/decrease rate relative to the baseline.
Table 7. Summary of performance evaluation results according to the relative performance change of the WGAN-GP model.

| Da/GI | Model | Data | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|---|
| 1–2 / 50 ms | KNN | RD | 0.994 (−46%) | 0.990 (−45%) | 0.999 (−44%) | 0.994 (−45%) | 0.998 (−45%) |
| | | GAN | 0.538 | 0.538 | 0.555 | 0.546 | 0.548 |
| | LR | RD | 0.845 (−34%) | 0.841 (−34%) | 0.856 (−39%) | 0.849 (−36%) | 0.930 (−43%) |
| | | GAN | 0.502 | 0.504 | 0.469 | 0.486 | 0.504 |
| | DT | RD | 0.993 (−22%) | 0.994 (−24%) | 0.993 (−17%) | 0.993 (−21%) | 0.993 (−18%) |
| | | GAN | 0.773 | 0.750 | 0.822 | 0.784 | 0.818 |
| | RF | RD | 0.994 (−24%) | 0.993 (−25%) | 0.995 (−22%) | 0.994 (−24%) | 1.000 (−16%) |
| | | GAN | 0.751 | 0.741 | 0.774 | 0.775 | 0.837 |
| | GB | RD | 0.998 (−13%) | 0.998 (−15%) | 0.998 (−11%) | 0.998 (−13%) | 1.000 (−6%) |
| | | GAN | 0.865 | 0.847 | 0.892 | 0.869 | 0.935 |
| | MLP | RD | 0.986 (−48%) | 0.986 (−48%) | 0.987 (−62%) | 0.987 (−56%) | 0.998 (−49%) |
| | | GAN | 0.502 | 0.505 | 0.370 | 0.427 | 0.506 |
| 3–2 / 250 ms | KNN | RD | 0.998 (−46%) | 0.997 (−45%) | 0.999 (−44%) | 0.998 (−45%) | 0.999 (−45%) |
| | | GAN | 0.542 | 0.545 | 0.555 | 0.550 | 0.548 |
| | LR | RD | 0.740 (−23%) | 0.757 (−24%) | 0.721 (−27%) | 0.738 (−25%) | 0.825 (−31%) |
| | | GAN | 0.515 | 0.522 | 0.450 | 0.484 | 0.515 |
| | DT | RD | 0.996 (−19%) | 0.996 (−22%) | 0.996 (−13%) | 0.996 (−18%) | 0.995 (−13%) |
| | | GAN | 0.808 | 0.776 | 0.871 | 0.821 | 0.861 |
| | RF | RD | 0.996 (−20%) | 0.995 (−23%) | 0.998 (−15%) | 0.997 (−19%) | 1.000 (−13%) |
| | | GAN | 0.794 | 0.769 | 0.845 | 0.805 | 0.873 |
| | GB | RD | 0.998 (−11%) | 0.999 (−14%) | 0.997 (−6%) | 0.998 (−10%) | 0.999 (−4%) |
| | | GAN | 0.888 | 0.856 | 0.935 | 0.894 | 0.958 |
| | MLP | RD | 0.997 (−48%) | 0.996 (−48%) | 0.998 (−41%) | 0.997 (−45%) | 1.000 (−48%) |
| | | GAN | 0.520 | 0.521 | 0.585 | 0.551 | 0.525 |

Da/GI: Dataset/Generation Interval, LR: Logistic Regression, DT: Decision Tree, RF: Random Forest, GB: Gradient Boosting, RD: Random Dataset, GAN: GAN dataset. Values in parentheses denote the increase/decrease rate relative to the baseline.
Table 8. Summary of performance comparison to validate differences across models and datasets.

| Model | Feature | t-Statistic | Confidence Interval (95%) | p-Value (One-Tailed) |
|---|---|---|---|---|
| CopulaGAN | F1 | 10.683 | 0.182~0.298 | 6.222 × 10⁻⁵ |
| | F2 | 19.562 | 0.378~0.492 | 3.222 × 10⁻⁶ |
| | F3 | 5.816 | 0.154~0.398 | 1.060 × 10⁻³ |
| CGAN | F1 | 6.948 | 0.143~0.310 | 4.744 × 10⁻⁴ |
| | F2 | 19.586 | 0.389~0.506 | 3.203 × 10⁻⁶ |
| | F3 | 5.700 | 0.164~0.433 | 1.160 × 10⁻³ |
| WGAN-GP | F1 | 7.109 | 0.129~0.275 | 4.270 × 10⁻⁴ |
| | F2 | 20.924 | 0.363~0.464 | 2.309 × 10⁻⁶ |
| | F3 | 5.877 | 0.151~0.387 | 1.013 × 10⁻³ |
Table 9. Statistical divergence analysis between actual and GAN-generated bogus mouse data (KL and JS divergence).

| Model | Feature | KL (Actual‖GAN) | KL (GAN‖Actual) | JS Divergence |
|---|---|---|---|---|
| CopulaGAN | TIME | 0.020863 | 0.636583 | 0.007110 |
| | X | 0.054311 | 0.051512 | 0.012997 |
| | Y | 0.031778 | 0.030115 | 0.007667 |
| | DIFFPOSX | 0.008541 | 0.003357 | 0.000876 |
| | DIFFPOSY | 0.003836 | 0.003828 | 0.000955 |
| CGAN | TIME | 0.003274 | 0.003264 | 0.000817 |
| | X | 0.517183 | 0.156080 | 0.022026 |
| | Y | 0.112007 | 0.247569 | 0.026299 |
| | DIFFPOSX | 0.010827 | 0.005645 | 0.001435 |
| | DIFFPOSY | 0.005141 | 0.005043 | 0.001270 |
| WGAN-GP | TIME | 0.010806 | 0.009362 | 0.002478 |
| | X | 0.045696 | 0.094837 | 0.011406 |
| | Y | 0.197605 | 0.059712 | 0.010226 |
| | DIFFPOSX | 0.003893 | 0.003814 | 0.000961 |
| | DIFFPOSY | 0.004054 | 0.007541 | 0.000999 |
