1. Introduction
In recent years, the adoption of biometric technologies for user authentication has increased significantly. This trend is driven by the unique advantages of biometrics, including distinctiveness, convenience, and high security. For example, fingerprint and facial recognition systems demonstrate high accuracy by leveraging users’ personal and physiological characteristics while minimizing the need for user cooperation during the authentication process. In addition to these modalities, physiological biosignals such as electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), and photoplethysmogram (PPG) signals have attracted attention for their potential to enhance security. Unlike externally observable traits, these signals are internally generated and inherently dynamic, reflecting an individual’s unique biological state in real time. Their continuous variability and resistance to replication or spoofing provide strong protection against common biometric threats such as forgery and replay attacks, thereby contributing to a more robust and secure authentication framework [1,2,3,4,5].
However, despite these advantages, biometric authentication systems have inherent limitations, particularly when the physiological or psychological states of users fluctuate. Biometric features, especially those derived from biosignals, can be significantly affected by transient conditions such as stress, fatigue, emotional state, and illness. These factors may alter the underlying physiological processes, leading to changes in signal morphology, amplitude, or frequency, which reduces a system’s ability to match acquired signals with stored templates reliably. Consequently, authentication accuracy may degrade, increasing the rates of false rejections or even false acceptances in some cases [6,7,8].
This variability challenges the stability and permanence of biometric traits, which are the foundational assumptions of most recognition systems. To mitigate these issues, researchers have proposed a variety of adaptive and resilient approaches [9,10,11]. One promising direction is the development of dynamic biometric systems that can adjust their decision thresholds or update templates over time based on the user’s current state. For example, adaptive template-updating strategies can accommodate gradual changes in biosignal patterns and improve long-term system reliability. Additionally, machine and deep learning techniques have been leveraged to construct context-aware models that can be generalized across varying conditions by learning discriminative features that remain robust to physiological fluctuations [12].
Another complementary strategy is the use of multimodal biometric systems that combine multiple sources of biometric information, including either multiple biosignals (e.g., ECG and PPG) or a combination of behavioral and physiological traits. By integrating multiple modalities, these systems can compensate for the degradation of a single signal source under certain conditions, enhancing overall recognition performance and fault tolerance [13,14,15]. Fusion techniques can be applied at different levels, such as sensors, features, scores, and decisions, to improve accuracy and reduce vulnerability to spoofing and environmental noise [16,17,18]. Together, these approaches contribute to building more resilient and reliable biometric authentication systems suitable for real-world dynamic user environments.
To address the challenges outlined above, this paper proposes a user authentication system that leverages the characteristics of graph neural networks (GNNs) to maintain high accuracy while adapting to changes in the user context. GNNs are well-suited to processing non-Euclidean and irregular data structures, enabling the modeling of spatial and temporal dependencies through their networked architecture [19,20,21,22,23,24]. These features make them particularly effective at recognizing complex and dynamic patterns. The contributions of this study can be summarized as follows:
A novel application of GNNs for EMG-based user authentication is proposed, enabling robust modeling of temporal and spatial dependencies in biosignal data.
The proposed GNN model demonstrates strong long-term generalization, maintaining performance even under fluctuating physiological and psychological conditions.
An EMG dataset collected across multiple sessions over several months is constructed and utilized, providing a realistic evaluation of authentication performance under diverse conditions.
The adaptability of the proposed system to nonstationary user states is analyzed, highlighting its potential for real-world deployment.
These contributions highlight the novelty and practical significance of this work, positioning it as a step toward more reliable biosignal-based authentication systems.
Furthermore, GNNs can process multiple input nodes simultaneously, enabling the integration and analysis of diverse biometric signals such as ECG, EEG, or PPG in a unified framework [25,26]. This multi-signal fusion facilitates a more comprehensive understanding of user identity and contributes to the maintenance of high accuracy, even under fluctuating environmental or physiological conditions. By exploiting these strengths, the proposed system aims to provide a more resilient and adaptive biometric authentication framework that can operate reliably in real-world dynamic settings.
While a wide range of biometric modalities, such as facial recognition, iris scanning, and palm vein identification, are commonly employed in biometric research, this study focused on EMG signals. EMG data, which measure the electrical activity generated by muscle contractions, are characterized by significant inter-user variability and temporal instability. These properties make EMG one of the most challenging physiological signals for achieving consistently accurate authentication.
This study utilized EMG datasets collected from the same users at different times of the day and over several months, enabling the evaluation of authentication performance under diverse and realistic conditions. By examining the performance of authentication models under varying physiological states and environmental contexts, this study explored the feasibility and reliability of EMG-based biometric systems. Furthermore, various test environments were constructed to assess the adaptability of the system to real-time physiological changes, including fluctuations in the users’ health conditions, stress levels, and fatigue, all of which can significantly influence EMG signals. A GNN-based approach was designed to process these dynamically changing signals, offering a means of capturing both the spatial and temporal dependencies inherent in the data. The GNN model enables adaptive learning from complex patterns, thereby improving authentication performance under nonstationary conditions. All the experiments conducted in this study were implemented using the PyTorch framework (version 2.4.1), which supports efficient model training and evaluation with high flexibility and scalability.
The remainder of this paper is organized as follows.
Section 2 describes the construction of the EMG dataset, details of data collection, and preprocessing methods.
Section 3 introduces the structure and training method of the GNN, which forms the core of this study, and explains the theoretical background of how the GNN addresses the user authentication problem.
Section 4 summarizes the experimental results and evaluates the performance of the proposed method by comparing it with existing biometric authentication systems. Finally, Section 5 concludes this paper and discusses future research directions and potential improvements.
3. Graph Neural Network
GNNs are a class of neural networks designed to process graph-structured data directly. Unlike traditional neural networks that operate on Euclidean data such as images or sequences, GNNs can capture complex relationships and dependencies between nodes by leveraging graph topology. This capability makes GNNs particularly effective for applications involving social networks, molecular structures, and other data that are naturally represented as graphs.
Graph convolutional network convolution (GCNConv) is a widely used graph convolutional layer that generalizes the concept of convolution from grid-structured data to graphs [31]. It aggregates feature information from a node’s neighbors, enabling the network to learn local patterns and node representations that incorporate both node attributes and graph connectivity. GCNConv is particularly well-suited to structured non-Euclidean data such as graphs derived from biometric signals, where spatial or temporal relationships can be encoded in the form of edges.
The GCNConv layer updates each node’s representation by aggregating its neighboring node features, which are then combined with the features of the target node. The graph structure is encoded in an adjacency matrix and aggregation is normalized to account for variations in the node degree, thereby improving training stability and ensuring a balanced influence among neighbors. The aggregated features are passed through learnable weight matrices and nonlinear activation functions, resulting in expressive node embeddings. One of the key strengths of GCNConv is its ability to integrate the structural context during training, allowing the model to learn from both the individual characteristics of each node and interactions between connected nodes. This property is particularly valuable in graph-based tasks such as node classification, link prediction, and graph classification. Despite its relatively simple computational structure, GCNConv provides strong representational capacity with few parameters and a low risk of overfitting, making it applicable to a wide range of datasets.
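For reference, the update described above corresponds to the standard GCN propagation rule [31], shown here in its usual matrix form (with $\tilde{A} = A + I$ adding self-loops and $\tilde{D}$ its degree matrix):

$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),$$

where $H^{(l)}$ is the matrix of node features at layer $l$, $W^{(l)}$ is the learnable weight matrix, and $\sigma$ is a nonlinear activation.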
In this study, a graph was constructed using a fixed k-nearest neighbors (k-NN) approach, rather than dynamically computing edges based on data similarity. Although this method does not fully capture semantic or adaptive relationships, it preserves the core functionality of GCNConv by providing a consistent local neighborhood for each node, enabling effective feature propagation across graph structures. This approach is particularly effective for modeling time series EMG signals, where the relationships between signal channels or segments can be encoded in graph topology. The choice of a fixed k-NN graph ensures a consistent structure across samples and significantly reduces computational overhead, thereby providing a practical balance between efficiency and representational power. Although this approach may limit the expressiveness of the graph compared with data-driven or learned edge construction methods, the use of GCNConv still allows the network to leverage the local connectivity and structural patterns essential for robust user authentication.
Despite these advantages, GCNConv has certain limitations. One of the most well-known issues is over-smoothing, which occurs as the number of layers increases: node representations become increasingly similar across network layers, eventually leading to the loss of distinctive features and convergence toward average representations. Additionally, GCNConv treats all neighboring nodes with equal importance, making it difficult to reflect differences in the influence of individual neighbors during learning. Consequently, its performance may be limited when applied to graphs with complex structures or non-uniform node influences. Furthermore, for large-scale graphs, adjacency-matrix-based operations can become computationally intensive, raising challenges related to efficiency and memory usage. Nevertheless, GCNConv remains a powerful and widely applicable tool for graph-based data analysis owing to its simple structure and intuitive operation. To address structural limitations and scalability issues, ongoing research has yielded various improvements [32].
In this study, the GCNConv layer was implemented using a residual architecture to preserve the original input features during propagation, and the overall network was structured as a Siamese architecture. This allows the model to embed paired input data using GCNConv layers and perform user authentication based on the Euclidean distance between the resulting embeddings. To enhance generalization performance, batch normalization and dropout (with a rate of 0.5) were applied within the GCN layers. A two-layer GCNConv architecture was employed, and the number of time steps was fixed at 2000. To reduce the temporal dimension of the input and the overall computational load, two additional one-dimensional convolutional (Conv1D) layers were applied sequentially, reducing the number of time steps from 2000 to 199 in the first layer and further to 19 in the second layer according to the convolutional output size formula

$$L_{out} = \left\lfloor \frac{L_{in} + 2P - K}{S} \right\rfloor + 1,$$

where $L_{in}$ is the input length, $K$ is the kernel size, $S$ is the stride, $P$ is the padding, and $L_{out}$ is the output length. By incorporating these additional Conv1D layers, the model achieves a substantial reduction in both the number of parameters and the computational complexity and enables more efficient feature vector fusion. All network components were implemented using the PyTorch framework.
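As a concrete illustration, the following is a minimal PyTorch sketch of one Siamese branch consistent with the description above, assuming the GCNConv layer from PyTorch Geometric. It is not the authors’ code: the class and helper names, channel count, hidden widths, and the Conv1D kernel/stride values are illustrative assumptions chosen only so that the output-length formula yields 2000 → 199 → 19.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumes PyTorch Geometric is available

class EMGGraphEncoder(nn.Module):
    """One Siamese branch: Conv1D reduction (2000 -> 199 -> 19) + two GCNConv layers."""
    def __init__(self, in_channels=4, hidden=16, embed_dim=32):
        super().__init__()
        # 2000 -> 199:  floor((2000 - 20) / 10) + 1 = 199   (K=20, S=10, P=0; illustrative)
        self.down1 = nn.Conv1d(in_channels, hidden, kernel_size=20, stride=10)
        # 199 -> 19:    floor((199 - 19) / 10) + 1 = 19     (K=19, S=10, P=0; illustrative)
        self.down2 = nn.Conv1d(hidden, hidden, kernel_size=19, stride=10)
        self.gcn1 = GCNConv(hidden, hidden)
        self.gcn2 = GCNConv(hidden, hidden)
        self.bn = nn.BatchNorm1d(hidden)
        self.drop = nn.Dropout(0.5)
        self.out = nn.Linear(hidden, embed_dim)

    def forward(self, x, edge_index):
        # x: (in_channels, 2000) EMG segment; edge_index: fixed k-NN graph over the 19 reduced steps
        x = F.relu(self.down1(x.unsqueeze(0)))
        x = F.relu(self.down2(x)).squeeze(0)     # (hidden, 19)
        nodes = x.t()                            # 19 nodes, each with `hidden` features
        h = F.relu(self.bn(self.gcn1(nodes, edge_index)))
        h = self.drop(h)
        h = self.gcn2(h, edge_index) + nodes     # residual path preserving the input features
        return self.out(h.mean(dim=0))           # graph-level embedding

def same_user(emb_a, emb_b, threshold):
    """Accept when the Euclidean distance between paired embeddings falls below the threshold."""
    return torch.norm(emb_a - emb_b, p=2) < threshold
```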
Figure 2 illustrates the overall architecture of the proposed neural network used for user authentication based on paired EMG data inputs.
For the edge construction used in the GCNConv operations, the k-NN algorithm was employed to ensure consistency in the edge index across all samples. The value of k was fixed at five, allowing the use of a uniform edge index structure throughout training and inference. The k-NN algorithm constructs edges by identifying the k nearest neighbors to each node (or data point) based on a distance metric. This method enables the transformation of unstructured data or samples distributed in continuous spaces into graph structures that capture underlying relationships.
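For illustration, a fixed edge index of this kind can be built once and reused for every sample. The sketch below assumes PyTorch Geometric’s `knn_graph` (which requires torch-cluster) and placeholder node features, since the paper does not specify the exact library call.

```python
import torch
from torch_geometric.nn import knn_graph  # assumption: PyG/torch-cluster used for the k-NN step

# Placeholder node features for one representative sample: 19 reduced time-step nodes,
# each described by a 16-dimensional feature vector (both numbers purely illustrative).
reference_nodes = torch.randn(19, 16)

# Connect every node to its 5 nearest neighbors; the resulting (2, num_edges) edge index
# is then kept fixed and shared across all samples during training and inference.
edge_index = knn_graph(reference_nodes, k=5)
```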
A key feature of the k-NN algorithm is its ability to generate meaningful graph topologies, even when explicit relational information is absent. By leveraging the similarity or proximity between data points, the k-NN algorithm creates edges that reflect the local structure of the data. This approach is particularly advantageous for time-continuous sensor data, image pixel values, and high-dimensional feature spaces, where the nearest neighbors can help define edge indices that naturally encode spatial or temporal relationships. In such cases, the k-NN-based edge index plays a crucial role in establishing message-passing paths that are essential for GNN operations.
This approach has several advantages. First, it enables consistent edge structures across different data samples, as long as the relative similarity or distance patterns are preserved. This consistency contributes to stable training and improved generalization performance. Second, k-NN is highly flexible and can be applied to irregular input data that do not conform to regular grids. This algorithm automatically identifies local neighborhoods, allowing the construction of graph structures for a wide range of data types. Third, it is computationally efficient and easy to implement using techniques such as the KD-tree, ball tree, or brute-force search. When the graph is pre-computed and remains static, the additional computational cost during training is negligible.
Furthermore, this method emphasizes the local structure of data by connecting each node to its nearest neighbors, facilitating the preservation of local context during the message-passing process, which is beneficial for noise suppression and overfitting mitigation in GCNConv and other GNN layers. However, this approach has some limitations that must be considered. Notably, model performance can be sensitive to the choice of k, which may require empirical tuning or domain-specific knowledge for an appropriate selection.
In summary, constructing an edge index using the k-NN algorithm provides a practical and efficient method of structuring unstructured or high-dimensional data into graphs. This approach enables robust and flexible input representations for GNN-based models and supports stable localized learning in graph convolution operations.
Hyperparameter optimization was performed in PyTorch to reduce computational complexity while preserving data characteristics, and through this process the total number of parameters was reduced. Although the proposed network consists of only 23,592 trainable parameters, it leverages GCNConv layers, which effectively capture the structural information of the graph by aggregating neighborhood features. In this study, a fixed k-NN algorithm was used to construct a consistent and stable edge index for the graph, enabling the model to maintain a uniform graph structure across different inputs. This approach allows expressive representation learning, even with a small model footprint, making it suitable for resource-constrained environments such as real-time user authentication systems or embedded devices.
However, despite the use of fixed k-NN graphs to simplify the graph construction process, the GCNConv layers still introduce certain computational challenges. The message-passing mechanism and sparse matrix multiplications inherent in GCNConv operations can lead to substantial computational overhead, particularly as the number of nodes or graph complexity increases. While the fixed k-NN approach helps maintain consistent edge connectivity and can improve batching efficiency compared with dynamically computed graphs, the irregular nature of graph data poses challenges in terms of memory consumption and computational cost compared with traditional convolutional neural networks (CNNs).
In summary, the proposed architecture benefits from the compact parameterization and strong structural awareness provided by GCNConv layers combined with fixed k-NN graph construction. This balance helps ensure efficient and scalable performance; however, careful optimization of both the network design and graph construction remains essential for deployment in real-world resource-limited settings.
To maximize generalization performance during the training process, additional techniques such as the sharpness-aware minimization (SAM) algorithm, k-fold cross-validation, and threshold shift method were employed [33,34]. Unlike conventional loss minimization approaches, SAM is an advanced optimization technique that aims to find flat minima by considering the geometry of the loss landscape around the model parameters. Rather than simply reducing the loss value at the current parameter point, SAM encourages the model to learn such that the loss remains stable in the vicinity of these parameters. This approach provides improved generalization performance, even when tested on unseen data or under domain shifts.
Traditional gradient-descent-based optimizers tend to converge to sharp local minima, where small changes in parameter values can lead to large increases in loss. Although such minima may offer high performance on the training dataset, they are often sensitive to new data and prone to overfitting. In contrast, SAM explicitly considers the sharpness of the loss surface and updates the parameters in a direction that leads to a smoother loss landscape, resulting in model parameters that perform robustly under a wide range of conditions.
SAM operates in two steps. First, it identifies the point within a neighborhood of the current parameters that maximizes the loss value and then updates the parameters to minimize the loss at that worst-case point. This procedure can be understood as a form of worst-case-aware training, which enhances the robustness of the model to diverse data distributions. Mathematically, SAM minimizes the local sharpness of the loss function, thereby guiding the model toward more generalizable representations.
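The two-step procedure can be sketched in code as follows. This is a simplified, hedged illustration rather than the authors’ implementation: the function name `sam_step` and the neighborhood radius `rho` are assumptions, and the per-parameter weighting details of the original SAM formulation are omitted.

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """Simplified sharpness-aware minimization update:
    (1) climb to the worst-case point within a rho-ball around the current weights,
    (2) descend using the gradient evaluated at that perturbed point."""
    # Step 1: gradient at the current weights
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    eps = []
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads])) + 1e-12
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm   # ascent direction scaled to the rho-ball
            p.add_(e)                      # w <- w + e (worst-case perturbation)
            eps.append(e)
    # Step 2: gradient at the perturbed weights, then restore and apply the real update
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                  # restore the original weights
    base_optimizer.step()                  # update using the worst-case gradient
```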
The adoption of SAM helps reduce sensitivity, even in complex graph structures or noisy data, thereby enabling stable representation learning. When combined with models such as GCNConv, which aggregate structural information, SAM facilitates the more robust integration of signals from neighboring nodes. In summary, SAM is a strategic optimization method that balances loss minimization and loss surface flatness, preventing the model from overfitting the training data and ensuring strong predictive performance in unseen or varying environments. In this study, SAM was employed to improve the training stability and generalization ability of a graph-based model simultaneously.
To enhance the reliability and robustness of the training process, 10-fold cross-validation was employed with a batch size of 64. The k-fold cross-validation technique is widely used to estimate model performance, particularly when the dataset size is limited. By partitioning the data into k equally sized folds and iteratively training the model on k − 1 folds while validating on the remaining fold, this method ensures that each data point is used for both training and validation, which not only reduces the risk of overfitting but also provides a more comprehensive evaluation of the generalization ability of the model.
The use of 10 folds strikes a practical balance between computational efficiency and statistical reliability, while a batch size of 64 provides stable training dynamics without requiring GPU acceleration. When applied in conjunction with the GCNConv network, k-fold cross-validation further supports the model’s ability to generalize across different graph structures. Because GCNConv aggregates information from neighboring nodes, variations in the graph topology or local node connectivity can lead to diverse learning dynamics. Cross-validation helps mitigate this issue by exposing the model to a broader variety of training–validation splits, thereby encouraging robustness in structural representation learning. Overall, the combination of 10-fold cross-validation and GCNConv provides a strong foundation for stable and generalizable performance across complex graph-based datasets, even in CPU-only training environments.
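A minimal sketch of the 10-fold loop is shown below, assuming scikit-learn’s `StratifiedKFold` for the split; the helper names `build_model` and `train_one_fold` are hypothetical stand-ins for the actual training pipeline.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset

def run_cross_validation(dataset, labels, build_model, train_one_fold,
                         n_splits=10, batch_size=64):
    """Train a fresh model on each of the k folds and collect the validation scores."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
        train_loader = DataLoader(Subset(dataset, train_idx.tolist()),
                                  batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(Subset(dataset, val_idx.tolist()), batch_size=batch_size)
        model = build_model()                      # re-initialize the model for every fold
        scores.append(train_one_fold(model, train_loader, val_loader))
    return scores
```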
The SAM algorithm showed minimal contribution in terms of training accuracy but yielded an improvement of approximately 1–2% in generalization performance, particularly when evaluated on test data collected three months later. In contrast, k-fold cross-validation did not result in a direct performance gain; however, it is expected to contribute to training stability, particularly in cases where the dataset size is limited.
Additionally, this study incorporated the threshold shift technique to adjust the interpretation of model outputs flexibly and optimize classification performance according to real-world application requirements. Given that user authentication is typically performed over extended periods, the system is designed to allow the adaptive adjustment of the classification threshold if authentication performance degrades over time. Specifically, in scenarios where accuracy declines after several months, the decision threshold can be shifted to maintain reliable performance. Although the accuracy improvement achieved through threshold adjustment over a three-month interval was relatively small (approximately 0.4–0.5%), this technique is expected to become more effective over longer timespans.
Threshold shift is a technique that enables more precise control over the tradeoff between sensitivity and specificity by dynamically adjusting the classification threshold, rather than applying a fixed cutoff value. In standard binary classification, outputs with a predicted probability of 0.5 or higher are typically classified as positive, while those below this probability are classified as negative. However, this approach does not adequately reflect class imbalances or application-specific requirements. For example, in domains such as medical diagnosis, anomaly detection, and security, false negatives can incur high costs, making it critical to prioritize higher sensitivity. In such cases, reducing the threshold to below 0.5 increases sensitivity at the expense of a higher false positive rate. Conversely, in applications where specificity is more important, increasing the threshold allows for more conservative predictions.
In this study, the optimal threshold was identified based on performance metrics such as the false positive rate and true positive rate on the validation dataset. This approach enhances both the reliability of the prediction results and their suitability for real-world applications.
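One simple way to pick such a threshold from validation FPR/TPR values is sketched below; the use of `roc_curve` and Youden’s J statistic is an assumption made for illustration, since the paper only states that the threshold was selected from these metrics.

```python
import numpy as np
from sklearn.metrics import roc_curve  # assumption: scikit-learn used for ROC computation

def select_distance_threshold(distances, labels):
    """distances: embedding distances for validation pairs; labels: 1 = same user, 0 = different user.
    Smaller distance means a more likely genuine pair, so distances are negated into scores."""
    fpr, tpr, thresholds = roc_curve(labels, -np.asarray(distances))
    best = np.argmax(tpr - fpr)          # Youden's J: one simple selection criterion
    return -thresholds[best]             # convert the score cutoff back to a distance threshold
```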
Figure 3 illustrates the number of trainable parameters used in each layer of the model. A total of 23,592 parameters were used, with the largest portion concentrated in the dense bottom layer. Although the GCNConv layers contain relatively few parameters, they involve a significantly higher computational load because of the nature of graph-based operations.
4. Comparison Results
To evaluate the performance improvement achieved by the proposed model compared with baseline approaches, two key aspects were considered. First, the user authentication performance of each model was assessed under various physiological and psychological conditions. Specifically, data collected at the initial time and one and two months later were used for training. These datasets included EMG signals recorded in diverse user conditions such as a stable resting state, post-physical-exercise fatigue, post-study mental stress, and post-relaxation states (e.g., after listening to music). By incorporating such variations, the experiment investigated how well each model could internalize and adapt to fluctuations in user states, which are common in real-world authentication scenarios. Second, the generalization capability of the models was evaluated using a new dataset collected three months after the initial time. This temporal shift provided an opportunity to evaluate the ability of the model to handle domain drift and long-term variability in user-specific biometric signals. For this comparison, two existing EMG-based user authentication models, one based on a Siamese neural network combined with a CNN and the other combining a CNN with long short-term memory (LSTM), were implemented and evaluated on the same dataset to ensure consistency [35,36]. For the CNN model, instead of focusing on a lightweight architecture, the CNN structure proposed in the CNN + LSTM model was adopted and implemented. For the GNN, a moving average was applied to reduce the number of input nodes to one-fourth, managing computational complexity while maintaining generalization performance, whereas for the CNN and CNN + LSTM models, the full 8000 time steps were used without any modification. The baseline comparison models were chosen from small-scale neural networks rather than large-scale ones in order to better reflect the constraints of wearable environments.
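As an illustration of the input reduction for the GNN branch, averaging non-overlapping windows of four samples is one way to realize the described four-fold reduction (8000 → 2000 time steps); whether the authors used overlapping or non-overlapping windows is not specified, so the snippet below (and its placeholder channel count) is an assumption.

```python
import torch
import torch.nn.functional as F

# x_raw: (batch, channels, 8000) EMG segments; average every 4 consecutive samples
# to obtain (batch, channels, 2000) inputs for the GNN branch.
x_raw = torch.randn(8, 4, 8000)                      # placeholder batch (channel count assumed)
x_reduced = F.avg_pool1d(x_raw, kernel_size=4, stride=4)
assert x_reduced.shape[-1] == 2000
```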
Figure 4 presents the confusion matrices of the CNN, CNN + LSTM, and proposed GNN models evaluated on the training data. A confusion matrix is a performance evaluation tool that summarizes classification results by comparing the predicted labels on the x axis with the actual labels on the y axis; here, the matrices are presented as percentages for comparison. For the CNN model, 21,692 parameters were used, including 21,500 trainable parameters. In the case of the CNN + LSTM model, 17,268 parameters were used, of which 17,076 were trainable. Because the target application of user authentication is a wearable environment, compact neural network architectures were used as baselines. Performance metrics, including accuracy, recall, precision, and F1 score, are summarized in Table 1. The results clearly indicate that both the CNN and CNN + LSTM models struggled to authenticate users accurately across the various states included in the training data, suggesting that these models failed to extract robust and consistent features under dynamic user conditions, leading to poor interclass discrimination and identity verification. In contrast, the proposed GNN model demonstrated a substantially higher classification accuracy (98.8%), indicating its superior capacity to generalize over heterogeneous input conditions. This performance gain is attributable not only to the inherent graph-based representation power of GNNs but also to the use of optimization techniques such as SAM and k-fold cross-validation, which helped reinforce the model’s generalization during training.
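For clarity, the reported metrics follow from the confusion-matrix counts in the usual way; the helper below is a generic illustration and is not tied to the specific values in Table 1.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, recall, precision, f1
```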
The same evaluation procedure was applied to the dataset collected three months after the initial session (Figure 5). In this scenario, the CNN and CNN + LSTM models exhibited a marked decline in performance, underscoring their limited adaptability to the evolving biometric patterns of the same user. This performance degradation implies that the feature representations learned by these models lack temporal robustness (Table 2). Additionally, compared with the CNN model, the CNN + LSTM model demonstrated better adaptability to temporal variations and previously unseen data, which is likely attributable to the inherent ability of LSTM to retain and process sequential temporal information, enabling the more effective learning of time-dependent changes in biosignals. In contrast, the GNN model maintained performance levels comparable to those observed during training, suggesting strong generalization and temporal resilience. This result highlights the robustness of the model in terms of both intersession variability and real-world fluctuations in user states and environments. For the proposed GNN model, the accuracy on the training data was 98.8%, and it maintained a high accuracy of 98.5% on the testing data collected three months later. Additionally, when instances involving the same individual are defined as positives, all neural network models tend to exhibit an increased rate of false positives, where the predicted label indicates the same user although the actual label corresponds to a different individual. This issue is particularly critical from a security perspective in user authentication systems, and strategies to mitigate such false positives are therefore essential.
Overall, these findings demonstrate the effectiveness of the proposed GNN-based approach for long-term state-resilient user authentication using EMG signals. The ability of the model to generalize across time and adapt to dynamic physiological conditions makes it a strong candidate for practical deployment in continuous biometric authentication systems. However, achieving >99% accuracy remains challenging. To overcome this limitation, additional network modules or multimodal architectures that incorporate additional biosignals may be required. Based on the current results, even with the use of the GNN model, constructing a fully reliable user authentication system using only EMG signals under diverse conditions may still be infeasible.
5. Conclusions and Discussion
Conventional biosignal-based models demonstrate relatively high accuracy in user authentication under stable conditions. However, their limited ability to adapt to temporal variations and diverse physiological or psychological states poses challenges for their practical deployment. To address these limitations, a GNN-based approach was introduced to model the dynamic nature of biosignal patterns and enhance adaptability across various user states.
The proposed method employs a GCNConv architecture with edge indices constructed using a fixed k-NN algorithm. During the training process, SAM, k-fold cross-validation, and a threshold shift technique were incorporated to improve model robustness and generalization. Authentication performance was evaluated across distinct conditions: a stable resting state, post-exercise fatigue, mental stress induced by studying, and relaxation activities such as listening to music. EMG signals were collected from seven users at four different time points: initial, one month later, two months later, and three months later. The model achieved an accuracy of 98.8% on the training data, which included measurements from the initial session, as well as from one and two months later. When tested on the data collected three months later, which had not been used during training, the model maintained a high accuracy of 98.5%, demonstrating effective long-term generalization.
In summary, this study demonstrates that GNNs can effectively capture the temporal and contextual dynamics of EMG signals, thereby offering a robust and adaptable framework for user authentication. The novelty of this work lies in the application of graph-based modeling to biosignal authentication, which enables the system to adapt to diverse and evolving user states over extended periods. The key contribution is the demonstration of long-term generalization with high accuracy, even under varying physical and psychological conditions, achieving results that are consistently close to the 99% benchmark.
From a practical perspective, the findings suggest that GNN-based user authentication has strong potential for real-world deployment, where users inevitably experience dynamic physiological and psychological variations. Nevertheless, challenges remain, including the difficulty of consistently exceeding 99% accuracy and the need to ensure scalability for larger populations. Future improvements may include refinement of the neural network architecture, the integration of multimodal biosignals, and the development of adaptive algorithms that account for signal variability. Additionally, analyzing misclassification cases in detail will provide insights for enhancing system reliability and robustness. Overall, this work contributes a novel and practical direction toward the realization of commercially viable biosignal-based authentication systems.