A Secure Multimodal Biometric Data Protection Framework Using Optimized CNN, GAN-Based Privacy Preservation, and ElGamal Cryptography

Tynymbayev, Sakhybay; Razaque, Abdul; Chinibayeva, Tolganay; Temirbekova, Zhanerke; Chinibayev, Yersain; Hassan, Dina S. M.

doi:10.3390/app16136528


by Sakhybay Tynymbayev ¹, Abdul Razaque ¹,*, Tolganay Chinibayeva ¹, Zhanerke Temirbekova ¹, Yersain Chinibayev ² and Dina S. M. Hassan ³

¹ Department of Computer Engineering, International IT University, Almaty 050000, Kazakhstan

² Department of Software Engineering, Satbayev University, Almaty 050013, Kazakhstan

³ Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(13), 6528; https://doi.org/10.3390/app16136528

Submission received: 30 May 2026 / Revised: 25 June 2026 / Accepted: 25 June 2026 / Published: 30 June 2026

(This article belongs to the Section Computing and Artificial Intelligence)


Featured Application

The proposed Secure Biometric Data Protection framework is well suited for deployment in security-critical applications that require highly reliable biometric authentication and privacy-preserving identity verification, such as border control systems, smart healthcare platforms, financial authentication services, defense and military access control, e-government services, and digital identity management infrastructures. The framework fuses optimized deep learning, synthetic biometric representation generation and cryptographic protection to enable the secure storage, transmission and verification of multimodal biometric data while mitigating the risk of identity theft, template leakage, spoofing and unauthorized access.

Abstract

We propose a secure biometric data protection (SBDP) system, which uses artificial intelligence (AI) and encryption methods to prevent forgery and keep the biometric data private and intact. The proposed SBDP approach integrates deep learning-based feature extraction with robust encryption and authentication mechanisms in a single pipeline. We use the optimized convolutional neural network (OCNN) to obtain unique features from multimodal biometric inputs like fingerprints, facial photos, and retinal scans. This works well because it learns how to represent data efficiently. To reduce the risks of raw biometric exposure, we adopt a generative adversarial network (GAN) to generate synthetic biometric representations that maintain essential characteristics while reducing sensitivity to data leakage. The biometric features and images are encrypted using the ElGamal cryptosystem to provide security assurance, while the digital signature scheme based on the SHA-256 hash function is used to provide data integrity and authenticity. Experimental results show good performance of all components of the framework. The optimized CNN obtains a classification accuracy of more than 99.8%, while the GAN shows stable training behavior with the discriminator and generator losses converging to around 0.3 and 4.0, respectively. The cryptographic module guarantees encryption dependability and signature verification efficacy across all evaluated scenarios. The integrated system provides effective protection of biometric data from unauthorized access, tampering and identity forgery. The SBDP framework is a promising solution for defense, healthcare and digital identity management, ensuring secure transmission and storage of biometric data.

Keywords:

artificial intelligence; biometric data; cybersecurity; El-Gamal cryptosystem; machine learning

1. Introduction

Biometric authentication methods are commonly used in contemporary security systems. They offer reliable and convenient identity authentication using unique physiological and behavioral characteristics such as facial images, fingerprints, and retinal patterns. These systems are used extensively in vital sectors such as defense, healthcare, financial services and digital identity management. Despite their advantages, existing systems suffer from security and privacy issues that may lead to template leakage, spoofing and identity theft. Unlike traditional authentication credentials, biometric features are non-revocable and irreversible, which poses serious challenges in protecting against reconstruction attacks and misuse [1]. Recent advances in artificial intelligence, especially deep learning, have greatly improved the accuracy of biometric recognition systems. Convolutional neural networks (CNNs) are powerful tools for extracting discriminative and hierarchical information from complex biometric data, enabling robust detection under different environmental conditions. Existing advanced techniques use multimodal learning and complex representation techniques to improve accuracy and robustness against spoofing attacks [2]. Multimodal biometric systems, which use multiple sources of information, have shown better performance than unimodal systems by reducing ambiguity and increasing robustness against malicious interference [3]. However, the fusion of multiple biometric modalities introduces further challenges in terms of privacy preservation, secure storage, and transmission. It has recently been shown that deep feature representations can still leak sensitive information and are prone to reconstruction attacks, thus exposing biometric identities to adversaries [4,5]. In addition, emerging threats, such as presentation attacks and domain generalization issues, still affect the reliability of biometric systems in real-world deployments [6,7]. 
These limitations motivate the need for secure frameworks that achieve high recognition accuracy while providing strong guarantees for confidentiality, integrity and authenticity. Significant efforts have concentrated on employing cryptographic approaches in biometric systems to address these concerns. Public-key cryptosystems and secure authentication techniques have been proposed to guarantee confidentiality and resistance against interception attacks, protecting biometric templates from compromise during storage and transmission [8,9]. Privacy-preserving indexing and template protection mechanisms have been proposed to reduce the hazards of template leakage and illegal reconstruction [10]. However, traditional cryptographic methods alone are not sufficient to alleviate privacy issues, especially when raw biometric data or feature representations are exposed during processing. Meanwhile, privacy-preserving machine learning techniques have become a promising direction for improving the security of biometrics. Generative models and synthetic data generation techniques can convert sensitive biometric data into non-reversible representations while keeping the key identity information. Recent works have shown that privacy-enhancing transformations, such as feature bottleneck learning and biometric de-identification frameworks, can significantly reduce the risk of identity leakage while retaining recognition performance [11,12]. Advances in robust feature learning and privacy-aware data publishing can further improve the security of biometric systems against adversarial threats [13,14]. Motivated by these challenges, we propose an SBDP framework that integrates optimized deep learning, generative modelling, and cryptographic security in a unified end-to-end architecture. 
The proposed framework can overcome the main drawbacks of the existing biometric systems through improved feature representation, reduced risk of sensitive biometric data exposure and secure storage and transmission. We propose a holistic and robust solution for secure biometric authentication via a combination of optimized convolutional neural networks for discriminative feature extraction, generative modelling for privacy preservation, and cryptographic mechanisms for data protection. This work advances the state of the art by bridging the gap between high-performance biometric recognition and strong security guarantees. The proposed system provides a unified solution for recognition and protection, traditionally treated as two separate problems, improving robustness against adversarial attacks, reducing the possibility of biometric data leakage, and guaranteeing reliable identity verification in real-world scenarios. This renders the framework particularly relevant for deployment in security-critical settings, where both recognition accuracy and data protection are of paramount importance. This paper is organized as follows: Section 2 provides a literature review on biometric encryption and AI applications in security. Section 3 formulates the problem and presents the mathematical modelling. Section 4 outlines the proposed framework. Section 5 describes the experimental setup and the results. Section 6 discusses the results and limitations. Section 7 concludes with implications, limitations and future directions.

1.1. Main Contributions

The major contributions of this article are summarized as follows:

▪ A unified optimization framework for secure multimodal biometric authentication is proposed. Instead of optimizing recognition accuracy, privacy preservation, cryptographic confidentiality, and authentication integrity separately as in traditional biometric systems, the proposed framework optimizes all of them simultaneously.
▪ We propose a GAN-based privacy-aware latent transformation mechanism that generates synthetic biometric embeddings which preserve identity-discriminative information while substantially reducing the biometric reconstruction risk, privacy leakage, and susceptibility to inversion attacks.
▪ An adaptive multimodal biometric fusion strategy based on secure latent feature learning is proposed for facial, retinal, and fingerprint modalities to improve authentication accuracy, robustness against spoofing attacks, and resilience to noisy or partially corrupted biometric inputs.
▪ We propose a secure end-to-end AI-cryptography framework that converts the raw multimodal biometric inputs into encrypted and digitally signed authentication outputs by tightly integrating OCNN feature extraction, GAN-based privacy-preserving transformation, ElGamal encryption, and SHA-256 integrity verification. Experimental results show that it achieves high recognition accuracy, strong privacy protection, stable adversarial training behavior, and secure biometric authentication.

1.2. Novelty and Distinction of the Proposed Framework

Although the proposed SBDP framework employs well-established components such as convolutional neural networks, generative adversarial networks, ElGamal cryptography, and SHA-256 hashing, the novelty of this work does not reside in introducing an entirely new standalone algorithm. Rather, the novelty lies in the design of a unified optimization framework that jointly addresses biometric recognition, privacy preservation, encryption, and authentication integrity within a single end-to-end architecture. Unlike conventional biometric systems, where feature extraction, template protection, privacy preservation, and cryptographic authentication are implemented as separate and independently optimized modules, the proposed SBDP framework formulates biometric recognition and privacy-preserving representation learning as jointly optimized objectives through the OCNN and GAN modules, while the cryptographic protection mechanisms, namely ElGamal encryption and SHA-256 integrity verification, are integrated procedurally to provide confidentiality and authenticity of the learned biometric representations. Consequently, the framework combines differentiable learning and deterministic cryptographic protection within a unified secure biometric pipeline. The optimized CNN learns discriminative multimodal biometric embeddings, while the GAN does not generate synthetic biometric images but instead transforms the learned latent representations into privacy-preserving synthetic embeddings that retain identity-discriminative characteristics while minimizing the possibility of biometric reconstruction and privacy leakage. Furthermore, the proposed framework introduces a secure transformation pipeline that maps raw multimodal biometric inputs directly into encrypted and digitally signed authentication outputs. 
This architecture tightly couples OCNN-based feature learning, GAN-based privacy-aware representation learning, ElGamal probabilistic encryption, and SHA-256 integrity verification into a unified security framework. Such an integrated optimization strategy is fundamentally different from existing approaches that primarily focus on a single protection mechanism, template transformation, encrypted matching, or biometric recognition independently. Therefore, the principal contribution of this work is not merely the integration of existing modules, but the formulation of a unified AI-cryptography framework that simultaneously optimizes recognition accuracy, privacy preservation, cryptographic confidentiality, and authentication integrity for secure multimodal biometric systems.

2. Related Work

Recent studies show increasing interest in integrating biometric recognition, template protection, encryption, and privacy-preserving learning. However, most existing approaches address only one or two components of the secure biometric pipeline, such as template transformation, encrypted matching, synthetic generation, or blockchain-based protection, rather than providing an end-to-end framework that jointly covers feature extraction, synthetic representation, encryption, and authentication. Wang et al. [15] proposed a cancelable template protection method based on a convolutional neural network for multimodal biometrics with iris and fingerprint traits. Their solution extracts features from each modality, fuses them into a common representation and applies a cancelable transformation to protect the biometric template. The main advantage of this work is that it improves template revocability and strengthens protection against direct template compromise. However, the method mainly focuses on cancelable template transformation and does not provide an integrated cryptographic layer for encrypted storage, secure transmission, or digital signature-based integrity verification. Therefore, although it is relevant to multimodal template protection, it does not fully address the combined requirements of confidentiality, authenticity, and privacy-preserving synthetic representation. Zaiyu et al. [16] developed a secure multimodal biometric framework using a deep ConvGRU-based architecture. They enhance the performance of multimodal authentication through a combination of preprocessing, feature extraction and hashing. The major advantage of this method is that it can model spatial and sequential dependencies using deep learning, which may improve the discriminative biometric representation. 
The solution is mainly based on hashing and deep feature learning, and it does not exploit more robust mechanisms such as public-key encryption, digital signatures and synthetic feature generation. This method improves recognition and template protection, but does not ensure secure transmission of biometrics or verifiable integrity. A decentralized fuzzy vault-based multimodal biometric authentication scheme using blockchain technology was proposed by Shuyi et al. [17]. Their system uses face and dorsal hand biometrics together with an improved fuzzy vault mechanism for privacy-preserving authentication. The advantage of this approach is that the blockchain provides decentralization, auditability and resilience against single-point compromise. However, the storage on the blockchain introduces additional communication and computational overhead that may reduce the suitability for lightweight or real-time biometric environments. Moreover, the work does not focus on GAN-based privacy-preserving biometric representation or CNN-optimized multimodal feature extraction, which limits its relevance to AI-driven biometric data protection. Dacan et al. [18] introduced enhanced biometric template protection schemes based on distance-preserving hashing for IoT-based biometric authentication. Their work is important because IoT biometric systems require lightweight template protection and efficient matching under constrained computational resources. The advantage of their method is its focus on practical biometric template protection in distributed environments. However, distance-preserving hashing may still require careful security analysis against correlation, inversion, and linkage attacks. In addition, the method does not integrate asymmetric encryption, SHA-based digital signature verification, or synthetic biometric generation, which are central to the proposed SBDP framework. Xinghan et al. 
[19] proposed a cancelable random masking framework combined with lightweight deep learning for secure finger-vein authentication. The solution transforms biometric templates using cryptographic random masks and then applies CNN-based learning directly on masked inputs. The main advantage of the method is that it provides cancelability, revocability, interpretability and real-time authentication while minimizing the risk of template inversion and replay attacks. However, the work is modality-specific, as it deals with finger-vein biometrics rather than multimodal biometric inputs like face, fingerprint and retina. It also does not include synthetic biometric generation or ElGamal-based encryption with SHA-256 signature verification. This restricts its application in a larger secure multimodal biometric system. Zhaosen et al. [20] proposed a privacy-preserving face recognition framework using FaceNet/ArcFace embeddings, locality-preserving projection and CKKS homomorphic encryption. The main advantage of this work is that encrypted-domain matching allows biometric comparison without revealing facial embeddings, hence enhancing privacy in cloud-based biometric recognition. The use of dimensionality reduction also helps reduce ciphertext size and computational overhead. However, homomorphic encryption remains computationally expensive and may not be feasible for all resource-constrained biometric systems. In addition, the framework is mostly face-centric and does not support multimodal fusion, GAN-based privacy-preserving synthetic generation, or ElGamal digital-signature authentication. Yongluo et al. [21] proposed AEGIIS, a quantum-proof biometric authentication framework using binary lattices and homomorphic encryption for cancelable templates. The main advantage of this approach is its resistance to classical and quantum attacks, which is increasingly important for long-term biometric security. 
The use of lattice-based mechanisms provides strong theoretical security and improves the revocability of protected biometric templates. However, the framework may introduce higher mathematical and computational complexity, especially for practical large-scale deployment. Furthermore, it focuses mainly on quantum-resistant template protection and does not provide a complete AI–cryptography pipeline involving optimized CNN feature extraction, GAN-based synthetic representation, ElGamal encryption, and SHA-256-based integrity validation. The existing studies confirm that recent biometric security research is moving toward cancelable templates, encrypted-domain matching, blockchain-assisted storage, and privacy-preserving deep learning. Nevertheless, most existing works remain limited to a specific biometric modality, a single protection mechanism, or a partial security pipeline. In contrast, the proposed SBDP framework combines optimized CNN-based multimodal feature extraction, GAN-based privacy-preserving biometric representation, ElGamal encryption, and SHA-256 signature verification in a unified architecture. This integration directly addresses the major gaps in existing work by jointly improving recognition reliability, biometric confidentiality, integrity verification, and resistance to identity forgery. Table 1 presents a comparison of the existing biometric security frameworks and the proposed SBDP framework.

3. Problem Formulation and Mathematical Modeling

The present work focuses on the design of a unified biometric system capable of ensuring high recognition accuracy, strong privacy protection and cryptographic security. Multimodal biometrics refers to the use of heterogeneous data sources such as facial images, iris scans, and fingerprint patterns. The input space consists of such sources, where each modality provides complementary identity information. The goal is to learn a mapping function that maps these multimodal inputs into concise and discriminative representations without information leakage and that guarantees secure storage and transmission. The proposed formulation differs from traditional biometric systems, where feature extraction, template protection, and authentication are treated as separate processes. Instead, it views these elements as interdependent functions that are optimized together. The challenge is to unify feature learning with privacy and security constraints in a single objective. In particular, the system must learn modality-invariant feature representations that maximize classification performance while ensuring that the extracted features do not contain enough information to reconstruct the original biometric inputs. To formalize this objective, the feature extraction process is modelled using an optimized convolutional neural network, which defines a non-linear transformation from the input space to a latent feature space. However, directly utilizing such learned representations poses a privacy risk, as these features may still contain reconstructable information. To address this issue, the formulation incorporates a generative mechanism that converts the learned features into synthetic representations. These representations need to satisfy two competing constraints: (i) to retain identity-discriminative information for accurate classification, and (ii) to be robust to inversion or reconstruction attacks to ensure privacy. 
In addition to representation learning and privacy preservation, the formulation also imposes severe cryptographic constraints on the feature space. We need probabilistic encryption to map the transformed biometric features into a secure domain, where the probability of recovering the original feature vector from the encrypted representation is computationally negligible. An integrity constraint is enforced by means of a cryptographic hashing mechanism that allows for reliable detection of any modification to the encrypted data. These requirements introduce additional dependencies between the learning process and the security layer as the generated features should be compatible with the encryption and the verification operations. The global problem can be formulated as a constrained multi-objective optimization problem, aiming to maximize recognition performance while minimizing privacy leakage and satisfying cryptographic security constraints. This leads to a tightly coupled pipeline in which multimodal feature extraction, synthetic data generation, encryption, and integrity verification are jointly optimized. The proposed formulation facilitates a consistent transformation from raw biometric inputs to secure and verifiable output representations, thus overcoming the fundamental limitations of existing biometric systems regarding data exposure, tampering, and identity forgery.

3.1. Multimodal Input Space and Notation

Let the multimodal biometric dataset be defined as:

$$ X = \{x^{(n)}\}_{n=1}^{N}, \qquad x^{(n)} = (x_{f}^{(n)}, x_{r}^{(n)}, x_{p}^{(n)}) \tag{1} $$

Each biometric observation is annotated with a discrete identity label representing its corresponding class in the recognition task. These labels provide the ground truth required for supervised optimization of the feature extraction model. It should be noted that the biometric datasets used in this study, namely CelebA for facial images, UBIRIS v2 for iris images, and the fingerprint dataset for fingerprint samples, are independent publicly available datasets and do not contain biometric samples collected from the same individuals. Consequently, the experiments are not intended to establish subject-level multimodal identity correspondence across modalities. Instead, the proposed SBDP framework is evaluated using a representation-level multimodal protocol, in which each biometric modality is processed independently by the OCNN to extract discriminative latent representations. These latent features are subsequently transformed by the GAN into privacy-preserving synthetic embeddings and fused into a unified multimodal representation. The primary objective of this experimental protocol is to evaluate the capability of the proposed framework to perform secure multimodal feature learning, privacy-preserving synthetic representation generation, multimodal fusion, and cryptographic protection using ElGamal encryption and SHA-256 integrity verification. Therefore, the reported results should be interpreted as an evaluation of the integrated privacy-preserving biometric security framework rather than a benchmark on a naturally paired multimodal biometric identity dataset. The collection of identity labels is defined as:

$$ Y = \{ y^{(n)} \in \{1, 2, \dots, K\} \}_{n=1}^{N} \tag{2} $$

where $H$, $W$, and $C$ are the height, width and number of channels, respectively; $x_{f}^{(n)} \in \mathbb{R}^{H_{f} \times W_{f} \times C_{f}}$ denotes the face modality; $x_{r}^{(n)} \in \mathbb{R}^{H_{r} \times W_{r} \times C_{r}}$ denotes the retina modality; and $x_{p}^{(n)} \in \mathbb{R}^{H_{p} \times W_{p} \times C_{p}}$ denotes the fingerprint modality. $X$ denotes the complete multimodal biometric dataset, $N$ is the total number of biometric samples in the dataset, $x^{(n)}$ is the $n$-th multimodal biometric sample, $n$ denotes the sample index, $Y$ is the set of identity labels for all biometric samples, $y^{(n)}$ denotes the ground-truth identity label associated with the $n$-th biometric sample, and $K$ is the total number of identity classes (subjects) in the dataset.

3.2. Preprocessing Transformation

A standard preprocessing pipeline is applied to each biometric modality prior to feature extraction to provide uniformity, reduce noise and improve the quality of the input data. This step is essential for improving the robustness and generalization capability of the learning model, especially when dealing with heterogeneous multimodal inputs. The preprocessing stage includes spatial alignment, intensity normalization, and stochastic data augmentation to account for variations in illumination, scale, and orientation. Accordingly, each modality is passed through a preprocessing operator defined as follows:

$$ T_{\mathrm{prep}}(x) = A_{\mathrm{aug}}\big(N_{\mathrm{norm}}(R_{\mathrm{resize}}(x))\big) \tag{3} $$

where $R_{\mathrm{resize}}$ denotes the spatial normalization, $N_{\mathrm{norm}}$ is the intensity normalization, and $A_{\mathrm{aug}}$ is the stochastic augmentation operator.

Thus, the processed sample becomes:

$$ \tilde{x}^{(n)} = T_{\mathrm{prep}}(x^{(n)}) \tag{4} $$
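The composition in Eq. (3) can be illustrated with a minimal pure-Python sketch. The concrete operators chosen here (nearest-neighbour resizing, min-max scaling, random horizontal flip) and the function names are assumptions made for illustration; the paper does not specify which resizing, normalization, or augmentation methods are used.

```python
import random

def resize_nearest(img, out_h, out_w):
    """R_resize (assumed form): nearest-neighbour resampling of a 2-D image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(img):
    """N_norm (assumed form): min-max scaling of intensities to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(v - lo) / span for v in row] for row in img]

def augment(img, rng, flip_prob=0.5):
    """A_aug (assumed form): random horizontal flip."""
    if rng.random() < flip_prob:
        return [row[::-1] for row in img]
    return img

def t_prep(img, out_h, out_w, rng):
    """T_prep(x) = A_aug(N_norm(R_resize(x))), applied innermost first."""
    return augment(normalize(resize_nearest(img, out_h, out_w)), rng)

# Hypothetical 4x4 grayscale sample, reduced to 2x2 for readability.
raw = [[0, 50, 100, 150],
       [10, 60, 110, 160],
       [20, 70, 120, 170],
       [30, 80, 130, 255]]
x_tilde = t_prep(raw, 2, 2, random.Random(0))
```

Passing the random generator explicitly keeps the stochastic step reproducible, which is convenient when the same augmentation must be replayed during evaluation.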

3.3. Deep Feature Extraction via OCNN

Let the optimized CNN be parameterized by $\Theta$. The hierarchical feature extraction is modeled as a composition of nonlinear operators. Thus, the deep feature representation $z^{(n)}$ extracted from the CNN before the final embedding layer is given as follows:

$$ z^{(n)} = F_{\Theta}(\tilde{x}^{(n)}) = (\phi_{L} \circ \phi_{L-1} \circ \dots \circ \phi_{1})(\tilde{x}^{(n)}) \tag{5} $$

where $\phi_{l}$ ($l = 1, \dots, L$) denotes the transformation of the $l$-th block, $\tilde{x}^{(n)}$ denotes the preprocessed multimodal biometric input corresponding to the $n$-th sample, $F_{\Theta}$ is the nonlinear mapping learned by the optimized CNN, and $L$ denotes the total number of feature extraction blocks in the CNN.

Each block transformation is defined as:

$$ \phi_{l}(u) = P_{\mathrm{pool}}\big(\sigma\big(B_{\mathrm{bn}}(C_{\mathrm{conv}}^{(l)}(u))\big)\big) \tag{6} $$

where $C_{\mathrm{conv}}^{(l)}$ denotes the convolution at layer $l$, $B_{\mathrm{bn}}$ is the batch normalization, $\sigma$ denotes the nonlinear activation, $P_{\mathrm{pool}}$ denotes the pooling operator, and $\phi_{l}$ is the transformation performed by the $l$-th CNN block.

The final embedding $h^{(n)}$ is defined as follows:

$$ h^{(n)} = G_{\mathrm{fc}}(z^{(n)}) \in \mathbb{R}^{d} \tag{7} $$

where $G_{\mathrm{fc}}$ is the fully connected projection function that transforms the deep features into the embedding space, $d$ denotes the dimension of the final embedding vector, and $\mathbb{R}^{d}$ denotes the $d$-dimensional real-valued embedding space.
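The block structure of Eqs. (5) to (7) can be sketched on a one-dimensional toy signal. This is only an illustration of the operator ordering (convolution, normalization, activation, pooling, then a fully connected projection): the kernels, the ReLU activation, the per-sample normalization used as a stand-in for batch normalization, and all numeric values are assumptions, not the OCNN configuration reported by the authors.

```python
from functools import reduce

def conv1d(u, kernel):
    """C_conv: valid 1-D cross-correlation with a fixed kernel."""
    k = len(kernel)
    return [sum(u[i + j] * kernel[j] for j in range(k))
            for i in range(len(u) - k + 1)]

def normalize_features(u, eps=1e-5):
    """Stand-in for B_bn: zero mean, unit variance over one sample's features."""
    mean = sum(u) / len(u)
    var = sum((v - mean) ** 2 for v in u) / len(u)
    return [(v - mean) / (var + eps) ** 0.5 for v in u]

def relu(u):
    """sigma: assumed to be ReLU here."""
    return [max(0.0, v) for v in u]

def max_pool(u, size=2):
    """P_pool: non-overlapping max pooling."""
    return [max(u[i:i + size]) for i in range(0, len(u) - size + 1, size)]

def make_block(kernel):
    """phi_l(u) = P_pool(sigma(B_bn(C_conv(u)))), as in Eq. (6)."""
    return lambda u: max_pool(relu(normalize_features(conv1d(u, kernel))))

def compose(blocks):
    """F_Theta = phi_L o ... o phi_1; the first block in the list runs first."""
    return lambda x: reduce(lambda acc, phi: phi(acc), blocks, x)

def fully_connected(z, weights, bias):
    """G_fc: linear projection of the deep features into R^d."""
    return [sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(weights, bias)]

# Hypothetical two-block extractor (L = 2) and a d = 2 embedding.
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5]]
f_theta = compose([make_block(k) for k in kernels])
x_tilde = [0.1, 0.4, 0.9, 0.3, 0.2, 0.8, 0.7, 0.5, 0.6, 0.0, 0.35, 0.95]
z = f_theta(x_tilde)
h = fully_connected(z, weights=[[1.0] * len(z), [-1.0] * len(z)], bias=[0.0, 0.1])
```

In practice a deep learning library would supply these layers with learned parameters; the sketch only makes the composition order explicit.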

3.4. Classification Objective

Following the feature extraction and representation learning stages, the task is to map the learned feature embeddings to the correct identity classes. This is achieved by adopting a supervised classification framework, which trains the model to output higher probabilities for the correct class labels and lower probabilities for wrong predictions. The high-level feature representations are fed into the classification layer, which outputs a probability distribution over all the possible classes, thereby allowing effective identity discrimination over the multimodal biometric inputs. The predicted class distribution $\hat{p}^{(n)}$ is then defined as

$$ \hat{p}^{(n)} = \mathrm{Softmax}(W_{c} h^{(n)} + b_{c}) \tag{8} $$

The classification loss $L_{\mathrm{cls}}$ is defined as:

$$ L_{\mathrm{cls}} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \mathbb{I}(y^{(n)} = k) \log \hat{p}_{k}^{(n)} \tag{9} $$

where $\mathrm{Softmax}(\cdot)$ is the softmax activation function, $W_{c}$ and $b_{c}$ represent the weight matrix and bias vector of the classification layer, $h^{(n)}$ is the final embedding from Eq. (7), $\mathbb{I}(\cdot)$ is the indicator function, $\hat{p}_{k}^{(n)}$ denotes the predicted posterior probability of the $k$-th class for the $n$-th biometric sample generated by the softmax classifier, $k$ is the class index, and $K$ is the total number of classes.
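As a worked illustration of the softmax classifier and the cross-entropy loss in Eqs. (8) and (9), the following sketch evaluates both on a tiny hypothetical batch. The weights, embeddings, and labels are made-up values; with an all-zero classification layer every class receives probability 1/K, so the loss equals log K, which is a convenient sanity check.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, h, b):
    """Classification layer: W_c h + b_c."""
    return [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]

def cross_entropy(probs_batch, labels):
    """Eq. (9): mean negative log-probability of the true class.
    Labels run from 1 to K, matching Eq. (2)."""
    return -sum(math.log(p[y - 1]) for p, y in zip(probs_batch, labels)) / len(labels)

# Hypothetical setup: K = 3 classes, d = 2 embedding dimensions.
W_c = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b_c = [0.0, 0.0, 0.0]
embeddings = [[0.3, -1.2], [2.0, 0.5]]
labels = [1, 3]

probs = [softmax(linear(W_c, h, b_c)) for h in embeddings]
loss = cross_entropy(probs, labels)          # equals log(3) for uniform predictions

# A confident, correct prediction gives a much smaller loss.
confident_loss = cross_entropy([softmax([5.0, 0.0, 0.0])], [1])
```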

3.5. Privacy-Preserving Synthetic Feature Generation

To address the leakage of biometric data and reconstruction attacks, we propose a privacy-preserving feature transformation mechanism based on the generative adversarial framework. Instead of directly using the original feature representations, the model is trained to generate synthetic feature embeddings that remain identity-discriminative while carrying too little sensitive information to support reconstruction of the original biometric inputs. Such an adversarial learning strategy imposes a trade-off between feature utility and privacy, thereby improving the overall security of the biometric system. The synthetic feature representation $h_{\mathrm{syn}}^{(n)}$ generated for the $n$-th sample is determined as follows:

$$ h_{\mathrm{syn}}^{(n)} = G_{\mathrm{syn}}(u^{(n)}), \qquad u^{(n)} \sim \mathcal{N}(0, I) \tag{10} $$

The adversarial objective is given by:

$$ \min_{\Theta_{g}} \max_{\Theta_{d}} \; \mathbb{E}_{h \sim P_{\mathrm{real}}}\big[\log D_{\mathrm{adv}}(h)\big] + \mathbb{E}_{u \sim P_{\mathrm{noise}}}\big[\log\big(1 - D_{\mathrm{adv}}(G_{\mathrm{syn}}(u))\big)\big] \tag{11} $$

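The privacy requirement is expressed as a lower bound on the reconstruction error achieved by the worst-case adversary, that is, the infimum over all possible reconstruction functions:

$$ \inf_{R_{\mathrm{inv}}} \mathbb{E}\big[\lVert R_{\mathrm{inv}}(h_{\mathrm{syn}}) - ob \rVert_{2}^{2}\big] \geq \epsilon \tag{12} $$

This ensures resistance against reconstruction attacks, since even the best reconstruction function incurs an expected error of at least $\epsilon$.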

Here,

G_{syn}

denotes the generator function that maps a noise vector to a synthetic feature space,

u^{(n)}

is the random noise vector for the

n

-th sample,

N (0, I)

is the multivariate normal (Gaussian) distribution with zero mean and identity covariance matrix,

Θ_{g}

denotes the learnable parameters of the generator network,

Θ_{d}

denotes the learnable parameters of the discriminator network,

D_{adv}

denotes the discriminator function,

h \sim P_{real}

represents a real feature vector,

P_{real}

is the distribution of real biometric feature representations,

u \sim P_{noise}

is the noise vector sampled from a predefined distribution,

P_{noise}

denotes the prior noise distribution,

E

is the expectation operator,

\log

denotes the natural logarithm used in the adversarial loss formulation,

R_{inv}

is the reconstruction (inverse) function,

ob

denotes the original biometric input sample,

∥ \cdot ∥_{2}^{2}

is the squared Euclidean (L2) norm measuring the reconstruction error, and

ϵ

denotes the privacy threshold that defines the minimum acceptable reconstruction error.
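As a rough illustration of Equations (10) and (11), the following sketch samples noise from a standard normal distribution, passes it through a stand-in generator, and forms a Monte Carlo estimate of the adversarial value. The linear generator, the logistic discriminator, the dimensions and the random weights are hypothetical placeholders chosen for brevity; they are not the networks used in this work.

```python
import math
import random

def sample_noise(dim, rng):
    """Draw u ~ N(0, I) as a list of independent standard normal values."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def generator(u, weights):
    """Hypothetical linear G_syn: each output is a weighted sum of the noise."""
    return [sum(w * x for w, x in zip(row, u)) for row in weights]

def discriminator(h, weights):
    """Hypothetical logistic D_adv returning a probability in (0, 1)."""
    score = sum(w * x for w, x in zip(weights, h))
    return 1.0 / (1.0 + math.exp(-score))

def adversarial_value(real_feats, fake_feats, d_weights):
    """Monte Carlo estimate of E[log D(h)] + E[log(1 - D(G(u)))]."""
    real_term = sum(math.log(discriminator(h, d_weights)) for h in real_feats)
    fake_term = sum(math.log(1.0 - discriminator(h, d_weights)) for h in fake_feats)
    return real_term / len(real_feats) + fake_term / len(fake_feats)

rng = random.Random(0)
noise_dim, feat_dim, batch = 4, 3, 8
g_weights = [[rng.uniform(-1, 1) for _ in range(noise_dim)] for _ in range(feat_dim)]
d_weights = [rng.uniform(-1, 1) for _ in range(feat_dim)]
real = [[rng.gauss(1.0, 0.5) for _ in range(feat_dim)] for _ in range(batch)]
fake = [generator(sample_noise(noise_dim, rng), g_weights) for _ in range(batch)]
value = adversarial_value(real, fake, d_weights)
```

In this toy setting the discriminator would be updated to increase `value` and the generator to decrease it. When the discriminator outputs 0.5 for every input, the estimate equals -log 4, the well-known equilibrium value of this objective.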

3.6. Multimodal Fusion Representation

We propose an effective fusion mechanism to leverage complementary information from heterogeneous biometric modalities by merging modality-specific feature embeddings into a unified representation. This is important to improve recognition robustness and discriminative capability, since single modalities can be affected by noise, occlusion or acquisition variability. The model generates a holistic identity representation by combining features extracted from facial, retinal and fingerprint inputs, capturing common and modality-specific features. Thus, the modality-specific embeddings can be expressed as follows:

h_{f}^{(n)}, h_{r}^{(n)}, h_{p}^{(n)}

(13)

h_{f}^{(n)}

denotes feature embedding corresponding to the face modality for the

n

-th sample,

h_{r}^{(n)}

is the feature embedding corresponding to the retina modality,

h_{p}^{(n)}

denotes the feature embedding corresponding to the fingerprint modality for the

n

-th sample.

The fused representation

h_{fusion}^{(n)}

is defined as:

h_{fusion}^{(n)} = Ψ_{fusion} (h_{f}^{(n)} \oplus h_{r}^{(n)} \oplus h_{p}^{(n)})

(14)

where

\oplus

denotes concatenation and

Ψ_{fusion}

is a learnable transformation.
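The fusion rule in Equation (14) can be sketched as a concatenation followed by a transformation. Since the form of the learnable transformation is not specified here, the example assumes a single affine layer with ReLU; the embedding values, the weight matrix and the bias are illustrative only.

```python
def concatenate(*embeddings):
    """Concatenation operator: join modality embeddings end to end."""
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused

def linear_fusion(vector, weights, bias):
    """Hypothetical fusion transformation: one affine layer followed by ReLU."""
    out = []
    for row, b in zip(weights, bias):
        pre = sum(w * x for w, x in zip(row, vector)) + b
        out.append(max(0.0, pre))
    return out

# Illustrative embeddings for face, retina and fingerprint (sizes may differ).
h_face = [0.2, 0.7]
h_retina = [0.1, 0.4, 0.9]
h_finger = [0.5]

joined = concatenate(h_face, h_retina, h_finger)   # length 2 + 3 + 1 = 6
weights = [[0.1] * 6, [-0.2] * 6]                  # illustrative 2 x 6 matrix
bias = [0.0, 0.1]
h_fusion = linear_fusion(joined, weights, bias)
```

Concatenation keeps the modality-specific components separable in the input of the transformation, so the learned weights can decide how much each modality contributes to the fused embedding.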

3.7. Cryptographic Security Modeling (ElGamal)

The selection of ElGamal encryption in the proposed secure biometric data protection framework is driven by the security requirements of multimodal biometric authentication rather than by ciphertext compactness alone. We acknowledge that ElGamal, as a classical public-key cryptosystem based on the Diffie–Hellman assumption, incurs ciphertext expansion and increased storage and bandwidth requirements compared with lightweight symmetric encryption schemes. Moreover, ElGamal offers only a multiplicative homomorphism and does not natively support additive or fully homomorphic computation. However, these limitations do not significantly affect the proposed framework due to its hybrid architecture.

First, the proposed SBDP framework does not encrypt raw biometric images directly. Instead, facial images, retinal scans, and fingerprint samples are initially processed by the OCNN to extract compact latent representations, which are subsequently transformed by the GAN into privacy-preserving synthetic embeddings. Therefore, ElGamal encryption is applied only to the fused synthetic biometric representation rather than to the original high-dimensional biometric data. This significantly reduces the impact of ciphertext expansion and limits additional storage and communication overhead.

Second, ElGamal is a probabilistic encryption scheme in which a random ephemeral key is generated during each encryption process. Consequently, the same biometric feature vector produces different ciphertexts at different encryption instances. This probabilistic characteristic is highly desirable in biometric systems because it prevents ciphertext pattern leakage and improves resistance against replay attacks, statistical inference attacks, and biometric template correlation attacks.

Third, the proposed framework adopts a layered security design in which the OCNN performs discriminative feature extraction, the GAN provides privacy-preserving transformation, ElGamal ensures confidentiality of biometric embeddings, and SHA-256 digital signatures guarantee integrity and authenticity. Within this hybrid architecture, the confidentiality benefits of ElGamal outweigh its moderate storage and bandwidth overhead, particularly because encryption is performed only on compact latent biometric representations. Therefore, although ElGamal introduces ciphertext expansion and supports only a multiplicative homomorphism, it provides an effective trade-off between strong confidentiality guarantees, probabilistic security, implementation simplicity, and compatibility with the proposed OCNN-GAN hybrid biometric protection framework.

The fused biometric representation is stored and transmitted securely using a probabilistic public-key cryptographic mechanism. Unlike deterministic encryption schemes, this approach injects randomness during encryption, which prevents pattern leakage in the ciphertext and increases the robustness against statistical and chosen-plaintext attacks. This design is consistent with contemporary cryptographic requirements, where the security of sensitive feature representations is paramount to counter possible reconstruction or inference attacks, as also emphasized in studies on cryptographic resilience.

Let

(G, q, g)

describe a cyclic group of prime order. Key generation is expressed as follows:

\begin{matrix} α & \sim Z_{q} \\ β & = g^{α} \bmod q \end{matrix}

Encryption of the feature vector is described as follows:

E (h_{fusion}) = (g^{k} \bmod q, h_{fusion} \cdot β^{k} \bmod q)

(15)

where

G

denotes the cyclic group,

q

is a large prime number,

g

is a generator element of the cyclic group

G

,

α

is the private key, randomly selected from the finite field

Z_{q}

,

β

denotes the public key component,

k

is an ephemeral random value, and

E

is the encryption function that maps the input feature vector into ciphertext space.
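The key generation and encryption rule in Equation (15) can be illustrated with a toy ElGamal instance. This is a minimal sketch under several assumptions that are not taken from the text: the modulus is the small Mersenne prime 2^127 - 1, the base is 3, and the real-valued feature vector is mapped to integers by a hypothetical `quantize` helper before each component is encrypted. Parameters of this size are unsuitable for real deployments.

```python
import secrets

Q = 2**127 - 1   # toy prime modulus (a Mersenne prime); far too small for real use
G = 3            # illustrative base, assumed rather than taken from the paper

def keygen():
    """Return (alpha, beta): private exponent and public key beta = g^alpha mod q."""
    alpha = secrets.randbelow(Q - 3) + 2
    return alpha, pow(G, alpha, Q)

def encrypt(m, beta, k=None):
    """Encrypt integer m (0 < m < Q) as (g^k mod q, m * beta^k mod q)."""
    if k is None:
        k = secrets.randbelow(Q - 3) + 2   # fresh ephemeral value per call
    return pow(G, k, Q), (m * pow(beta, k, Q)) % Q

def decrypt(c1, c2, alpha):
    """Recover m = c2 * (c1^alpha)^(-1) mod q."""
    shared = pow(c1, alpha, Q)
    return (c2 * pow(shared, -1, Q)) % Q

def quantize(vec, scale=1000):
    """Hypothetical encoding: map floats in [0, 1] to positive integers."""
    return [int(round(x * scale)) + 1 for x in vec]

alpha, beta = keygen()
h_fusion = [0.28, 0.0, 0.913]          # illustrative fused embedding
plain = quantize(h_fusion)

# Fixed k values are used only to make the demonstration reproducible;
# real encryption leaves k=None so that every call draws a fresh value.
cipher_a = [encrypt(m, beta, k=5) for m in plain]
cipher_b = [encrypt(m, beta, k=7) for m in plain]
recovered = [decrypt(c1, c2, alpha) for c1, c2 in cipher_a]
```

The two ciphertext lists differ although they encrypt the same quantized vector, which is the probabilistic behavior discussed above, and both decrypt to the same plaintext.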

3.8. Integrity and Authentication Constraint

To complement the confidentiality guarantees provided by encryption, the proposed framework incorporates an integrity and authentication mechanism to ensure that the secured biometric data remains unaltered and verifiable during storage and transmission. In practical biometric systems, encrypted data is not immune to tampering or substitution attacks, hence a cryptographic verification scheme binding the encrypted representation is essential. This is achieved through a secure hash-based signature generation process which generates a fixed-length digest uniquely associated with the encrypted feature representation, ensuring reliable verification of data authenticity.

H : \{0,1\}^{*} \to \{0,1\}^{256}

(16)

Signature generation is determined as follows:

s^{(n)} = H (E (h_{fusion}^{(n)}))

(17)

The verification condition

V

is given by:

V (s^{(n)}, E (h_{fusion}^{(n)})) = 1

(18)

where

H

is the cryptographic hash function,

{\{0,1\}}^{*}

is the set of binary strings of arbitrary length,

{\{0,1\}}^{256}

denotes the set of 256-bit binary outputs, and

s^{(n)}

denotes the generated digital signature for the

n

-th sample.
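Equations (16) to (18) can be sketched with the SHA-256 implementation in the Python standard library. The serialization of the ciphertext into bytes and the example ciphertext values are assumptions made for illustration; the sketch covers only the digest and its recomputation, not a keyed signing step.

```python
import hashlib
import hmac

def serialize(ciphertext):
    """Hypothetical encoding: join the integer pairs into a byte string."""
    return ";".join(f"{c1},{c2}" for c1, c2 in ciphertext).encode("utf-8")

def digest(ciphertext):
    """s = H(E(h_fusion)) with H = SHA-256 (32-byte output)."""
    return hashlib.sha256(serialize(ciphertext)).digest()

def verify(s, ciphertext):
    """V(s, E(h_fusion)): 1 if the recomputed digest matches, else 0."""
    return int(hmac.compare_digest(s, digest(ciphertext)))

ciphertext = [(243, 91823), (243, 5512)]    # illustrative ciphertext pairs
s = digest(ciphertext)
tampered = [(243, 91824), (243, 5512)]      # one component altered
ok = verify(s, ciphertext)
rejected = verify(s, tampered)
```

Any change to a ciphertext component changes the recomputed digest, so verification of the tampered ciphertext returns 0 while the original returns 1.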

3.9. Unified Optimization Problem

To jointly address the interdependent tasks of recognition accuracy, privacy preservation, and cryptographic security, the proposed framework is presented as a unified optimization problem where all functional components are incorporated into a single learning objective. Such a formulation can impose a coordinated optimization strategy that considers the trade-off among the classification performance, adversarial robustness, privacy constraints, and security guarantees, unlike the conventional methods where extraction, adversarial learning, and security mechanisms are optimized independently. The overall objective function includes these components through weighted contributions, enabling the model to learn discriminative feature representations, produce privacy-preserving synthetic embeddings, and fulfill encryption and verification requirements in a unified training paradigm.

The complete framework is formulated as:

\min_{Θ, Θ_{g}} \max_{Θ_{d}} J = λ_{1} L_{cls} + λ_{2} L_{adv} + λ_{3} L_{priv} + λ_{4} L_{sec}

(19)

The security loss component

L_{sec}

, which combines the decryption consistency and verification reliability, is determined as follows:

L_{sec} = E [∥ D_{decrypt} (E (h)) - h ∥^{2}] + E [1 - V (s, E (h))]

(20)

where

J

is the overall unified objective,

λ_{1}, λ_{2}, λ_{3}, λ_{4}

are the non-negative weighting coefficients balancing the different objectives,

D_{decrypt}

denotes the decryption function mapping encrypted representation back to feature space, and

h

denotes the biometric feature embedding (latent feature vector).
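To make Equations (19) and (20) concrete, the following sketch evaluates the security loss on a small batch and combines four loss values into the weighted objective. The numerical loss values and the weighting coefficients are arbitrary illustrative choices, not settings reported in this work.

```python
def security_loss(features, decrypted, verified):
    """L_sec = mean squared decryption error + mean verification failure."""
    n = len(features)
    recon = sum(
        sum((d - h) ** 2 for d, h in zip(dec, feat))
        for dec, feat in zip(decrypted, features)
    ) / n
    failure = sum(1 - v for v in verified) / n
    return recon + failure

def unified_objective(l_cls, l_adv, l_priv, l_sec, lambdas):
    """J = lambda_1 L_cls + lambda_2 L_adv + lambda_3 L_priv + lambda_4 L_sec."""
    if any(lam < 0 for lam in lambdas):
        raise ValueError("weighting coefficients must be non-negative")
    l1, l2, l3, l4 = lambdas
    return l1 * l_cls + l2 * l_adv + l3 * l_priv + l4 * l_sec

features = [[0.2, 0.4], [0.9, 0.1]]
decrypted = [[0.2, 0.4], [0.9, 0.1]]     # lossless decryption in this toy case
verified = [1, 1]                        # both digests verified
l_sec = security_loss(features, decrypted, verified)
j_value = unified_objective(0.5, 0.2, 0.1, l_sec, lambdas=(1.0, 0.5, 0.5, 2.0))
```

When decryption is exact and every verification succeeds, the security term vanishes and the objective reduces to the weighted sum of the three learning losses.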

3.10. Final Secure Mapping

We formalize the end-to-end behavior of the proposed framework by defining the whole pipeline as a single secure mapping from the raw multimodal biometric input data to a protected and verifiable output representation. This mapping covers all of the stages described above, i.e., feature extraction, synthetic representation, encryption and integrity verification, while preserving identity-discriminative information in the output and satisfying strict security constraints. The formulation highlights that the system does not reveal any intermediate representations, but provides a compact output that contains the classification result, the encrypted features and a cryptographic signature that can be securely stored, transmitted and verified through a single transformation. The overall secure transformation

S_{secure}

is defined as follows:

S_{secure} : I_{smb} \to (\hat{y}, E (h_{fusion}), H (E (h_{fusion})))

(21)

where

I_{smb}

denotes the input space of multimodal biometric samples.

The secure mapping allows a unified transformation from multimodal biometric inputs to protected authentication outputs, while retaining identity-discriminative information. The system combines feature extraction, synthetic representation learning, encryption and integrity verification in a single framework, thus avoiding exposure of intermediate biometric templates. The encrypted representation provides confidentiality, while the hash-based verification mechanism ensures data integrity and authenticity. In addition, the proposed mapping increases robustness against spoofing, replay and reconstruction attacks, and allows secure biometric storage and transmission in real-world authentication environments.

4. Proposed Optimized Convolutional Neural Network

The framework combines face, retina and fingerprint modalities to improve reliability, robustness and resistance against spoofing and identity leakage attacks. First, multimodal biometric inputs are acquired and passed through a preprocessing stage that includes resizing, normalization and augmentation to improve image quality and augment the diversity of the dataset. These operations help reduce noise variations and increase the consistency of biometric representations prior to feature extraction. The optimized CNN extractor is trained with preprocessed data to learn deep discriminative features of multimodal biometric data.

The convolutional layer, together with batch normalization and the ReLU activation function, enhances the learning of nonlinear features while stabilizing the training process. Max pooling is employed in the CNN extractor to down-sample the spatial dimensions and keep the most informative feature characteristics. Dropout regularization is applied to avoid overfitting by randomly turning off neurons during training, thus enhancing the generalization capability of the model. The learned deep features are input to the fully connected layer, which converts them into high-level biometric representations suitable for classification. The softmax classifier produces a probability distribution over the biometric classes, and the final biometric identity representation (feature vector) is generated. The extracted features are then protected using the ElGamal encryption scheme with SHA-256 digital signature generation. The ElGamal encryption scheme provides data confidentiality, and the SHA-256 signature mechanism guarantees data integrity and authenticity by identifying any unauthorized change during storage and transmission. The framework ultimately enables secure transmission and storage of biometric data along with high authentication accuracy, privacy preservation and computational efficiency.

Figure 1 depicts the complete architecture and data flow of the proposed Secure Biometric Data Protection (SBDP) framework for multimodal biometric authentication and secure data preservation, clarifying the interaction among the proposed components. The framework follows a sequential processing strategy that starts from the raw multimodal biometric inputs. The multimodal biometric input space is made up of heterogeneous biometric sources such as facial images, iris images and fingerprint patterns, in which each modality provides complementary identity information.
The iris modality is represented by images taken from the UBIRIS v2 database that provides unconstrained iris images acquired under visible wavelength conditions. The preprocessing operations consist of normalization, resizing, and augmentation. The preprocessed biometric images are then forwarded to the OCNN, which extracts discriminative latent feature representations from each modality. The extracted OCNN feature embeddings are then forwarded to the Generative Adversarial Network (GAN). Instead of directly working on the raw biometric images, the GAN translates the latent features generated by the OCNN into privacy-preserving synthetic representations that retain identity-discriminative information while suppressing the possibility of reconstructing the original biometric data. Such conversion improves privacy protection and lowers the risk of identity leakage. After generating synthetic features, the modality specific representations are fused by a multimodal fusion mechanism to create a unified biometric embedding. The fused embedding is then secured with ElGamal public-key cryptosystem, which provides probabilistic encryption and confidentiality for storage and transmission. Finally, a SHA-256 based digital signature is generated on the encrypted representation to ensure integrity, authenticity and tamper detection. Therefore, the complete data flow of the proposed framework is formally expressed as:

Multimodal Input → Preprocessing → OCNN Feature Extraction → GAN Privacy Transformation → Multimodal Fusion → ElGamal Encryption → SHA-256 Signature → Secure Authentication.

This explicit architecture makes it clear that privacy preservation based on GAN is done after the feature extraction of OCNN and before cryptographic protection. This clearly separates feature learning, privacy enhancement and security enforcement. Figure 2 depicts the comprehensive system framework and data-flow architecture of the proposed Secure Biometric Data Protection framework, highlighting the sequential interactions among multimodal preprocessing, OCNN feature extraction, GAN-based privacy-preserving synthetic feature generation, multimodal fusion, ElGamal encryption, SHA-256 digital signature generation, and secure biometric authentication.

4.1. Framework Overview

This section presents the proposed Secure Biometric Data Protection framework that integrates deep learning, generative modeling and cryptographic security in a unified end-to-end pipeline that transforms raw multimodal biometric data (e.g., facial images, retinal scans, fingerprint data) into secure and verifiable information. Unlike conventional approaches where extraction, privacy preservation and encryption are considered as separate components, the proposed framework optimizes these components jointly to achieve high recognition accuracy and provide guarantees of confidentiality, integrity and privacy.

It should be noted that the term “end-to-end” in the proposed SBDP framework refers to the complete data-flow transformation from raw multimodal biometric inputs to secure and verifiable authentication outputs rather than to a fully differentiable optimization process. The trainable learning components of the framework consist of the optimized convolutional neural network (OCNN) and the GAN-based privacy-preserving transformation module, which are optimized jointly through gradient-based learning. In contrast, the ElGamal encryption and SHA-256 digital signature modules are fixed, non-trainable cryptographic operations and do not participate in backpropagation or parameter optimization. Instead, these cryptographic components are applied procedurally after the OCNN-GAN learning stage to provide confidentiality, integrity, and authenticity of the generated biometric representations. Therefore, the proposed framework should be interpreted as a hybrid AI-cryptography architecture in which differentiable learning modules and non-differentiable cryptographic modules are tightly integrated in a sequential and secure processing pipeline.

The proposed SBDP framework consists of four tightly coupled stages: multimodal pre-processing, optimized CNN-based feature extraction, GAN-based privacy-preserving transformation, and cryptographic protection using ElGamal encryption with SHA-256-based digital signature generation. Let the multimodal input space be denoted by

X = \{x_{f}, x_{r}, x_{p}\}

, where

x_{f}

,

x_{r}

, and

x_{p}

represent facial, retinal, and fingerprint biometric samples, respectively. Each input modality is first normalized and enhanced through a preprocessing function

P

, after which an optimized CNN extracts discriminative latent representations. These representations are then converted by a GAN into synthetic privacy-preserving features that are fused into a single biometric embedding. The fused representation is then encrypted and digitally signed to ensure secure storage, transmission and verification. The complete system describes the following secure mapping:

F : X \to (\hat{y}, C, S)

(22)

where

\hat{y}

denotes the predicted identity,

C

represents the encrypted feature vector, and

S

is its digital signature. More specifically, the secure transformation can be expressed stage by stage, starting from the preprocessed input

{\tilde{x}}_{m}

:

{\tilde{x}}_{m} = P (x_{m}), m \in \{f, r, p\}

(23)

z_{m} = f_{θ} ({\tilde{x}}_{m})

(24)

z_{m}^{'} = G_{θ_{G}} (z_{m}, ξ)

(25)

z_{fusion} = ϕ ([z_{f}^{'} ∥ z_{r}^{'} ∥ z_{p}^{'}])

(26)

\hat{y} = Softmax (W z_{fusion} + b)

(27)

C = {Enc}_{p k} (z_{fusion})

(28)

S = {Sign}_{s k} (SHA-256 (C))

(29)

Thus, the final secured output is obtained as follows:

O = (\hat{y}, C, S)

(30)

where

m \in \{f, r, p\}

denotes the modality index corresponding to face, retina, and fingerprint, respectively;

G_{θ_{G}}

denotes the generator network parameterized by

θ_{G}

;

ξ

is a random noise vector;

z_{m}^{'}

represents the synthetic privacy-preserving feature embedding;

∥

denotes the concatenation operator;

W

and

b

are the weight matrix and bias vector of the Softmax classifier, respectively;

C

denotes the encrypted biometric representation generated by the ElGamal encryption function

{Enc}_{p k}

using public key

p k

;

S

denotes the digital signature generated by

{Sign}_{s k}

using the private signing key

s k

;

SHA-256

is the cryptographic hash function; and

O

denotes the final secured output of the proposed framework.

Hypothesis 1:

The proposed SBDP framework enables secure biometric authentication by maintaining identity-discriminative information, minimizing biometric privacy leakage, and ensuring cryptographic confidentiality and integrity.

Formally:

\max Acc (\hat{y}, y), \quad \min I (X; z^{'}), \quad \text{and} \quad \Pr [Verify (C, S) = 1] \to 1

(31)

where

Acc (\hat{y}, y)

denotes recognition accuracy,

I (X; z^{'})

represents the mutual information between the original biometric input

X

and the synthetic representation

z^{'}

, and

Verify (C, S)

denotes the signature verification function.

Proof:

The OCNN extracts identity-relevant features from the preprocessed biometric input:

z = f_{θ} (P (X))

(32)

The classifier predicts the identity label

\hat{y}

which is given as follows:

\hat{y} = \arg \max Softmax (W z + b)

(33)

The classification objective minimizes the loss

L_{cls}

, defined as follows:

L_{c l s} = - \sum_{i = 1}^{N} y_{i} \log ({\hat{y}}_{i})

(34)

When

L_{c l s} \to 0

, the predicted identity approaches the true class as

\hat{y} \to y

. Hence, recognition accuracy increases:

A c c (\hat{y}, y) \to 1

(35)

For privacy preservation, the GAN transforms the real feature vector

z

into a synthetic representation:

z^{'} = G (z, ξ)

(36)

where

\arg\max

denotes the operator that returns the class index corresponding to the maximum posterior probability,

i

is the sample index,

\log

represents the natural logarithm function,

Acc (\hat{y}, y)

denotes the recognition accuracy, and

z^{'}

represents the synthetic privacy-preserving feature representation generated by the GAN.

The privacy objective is to reduce the reconstructability of the original biometric input, which is expressed as follows:

E [∥ X - \hat{X} ∥^{2}] \geq τ

(37)

where

\hat{X}

is the reconstructed biometric sample and

τ

denotes the minimum acceptable reconstruction error threshold. If the reconstruction error remains above

τ

, then the adversary cannot reliably recover the original biometric input from

z^{'}

. Therefore:

I (X; z^{'}) < I (X; z)

(38)

This demonstrates that the synthetic feature representation mitigates biometric information leakage. To ensure confidentiality, the fused representation is encrypted with ElGamal encryption.

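C = (C_{1}, C_{2}), \quad C_{1} = g^{k} \bmod p, \quad C_{2} = z_{fusion} \cdot y^{k} \bmod p

Here,

p

denotes the prime modulus and

y

the ElGamal public key; they play the roles of

q

and

β

in Equation (15).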

Since

k

is randomly selected for each encryption operation, the same biometric feature vector produces different ciphertexts:

{Enc}_{p k} (z_{fusion}, k_{1}) \neq {Enc}_{p k} (z_{fusion}, k_{2}), k_{1} \neq k_{2}

(39)

Thus, ciphertext pattern leakage is prevented. For integrity and authenticity, the digital signature is generated as:

S = {Sign}_{s k} (SHA-256 (C))

(40)

The verification process is determined as follows:

{Verify}_{p k} (C, S) = 1

(41)

If the ciphertext is modified from

C

to

C^{'}

, then:

SHA-256 (C^{'}) \neq SHA-256 (C)

(42)

Therefore,

{Verify}_{p k} (C^{'}, S) = 0

(43)

This confirms that tampering can be detected reliably. Hence, the proposed framework simultaneously satisfies:

Acc (\hat{y}, y) \to 1, \quad I (X; z^{'}) \to \min, \quad \Pr [{Verify}_{p k} (C, S) = 1] \to 1

Therefore, the hypothesis is proven. □

The computational complexity of the proposed SBDP framework is determined by the cumulative cost of OCNN feature extraction, GAN-based privacy-preserving transformation, multimodal fusion, ElGamal encryption, and SHA-256 digital signature generation. The overall complexity

O_{Total}

is expressed as follows:

O_{Total} = O (\sum_{l = 1}^{L} K_{l}^{2} C_{l} N_{l}^{2} + Ep (N_{G} + N_{D}) + d + \log p + n)

where

K_{l}

denotes kernel size,

C_{l}

is the number of filters,

N_{l}

represents feature map size,

E p

is the number of epochs,

N_{G}

denotes the number of generator parameters, and

N_{D}

is the number of discriminator parameters.

4.2. OCNN Feature Extraction

The proposed optimized convolutional neural network consists of four convolutional blocks followed by fully connected classification layers. Each convolutional block consists of a convolutional layer, batch normalization, Rectified Linear Unit (ReLU) activation, and a max-pooling operation. The first, second, third and fourth convolutional layers use 32, 64, 128 and 256 filters, respectively, with a kernel size of 3 × 3 and a stride of 1. Max-pooling with a pooling size of 2 × 2 is applied after each convolutional block to decrease spatial dimensionality while retaining discriminative information. We apply batch normalization after each convolutional layer to enhance the stability of the training process and speed up convergence. ReLU is selected as the activation function because of its computational efficiency and its ability to mitigate the vanishing gradient problem. To decrease overfitting and improve generalization performance, we add a dropout layer with a dropout rate of 0.5 before the fully connected layer.
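The block structure described above can be traced numerically. The sketch below follows the four blocks (32, 64, 128 and 256 filters, 3 × 3 kernels, 2 × 2 max-pooling) and also evaluates the convolutional term K_l^2 C_l N_l^2 of the complexity expression in Section 4.1. The 128 × 128 three-channel input and the use of "same" padding are assumptions made for illustration, since neither is stated in the text.

```python
def trace_ocnn(size, in_channels=3, filters=(32, 64, 128, 256), kernel=3):
    """Trace each block: conv parameters, K^2 * C * N^2 cost term, pooled shape."""
    rows = []
    channels = in_channels
    for out_channels in filters:
        # 3x3 convolution with stride 1; "same" padding is assumed, so the
        # spatial size is unchanged by the convolution itself.
        conv_params = kernel * kernel * channels * out_channels + out_channels
        cost_term = kernel ** 2 * out_channels * size ** 2
        size //= 2  # 2x2 max-pooling halves each spatial dimension
        rows.append({
            "filters": out_channels,
            "conv_params": conv_params,
            "cost_term": cost_term,
            "shape": (size, size, out_channels),
        })
        channels = out_channels
    return rows

blocks = trace_ocnn(size=128)          # 128 x 128 RGB input is an assumption
final_shape = blocks[-1]["shape"]
flattened = final_shape[0] * final_shape[1] * final_shape[2]
total_cost = sum(b["cost_term"] for b in blocks)
```

Under these assumptions the feature map shrinks from 128 × 128 to 8 × 8 over the four blocks, and the cost term halves from one block to the next because the number of filters doubles while the feature map area falls by a factor of four.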

The OCNN is optimized by Adam optimizer with learning rate of 0.001, batch size 32 and categorical cross entropy loss function. The last classification layer of the OCNN uses Softmax activation function to output probability distributions over the biometric identity classes. This optimized architecture can achieve efficient hierarchical feature extraction from multimodal biometric inputs with high recognition accuracy and stable convergence behavior.

It is worth mentioning that all competing deep learning architectures, namely CNN, ResNet-18, Vision Transformer (ViT), ConvGRU, and the proposed SBDP framework, were trained on the same multimodal biometric inputs with the same preprocessing steps, similar training budgets, and comparable optimization settings. Specifically, all models used the same training and testing partitions, the same image normalization and augmentation procedures, the Adam optimizer with a learning rate of 0.001, a batch size of 32, and the same number of training epochs. Furthermore, the same multimodal fusion protocol was used to ensure consistency across all experiments. As such, the reported differences in performance are mainly attributed to the architectural design and privacy-preserving capabilities of the methods, rather than to differences in data preparation, feature fusion strategies, or training configurations. This supports an objective, reproducible, and experimentally fair comparison.

The proposed OCNN is a lightweight architecture based on the ResNet-18 framework with modifications for improved computational efficiency and multimodal biometric feature learning. Unlike the standard ResNet-18 architecture, the proposed OCNN is built from four convolutional blocks with 32, 64, 128 and 256 filters, respectively. Each block has a convolutional layer, batch normalization, ReLU activation and a max-pooling operation. The principles of residual learning are used to ensure stable gradient propagation and faster convergence of the model during the training phase. Dropout regularization with a dropout rate of 0.5 is also used before the fully connected layer to mitigate overfitting. The network is trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. Such architectural modifications make the proposed OCNN more suitable for multimodal biometric authentication while achieving high recognition accuracy and computational efficiency.

The optimization is achieved by a combination of architectural refinement, hyperparameter tuning, regularization, and training strategy, specifically tailored for multimodal biometric authentication. The optimization comprises: (i) adopting a lightweight ResNet-18 inspired architecture with four convolutional blocks with 32, 64, 128 and 256 filters, respectively; (ii) batch normalization after each convolutional layer to stabilize training and speed up convergence; (iii) ReLU activation and max-pooling operations to enhance nonlinear feature learning and computational efficiency; (iv) dropout regularization with dropout rate of 0.5 to alleviate overfitting and enhance generalization; and (v) hyperparameter optimization using the Adam optimizer with a learning rate of 0.001, batch size of 32, and categorical cross-entropy loss. Hence, the term “optimized” pertains to the joint optimization of architecture design, training parameters, and regularization mechanisms for secure multimodal biometric feature extraction.

The optimized convolutional neural network is utilized to extract hierarchical and discriminative feature representations from preprocessed biometric data. Given a normalized input

\tilde{x}

, the OCNN learns a nonlinear transformation that maps the input space into a compact latent feature space, expressed as

z = f_{θ} (\tilde{x})

,

where

f_{θ}

denotes the OCNN parameterized by

θ

, and

z \in R^{d}

is the learned feature embedding.

The OCNN is composed of

L

stacked convolutional blocks, where each block

l

performs the feature transformation

h^{(l)}

as follows:

h^{(l)} = σ (BN (W^{(l)} * h^{(l-1)} + b^{(l)}))

(44)

where

h^{(0)} = \tilde{x}

,

W^{(l)}

and

b^{(l)}

are the convolutional weights and biases,

BN

is batch normalization, and

σ

is a nonlinear activation function (e.g., ReLU).

Pooling is then applied to reduce the spatial dimensionality, giving the down-sampled feature map

S_{d}

:

S_{d} \leftarrow P_{pool} (h^{(l)})

(45)

The OCNN is trained using a classification objective that maximizes inter-class separability while minimizing intra-class variation. The predicted class probability

{P c}_{p}

is determined as follows:

{P c}_{p} = Softmax (W z + b)

(46)

The cross-entropy loss is defined as follows:

L_{cls} = - \sum_{i = 1}^{N} y_{i} \log ({\hat{y}}_{i})

(47)
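Equations (46) and (47) can be illustrated with a small numerical sketch. The logits stand in for the classifier output before the Softmax, and the one-hot targets and batch values are made up for illustration; a small constant is added inside the logarithm only to avoid evaluating it at zero.

```python
import math

def softmax(logits):
    """Numerically stable Softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-12):
    """Per-sample loss: minus the log-probability assigned to the true class."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))

def batch_loss(targets, batch_logits):
    """Sum the per-sample cross-entropy over the N samples of a batch."""
    return sum(cross_entropy(y, softmax(l)) for y, l in zip(targets, batch_logits))

batch_logits = [[2.0, 1.0, 0.1], [0.3, 2.5, 0.2]]   # illustrative classifier outputs
targets = [[1, 0, 0], [0, 1, 0]]                    # one-hot identity labels
probs = softmax(batch_logits[0])
loss = batch_loss(targets, batch_logits)
```

The loss decreases as the probability assigned to the true identity class approaches one, which is the behavior the training objective relies on.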

Minimizing

L_{cls}

encourages the following structure in the embedding space:

z_{i} \approx z_{j} \text{ if } y_{i} = y_{j}, \text{ and } z_{i} \neq z_{k} \text{ if } y_{i} \neq y_{k}

(48)

To improve generalization, dropout is applied as follows:

\tilde{z} = Dropout (z)

(49)

This reduces overfitting and enhances robustness against noisy biometric inputs.

Lemma 1:

Let

\tilde{x}

and

{\tilde{x}}^{'}

be two preprocessed inputs such that:

∥ \tilde{x} - {\tilde{x}}^{'} ∥ \leq δ

Then, the OCNN mapping satisfies:

∥ f_{θ} (\tilde{x}) - f_{θ} ({\tilde{x}}^{'}) ∥ \leq K δ

(50)

where

K > 0

is a Lipschitz constant that depends on the network parameters, and

δ

denotes the maximum allowable perturbation between the two preprocessed inputs.

Proof:

Each OCNN layer consists of convolution, batch normalization, and activation functions, all of which are Lipschitz continuous. Let the Lipschitz constants of these operations be

K_{l}

. Then, for each layer:

∥ h^{(l)} (x) - h^{(l)} (x^{'}) ∥ \leq K_{l} ∥ h^{(l− 1)} (x) - h^{(l− 1)} (x^{'}) ∥

(51)

The recursive application across

L

layers is expressed as follows:

∥ z - z^{'} ∥ \leq (\prod_{l = 1}^{L} K_{l}) ∥ \tilde{x} - {\tilde{x}}^{'} ∥

(52)

Let

K = \prod_{l = 1}^{L} K_{l}

, then:

∥ f_{θ} (\tilde{x}) - f_{θ} ({\tilde{x}}^{'}) ∥ \leq K δ

(53)

Hence, the OCNN mapping is Lipschitz continuous and stable. □
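The bound in Lemma 1 can be checked numerically on a toy network. The sketch below uses scalar layers, each an affine map followed by ReLU, so that the per-layer Lipschitz constant is simply the absolute value of the weight; the weights, biases and inputs are hypothetical. For real convolutional layers the per-layer constant would instead be an operator norm of the layer, which this simplified example does not compute.

```python
def layer(x, w, b):
    """One scalar layer: affine map followed by ReLU (ReLU is 1-Lipschitz)."""
    return max(0.0, w * x + b)

def network(x, params):
    """Apply the layers in sequence, mirroring the recursion in the proof."""
    for w, b in params:
        x = layer(x, w, b)
    return x

params = [(1.5, 0.1), (0.8, 0.3), (2.0, -0.2)]   # hypothetical (weight, bias) pairs

# K is the product of the per-layer constants |w_l|.
K = 1.0
for w, _ in params:
    K *= abs(w)

x, x_prime = -0.2, 0.4
delta = abs(x - x_prime)
gap = abs(network(x, params) - network(x_prime, params))
bound = K * delta
```

Here the output gap stays below the bound because the ReLU in the first layer clips one of the two inputs; when no clipping occurs the gap can approach the bound, but it does not exceed it.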

The OCNN is a key building block of the proposed framework since it projects heterogeneous biometric inputs into a common latent space. Its hierarchical structure enables the extraction of low-level and high-level features, and the stability property provides robustness against noise and adversarial perturbations. In conjunction with the ensuing GAN-based transformation and cryptographic protection, the OCNN provides a solid basis for secure and accurate biometric authentication.

4.3. GAN-Based Transformation

The GAN-based transformation mechanism can preserve biometric privacy by learning a robust nonlinear mapping from the original feature embeddings to the synthetic latent representations. The adversarial learning strategy guarantees discriminative biometric patterns while substantially reducing the likelihood of biometric reconstruction and identity leakage in comparison with the traditional feature perturbation approaches. During the training process, the generator is continuously improved to generate realistic synthetic embeddings that match the statistical distribution of the original latent space, while the discriminator learns to recognize the subtle differences between the real and the generated representations. This competitive optimization process drives the generator to produce highly secure synthetic biometric features that can maintain the authentication reliability without disclosing sensitive personal information. Moreover, the stochastic noise component

ξ

adds randomness to the generation process, which increases the diversity and improves the resistance against the inversion and replay attacks. Thus, the proposed transformation framework can achieve an effective trade-off between biometric recognition performance, privacy preservation and adversarial robustness for secure multimodal biometric authentication systems.

Given a latent feature vector

z \in R^{d}

extracted by the OCNN, the generator

G_{θ_{G}}

produces a synthetic embedding:

z^{'} = G_{θ_{G}} (z, ξ)

(54)

where

ξ \sim N (0, I_{k})

is a stochastic noise vector and

θ_{G}

denotes the parameters of the generator network. The objective is to ensure that

z^{'}

preserves identity-discriminative characteristics while obfuscating sensitive biometric information that could enable reconstruction of the original input.

The GAN framework consists of a generator

G_{θ_{G}}

and a discriminator

D_{θ_{D}}

, where the discriminator attempts to distinguish real embeddings from synthetic ones. The adversarial optimization is formulated as follows:

\underset{θ_{G}}{\min} \underset{θ_{D}}{\max} L_{GAN} = E_{z \sim p_{z}} [\log D_{θ_{D}} (z)] + E_{z \sim p_{z}, ξ \sim p_{ξ}} [\log (1 - D_{θ_{D}} (G_{θ_{G}} (z, ξ)))]

(55)

A modified loss function (non-saturating variant) is employed to stabilize training and enhance gradient behavior:

\underset{θ_{G}}{\min} E_{z, ξ} [- \log D_{θ_{D}} (G_{θ_{G}} (z, ξ))]

(56)
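The motivation for the non-saturating variant can be illustrated numerically. The following minimal sketch evaluates the generator term of Eq. (55) and the loss of Eq. (56) on a small batch of discriminator outputs and compares their gradients with respect to the discriminator logit. The function names and the sample scores are illustrative assumptions and are not taken from the reported implementation.

```python
import math

def saturating_generator_loss(d_fake):
    """Generator term of Eq. (55): mean of log(1 - D(G(z, xi)))."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

def non_saturating_generator_loss(d_fake):
    """Eq. (56): mean of -log D(G(z, xi))."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

def gradients_wrt_logit(p):
    """Per-sample gradients with respect to the discriminator logit a,
    where p = sigmoid(a): d/da log(1 - p) = -p and d/da (-log p) = -(1 - p)."""
    return -p, -(1.0 - p)

# Hypothetical discriminator outputs early in training, when synthetic
# embeddings are still easy to reject (values close to 0).
d_fake_early = [0.01, 0.02, 0.05]
sat_loss = saturating_generator_loss(d_fake_early)
non_sat_loss = non_saturating_generator_loss(d_fake_early)
grad_sat, grad_non_sat = gradients_wrt_logit(0.01)
print(f"saturating loss: {sat_loss:.4f}, gradient: {grad_sat:.2f}")
print(f"non-saturating loss: {non_sat_loss:.4f}, gradient: {grad_non_sat:.2f}")
```

When the discriminator confidently rejects a synthetic embedding, the saturating gradient is close to zero while the non-saturating gradient stays close to one in magnitude, which is the gradient behavior the modified loss is intended to improve.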

Training GANs is inherently difficult due to unstable optimization dynamics, vanishing gradients and mode collapse, where the generator produces only a limited diversity of outputs. To ease these challenges, the proposed framework adopts the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) objective, which offers smoother gradients and improves the convergence stability during adversarial training. In WGAN-GP, the discriminator is replaced by a critic network that estimates the Wasserstein distance between the distributions of real biometric embeddings and synthetic privacy-preserving embeddings. The critic is optimized as follows:

L_{D} = E [D (G (z))] - E [D (x)] + λ_{GP} E [(∥ \nabla_{\hat{x}} D (\hat{x}) ∥_{2} - 1)^{2}]

(57)

where

D

denotes the critic network,

G (z)

represents the synthetic biometric embedding of Eq. (54), written here without the noise input ξ for brevity and generated from the latent vector

z

, and

x

denotes the real biometric embedding extracted by the OCNN feature extractor. The parameter

λ_{GP}

is the gradient penalty coefficient, while

\hat{x}

is sampled uniformly along the straight lines connecting real and generated embeddings.

The generator is trained to minimize the Wasserstein distance by maximizing the critic score of the generated samples. The generator loss is formulated as follows:

L_{G} = - E [D (G (z))]

(58)

It encourages the generator to synthesize privacy-preserving biometric embeddings whose distribution closely approximates that of the original biometric feature space. To preserve identity-discriminative characteristics while simultaneously enhancing privacy protection, the overall generator objective is augmented with additional regularization terms as follows:

L_{G}^{Total} = L_{G} + λ_{cls} L_{cls} + λ_{rec} L_{rec}

(59)

where

L_{cls}

denotes the classification consistency loss that preserves identity-related discriminative information,

L_{rec}

represents the reconstruction-resistance loss that prevents recovery of the original biometric data, and

λ_{cls}

and

λ_{rec}

are weighting coefficients controlling the contribution of the corresponding loss components.

Furthermore, the gradient penalty term

L_{GP} = E [(∥ \nabla_{\hat{x}} D (\hat{x}) ∥_{2} - 1)^{2}]

(60)

enforces the Lipschitz continuity constraint on the critic and stabilizes the adversarial optimization process. Consequently, the proposed WGAN-GP framework effectively mitigates mode collapse, improves convergence stability, and generates diverse privacy-preserving biometric embeddings suitable for secure multimodal biometric authentication.
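The critic and generator losses of Eqs. (57), (58), and (60) can be sketched as follows. To keep the example dependency-free, the critic is assumed to be linear, D(x) = w · x, so that its input gradient equals w at every interpolated point. The linear critic, the function names, the toy batches, and the default penalty coefficient of 10 are assumptions made for illustration; in the TensorFlow setting described in this paper the gradient would instead be obtained by automatic differentiation.

```python
import math
import random

def critic(w, x):
    """Toy linear critic D(x) = w . x (an assumption made for illustration)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def gradient_penalty(w):
    """Eq. (60) for a linear critic: the input gradient is w everywhere,
    so (||grad D(x_hat)||_2 - 1)^2 does not depend on x_hat."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (norm - 1.0) ** 2

def interpolate(real, fake, rng):
    """Sample x_hat uniformly on the segment between a real and a fake embedding."""
    t = rng.random()
    return [t * r + (1.0 - t) * f for r, f in zip(real, fake)]

def critic_loss(w, real_batch, fake_batch, lambda_gp=10.0):
    """Eq. (57): E[D(G(z))] - E[D(x)] + lambda_GP * gradient penalty."""
    mean_fake = sum(critic(w, f) for f in fake_batch) / len(fake_batch)
    mean_real = sum(critic(w, r) for r in real_batch) / len(real_batch)
    return mean_fake - mean_real + lambda_gp * gradient_penalty(w)

def generator_loss(w, fake_batch):
    """Eq. (58): -E[D(G(z))]."""
    return -sum(critic(w, f) for f in fake_batch) / len(fake_batch)

w = [0.6, 0.8]                                  # unit-norm weights, so the penalty vanishes
real_batch = [[1.0, 1.0], [2.0, 0.0]]           # hypothetical real embeddings
fake_batch = [[0.0, 0.0], [0.0, 1.0]]           # hypothetical synthetic embeddings
x_hat = interpolate(real_batch[0], fake_batch[0], random.Random(0))
l_d = critic_loss(w, real_batch, fake_batch)
l_g = generator_loss(w, fake_batch)
print(f"x_hat = {x_hat}, L_D = {l_d:.3f}, L_G = {l_g:.3f}")
```

A critic whose gradient norm drifts away from 1 is penalized quadratically, which is how the penalty keeps the critic close to 1-Lipschitz.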

To explicitly enforce resistance against reconstruction attacks, a reconstruction adversary

R_{θ_{R}}

is introduced, which attempts to recover the original biometric input

x

from the synthetic feature

z^{'}

. The reconstruction loss is defined as:

L_{rec} = E_{x \sim p_{x}} [∥ x - R_{θ_{R}} (G_{θ_{G}} (f_{θ} (\tilde{x}), ξ)) ∥_{2}^{2}]

(61)

A privacy constraint is imposed such that:

L_{rec} \geq τ,

where

τ > 0

is a predefined privacy threshold ensuring that reconstructed outputs remain significantly different from the original inputs.

To sustain classification efficacy, the synthetic representation must retain identity-relevant information. Consequently, a classification consistency loss is integrated:

L_{util} = E_{z, ξ} [∥ f_{c} (G_{θ_{G}} (z, ξ)) - f_{c} (z) ∥_{2}^{2}]

(62)

where

f_{c}

denotes the classifier mapping. This ensures that the synthetic embedding

z^{'}

retains discriminative power.

The overall GAN optimization problem is formulated as follows:

\underset{θ_{G}}{\min} \underset{θ_{D}}{\max} L_{GAN} + λ_{1} L_{util} - λ_{2} L_{rec}

(63)

where

λ_{1}, λ_{2} > 0

are balancing coefficients controlling the trade-off between utility preservation and privacy protection.
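As a worked illustration of how the terms in Eqs. (61) to (63) combine, the sketch below computes the reconstruction loss, the utility loss, and the generator-side value of the overall objective for a toy batch, and checks the privacy constraint against a threshold. All numerical values, the threshold, and the function names are hypothetical and serve only to make the arithmetic concrete.

```python
def mean_squared_l2(batch_a, batch_b):
    """Batch mean of the squared Euclidean distance ||a - b||_2^2."""
    total = 0.0
    for a, b in zip(batch_a, batch_b):
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return total / len(batch_a)

def generator_objective(l_gan, l_util, l_rec, lambda_1, lambda_2):
    """Generator-side value of Eq. (63): L_GAN + lambda_1 * L_util - lambda_2 * L_rec."""
    return l_gan + lambda_1 * l_util - lambda_2 * l_rec

def satisfies_privacy_constraint(l_rec, tau):
    """Privacy constraint L_rec >= tau."""
    return l_rec >= tau

# Hypothetical inputs and the reconstruction adversary's outputs, Eq. (61).
x_batch = [[1.0, 0.0], [0.0, 1.0]]
x_reconstructed = [[0.0, 0.0], [0.0, 0.0]]
# Hypothetical classifier outputs on original and synthetic embeddings, Eq. (62).
scores_real = [[0.9, 0.1], [0.2, 0.8]]
scores_synth = [[0.8, 0.2], [0.2, 0.8]]

l_rec = mean_squared_l2(x_batch, x_reconstructed)
l_util = mean_squared_l2(scores_synth, scores_real)
objective = generator_objective(l_gan=0.7, l_util=l_util, l_rec=l_rec,
                                lambda_1=1.0, lambda_2=0.5)
private_enough = satisfies_privacy_constraint(l_rec, tau=0.5)
print(l_rec, l_util, objective, private_enough)
```

Because the reconstruction term enters with a negative sign, a larger reconstruction error lowers the generator objective, while a larger utility gap raises it; the two coefficients set the balance between the two.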

The transformation aims to minimize the mutual information between the original biometric input

X

and the synthetic representation

Z^{'}

:

\min I (X; Z^{'}) = \min (H (X) - H (X ∣ Z^{'}))

(64)

subject to the constraint that identity information is preserved:

I (Y; Z^{'}) \approx I (Y; Z),

so that

Z^{'}

remains informative for classification but uninformative for reconstructing the original biometric sample.
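The quantity in Eq. (64) can be made concrete with a small discrete example. The sketch below computes I(X; Z') = H(X) - H(X | Z') from a joint probability table for two extreme cases: a representation that is independent of the input and one that reveals it completely. The binary variables and the two tables are assumptions chosen for illustration; real biometric embeddings are continuous and their mutual information must be estimated.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X; Z') = H(X) - H(X | Z') for a joint table joint[x][z]."""
    p_x = [sum(row) for row in joint]
    p_z = [sum(col) for col in zip(*joint)]
    h_x_given_z = 0.0
    for z, pz in enumerate(p_z):
        if pz > 0:
            conditional = [row[z] / pz for row in joint]
            h_x_given_z += pz * entropy(conditional)
    return entropy(p_x) - h_x_given_z

# Z' independent of X: observing Z' tells an attacker nothing about X.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
# Z' equal to X: observing Z' reveals X completely.
leaky = [[0.5, 0.0],
         [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_leaky = mutual_information(leaky)
print(mi_independent, mi_leaky)
```

The privacy objective pushes the representation toward the first case with respect to the raw input, while the identity constraint asks that the analogous quantity with respect to the label stays close to its original value.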

Lemma 2:

If the reconstruction loss satisfies

L_{rec} \geq τ

, then the probability of accurately reconstructing the original biometric input from

z^{'}

is bounded by:

\Pr (∥ x - \hat{x} ∥_{2} \leq ϵ) \leq \exp (1− \frac{τ - ϵ}{σ^{2}})

(65)

where

\hat{x} = R_{θ_{R}} (z^{'})

is the adversary's reconstruction,

σ^{2}

denotes the reconstruction variance, and

ϵ

is the reconstruction error tolerance.

Proof:

By applying concentration inequalities on the reconstruction error distribution and using the lower bound constraint

L_{rec} \geq τ

, the probability of small reconstruction error decays exponentially. Hence, reconstruction becomes statistically infeasible. □

Corollary 2:

Under the above lemma, the mutual information between

X

and

Z^{'}

satisfies:

I (X; Z^{'}) \leq δ

for sufficiently large

τ

, where

δ

is a small constant.

Implication:

The synthetic representation

z^{'}

effectively eliminates sensitive biometric information while preserving identity-discriminative features.

The GAN-based transformation adds a privacy-preserving layer to the proposed framework. By integrating adversarial learning, reconstruction constraints, and information-theoretic regularization, a model that balances utility and privacy is obtained. The synthetic embeddings produced are resilient to inversion attacks and make it substantially harder to reverse-engineer biometric templates. This increases the overall security of the system, especially when combined with the subsequent encryption and digital signature processes.

4.4. Multimodal Fusion

The proposed SBDP framework employs multimodal fusion to improve biometric recognition accuracy and privacy preservation by merging discriminative representations extracted from multiple biometric modalities into a common latent feature space. Rather than relying on a single biometric source, the framework fuses privacy-preserving embeddings generated from the face, retina, and fingerprint modalities to leverage their complementary characteristics and enhance robustness against spoofing, identity leakage, and feature reconstruction attacks. The fusion mechanism allows the model to capture inter-modal correlations while maintaining secure feature abstraction through GAN-based transformation learning. The framework jointly optimizes multimodal representations in a compact latent space to obtain better authentication reliability, stronger privacy preservation, and increased resistance to adversarial inference attacks in secure biometric authentication environments. Let

z_{f}^{'}, z_{r}^{'}, z_{p}^{'} \in R^{d_{m}}

denote the privacy-preserving feature embeddings generated by the GAN for each modality. The multimodal fusion is defined as:

z_{fusion} = ϕ ([z_{f}^{'} ∥ z_{r}^{'} ∥ z_{p}^{'}])

(66)

where

∥

denotes concatenation and

ϕ

is a learnable transformation function mapping the concatenated vector into a compact latent space

R^{d}

.

The transformation function

ϕ

is implemented as a nonlinear projection:

z_{fusion} = σ (W_{f} [z_{f}^{'} ∥ z_{r}^{'} ∥ z_{p}^{'}] + b_{f})

(67)

where

W_{f} \in R^{d \times 3 d_{m}}

is the fusion weight matrix,

b_{f} \in R^{d}

is the bias term, and

σ

is a nonlinear activation function.

To improve stability and representation consistency, normalization is applied as follows:

z_{fusion} \leftarrow \frac{z_{fusion}}{∥ z_{fusion} ∥_{2}}

(68)
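The fusion pipeline of Eqs. (66) to (68) can be sketched in a few lines. The example below concatenates three modality embeddings, applies the affine projection with a nonlinearity, and L2-normalizes the result. The use of tanh as the activation, the embedding sizes, and all weight values are assumptions made for illustration, since the activation and dimensions are not fixed at this point in the text.

```python
import math

def fuse(z_f, z_r, z_p, weights, bias):
    """Eqs. (66)-(68): concatenate, project with W_f and b_f, apply tanh,
    then L2-normalize the fused vector."""
    concat = z_f + z_r + z_p                      # list concatenation, length 3 * d_m
    projected = [
        math.tanh(sum(w * c for w, c in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    norm = math.sqrt(sum(v * v for v in projected))
    return [v / norm for v in projected] if norm > 0 else projected

# Hypothetical embeddings with d_m = 2 per modality and fused dimension d = 2.
z_f, z_r, z_p = [0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]
W_f = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
       [-0.6, 0.5, -0.4, 0.3, -0.2, 0.1]]       # shape d x 3*d_m
b_f = [0.0, 0.1]

z_fusion = fuse(z_f, z_r, z_p, W_f, b_f)
fused_norm = math.sqrt(sum(v * v for v in z_fusion))
print(z_fusion, fused_norm)
```

After normalization the fused vector lies on the unit sphere, so distances between fused embeddings depend only on their direction and not on the scale of the individual modality embeddings.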

An adaptive weighted fusion approach is used to account for differences in modality reliability:

z_{fusion} = \sum_{m \in \{f, r, p\}} α_{m} z_{m}^{'}

(69)

subject to:

\sum_{m} α_{m} = 1, α_{m} \geq 0

The weights

α_{m}

can be learned dynamically based on modality quality or confidence scores. The fusion process is optimized to maximize discriminative capability:

\min L_{fusion} = \sum_{i, j} I (y_{i} = y_{j}) ∥ z_{i} - z_{j} ∥_{2}^{2} - \sum_{i, k} I (y_{i} \neq y_{k}) ∥ z_{i} - z_{k} ∥_{2}^{2}

(70)

where

I (\cdot)

denotes the indicator function.
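A direct evaluation of Eq. (70) on a toy batch is sketched below. The function sums squared distances over all ordered pairs, adding them for pairs with the same identity label and subtracting them for pairs with different labels. The embeddings, labels, and function name are hypothetical.

```python
def fusion_loss(embeddings, labels):
    """Eq. (70): intra-class squared distances minus inter-class squared
    distances, summed over all ordered pairs of samples."""
    loss = 0.0
    for z_i, y_i in zip(embeddings, labels):
        for z_j, y_j in zip(embeddings, labels):
            dist_sq = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
            loss += dist_sq if y_i == y_j else -dist_sq
    return loss

# Two samples of identity 0 that lie close together and one sample of identity 1.
embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
labels = [0, 0, 1]
loss_value = fusion_loss(embeddings, labels)
print(loss_value)
```

Minimizing this value pulls same-identity embeddings together and pushes different identities apart. Note that the inter-class term is unbounded below for unconstrained embeddings, which is one reason the normalization of Eq. (68) is useful in combination with this loss.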

Property 1:

The fused representation

z_{fusion}

achieves higher discriminative power and robustness than any individual modality embedding, i.e.:

E [Acc (z_{fusion})] \geq \max (E [Acc (z_{f}^{'})], E [Acc (z_{r}^{'})], E [Acc (z_{p}^{'})])

(71)

and

∥ z_{fusion} - z_{fusion}^{(noise)} ∥ \leq ϵ

for bounded noise in any single modality.

Proof:

Each modality captures distinct biometric characteristics:

▪ Face → global appearance features
▪ Retina → vascular patterns
▪ Fingerprint → ridge structures

Let the information contribution of each modality be represented as:

I (Y; z_{f}^{'}) + I (Y; z_{r}^{'}) + I (Y; z_{p}^{'}) > I (Y; z_{m}^{'})

(72)

for any single modality

m

. Since fusion aggregates these independent contributions, the mutual information between fused features and identity increases:

I (Y; z_{fusion}) \geq \max_{m} I (Y; z_{m}^{'})

(73)

Additionally, if one modality is corrupted by noise

δ

, the remaining modalities still preserve identity information:

z_{fusion} = α_{1} z_{1}^{'} + α_{2} z_{2}^{'} + α_{3} (z_{3}^{'} + δ)

(74)

Since

α_{3} < 1

, the impact of noise is attenuated:

∥ z_{fusion} - z_{fusion}^{clean} ∥ \leq α_{3} ∥ δ ∥

(75)

Here,

ϵ

denotes the maximum acceptable deviation (noise bound) within which the fused representation remains stable and robust.

Thus, the fused representation remains stable. □
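The attenuation argument of Eqs. (74) and (75) can be checked numerically with the adaptive weighted fusion of Eq. (69). In the sketch below the weights are obtained by applying a softmax to per-modality quality scores, which is only one possible way of satisfying the sum-to-one and non-negativity constraints and is an assumption of this example, as are the scores, embeddings, and noise vector.

```python
import math

def softmax_weights(scores):
    """Map quality scores to weights alpha_m >= 0 that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_fusion(embeddings, alphas):
    """Eq. (69): convex combination of the modality embeddings."""
    dim = len(embeddings[0])
    return [sum(a * e[k] for a, e in zip(alphas, embeddings)) for k in range(dim)]

def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))

quality_scores = [2.0, 1.0, 0.0]                 # hypothetical; third modality is least reliable
alphas = softmax_weights(quality_scores)
clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # hypothetical embeddings z'_1, z'_2, z'_3
delta = [0.5, -0.5]                              # noise added to the third modality only
noisy = clean[:2] + [[c + d for c, d in zip(clean[2], delta)]]

z_clean = weighted_fusion(clean, alphas)
z_noisy = weighted_fusion(noisy, alphas)
deviation = l2_norm([a - b for a, b in zip(z_noisy, z_clean)])
bound = alphas[2] * l2_norm(delta)
print(alphas, deviation, bound)
```

For this linear fusion the deviation equals the bound exactly, so a low weight on an unreliable modality directly limits how far its noise can move the fused representation.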

The multimodal fusion phase exploits complementary biometric data to improve the effectiveness and robustness of the proposed system. Combining multiple modalities overcomes the limitations of single sources and increases resilience to spoofing and noise. Adaptive weighting and nonlinear transformation improve the discriminative power of the fused representation. This stage is vital for reliable biometric authentication, especially in real-world settings where data quality may differ across modalities. The proposed SBDP framework fuses heterogeneous biometric features from the face, retina, and fingerprint modalities into a single compact latent representation. Each modality contributes different discriminative patterns, which together improve recognition reliability and reduce the risk of authentication failure due to low-quality samples or partial biometric corruption. The multimodal approach also improves resistance to impersonation and adversarial attacks, as an attacker would have to compromise multiple independent biometric traits simultaneously. In addition, the fusion mechanism learns inter-modal relations during training, which can increase feature diversity and reduce redundancy. To enhance the fused representation, the framework uses learnable transformation layers together with adaptive feature projection mechanisms that dynamically weight the most informative biometric components and suppress noisy or less relevant features. The nonlinear fusion operation allows the model to learn complex dependencies between modalities, which increases feature separability in the latent space and thus improves authentication accuracy, accelerates convergence, and stabilizes optimization during training.
The use of privacy-preserving embeddings derived from GANs protects sensitive biometric information while retaining high discriminative capability for identity verification. The multimodal fusion stage also improves the scalability and generalization ability of the proposed framework in different operating environments. In real-world biometric systems, variations in illumination, pose, sensor quality, occlusion, and environmental noise often degrade the recognition performance of a single modality. The proposed fusion strategy mitigates these limitations by enabling unaffected modalities to compensate for degraded biometric samples, thereby providing stable and reliable authentication performance. Thus, the multimodal fusion module is an integral part of the proposed SBDP architecture for secure, accurate, privacy-preserving, and robust biometric authentication in real-world deployment scenarios.

To provide a better understanding of the operational workflow, Algorithm 1 summarizes the full implementation procedure of the proposed secure biometric data protection framework. The algorithm combines multimodal biometric preprocessing, optimized CNN-based feature extraction, privacy-preserving representation learning, classification, and integrity verification into a single unified secure authentication pipeline. The stepwise formulation illustrates how the proposed framework converts raw biometric inputs into secure and verifiable biometric representations while preserving recognition accuracy and privacy protection.

Algorithm 1: Optimized CNN-based secure multimodal biometric feature extraction and encryption

1.: Initialization: { $D = \{I_{f}, I_{r}, I_{p}\}$ : multimodal biometric inputs; $O_{cnn}$ : optimized CNN model; $F_{i}$ : intermediate feature maps; $F$ : final feature vector/feature extraction; $P (\cdot)$ : preprocessing function; $Sig$ : SHA-256 signature; $C_{s}$ : secured output; $C$ : class label; $E_{F}$ : encrypted feature vector; $Conv (\cdot)$ : convolution; $BN (\cdot)$ : batch normalization; $σ (\cdot)$ : activation function (e.g., ReLU); $Pool (\cdot)$ : pooling; $F_{n - 1}$ : input to the current block}
2.: $Input : \{I_{f}, I_{r}, I_{p}\}$
3.: $Output : (C_{s} ∣ E_{F})$
4.: Set $P (I_{i}) \leftarrow Augment (Normalize (Resize (I_{i})); θ_{a})$ //where $θ_{a} = \{rotation, scaling, flipping, brightness\}$
5.: $for each I_{i} \in D$ do //Preprocessing
6.: $I_{i} \leftarrow P (I_{i})$
7.: $Store P (I_{i}) = Augment (Normalize (Resize (I_{i})); θ_{a})$
8.: End for
9.: $Perform F \to O_{cnn}$ //Feature extraction on CNN blocks
10.: $for each I_{i} \in D$ do
11.: $Compute F_{4} \leftarrow Pool (σ (BN (Conv (I_{i}))))$ //Output of CNN block 1
12.: $Compute F_{8} \leftarrow Pool (σ (BN (Conv (F_{4}))))$ //Output of CNN block 2
13.: $Compute F_{n} \leftarrow Pool (σ (BN (Conv (F_{n - 1}))))$ //Output of $n$ blocks
14.: End for
15.: $Regularize F_{9} \leftarrow Dropout (F_{8})$ //Regularization process
16.: $Compute F \leftarrow FC (F_{9})$
17.: $Predict C \leftarrow Softmax (F)$
18.: Encrypt $E_{F} \leftarrow ElGamal (F)$
19.: Compute $Sig \leftarrow SHA256 (E_{F})$
20.: $Form C_{s} \leftarrow (C, E_{F}, Sig)$
21.: $Return (C_{s} ∣ E_{F})$

Algorithm 1 describes the optimized CNN-based pipeline for secure multimodal biometric recognition. Step 1 initializes all variables, including the biometric dataset

D = {I_{f}, I_{r}, I_{p}}

. Steps 2–3 define the input and output respectively. The input consists of face, retina, and fingerprint samples. The output is either the secured biometric class

C_{s}

or the encrypted feature vector

E_{F}

. Step 4 defines the preprocessing operation. Each biometric image is resized, normalized, and augmented using rotation, scaling, flipping, and brightness variation. Steps 5–8 apply preprocessing to every biometric sample

I_{i} \in D

. The processed image is stored as

P (I_{i})

, which improves consistency and generalization before CNN processing. Step 9 sends the preprocessed biometric features to the optimized CNN model

O_{c n n}

for feature extraction. Steps 10–14 extract features from each biometric sample using CNN blocks. The first CNN block generates

F_{4}

through convolution, batch normalization, activation, and pooling. The second CNN block generates

F_{8}

, which represents deeper biometric features. The generalized

F_{n}

represents the output of additional CNN blocks if more layers are used. Step 15 applies dropout to

F_{8}

, producing

F_{9}

. This reduces overfitting and improves model generalization. Step 16 passes

F_{9}

through a fully connected layer to compute the final feature vector

F

, which represents high-level identity-specific biometric information. Step 17 applies the SoftMax classifier to predict the biometric class label

C

. Step 18 encrypts the final feature vector

F

using ElGamal encryption to protect biometric confidentiality. Step 19 computes a SHA-256 signature from

E_{F}

to ensure data integrity and authenticity. Step 20 forms the secured biometric output

C_{s} = (C, E_{F}, S i g)

, combining the predicted class, encrypted feature vector, and signature. Step 21 returns the secured biometric output

C_{s}

or the encrypted feature vector

E_{F}

.
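Steps 18 to 20 of Algorithm 1 can be sketched with textbook ElGamal over a prime field and a SHA-256 digest from the Python standard library. This is a toy illustration under stated assumptions and not the implementation used in the experiments: the 127-bit Mersenne prime, the base g = 3, the fixed-point quantization of the feature vector, the ciphertext serialization, and all function names are chosen here for illustration only, and the parameters are far too small for real deployments.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumed, not secure).
P = 2**127 - 1          # Mersenne prime used as the field modulus
G = 3                   # assumed base

def keygen():
    x = secrets.randbelow(P - 2) + 1             # private key in [1, P-2]
    return x, pow(G, x, P)                       # (private key, public key)

def encrypt(m, public_key):
    k = secrets.randbelow(P - 2) + 1             # fresh ephemeral key per message
    return pow(G, k, P), (m * pow(public_key, k, P)) % P

def decrypt(c1, c2, private_key):
    s_inv = pow(c1, P - 1 - private_key, P)      # inverse of c1^x by Fermat's little theorem
    return (c2 * s_inv) % P

def quantize(features, scale=10**6):
    """Map float features to integers modulo P (assumed fixed-point encoding)."""
    return [int(round(f * scale)) % P for f in features]

def secure_output(features, label, public_key):
    """Steps 18-20: encrypt F, hash E_F, and form C_s = (C, E_F, Sig)."""
    ciphertext = [encrypt(m, public_key) for m in quantize(features)]
    serialized = ";".join(f"{c1:x},{c2:x}" for c1, c2 in ciphertext).encode()
    digest = hashlib.sha256(serialized).hexdigest()
    return label, ciphertext, digest

private_key, public_key = keygen()
label, encrypted, digest = secure_output([0.12, 0.5, 0.987654], label=7,
                                         public_key=public_key)
recovered = [decrypt(c1, c2, private_key) for c1, c2 in encrypted]
print(label, recovered, digest[:16])
```

Because a fresh ephemeral key is drawn for every element, encrypting the same feature vector twice yields different ciphertexts, which is the probabilistic property discussed for ElGamal in this paper. The digest computed here is a plain hash of the ciphertext, as in Step 19, and lets a receiver detect modification of the encrypted vector.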

5. Experimental Setup and Results

This section describes the implementation environment and reports the results of the experiments.

5.1. Experimental Setup

The experimental setup used multimodal biometric datasets with TensorFlow, Keras, and PyCryptodome in the Google Colab environment to evaluate OCNN-based feature extraction, GAN-based privacy preservation, and ElGamal cryptographic security.

5.1.1. Research Design

This study employs an experimental strategy to develop and assess a secure biometric data protection system using artificial intelligence and cryptographic techniques. The system is built on the integration of convolutional neural networks (CNNs), generative adversarial networks (GANs), and the ElGamal cryptosystem. Combining these technologies enables the extraction, transformation, encryption, and authentication of biometric data in a manner that preserves privacy and ensures security. The model was evaluated on publicly accessible datasets comprising facial images, fingerprints, and retinal scans.

5.1.2. Techniques Used

A secure system for processing and storing biometric data was developed by integrating optimized convolutional neural networks (OCNNs), generative adversarial networks, and the ElGamal cryptosystem. CNNs are widely employed in computer vision applications, especially in biometric identification and security. The OCNN is used to automatically extract and interpret complex features from images for the identification of faces, fingerprints, irises, and other unique biometric characteristics. This is especially important given the increasing incidence of attacks on authentication systems, which demand both high accuracy and fast data processing. The fundamental elements of an OCNN are convolutional layers that extract spatial and statistical information, and subsampling (pooling) layers that reduce dimensionality and highlight the most significant aspects of the image. Biometric systems employing an OCNN offer improved identification precision and robustness against variations in angle, lighting, partial occlusion, and other anomalies. These characteristics make OCNNs well suited to real-world applications, where the quality of incoming data may fluctuate. Architectures may combine multiple convolutional layers, normalization methods, dropout, and residual connections (as in ResNet), enabling the creation of deep models while maintaining training stability. A primary advantage of OCNNs is their ability to extract features automatically, which removes the need for manually designed filters. The network learns from examples, identifying convolutional kernels that effectively capture the visual characteristics of a biometric object. The accuracy of the template is essential when handling biometric images, as it directly affects the reliability of authentication. 
Generative adversarial networks represent a significant advance in machine learning over the past decade and are widely employed to enhance the security of biometric systems. Their main characteristic is the ability to generate synthetic data that closely resembles real data, which makes them useful for protecting personal information. When combined with cryptography and convolutional neural networks, GANs open new prospects for secure and intelligent authentication systems. A GAN relies on the competition between two neural network models: a generator and a discriminator. The generator aims to produce credible but synthetic data, while the discriminator is responsible for distinguishing generated samples from genuine ones. During training, both networks improve: the generator becomes better at mimicking genuine characteristics, while the discriminator becomes better at identifying generated samples. This dynamic makes the GAN architecture effective in tasks that require high realism in generated content. In the field of biometrics, GANs enable the creation of synthetic biometric features that conceal genuine data. These features can replace original patterns in templates, thereby enhancing the system's robustness against spoofing, reconstruction attacks, and statistical analysis. The original biometric vectors are not used directly, which reduces the risk of their compromise. Combining generated and genuine templates increases data entropy, expanding the range of possible combinations and making it harder for an attacker to access the original material. The application of GANs in biometric systems can also complement privacy-preserving principles such as differential privacy. 
Synthetic data can be employed for training and testing without the risk of disclosing genuine user templates, which is especially relevant when handling personal data under strict data protection regulations (such as the GDPR). Modern cryptographic algorithms are crucial for ensuring cybersecurity, particularly when combined with artificial intelligence and biometric systems. The ElGamal cryptosystem offers a viable option owing to its strong resistance to cryptographic attacks. When selecting a cryptosystem for safeguarding biometric data, it is essential to balance processing speed and security level. ElGamal derives its security from the hardness of the discrete logarithm problem and provides probabilistic encryption, which makes it less vulnerable to ciphertext analysis. Unlike textbook RSA, ElGamal avoids deterministic encryption, reducing the risk of template tampering. Unlike AES, which is a symmetric cipher, ElGamal is a public-key scheme and can protect data in transit without a prior shared-key exchange, which is particularly important in decentralized biometric systems. Although ECC is a strong alternative because of its smaller key sizes and efficiency, its implementation requires more complex mathematical operations and careful configuration. ElGamal was therefore considered a suitable choice for the proposed biometric system, offering a combination of reliability, versatility, and security. The integration of the ElGamal cryptosystem with artificial intelligence and biometric technology offers new opportunities for improving cybersecurity. Adaptive key generation based on user behavior analysis enhances attack resistance, while asymmetric encryption reduces the risk of biometric data disclosure. Interactive biometric authentication, augmented by machine learning algorithms, improves identification precision, while personalized protection strategies strengthen defenses against attackers and biometric template forgery.

5.1.3. Datasets

The system was assessed using three publicly accessible biometric datasets. The CelebA dataset (https://www.kaggle.com/datasets/jessicali9530/celeba-dataset, accessed on 14 May 2026) comprises 202,599 facial images of 10,177 individuals. A subset of 10,000 aligned facial images in .jpg format, each with a resolution of 178 × 218 pixels, was selected for this experiment. The images show faces in diverse poses and lighting conditions. The fingerprint dataset was sourced from the Fingerprint Feature Extraction for Biometrics initiative. This collection contains more than 5000 fingerprint images saved in .bmp format at a resolution of 640 × 480 pixels. Fingerprints were obtained from several people using inkless scanners and reflect diverse finger placements. The iris dataset used in this study is UBIRIS v2, designed for iris recognition within the visible wavelength range. The dataset comprises 11,000 images from 261 subjects, each saved in .jpg format at a resolution of 400 × 300 pixels. A total of 10,000 images from this dataset were randomly chosen for training and testing purposes.

The aggregate dataset for this study comprised more than 30,000 images. Experiments were performed by randomly sampling and balancing the number of images across all three biometric modalities.

5.1.4. Experimental Environment and Implementation Details

All experiments were performed in the Google Colab cloud environment using a T4 GPU accelerator, 12.7 GB of RAM, and 15 GB of GPU memory. The implementation used Python 3. The machine learning models were developed with the TensorFlow and Keras libraries, whereas the cryptographic functions, including encryption and digital signature creation, were implemented with the PyCryptodome package. Training the CNN model took roughly 25 min for 10 epochs. GAN training was run for 10,000 epochs and required approximately 3 to 4 h. Feature extraction, encryption, and signing of an individual sample took 1 to 2 s on average.

5.1.5. Baseline Methods and Fair Experimental Comparison

The baseline methods including CNN, ResNet-18, vision transformer (ViT) and ConvGRU were trained and tested under the same experimental settings so that the evaluation can be fair and unbiased. Specifically, all methods utilized the same biometric datasets (CelebA for facial images, UBIRIS v2 for iris images, and the fingerprint dataset), the same preprocessing pipeline consisting of image resizing, normalization, and data augmentation, and the same multimodal feature fusion strategy. Furthermore, all competing methods were trained using identical training and testing splits, the Adam optimizer, a learning rate of 0.001, batch size of 32, and the same number of training epochs. Hyperparameter tuning for each baseline was performed within comparable search spaces to avoid any unfair performance advantage. In addition to generic deep learning architectures, the comparison also considers representative privacy-preserving biometric protection frameworks reported in the literature, including cancelable biometric templates, blockchain-assisted biometric authentication, homomorphic encryption-based biometric protection, and secure template transformation methods. These approaches represent important state-of-the-art biometric security paradigms because they focus on template protection, privacy preservation, encrypted-domain matching, and resistance to reconstruction attacks. Unlike these approaches, the proposed SBDP framework jointly optimizes multimodal biometric recognition, privacy-preserving synthetic feature generation, cryptographic confidentiality, and authentication integrity within a unified end-to-end architecture. Therefore, the comparative analysis evaluates not only recognition accuracy but also privacy protection capability, reconstruction resistance, encryption security, and authentication reliability under consistent experimental conditions.

5.2. Results

This section presents the experimental results of the proposed framework, encompassing biometric classification performance, GAN training behavior, encryption reliability, and digital signature verification results.

5.2.1. Performance Evaluation Metrics

The performance of the proposed secure biometric data protection framework was evaluated using widely adopted classification metrics, namely Accuracy, Precision, Recall, F1-score, and the area under the receiver operating characteristic curve (AUC). These metrics provide a quantitative assessment of the recognition capability, discrimination power, and robustness of the proposed OCNN-based multimodal biometric authentication framework. The classification accuracy is defined as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(76)

where

TP

denotes the number of true positives,

TN

represents the number of true negatives,

FP

denotes the number of false positives, and

FN

denotes the number of false negatives.

Precision measures the proportion of correctly predicted positive samples among all predicted positive samples and is expressed as

Precision = \frac{TP}{TP + FP}

(77)

Recall, also referred to as sensitivity, evaluates the proportion of correctly identified positive samples among all actual positive samples and is computed as follows:

Recall = \frac{TP}{TP + FN}

(78)

The F1-score is the harmonic mean of Precision and Recall, providing a balanced measure of classification performance, particularly in the presence of class imbalance. It is defined as follows:

F1-score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(79)

The area under the receiver operating characteristic curve (AUC) is employed to evaluate the discriminative capability of the biometric classifier across different decision thresholds. It is given by

AUC = \int_{0}^{1} TPR (FPR) \, d (FPR)

(80)

where

TPR

denotes the True Positive Rate and

FPR

denotes the False Positive Rate. The True Positive Rate is defined as follows:

TPR = \frac{TP}{TP + FN}

(81)

while the False Positive Rate is expressed as follows:

FPR = \frac{FP}{FP + TN}

(82)

A higher AUC value indicates superior discrimination capability and enhanced robustness of the biometric authentication system in distinguishing legitimate users from impostors.

These evaluation metrics are extensively employed in biometric authentication and machine learning literature because they jointly assess classification accuracy, sensitivity, predictive reliability, and the capability of the proposed SBDP framework to minimize false acceptance and false rejection errors. Consequently, they provide a comprehensive and reliable evaluation of the effectiveness of the proposed privacy-preserving multimodal biometric authentication framework.
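The metrics of Eqs. (76) to (82) can be computed directly from predictions and scores, as in the sketch below for a binary genuine-versus-impostor setting. The labels, scores, the decision threshold of 0.5, and the function names are hypothetical, and the AUC is approximated with the trapezoidal rule over the empirical ROC points.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = genuine, 0 = impostor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Eqs. (76)-(79): accuracy, precision, recall, and F1-score."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def auc_trapezoid(y_true, scores):
    """Eq. (80): integrate TPR over FPR, sweeping the threshold over all scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))   # (FPR, TPR), Eqs. (81)-(82)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [1, 1, 0, 1, 0, 0]                      # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]          # hypothetical match scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy, precision, recall, f1 = classification_metrics(*confusion_counts(y_true, y_pred))
auc = auc_trapezoid(y_true, scores)
print(accuracy, precision, recall, f1, auc)
```

In this toy case one impostor score exceeds the threshold, which lowers precision but not recall, while the AUC summarizes the ranking quality across all thresholds rather than at the single operating point.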

5.2.2. Training Accuracy

The proposed SBDP framework is compared with several other baseline deep learning architectures in terms of training accuracy over ten training epochs, as shown in Figure 3a. The result illustrates the convergence behavior, learning efficiency and classification capability of each model during multimodal biometric recognition training. The higher accuracy values reflect better feature extraction capability, stronger learning stability and enhanced biometric discrimination performance.

The conventional CNN model showed a slow rise in the training accuracy from around 82.4% in the first epoch to almost 97.1% in the tenth epoch. The consistent improvement verifies the capacity of CNNs to learn hierarchical biometric features successfully. But the relatively slow convergence rate and lower final accuracy score imply constraints in dealing with very complex multimodal biometric patterns and privacy-preserving feature learning. The ResNet architecture achieved better convergence than the traditional CNN model. The training accuracy increased from about 84.1% in the first epoch to about 97.8% in the last epoch. The residual learning mechanism was successfully applied to improve the feature propagation as well as to reduce the optimization difficulties in the deeper layers, thus improving the classification stability and the overall recognition performance. The Vision Transformer model exhibited a progressive learning behavior, with the training accuracy increasing from about 79.5% to almost 97.4% across the training epochs. The transformer attention mechanism enabled efficient modeling of long-range feature dependencies and contextual biometric information. However, the model showed slower convergence during the initial training stages despite good final performance, as transformer architectures generally demand higher optimization capacity and training iterations. During the training process, the ConvGRU framework exhibited stable and consistent learning behavior. The accuracy increased from about 81.8% for the first epoch to near 97.0% for the last epoch. The incorporation of gated recurrent units enhanced the temporal feature consistency and sequential representation learning. However, its performance was marginally inferior to ResNet, given that the recurrent mechanisms introduced additional computational dependencies that could influence the efficiency of the optimization.

The proposed SBDP framework consistently outperformed all baseline methods throughout training. It reached a training accuracy of about 88.6% in the first epoch and converged quickly to about 99.8% in the last epoch. The faster convergence and higher final accuracy demonstrate the efficiency of the unified framework, which combines optimized CNN-based multimodal feature extraction, GAN-based privacy-preserving representation learning, and cryptographic security schemes. The smooth learning curve further supports the stability, scalability, and high discriminatory power of the proposed system for secure biometric recognition and authentication applications.

5.2.3. Precision–Recall–F1 Comparison

Figure 3b compares the Precision, Recall, and F1-score of the proposed SBDP framework with those of the baseline deep learning architectures in multimodal biometric recognition. These metrics evaluate how effectively each model identifies biometric identities while minimizing misclassification errors; higher values indicate better recognition reliability, improved feature discrimination, and greater robustness against misclassification. The traditional CNN model achieved a precision of 96.8%, a recall of 96.5%, and an F1-score of 96.6%. These results reflect the CNN’s ability to extract hierarchical biometric features efficiently, but its relatively lower performance also indicates a limited ability to handle highly complex multimodal biometric relationships and privacy-preserving representations. The ResNet architecture performed better, with a precision of 97.3%, a recall of 97.1%, and an F1-score of 97.2%. Residual learning enables deeper feature extraction and better gradient flow, yielding more stable classification and higher recognition accuracy across multiple biometric modalities. The Vision Transformer model produced a precision of 97.0%, a recall of 96.8%, and an F1-score of 96.9%. Its attention-based architecture efficiently captures global feature dependencies and contextual relationships among biometric patterns. However, its performance is slightly lower than that of ResNet, because transformer-based models usually require more optimization and larger-scale training data. The ConvGRU framework achieved 96.7% precision, 96.5% recall, and a 96.6% F1-score. Gated recurrent units improve sequential feature consistency and temporal dependency modeling, but the model shows relatively lower discriminative ability on highly heterogeneous multimodal biometric representations.

The proposed SBDP framework achieved the best performance on all three metrics, with 99.7% precision, 99.6% recall, and a 99.6% F1-score. These results demonstrate the efficacy of combining optimized CNN-based feature extraction, GAN-based privacy-preserving synthetic representation learning, and ElGamal cryptographic protection in a unified secure biometric architecture. The consistently superior performance indicates that the proposed framework provides highly accurate biometric recognition while preserving privacy, validating integrity, and resisting identity forgery and reconstruction attacks.
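As a quick consistency check, the reported F1-scores can be recomputed from the reported Precision and Recall values using Equation (79). The sketch below uses only the percentages quoted in this subsection; the dictionary layout and the helper name `f1_from` are illustrative assumptions.

```python
# (precision %, recall %, reported F1 %) as quoted in Section 5.2.3.
reported = {
    "CNN": (96.8, 96.5, 96.6),
    "ResNet": (97.3, 97.1, 97.2),
    "Vision Transformer": (97.0, 96.8, 96.9),
    "ConvGRU": (96.7, 96.5, 96.6),
    "SBDP": (99.7, 99.6, 99.6),
}


def f1_from(p: float, r: float) -> float:
    """Equation (79); works on percentages because the formula is scale-invariant."""
    return 2 * p * r / (p + r)


recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

# Largest absolute difference between recomputed and reported F1 (in percentage points).
max_gap = max(abs(recomputed[name] - reported[name][2]) for name in reported)

for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.2f}%, reported = {reported[name][2]:.1f}%")
print(f"max gap = {max_gap:.3f} percentage points")
```

The recomputed values agree with the reported F1-scores to within about 0.05 percentage points, which is consistent with rounding to one decimal place.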

5.2.4. Multimodal Biometric Accuracy Comparison

Figure 3c shows the authentication accuracy of the CNN, ResNet, Vision Transformer, ConvGRU, and proposed SBDP models across five biometric configurations: Face Only, Retina Only, Fingerprint Only, Face + Retina, and the complete Proposed SBDP configuration. The y-axis is the authentication accuracy in percent, and the x-axis lists the biometric configurations used for evaluation.

For the Face Only modality, the CNN model achieved approximately 96.8% accuracy, and ResNet improved this to approximately 97.0% owing to its residual feature learning capability. The Vision Transformer achieved approximately 96.6%, slightly lower than the other methods under unimodal facial recognition. ConvGRU reached almost 96.9%, owing to its ability to model sequential feature dependencies. The proposed SBDP framework outperformed all comparative models with an accuracy of about 97.6%, highlighting the benefit of the proposed feature representation and secure biometric learning capability.

For the Retina Only modality, all models achieved higher recognition accuracy because of the uniqueness of retinal vascular patterns. The accuracy was about 97.1% for CNN, 97.4% for ResNet, 97.0% for the Vision Transformer, and 97.3% for ConvGRU. The proposed SBDP framework again performed best, with an accuracy of about 98.0%, indicating the effectiveness of optimized multimodal feature extraction and adaptive learning strategies.

For the Fingerprint Only modality, authentication accuracy improved further because fingerprint ridge structures are highly discriminative. The accuracy was around 97.4% for CNN, 97.8% for ResNet, 97.3% for the Vision Transformer, and 97.6% for ConvGRU. The proposed SBDP framework obtained the best performance, with an accuracy of around 98.3%, suggesting strong feature learning and improved recognition stability.

In the Face + Retina multimodal configuration, fusing multiple biometric sources improved the reliability of authentication and reduced recognition ambiguity. CNN achieved around 98.0%, ResNet around 98.3%, the Vision Transformer around 97.9%, and ConvGRU around 98.1%. The proposed SBDP framework achieved around 98.9%, indicating the effectiveness of multimodal biometric fusion and privacy-aware representation learning.

The complete multimodal configuration delivered the best recognition performance for every model. CNN achieved approximately 98.5% accuracy, ResNet approximately 98.8%, the Vision Transformer approximately 98.4%, and ConvGRU approximately 98.6%. The proposed SBDP framework achieved the highest overall accuracy of approximately 99.4%.

Overall, the results show that the proposed SBDP framework consistently outperforms traditional deep learning architectures across all biometric modalities. By combining multimodal biometric fusion, adaptive learning, GAN-based privacy preservation, and cryptographic protection, the framework improves authentication accuracy, robustness, and secure biometric recognition, and is therefore a promising candidate for real-world biometric security applications.

5.2.5. Encryption Efficiency and Security Analysis

Figure 3d compares the encryption efficiency and security robustness of AES, RSA, ECC, and the proposed SBDP framework. Two metrics are reported: the encryption time in milliseconds and the overall security score in percent. A shorter encryption time means better computational efficiency and faster processing, while a higher security score means stronger protection against cryptographic attacks, data leakage, and unauthorized access. AES had the shortest encryption time of about 11 ms, demonstrating high computational efficiency. However, its security score was 88%, indicating good symmetric encryption performance but relatively lower robustness against the advanced biometric privacy threats and multimodal reconstruction attacks considered in the proposed framework. RSA had the longest encryption time of about 29 ms, owing to the computational overhead of asymmetric key generation and large-integer arithmetic. Despite its higher processing cost, RSA achieved a security score of 91%, exceeding that of AES. Its computational complexity may nevertheless limit its applicability to real-time multimodal biometric systems with large-scale authentication requirements. The ECC-based method achieved a more balanced trade-off between efficiency and security, with an encryption time of 18 ms and a security score of 94%. ECC provides a comparable security level with smaller key lengths than RSA-based schemes, which reduces the cost of its operations. The proposed SBDP framework achieved the highest overall security score of 99% with a relatively low encryption time of 15 ms.

These results support the effectiveness of combining optimized multimodal biometric representation learning with ElGamal cryptographic safeguards and privacy-preserving mechanisms. The proposed framework improves security robustness over the traditional cryptographic solutions while retaining reasonable computational efficiency for real-time biometric authentication systems. In summary, the proposed SBDP framework achieves a good trade-off between encryption efficiency and security strength, making it a good candidate for secure multimodal biometric protection, privacy-preserving authentication, and real-time identity verification applications.
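One way to make the efficiency–security trade-off concrete is a simple Pareto-dominance check over the two reported metrics. The sketch below uses the encryption times and security scores quoted above; the `dominates` helper and the two-objective framing are illustrative assumptions and not part of the evaluation protocol described in this section.

```python
# (encryption time in ms, security score in %) as quoted in Section 5.2.5.
schemes = {
    "AES": (11, 88),
    "RSA": (29, 91),
    "ECC": (18, 94),
    "SBDP": (15, 99),
}


def dominates(a: tuple, b: tuple) -> bool:
    """True if scheme a is no slower and no less secure than b, and strictly better in one."""
    (time_a, score_a), (time_b, score_b) = a, b
    return time_a <= time_b and score_a >= score_b and (time_a < time_b or score_a > score_b)


# A scheme is Pareto-optimal if no other scheme dominates it.
pareto = [
    name
    for name, value in schemes.items()
    if not any(dominates(other, value) for other_name, other in schemes.items() if other_name != name)
]

print("Pareto-optimal schemes:", pareto)
```

Under these two metrics, SBDP dominates both RSA and ECC (faster and higher security score), while AES remains non-dominated only because it is the fastest option.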

5.2.6. ROC-AUC Performance Comparison

Figure 4a compares the ROC-AUC performance of the proposed SBDP framework with that of the baseline deep learning architectures used in biometric recognition and security analysis. The Area Under the Curve (AUC) measures a model’s ability to distinguish between genuine and fraudulent biometric samples. A higher AUC indicates stronger classification power, greater robustness, and lower false positive and false negative rates. The traditional CNN model achieved an AUC of 0.972, demonstrating good recognition ability through hierarchical feature extraction. However, its relatively lower score indicates limitations in capturing more complex multimodal biometric interdependencies and security-aware representations. The ResNet architecture achieved an AUC of 0.981, higher than the standard CNN because its residual learning mechanism improves gradient propagation and allows deeper feature representations to be learned. This improvement indicates better discrimination ability and greater stability during biometric classification. The Vision Transformer model achieved an AUC of 0.978, showing competitive performance through attention-based global feature modeling. Although the transformer architecture captures long-range feature dependencies effectively, its performance is slightly lower than that of ResNet because of its higher computational complexity and dependence on large-scale training optimization. The ConvGRU-based framework achieved an AUC of 0.975 and showed stable sequential and spatial learning capability. The integration of gated recurrent units improves temporal feature consistency, but the model shows slightly lower discrimination performance than the ResNet and Vision Transformer approaches. The proposed SBDP framework achieved the highest AUC of 0.998, outperforming all baseline methods.

This AUC performance shows the effectiveness of integrating optimized CNN-based multimodal feature extraction, GAN-based privacy-preserving representation learning, and ElGamal cryptographic protection into a unified framework. The AUC score confirms the framework’s ability to accurately distinguish biometric identities while simultaneously achieving privacy preservation, reconstruction attack resistance, and robust security performance.

5.2.7. Privacy Leakage Resistance Comparison

Figure 4b compares the privacy leakage resistance of different biometric learning and protection frameworks: CNN-only, ResNet, ConvGRU, existing GAN-based methods, and the proposed SBDP framework. The reported metric is the leakage probability, defined as the probability that sensitive biometric information can be reconstructed, inferred, or disclosed during storage, transmission, or authentication. Smaller leakage probabilities are desirable, as they imply better privacy protection and higher resistance to biometric reconstruction and inversion attacks. As shown in Figure 4b, the CNN-only framework has the highest leakage probability of about 22%, which indicates that traditional convolutional feature extraction cannot protect sensitive biometric representations from advanced inference and reconstruction attacks. Because the model lacks dedicated privacy-preserving mechanisms, the highly discriminative features that give CNNs their recognition ability also increase the risk of information exposure. The ResNet-based framework reduced the leakage probability to almost 18% owing to its deeper residual feature learning and better feature abstraction. The residual architecture improves representation quality and reduces the exposure of some low-level features, yet the framework still lacks explicit privacy-preserving transformation mechanisms for secure biometric protection. The ConvGRU-based method achieved a leakage probability of about 15%, better than the CNN-only and ResNet models. The recurrent gated units improve the modeling of temporal feature dependencies and mitigate the threat of direct reconstruction; however, the lack of synthetic privacy-preserving representation generation limits the method’s ability to fully protect biometric information. The existing GAN-based framework further reduced the leakage probability to almost 11%. The generative adversarial mechanism improves privacy protection by transforming biometric representations into synthetic latent spaces that are more resistant to inversion attacks. However, existing GAN-based approaches may still retain partial structural biometric information that can potentially be exploited under sophisticated reconstruction attempts.

The proposed SBDP framework achieved the lowest leakage probability of only 3%, a substantial improvement over all baseline methods. This reduction demonstrates the effectiveness of integrating optimized CNN-based multimodal feature extraction, GAN-driven privacy-preserving transformation, ElGamal encryption, and SHA-256-based integrity protection in a unified secure biometric architecture. The proposed framework minimizes the exposure of sensitive information while preserving recognition accuracy and authentication reliability. Overall, the results indicate that the proposed SBDP framework provides superior privacy preservation and strong resistance to biometric leakage, reconstruction, and identity inference attacks, making it well suited for secure multimodal biometric authentication and privacy-sensitive real-world applications.
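To quantify the improvement in relative terms, the following sketch computes the percentage reduction in leakage probability that the reported 3% represents with respect to each baseline. It uses only the approximate values quoted above; the variable names are illustrative.

```python
# Approximate leakage probabilities (%) as quoted in Section 5.2.7.
baseline_leakage = {
    "CNN-only": 22.0,
    "ResNet": 18.0,
    "ConvGRU": 15.0,
    "Existing GAN-based": 11.0,
}
sbdp_leakage = 3.0

# Relative reduction (%) = 100 * (baseline - SBDP) / baseline.
reduction = {
    name: 100.0 * (value - sbdp_leakage) / value
    for name, value in baseline_leakage.items()
}

for name, value in reduction.items():
    print(f"SBDP vs {name}: {value:.1f}% lower leakage probability")
```

Relative to the strongest baseline (the existing GAN-based method), the reduction is roughly 73%, and relative to the CNN-only framework it is roughly 86%.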

5.2.8. Computational Efficiency Comparison

Figure 4c compares the computational efficiency of the deep learning frameworks for multimodal biometric recognition and secure authentication. Efficiency is measured as the average processing time per biometric sample (in seconds per sample) for CNN, ResNet, Vision Transformer, ConvGRU, and the proposed SBDP framework. A shorter processing time means higher computational efficiency and greater potential for real-time execution, which is critical for practical biometric authentication systems. The standard CNN model had the shortest processing time of around 0.9 s/sample, reflecting its lightweight feature extraction. CNN architectures are generally less computationally intensive than deeper or transformer-based networks and are therefore suited to applications requiring fast inference. However, the lower computational complexity may limit the capacity to learn high-level multimodal biometric representations. The processing time of the ResNet framework was nearly 1.2 s/sample, owing to the increased depth and residual learning operations in the network. Although residual connections improve feature learning and recognition accuracy, the deeper architecture adds computational overhead during training and inference. The Vision Transformer model had the highest processing cost, approximately 1.5 s/sample. Transformer-based architectures rely heavily on self-attention mechanisms and global feature modeling, which increase computational complexity and memory consumption. While effective at learning contextual features, the increased processing requirements may limit scalability and efficiency in real-time deployment. The ConvGRU framework achieved a processing time of about 1.1 s/sample, a balanced trade-off between efficiency and sequential learning capability. The integration of gated recurrent units improves temporal dependency modeling while maintaining moderate computational overhead compared with transformer-based architectures.

The proposed SBDP framework achieved a processing time of approximately 1.4 s/sample, higher than the CNN, ConvGRU, and ResNet models but lower than the Vision Transformer, owing to the integration of multimodal fusion, GAN-based privacy-preserving transformation, cryptographic protection, and secure authentication operations. Despite the additional overhead, the proposed framework provides substantially stronger recognition accuracy, privacy preservation, and security robustness than the baseline approaches. The results suggest that the moderate additional processing cost is justified by the improvements in biometric recognition accuracy, privacy leakage resistance, cryptographic security, and multimodal authentication reliability.

5.2.9. Comparative Training Loss Analysis

Figure 5a compares the training loss of the CNN, ResNet, Vision Transformer, ConvGRU, and proposed SBDP models during multimodal biometric learning. The graph shows the convergence behavior and optimization efficiency of each model over 10,000 training iterations. The x-axis indicates the number of training iterations, and the y-axis indicates the corresponding loss values. Lower loss values indicate better convergence, improved feature learning, and greater optimization stability. The CNN model started with a loss of nearly 1.75, which decreased gradually as training progressed. Although the CNN architecture converged stably, its loss decreased more slowly than that of the proposed approach because conventional convolutional structures have limited capability to capture complex multimodal biometric dependencies. The ResNet model converged better than the CNN because residual connections ease gradient flow during optimization. Its loss decreased from about 1.60 to almost 0.35, indicating more efficient feature extraction and more stable learning. The Vision Transformer architecture had the highest initial loss of about 1.85, since transformer-based attention mechanisms are harder to optimize in the early stages of learning. Although its loss decreased gradually with increasing iterations, the Vision Transformer ended with a relatively higher final loss than the proposed framework, as the global attention operations are computationally expensive.

The ConvGRU model demonstrated moderate convergence performance with smoother optimization behavior. The recurrent gated learning mechanism enabled effective modeling of sequential feature dependencies, resulting in a gradual reduction of the loss from approximately 1.65 to around 0.38. This indicates improved temporal feature learning compared with traditional CNN architectures. The proposed SBDP framework achieved the best convergence performance among all evaluated methods. Its training loss dropped quickly from around 1.28 to around 0.18, lower than that of the CNN, ResNet, Vision Transformer, and ConvGRU models. The proposed framework converged faster in the early stage of training and maintained highly stable optimization behavior in the later iterations. This performance is attributed to optimized multimodal biometric fusion, adaptive deep feature representation learning, GAN-based secure transformation mechanisms, and enhanced privacy-preserving optimization strategies. Figure 5a demonstrates that the proposed SBDP framework provides faster convergence, lower final training loss, improved learning stability, and superior optimization efficiency compared with existing deep learning approaches. The reduced loss values support the capability of the proposed framework to learn highly discriminative and secure biometric representations for robust multimodal biometric authentication systems.

5.2.10. Comparative Generator Loss Analysis

Figure 5b compares the generator loss of the CNN, ResNet, Vision Transformer, ConvGRU, and proposed SBDP models during GAN-based multimodal biometric training. It shows the generator optimization behavior over 10,000 training iterations. The x-axis is the number of training iterations, and the y-axis is the generator loss. The generator loss reflects the model’s capability to generate realistic and privacy-preserving synthetic biometric representations under the adversarial training framework. For the CNN-based generator, the loss rose quickly from around 0.8 before stabilizing at approximately 3.9. Although the CNN learned biometric feature distributions, its limited feature abstraction capability led to a comparatively higher final generator loss. The ResNet model showed better generator stability, since the residual connections helped gradients propagate more smoothly during adversarial optimization. Its generator loss increased gradually and converged around 3.7, indicating better synthetic feature learning than the conventional CNN architecture. The Vision Transformer model produced the highest generator loss among all compared approaches. Its loss increased steadily beyond 4.0, reflecting the complexity of global self-attention optimization in adversarial settings. While the Vision Transformer represented long-range feature dependencies well, its optimization remained computationally expensive and unstable in GAN training. The ConvGRU framework showed relatively balanced generator learning behavior through recurrent gated feature modeling. Its generator loss converged at around 3.8, exhibiting better temporal feature consistency and more stable adversarial learning than the CNN and Vision Transformer models.

The proposed SBDP framework achieved the lowest and most stable generator loss over the whole training process: the loss started from around 0.45 and gradually stabilized around 3.5, lower than all the competing methods. The lower generator loss implies that the proposed framework can produce highly realistic, privacy-preserving, and discriminative biometric representations while ensuring stable adversarial optimization. This performance is primarily attributed to the optimized multimodal feature fusion, adaptive GAN-based transformation learning, secure biometric representation encoding, and efficient optimization strategies integrated into the proposed SBDP architecture. Overall, the results presented in Figure 5b show that the proposed SBDP framework has better generator learning stability, adversarial convergence behavior, and synthetic biometric generation ability than CNN, ResNet, Vision Transformer, and ConvGRU. The lower final generator loss supports the effectiveness of the proposed framework in secure and privacy-preserving multimodal biometric authentication applications.

5.2.11. Comparative Discriminator Loss Analysis

Figure 5c compares the discriminator loss of the CNN, ResNet, Vision Transformer, ConvGRU, and proposed SBDP models under GAN-based multimodal biometric training. It shows the discriminator’s capability to distinguish between original and synthetically generated biometric representations over 10,000 training iterations. The x-axis denotes the training iterations, and the y-axis shows the discriminator loss. Lower discriminator loss values indicate better adversarial learning stability, discrimination capability, and optimization efficiency. The CNN-based discriminator started with a relatively high loss of about 1.60, which decreased gradually during training and stabilized around 0.40. Although the CNN architecture can learn discriminative biometric patterns, it converges relatively slowly, as traditional convolutional structures have a limited capacity to model highly complex multimodal relationships. The ResNet model converged better than the CNN because its residual learning architecture allows efficient gradient propagation during optimization. Its discriminator loss decreased consistently from around 1.42 to about 0.30, showing more stable adversarial learning and better feature discrimination. The Vision Transformer architecture had the highest initial discriminator loss among the compared methods: it started above 1.65 and converged gradually to around 0.35 after a long period of training. Although the Vision Transformer can effectively capture global feature dependencies via self-attention, its optimization remained computationally complex during discriminator training. The ConvGRU model showed balanced discriminator learning behavior, with a gradual reduction in loss from approximately 1.48 to around 0.33. The recurrent gated learning mechanism enabled efficient temporal dependency modeling and smoother adversarial convergence compared with conventional CNN-based approaches.

The proposed SBDP framework achieved the lowest discriminator loss and the fastest convergence throughout the entire training process. Its discriminator loss decreased rapidly from approximately 1.18 to about 0.18, lower than that of the CNN, ResNet, Vision Transformer, and ConvGRU models. The lower loss values indicate that the proposed framework effectively distinguishes real from generated biometric samples and keeps the adversarial optimization stable. This performance is attributed to the optimized multimodal biometric fusion, secure GAN-based transformation learning, adaptive privacy-preserving feature encoding, and improved optimization methods used in the proposed SBDP architecture. Figure 5c shows that the proposed SBDP framework achieves better discriminator convergence behavior, adversarial learning stability, and biometric discrimination capability than existing deep learning methods. The lower discriminator loss supports the efficiency of the proposed framework for secure, reliable, and privacy-preserving multimodal biometric authentication systems.

5.2.12. Privacy Risk and Attack Resistance Analysis

Figure 6a shows the relative reconstruction risk of different biometric protection methods including Traditional Features, Cancelable Templates, Encrypted Templates and proposed SBDP framework. The reconstruction risk is described as the probability of reconstructing the original biometric information from the stored biometric template or feature representation. The lower the reconstruction risk, the higher the privacy protection and the better the resistance to the biometric reconstruction attacks. The results indicate that Traditional Features have the highest reconstruction risk of 85% which means the raw or directly extracted biometric features are highly susceptible to reconstruction attacks. This means that if an attacker gains access to these features, the original biometric information can be reconstructed with high accuracy. This shows the shortcomings of traditional biometric storage techniques storing the biometric templates without sufficient privacy-preserving transformations.

The use of Cancelable Templates reduces the reconstruction risk to 55%. Although cancelable biometrics apply irreversible transformations to the biometric templates, a moderate level of reconstruction vulnerability remains. Encrypted Templates have a lower reconstruction risk of 35%, which shows the benefit of encryption for template security. However, encryption does not fully prevent reconstruction of the template if the protected representation is compromised.

In contrast, the proposed SBDP framework achieves the lowest reconstruction risk of only 10%, showing significant improvement over existing methods. This is primarily because of the GAN-based synthetic biometric embedding generation mechanism. Instead of storing the original biometric features, the proposed framework stores privacy-preserving synthetic embeddings generated from latent feature representations. Consequently, the relationship between the stored representation and the original biometric information becomes highly non-linear and difficult to reverse. The results indicate that the proposed framework significantly minimizes the risk of reconstructing the original biometric data and thereby enhances privacy preservation.

Figure 6b compares the relative privacy leakage of different biometric feature protection approaches. Privacy leakage is the amount of sensitive biometric information that can potentially be leaked or inferred from the stored templates or feature representations. Smaller privacy leakage values mean stronger protection against unauthorized disclosure of biometric information. The results show that CNN Features exhibit the highest privacy leakage, 80%. Although CNN-based features provide excellent discriminative capability for biometric recognition, they retain a significant amount of identity-related information that adversaries may exploit. Consequently, directly storing CNN features poses substantial privacy risks.

Feature Hashing reduces privacy leakage to 45%, as the hashing process partially obscures the original biometric information. However, feature hashing may still retain correlations that leak sensitive information under sophisticated attacks. GAN Synthetic Features achieve a much lower privacy leakage of 18%, which shows the effectiveness of adversarial feature transformation in limiting information disclosure. By generating synthetic biometric embeddings instead of storing the original features, GAN-based representations hide sensitive biometric features while preserving the discriminative information needed for authentication. The proposed SBDP framework achieves the lowest privacy leakage, 8%. The results show that combining the GAN-based privacy-preserving transformation with ElGamal encryption and SHA-256 integrity verification significantly reduces the amount of leaked information, which makes the approach a candidate for privacy-preserving biometric systems. The generated synthetic embedding protects privacy by avoiding direct exposure of the original biometric information while maintaining authentication performance. Therefore, the proposed framework improves privacy preservation compared to conventional feature extraction and template protection methods.
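The percentages reported for Figure 6a and Figure 6b can also be read as relative reductions with respect to the weakest method in each group. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of the SBDP framework, and the input values are the ones quoted in the text above.

```python
# Values (in %) as reported in the text for Figure 6a and Figure 6b.
reconstruction_risk = {
    "Traditional Features": 85,
    "Cancelable Templates": 55,
    "Encrypted Templates": 35,
    "Proposed SBDP": 10,
}
privacy_leakage = {
    "CNN Features": 80,
    "Feature Hashing": 45,
    "GAN Synthetic Features": 18,
    "Proposed SBDP": 8,
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


def reductions(table):
    """Reduction of every method relative to the first (weakest) entry."""
    baseline = next(iter(table.values()))
    return {name: round(relative_reduction(baseline, v), 1)
            for name, v in table.items()}


if __name__ == "__main__":
    for title, table in (("Reconstruction risk", reconstruction_risk),
                         ("Privacy leakage", privacy_leakage)):
        print(title)
        for name, pct in reductions(table).items():
            print(f"  {name}: {pct}% lower than the first entry")
```

Under this reading, the reported SBDP values correspond to roughly an 88% lower reconstruction risk than Traditional Features and a 90% lower privacy leakage than CNN Features.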

Figure 6c shows the expected resistance of the proposed SBDP framework against four privacy attacks: Reconstruction Attack, Model Inversion Attack, Replay Attack, and Privacy Leakage Attack. The resistance level reflects the theoretical ability of the framework to withstand attempts to compromise biometric privacy. The framework provides 96% resistance against Reconstruction Attacks. This is attributed to the GAN-based synthetic embedding generation process, which removes directly reconstructive information from the stored biometric templates, so attackers cannot easily recover the original biometric samples from the synthetic representations. For Model Inversion Attacks, which attempt to reconstruct sensitive input data from the learned model or feature representation, the framework achieves a resistance level of 94%. Adding stochastic noise during adversarial feature generation and using synthetic latent embeddings reduce the amount of information an attacker can exploit, which improves inversion resistance. Resistance is highest against Replay Attacks, at 98%. Replay attacks reuse previously intercepted biometric templates or authentication messages to gain unauthorized access. ElGamal probabilistic encryption and SHA-256 integrity verification together introduce randomness and cryptographic protection that prevent replayed biometric templates from being accepted by the authentication system. Finally, the framework offers 95% resistance to Privacy Leakage Attacks. The privacy-preserving synthetic representation generated by the GAN reduces identity leakage, and cryptographic protection provides confidentiality and integrity during storage and transmission.
Table 2 summarizes the conceptual privacy preservation and attack resistance analysis of the proposed SBDP framework. Appendix A provides the performance comparison across deep learning models.
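The two cryptographic properties invoked in this analysis, the probabilistic nature of ElGamal encryption and SHA-256 integrity checking, can be illustrated with a small sketch. This is textbook ElGamal over a toy group, not the authors' implementation: the modulus `P`, the base `G`, the 12-byte `template`, and all function names are assumptions chosen for illustration, and the fixed `k` values are used only to make the demonstration deterministic. The sketch does not model a freshness check such as a nonce or timestamp, which this section does not specify.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumption): a Mersenne prime modulus
# and a small base. A real deployment would use a vetted group and library.
P = 2**127 - 1
G = 3


def keygen():
    """Return (private_key, public_key) for textbook ElGamal."""
    x = secrets.randbelow(P - 2) + 1          # private exponent in [1, P-2]
    return x, pow(G, x, P)


def encrypt(m, public_key, k=None):
    """Encrypt an integer 0 < m < P; a fresh random k makes the output vary."""
    if k is None:
        k = secrets.randbelow(P - 2) + 1      # ephemeral exponent
    c1 = pow(G, k, P)
    c2 = (m * pow(public_key, k, P)) % P
    return c1, c2


def decrypt(ciphertext, private_key):
    """Recover m by dividing out the shared secret c1^x."""
    c1, c2 = ciphertext
    shared = pow(c1, private_key, P)
    return (c2 * pow(shared, -1, P)) % P      # modular inverse (Python 3.8+)


def digest(data: bytes) -> str:
    """SHA-256 digest used as an integrity check value."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical 12-byte stand-in for a protected biometric template.
template = bytes(range(1, 13))
m = int.from_bytes(template, "big")

priv, pub = keygen()
ct_a = encrypt(m, pub, k=5)   # fixed k only to keep the demo deterministic
ct_b = encrypt(m, pub, k=7)
stored_digest = digest(template)

recovered = decrypt(ct_a, priv).to_bytes(len(template), "big")

if __name__ == "__main__":
    print("ciphertexts differ:", ct_a != ct_b)
    print("round trip ok:", recovered == template)
    print("digest matches:", digest(recovered) == stored_digest)
```

Two encryptions of the same template produce different ciphertexts because the ephemeral exponent differs, while both decrypt to the same value, and any modification of the recovered bytes changes the SHA-256 digest.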

6. Discussion of Results

Experimental results indicate that the proposed Secure Biometric Data Protection framework outperforms the compared deep learning and biometric protection methods in terms of biometric recognition accuracy, privacy preservation, cryptographic security, and adversarial robustness. The combination of optimized CNN-based multimodal feature extraction, GAN-based synthetic biometric transformation, and ElGamal cryptographic protection enables the framework to address the major limitations of traditional biometric authentication systems. The training accuracy analysis shows that the proposed framework converges faster and achieves higher recognition accuracy than the CNN, ResNet, Vision Transformer, and ConvGRU architectures. The proposed SBDP framework achieved an average classification accuracy of around 99.8%, which shows that the multimodal fusion of face, retina, and fingerprint modalities can significantly improve identity discrimination and authentication reliability. The improved convergence behavior further indicates the stability of the proposed optimization strategy and the effectiveness of adaptive multimodal feature learning. The precision, recall, F1-score, and ROC-AUC results reflect the discriminative capability of the proposed framework, which outperforms the baseline methods on all of these metrics and yields lower false positive and false negative rates. In particular, the ROC-AUC score of approximately 0.998 indicates that the proposed architecture reliably discriminates authentic biometric identities from fraudulent or manipulated samples under secure multimodal authentication settings. The results show that integrating optimized deep feature learning with privacy-preserving representation transformation improves both recognition robustness and classification stability.
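As a reminder of what the ROC-AUC figure measures, the value can be interpreted as the probability that a randomly chosen genuine sample receives a higher match score than a randomly chosen impostor sample, with ties counted as one half. The sketch below computes it directly from that definition; the score lists are made-up illustrative values, not data from this study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(genuine_scores, impostor_scores):
    """ROC-AUC as the fraction of (genuine, impostor) pairs ranked correctly.

    A pair counts 1 if the genuine score is higher, 0.5 on a tie, 0 otherwise.
    """
    wins = 0.0
    for g in genuine_scores:
        for i in impostor_scores:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine_scores) * len(impostor_scores))


if __name__ == "__main__":
    # Illustrative scores only (assumption): higher means "more likely genuine".
    genuine = [0.9, 0.8, 0.7]
    impostor = [0.1, 0.2, 0.75]
    print(roc_auc(genuine, impostor))
```

In this toy example eight of the nine genuine/impostor pairs are ordered correctly, so the AUC is 8/9; a value near 0.998 means almost every such pair is ordered correctly.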
The GAN-based privacy-preserving transformation mechanism showed strong resistance to biometric reconstruction and information leakage attacks. The proposed framework reduces the biometric leakage probability to nearly 3%, which is lower than that of the CNN-only, ResNet, ConvGRU, and existing GAN-based methods. The adversarial learning strategy transforms the biometric features into synthetic representations while preserving their identity-discriminative characteristics. This indicates that the proposed framework can maintain authentication accuracy and privacy protection at the same time, without directly exposing sensitive biometric templates. The cryptographic analysis indicates that the proposed SBDP framework offers a favorable trade-off between encryption efficiency and security strength. Although the framework incurs slightly more computational overhead due to multimodal fusion, GAN-based transformation, ElGamal encryption, and SHA-256 signature generation, the attained security score of about 99% supports the effectiveness of the proposed secure authentication pipeline. The probabilistic property of ElGamal encryption also increases resilience against ciphertext analysis and replay attacks, and SHA-256-based verification protects the integrity and authenticity of biometric data during storage and transmission. The stability of the proposed adversarial learning framework was further confirmed by the training, generator, and discriminator loss analyses. The proposed model exhibited lower and more stable convergence curves than the baseline methods, suggesting efficient optimization and well-balanced GAN training behavior. The reduction of the generator and discriminator losses indicates that the proposed architecture can produce realistic privacy-preserving biometric representations, achieve stable adversarial learning, and minimize reconstruction risks.
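For readers unfamiliar with the generator and discriminator losses referred to here, the sketch below shows the standard binary cross-entropy GAN objective with the non-saturating generator loss. This is the common textbook formulation and is an assumption on our part; it is not claimed to be the exact loss used by the SBDP framework, and the function names and inputs are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with real samples labeled 1 and generated ones 0.

    d_real and d_fake are discriminator outputs in (0, 1).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A discriminator that cannot tell real from generated outputs 0.5 everywhere.
    balanced = [0.5, 0.5, 0.5]
    print(discriminator_loss(balanced, balanced))  # equals 2 * ln(2)
    print(generator_loss(balanced))                # equals ln(2)
```

Under this formulation, a balanced adversarial game corresponds to a discriminator loss near 2 ln 2 and a generator loss near ln 2, which is one reference point for judging whether loss curves reflect well-balanced training.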
Despite the promising performance, the proposed framework has some limitations. The integration of multimodal feature extraction, GAN-based transformation, encryption, and signature verification introduces moderate computational overhead compared with lightweight unimodal biometric systems. In addition, the current framework was evaluated using face, retina, and fingerprint modalities only, and performance may vary when additional biometric traits or extremely large-scale real-world deployments are considered. Furthermore, GAN-based adversarial training requires careful parameter tuning to maintain stable optimization under different dataset distributions. These limitations are relatively minor compared with the improvements achieved in biometric recognition accuracy, privacy preservation, cryptographic protection, and secure authentication reliability. A further limitation of the present study is that the face, fingerprint, and retina modalities are obtained from independent public datasets and therefore do not represent naturally paired multimodal biometric identities. Although the representation-level fusion strategy enables evaluation of the proposed privacy-preserving and cryptographic framework, future work will validate the SBDP framework on a real multimodal biometric dataset containing multiple biometric traits acquired from the same individuals, in order to further investigate identity-level multimodal authentication performance.

7. Conclusions and Future Work

This section concludes the article and outlines future directions.

7.1. Conclusions

In this paper, we proposed a new framework for biometric data security that combines optimized CNN-based multimodal feature extraction, GAN-based privacy-preserving transformation, ElGamal encryption, and SHA-256 digital signature verification in a unified secure biometric authentication system. The framework addresses the main challenges of traditional biometric systems, such as biometric leakage, spoofing, reconstruction attacks, and insecure transmission of biometric data. Experimental results indicated that the proposed SBDP framework consistently outperformed the CNN, ResNet, Vision Transformer, and ConvGRU models across all evaluation metrics. The framework achieved about 99.8% classification accuracy, 99.7% precision, 99.6% recall, a 99.6% F1-score, and a ROC-AUC score of approximately 0.998. In the multimodal biometric evaluation, fusing the face, retina, and fingerprint modalities yielded about 99.4% authentication accuracy. The GAN-based privacy-preserving mechanism reduced the probability of biometric leakage to nearly 3%, whereas the combined ElGamal encryption and SHA-256 verification provided high confidentiality, integrity, and authentication reliability. Although the framework introduced moderate computational overhead, the additional processing cost remained practical given the improvements in biometric security, privacy preservation, and recognition performance. In summary, the proposed SBDP framework offers a robust, accurate, and secure solution for real-world biometric authentication and identity protection applications.

7.2. Future Work

Future work includes reducing computational complexity and improving efficiency for real-time deployment on edge devices, IoT systems, and mobile biometric platforms. The robustness and scalability of authentication can be enhanced by incorporating other biometric modalities such as iris, voice, palm vein, and behavioral biometrics. Future research can also explore quantum-resilient cryptographic techniques, blockchain-assisted biometric management, and federated learning-based secure biometric authentication. Moreover, advanced adversarial defense strategies and explainable AI techniques can be integrated to improve robustness against spoofing, deepfake, and reconstruction attacks and to increase the transparency and trustworthiness of biometric decision-making systems.

Author Contributions

Conceptualization, A.R. and T.C.; methodology, S.T. and A.R.; software, A.R., Z.T. and Y.C.; validation, S.T., A.R., Z.T. and D.S.M.H.; formal analysis, A.R. and D.S.M.H.; investigation, A.R.; resources, D.S.M.H.; data curation, S.T., T.C. and A.R.; writing—original draft preparation, A.R., T.C. and Y.C.; writing—review and editing, S.T., A.R., T.C., Y.C., Z.T. and D.S.M.H.; visualization, A.R.; supervision, A.R.; project administration, A.R. and T.C.; funding acquisition, A.R.,T.C. and D.S.M.H. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R751), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The original data presented in the study are openly available in the CelebA dataset at https://www.kaggle.com/datasets/jessicali9530/celeba-dataset.

Acknowledgments

The authors would like to acknowledge the support of Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R751), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1 shows the training convergence behavior of the different deep learning models in terms of first-epoch and final-epoch accuracy. The proposed SBDP model has the highest final training accuracy, about 99.8%, which indicates better learning ability and convergence than CNN, ResNet, Vision Transformer, and ConvGRU. The results suggest that the proposed framework captures complex biometric patterns more effectively during training. Table A2 compares classification performance in terms of precision, recall, and F1-score. The proposed SBDP model obtains the highest score on all three measures, with a precision of 99.7%, a recall of 99.6%, and an F1-score of 99.6%, which indicates that the method identifies legitimate users accurately with few classification errors. Table A3 gives the authentication accuracy for different biometric modalities: face, retina, fingerprint, and multimodal combinations. All models perform better when multiple biometric traits are fused. The proposed SBDP framework achieves the best accuracy for every modality and reaches 99.4% for the complete multimodal framework, which shows how multimodal biometric fusion can improve the reliability and security of authentication. Table A4 compares encryption methods in terms of computational efficiency and security strength. Although AES has the shortest encryption time (about 11 ms), the proposed SBDP approach reaches the highest security score, 99%, with a competitive encryption time of around 15 ms. This shows that the proposed framework balances security enhancement against practical computational overhead.

Table A5 evaluates the privacy-preserving ability and detection performance of the different methods in terms of AUC, leakage probability, and processing time. The proposed SBDP model achieves an AUC of 0.998 and reduces the privacy leakage probability to around 3%, the lowest value among the compared approaches, with a processing time of about 1.4 s per sample. The results show that the proposed framework achieves better authentication accuracy and privacy protection without introducing excessive processing delay.

Table A1. Training accuracy comparison across models.

Model	Epoch 1 Accuracy (%)	Final Epoch Accuracy (%)
CNN	≈82.4	≈97.1
ResNet	≈84.1	≈97.8
Vision Transformer	≈79.5	≈97.4
ConvGRU	≈81.8	≈97.0
Proposed SBDP	≈88.6	≈99.8

Table A2. Precision, recall, and F1-score comparison.

Model	Precision (%)	Recall (%)	F1-Score (%)
CNN	96.8	96.5	96.6
ResNet	97.3	97.1	97.2
Vision Transformer	97.0	96.8	96.9
ConvGRU	96.7	96.5	96.6
Proposed SBDP	99.7	99.6	99.6
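Because the F1-score is the harmonic mean of precision and recall, the rows of Table A2 can be checked for internal consistency. The sketch below recomputes F1 from the reported precision and recall and compares it with the reported F1 to within half a unit of the last reported digit; the helper names are illustrative, and the numbers are copied from Table A2.

```python
# (precision %, recall %, reported F1 %) copied from Table A2.
table_a2 = {
    "CNN": (96.8, 96.5, 96.6),
    "ResNet": (97.3, 97.1, 97.2),
    "Vision Transformer": (97.0, 96.8, 96.9),
    "ConvGRU": (96.7, 96.5, 96.6),
    "Proposed SBDP": (99.7, 99.6, 99.6),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


def consistent_rows(table, tolerance=0.05):
    """Map each model to True if the recomputed F1 matches the reported one."""
    return {
        name: abs(f1_score(p, r) - reported) <= tolerance
        for name, (p, r, reported) in table.items()
    }


if __name__ == "__main__":
    for name, (p, r, reported) in table_a2.items():
        print(f"{name}: recomputed {f1_score(p, r):.2f}, reported {reported}")
```

With the values in Table A2, every recomputed F1-score agrees with the reported one at one-decimal precision.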

Table A3. Authentication accuracy across biometric modalities.

Modality	CNN (%)	ResNet (%)	Vision Transformer (%)	ConvGRU (%)	Proposed SBDP (%)
Face Only	96.8	97.0	96.6	96.9	97.6
Retina Only	97.1	97.4	97.0	97.3	98.0
Fingerprint Only	97.4	97.8	97.3	97.6	98.3
Face + Retina	98.0	98.3	97.9	98.1	98.9
Complete Multimodal Framework	98.5	98.8	98.4	98.6	99.4

Table A4. Encryption efficiency and security comparison.

Method	Encryption Time (ms)	Security Score (%)
AES	≈11	88
RSA	≈29	91
ECC	≈18	94
Proposed SBDP	≈15	99

Table A5. AUC, privacy leakage, and processing time comparison.

Model/Method	AUC	Leakage Probability (%)	Processing Time (Sec/Sample)
CNN/CNN-only	0.972	≈22	≈0.9
ResNet	0.981	≈18	≈1.2
Vision Transformer	0.978	≈20	≈1.5
ConvGRU	0.975	≈15	≈1.1
Existing GAN-based method	0.973	≈11	≈1.4
Proposed SBDP	0.998	≈3	≈1.4

References

1. Alajlan, A.M.; Razaque, A. A Quantum-Enhanced Biometric Fusion Network for Cybersecurity Using Face and Voice Recognition. Comput. Model. Eng. Sci. (CMES) 2025, 145, 919–946.
2. Tran, H.Y.; Hu, J.; Hu, W. Biometrics-Based Authenticated Key Exchange with Multi-Factor Fuzzy Extractor. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9344–9358.
3. Wang, M.; Yin, X.; Hu, J. Cancellable Deep Learning Framework for EEG Biometrics. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3745–3757.
4. Osorio-Roig, D.; González-Soler, L.J.; Rathgeb, C.; Busch, C. Privacy-Preserving Multi-Biometric Indexing Based on Frequent Binary Patterns. IEEE Trans. Inf. Forensics Secur. 2024, 19, 4835–4850.
5. Chen, Z.; Yao, Z.; Jin, B.; Lin, M.; Ning, J. FIBNet: Privacy-Enhancing Approach for Face Biometrics Based on the Information Bottleneck Principle. IEEE Trans. Inf. Forensics Secur. 2024, 19, 8786–8801.
6. Van Hamme, T. A Novel Evaluation Framework for Biometric Security: Assessing Guessing Difficulty as a Metric. IEEE Trans. Inf. Forensics Secur. 2024, 19, 8369–8384.
7. Andas, A.; Sabitov, A.; Abdul, R.; Ajmal Khan, M. Hybrid Graphical Password Authentication System Using Intuitive Approach. In Proceedings of the 2025 1st International Conference on Secure IoT, Assured and Trusted Computing (SATC), Dayton, OH, USA, 25–27 February 2025; IEEE: New York, NY, USA, 2025; pp. 1–5.
8. Tian, Y. Cross-Optical Property Image Translation for Face Anti-Spoofing: From Visible to Polarization. IEEE Trans. Inf. Forensics Secur. 2025, 20, 1192–1205.
9. Jiang, F. Cross-Scenario Unknown-Aware Face Anti-Spoofing with Evidential Semantic Consistency Learning. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3093–3108.
10. Zheng, T. MFAE: Masked Frequency Autoencoders for Domain Generalization Face Anti-Spoofing. IEEE Trans. Inf. Forensics Secur. 2024, 19, 4058–4069.
11. Rezgui, Z.; Strisciuglio, N.; Veldhuis, R. Gender Privacy Angular Constraints for Face Recognition. IEEE Trans. Biom. Behav. Identity Sci. 2024, 6, 352–363.
12. Grosz, S.A.; Jain, A.K. AFR-Net: Attention-Driven Fingerprint Recognition Network. IEEE Trans. Biom. Behav. Identity Sci. 2024, 6, 30–42.
13. Liu, Y.; Cheng, K.H.M.; Savic, M. 3D Face De-Identification with Preserving Multi-Facial Attributes: A Benchmark. IEEE Trans. Biom. Behav. Identity Sci. 2025, 7, 681–694.
14. Zhang, Y.; Ji, J.; Wang, T.; Zhao, R.; Wen, W.; Xiang, Y. Make Identity Indistinguishable: Utility-Preserving Face Dataset Publication with Provable Privacy Guarantees. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 48, 127–139.
15. Wang, T.; Wen, W.; Xiao, X.; Hua, Z.; Zhang, Y.; Fang, Y. Beyond Privacy: Generating Privacy-Preserving Faces Supporting Robust Image Authentication. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2564–2576.
16. Pan, Z.; Jiang, S.; Yang, X.; Yuan, H.; Wang, J. Hierarchical Cross-Modal Image Generation for Multimodal Biometric Recognition with Missing Modality. IEEE Trans. Inf. Forensics Secur. 2025, 20, 4308–4321.
17. Li, S.; Zhang, B.; Hu, Q. Dual-Cohesion Metric Learning for Few-Shot Hand-Based Multimodal Recognition. IEEE Trans. Inf. Forensics Secur. 2025, 20, 3566–3575.
18. Luo, D.; Huang, J.; Yang, W.; Shakeel, M.S.; Kang, W. RSNet: Region-Specific Network for Contactless Palm Vein Authentication. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2734–2747.
19. Shao, X.; Chang, C.; Gan, J.Q.; Wang, H. An Interpretable Contrastive Learning Transformer for EEG-Based Person Identification. IEEE Trans. Inf. Forensics Secur. 2025, 20, 5069–5082.
20. Shi, Z.; Li, F.; Hao, D.; Sun, Q. Handwritten Signature Verification via Multimodal Consistency Learning. IEEE Trans. Inf. Forensics Secur. 2025, 20, 3995–4007.
21. Liu, Y.; Li, Z.; Wu, L. Dual Consistency Regularization for Generalized Face Anti-Spoofing. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2171–2183.

Figure 1. Proposed optimized CNN-based secure multimodal biometric authentication framework integrating face, retina, and fingerprint modalities.

Figure 2. Detailed system framework and data-flow architecture of the proposed Secure Biometric Data Protection framework showing the sequential interaction among multimodal preprocessing, OCNN feature extraction, GAN-based privacy-preserving synthetic feature generation, multimodal fusion, ElGamal encryption, SHA-256 digital signature generation, and secure biometric authentication.

Figure 3. (a) Comparison of training accuracy across different deep learning models, highlighting the improved convergence and superior performance of the proposed SBDP framework; (b) Comparison of precision, recall, and F1-score for different deep learning models, demonstrating the superior classification performance of the proposed SBDP framework; (c) Comparison of classification accuracy across different biometric modalities and deep learning models, demonstrating the superior multimodal authentication performance of the proposed SBDP framework; and (d) Comparison of encryption time and security score for different cryptographic methods, demonstrating the efficiency and enhanced security performance of the proposed SBDP framework.

Figure 4. (a) Comparison of processing time per sample for different deep learning models, demonstrating the computational efficiency of the proposed SBDP framework; (b) Comparison of area under the curve scores for different deep learning models, demonstrating the superior classification performance of the proposed SBDP framework; (c) Comparison of biometric data leakage probability for different methods, demonstrating the enhanced privacy preservation capability of the proposed SBDP framework.

Figure 5. (a) Training loss comparison of different deep learning models, showing improved convergence performance of the proposed SBDP framework; (b) Comparison of generator loss convergence for different deep learning models, demonstrating stable adversarial training performance of the proposed SBDP framework; (c) Comparison of discriminator loss convergence for different deep learning models, highlighting the stable and efficient adversarial learning performance of the proposed SBDP framework.

Figure 6. Theoretical privacy analysis of the proposed SBDP framework: (a) reconstruction risk comparison, (b) privacy leakage comparison, and (c) resistance against major privacy attacks. The proposed SBDP framework achieves the lowest reconstruction risk and privacy leakage while demonstrating high resistance to reconstruction, model inversion, replay, and privacy leakage attacks.

Table 1. Comparative analysis of existing biometric security frameworks and the proposed SBDP framework.

Methods	Application Domain	Data Processing Mode	Core Analytical Technique	Multimodal Support	Scalability	Privacy/Template Protection	Encryption/Security	Decision Support Readiness	Key Limitations
Wang et al. [15]	Multimodal biometrics (iris + fingerprint)	Feature extraction + fusion	CNN + cancelable transformation	Yes	Moderate	Strong (template revocability)	No explicit encryption	Low (template-level only)	No cryptographic layer; no integrity verification; no synthetic privacy
Zaiyu et al. [16]	Multimodal authentication	Preprocessing + feature extraction	ConvGRU + hashing	Yes	Moderate	Moderate (hash-based protection)	Weak (hashing only)	Medium (recognition-focused)	No public-key encryption; no signature; no GAN-based privacy
Shuyi et al. [17]	Biometric authentication (face + dorsal hand)	Blockchain-based processing	Fuzzy vault + blockchain	Yes	Low–Moderate	High (decentralized protection)	Blockchain security	Medium (secure authentication)	High overhead; not suitable for real-time; no AI-based feature optimization
Dacan et al. [18]	IoT-based biometric systems	Lightweight distributed processing	Distance-preserving hashing	Limited	High (IoT scaling)	Moderate (template protection)	No strong encryption	Low	Vulnerable to correlation/inversion attacks; no GAN; no signature mechanism
Xinghan et al. [19]	Finger-vein authentication	Masked biometric processing	CNN + random masking	No (single modality)	High	High (cancelable templates)	Implicit masking security	Medium	Modality-specific; no multimodal fusion; no cryptographic authentication
Zhaosen et al. [20]	Face recognition (cloud-based)	Encrypted feature processing	Deep embeddings + homomorphic encryption	No	Moderate	High (privacy-preserving matching)	Strong (CKKS encryption)	Medium–High	Computationally expensive; no multimodal fusion; no GAN integration
Yongluo et al. [21]	Secure biometric authentication	Cryptographic template processing	Lattice-based + homomorphic encryption	Limited	Low–Moderate	Very High (quantum-resistant)	Very strong (post-quantum)	Medium	High complexity; lacks AI-based feature extraction; no GAN-based privacy
Proposed SBDP Framework	Multimodal biometrics (face, retina, fingerprint)	End-to-end integrated pipeline	OCNN + GAN + cryptography	Yes	High (modular architecture)	Very High (synthetic + protected features)	Strong (ElGamal + SHA-256)	High (secure decision-ready output)	Computational overhead under large-scale deployment

Table 2. Conceptual privacy preservation and attack resistance analysis of the proposed SBDP framework.

Security Objective	Compared Method/Attack	Performance (%)	Observation
Reconstruction Risk	Traditional Features	85	Highest reconstruction risk
	Cancelable Templates	55	Moderate reconstruction risk
	Encrypted Templates	35	Reduced reconstruction risk
	Proposed SBDP Framework	10	Lowest reconstruction risk
Privacy Leakage	CNN Features	80	High privacy leakage
	Feature Hashing	45	Moderate privacy leakage
	GAN Synthetic Features	18	Low privacy leakage
	Proposed SBDP Framework	8	Lowest privacy leakage
Attack Resistance	Reconstruction Attack	96	Strong resistance
	Model Inversion Attack	94	Strong resistance
	Replay Attack	98	Highest resistance
	Privacy Leakage Attack	95	Strong resistance

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

