1. Introduction
Biometric authentication is widely used in contemporary security systems. It offers reliable and convenient identity verification based on unique physiological and behavioral characteristics such as facial images, fingerprints, and retinal patterns. These systems are used extensively in critical sectors such as defense, healthcare, financial services, and digital identity management. Despite these advantages, existing systems have security and privacy weaknesses that may lead to template leakage, spoofing, and identity theft. Unlike traditional authentication data, biometric features are non-revocable and irreversible, which makes protecting them against reconstruction attacks and misuse a serious challenge [1].

Recent advances in artificial intelligence, especially deep learning, have greatly improved the accuracy of biometric recognition systems. Convolutional neural networks (CNNs) extract discriminative, hierarchical information from complex biometric data, enabling robust recognition under varying environmental conditions. Advanced techniques use multimodal learning and richer representations to improve accuracy and robustness against spoofing attacks [2]. Multimodal biometric systems, which combine multiple sources of information, have shown better performance than unimodal systems by reducing ambiguity and increasing robustness against malicious interference [3]. However, fusing multiple biometric modalities introduces further challenges in privacy preservation, secure storage, and transmission. Recent work has shown that deep feature representations can still leak sensitive information and are prone to reconstruction attacks, exposing biometric identities to adversaries [4,5]. In addition, emerging threats such as presentation attacks and domain generalization issues still affect the reliability of biometric systems in real-world deployments [6,7]. These limitations motivate secure frameworks that achieve high recognition accuracy while providing strong guarantees of confidentiality, integrity, and authenticity.

Significant efforts have concentrated on employing cryptographic approaches in biometric systems to address these concerns. Public-key cryptosystems and secure authentication techniques have been proposed to guarantee confidentiality and resistance against interception attacks, protecting biometric templates from compromise during storage and transmission [8,9]. Privacy-preserving indexing and template protection mechanisms have been proposed to reduce the risks of template leakage and illegal reconstruction [10]. However, traditional cryptographic methods alone are not sufficient to resolve privacy issues, especially when raw biometric data or feature representations are exposed during processing.

Meanwhile, privacy-preserving machine learning has become a promising direction for improving biometric security. Generative models and synthetic data generation techniques can convert sensitive biometric data into non-reversible representations while keeping the key identity information. Recent works have shown that privacy-enhancing transformations, such as feature bottleneck learning and biometric de-identification frameworks, can significantly reduce the risk of identity leakage while retaining recognition performance [11,12]. Advances in robust feature learning and privacy-aware data publishing can further improve the security of biometric systems against adversarial threats [13,14].

Motivated by these challenges, we propose a secure biometric data protection (SBDP) framework that integrates optimized deep learning, generative modelling, and cryptographic security in a unified end-to-end architecture. The framework addresses the main drawbacks of existing biometric systems through improved feature representation, reduced exposure of sensitive biometric data, and secure storage and transmission. It combines optimized convolutional neural networks for discriminative feature extraction, generative modelling for privacy preservation, and cryptographic mechanisms for data protection. This work aims to bridge the gap between high-performance biometric recognition and strong security guarantees. The proposed system treats recognition and protection, traditionally handled as two separate problems, within a single solution, with the goals of improving robustness against adversarial attacks, reducing the possibility of biometric data leakage, and supporting reliable identity verification in real-world scenarios. This makes the framework particularly relevant for deployment in security-critical settings, where both recognition accuracy and data protection are of paramount importance. This paper is organized as follows:
Section 2 provides a literature review on biometric encryption and AI applications in security.
Section 3 formulates the problem and presents the mathematical model.
Section 4 outlines the proposed framework.
Section 5 describes the experimental setup and the results.
Section 6 discusses the results and limitations.
Section 7 concludes with implications, limitations and future directions.
1.1. Main Contributions
The major contributions of this article are summarized as follows:
- A unified optimization framework for secure multimodal biometric authentication is proposed. Instead of optimizing recognition accuracy, privacy preservation, cryptographic confidentiality, and authentication integrity separately, as in traditional biometric systems, the proposed framework addresses all of them together.
- We propose a GAN-based privacy-aware latent transformation mechanism that generates synthetic biometric embeddings which preserve identity-discriminative information while reducing the biometric reconstruction risk, privacy leakage, and susceptibility to inversion attacks.
- An adaptive multimodal biometric fusion strategy based on secure latent feature learning is proposed for facial, retinal, and fingerprint modalities to improve authentication accuracy, robustness against spoofing attacks, and resilience to noisy or partially corrupted biometric inputs.
- We propose a secure end-to-end AI-cryptography framework that converts raw multimodal biometric inputs into encrypted and digitally signed authentication outputs by tightly integrating optimized CNN (OCNN) feature extraction, GAN-based privacy-preserving transformation, ElGamal encryption, and SHA-256 integrity verification. Experimental results show that it achieves high recognition accuracy, strong privacy protection, stable adversarial training behavior, and secure biometric authentication.
1.2. Novelty and Distinction of the Proposed Framework
Although the proposed SBDP framework employs well-established components such as convolutional neural networks, generative adversarial networks, ElGamal cryptography, and SHA-256 hashing, the novelty of this work does not reside in introducing an entirely new standalone algorithm. Rather, the novelty lies in the design of a unified optimization framework that jointly addresses biometric recognition, privacy preservation, encryption, and authentication integrity within a single end-to-end architecture. Unlike conventional biometric systems, where feature extraction, template protection, privacy preservation, and cryptographic authentication are implemented as separate and independently optimized modules, the proposed SBDP framework formulates biometric recognition and privacy-preserving representation learning as jointly optimized objectives through the OCNN and GAN modules, while the cryptographic protection mechanisms, namely ElGamal encryption and SHA-256 integrity verification, are integrated procedurally to provide confidentiality and authenticity of the learned biometric representations. Consequently, the framework combines differentiable learning and deterministic cryptographic protection within a unified secure biometric pipeline. The optimized CNN learns discriminative multimodal biometric embeddings. The GAN does not generate synthetic biometric images; instead, it transforms the learned latent representations into privacy-preserving synthetic embeddings that retain identity-discriminative characteristics while minimizing the possibility of biometric reconstruction and privacy leakage. Furthermore, the proposed framework introduces a secure transformation pipeline that maps raw multimodal biometric inputs directly into encrypted and digitally signed authentication outputs.
This architecture tightly couples OCNN-based feature learning, GAN-based privacy-aware representation learning, ElGamal probabilistic encryption, and SHA-256 integrity verification into a unified security framework. Such an integrated optimization strategy is fundamentally different from existing approaches that primarily focus on a single protection mechanism, template transformation, encrypted matching, or biometric recognition independently. Therefore, the principal contribution of this work is not merely the integration of existing modules, but the formulation of a unified AI-cryptography framework that simultaneously optimizes recognition accuracy, privacy preservation, cryptographic confidentiality, and authentication integrity for secure multimodal biometric systems.
2. Related Work
Recent studies show increasing interest in integrating biometric recognition, template protection, encryption, and privacy-preserving learning. However, most existing approaches address only one or two components of the secure biometric pipeline, such as template transformation, encrypted matching, synthetic generation, or blockchain-based protection, rather than providing an end-to-end framework that jointly covers feature extraction, synthetic representation, encryption, and authentication.

Wang et al. [15] proposed a cancelable template protection method based on a convolutional neural network for multimodal biometrics with iris and fingerprint traits. Their solution extracts features from each modality, fuses them into a common representation, and applies a cancelable transformation to protect the biometric template. The main advantage of this work is that it improves template revocability and strengthens protection against direct template compromise. However, the method mainly focuses on cancelable template transformation and does not provide an integrated cryptographic layer for encrypted storage, secure transmission, or digital signature-based integrity verification. Therefore, although it is relevant to multimodal template protection, it does not fully address the combined requirements of confidentiality, authenticity, and privacy-preserving synthetic representation.

Zaiyu et al. [16] developed a secure multimodal biometric framework using a deep ConvGRU-based architecture. They enhance multimodal authentication performance through a combination of preprocessing, feature extraction, and hashing. The major advantage of this method is that it can model spatial and sequential dependencies using deep learning, which may improve the discriminative biometric representation. However, the solution relies mainly on hashing and deep feature learning and does not use public-key encryption, digital signatures, or synthetic feature generation. It therefore improves recognition and template protection but does not ensure secure transmission of biometric data or verifiable integrity.

Shuyi et al. [17] proposed a decentralized fuzzy vault-based multimodal biometric authentication scheme using blockchain technology. Their system uses face and dorsal hand biometrics and an improved fuzzy vault mechanism for privacy-preserving authentication. The advantage of this approach is that the blockchain provides decentralization, auditability, and resilience to single-point compromise. However, blockchain storage introduces additional communication and computational overhead that may reduce its suitability for lightweight or real-time biometric environments. Moreover, the work does not address GAN-based privacy-preserving biometric representation or CNN-optimized multimodal feature extraction, which limits its relevance to AI-driven biometric data protection.

Dacan et al. [18] introduced enhanced biometric template protection schemes based on distance-preserving hashing for IoT-based biometric authentication. Their work is important because IoT biometric systems require lightweight template protection and efficient matching under constrained computational resources. The advantage of the method is its focus on practical biometric template protection in distributed environments. However, distance-preserving hashing may still require careful security analysis against correlation, inversion, and linkage attacks. In addition, the method does not integrate asymmetric encryption, SHA-based digital signature verification, or synthetic biometric generation, which are central to the proposed SBDP framework.

Xinghan et al. [19] proposed a cancelable random masking framework combined with lightweight deep learning for secure finger-vein authentication. The solution transforms biometric templates using cryptographic random masks and then applies CNN-based learning directly on the masked inputs. The main advantage of the method is that it provides cancelability, revocability, interpretability, and real-time authentication while minimizing the risk of template inversion and replay attacks. However, the work is modality-specific: it deals with finger-vein biometrics rather than multimodal inputs such as face, fingerprint, and retina. It also does not include synthetic biometric generation or ElGamal-based encryption with SHA-256 signature verification, which restricts its application in a larger secure multimodal biometric system.

Zhaosen et al. [20] proposed a privacy-preserving face recognition framework using FaceNet/ArcFace embeddings, locality-preserving projection, and CKKS homomorphic encryption. The main advantage of this work is that encrypted-domain matching allows biometric comparison without revealing facial embeddings, which enhances privacy in cloud-based biometric recognition. The use of dimensionality reduction also helps reduce ciphertext size and computational overhead. However, homomorphic encryption remains computationally expensive and may not be feasible for all resource-constrained biometric systems. In addition, the framework is mostly face-centric and does not support multimodal fusion, GAN-based privacy-preserving synthetic generation, or ElGamal digital-signature authentication.

Yongluo et al. [21] proposed AEGIIS, a quantum-proof biometric authentication framework using binary lattices and homomorphic encryption for cancelable templates. The main advantage of this approach is its resistance to classical and quantum attacks, which is increasingly important for long-term biometric security. The use of lattice-based mechanisms provides strong theoretical security and improves the revocability of protected biometric templates. However, the framework may introduce higher mathematical and computational complexity, especially for practical large-scale deployment. Furthermore, it focuses mainly on quantum-resistant template protection and does not provide a complete AI-cryptography pipeline involving optimized CNN feature extraction, GAN-based synthetic representation, ElGamal encryption, and SHA-256-based integrity validation.

The existing studies confirm that recent biometric security research is moving toward cancelable templates, encrypted-domain matching, blockchain-assisted storage, and privacy-preserving deep learning. Nevertheless, most existing works remain limited to a specific biometric modality, a single protection mechanism, or a partial security pipeline. In contrast, the proposed SBDP framework combines optimized CNN-based multimodal feature extraction, GAN-based privacy-preserving biometric representation, ElGamal encryption, and SHA-256 signature verification in a unified architecture. This integration directly addresses the major gaps in existing work by jointly improving recognition reliability, biometric confidentiality, integrity verification, and resistance to identity forgery.
Table 1 presents a comparison of the existing biometric security frameworks and the proposed SBDP framework.
3. Problem Formulation and Mathematical Modeling
The present work is focused on the design of a unified biometrics system capable of ensuring high recognition accuracy, strong privacy protection and cryptographic security. Multimodal biometrics refers to the use of heterogeneous data sources like facial images, iris scans, fingerprint patterns, and so on. The input space consists of such sources where each modality provides complementary identity information. The goal is to learn a mapping function that maps these multi-modal inputs into concise and discriminative representations without information leakage and guarantees secure storage and transmission. The proposed formulation differs from traditional biometric systems, where extraction, template protection, and authentication are treated as separate processes. Instead, it views these elements as interdependent functions that are optimized together. The challenge is to unify feature learning with privacy and security constraints in a single objective. In particular, the system must learn mode-invariant feature representations that maximize classification performance while ensuring that the extracted features do not contain enough information to reconstruct the original biometric inputs. To formalize this objective, the feature extraction process is modelled using an optimized convolutional neural network, which defines a non-linear transformation from the input space to a latent feature space. However, directly utilizing such learned representations poses a privacy risk, as these features may still contain reconstructable information. To address this issue, the formulation incorporates a generative mechanism that converts the learned features into synthetic representations. These representations need to satisfy two competing constraints, (i) to retain identity-discriminative information for accurate classification, and (ii) to be robust to inversion or reconstruction attacks to ensure privacy. 
In addition to representation learning and privacy preservation, the formulation also imposes severe cryptographic constraints on the feature space. We need probabilistic encryption to map the transformed biometric features into a secure domain, where the probability of recovering the original feature vector from the encrypted representation is computationally negligible. An integrity constraint is enforced by means of a cryptographic hashing mechanism that allows for reliable detection of any modification to the encrypted data. These requirements introduce additional dependencies between the learning process and the security layer as the generated features should be compatible with the encryption and the verification operations. The global problem can be formulated as a constrained multi-objective optimization problem, aiming to maximize recognition performance while minimizing privacy leakage and satisfying cryptographic security constraints. This leads to a tightly coupled pipeline in which multimodal feature extraction, synthetic data generation, encryption, and integrity verification are jointly optimized. The proposed formulation facilitates a consistent transformation from raw biometric inputs to secure and verifiable output representations, thus overcoming the fundamental limitations of existing biometric systems regarding data exposure, tampering, and identity forgery.
3.1. Multimodal Input Space and Notation
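Let the multimodal biometric dataset be defined as:
$$\mathcal{D} = \{x_i\}_{i=1}^{N}, \qquad x_i = \left(x_i^{f},\, x_i^{r},\, x_i^{p}\right), \qquad x_i^{f}, x_i^{r}, x_i^{p} \in \mathbb{R}^{H \times W \times C}$$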
Each biometric observation is annotated with a discrete identity label representing its corresponding class in the recognition task. These labels provide the ground truth required for supervised optimization of the feature extraction model. It should be noted that the biometric datasets used in this study, namely CelebA for facial images, UBIRIS v2 for iris images, and the fingerprint dataset for fingerprint samples, are independent publicly available datasets and do not contain biometric samples collected from the same individuals. Consequently, the experiments are not intended to establish subject-level multimodal identity correspondence across modalities. Instead, the proposed SBDP framework is evaluated using a representation-level multimodal protocol, in which each biometric modality is processed independently by the OCNN to extract discriminative latent representations. These latent features are subsequently transformed by the GAN into privacy-preserving synthetic embeddings and fused into a unified multimodal representation. The primary objective of this experimental protocol is to evaluate the capability of the proposed framework to perform secure multimodal feature learning, privacy-preserving synthetic representation generation, multimodal fusion, and cryptographic protection using ElGamal encryption and SHA-256 integrity verification. Therefore, the reported results should be interpreted as an evaluation of the integrated privacy-preserving biometric security framework rather than a benchmark on a naturally paired multimodal biometric identity dataset. The collection of identity labels is defined as:
$$\mathcal{Y} = \{y_i\}_{i=1}^{N}, \qquad y_i \in \{1, \dots, K\}$$
where $H$, $W$, and $C$ are the height, width, and number of channels, respectively; $x_i^{f}$ denotes the face modality, $x_i^{r}$ denotes the retina modality, and $x_i^{p}$ denotes the fingerprint modality; $\mathcal{D}$ denotes the complete multimodal biometric dataset; $N$ is the total number of biometric samples in the dataset; $x_i$ is the $i$-th multimodal biometric sample; $i$ denotes the sample index; $\mathcal{Y}$ is the set of identity labels for all biometric samples; $y_i$ denotes the ground-truth identity label associated with the $i$-th biometric sample; and $K$ is the total number of identity classes (subjects) in the dataset.
3.2. Preprocessing Transformation
A standard preprocessing pipeline is applied to each biometric modality prior to feature extraction to provide uniformity, reduce noise, and improve the quality of the input data. This step is essential for improving the robustness and generalization capability of the learning model, especially when dealing with heterogeneous multimodal inputs. The preprocessing stage includes spatial alignment, intensity normalization, and stochastic data augmentation to account for variations in illumination, scale, and orientation. Accordingly, each modality undergoes a preprocessing operator defined as follows:
$$\mathcal{P}(x) = \mathcal{A}\big(\mathcal{N}_{c}(\mathcal{N}_{s}(x))\big)$$
where $\mathcal{N}_{s}$ denotes the spatial normalization, $\mathcal{N}_{c}$ is the intensity normalization, and $\mathcal{A}$ is the stochastic augmentation operator.
Thus, the processed sample becomes:
$$\tilde{x}_i = \mathcal{P}(x_i)$$
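The three preprocessing stages can be illustrated with a minimal pure-Python sketch. It assumes a single-channel image stored as a list of rows, nearest-neighbour resizing for spatial normalization, min-max scaling for intensity normalization, and a random horizontal flip as the only augmentation. All function names and parameter choices here are hypothetical and are not taken from the original pipeline.

```python
import random

def spatial_normalize(img, size):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows) to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def intensity_normalize(img):
    """Min-max scale pixel values to [0, 1]; a constant image maps to all zeros."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in img]

def augment(img, rng, flip_prob=0.5):
    """Stochastic augmentation (assumed here): random horizontal flip."""
    if rng.random() < flip_prob:
        return [row[::-1] for row in img]
    return img

def preprocess(img, size=4, rng=None, train=True):
    """Spatial normalization, then intensity normalization, then augmentation (training only)."""
    out = intensity_normalize(spatial_normalize(img, size))
    return augment(out, rng or random.Random()) if train else out
```

Augmentation is skipped at inference time (`train=False`) so that the same input always yields the same processed sample.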
3.3. Deep Feature Extraction via OCNN
Let the optimized CNN be parameterized by $\theta$. The hierarchical feature extraction is modeled as a composition of nonlinear operators. Thus, the deep feature representation $h_i$ extracted from the CNN before the final embedding layer is given as follows:
$$h_i = F_{\theta}(\tilde{x}_i) = \left(f_L \circ f_{L-1} \circ \cdots \circ f_1\right)(\tilde{x}_i)$$
where $\tilde{x}_i$ denotes the preprocessed multimodal biometric input corresponding to the $i$-th sample, $F_{\theta}$ is the nonlinear mapping (transformation) learned by the optimized CNN, and $L$ denotes the total number of feature extraction blocks in the CNN.
Each block transformation is defined as:
$$f_l(u) = \mathrm{Pool}\big(\rho\big(\mathrm{BN}(\mathrm{Conv}_l(u))\big)\big)$$
where $\mathrm{Conv}_l$ denotes the convolution at layer $l$, $\mathrm{BN}$ is the batch normalization, $\rho$ denotes the nonlinear activation, $\mathrm{Pool}$ denotes the pooling operator, and $f_l$ is the transformation performed by the $l$-th CNN block.
The final embedding $z_i$ is defined as follows:
$$z_i = \phi(h_i) \in \mathbb{R}^{d}$$
where $\phi$ is the fully connected projection function that transforms the deep features into the embedding space, $d$ denotes the dimension of the final embedding vector, and $\mathbb{R}^{d}$ denotes the $d$-dimensional real-valued embedding space.
3.4. Classification Objective
Following the feature extraction and representation learning stages, the task is to map the learned feature embeddings to the correct identity classes. This is achieved with a supervised classification framework, which trains the model to output higher probabilities for the correct class labels and lower probabilities for incorrect ones. The high-level feature representations are fed into the classification layer, which outputs a probability distribution over all possible classes, allowing effective identity discrimination over the multimodal biometric inputs. The predicted class distribution $\hat{y}_i$ is defined as
$$\hat{y}_i = \mathrm{softmax}(W z_i + b)$$
The classification loss $\mathcal{L}_{\mathrm{cls}}$ is defined as:
$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \mathbb{1}[y_i = k]\, \log \hat{y}_{i,k}$$
where $\mathrm{softmax}(\cdot)$ is the softmax activation function, $W$ and $b$ represent the weight matrix and bias vector of the classification layer, $\hat{y}_{i,k}$ denotes the predicted posterior probability of the $k$-th class for the $i$-th biometric sample generated by the softmax classifier, $\mathbb{1}[\cdot]$ is the indicator function, and $k$ is the class index.
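The softmax classifier and its cross-entropy loss can be sketched in a few lines of pure Python. The sketch assumes embeddings are plain lists of floats and class labels are 0-indexed integers. The function names are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, b, z):
    """Affine classification layer: one row of W and one bias entry per class."""
    return [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]

def cross_entropy(W, b, embeddings, labels):
    """Mean negative log-probability assigned to the true class (labels are 0-indexed)."""
    loss = 0.0
    for z, y in zip(embeddings, labels):
        probs = softmax(linear(W, b, z))
        loss -= math.log(probs[y])
    return loss / len(labels)
```

With all-zero weights the classifier is uniform over the classes, so the loss equals the logarithm of the number of classes, which is a convenient sanity check before training.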
3.5. Privacy-Preserving Synthetic Feature Generation
To address biometric data leakage and reconstruction attacks, we propose a privacy-preserving feature transformation mechanism based on a generative adversarial framework. Instead of directly using the original feature representations, the model learns to generate synthetic feature embeddings that remain identity-discriminative but carry little information that would help an attacker reconstruct the original biometric inputs. This adversarial learning strategy imposes a trade-off between feature utility and privacy, thereby improving the overall security of the biometric system. The synthetic feature representation $\tilde{z}_i$ generated for the $i$-th sample is determined as follows:
$$\tilde{z}_i = G_{\theta_G}(\xi_i), \qquad \xi_i \sim \mathcal{N}(0, I)$$
The adversarial objective is determined by:
$$\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{z \sim p_{z}}\big[\log D_{\theta_D}(z)\big] + \mathbb{E}_{\xi \sim p_{\xi}}\big[\log\big(1 - D_{\theta_D}(G_{\theta_G}(\xi))\big)\big]$$
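The privacy constraint is stated against the worst-case adversary, taken over all possible reconstruction functions:
$$\inf_{R} \; \mathbb{E}_{x}\Big[\big\| R(\tilde{z}) - x \big\|_2^2\Big] \geq \epsilon$$
That is, even the reconstruction function with the smallest expected error must not fall below the privacy threshold, which ensures resistance against reconstruction attacks.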
where $G_{\theta_G}$ denotes the generator function that maps a noise vector to the synthetic feature space, $\xi_i$ is the random noise vector for the $i$-th sample, $\mathcal{N}(0, I)$ is the multivariate normal (Gaussian) distribution with zero mean and identity covariance matrix, $\theta_G$ denotes the learnable parameters of the generator, $\theta_D$ denotes the learnable parameters of the discriminator network, $D_{\theta_D}$ denotes the discriminator function, $z$ represents the real feature vector, $p_{z}$ is the distribution of real biometric feature representations, $\xi$ is the noise vector sampled from a predefined distribution, $p_{\xi}$ denotes the prior noise distribution, $\mathbb{E}$ is the expectation operator, $\log$ denotes the natural logarithm used in the adversarial loss formulation, $R$ is the reconstruction (inverse) function, $x$ denotes the original biometric input sample, $\|\cdot\|_2^2$ is the squared Euclidean (L2) norm measuring reconstruction error, $\mathbb{E}_{x}$ is the expectation over the data distribution, and $\epsilon$ denotes the privacy threshold that defines the minimum acceptable reconstruction error.
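As a rough illustration, the adversarial value function and the reconstruction-error check can be estimated from samples. The sketch below assumes the standard GAN value function and takes discriminator outputs as probabilities in (0, 1). It also evaluates the privacy threshold against one concrete attacker's reconstructions, which gives only an empirical check for that attacker and not the worst-case bound over all reconstruction functions. Function names are hypothetical.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of E[log D(z)] + E[log(1 - D(G(xi)))].

    d_real: discriminator outputs on real feature vectors.
    d_fake: discriminator outputs on generated (synthetic) embeddings.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def mean_reconstruction_error(originals, reconstructions):
    """Average squared L2 distance between inputs and an attacker's reconstructions."""
    total = 0.0
    for x, x_hat in zip(originals, reconstructions):
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(originals)

def satisfies_privacy(originals, reconstructions, threshold):
    """True if this particular attacker's mean error stays at or above the threshold."""
    return mean_reconstruction_error(originals, reconstructions) >= threshold
```

When the discriminator outputs 0.5 everywhere, the estimate equals $-\log 4$, the well-known equilibrium value of this objective.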
3.6. Multimodal Fusion Representation
We propose a fusion mechanism that leverages complementary information from heterogeneous biometric modalities by merging modality-specific feature embeddings into a unified representation. This is important for improving recognition robustness and discriminative capability, since single modalities can be affected by noise, occlusion, or acquisition variability. The model generates a holistic identity representation by combining features extracted from facial, retinal, and fingerprint inputs, capturing both common and modality-specific features. Thus, the modality-specific embeddings can be expressed as follows:
$$z_i^{f},\; z_i^{r},\; z_i^{p} \in \mathbb{R}^{d}$$
where $z_i^{f}$ denotes the feature embedding corresponding to the face modality for the $i$-th sample, $z_i^{r}$ is the feature embedding corresponding to the retina modality, and $z_i^{p}$ denotes the feature embedding corresponding to the fingerprint modality for the $i$-th sample.
Fusion is defined as:
$$z_i^{\mathrm{fus}} = \psi\big(\big[\, z_i^{f} \,\|\, z_i^{r} \,\|\, z_i^{p} \,\big]\big)$$
where $\|$ denotes concatenation and $\psi$ is a learnable transformation.
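Concatenation followed by a learnable transformation can be sketched as follows. The sketch assumes the learnable transformation is a single affine layer whose weights and bias are passed in explicitly. That choice, along with the function names, is an assumption made for illustration and is not confirmed by the text.

```python
def concat(*embeddings):
    """Concatenate modality-specific embeddings into one flat vector."""
    fused = []
    for e in embeddings:
        fused.extend(e)
    return fused

def fuse(z_face, z_retina, z_finger, weights, bias):
    """Apply an affine map (assumed form of the learnable transformation) to the concatenation."""
    u = concat(z_face, z_retina, z_finger)
    return [sum(w * v for w, v in zip(row, u)) + b for row, b in zip(weights, bias)]
```

For example, with embeddings of sizes 2, 1, and 2, `weights` needs five columns, and the number of rows sets the dimension of the fused representation.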
3.7. Cryptographic Security Modeling (ElGamal)
The selection of ElGamal encryption in the proposed secure biometric data protection framework is driven by the security requirements of multimodal biometric authentication rather than by ciphertext compactness alone. We acknowledge that ElGamal, as a classical public-key cryptosystem based on the Diffie–Hellman assumption, incurs ciphertext expansion and increased storage and bandwidth requirements compared with lightweight symmetric encryption schemes. Moreover, ElGamal is only multiplicatively homomorphic and does not support the general encrypted-domain computation offered by fully homomorphic schemes. However, these limitations do not significantly affect the proposed framework due to its hybrid architecture.

First, the proposed SBDP framework does not encrypt raw biometric images directly. Instead, facial images, retinal scans, and fingerprint samples are initially processed by the OCNN to extract compact latent representations, which are subsequently transformed by the GAN into privacy-preserving synthetic embeddings. Therefore, ElGamal encryption is applied only to the fused synthetic biometric representation rather than to the original high-dimensional biometric data. This significantly reduces the impact of ciphertext expansion and limits additional storage and communication overhead.

Second, ElGamal is a probabilistic encryption scheme in which a random ephemeral key is generated during each encryption process. Consequently, the same biometric feature vector produces different ciphertexts at different encryption instances. This probabilistic characteristic is highly desirable in biometric systems because it prevents ciphertext pattern leakage and improves resistance against replay attacks, statistical inference attacks, and biometric template correlation attacks.

Third, the proposed framework adopts a layered security design in which the OCNN performs discriminative feature extraction, the GAN provides privacy-preserving transformation, ElGamal ensures confidentiality of biometric embeddings, and SHA-256 digital signatures guarantee integrity and authenticity. Within this hybrid architecture, the confidentiality benefits of ElGamal outweigh its moderate storage and bandwidth overhead, particularly because encryption is performed only on compact latent biometric representations. Therefore, although ElGamal introduces ciphertext expansion and offers only limited homomorphic properties, it provides an effective trade-off between strong confidentiality guarantees, probabilistic security, implementation simplicity, and compatibility with the proposed OCNN-GAN hybrid biometric protection framework.

The fused biometric representation is stored and transmitted securely using this probabilistic public-key mechanism. Unlike deterministic encryption schemes, it injects randomness during encryption, which prevents pattern leakage in the ciphertext and increases robustness against statistical and chosen-plaintext attacks. This design is consistent with contemporary cryptographic requirements, where the security of sensitive feature representations is paramount to counter possible reconstruction or inference attacks, as also emphasized in studies on cryptographic resilience.
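Let $\mathbb{G}$ be a cyclic group of prime order $p$ with generator $g$. The key generation is expressed as follows:
$$a \in \mathbb{Z}_p, \qquad h = g^{a}$$
Encryption of the feature vector is described as follows:
$$\mathrm{Enc}\big(z_i^{\mathrm{fus}}\big) = (c_{i,1},\, c_{i,2}) = \big(g^{r},\; m_i \cdot h^{r}\big), \qquad r \in \mathbb{Z}_p$$
where $\mathbb{G}$ denotes the cyclic group, $p$ is a large prime number, $g$ is a generator element of the cyclic group $\mathbb{G}$, $a$ is the private key, randomly selected from the finite field $\mathbb{Z}_p$, $h$ denotes the public key component, $r$ is an ephemeral random value drawn afresh for each encryption, $m_i \in \mathbb{G}$ is the encoding of the fused feature vector $z_i^{\mathrm{fus}}$ as a group element, and $\mathrm{Enc}$ is the encryption function that maps the input feature vector into the ciphertext space.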
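The key generation, encryption, and decryption steps can be sketched with textbook ElGamal over the integers modulo a prime. The modulus, base element, and fixed-point encoding of feature values below are toy choices made purely for illustration. They are assumptions, they are not the parameters used in the experiments, and they are not suitable for real deployments. Each feature component is encrypted separately after quantization, which assumes every component has absolute value below 1000.

```python
import secrets

P = 2**127 - 1   # Mersenne prime used as a toy modulus (illustrative only)
G = 3            # base element (illustrative choice, not a vetted generator)

def keygen():
    """Return (private key a, public key h = G^a mod P)."""
    a = secrets.randbelow(P - 2) + 1          # a in [1, P-2]
    return a, pow(G, a, P)

def quantize(vec, scale=10**6, offset=10**9):
    """Fixed-point encode floats with |v| < 1000 as positive integers below P."""
    return [int(round(v * scale)) + offset for v in vec]

def dequantize(ints, scale=10**6, offset=10**9):
    """Invert quantize()."""
    return [(m - offset) / scale for m in ints]

def encrypt(h, m):
    """ElGamal encryption of integer m with a fresh ephemeral value r per call."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P

def decrypt(a, c):
    """Recover m = c2 * (c1^a)^(-1) mod P (modular inverse via pow needs Python 3.8+)."""
    c1, c2 = c
    s = pow(c1, a, P)
    return (c2 * pow(s, -1, P)) % P
```

A short usage example: `a, h = keygen()`, then `cts = [encrypt(h, m) for m in quantize([0.125, -0.5])]`, and `dequantize([decrypt(a, c) for c in cts])` recovers the original values. Encrypting the same value twice yields different ciphertexts because `r` is redrawn each time.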
3.8. Integrity and Authentication Constraint
To complement the confidentiality guarantees provided by encryption, the proposed framework incorporates an integrity and authentication mechanism to ensure that the secured biometric data remains unaltered and verifiable during storage and transmission. In practical biometric systems, encrypted data is not immune to tampering or substitution attacks, hence a cryptographic verification scheme binding the encrypted representation is essential. This is achieved through a secure hash-based signature generation process which generates a fixed-length digest uniquely associated with the encrypted feature representation, ensuring reliable verification of data authenticity.
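Signature generation is determined as follows:
$$\sigma_i = H(C_i)$$
The verification condition is given by:
$$H(C_i') = \sigma_i$$
where $H : \{0,1\}^{*} \rightarrow \{0,1\}^{256}$ is the cryptographic hash function, $\{0,1\}^{*}$ is the set of binary strings of arbitrary length, $\{0,1\}^{256}$ denotes the set of 256-bit binary outputs, $C_i$ and $C_i'$ are the stored and the received encrypted representations, and $\sigma_i$ denotes the generated digital signature for the $i$-th sample.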
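The digest generation and verification condition can be sketched with the Python standard library as follows. The function names are hypothetical, and the sketch covers only the hash comparison described in this subsection. A bare SHA-256 digest detects modification only if the digest itself is stored or transmitted securely; binding it to a signing key, as in the signature step of Section 4.1, requires an additional keyed primitive that is not shown here.

```python
import hashlib
import hmac


def make_digest(ciphertext: bytes) -> bytes:
    """Return the 256-bit SHA-256 digest of the encrypted representation."""
    return hashlib.sha256(ciphertext).digest()


def verify_digest(ciphertext: bytes, sigma: bytes) -> bool:
    """Recompute the digest and compare it with sigma in constant time."""
    return hmac.compare_digest(hashlib.sha256(ciphertext).digest(), sigma)


# Hypothetical serialized ciphertext for one sample.
stored = b"\x01\x02\x03"
sigma = make_digest(stored)
tampered = b"\x01\x02\x04"
```

Here `verify_digest(stored, sigma)` accepts, while `verify_digest(tampered, sigma)` rejects because a single changed byte alters the digest.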
3.9. Unified Optimization Problem
To jointly address the interdependent tasks of recognition accuracy, privacy preservation, and cryptographic security, the proposed framework is presented as a unified optimization problem where all functional components are incorporated into a single learning objective. Such a formulation can impose a coordinated optimization strategy that considers the trade-off among the classification performance, adversarial robustness, privacy constraints, and security guarantees, unlike the conventional methods where extraction, adversarial learning, and security mechanisms are optimized independently. The overall objective function includes these components through weighted contributions, enabling the model to learn discriminative feature representations, produce privacy-preserving synthetic embeddings, and fulfill encryption and verification requirements in a unified training paradigm.
The complete framework is formulated as:
The security loss component
which combines the decryption consistency and verification reliability is determined as follows:
where
is an overall unified objective,
are the non-negative weighting coefficients balancing different objectives,
denotes the decryption function mapping encrypted representation back to feature space, and
denotes the biometric feature embedding (latent feature vector).
3.10. Final Secure Mapping
We propose a formalization of the end-to-end behavior of the proposed framework by defining the whole pipeline as a unique secure mapping from the raw multimodal biometric input data to a protected and verifiable output representation. This mapping involves all the above-described stages, i.e., feature extraction, synthetic representation, encryption and integrity verification, but guarantees the preservation of identity-discriminative information in the output and the satisfaction of very strict security constraints. The formulation highlights that the system does not reveal any intermediate representations, but provides a compact output that contains classification, encrypted features and a cryptographic signature that can be securely stored, transmitted and verified through a single transformation. The overall system defines a secure transformation
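$\Phi$, which is determined as follows:
$$\Phi : \mathcal{X} \rightarrow (\hat{y}, C, \sigma), \qquad \Phi(X) = (\hat{y}, C, \sigma)$$
where $\mathcal{X}$ denotes the input space of multimodal biometric samples, $\hat{y}$ is the predicted identity, $C$ is the encrypted feature representation, and $\sigma$ is its signature.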
The secure mapping allows a unified transformation from multimodal biometric inputs to protected authentication outputs, while retaining identity-discriminative information. The system combines feature extraction, synthetic representation learning, encryption and integrity verification in a single framework, thus avoiding exposure of intermediate biometric templates. The encrypted representation provides confidentiality, while the hash-based verification mechanism ensures data integrity and authenticity. In addition, the proposed mapping increases robustness against spoofing, replay and reconstruction attacks, and allows secure biometric storage and transmission in real-world authentication environments.
4. Proposed Optimized Convolutional Neural Network
The framework combines face, retina and fingerprint modalities to improve reliability, robustness and resistance against spoofing and identity leakage attacks. First, multimodal biometric inputs are acquired and passed through a preprocessing stage that includes resizing, normalization and augmentation to improve image quality and augment the diversity of the dataset. These operations help reduce noise variations and increase the consistency of biometric representations prior to feature extraction. The optimized CNN extractor is trained with preprocessed data to learn deep discriminative features of multimodal biometric data.
The convolutional layer along with batch normalization and the ReLU activation function in the CNN extractor enhance the learning of nonlinear features while stabilizing the training process. In the CNN extractor, max pooling is employed to down-sample the spatial dimensions and keep the most informative feature characteristics. Dropout regularization is applied in the CNN extractor to avoid overfitting by randomly turning off neurons in the training process and thus enhancing the generalization capability of the model. The learned deep features are input to the fully connected layer that converts them into high-level biometric representations suitable for classification. A probability distribution over various biometric classes is created by the softmax classifier and the final biometric identity representation or feature vector is generated. The extracted features are then protected using ElGamal encryption scheme with SHA-256 digital signature generation. The ElGamal encryption scheme provides data confidentiality, and the SHA-256 signature mechanism guarantees the data integrity and authenticity by identifying any unauthorized change during the storage and transmission. The framework ultimately enables secure transmission and storage of biometric data along with high authentication accuracy, privacy preservation and computational efficiency.
Figure 1 depicts the complete data-flow architecture of the proposed Secure Biometric Data Protection (SBDP) framework for multimodal biometric authentication and secure data preservation, and clarifies the interaction among the proposed components.
The framework follows a sequential processing strategy that starts from raw multimodal biometric inputs. The multimodal biometric input space is made up of heterogeneous biometric sources such as facial images, iris images and fingerprint patterns, in which each modality provides complementary identity information. The iris modality is represented by images taken from the UBIRIS v2 database, which provides unconstrained iris images acquired under visible-wavelength conditions. The preprocessing operations consist of normalization, resizing, and augmentation. The preprocessed biometric images are then forwarded to the OCNN, which extracts discriminative latent feature representations from each modality. The extracted OCNN feature embeddings are then forwarded to the Generative Adversarial Network (GAN). Instead of working directly on the raw biometric images, the GAN translates the latent features generated by the OCNN into privacy-preserving synthetic representations that retain identity-discriminative information while reducing the possibility of reconstructing the original biometric data. This conversion improves privacy protection and lowers the risk of identity leakage. After the synthetic features are generated, the modality-specific representations are fused by a multimodal fusion mechanism to create a unified biometric embedding. The fused embedding is then secured with the ElGamal public-key cryptosystem, which provides probabilistic encryption and confidentiality for storage and transmission. Finally, a SHA-256-based digital signature is generated on the encrypted representation to ensure integrity, authenticity and tamper detection. Therefore, the complete data flow of the proposed framework is formally expressed as:
Multimodal Input → Preprocessing → OCNN Feature Extraction → GAN Privacy Transformation → Multimodal Fusion → ElGamal Encryption → SHA-256 Signature → Secure Authentication.
This explicit architecture makes it clear that privacy preservation based on GAN is done after the feature extraction of OCNN and before cryptographic protection. This clearly separates feature learning, privacy enhancement and security enforcement.
Figure 2 depicts the comprehensive system framework and data-flow architecture of the proposed Secure Biometric Data Protection framework, highlighting the sequential interactions among multimodal preprocessing, OCNN feature extraction, GAN-based privacy-preserving synthetic feature generation, multimodal fusion, ElGamal encryption, SHA-256 digital signature generation, and secure biometric authentication.
4.1. Framework Overview
This section presents the proposed Secure Biometric Data Protection framework that integrates deep learning, generative modeling and cryptographic security in a unified end-to-end pipeline that transforms raw multimodal biometric data (e.g., facial images, retinal scans, fingerprint data) into secure and verifiable information. Unlike conventional approaches where extraction, privacy preservation and encryption are considered as separate components, the proposed framework optimizes these components jointly to achieve high recognition accuracy and provide guarantees of confidentiality, integrity and privacy.
It should be noted that the term “end-to-end” in the proposed SBDP framework refers to the complete data-flow transformation from raw multimodal biometric inputs to secure and verifiable authentication outputs rather than to a fully differentiable optimization process. The trainable learning components of the framework consist of the optimized convolutional neural network (OCNN) and the GAN-based privacy-preserving transformation module, which are optimized jointly through gradient-based learning. In contrast, the ElGamal encryption and SHA-256 digital signature modules are non-trainable cryptographic operations (ElGamal encryption is randomized, whereas SHA-256 hashing is deterministic) and do not participate in backpropagation or parameter optimization. Instead, these cryptographic components are applied procedurally after the OCNN-GAN learning stage to provide confidentiality, integrity, and authenticity of the generated biometric representations. Therefore, the proposed framework should be interpreted as a hybrid AI-cryptography architecture in which differentiable learning modules and non-differentiable cryptographic modules are integrated in a sequential and secure processing pipeline.
The proposed SBDP framework consists of four tightly coupled stages: multimodal preprocessing, optimized CNN-based feature extraction, GAN-based privacy-preserving transformation, and cryptographic protection using ElGamal encryption with SHA-256-based digital signature generation. Let the multimodal input space be denoted by $\mathcal{X} = \{X_f, X_r, X_p\}$, where $X_f$, $X_r$, and $X_p$ represent facial, retinal, and fingerprint biometric samples, respectively. Each input modality is first normalized and enhanced through a preprocessing function $P(\cdot)$, after which an optimized CNN $F_\theta$ extracts discriminative latent representations. These representations are then converted by a GAN into synthetic privacy-preserving features that are fused into a single biometric embedding. The fused representation is then encrypted and digitally signed to ensure secure storage, transmission and verification. The complete system describes the following secure mapping:
$$\Phi : \mathcal{X} \rightarrow (\hat{y}, C, \sigma)$$
where $\hat{y}$ denotes the predicted identity and $C$ represents the encrypted feature vector. More specifically, the secure transformation $\Phi$ can be expressed as:
$$\Phi(X) = \Bigl(\hat{y},\; \mathrm{Enc}_{pk}(z_{fus}),\; \mathrm{Sign}_{sk}\bigl(H(C)\bigr)\Bigr), \qquad z_{fus} = \bigoplus_{m} G_\phi\bigl(F_\theta(P(X_m)), \eta\bigr)$$
Thus, the final secured output is obtained as follows:
$$\begin{aligned}
\tilde{z}_m &= G_\phi\bigl(F_\theta(P(X_m)), \eta\bigr), \quad m \in \{f, r, p\} \\
z_{fus} &= \tilde{z}_f \oplus \tilde{z}_r \oplus \tilde{z}_p \\
\hat{y} &= \arg\max_k \, \mathrm{softmax}(W_c z_{fus} + b_c)_k \\
C &= \mathrm{Enc}_{pk}(z_{fus}) \\
\sigma &= \mathrm{Sign}_{sk}\bigl(H(C)\bigr) \\
O &= (\hat{y}, C, \sigma)
\end{aligned}$$
where $m \in \{f, r, p\}$ denotes the modality index corresponding to face, retina, and fingerprint, respectively; $G_\phi$ denotes the generator network parameterized by $\phi$; $\eta$ is a random noise vector; $\tilde{z}_m$ represents the synthetic privacy-preserving feature embedding; $\oplus$ denotes the concatenation operator; $W_c$ and $b_c$ are the weight matrix and bias vector of the Softmax classifier, respectively; $C$ denotes the encrypted biometric representation generated by the ElGamal encryption function $\mathrm{Enc}(\cdot)$ using public key $pk$; $\sigma$ denotes the digital signature generated by $\mathrm{Sign}(\cdot)$ using the private signing key $sk$; $H(\cdot)$ is the cryptographic hash function; and $O$ denotes the final secured output of the proposed framework.
Hypothesis 1: The proposed SBDP framework enhances secure biometric authentication by maintaining identity-discriminative information, minimizing biometric privacy leakage, and assuring cryptographic confidentiality and integrity.
Formally:
$$A \rightarrow \max, \qquad I(X; \tilde{z}) \rightarrow \min, \qquad V(C, \sigma) = 1$$
where $A$ denotes recognition accuracy, $I(X; \tilde{z})$ represents the mutual information between the original biometric input $X$ and the synthetic representation $\tilde{z}$, and $V(\cdot)$ denotes the signature verification function.
Proof: The OCNN extracts identity-relevant features from the preprocessed biometric input:
$$z = F_\theta(P(X))$$
The classifier predicts the identity label $\hat{y}$, which is given as follows:
$$\hat{y} = \arg\max_k \, \mathrm{softmax}(W_c z + b_c)_k$$
The classification objective $\mathcal{L}_{cls}$ reduces to the following:
$$\mathcal{L}_{cls} = -\frac{1}{N}\sum_{i=1}^{N} \log \hat{p}_{i, y_i}$$
When $\mathcal{L}_{cls} \rightarrow 0$, the predicted identity approaches the true class, i.e., $\hat{y}_i \rightarrow y_i$. Hence, recognition accuracy increases:
$$A = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i) \rightarrow 1$$
For privacy preservation, the GAN transforms the real feature vector $z$ into a synthetic representation:
$$\tilde{z} = G_\phi(z, \eta)$$
The privacy objective is to reduce reconstructability of the original biometric input, which is given as follows:
$$\lVert X - \hat{X} \rVert_2^2 \geq \delta$$
where $\arg\max$ denotes the operator that returns the class index corresponding to the maximum posterior probability, $i$ is the sample index, $\log$ represents the natural logarithm function, $\hat{p}_{i, y_i}$ is the probability assigned to the true class of sample $i$, $A$ denotes the recognition accuracy, $\tilde{z}$ represents the synthetic privacy-preserving feature representation generated by the GAN, $\hat{X}$ is the reconstructed biometric sample, and $\delta$ denotes the minimum acceptable reconstruction error threshold. If the reconstruction error remains above $\delta$, then the adversary cannot reliably recover the original biometric input from $\tilde{z}$. Therefore:
$$I(X; \tilde{z}) \rightarrow \min$$
This demonstrates that the synthetic feature representation mitigates biometric information leakage. To ensure confidentiality, the fused representation is encrypted with ElGamal encryption.
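Since the ephemeral value $r$ is randomly selected for each encryption operation, the same fused biometric feature vector $z_{fus}$ produces different ciphertexts:
$$E(z_{fus}; r_1) \neq E(z_{fus}; r_2) \quad \text{for } r_1 \neq r_2$$
Thus, ciphertext pattern leakage is prevented. For integrity and authenticity, the digital signature on the ciphertext $C$ is generated as:
$$\sigma = \mathrm{Sign}_{sk}\bigl(H(C)\bigr)$$
The verification process is determined as follows:
$$V(C, \sigma) = 1 \iff \sigma \text{ is a valid signature on } H(C)$$
If the ciphertext is modified from $C$ to $C'$, then, except with negligible probability:
$$H(C') \neq H(C) \;\Rightarrow\; V(C', \sigma) = 0$$
This confirms that tampering can be detected reliably. Hence, the proposed framework simultaneously satisfies:
$$A \rightarrow \max, \qquad I(X; \tilde{z}) \rightarrow \min, \qquad V(C, \sigma) = 1$$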
Therefore, the hypothesis is proven. □
The computational complexity of the proposed SBDP framework is determined by the cumulative cost of OCNN feature extraction, GAN-based privacy-preserving transformation, multimodal fusion, ElGamal encryption, and SHA-256 digital signature generation. The overall complexity
is expressed as follows:
where
denotes kernel size,
is the number of filters,
represents feature map size,
is the number of epochs,
denotes generator parameters, and
is the discriminator parameters.
4.2. OCNN Feature Extraction
The proposed optimized convolutional neural network consists of four convolutional blocks followed by fully connected classification layers. Each convolutional block consists of a convolutional layer, batch normalization, Rectified Linear Unit (ReLU) activation, and a max-pooling operation. The first, second, third and fourth convolutional layers use 32, 64, 128 and 256 filters, respectively, with a kernel size of 3 × 3 and a stride of 1. Max-pooling with a pooling size of 2 × 2 is applied after each convolutional block to decrease spatial dimensionality while retaining discriminative information. Batch normalization is applied after each convolutional layer to enhance the stability of the training process and speed up convergence. ReLU is selected as the activation function because of its computational efficiency and its ability to mitigate the vanishing gradient problem. To reduce overfitting and improve generalization performance, a dropout layer with a dropout rate of 0.5 is added before the fully connected layer.
The OCNN is trained with the Adam optimizer using a learning rate of 0.001, a batch size of 32, and the categorical cross-entropy loss function. The last classification layer of the OCNN uses the Softmax activation function to output probability distributions over the biometric identity classes. This optimized architecture can achieve efficient hierarchical feature extraction from multimodal biometric inputs with high recognition accuracy and stable convergence behavior.
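To make the effect of the four blocks concrete, the short Python sketch below traces feature-map shapes and counts convolution and batch-normalization parameters for the stated configuration (3 × 3 kernels, stride 1, 2 × 2 max-pooling, 32/64/128/256 filters). The 3 × 128 × 128 input size and the padding of 1 are assumptions made only for this illustration; the paper does not state them here.

```python
def conv_block(c_in, height, width, c_out, kernel=3, stride=1, padding=1, pool=2):
    """Return the output shape and parameter count of one conv-BN-ReLU-pool block."""
    h_conv = (height + 2 * padding - kernel) // stride + 1
    w_conv = (width + 2 * padding - kernel) // stride + 1
    conv_params = (kernel * kernel * c_in + 1) * c_out   # weights plus one bias per filter
    bn_params = 2 * c_out                                # scale and shift per channel
    return (c_out, h_conv // pool, w_conv // pool), conv_params + bn_params


def ocnn_summary(in_shape=(3, 128, 128), filters=(32, 64, 128, 256)):
    """Trace shapes through the four blocks; in_shape is an assumed input size."""
    c, h, w = in_shape
    shapes, total = [], 0
    for f in filters:
        (c, h, w), n = conv_block(c, h, w, f)
        shapes.append((c, h, w))
        total += n
    return shapes, total


shapes, total_params = ocnn_summary()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Under these assumptions the spatial size halves after every block, so the vector passed to the dropout and fully connected layers has `flattened` entries; a different input resolution or padding choice changes these numbers accordingly.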
It is worth mentioning that all competing deep learning architectures, namely CNN, ResNet-18, Vision Transformer (ViT), ConvGRU, and the proposed SBDP framework, were trained on the same multimodal biometric inputs with the same preprocessing steps, similar training budgets, and comparable optimization settings. Specifically, all models used the same training and testing partitions, the same image normalization and augmentation procedures, the Adam optimizer with a learning rate of 0.001, a batch size of 32, and the same number of training epochs. Furthermore, the same multimodal fusion protocol was used to ensure consistency across all experiments. As such, the reported differences in performance are mainly attributed to the architectural design and privacy-preserving capabilities of the methods, rather than to differences in data preparation, feature fusion strategies, or training configurations. This helps ensure that the comparison is objective, reproducible, and experimentally fair.
The proposed OCNN is a lightweight architecture based on the ResNet-18 framework with modifications for improved computational efficiency and multimodal biometric feature learning. Unlike the standard ResNet-18 architecture, the proposed OCNN is built from four convolutional blocks with 32, 64, 128 and 256 filters, respectively. Each block has a convolutional layer, batch normalization, ReLU activation and a max-pooling operation. The principles of residual learning are used to ensure stable gradient propagation and faster convergence of the model during the training phase. Dropout regularization with a dropout rate of 0.5 is also used before the fully connected layer to mitigate overfitting. The network is trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. Such architectural modifications make the proposed OCNN more suitable for multimodal biometric authentication while achieving high recognition accuracy and computational efficiency.
The optimization is achieved by a combination of architectural refinement, hyperparameter tuning, regularization, and training strategy, specifically tailored for multimodal biometric authentication. The optimization comprises: (i) adopting a lightweight ResNet-18 inspired architecture with four convolutional blocks with 32, 64, 128 and 256 filters, respectively; (ii) batch normalization after each convolutional layer to stabilize training and speed up convergence; (iii) ReLU activation and max-pooling operations to enhance nonlinear feature learning and computational efficiency; (iv) dropout regularization with dropout rate of 0.5 to alleviate overfitting and enhance generalization; and (v) hyperparameter optimization using the Adam optimizer with a learning rate of 0.001, batch size of 32, and categorical cross-entropy loss. Hence, the term “optimized” pertains to the joint optimization of architecture design, training parameters, and regularization mechanisms for secure multimodal biometric feature extraction.
The optimized convolutional neural network is utilized to extract hierarchical and discriminative feature representations from preprocessed biometric data. Given a normalized input $\bar{X} = P(X)$, the OCNN learns a nonlinear transformation that maps the input space into a compact latent feature space, referred to as $\mathcal{Z}$:
$$z = F_\theta(\bar{X})$$
where $F_\theta$ denotes the OCNN parameterized by $\theta$, and $z \in \mathcal{Z}$ is the learned feature embedding.
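The OCNN is composed of $L$ stacked convolutional blocks, where each block performs the feature transformation $h^{(l)}$ as follows:
$$h^{(l)} = \psi\Bigl(\mathrm{BN}\bigl(W^{(l)} * h^{(l-1)} + b^{(l)}\bigr)\Bigr)$$
where $l = 1, \dots, L$ is the block index, $h^{(l-1)}$ is the input to the current block, $W^{(l)}$ and $b^{(l)}$ are the convolutional weights and biases, $*$ denotes convolution, $\mathrm{BN}(\cdot)$ is batch normalization, and $\psi(\cdot)$ is a nonlinear activation function (e.g., ReLU).
Pooling is applied to reduce spatial dimensionality:
$$h^{(l)} \leftarrow \mathrm{Pool}\bigl(h^{(l)}\bigr)$$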
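The OCNN is trained using a classification objective that maximizes inter-class separability while minimizing intra-class variation. The predicted class probability $\hat{p}_i$ is determined as follows:
$$\hat{p}_i = \mathrm{softmax}(W_c z_i + b_c)$$
The cross-entropy loss is defined as follows:
$$\mathcal{L}_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{N_c} y_{i,k}\,\log \hat{p}_{i,k}$$
Minimizing $\mathcal{L}_{cls}$ enforces the following:
$$\hat{p}_{i,k} \rightarrow y_{i,k}$$
To improve generalization, dropout is applied as follows:
$$h_{drop} = m \odot h, \qquad m_j \sim \mathrm{Bernoulli}(1 - p_{drop})$$
where $z_i$ is the embedding of the $i$-th sample, $W_c$ and $b_c$ are the classifier weights and bias, $y_{i,k}$ is the one-hot identity label, $N$ is the number of samples, $N_c$ is the number of classes, $m$ is a random binary mask, and $p_{drop}$ is the dropout rate.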
This reduces overfitting and enhances robustness against noisy biometric inputs.
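The classification head and regularizer described above can be illustrated with a few lines of plain Python. This is a minimal sketch with hypothetical logits rather than the trained model; it uses the common inverted-dropout convention, which rescales kept activations by 1 / (1 − rate) so that the expected activation is unchanged.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative natural-log likelihood of the true class."""
    return -math.log(probs[true_index])


def dropout(activations, rate=0.5, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in activations]


probs = softmax([2.0, 1.0, 0.1])          # hypothetical logits for three identities
loss_correct = cross_entropy(probs, 0)    # true class is the most probable one
loss_wrong = cross_entropy(probs, 2)      # true class is the least probable one
dropped = dropout([1.0] * 8, rate=0.5, rng=random.Random(0))
```

The loss is smaller when the true identity receives the highest probability, which is the behaviour that minimizing the cross-entropy encourages.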
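Lemma 1: Let $x_1$ and $x_2$ be two preprocessed inputs such that:
$$\lVert x_1 - x_2 \rVert \leq \epsilon$$
Then, the OCNN mapping satisfies:
$$\lVert F_\theta(x_1) - F_\theta(x_2) \rVert \leq K\epsilon$$
where $K$ is a Lipschitz constant dependent on the network parameters, and $\epsilon$ denotes the maximum allowable perturbation bound between the two preprocessed inputs.
Proof: Each OCNN layer consists of convolution, batch normalization (with fixed statistics at inference), and activation functions, all of which are Lipschitz continuous. Let the Lipschitz constant of layer $l$ be $K_l$. Then, for each layer:
$$\lVert h^{(l)}(x_1) - h^{(l)}(x_2) \rVert \leq K_l \,\lVert h^{(l-1)}(x_1) - h^{(l-1)}(x_2) \rVert$$
The recursive application across $L$ layers is expressed as follows:
$$\lVert F_\theta(x_1) - F_\theta(x_2) \rVert \leq \Bigl(\prod_{l=1}^{L} K_l\Bigr) \lVert x_1 - x_2 \rVert$$
Let $K = \prod_{l=1}^{L} K_l$, then:
$$\lVert F_\theta(x_1) - F_\theta(x_2) \rVert \leq K\epsilon$$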
Hence, the OCNN mapping is Lipschitz continuous and stable. □
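The layer-wise argument can be checked numerically on a deliberately simplified network. In the sketch below each layer is a scalar affine map followed by ReLU, so its Lipschitz constant is the absolute value of its weight; the weights, biases, and inputs are hypothetical. For real convolutional layers the per-layer constant would instead be an operator norm.

```python
def relu(v):
    return max(0.0, v)


def forward(x, layers):
    """Apply scalar layers x -> relu(w * x + b) in sequence."""
    for w, b in layers:
        x = relu(w * x + b)
    return x


def lipschitz_constant(layers):
    """Product of per-layer constants |w| (ReLU itself is 1-Lipschitz)."""
    k = 1.0
    for w, _ in layers:
        k *= abs(w)
    return k


layers = [(1.5, 0.1), (0.8, 0.3), (2.0, -0.2)]   # hypothetical (weight, bias) pairs
x1, x2 = 0.40, 0.45                              # two nearby preprocessed inputs
K = lipschitz_constant(layers)
gap = abs(forward(x1, layers) - forward(x2, layers))
bound = K * abs(x1 - x2)
```

The output gap never exceeds the bound, which is the inequality the lemma states; the bound can be loose when ReLU units switch off, since inactive units contract distances further.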
The OCNN is a key building block of the proposed framework since it projects heterogeneous biometric inputs into a common latent space. Hierarchical structure enables the extraction of low-level and high-level features, and the stability property provides robustness against noise and adversarial perturbations. In conjunction with the ensuing GAN-based transformation and cryptographic protection, OCNN provides a solid basis for secure and accurate biometric authentication.
4.3. GAN-Based Transformation
The GAN-based transformation mechanism can preserve biometric privacy by learning a robust nonlinear mapping from the original feature embeddings to the synthetic latent representations. The adversarial learning strategy guarantees discriminative biometric patterns while substantially reducing the likelihood of biometric reconstruction and identity leakage in comparison with the traditional feature perturbation approaches. During the training process, the generator is continuously improved to generate realistic synthetic embeddings that match the statistical distribution of the original latent space, while the discriminator learns to recognize the subtle differences between the real and the generated representations. This competitive optimization process drives the generator to produce highly secure synthetic biometric features that can maintain the authentication reliability without disclosing sensitive personal information. Moreover, the stochastic noise component adds randomness to the generation process, which increases the diversity and improves the resistance against the inversion and replay attacks. Thus, the proposed transformation framework can achieve an effective trade-off between biometric recognition performance, privacy preservation and adversarial robustness for secure multimodal biometric authentication systems.
Given a latent feature vector $z$ extracted by the OCNN, the generator $G_\phi$ produces a synthetic embedding:
$$\tilde{z} = G_\phi(z, \eta)$$
where $\eta$ is a stochastic noise vector and $\phi$ denotes the parameters of the generator network. The objective is to ensure that $\tilde{z}$ preserves identity-discriminative characteristics while obfuscating sensitive biometric information that could enable reconstruction of the original input.
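The GAN framework consists of a generator $G_\phi$ and a discriminator $D_\omega$, where the discriminator attempts to distinguish real embeddings from synthetic ones. The adversarial optimization is formulated as follows:
$$\min_{G_\phi}\max_{D_\omega}\; \mathbb{E}_{z}\bigl[\log D_\omega(z)\bigr] + \mathbb{E}_{z,\eta}\bigl[\log\bigl(1 - D_\omega(G_\phi(z,\eta))\bigr)\bigr]$$
A modified loss function (non-saturating variant) is employed to stabilize training and enhance gradient behavior:
$$\mathcal{L}_G^{NS} = -\mathbb{E}_{z,\eta}\bigl[\log D_\omega(G_\phi(z,\eta))\bigr]$$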
Training GANs is inherently difficult due to unstable optimization dynamics, vanishing gradients and mode collapse, where the generator produces only a limited diversity of outputs. To ease these challenges, the proposed framework adopts the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) objective, which offers smoother gradients and improves the convergence stability during adversarial training. In WGAN-GP, the discriminator is replaced by a critic network that estimates the Wasserstein distance between the distributions of real biometric embeddings and synthetic privacy-preserving embeddings. The critic is optimized as follows:
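$$\mathcal{L}_{critic} = \mathbb{E}_{\tilde{z}}\bigl[D_\omega(\tilde{z})\bigr] - \mathbb{E}_{z}\bigl[D_\omega(z)\bigr] + \lambda_{gp}\,\mathbb{E}_{\hat{z}}\Bigl[\bigl(\lVert \nabla_{\hat{z}} D_\omega(\hat{z}) \rVert_2 - 1\bigr)^2\Bigr]$$
where $D_\omega$ denotes the critic network, $\tilde{z} = G_\phi(z, \eta)$ represents the synthetic biometric embedding generated from the latent noise vector $\eta$, and $z$ denotes the real biometric embedding extracted by the OCNN feature extractor. The parameter $\lambda_{gp}$ is the gradient penalty coefficient, while $\hat{z} = t z + (1 - t)\tilde{z}$ with $t \sim U[0, 1]$ is sampled uniformly along the straight lines connecting real and generated embeddings.
The generator is trained to minimize the Wasserstein distance by maximizing the critic score of the generated samples. The generator loss is formulated as follows:
$$\mathcal{L}_G = -\mathbb{E}_{\tilde{z}}\bigl[D_\omega(\tilde{z})\bigr]$$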
It encourages the generator to synthesize privacy-preserving biometric embeddings whose distribution closely approximates that of the original biometric feature space. To preserve identity-discriminative characteristics while simultaneously enhancing privacy protection, the overall generator objective is augmented with additional regularization terms as follows:
$$\mathcal{L}_G^{total} = \mathcal{L}_G + \lambda_1 \mathcal{L}_{cc} + \lambda_2 \mathcal{L}_{rr}$$
where $\mathcal{L}_{cc}$ denotes the classification consistency loss that preserves identity-related discriminative information, $\mathcal{L}_{rr}$ represents the reconstruction-resistance loss that prevents recovery of the original biometric data, and $\lambda_1$ and $\lambda_2$ are weighting coefficients controlling the contribution of the corresponding loss components.
Furthermore, the gradient penalty term $\lambda_{gp}\,\mathbb{E}_{\hat{z}}\bigl[(\lVert \nabla_{\hat{z}} D_\omega(\hat{z}) \rVert_2 - 1)^2\bigr]$ enforces the Lipschitz continuity constraint on the critic and stabilizes the adversarial optimization process. Consequently, the proposed WGAN-GP framework mitigates mode collapse, improves convergence stability, and generates diverse privacy-preserving biometric embeddings suitable for secure multimodal biometric authentication.
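The WGAN-GP critic and generator losses can be written out exactly for the special case of a linear critic, which avoids the need for automatic differentiation. This is a simplified sketch, not the critic used in the paper: for a linear critic the gradient with respect to its input equals the weight vector at every interpolated point, so the penalty reduces to a single term. The penalty coefficient of 10 and all embeddings are illustrative assumptions.

```python
import math


def critic(z, w, b):
    """Linear critic score w . z + b (a stand-in for the critic network)."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def critic_loss(real, fake, w, b, penalty_coeff=10.0):
    """Mean fake score minus mean real score plus the gradient penalty."""
    mean_real = sum(critic(z, w, b) for z in real) / len(real)
    mean_fake = sum(critic(z, w, b) for z in fake) / len(fake)
    # For a linear critic the input gradient is w at every interpolate,
    # so the expectation over interpolated points collapses to one value.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return mean_fake - mean_real + penalty_coeff * (grad_norm - 1.0) ** 2


def generator_loss(fake, w, b):
    """The generator maximizes the critic score of its samples."""
    return -sum(critic(z, w, b) for z in fake) / len(fake)


real = [[1.0, 0.0], [0.8, 0.2]]      # hypothetical OCNN embeddings
fake = [[0.0, 1.0], [0.2, 0.8]]      # hypothetical generator outputs
unit_loss = critic_loss(real, fake, w=[0.6, 0.8], b=0.0)    # gradient norm 1
steep_loss = critic_loss(real, fake, w=[1.2, 1.6], b=0.0)   # gradient norm 2
g_loss = generator_loss(fake, w=[0.6, 0.8], b=0.0)
```

With a unit-norm weight vector the penalty vanishes, while doubling the weights adds a penalty of 10, which shows how the term pushes the critic toward unit gradient norm.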
To explicitly enforce resistance against reconstruction attacks, a reconstruction adversary $R$ is introduced, which attempts to recover the original biometric input $X$ from the synthetic feature $\tilde{z}$. The reconstruction loss is defined as:
$$\mathcal{L}_{rec} = \mathbb{E}\bigl[\lVert X - R(\tilde{z}) \rVert_2^2\bigr]$$
A privacy constraint is imposed such that:
$$\mathcal{L}_{rec} \geq \delta$$
where $\delta$ is a predefined privacy threshold ensuring that reconstructed outputs remain significantly different from the original inputs.
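A minimal check of the privacy constraint might look as follows. The sketch assumes a per-sample mean squared error as the reconstruction error and uses hypothetical pixel values and a hypothetical threshold of 0.1; the paper does not specify these quantities here.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input and the adversary's reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def satisfies_privacy(original, reconstructed, threshold):
    """True when the reconstruction error stays at or above the privacy threshold."""
    return reconstruction_error(original, reconstructed) >= threshold


x = [0.2, 0.4, 0.6, 0.8]            # hypothetical normalized input values
x_hat = [0.8, 0.6, 0.4, 0.2]        # hypothetical adversarial reconstruction
error = reconstruction_error(x, x_hat)
private = satisfies_privacy(x, x_hat, threshold=0.1)
leaked = satisfies_privacy(x, x, threshold=0.1)   # a perfect reconstruction violates it
```

A perfect reconstruction has zero error and therefore fails the constraint, which is the case the reconstruction adversary is meant to rule out.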
To sustain classification efficacy, the synthetic representation must retain identity-relevant information. Consequently, a classification consistency loss is integrated:
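$$\mathcal{L}_{cc} = -\mathbb{E}\bigl[\log f_c(\tilde{z})_y\bigr]$$
where $f_c(\cdot)$ denotes the classifier mapping and $f_c(\tilde{z})_y$ is the probability it assigns to the true identity $y$. This ensures that the synthetic embedding $\tilde{z}$ retains discriminative power.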
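The overall GAN optimization problem is formulated as follows:
$$\min_{G_\phi}\max_{D_\omega}\; \mathcal{L}_{adv} + \alpha\,\mathcal{L}_{cc} - \beta\,\mathcal{L}_{rec}$$
where $\mathcal{L}_{adv}$ is the adversarial objective, $\mathcal{L}_{cc}$ is the classification consistency loss, $\mathcal{L}_{rec}$ is the reconstruction loss, and $\alpha, \beta$ are balancing coefficients controlling the trade-off between utility preservation and privacy protection.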
The transformation aims to minimize the mutual information between the original biometric input $X$ and the synthetic representation $\tilde{z}$:
$$\min_{\phi}\; I(X; \tilde{z})$$
subject to the constraint that identity information is preserved:
$$I(\tilde{z}; y) \approx I(z; y)$$
where $y$ denotes the identity label, so that $\tilde{z}$ remains informative for classification but uninformative for reconstructing the original biometric sample.
Lemma 2: If the reconstruction loss satisfies
, then the probability of accurately reconstructing the original biometric input from
is bounded by:
where
and
denotes reconstruction variance, and
is reconstruction error.
Proof: By applying concentration inequalities on the reconstruction error distribution and using the lower bound constraint on the reconstruction loss, $\mathcal{L}_{rec} \geq \delta$, the probability of small reconstruction error decays exponentially. Hence, reconstruction becomes statistically infeasible. □
Corollary 2: Under the above lemma, the mutual information between the original input $X$ and the synthetic representation $\tilde{z}$ satisfies:
$$I(X; \tilde{z}) \leq \epsilon_I$$
for sufficiently large $\delta$, where $\epsilon_I$ is a small constant.
Implication: The synthetic representation $\tilde{z}$ effectively suppresses sensitive biometric information while preserving identity-discriminative features.
The GAN-based transformation adds a privacy-preserving layer to the proposed framework. By integrating adversarial learning, reconstruction constraints and information-theoretic regularization, a model that strikes a balance between utility and privacy is obtained. The synthetic embeddings produced are resilient to inversion attacks and make it impossible to reverse-engineer biometric templates. This greatly increases the total security of the system, especially when combined with the subsequent encryption and digital signature processes.
4.4. Multimodal Fusion
The proposed SBDP framework uses multimodal fusion to improve biometric recognition accuracy and privacy preservation by merging discriminative representations extracted from multiple biometric modalities into a common latent feature space. Rather than using a single biometric source, the framework fuses privacy-preserving embeddings generated from the face, retina, and fingerprint modalities to leverage their complementary biometric characteristics and enhance robustness against spoofing, identity leakage, and feature reconstruction attacks. The fusion mechanism allows the model to capture inter-modal correlations while maintaining secure feature abstraction through GAN-based transformation learning. The proposed framework jointly optimizes multimodal representations in a compact latent space to obtain better authentication reliability, stronger privacy preservation, and increased resistance to adversarial inference attacks in secure biometric authentication environments. Let $\tilde{z}_f$, $\tilde{z}_r$, and $\tilde{z}_p$ denote the privacy-preserving feature embeddings generated by the GAN for each modality. The multimodal fusion is defined as:
$$z_{fus} = \mathcal{F}\bigl(\tilde{z}_f \oplus \tilde{z}_r \oplus \tilde{z}_p\bigr)$$
where $\oplus$ denotes concatenation and $\mathcal{F}(\cdot)$ is a learnable transformation function mapping the concatenated vector into a compact latent space $\mathbb{R}^{d}$.
The transformation function $\mathcal{F}$ is implemented as a nonlinear projection:
$$\mathcal{F}(u) = \psi(W_f u + b_f)$$
where $W_f$ is the fusion weight matrix, $b_f$ is the bias term, and $\psi(\cdot)$ is a nonlinear activation function.
To improve stability and representation consistency, normalization is applied as follows:
$$z_{fus} \leftarrow \frac{z_{fus}}{\lVert z_{fus} \rVert_2}$$
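An adaptive weighted fusion approach is presented to account for differences in modality reliability:
$$z_{fus} = \sum_{m \in \{f, r, p\}} \alpha_m \tilde{z}_m, \qquad \sum_{m} \alpha_m = 1, \quad \alpha_m \geq 0$$
The weights $\alpha_m$ can be learned dynamically based on modality quality or confidence scores. The fusion process is optimized to maximize discriminative capability:
$$\max\; \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i)$$
where $\mathbb{1}(\cdot)$ denotes the indicator function.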
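The two fusion variants described above, concatenation followed by a nonlinear projection with normalization, and adaptive weighted fusion, can be sketched in plain Python. The embeddings, projection weights, and quality scores below are hypothetical, ReLU and L2 normalization are assumed for the activation and normalization steps, and deriving the weights from a softmax over quality scores is one possible choice rather than the paper's stated mechanism.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vector):
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def concat_fusion(embeddings, weight, bias):
    """Concatenate modality embeddings, project with ReLU, then L2-normalize."""
    u = [v for emb in embeddings for v in emb]
    projected = [max(0.0, sum(w * x for w, x in zip(row, u)) + b)
                 for row, b in zip(weight, bias)]
    return l2_normalize(projected)


def weighted_fusion(embeddings, quality_scores):
    """Convex combination of same-sized embeddings with softmax weights."""
    alphas = softmax(quality_scores)
    dim = len(embeddings[0])
    fused = [sum(a * emb[j] for a, emb in zip(alphas, embeddings)) for j in range(dim)]
    return fused, alphas


face, retina, finger = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]   # hypothetical embeddings
W = [[0.5] * 6, [0.25] * 6]                                  # hypothetical 2 x 6 projection
z_concat = concat_fusion([face, retina, finger], W, bias=[0.0, 0.0])
z_weighted, alphas = weighted_fusion([face, retina, finger], quality_scores=[0.0, 0.0, 0.0])
```

With equal quality scores the weights are uniform; raising one modality's score shifts the fused vector toward that modality, which is how a degraded sample could be down-weighted.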
Property 1: The fused representation $z_{fus}$ achieves higher discriminative power and robustness than any individual modality embedding, i.e.:
$$I(z_{fus}; y) \geq \max_{m} I(\tilde{z}_m; y)$$
for bounded noise in any single modality.
Proof: Each modality captures distinct biometric characteristics:
- Face → global appearance features
- Retina → vascular patterns
- Fingerprint → ridge structures
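Let the information contribution of each modality be represented as:
$$I_m = I(\tilde{z}_m; y) > 0$$
for any single modality $m$. Since fusion aggregates these independent contributions, the mutual information between fused features and identity increases:
$$I(z_{fus}; y) \geq \max_{m} I_m$$
Additionally, if one modality $m$ is corrupted by noise $n_m$, the remaining modalities still preserve identity information:
$$z_{fus}' = \alpha_m(\tilde{z}_m + n_m) + \sum_{j \neq m} \alpha_j \tilde{z}_j$$
Since $\alpha_m \leq 1$, the impact of noise is attenuated:
$$\lVert z_{fus}' - z_{fus} \rVert = \alpha_m \lVert n_m \rVert \leq \epsilon$$
where $\epsilon$ denotes the maximum acceptable deviation (noise bound) ensuring that the fused representation remains stable and robust.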
Thus, the fused representation remains stable. □
Multimodal fusion phase exploits complementary biometric data for improving the effectiveness and robustness of proposed system. Combining multiple modalities helps to overcome the limitations of single sources thus increasing the resilience to spoofing and noise. Adaptive weighting and nonlinear transformation are used for improving the discriminative effectiveness of fused representation. This stage is vital for reliable biometric authentication, especially in real-world settings where data quality may differ across modalities. The proposed SBDP framework fuses heterogeneous biometric features from face, retina and fingerprint modalities into a single and compact latent representation through the fusion process. Each modality contributes different discriminative patterns which together improve the reliability of recognition and reduces the risk of authentication failure due to low-quality samples or partial biometric corruption. The multimodal approach also improves the resistance to impersonation and adversarial attacks as an attacker would have to compromise multiple independent biometric traits simultaneously. Besides, the fusion mechanism learns the optimal inter-modal relations in the training, which can increase feature diversity and reduce redundancy. To enhance the fused representation, the framework utilizes learnable transformation layers along with adaptive feature projection mechanisms that dynamically weigh the most informative biometric components and suppress noisy or less relevant features. The proposed nonlinear fusion operation allows the model to learn complex dependencies between modalities, which increases feature separability in the latent space and, thus, improves authentication accuracy, accelerates convergence behavior and stabilizes optimization during training. 
The use of privacy-preserving embeddings derived from GANs protects sensitive biometric information while retaining high discriminative capability for identity verification. The multimodal fusion stage also improves the scalability and generalization ability of the proposed framework across operating environments. In real-world biometric systems, variations in illumination, pose, sensor quality, occlusion, and environmental noise often degrade the recognition performance of a single modality. The proposed fusion strategy mitigates these limitations by allowing unaffected modalities to compensate for degraded biometric samples, thereby providing stable and reliable authentication performance. The multimodal fusion module is therefore an integral part of the proposed SBDP architecture for secure, accurate, privacy-preserving, and robust biometric authentication in real-world deployment scenarios.
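The adaptive weighting and nonlinear transformation described above can be sketched in a few lines. This is a minimal illustration, not the SBDP implementation: the function names `softmax` and `fuse`, the per-modality `quality` scores (a stand-in for the learned weighting), the choice of tanh as the nonlinearity, and all numeric values are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features, quality_scores):
    """Fuse equal-length modality embeddings with adaptive weights.

    features: dict mapping modality name -> list of floats
    quality_scores: dict mapping modality name -> float (higher = more
        reliable); a hypothetical stand-in for learned fusion weights.
    """
    names = sorted(features)
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modality embeddings must share one dimension")
    weights = softmax([quality_scores[n] for n in names])
    # Weighted sum across modalities, followed by a tanh nonlinearity.
    fused = [
        math.tanh(sum(w * features[n][i] for w, n in zip(weights, names)))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))

# Toy embeddings (made-up values); the fingerprint sample is treated as degraded.
features = {
    "face": [0.9, -0.2, 0.4],
    "retina": [0.8, -0.1, 0.5],
    "fingerprint": [-0.3, 0.7, -0.6],
}
quality = {"face": 2.0, "retina": 2.0, "fingerprint": -1.0}
fused, weights = fuse(features, quality)
```

In this toy setting the degraded fingerprint embedding receives a small weight, so the fused vector is dominated by the face and retina embeddings, which is the compensation behavior the text attributes to the fusion stage.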
To clarify the operational workflow, Algorithm 1 summarizes the full implementation procedure of the proposed secure biometric data protection framework. The algorithm combines multimodal biometric preprocessing, optimized CNN-based feature extraction, privacy-preserving representation learning, classification, and integrity verification into a single secure authentication pipeline. The stepwise formulation illustrates how the proposed framework converts raw biometric inputs into secure and verifiable biometric representations while preserving recognition accuracy and privacy protection.
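| Algorithm 1: Optimized CNN-based secure multimodal biometric feature extraction and encryption |
1. Initialization: multimodal biometric inputs; optimized CNN model; intermediate feature maps; final feature vector; preprocessing function; SHA-256 signature; secured output; class label; encrypted feature vector; convolution; batch normalization; activation function (e.g., ReLU); pooling; input to the current block
2. Input: face, retina, and fingerprint samples
3. Output: secured biometric class or encrypted feature vector
4. Set the preprocessing function // resize, normalize, and augment (rotation, scaling, flipping, brightness variation)
5. For each biometric sample do // Preprocessing
6. Apply the preprocessing function to the sample
7. Store the processed image
8. End for
9. // Feature extraction on CNN blocks
10. For each preprocessed sample do
11. Compute the first feature map by convolution, batch normalization, activation, and pooling // Output of CNN block 1
12. Compute the second, deeper feature map // Output of CNN block 2
13. Compute further feature maps for any additional CNN blocks
14. End for
15. Apply dropout to the last feature map // Regularization process
16. Pass the result through a fully connected layer to obtain the final feature vector
17. Apply the softmax classifier to predict the class label
18. Encrypt the final feature vector with ElGamal encryption
19. Compute the SHA-256 signature
20. Form the secured output from the predicted class, encrypted feature vector, and signature
21. Return the secured output or the encrypted feature vector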
Algorithm 1 describes the optimized CNN pipeline for secure multimodal biometric recognition. Step 1 initializes all variables, including the biometric dataset. Steps 2–3 define the input and output, respectively: the input consists of face, retina, and fingerprint samples, and the output is either the secured biometric class or the encrypted feature vector. Step 4 defines the preprocessing operation, in which each biometric image is resized, normalized, and augmented using rotation, scaling, flipping, and brightness variation. Steps 5–8 apply preprocessing to every biometric sample and store the processed image, which improves consistency and generalization before CNN processing. Step 9 sends the preprocessed biometric samples to the optimized CNN model for feature extraction. Steps 10–14 extract features from each biometric sample using CNN blocks. The first CNN block generates its feature map through convolution, batch normalization, activation, and pooling. The second CNN block generates a feature map that represents deeper biometric features. The generalized block output covers additional CNN blocks if more layers are used. Step 15 applies dropout to the last feature map, which reduces overfitting and improves model generalization. Step 16 passes the regularized features through a fully connected layer to compute the final feature vector, which represents high-level identity-specific biometric information. Step 17 applies the softmax classifier to predict the biometric class label. Step 18 encrypts the final feature vector using ElGamal encryption to protect biometric confidentiality. Step 19 computes a SHA-256 signature to ensure data integrity and authenticity. Step 20 forms the secured biometric output by combining the predicted class, encrypted feature vector, and signature. Step 21 returns the secured biometric output or the encrypted feature vector.
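Steps 18–21 can be sketched with the Python standard library. This is a toy illustration under stated assumptions, not the framework's implementation: the modulus `P`, base `G`, the fixed-point `quantize`/`dequantize` mapping, the per-component encryption, the choice to hash the ciphertext, and the sample feature values and label are all hypothetical. The 127-bit modulus is far too small for real use.

```python
import hashlib
import secrets

P = 2**127 - 1   # Mersenne prime; toy-sized modulus (assumption, not secure)
G = 3            # assumed base, chosen only for illustration

def keygen():
    x = secrets.randbelow(P - 2) + 1           # private key in [1, P-2]
    return x, pow(G, x, P)                     # (private, public)

def encrypt(m, public):
    if not 1 <= m < P:
        raise ValueError("plaintext must lie in [1, P-1]")
    k = secrets.randbelow(P - 2) + 1           # fresh ephemeral key per message
    return pow(G, k, P), (m * pow(public, k, P)) % P

def decrypt(cipher, private):
    c1, c2 = cipher
    # c1^(P-1-x) equals the inverse of c1^x modulo the prime P.
    return (c2 * pow(c1, P - 1 - private, P)) % P

def quantize(v, scale=10_000):
    # Map a feature in [-1, 1] to a positive integer (hypothetical encoding).
    return int(round((v + 1.0) * scale)) + 1

def dequantize(m, scale=10_000):
    return (m - 1) / scale - 1.0

def digest(ciphertext):
    # SHA-256 over the serialized ciphertext pairs (assumed hash input).
    h = hashlib.sha256()
    for c1, c2 in ciphertext:
        h.update(c1.to_bytes(16, "big"))
        h.update(c2.to_bytes(16, "big"))
    return h.hexdigest()

private, public = keygen()
feature_vector = [0.12, -0.5, 0.98]            # made-up final feature vector
encrypted = [encrypt(quantize(v), public) for v in feature_vector]
secured = {"label": 7, "encrypted": encrypted, "signature": digest(encrypted)}

# Receiver side: verify integrity, then decrypt.
intact = digest(secured["encrypted"]) == secured["signature"]
recovered = [dequantize(decrypt(c, private)) for c in secured["encrypted"]]
```

Because each call to `encrypt` draws a fresh ephemeral key, encrypting the same feature value twice yields different ciphertexts, which is the probabilistic property of ElGamal. Note that a bare SHA-256 digest detects accidental or unauthenticated modification only if the digest itself is protected; this sketch does not address how the digest is bound to a key.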
6. Discussion of Results
Experimental results indicate that the proposed Secure Biometric Data Protection framework surpasses state-of-the-art deep learning and biometric protection methods in biometric recognition accuracy, privacy preservation, cryptographic security, and adversarial robustness. The combination of optimized CNN-based multimodal feature extraction, GAN-based synthetic biometric transformation, and ElGamal cryptographic protection enables the framework to overcome the major limitations of traditional biometric authentication systems. The training accuracy analysis shows that the proposed framework converges faster and achieves higher recognition accuracy than the CNN, ResNet, Vision Transformer, and ConvGRU architectures. The SBDP framework achieved an average classification accuracy of approximately 99.8%, which indicates that multimodal fusion of the face, retina, and fingerprint modalities can significantly improve identity discrimination and authentication reliability. The improved convergence behavior further indicates the stability of the proposed optimization strategy and the effectiveness of adaptive multimodal feature learning.
Furthermore, the precision, recall, F1-score, and ROC-AUC metrics reflect the discriminative capability of the proposed framework. It outperforms the baseline methods on all of these metrics and yields lower false positive and false negative rates. In particular, the ROC-AUC score of approximately 0.998 indicates that the proposed architecture can correctly discriminate authentic biometric identities from fraudulent or manipulated samples under secure multimodal authentication settings. The results show that integrating optimized deep feature learning with privacy-preserving representation transformation improves both recognition robustness and classification stability.
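For reference, the evaluation metrics named above can be computed as in the following sketch. The confusion counts, labels, and scores are hypothetical values chosen only to exercise the formulas; they are not the results reported in this study, and the function names are assumptions.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion counts (assumes no empty class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),   # false positive rate
        "fnr": fn / (fn + tp),   # false negative rate
    }

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a genuine sample outscores an
    impostor sample, counting ties as one half (rank formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
auc = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.1])
```

The pairwise formulation of ROC-AUC is quadratic in the number of samples, which is fine for a small illustration; a sort-based rank computation would be preferable for large evaluation sets.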
The GAN-based privacy-preserving transformation mechanism showed strong resistance to biometric reconstruction and information leakage attacks. The proposed framework reduces the biometric leakage probability to nearly 3%, which is lower than that of the CNN-only, ResNet, ConvGRU, and existing GAN-based methods. The adversarial learning strategy transforms the biometric features into synthetic representations while preserving their identity-discriminative characteristics. This indicates that the proposed framework can maintain authentication accuracy and privacy protection simultaneously without directly exposing sensitive biometric templates.
The cryptographic analysis indicates that the proposed SBDP framework offers an optimal trade-off between encryption efficiency and security strength. Although the framework incurs slightly more computational overhead due to multimodal fusion, GAN-based transformation, ElGamal encryption, and SHA-256 signature generation, the attained security score of about 99% supports the effectiveness of the proposed secure authentication pipeline. The probabilistic property of ElGamal encryption also increases resilience against ciphertext analysis and replay attacks, and SHA-256-based verification ensures the integrity and authenticity of biometric data during storage and transmission.
The stability of the proposed adversarial learning framework was further confirmed by the training, generator, and discriminator loss analyses. The proposed model exhibited lower and more stable convergence curves than the baseline methods, suggesting efficient optimization and well-balanced GAN training behavior. The reduction of the generator and discriminator losses indicates that the proposed architecture can produce realistic privacy-preserving biometric representations, achieve stable adversarial learning, and minimize reconstruction risks.
Despite the promising performance, the proposed framework has some limitations. Integrating multimodal feature extraction, GAN-based transformation, encryption, and signature verification introduces moderate computational overhead compared with lightweight unimodal biometric systems. In addition, the framework was evaluated using only the face, retina, and fingerprint modalities, and performance may vary when additional biometric traits or very large-scale real-world deployments are considered. Furthermore, GAN-based adversarial training requires careful parameter tuning to maintain stable optimization under different dataset distributions. However, these limitations are relatively minor compared with the substantial improvements achieved in biometric recognition accuracy, privacy preservation, cryptographic protection, and secure authentication reliability.
A further limitation of the present study is that the facial, fingerprint, and iris modalities are obtained from independent public datasets and therefore do not represent naturally paired multimodal biometric identities. Although the representation-level fusion strategy enables evaluation of the proposed privacy-preserving and cryptographic framework, future work will validate the SBDP framework on a real multimodal biometric dataset containing multiple biometric traits acquired from the same individuals, in order to investigate identity-level multimodal authentication performance.