1. Introduction
As information and communication technologies have advanced rapidly, there has been an exponential rise in the global volume of digital data. Among various data types, image data stands out due to its rich semantic content, high dimensionality, and widespread use in applications such as social media, healthcare, surveillance, and biometric authentication [1,2,3,4]. In this data-driven era, ensuring the security and privacy of image content has become a pressing concern. Traditional privacy protection techniques—such as box blurring or pixelation—are no longer sufficient in confronting modern threats; such methods may offer a superficial level of obfuscation, but fail to resist advanced reconstruction or inference attacks, especially when applied to large-scale visual datasets [4,5,6,7]. Moreover, the unique characteristics of image data pose additional challenges to conventional cryptographic techniques [7,8,9,10]. Unlike text, digital images exhibit high redundancy, strong spatial correlation among adjacent pixels, and a typically non-uniform, clustered distribution of features [7,11,12,13]. These traits significantly reduce the effectiveness of traditional text-based encryption algorithms when directly applied to image domains [14,15,16,17]. Simultaneously, the stakes for image security have grown considerably. Facial images, for instance, have emerged as critical biometric identifiers in access control, authentication, and surveillance systems. The leakage or unauthorized manipulation of such images may result in severe consequences, including identity theft, personal data breaches, and financial fraud [18,19,20]. Therefore, developing specialized image encryption schemes that account for the structural properties of images and are resilient to both statistical and cryptanalytic attacks is imperative [21,22,23,24]. To address these challenges, researchers have explored various encryption strategies tailored for image data, including chaotic systems, DNA/RNA coding, fractal transformations, and neural network–based methods [25,26,27,28]. Among them, chaos-based encryption has gained particular attention due to its desirable properties such as sensitivity to initial conditions [29,30,31,32,33], ergodicity [34,35,36,37,38], and pseudo-randomness [39,40,41,42]—characteristics that align well with security requirements like diffusion and confusion.
Although global encryption algorithms such as AES and DES were once considered the cornerstone of secure communication, their computational intensity imposes significant burdens in large-scale image processing. Moreover, they lack the flexibility required for hierarchical privacy protection. These limitations are particularly pronounced in real-time surveillance systems [43] and cloud-based facial recognition services [44], where there is an urgent need for encryption paradigms that balance both security and efficiency.
Recent research has introduced a range of multidimensional approaches to tackle the challenges of image encryption [4,45,46]. In 2021, Wang et al. [47] proposed a robust triple encryption scheme that integrates 2D chaotic systems, compressed sensing, and 3D discrete cosine transform (3D-DCT), significantly improving both visual security and decryption efficiency. In 2022, Yamni et al. [48] introduced a high-concealment audio watermarking method combining dual-tree complex wavelet transform with fractional Charlier moments. Cao et al. [49] recently developed a robust watermarking technique for screen content images, which dynamically embeds retrievable information and improves extraction accuracy in copyright protection scenarios. These developments indicate a clear evolution from single-algorithm optimizations to multimodal cooperative protection strategies [50].
In the domain of facial privacy protection, encryption technologies have been widely explored [51,52,53,54,55,56]. Winkler et al. [57] developed TrustCAM, a privacy-aware smart camera that uses edge detection and gradient operators to generate alternative regions of interest (ROIs) and grants access to original ROIs through key-based mechanisms. To maintain a degree of visual usability while preventing unauthorized facial recognition, Zhou et al. (year not specified) proposed the Thumbnail-Preserving Encryption (TPE) method based on Generative Adversarial Networks (GANs), which generates encrypted thumbnails of facial images that block both human and machine recognition. Zhao et al. [58] introduced the Enhanced TPE (E-TPE), leveraging a bijection between pixel triplets and their ranks, along with a novel triplet-rank mapping technique to enable efficient and stable encryption and decryption. Chai et al. [59] proposed TPE-ADE, which integrates Huffman coding with reversible data hiding in JPEG images, enabling reversible encryption while improving visual utility and reducing data expansion. Furthermore, the PR3 method by Zhao et al. [58] replaces the seven least significant bits in an image via summation data embedding, storing any overflow in the most significant bits and preserving thumbnail approximations while allowing additional information to be embedded.
Despite these advancements, research on differential privacy for facial images remains in the exploratory stage due to the complexity of visual data. Existing methods face challenges such as excessive distortion, high computational cost, and the delicate balance between privacy and utility. For example, pixelation with Laplacian noise [60] can render images unrecognizable, while SVD-based noise injection may lead to impractical results. Feature-space approaches such as Eigenface perturbation (PEEP) offer lightweight protection but compromise recognition accuracy. Region-growing techniques with differential privacy effectively obscure sensitive areas but are computationally expensive. Frequency-domain approaches [61] distort images by removing DC components and injecting noise. GAN-based methods [62] perturb latent spaces but may suffer from unstable representations and information loss. The IdentityDP framework [63] anonymizes facial features through identity disruption but may compromise background quality. These methods often suffer from high computational cost and complexity that impair real-time performance. Selective encryption reduces computational load by encrypting only critical image regions, yet heavily relies on key security and lacks flexibility in privacy level adjustments, complicating data usage and model training.
The main contributions of this work are as follows:
Unlike conventional image encryption methods that process the entire image uniformly—often wasting resources on redundant regions—our approach targets encryption specifically on facial regions. By integrating face detection technology, the proposed scheme focuses computational effort on sensitive areas, thereby improving both security and encryption efficiency while avoiding unnecessary overhead.
Traditional biological coding methods often suffer from limited adaptability due to fixed encoding schemes and static operational rules, making them susceptible to cryptanalytic attacks. To address this, the proposed algorithm employs dynamic RNA encoding combined with variable rule selection mechanisms, introducing greater randomness and complexity to enhance resistance against unauthorized decryption.
Many existing encryption frameworks lack structural robustness, particularly in scenarios involving known-plaintext or chosen-plaintext attacks. Drawing on insights from our previous cryptanalysis research, we identified structural vulnerabilities in static key designs and insufficient sensitivity to plaintext changes. To address these issues, our method integrates plaintext-dependent chaotic key generation. This correlation-based design not only enhances dynamic behavior but also significantly strengthens the system’s resistance to various cryptographic attacks.
The remainder of this paper is organized as follows. Section 2 introduces the chaotic system used, the MTCNN technique, and the RNA coding rules. Section 3 presents the proposed encryption algorithm. Section 4 reports the experimental and simulation results, and the last section concludes the paper.
2. Related Theories
2.1. Face Detection
Face detection is a critical prerequisite in facial recognition and encryption tasks, as the accurate extraction of facial regions significantly impacts subsequent processing steps. Early approaches such as the Viola–Jones algorithm, which is based on Haar-like features, offer fast detection speeds but suffer from limited accuracy under complex conditions. In recent years, the advancement of deep learning has led to the emergence of numerous face detection methods based on convolutional neural networks (CNNs), including Faster R-CNN [64], RetinaFace [65], and YOLO-Face [66], each balancing detection speed and accuracy differently.
A representative method in this domain is the Multi-task Cascaded Convolutional Network (MTCNN) proposed by Zhang et al. [67], which performs both face detection and facial landmark localization. MTCNN adopts a three-stage cascaded structure consisting of the Proposal Network (P-Net), the Refine Network (R-Net), and the Output Network (O-Net). These subnetworks progressively filter face candidates in a coarse-to-fine manner and regress the precise bounding box positions. In the final stage, the network also predicts five facial landmarks: the centers of both eyes, the nose tip, and the corners of the mouth. The primary rationale for selecting MTCNN lies in its multi-task cascaded architecture, online hard example mining, and lightweight optimization. These advantages enable it to comprehensively outperform traditional methods (e.g., Viola–Jones) and other deep learning models (e.g., single-task CNNs) in accuracy, speed, and robustness. Empirical results demonstrate state-of-the-art (SOTA) performance on challenging benchmarks such as FDDB, WIDER FACE, and AFLW, while meeting real-time requirements (approximately 99 FPS on GPU). Consequently, MTCNN is an ideal solution for facial analysis tasks in unconstrained environments, such as surveillance systems or mobile devices.
The core of MTCNN lies in integrating face classification, bounding box regression, and landmark localization into a unified multi-task learning framework. During training, the model jointly optimizes the loss functions of the three subtasks. Face classification uses a cross-entropy loss, while both the bounding box and landmark regression tasks are trained using the Euclidean (L2) loss function. The bounding box regression loss is defined as
$$L^{\mathrm{box}} = \left\lVert \hat{t} - t \right\rVert_2^2,$$
where $\hat{t}$ denotes the bounding box offset predicted by the network and $t$ represents the corresponding ground-truth offset. The loss function for facial landmark localization follows a similar form and is given by
$$L^{\mathrm{landmark}} = \left\lVert \hat{l} - l \right\rVert_2^2,$$
where $\hat{l}$ and $l$ denote the predicted and ground-truth facial landmark coordinates, respectively. MTCNN achieved state-of-the-art performance on several public benchmark datasets, including FDDB, WIDER FACE, and AFLW, while maintaining high computational efficiency (up to 99 fps on GPU). Owing to its lightweight architecture and well-integrated multi-task design, MTCNN has been widely adopted in various face-related tasks and has served as a valuable reference for the development of subsequent lightweight object detection frameworks.
Given that the facial recognition encryption task in this study relies on accurate face region extraction and landmark-assisted alignment, MTCNN provides a favorable balance between detection accuracy, model compactness, and landmark localization capability. Therefore, MTCNN is employed in this work as the face detection module in the preprocessing stage to ensure the applicability and stability of the proposed encryption scheme.
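For readers who wish to reproduce the preprocessing step, the sketch below shows how facial regions and the five landmarks can be obtained with the facenet-pytorch implementation of MTCNN. This is an illustrative Python snippet, not the MATLAB pipeline used in this paper; the package choice and file name are assumptions.

# Illustrative sketch (not the authors' MATLAB implementation): extracting a facial
# region with the facenet-pytorch MTCNN implementation, assuming that package is installed.
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=True)          # cascaded P-Net / R-Net / O-Net detector
img = Image.open("test.png").convert("RGB")

# boxes: (n, 4) bounding boxes; probs: confidences; points: (n, 5, 2) landmarks
boxes, probs, points = detector.detect(img, landmarks=True)

if boxes is not None:
    x1, y1, x2, y2 = [int(v) for v in boxes[0]]
    face_region = img.crop((x1, y1, x2, y2))   # region handed to the encryption stage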
2.2. 4D-NDS Chaotic System
We adopt a novel non-degenerate four-dimensional discrete-time chaotic system (4D-NDS) [25], which is constructed as follows. First, a transformation matrix is derived, denoted as $M$. Next, employing a uniformly bounded anti-control function $h$ together with a control matrix $Q$, inverse (anti-) control is applied to $M$; the resulting iterative relation is given in Equation (3), in which the state vector represents the iterative sequence and the remaining quantities are parameters associated with the anti-control mechanism. By performing pole assignment based on Equation (3), the Lyapunov exponents can be computed, indicating that the resulting system achieves $m$-dimensional discrete hyperchaotic behavior.
Under this framework, the specific form of the four-dimensional discrete non-degenerate chaotic model, together with the compact expression of the complete discrete-time dynamic system, is given in Equations (4) and (5). The system parameters and the four initial conditions are assigned fixed values, along with the associated anti-control parameters.
2.3. RNA Coding and Operation Rules
Ribonucleic acid (RNA) is a vital biomolecule that plays a central role in the transmission and expression of genetic information within biological systems. Structurally, RNA is a single-stranded polymer composed of ribonucleotides connected via phosphodiester bonds. Each ribonucleotide includes one of four nitrogenous bases: adenine (A), guanine (G), uracil (U), and cytosine (C). These bases exhibit specific complementary pairing behavior—adenine (A) pairs with uracil (U), and cytosine (C) pairs with guanine (G)—forming the basis of information replication and biological coding.
This inherent base-pairing principle, often denoted as A–U and C–G complementarity, provides a natural framework for representing binary or symbolic data in computational systems. Inspired by the precision and stability of genetic coding mechanisms, researchers have introduced RNA-based models into the field of information security, particularly in image encryption. By mimicking the way genetic information is encoded and transformed in living organisms, RNA encoding schemes can introduce nonlinearity, redundancy, and dynamic mapping into cryptographic operations.
In the context of image encryption, RNA coding offers a flexible method for encoding pixel data into symbolic sequences, which can then be manipulated using biologically inspired transformation rules. These rules—typically defined based on the eight valid permutations of base pairings—are capable of achieving the confusion and diffusion effects essential for secure encryption. The eight RNA coding rules, as summarized in Table 1, provide a diverse set of mappings that can be dynamically selected based on chaotic sequences or secret keys, enhancing both unpredictability and resistance to cryptanalysis.
In addition, RNA coding supports six types of operations: addition, subtraction, addition–complement, subtraction–complement, exclusive OR (XOR), and exclusive NOR (XNOR), analogous to operations in the binary system. The specific operation rules are shown in Table 2 and Table 3.
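The sketch below illustrates how such rule-driven RNA coding and an XOR-type operation can be realized. Since Table 1, Table 2, and Table 3 are not reproduced here, the rule numbering and the operation behavior in the snippet are illustrative assumptions rather than the exact tables used by the algorithm.

# Minimal sketch of RNA coding and an XOR-type operation. The rule numbering and the
# operation table are illustrative, not the paper's exact Tables 1-3.
RULES = [  # eight valid mappings of 2-bit pairs to bases (A-U and C-G stay complementary)
    "ACGU", "AGCU", "CAUG", "CUAG", "GAUC", "GUAC", "UCGA", "UGCA",
]

def rna_encode(byte, rule):
    """Encode one 8-bit value as 4 bases under the chosen rule (2 bits per base)."""
    return "".join(RULES[rule][(byte >> shift) & 0b11] for shift in (6, 4, 2, 0))

def rna_decode(bases, rule):
    """Inverse mapping: 4 bases back to one byte under the chosen rule."""
    value = 0
    for base in bases:
        value = (value << 2) | RULES[rule].index(base)
    return value

def rna_xor(a, b, rule):
    """XOR two bases by XOR-ing their 2-bit codes under a common rule."""
    table = RULES[rule]
    return table[table.index(a) ^ table.index(b)]

# Example: encode a pixel with rule 3, XOR it base-by-base with an encoded key, decode.
pixel, key = 0b10110010, 0b01011100
enc = rna_encode(pixel, 3)
mixed = "".join(rna_xor(p, k, 3) for p, k in zip(enc, rna_encode(key, 3)))
print(rna_decode(mixed, 3) == pixel ^ key)  # True when the same rule is used throughout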
3. Proposed Encryption Algorithm
The proposed image encryption algorithm consists of four modules: (1) key generation and processing, (2) block permutation encryption, (3) RNA encryption, and (4) bit diffusion encryption. Throughout, we denote the plaintext color image as $P$, of size $H \times W \times 3$ (height $H$, width $W$, three color channels), and the final ciphertext image as $C$ of the same dimensions. Each module uses pseudo-random sequences generated from a 4D chaotic system driven by the image contents. Below we describe each module in detail, defining all variables and operations. The overall diagram of the proposed encryption method is shown in Figure 1.
3.1. Key Generation and Processing
The encryption keys are derived from the plaintext image $P$ itself using a chaotic system. First, compute the SHA-256 hash of $P$ (viewed as a byte array) and denote this hash, a 32-byte (64 hexadecimal character) digest, by $h$. A set of integer keys is then defined from groups of hash bytes, where $h_j$ denotes the $j$-th byte (in decimal) of the hash (indexing from 1). These plaintext-dependent keys are then used to initialize a 4-dimensional discrete chaotic map.
Set the initial chaotic state vector from the integer keys defined above. Constant matrices are then defined and combined to produce a fixed $4 \times 4$ iteration matrix $A$, whose rows correspond to the linear combinations used in the implementation, and the chaotic map is iterated with this matrix. The system is iterated for a sufficient number of steps, after which the first 5000 values are discarded to eliminate transients. Denote the four remaining state sequences by $X$, $Y$, $Z$, and $W$.
Next, form two long 1D sequences by concatenating $X$ with $Y$ and $Z$ with $W$. From these concatenated sequences, six chaotic key sequences are extracted, each with a length proportional to $H \times W$: the first two are taken from consecutive segments of the first concatenated sequence, and the remaining four from consecutive segments of the second. Each extracted sequence is thus a 1D array; in later steps, these arrays are reshaped to match the image dimensions or color layers as needed. In particular, one of them is reshaped into three $H \times W$ arrays (for the R, G, and B channels), as explained below.
Finally, define a 4-element block-encryption key vector $K$ by taking the first two entries of two of the chaotic key sequences. These keys seed the random block scrambling in the next module. In summary, all chaotic sequences and the key vector $K$ are derived from the plaintext $P$; they serve as pseudo-random keys for the subsequent encryption steps.
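A minimal sketch of this plaintext-dependent key generation is given below. Because the exact byte-combination formulas and the 4D-NDS iteration are not reproduced in the text, the mixing of hash bytes and the stand-in 4D map in the snippet are assumptions used only to illustrate the data flow from image hash to transient-free chaotic sequences.

# Sketch of plaintext-dependent key generation. The hash-byte mixing and the simple
# coupled map below are stand-ins; they are NOT the paper's exact formulas or the
# 4D-NDS system of [25].
import hashlib
import numpy as np

def derive_keys(image: np.ndarray):
    """Hash the plaintext image and derive four initial states in (0, 1) from the digest."""
    digest = hashlib.sha256(image.tobytes()).digest()          # 32 bytes
    h = np.frombuffer(digest, dtype=np.uint8).astype(np.float64)
    return [h[i * 8:(i + 1) * 8].sum() / (8 * 255.0) for i in range(4)]

def chaotic_sequences(x0, length, discard=5000):
    """Iterate a placeholder 4D map and discard transients, as described above."""
    state = np.array(x0)
    out = np.empty((discard + length, 4))
    for n in range(discard + length):
        # Simple coupled map used only to illustrate the iteration/discard structure.
        state = (3.99 * state * (1.0 - state) + np.roll(state, 1)) % 1.0
        out[n] = state
    return out[discard:]

# Usage: the four columns play the roles of the state sequences X, Y, Z, W.
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
seqs = chaotic_sequences(derive_keys(img), length=64 * 64)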
3.2. Block Permutation Encryption
The first encryption module permutes and substitutes the pixels of $P$ under chaotic control. The plaintext is first converted to double precision to form the working matrix. We apply four operations in sequence: (a) spatial channel permutation, (b) chaotic single-plane transform, (c) block scramble, and (d) substitution. The result is an intermediate ciphertext of size $H \times W \times 3$. We describe each operation in turn.
3.2.1. Spatial Channel Permutation (SpatialTrans)
Reshape one of the chaotic key sequences into an $H \times W \times 3$ array by extracting three contiguous $H \times W$ slices. At each pixel $(i, j)$, the three slices provide three chaotic values. Sort these three values to obtain a permutation of $\{1, 2, 3\}$; the pixel's color channels are then permuted according to this ordering. Formally, if the three chaotic values sorted in ascending order correspond to channel indices $(k_1, k_2, k_3)$, the R, G, and B values at $(i, j)$ are reassigned in that order. In effect, the R, G, B values at each pixel are shuffled by a channel permutation determined by the chaotic sequence. Denote the output of this step as the spatially permuted image.
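A compact sketch of this per-pixel channel shuffle is given below, assuming the chaotic array has already been reshaped to $H \times W \times 3$; the function names are illustrative.

# Sketch of the per-pixel channel permutation: three chaotic values per pixel are
# sorted, and the resulting order shuffles that pixel's R, G, B components.
import numpy as np

def spatial_channel_permute(img: np.ndarray, chaos: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) uint8; chaos: (H, W, 3) chaotic values in [0, 1)."""
    order = np.argsort(chaos, axis=2)            # per-pixel permutation of {0, 1, 2}
    return np.take_along_axis(img, order, axis=2)

def spatial_channel_unpermute(img: np.ndarray, chaos: np.ndarray) -> np.ndarray:
    """Decryption applies the inverse permutation, recovered by a second argsort."""
    order = np.argsort(chaos, axis=2)
    inverse = np.argsort(order, axis=2)
    return np.take_along_axis(img, inverse, axis=2)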
3.2.2. Chaotic Single-Plane Transform (ChaoticMagicTrans)
This step applies additional intra-channel position shifts. A second chaotic key sequence (reshaped as needed) is used to generate row-shift and column-shift vectors. Sorting each vector yields permutation indices, from which three $H \times W$ shift maps are constructed, one per color channel. Each color channel of the spatially permuted image is then permuted according to its shift map, and its rows are rearranged cyclically. The nested loops in the implementation realize these shifts. The output is an intermediate matrix of the same size. In summary, the single-plane transform mixes pixel positions within each color plane according to the chaotic maps derived from the key sequence.
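The following sketch illustrates the general sort-then-shift idea on a single color plane. The paper's exact shift-map construction is not reproduced above, so the rotation amounts derived from the sorted indices are an assumption used only for illustration.

# Generic sketch of sort-driven row/column cyclic shifts within one color plane;
# it illustrates chaotic-sequence-driven position mixing, not the paper's exact maps.
import numpy as np

def single_plane_transform(plane: np.ndarray, row_chaos: np.ndarray,
                           col_chaos: np.ndarray) -> np.ndarray:
    """plane: (H, W); row_chaos: length-H and col_chaos: length-W chaotic values."""
    H, W = plane.shape
    row_shift = np.argsort(row_chaos)            # permutation indices from sorting
    col_shift = np.argsort(col_chaos)
    out = plane.copy()
    for i in range(H):                           # cyclically shift each row
        out[i] = np.roll(out[i], int(row_shift[i]) % W)
    for j in range(W):                           # then cyclically shift each column
        out[:, j] = np.roll(out[:, j], int(col_shift[j]) % H)
    return out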
3.2.3. Block Scramble Encryption (BlockScram_EnAlgorithm)
Divide the intermediate image into non-overlapping blocks of size $4 \times 4$ (assuming $H$ and $W$ are multiples of 4; if not, pad to the next multiple of 4). Denote the number of blocks by $N_b$. Using the entries of the key vector $K$ as random seeds, generate four pseudo-random integer sequences, each of length equal to the number of blocks:
a random permutation of $\{1, 2, \dots, N_b\}$ controlling the block permutation;
integers in $\{0, 1, \dots, 5\}$ controlling block rotation/flip;
bits in $\{0, 1\}$ controlling the negative–positive transform;
integers in $\{0, 1, \dots, 5\}$ controlling the color channel permutation.
Perform the following encryption on the blocks at block–row $p$ and block–column $q$:
- (a) Block Permutation
Rearrange all blocks according to the random permutation. That is, swap block $i$ with the block indicated by the $i$-th entry of the permutation, for $i = 1, \dots, N_b$.
- (b) Block Rotation/Flip
For each block index $i$, use the corresponding rotation/flip code to rotate or flip the block:
0: do nothing (no rotation);
1: rotate the block $90^{\circ}$ clockwise;
2: rotate $180^{\circ}$;
3: rotate $270^{\circ}$ clockwise;
4: flip horizontally (reflect across the vertical center);
5: flip vertically.
- (c) Negative–Positive Transform
For each block index $i$, if the corresponding bit is 1, XOR every pixel in the block with 1; if it is 0, replace each pixel value $x$ by its complement. (This flips bits or inverts values.) The specific procedure is shown in Algorithm 1.
- (d) Color Channel Permutation
For each block index $i$, use the corresponding channel code to perform a channel permutation inside the block:
0: do nothing (keep RGB order);
1: permute RGB → RBG (swap G, B);
2: permute RGB → GRB (swap R, G);
3: permute RGB → BGR (swap R, B);
4: permute RGB → BRG (swap (G, B), then (R, B));
5: permute RGB → GBR (swap (R, G), then (G, B)).
In the implementation, each swap is performed by XOR-swapping of the corresponding planes.
Algorithm 1 Block scrambling procedure.
Require: Intermediate image of size $H \times W \times 3$, block size $4 \times 4$, key $K$
Ensure: Permuted image
1: Divide the image into non-overlapping blocks of size $4 \times 4$
2: Generate the four chaotic control sequences using key $K$
3: for $i = 1$ to $N_b$ do
4:  Swap block $i$ with the block indicated by the permutation sequence
5: end for
6: for each block index $i$ do
7:  Rotate or flip the block according to its rotation/flip code
8:  if the negative–positive bit is 1 then
9:   XOR every pixel of the block with 1
10:  else
11:   Replace each pixel value $x$ by its complement
12:  end if
13:  Permute the RGB channels of the block based on its channel code
14: end for
15: Reassemble all processed blocks to form the permuted image
This yields an image heavily scrambled in blocks, orientations, and color channels.
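A compact sketch of the block permutation and rotation/flip steps on a single color plane is shown below; the single-seed random generator and the omission of the negative–positive and channel steps are simplifications for illustration only.

# Sketch of block permutation plus rotation/flip on one color plane, following the
# structure of Algorithm 1; seeding and code ranges are illustrative assumptions.
import numpy as np

def block_scramble(plane: np.ndarray, key: int, block: int = 4) -> np.ndarray:
    H, W = plane.shape
    bh, bw = H // block, W // block
    rng = np.random.default_rng(key)

    # Split into blocks, permute their order, then rotate/flip each block.
    blocks = [plane[r*block:(r+1)*block, c*block:(c+1)*block].copy()
              for r in range(bh) for c in range(bw)]
    perm = rng.permutation(len(blocks))          # block permutation sequence
    codes = rng.integers(0, 6, len(blocks))      # rotation / flip codes

    scrambled = [blocks[p] for p in perm]
    for i, code in enumerate(codes):
        if code in (1, 2, 3):
            scrambled[i] = np.rot90(scrambled[i], k=int(code))
        elif code == 4:
            scrambled[i] = np.fliplr(scrambled[i])
        elif code == 5:
            scrambled[i] = np.flipud(scrambled[i])

    out = np.empty_like(plane)
    for idx, blk in enumerate(scrambled):
        r, c = divmod(idx, bw)
        out[r*block:(r+1)*block, c*block:(c+1)*block] = blk
    return out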
3.2.4. Substitution (Value Diffusion)
Finally, we mix pixel values across the image. Let the block-scrambled image (still of size $H \times W \times 3$) be the input to this step. We perform a forward scan over its pixels (row by row, within each row left to right, and channel by channel) to compute the intermediate ciphertext. Let $F = 256$ be the modulus (for 8-bit pixels). Using a chaotic key matrix, reshaped from one of the key sequences and converted to integers mod $F$, as the offset, the scan proceeds as follows: the first pixel of the red channel is diffused using the blue value of the last pixel of the scrambled image as the seed; the first pixel of each remaining channel, and the first pixel of each row, are diffused using the most recently encrypted value in scan order; and in the general case each pixel is diffused using the previously encrypted neighbor. In other words, each pixel of the intermediate ciphertext is the sum (mod 256) of the current pixel value, the previously encrypted neighbor, and a chaotic offset. This introduces diffusion of values across the image and channels. The resulting matrix is the output of the block permutation encryption module.
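The sketch below illustrates this forward diffusion scan and its inverse, assuming a single seed value and a flattened scan order; the exact per-channel boundary rules of the paper are simplified.

# Sketch of the forward value-diffusion scan: each output pixel is the sum (mod 256)
# of the current value, the previously encrypted value in scan order, and a chaotic
# offset. The seed handling follows the description above in simplified form.
import numpy as np

def forward_diffuse(img: np.ndarray, chaos: np.ndarray) -> np.ndarray:
    """img, chaos: (H, W, 3) uint8 arrays; returns the diffused image."""
    flat = img.reshape(-1).astype(np.int64)
    key = chaos.reshape(-1).astype(np.int64)
    out = np.empty_like(flat)

    prev = int(flat[-1])                         # last (blue) value seeds the scan
    for i in range(flat.size):
        out[i] = (flat[i] + prev + key[i]) % 256
        prev = int(out[i])                       # diffusion chains through encrypted values
    return out.reshape(img.shape).astype(np.uint8)

def inverse_diffuse(cipher: np.ndarray, chaos: np.ndarray, seed: int) -> np.ndarray:
    """Inverse scan for decryption; seed is the value used to start the encryption scan."""
    flat = cipher.reshape(-1).astype(np.int64)
    key = chaos.reshape(-1).astype(np.int64)
    out = np.empty_like(flat)
    prev = seed
    for i in range(flat.size):
        out[i] = (flat[i] - prev - key[i]) % 256
        prev = int(flat[i])
    return out.reshape(cipher.shape).astype(np.uint8)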
The overall effect of this module is to transform $P$ into a scrambled, diffused intermediate ciphertext using chaotic permutations and substitutions.
3.3. RNA Encryption
The second module applies an RNA-based substitution process, inspired by genetic coding rules, to the output of the previous module. Treat the intermediate ciphertext as three gray-scale images of size $H \times W$. For each channel, we perform the following operations using the remaining chaotic key sequences.
3.3.1. Bit-Layer Conversion
Interpret each 8-bit pixel of the channel as two 4-bit halves: the high 4 bits and the low 4 bits. Stack the high halves of all pixels into one binary matrix (spanning 3 channels × 4 bits per channel) and form a second binary matrix from the low halves in the same way. (This step is conceptual; in the implementation we simply operate on the 8-bit values bitwise.)
3.3.2. Octal Synthesis
For the high-bit matrix, group each 3-bit slice into one octal digit. Concretely, for each channel separately, take 3 high-order bits (interpreted in base 2) and form an octal number in $\{0, 1, \dots, 7\}$. This yields a matrix of octal values (one per pixel per channel).
3.3.3. Chaotic Encode-Key Generation
Flatten the octal matrix to a vector of length $N = H \times W$. From two of the chaotic key sequences, generate two integer arrays and reshape them into an $H \times W$ encoding-rule matrix $E$ and an $H \times W$ decoding-rule matrix $D$. Each entry of $E$ selects one of the eight RNA mapping rules used for encoding (per pixel), and the entries of $D$ are used for decoding later. Also extract $N$ chaotic bytes from another key sequence to form an $H \times W$ key matrix whose values lie in $\{0, 1, \dots, 255\}$.
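A small sketch of how the rule matrices $E$ and $D$ can be derived by quantizing chaotic values into the eight rule indices is shown below; the quantization step is an illustrative assumption rather than the paper's exact formula.

# Sketch of deriving per-pixel encode/decode rule matrices E and D from chaotic
# sequences: each chaotic value in [0, 1) is quantized to a rule index in {0, ..., 7}.
import numpy as np

def rule_matrix(chaos: np.ndarray, H: int, W: int) -> np.ndarray:
    """Map chaotic values to rule indices and reshape to H x W."""
    idx = np.floor(chaos[: H * W] * 8).astype(np.int64) % 8
    return idx.reshape(H, W)

# Using different chaotic segments for E and D is what makes the RNA coding "dynamic".
chaos = np.random.random(2 * 16 * 16)            # stand-in for a chaotic key sequence
E = rule_matrix(chaos[:16 * 16], 16, 16)
D = rule_matrix(chaos[16 * 16:], 16, 16)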
3.3.4. Dynamic RNA Encoding
Define an RNA encoding operation that maps an $H \times W$ numeric matrix into an $H \times 4W$ character matrix over $\{A, C, G, U\}$ by converting each pixel's 8 bits into 4 letters: each 2-bit pair of a pixel is converted to one letter according to the genetic code rule chosen by the corresponding entry of $E$. Apply this encoding both to the image channel and to the chaotic key matrix formed from the next $N$ bytes of the key sequence (mod 256), which serves as an RNA "key image". Both results are matrices of letters in $\{A, C, G, U\}$.
3.3.5. Chaotic Genetic Operations
Perform a chaotic combination of the encoded image matrix and the encoded key matrix letterwise. For each position, look up the operation code derived from the chaotic sequence and apply the corresponding RNA operation. Each of these operations takes two bases (A, C, G, U) and returns a base according to a fixed table (as given in the implementation). The result is a new base matrix.
3.3.6. Dynamic RNA Decoding
Finally, convert the base matrix back to bytes. Using the decoding rule selected by $D$ for each group of 4 bases (one pixel), reconstruct an $H \times W$ numeric matrix. This yields the encrypted channel for the $k$-th color.
3.3.7. Final Output
Repeat the steps above for each of the three color channels and combine the three encrypted channels into a single image; this is the output of the RNA encryption module. In summary, each pixel's bits are expanded into RNA bases, combined with a chaotic RNA key via one of six genetic operations, and then collapsed back into pixel values. The chaotic sequences determine the encode/decode rules and the operations, ensuring high sensitivity.
3.4. Bit Diffusion Encryption
After the RNA encryption, we obtain an intermediate cipher image. In this module, we introduce a multi-directional bit-level diffusion process driven by chaotic sequences to further disrupt pixel correlations and strengthen the avalanche effect.
3.4.1. Pixel Rearrangement
Let the intermediate cipher image be of size $H \times W \times 3$, composed of the R, G, and B channels. First, reshape each color channel into a 1D vector of length $H \times W$. Then generate a permutation sequence $P$ from a chaotic key sequence and use it to permute the positions of the three vectors.
3.4.2. Bit-Level Decomposition and Diffusion
Each 8-bit value in the permuted vectors is decomposed into individual bits, where the $b$-th bit (counting from the MSB to the LSB) of a byte $x$ is extracted as $\lfloor x / 2^{8-b} \rfloor \bmod 2$. This forms three binary matrices of size $8 \times (H \times W)$, with each row corresponding to one bit-plane.
Chaotic Key Mask Generation: Generate $8 \times (H \times W)$ chaotic bits from a chaotic key sequence and arrange them into a bit-mask matrix of the same size as each bit-plane matrix.
Bitwise Diffusion: Apply circular bit-level XOR and chaotic-driven rotation. Each bit-plane row $b$ is XORed with the corresponding row of the chaotic mask; after this XOR diffusion, row $b$ is circularly rotated by a number of bit positions determined by the chaotic sequence.
3.4.3. Bit Recomposition and Output
Recombine the diffused bits back into 8-bit pixel values, apply the inverse permutation $P^{-1}$ to restore the original pixel ordering, and reshape each vector back to $H \times W$ to obtain the final encrypted channel. The final ciphertext image $C$ is the three-channel RGB image of size $H \times W \times 3$ assembled from these channels.
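The following sketch summarizes the bit-level diffusion of one channel, from pixel permutation through bit-plane XOR and rotation to recombination; the rotation amounts and the mask thresholding are illustrative assumptions.

# Sketch of bit-level diffusion on one channel: permute pixels, split into bit-planes,
# XOR with a chaotic mask, rotate each plane, recombine, and undo the permutation.
import numpy as np

def bit_diffuse(channel: np.ndarray, chaos: np.ndarray) -> np.ndarray:
    """channel: (H, W) uint8; chaos: 1D chaotic values in [0, 1), length >= 9*H*W."""
    H, W = channel.shape
    n = H * W
    vec = channel.reshape(-1)

    perm = np.argsort(chaos[:n])                      # chaotic pixel permutation
    vec = vec[perm]

    bits = np.unpackbits(vec[None, :], axis=0)        # (8, n) bit-planes, MSB first
    mask = (chaos[n:9 * n].reshape(8, n) > 0.5).astype(np.uint8)
    bits ^= mask                                      # XOR with the chaotic bit mask

    for b in range(8):                                # chaos-driven circular rotation
        bits[b] = np.roll(bits[b], int(chaos[b] * n))

    out = np.packbits(bits, axis=0).reshape(-1)       # recombine bit-planes into bytes
    return out[np.argsort(perm)].reshape(H, W)        # undo the pixel permutation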
4. Experimental Results and Analysis
The experimental platform utilized was a MacBook Air (M4 chip) with MATLAB R2025a computational software installed. The device features an M4 processor with 10-core CPU and 10-core GPU, 32 GB unified memory, running on the macOS Sequoia 15.5 operating system. All experimental images presented in this study were sourced from the standardized USC-SIPI image database to ensure result reliability.
4.1. Statistics Histogram
Image encryption algorithms need to be adaptable to various application scenarios, ensuring that different types of images can be encrypted into unrecognizable cipher images. The original image can only be fully recovered with the correct key, and without it, no useful information about the original image can be obtained. In this paper, the encryption process is simulated using test images of different colors, and their pixel histograms are displayed in Figure 2. Analyzing the histogram of an image provides valuable information. Furthermore, Figure 3 illustrates the three-dimensional histograms of both the original image and the corresponding encrypted image. Observing these histograms highlights that the encrypted image exhibits an even distribution of pixels on the red, green, and blue planes. This demonstrates the algorithm's effectiveness in encrypting natural images into high-performance cipher images.
4.2. Coefficient of Adjacent Pixels
In natural images, adjacent pixels typically exhibit high similarity in intensity values, resulting in strong correlations in horizontal, vertical, and diagonal directions. This inherent redundancy facilitates statistical analysis and potential cryptanalysis. An effective image encryption algorithm should significantly reduce or eliminate such correlations, thereby impeding any attempt to extract useful information from ciphertext images through statistical means.
The correlation coefficient between adjacent pixels is mathematically defined as
$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{D(x)}\,\sqrt{D(y)}}, \qquad \operatorname{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i - E(x)\bigr)\bigl(y_i - E(y)\bigr),$$
with
$$E(x) = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad D(x) = \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i - E(x)\bigr)^2,$$
where $x_i$ and $y_i$ denote the intensity values of the $i$-th pair of neighboring pixels in the horizontal, vertical, diagonal, or anti-diagonal direction, and $N$ is the total number of such pixel pairs. Here, $\operatorname{cov}(x, y)$ represents the covariance between pixel intensities $x$ and $y$, $D(x)$ and $D(y)$ denote their variances, and $E(x)$ and $E(y)$ are their respective expected values. The correlation coefficient $r_{xy}$ thus quantifies the degree of linear dependency between adjacent pixel pairs.
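A straightforward way to compute this coefficient on randomly sampled horizontal neighbor pairs is sketched below; the sample size and fixed seed are arbitrary choices for illustration.

# Sketch of measuring adjacent-pixel correlation in the horizontal direction by
# sampling N random pixel pairs, following the definition above.
import numpy as np

def adjacent_correlation(channel: np.ndarray, n_pairs: int = 5000) -> float:
    """channel: (H, W) grayscale plane; returns the horizontal correlation coefficient."""
    H, W = channel.shape
    rng = np.random.default_rng(0)
    rows = rng.integers(0, H, n_pairs)
    cols = rng.integers(0, W - 1, n_pairs)
    x = channel[rows, cols].astype(np.float64)
    y = channel[rows, cols + 1].astype(np.float64)   # horizontal neighbours
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())                  # r_xy as defined above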
To visually demonstrate the decorrelation effect achieved by the proposed encryption algorithm, the scatter plots in Figure 4 compare the pixel correlations of plaintext images with their corresponding ciphertext images across different directions. As shown, plaintext images exhibit tight clustering along the diagonal line, indicating strong correlations, whereas ciphertext images display random distributions, signifying that adjacent pixel correlations are effectively disrupted by the encryption process.
4.3. Information Entropy
Another key metric for evaluating the distribution of grayscale values in an image and measuring the randomness of image information is the information entropy, which can be expressed as
$$H(s) = -\sum_{i=1}^{L} p(s_i)\log_2 p(s_i),$$
where $L$ is the total number of symbols $s_i$ and $p(s_i)$ denotes the probability of symbol $s_i$. The experimental results are shown in Table 4. We can see that the experimental results are close to the ideal value of 8, so the proposed algorithm has good information entropy properties.
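The entropy of an 8-bit channel can be computed directly from its histogram, as sketched below.

# Sketch of computing the information entropy of one 8-bit channel from its histogram.
import numpy as np

def information_entropy(channel: np.ndarray) -> float:
    """channel: uint8 array; the ideal value for a uniformly distributed cipher is 8."""
    hist = np.bincount(channel.reshape(-1), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                                   # skip zero-probability symbols
    return float(-np.sum(p * np.log2(p)))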
4.4. Differential Statistical Analysis
Differential statistical analysis is a critical metric for evaluating the resistance of image encryption algorithms against differential attacks. In particular, two widely adopted indicators, the Number of Pixel Change Rate (NPCR) and the Unified Average Changing Intensity (UACI), are used to quantify the sensitivity of the encryption algorithm to slight changes in the plaintext image. An ideal encryption algorithm should ensure that even a single-pixel modification in the plaintext image results in a significant and unpredictable change in the corresponding ciphertext image.
Mathematically, NPCR and UACI between two ciphertext images $C_1$ and $C_2$ are defined as
$$\mathrm{NPCR} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} D(i, j) \times 100\%, \qquad \mathrm{UACI} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{\lvert C_1(i, j) - C_2(i, j)\rvert}{255} \times 100\%,$$
where $M \times N$ denotes the size of the image, and $C_1(i, j)$ and $C_2(i, j)$ are the pixel intensities of the two ciphertext images generated from plaintext images that differ by only a single pixel. The binary function $D(i, j)$ is defined as
$$D(i, j) = \begin{cases} 0, & C_1(i, j) = C_2(i, j), \\ 1, & C_1(i, j) \neq C_2(i, j). \end{cases}$$
The ideal theoretical values of NPCR and UACI for 8-bit grayscale or color images are approximately 99.6094% and 33.4635%, respectively. Values approaching these theoretical thresholds indicate high sensitivity of the encryption algorithm to plaintext changes, which is crucial for resisting differential attacks.
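Both metrics reduce to simple element-wise comparisons, as the following sketch shows.

# Sketch of the NPCR and UACI computation between two ciphertexts C1 and C2 produced
# from plaintexts differing in a single pixel.
import numpy as np

def npcr_uaci(c1: np.ndarray, c2: np.ndarray):
    """c1, c2: uint8 arrays of identical shape; returns (NPCR %, UACI %)."""
    diff = c1 != c2
    npcr = diff.mean() * 100.0
    uaci = (np.abs(c1.astype(np.int64) - c2.astype(np.int64)) / 255.0).mean() * 100.0
    return npcr, uaci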
In this study, NPCR and UACI were computed for a series of standard test images from various open-source datasets. For each image, a single pixel in the plaintext was modified, and the corresponding encrypted images were compared across the red (R), green (G), and blue (B) channels. The experimental results are summarized in Table 5.
The data reveal that the proposed algorithm consistently achieves NPCR values around 99.6% and UACI values close to 33.46% across different images and resolutions. These results demonstrate that the algorithm exhibits excellent diffusion properties, ensuring that even minimal changes in the plaintext propagate extensively throughout the ciphertext. Consequently, the proposed encryption scheme demonstrates strong resistance to differential attacks.
4.5. Image Quality Analysis
To further evaluate the visual quality and structural integrity of decrypted images, the mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) were employed.
The MSE is defined as
$$\mathrm{MSE} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I(i, j) - I'(i, j)\bigr)^2,$$
where $I$ and $I'$ denote the original and decrypted images of size $M \times N$, respectively. A lower MSE indicates higher similarity between $I$ and $I'$.
The PSNR, derived from the MSE, is given by
$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{I_{\max}^2}{\mathrm{MSE}}\right),$$
where $I_{\max}$ is the maximum possible pixel value (255 for 8-bit images). A higher PSNR reflects better reconstruction quality and reduced noise distortion.
The SSIM metric measures perceptual similarity and is defined as
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
where $\mu_x$ and $\mu_y$ are the mean intensities of images $x$ and $y$, $\sigma_x$ and $\sigma_y$ are their standard deviations, $\sigma_{xy}$ is their covariance, and $C_1$ and $C_2$ are small constants that stabilize the division.
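A minimal sketch of these quality metrics is given below; SSIM is delegated to scikit-image's structural_similarity, which is assumed to be available rather than reimplemented here.

# Sketch of the MSE/PSNR computation for a decrypted image against the original;
# SSIM uses scikit-image (assumed installed).
import numpy as np
from skimage.metrics import structural_similarity

def mse_psnr(original: np.ndarray, decrypted: np.ndarray):
    """Both arrays uint8 with identical shape; PSNR uses the 255 peak value."""
    err = original.astype(np.float64) - decrypted.astype(np.float64)
    mse = np.mean(err ** 2)
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
    return mse, psnr

def ssim_score(original: np.ndarray, decrypted: np.ndarray) -> float:
    return structural_similarity(original, decrypted, channel_axis=-1)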
As presented in Table 6, the average MSE values of the decrypted images remain low, yielding corresponding PSNR values within 25.59–34.75 dB. Images such as 4.1.02 achieve the highest PSNR (34.75 dB), demonstrating accurate reconstruction with minimal distortion.
Meanwhile, Table 7 shows that the SSIM values of encrypted images remain close to zero, indicating a high degree of structural decorrelation between the encrypted and plaintext images, which effectively conceals perceptual content during encryption.
These results confirm that the proposed scheme ensures robust encryption security by suppressing structural information in ciphertexts, while maintaining high-quality image recovery upon decryption.
4.6. Sensitivity Analysis
In this section, the performance metrics of the algorithm are analyzed in terms of both key and plaintext sensitivity, respectively. The security algorithm should be highly sensitive, which means that if there is a slight change in the key or plaintext image information during encryption or decryption, it will have a huge impact on the result of the subsequent encryption.
4.6.1. Analysis of Sensitivity to the Key
Key sensitivity is evaluated by examining the ciphertexts produced when encrypting the same plaintext image with two keys that differ only by a minute perturbation. Specifically, the plaintext image is first encrypted using the original key and then re-encrypted using six perturbed keys, each obtained by adding a tiny offset to one component of the original key. The differences between the resulting ciphertexts are quantified using the NPCR and UACI metrics defined in Equation (42).
As shown in Table 8, even a minimal perturbation of the encryption key results in ciphertext images with significant differences, as reflected by NPCR and UACI values approaching their theoretical ideals of 99.6094% and 33.4635%, respectively. These findings demonstrate that the proposed encryption algorithm exhibits a high sensitivity to slight key variations, thereby ensuring strong resistance against differential attacks.
4.6.2. Analysis of Plaintext Sensitivity
Plaintext sensitivity refers to the extent of variation in the ciphertext resulting from minor changes in the plaintext. An encryption algorithm with poor plaintext sensitivity may be vulnerable to plaintext attacks, as adversaries could exploit the relationships between plaintext–ciphertext pairs to infer the encryption mechanism. Therefore, strong plaintext sensitivity is essential for resisting such attacks.
In this section, we evaluate the plaintext sensitivity of the proposed algorithm by modifying the pixel value of the plaintext image by 1 at each of four distinct positions. The resulting ciphertexts are then compared with the original ciphertext using the NPCR and UACI metrics. The results, summarized in Table 9, demonstrate that even a single-pixel modification yields NPCR values approaching the ideal 99.6094% and UACI values near the theoretical 33.4635%.
These results indicate that a slight alteration in plaintext leads to substantial differences in the corresponding ciphertext, effectively preventing adversaries from exploiting plaintext–ciphertext correlations. Consequently, the proposed encryption scheme exhibits strong plaintext sensitivity and is robust against plaintext attacks.
4.7. Analysis of the Robustness
4.7.1. Cropping Attack
Cropping attacks are a common type of attack during information transmission, inevitably leading to the loss of ciphertext data. Therefore, the robustness of encryption algorithms is crucial to their security. By artificially cropping ciphertext into different sizes and positions and then decrypting it, we can determine how much image information is preserved after decryption. The decryption results are shown in Figure 5.
4.7.2. Salt and Pepper Noise
Salt and pepper noise effectively simulates noise interference and attacks during communications and is also effective in verifying the robustness of encryption algorithms. By adding varying degrees of salt and pepper noise to the ciphertext and then decrypting it, we can determine the information retained in the decrypted image. The results are shown in Figure 6. These two experiments demonstrate the algorithm's excellent robustness, retaining key image information even after data loss.
5. Conclusions
This paper proposes an innovative facial image encryption scheme designed to address the increasingly serious issue of image privacy leakage in the era of big data and artificial intelligence, particularly the heightened security risks posed by images containing sensitive information such as faces. The scheme integrates face detection technology with a multi-level encryption framework, employing a multi-task cascaded convolutional neural network (MTCNN) to precisely extract facial regions, ensuring processing efficiency and positioning accuracy. A layered encryption process is then constructed, comprising a chaotic system based on nonlinear dynamics (the 4D-NDS model), lightweight block permutation, dynamic RNA encoding (leveraging the eight RNA rules and operations), and a bit diffusion mechanism to increase data complexity and unpredictability. Experimental validation demonstrates that the proposed scheme excels across multiple dimensions. In terms of security, it reduces the correlation between adjacent pixels to nearly zero, achieves near-ideal information entropy (approximately 8.0), and shows strong resistance to known-plaintext and chosen-plaintext attacks, as evidenced by high NPCR (approximately 99.6%) and UACI (approximately 33.46%) values. In terms of efficiency, the algorithm optimizes computational load by focusing encryption on sensitive areas, reducing redundant processing overhead, and achieves high image reconstruction quality (PSNR up to 34.75 dB), ensuring practicality and robustness. Furthermore, the introduction of a plaintext-dependent key generation mechanism (based on dynamic feedback from the image content) further enhances the system's resistance to attacks. Overall, this research not only provides a solution that balances security and efficiency, suitable for scenarios such as real-time surveillance and cloud biometrics, but also lays a technical foundation for image privacy protection in intelligent systems. Future work could extend the scheme to high-resolution video or resource-constrained devices, broadening its scope.