1. Introduction
Security has become a matter of utmost significance because of the enormous advancements and complexity observed in today’s wireless communication networks and big data applications [
1,
2,
3]. Thus, utilizing cryptography [
4,
5], steganography [
6,
7], watermarking [
8] and their combinations [
9,
10] to protect data has become essential for ensuring the secure functioning and use of millions of online services. For many years, well-established cryptosystems were put in use for virtually every type of data. Those included the Data Encryption Standard (DES) [
11], its variant, the Triple DES [
12], as well as the Advanced Encryption Standard (AES) [
13,
14]. However, it quickly became clear that not all cryptosystems are well-suited for application on multimedia such as 2D and 3D images and videos. This is because images and videos have vast amounts of data, redundancy, as well as strong cross-correlation among adjacent pixels. To this end, the literature on developing secure, robust image cryptosystems has been on the rise in recent years. Algorithms that carry out image encryption are based on mathematical operations that are derived from, or related to, chaos theory [
15,
16,
17,
18], electric circuits [
19,
20], DNA encoding [
21,
22,
23,
24,
25], and cellular automata [
26], to name a few. The following paragraphs highlight the significance of DNA cryptography and chaos theory in security applications, as well as their use in cutting-edge image encryption algorithms. Substitution boxes (S-boxes), a powerful component for introducing confusion into a cryptosystem, are discussed next.
DNA cryptography makes use of both biological and computational properties to offer greater confidentiality than classical cryptographic algorithms when encrypting data [
24]. Traditional cryptosystems often only provide one layer of protection, and it is possible that their secrecy is compromised as the underlying computational techniques are made public. On the contrary, DNA cryptography utilizes the self-assembling characteristics of DNA bases in combination with a cryptographic approach to provide multiple security measures that enhance the level of data confidentiality [
25]. For example, the authors of [
22] convert ciphertext to a genomic form using amino acid tables. The tables’ protein sequence composition adds to the ciphertext’s level of ambiguity. In [
23], the authors propose a DNA encoding algorithm built on a unique string matrix data structure that produces distinctive DNA sequences, which they then employ to encode plaintext. While DNA cryptography has garnered the interest of scientists and engineers in recent years, it has not gained as much attention as the use of chaotic and dynamical systems in image cryptosystems.
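To make the bit-level mechanics concrete, the following minimal sketch (not the algorithms of the works cited above) maps each byte to four nucleotide bases using one of the eight standard 2-bit encoding rules; in a full cryptosystem, the rule choice itself can be key-dependent:

```python
# One of the eight standard DNA encoding rules: 2 bits per base.
ENCODE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
DECODE = {base: bits for bits, base in ENCODE.items()}

def dna_encode(data: bytes) -> str:
    """Map every byte to four bases, most significant bit-pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(ENCODE[(byte >> shift) & 0b11])
    return "".join(bases)

def dna_decode(seq: str) -> bytes:
    """Invert dna_encode: pack every four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | DECODE[base]
        out.append(byte)
    return bytes(out)
```

An actual DNA cryptosystem would follow this encoding with base-level operations (for instance, XOR-like base arithmetic) keyed by a chaotic sequence.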
The intrinsic properties of chaotic functions as a random phenomenon in nonlinear systems are favourable for cryptography [
27]. Specifically, chaotic maps exhibit extreme sensitivity to initial conditions and control parameters, long periodicity, pseudo-randomness, and ergodicity. These properties are incorporated into the design of image encryption algorithms. Broadly, such schemes are divided into two classes: (a) One-dimensional (1D) and (b) Multi-dimensional (MD). The utilization of 1D chaotic maps allows for simpler and more efficient software and hardware implementations. However, this also translates into less desirable characteristics, in terms of shorter chaotic periods, non-uniform distribution of their chaotic output, as well as a greater susceptibility to cryptanalysis. On the contrary, the utilization of MD chaotic maps in image encryption algorithms provides stronger security levels at the expense of increased complexity and, consequently, longer running times for software and hardware implementations. Extensive literature exists on the use of 1D and MD chaotic functions in image cryptosystems. The authors of [
4], for instance, propose an image encryption algorithm that is based on a combination of encryption keys designed using the Arnold cat map, the 2D logistic sine map, the linear congruential generator, the Bernoulli map and the tent map. In a similar manner, the authors of [
17] also employ multiple chaotic maps, however, in their implementation, they aim at reaching a minimum number of encryption rounds while maintaining a high degree of security and robustness. In [
28], the authors employ a finite field aiming to generalise the logistic map and search for an automorphic mapping between two logistic maps in order to compute parameters over the finite field
${Z}_{N}$. Shannon’s ideas are fully put into use in the work of [
18], where an LA-semigroup is applied for confusion, and a chaotic continuous system is adopted for diffusion. The authors of [
29] present an interesting work that employs a zigzag transform in a conjoint manner with a dynamic arrangement that alternates in a bidirectional crossover approach to image encryption. In their proposed cryptosystem, both the logistic map and a hyperchaotic Chen dynamical system are utilized. In actuality, this paragraph merely touches upon the use of chaos theory in color image cryptosystems. Recent writing on the subject is voluminous. The following paragraph focuses on a distinct but vital component of numerous image encryption algorithms: substitution boxes.
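As a minimal illustration of the 1D case (parameter values here are illustrative, not those of any cited scheme), the logistic map ${x}_{n+1}=r{x}_{n}\left(1-{x}_{n}\right)$ can be iterated to produce a byte keystream, with $r$ and ${x}_{0}$ acting as the secret key:

```python
def logistic_keystream(r: float, x0: float, n_bytes: int, burn_in: int = 1000) -> bytes:
    """Generate a keystream from the logistic map x <- r * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):             # discard the transient so output starts deep in the orbit
        x = r * x * (1 - x)
    out = bytearray()
    for _ in range(n_bytes):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)   # quantize the state in (0, 1) to one byte
    return bytes(out)

# Sensitivity to initial conditions: a tiny change in x0 yields an unrelated stream.
k1 = logistic_keystream(3.99, 0.5, 16)
k2 = logistic_keystream(3.99, 0.5000001, 16)
```

The same sensitivity that makes the map useful as a key generator is what makes 1D maps fragile: their short periods and non-uniform output under finite precision are the weaknesses noted above.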
A crucial part of contemporary block cryptosystems is an S-box. It makes it easier to convert any given plaintext into ciphertext. The confusion property is provided by the straightforward addition of an S-box to a cryptosystem, which results in a non-linear translation between the input and output data [
30]. The more uncertainty an S-box introduces, the better the security it provides. For many block encryption techniques, the level of security offered by using one or more S-boxes closely correlates with their resistance against attacks. While such algorithms may have numerous stages, an S-box is typically the only non-linear stage that improves the security of sensitive data [
31]. To be acceptable for real-time data encryption, the design of an S-box should be efficient and low in complexity. Recent literature provides multiple instances of design and utilization of S-boxes in image encryption algorithms. For example, the authors of [
15] proposed an S-box utilising a third-order nonlinear digital filter. Its non-linearity was enhanced using a novel optimisation approach. In [
31], the authors developed an optimization algorithm for a chaos-based entropy source to generate an S-box. Multiple-stage image encryption schemes in which an S-box is a core stage are rather popular among scientists and engineers, since such a combination satisfies Shannon’s ideas of confusion and diffusion. Furthermore, employing more than one encryption stage provides security against known plaintext attacks [
32]. The authors of [
16] propose one such example of a 3-stage image encryption algorithm, where in one of the stages they utilize the S-box proposed in [
33]. This S-box is based on a modular approach and is thus highly nonlinear. In the other two stages, a Lucas sequence and a sine logistic map are used to generate encryption keys.
34], where an S-box is also utilized as a core stage, sandwiched between the application of two encryption keys. The first key is a Mersenne Twister based PRNG, while the second key is a tan variation of the logistic map. In [
35], the authors follow a similar approach, generating a PRNG-based S-box using Wolfram Mathematica
^{®}, and utilizing the Rössler system and Recamán’s sequence for the two encryption keys, respectively.
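To make the substitution stage concrete, the sketch below applies a bijective lookup table over byte values; the table here is a toy affine permutation, standing in for the chaos-derived, highly non-linear S-boxes of the works cited above:

```python
# Toy illustration of S-box substitution (the confusion stage): an S-box is
# a bijective lookup table over byte values 0..255. A real design is highly
# non-linear; the affine permutation below merely makes the mechanics runnable.
SBOX = [(167 * i + 31) % 256 for i in range(256)]   # gcd(167, 256) = 1 -> bijective
INV_SBOX = [0] * 256
for i, v in enumerate(SBOX):
    INV_SBOX[v] = i                                 # invert the permutation

def substitute(pixels: bytes) -> bytes:
    """Apply the S-box to every pixel value."""
    return bytes(SBOX[p] for p in pixels)

def inverse_substitute(pixels: bytes) -> bytes:
    """Undo the substitution during decryption."""
    return bytes(INV_SBOX[p] for p in pixels)
```

Because the table is bijective, decryption is exact: every output byte maps back to exactly one input byte.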
Combining DNA cryptography with chaotic functions and S-boxes in image encryption algorithms gained the attention of many researchers in the field as an attempt to achieve better performance [
36], either in terms of improved security or lower computational complexity and thus ever decreasing encryption and decryption times. The work presented in [
37] introduced a cryptosystem for color images based on a combination of chaotic maps and DNA sequences. The reported theoretical and statistical analyses reflected the robustness of combining DNA with chaotic maps against statistical and brute force attacks [
37]. In [
38], an encryption scheme based on a 2D Henon-Sine map and DNA coding was proposed. Exclusive-OR (XOR) and DNA random coding encryption operations were synthesized using an S-box for image diffusion, while image scrambling was carried out through swap operations on the pixels of the image.
38] highlighted the merits of generalizing DNA encryption with S-box substitution in image encryption techniques. In [
21], the authors proposed an image cryptosystem that utilizes parallel compressive sensing, chaos theory and DNA encoding. The authors of [
39] provide an interesting work for grayscale image encryption that combines a 4D chaotic system with DNA encoding, the hash function SHA-2 and the random movement of a chess piece (Castle), through an iterative process. The authors of [
40] not only combine the use of DNA encoding, SHA-512 hashing and multiple hyperchaotic maps, but also utilize a novel variation of a chaotic map, a logistic-tan map, as well as a pixel-shifting algorithm that is based on the Zaslavskii map.
The contributions of this paper are as follows:
We propose a 3-stage image encryption scheme that makes use of Shannon’s ideas of confusion and diffusion. In the first stage, DNA coding is employed, providing diffusion at the bit level. In the second stage, an S-box based on the numerical solution of the Lorenz differential equations and a linear descent algorithm is developed and used for confusion at the pixel level. In the third stage, the logistic map is utilized to produce an encryption key, providing diffusion at the bit level. The concatenation of these three stages ensures that encrypted output images are completely dissimilar to their original (plain) counterparts.
We propose an efficient and fast encryption scheme, with images of dimensions $128\times 128$ encrypted in only 0.377145 s, achieving an average encryption rate of $1.015$ Mbps.
We propose a multi-stage image encryption scheme. Using more than one encryption stage provides security against known plaintext attacks.
We propose an image encryption scheme that possesses a large key space of ${2}^{372}$, and is effectively resistant to brute force attacks.
We utilize both conventional (information entropy, pixel cross-correlation, MSE, PSNR, MAE, NPCR, UACI and NIST SP 800 suite) and unconventional performance evaluation metrics such as the Fourier transform and advanced bit dependency metrics to gauge the security and robustness of the proposed cryptosystem.
This paper is organized as follows.
Section 2 outlines the mathematical background and describes the proposed image cryptosystem.
Section 3 presents the computed numerical results and carries out a comparative analysis with counterpart cryptosystems from the literature. Finally,
Section 4 draws the conclusions and suggests possible future research directions.
3. Numerical Results and Performance Evaluation
An encryption algorithm’s performance is evaluated by how well it can withstand various visual, statistical, entropy, differential and brute-force attacks. In this section, the numerical findings of the suggested image encryption algorithm are presented and discussed, along with a comparison to counterpart algorithms from the literature. The various analyses were run on Wolfram Mathematica^{®} v.13.1. The utilized computer had the following specifications: 2.9 GHz Intel^{®} Core^{TM} i9 with 32 GB of RAM. For these tests, the key values for the experimental encryption process are assigned as follows: $\sigma =10,\beta =8/3,\rho =28$ and ${X}_{0}=0.5$. Multiple images that are commonly used in image processing applications and experimentation were utilized, all of dimensions $256\times 256$, unless otherwise stated. The performed tests are: visual and histogram analysis (Section 3.1), Mean Squared Error (Section 3.2), Peak Signal to Noise Ratio (Section 3.3), Mean Absolute Error (Section 3.4), information entropy (Section 3.5), Fourier transformation analysis (Section 3.6), correlation coefficient analysis (Section 3.7), differential attack analysis (Section 3.8), the National Institute of Standards and Technology analysis (Section 3.9), key space analysis (Section 3.10), and histogram dependency tests (Section 3.11).
3.1. Visual and Histogram Analysis
In
Figure 8,
Figure 9,
Figure 10,
Figure 11 and
Figure 12 (including sub-figures), a number of input plain images and their encrypted forms, along with their respective histograms, are shown. It is evident that the encrypted images’ pixel values bear no resemblance to those of the plain images. The encrypted images are distorted to such a high level that all visual features of an input (plain) image are totally absent from the encrypted one.
Moreover, histograms of the encrypted images are demonstrated as well. As the histogram of an image displays the frequency distribution of its pixel values, the histogram of an encrypted image must be uniform for the encryption method to be reliable. A uniform histogram distribution reveals that each of the image’s gray levels occurs with essentially equal probability, which makes the image more robust to statistical attacks.
3.2. Mean Squared Error
The Mean Squared Error (MSE) is one of the most common tools used in evaluating the similarity between two sets of numbers (in the most general form). As a variant of the Sum of Squared Differences (SSD), it inherits the same properties. For further elaboration, given two sets
S and
${S}^{\prime}$ of the same size
n, the SSD is calculated as follows:
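Reconstructed in standard form, consistent with the description below and with the later cross-reference to Equation (17):

$$SSD(S,{S}^{\prime})=\sum_{i=0}^{n-1}{\left({S}_{i}-{S}^{\prime}_{i}\right)}^{2}\qquad(17)$$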
In light of this representation, three main operations are performed: subtraction, squaring and summing. Subtraction is mandatory, as the desired operation revolves around detecting differences. As a mathematical operation, subtraction produces two results: direction and magnitude. Since a global difference (over the whole set) is required, differences in opposing directions should add up instead of cancelling each other out; the direction of the difference between corresponding individual elements is thus deliberately neglected. Mathematically, either absolute values (retaining the magnitude) or squared values (amplifying the magnitude) are used to remove the polarity, which is the main difference between the Sum of Absolute Differences (SAD) and the Sum of Squared Differences (SSD). Finally, summation allows all individual elements to contribute (equally) to the final result, yielding a global measure.
In case of the MSE, the mathematical representation is modified to the following:
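In standard form, consistent with the description that follows (and numbered to fall between Equations (17) and (19)):

$$MSE(I,{I}^{\prime})=\frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}{\left(I(i,j)-{I}^{\prime}(i,j)\right)}^{2}\qquad(18)$$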
Given two images I and ${I}^{\prime}$ of dimensions $M\times N$, two-dimensional summation is required, as images are rectangular arrays. (Each summation index starts at 0 ($i=0$ and $j=0$) and ends at $M-1$ and $N-1$, assuming the image (as a 2D array) is zero-indexed.) What can be perceived as a considerable alteration to the SSD is the division by the image dimensions. This step facilitates the comparison between MSE values computed for image pairs of different dimensions. For example, ${I}_{1}$ and ${I}_{1}^{\prime}$ are both of size $256\times 256$ with Mean Squared Error $MS{E}_{1}$ (both images in a pair must be of the same size). Comparing $MS{E}_{1}$ to $MS{E}_{2}$, which is calculated for ${I}_{2}$ and ${I}_{2}^{\prime}$ of dimensions $512\times 512$, is meaningful, as both values are normalized with respect to the images’ scales. This means that the MSE can be perceived as the average error (in terms of SSD) per pixel, given two images.
In the scope of this work, MSE is evaluated for input images and their encrypted counterparts. In such a case, the ideal value for a well-performing encryption technique is expected to be high. In other words, as the target of encryption is to distort the visual attributes of images, the similarity should be minimal, resulting in a maximal error factor.
Table 4 shows the computed MSE values for various input image examples, alongside showcasing how these values stand in comparison to other encryption techniques in the literature, demonstrating comparable results.
It is common practice to report MSE and Peak Signal to Noise Ratio (PSNR) values together when analyzing image encryption algorithms. This is because the computation of PSNR is based on the value of MSE. However, the authors of [
39,
40] only provide PSNR values in their respective works, with no mention of MSE values. This explains why
Table 4 displays columns of N/A under the headings of [
39,
40].
3.3. Peak Signal to Noise Ratio
Based on the MSE discussed in
Section 3.2, PSNR, given a signal, aims at relating the error margin (represented by MSE), with respect to the peak value in the signal. In the scope of this work, the peak signal value is evaluated as the maximum pixel intensity in a given image. Accordingly, given image
I, PSNR is equated as:
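In standard form, consistent with the definitions below and with the cross-reference to Equation (19):

$$PSNR(I,{I}^{\prime})=10\,{\mathrm{log}}_{10}\left(\frac{{I}_{max}^{2}}{MSE(I,{I}^{\prime})}\right)\qquad(19)$$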
such that
${I}_{max}$ is the maximum pixel intensity in
I. Due to the fact that MSE is SSD based (which indicates that MSE is calculated to a squared order of magnitude),
${I}_{max}$ is necessarily squared.
As shown in (
19), PSNR is inversely related to MSE. This inverts the preference: unlike MSE, a minimal PSNR value is ideal for a well-performing encryption technique.
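The two metrics can be sketched as follows, assuming flat sequences of 8-bit pixel intensities (a minimal illustration, not the Mathematica implementation used in this work):

```python
import math

def mse(img_a, img_b):
    """Average squared per-pixel error; higher is better for encryption."""
    assert len(img_a) == len(img_b), "images must share dimensions"
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)

def psnr(img_a, img_b, peak=255):
    """Relate the peak intensity to the MSE; lower is better for encryption."""
    return 10 * math.log10(peak ** 2 / mse(img_a, img_b))
```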
Table 5 presents the calculated values for the proposed cryptosystem as well as those of counterpart algorithms from the literature. In terms of MSE and PSNR, the proposed cryptosystem is shown to be superior to [
16,
18,
26,
39,
40], but inferior to [
17].
3.4. Mean Absolute Error
Building on the argument presented in
Section 3.2, an alternative technique to the Sum of Squared Differences (SSD) is the Sum of Absolute Differences (SAD). In this alternative, the task of eliminating the polarity of the per-pixel error is performed by the absolute value operation instead of squaring. Therefore, parallel to the SSD equation (Equation (
17)), SAD is equated as:
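In standard form, parallel to Equation (17) with the square replaced by an absolute value (equation numbering inferred from the surrounding sequence):

$$SAD(S,{S}^{\prime})=\sum_{i=0}^{n-1}\left|{S}_{i}-{S}^{\prime}_{i}\right|\qquad(20)$$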
As discussed before in
Section 3.2, the role of squaring, besides eliminating polarity, is to amplify the magnitude. Using the absolute value instead maintains the linearity of the error distribution among pixels, which carries over to the global perspective of the whole image.
Upon such variation in the core function, MAE is represented mathematically as:
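In standard form, mirroring the MSE of Section 3.2 with absolute differences (equation numbering inferred from the surrounding sequence):

$$MAE(I,{I}^{\prime})=\frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left|I(i,j)-{I}^{\prime}(i,j)\right|\qquad(21)$$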
for two images
I and
${I}^{\prime}$. In this work (similar to the MSE scenario), MAE is evaluated for input images and their encrypted counterparts. For a well-performing encryption technique, the MAE value is preferred to be maximal.
Table 6 presents the numerical results of performing the MAE test on three images (Lena, Peppers, and Mandrill) in comparison to counterpart algorithms from the literature. As the numerical results demonstrate, the proposed algorithm fares comparably to them.
3.5. Information Entropy
In the domain of gray-scale images, information entropy is employed to measure the randomness of the distribution of gray pixel values of an image. According to Shannon’s theory, information entropy is calculated as:
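In standard form, consistent with the definitions that follow (equation numbering inferred from the surrounding sequence):

$$H(m)=\sum_{i=0}^{{2}^{M}-1}p\left({m}_{i}\right){\mathrm{log}}_{2}\frac{1}{p\left({m}_{i}\right)}\qquad(22)$$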
where
$p\left({m}_{i}\right)$ refers to the probability of occurrence of symbol
m, while
M represents the total number of bits for each symbol. With respect to images, as a gray scale image has 256 different values
$[0-255]$, which are
${2}^{8}$ different possible values, the entropy of an ideally encrypted image approaches 8. Therefore, information entropy can be used to evaluate the degree of randomness of encrypted images. Entropy values of the proposed algorithm for the images displayed and tested in this paper, along with those of counterpart algorithms from the literature, are displayed in
Table 7. The computed entropy values for the various images are very close to the ideal value of 8, which means that the proposed algorithm is resistant to entropy attacks. Furthermore, the differences across entropy values of the various cryptosystems are shown to be minute.
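The computation can be sketched as follows for a flat sequence of 8-bit pixel values (a minimal illustration):

```python
from collections import Counter
import math

def entropy(pixels):
    """Shannon entropy in bits; the ideal value for an 8-bit encrypted image is 8."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```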
3.6. Fourier Transformation Analysis
In order to showcase the correlation among pixels before and after encryption, the Fourier transformation, more precisely the Discrete Fourier Transform (DFT), is applied to both images. The main idea is that, in the frequency domain, visual features such as edges and regions (which are not easily definable in the spatial domain) separate into different frequency ranges. This facilitates visual analysis and comparisons of images. Such separation takes place as a result of the interaction between the pixels in the spatial image and the increasing frequencies of the sine and cosine waves. To transform a spatial domain image into the frequency domain, the following Fourier transformation equation is used:
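In standard form for an $M\times N$ image (a reconstruction consistent with the description below; equation numbering inferred from the surrounding sequence):

$$F(k,l)=\sum_{a=0}^{M-1}\sum_{b=0}^{N-1}f(a,b)\,{e}^{-i2\pi\left(\frac{ka}{M}+\frac{lb}{N}\right)}\qquad(23)$$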
such that
$f(a,b)$ is the image representation in the spatial domain, with the exponential term being the basis function that corresponds to each point
$F(k,l)$ in the Fourier space.
Interpreting a Fourier transformation of an image (presented as 2D data), two main regions of high relevance are to be looked at [
46]. The first region is the middle area of the 2D grid of the Fourier transformed image. The significance of this area is that it represents the number of pixels with highly similar values. Therefore, if a Fourier transformation is generated from an image with large regions (flat, same-color areas), the middle area is expected to contain high values.
The second region comprises the center row, the center column, and the main diagonals of the Fourier transformed image, due to their representation of vertical, horizontal, and diagonal edges in the input image, respectively. (The Fourier image is considered to be a transposed matrix of the input image.) Thus, if the input image includes only vertical edges, the Fourier transformed image is expected to have a bright middle row, and vice versa. The rest of the Fourier transformed image represents the other features existing in the input image.
Conclusively, for a natural-looking image with wide regions and profound edges (Peppers in this example), the resulting Fourier transformed image contains a bright plus sign at its center, as observed in
Figure 13b. On the other hand, for a distorted (encrypted) image, an even distribution of values in the Fourier transformed image is expected due to the lack of profound regions or edges, as shown in
Figure 13e.
3.7. Correlation Coefficient Analysis
In this evaluation method, the consistency of a single image is evaluated. The aim is to assess (or provide an estimated coefficient for) the cohesion of near-proximity pixels. In other words, correlation coefficient analysis, in the domain of images, aims at quantifying the prevalence of uniform regions with respect to edge transitions. Hence, for a normal image, a relatively high correlation coefficient value is expected, as it consists more of regions than of edges (in terms of pixel count). On the other hand, as high distortion is aimed for in encrypted images, a smaller correlation coefficient is anticipated for them.
For mathematical evaluation, the correlation coefficient is calculated through the following equations:
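In standard form, consistent with the cross-references to Equations (24)-(27) below:

$${r}_{xy}=\frac{cov(x,y)}{\sqrt{D(x)}\sqrt{D(y)}}\qquad(24)$$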
where:
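Reconstructed in standard form, matching the roles described below (covariance, dispersion and mean):

$$cov(x,y)=\frac{1}{N}\sum_{i=1}^{N}\left({x}_{i}-E(x)\right)\left({y}_{i}-E(y)\right)\qquad(25)$$

$$D(x)=\frac{1}{N}\sum_{i=1}^{N}{\left({x}_{i}-E(x)\right)}^{2}\qquad(26)$$

$$E(x)=\frac{1}{N}\sum_{i=1}^{N}{x}_{i}\qquad(27)$$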
Starting by the mean average of each distribution as in (
27), two distribution behaviours are measured. The first behaviour is the dispersion (using (
26)), which represents the uncertainty of the distribution. The second behaviour is the covariance (using (
25]), which evaluates the linear direction similarity. Combining both mathematical quantities results in an evaluation of the correlation coefficient, represented in (
24).
As previously mentioned, for a normal image (the input), a high correlation value is expected. On the other hand, a highly distorted image would result in having a low correlation coefficient.
Table 8 demonstrates the correlation coefficient analysis performed on three images (Lena, Peppers and Mandrill), for both the input and encrypted images. Moreover, as the covariance is a directional relation, the three main directions are calculated: horizontal, vertical and diagonal. As shown by the numerical results, the input images exhibit values approaching 1 in all cases, while the encrypted images exhibit values approaching 0.
Table 9 presents the comparison between the proposed approach and counterpart algorithms from the literature, which showcases nearly similar results. Moreover,
Table 10 and
Table 11 show the results of numerical comparison among the proposed algorithm and its counterparts from the literature, focusing on the color channels separately, with respect to the three directions for the images Lena and Mandrill, respectively.
Alongside the numerical analysis provided by (
24), directional covariance can be visualized by plotting the co-occurrence matrix. For images with natural visual aspects (more pixels representing homogeneous regions than transitional edges), values of high similarity tend to co-exist with a higher probability, resulting in a mostly linear distribution of magnitudes within the matrix. Oppositely, in a highly distorted (encrypted) image, a more equal distribution of values is expected. For demonstration, this is carried out in
Figure 13, where
Figure 13c is for the plain image, while
Figure 13f is for its encrypted version. It is clear that the 3D plot for the plain image is diagonal in nature, unlike that of the encrypted image, which resembles a mountain in 3D space, signifying random pixel locations, as expected for an encrypted image. For further demonstration, focusing on the Peppers image,
Figure 14,
Figure 15 and
Figure 16 show 3D plots of the co-occurrence matrices for the red, green and blue color channels, respectively. As demonstrated, pixel correlations are fully distorted on each color level individually.
3.8. Differential Attack Analysis
In this section, the quality of image cryptosystems is judged on the grounds of the direct difference between the input and encrypted images. In other words, the input image is directly compared with the encrypted one on a pixel-by-pixel or mean-average basis. Such an evaluation produces a numerical percentage showcasing the difference in color intensities (per pixel, or as a mean average) resulting from the encryption process. Since a lack of similarity among corresponding pixels of both images is desired, a pixel-by-pixel evaluation is necessary. Moreover, a more global perspective of the cumulative pixel change rates among images (presented as mean averages) is evaluated, which denotes the existence of general color intensity similarity among these images. In the literature, two tests are most commonly performed to cover these aspects: NPCR for pixel-by-pixel comparison, and UACI for the mean average difference evaluation.
Number of pixels changing rate (NPCR) represents the percentage evaluation of the amount of changed pixels. The difference between pixels is performed with a strict equality perspective. Given two images
${I}_{1}$ and
${I}_{2}$ (of dimensions
$M\times N$), the difference per pixel
$D(x,y)$ (where
x and
y are the coordinates of the pixel) is calculated as:
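In standard form (the strict per-pixel equality test described above):

$$D(x,y)=\left\{\begin{array}{ll}0,&{I}_{1}(x,y)={I}_{2}(x,y)\\ 1,&{I}_{1}(x,y)\ne {I}_{2}(x,y)\end{array}\right.$$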
Accordingly, NPCR is equated as:
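In standard form:

$$NPCR=\frac{100\%}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}D(x,y)$$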
As per this representation, a higher percentage denotes larger difference between the two images. As a large difference is desired, in the literature, $99\%$ is the target NPCR score for a good encryption technique.
In another perspective, the unified average change intensity (UACI) evaluates the difference between two images in terms of the mean averages. Mathematically, UACI is equated as:
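In standard form, for 8-bit images:

$$UACI=\frac{100\%}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\frac{\left|{I}_{1}(x,y)-{I}_{2}(x,y)\right|}{255}$$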
In the literature, a target percentage of more than $33\%$ denotes a strong encryption technique. (With respect to the color range $[0,255]$, $33\%$ is approximated to 85 steps of difference in intensity.)
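Both tests can be sketched as follows over flat sequences of 8-bit pixel values (a minimal illustration):

```python
def npcr(img_a, img_b):
    """Percentage of pixel positions whose values differ between the two images."""
    changed = sum(1 for a, b in zip(img_a, img_b) if a != b)
    return 100.0 * changed / len(img_a)

def uaci(img_a, img_b):
    """Mean absolute intensity change, as a percentage of the 8-bit range."""
    return 100.0 * sum(abs(a - b) for a, b in zip(img_a, img_b)) / (255 * len(img_a))
```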
Table 12 shows the results of performing NPCR and UACI, comparing the input images and the encrypted images generated by the proposed algorithm. (As discussed above, the target NPCR is greater than $99.6\%$ and the target UACI is greater than $33\%$.) For these tests, the proposed algorithm is also compared with its counterparts from the literature, as shown in
Table 13 and
Table 14. As shown, the computed NPCR value of the proposed algorithm is >
$99\%$ in all cases. On the other hand, the UACI values fall short of the optimal value, with the exception of the Girl image, which resulted in $32.1283$, $36.4526$, and $37.4855$ for the red, green and blue color channels, respectively, alongside an overall value of $35.3554$.
3.9. The National Institute of Standards and Technology Analysis
The National Institute of Standards and Technology (NIST) SP 800 analysis is a set of statistical tests which ensure that the necessary cryptographic properties of random number sequences are met. The encrypted images’ equivalent binary streams are run through the NIST analysis. To ensure resilience to cryptographic attacks, the results ought to surpass a
p-value of
$0.01$.
Table 15 shows the outcome of running the analysis. The results reflect the cryptographic robustness of the proposed scheme, with all the tests’ outcomes larger than
$0.01$. Hence, we can safely conclude the validity of our proposed cryptosystem.
3.10. Key Space Analysis
The key space is calculated as the Cartesian product of the domains of the key values involved in the encryption procedure. This computation yields the number of unique keys that can be utilized in the encryption procedure, which accordingly corresponds to the number of possible encryption instances for the same input image. In the proposed image encryption algorithm, there is a total of seven variables involved in a single encryption procedure. The first two variables are used (in the first stage,
Section 2.1) for the generation of the bit-stream using the tan variation of the logistic map, which is used in the DNA encoding, namely
$\alpha $ and
${X}_{0}$. In the second stage (
Section 2.2), a Lorenz system is used for the S-box generation, which demands three variables (
$\sigma $,
$\rho $ and
$\beta $). Finally, in the last stage discussed in
Section 2.3, two keys are used, which are
r and
${t}_{r}\left(0\right)$. As the machine precision is
${10}^{-16}$, the key space is about
${10}^{7\times 16}={10}^{112}\approx {2}^{372}$, which exceeds the threshold earlier proposed in [
61] as
${2}^{100}$. This means that our proposed scheme can resist brute-force attacks. Furthermore, an examination of the key space values of related image encryption schemes from the literature, as in
Table 16, indicates that the proposed scheme utilizes a much larger key space than most of its counterparts. The only exception is the work of [
39], in which one of the encryption stages relies on the random movement of a chess piece (Castle), which results in a very large key space.
3.11. Histogram Dependency Tests
According to the aim of image encryption (distorting image details in a reversible manner), all forms of correlation between the plain image and its encrypted version are to be absent. In such a test scenario, the images (plain and encrypted) are evaluated at the histogram level. Moreover, these tests are performed on the image as one unit, as well as on the color channels separately. Therefore, given the histograms of two images, the comparisons performed aim at evaluating the level of linear dependency between them. As any dependency (as a form of correlation) test evaluates the level of association between two variables [
67], in a well performing encryption technique, the dependency level should be as low as possible. Accordingly, calculating the dependency coefficient as a value in the range
$[-1,1]$, it is desirable for it to be as close to 0 as possible (where 1 means strong dependency, and −1 means strong inverse dependency). In other words, given two distributions, dependency between them is evaluated as the alignment of one with respect to the other: either both follow the same linearity (evaluating to 1), or they follow opposite linearity (evaluating to
$-1$), or there is no linearity (evaluating to 0). As mentioned, in the context of this work, the pair distributions of variables to evaluate are the histograms of both the input and the encrypted images. Out of many dependency evaluation techniques, in this work, five tests are performed, namely: Blomqvist
$\beta $, Goodman-Kruskal
$\gamma $, Kendall
$\tau $, Spearman
$\rho $, and Pearson correlation
r [
68].
As a medial correlation coefficient, Blomqvist evaluates correlation between two distributions of variables
X and
Y, with their medians
$\overline{x}$ and
$\overline{y}$, respectively, as per the following equation:

$\beta =\frac{{n}_{1}+{n}_{3}-{n}_{2}-{n}_{4}}{{n}_{1}+{n}_{2}+{n}_{3}+{n}_{4}},$

where ${n}_{k}$ denotes the number of pairs falling in the $k$-th quadrant defined by the two medians.
Considering the medians as reference points, pairs of elements across the two distributions either fall on the same side of their respective medians (supporting a linear correlation) or on opposite sides (opposing it).
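The quadrant-counting logic described above can be sketched in Python as follows. This is an illustrative implementation (not the one used in the proposed scheme), with ties on a median line simply ignored, a common convention.

```python
import statistics

def blomqvist_beta(x, y):
    """Medial correlation: fraction of points in the 'same-side' quadrants
    relative to the medians, minus the fraction in 'opposite-side' ones."""
    mx, my = statistics.median(x), statistics.median(y)
    same = sum(1 for xi, yi in zip(x, y) if (xi - mx) * (yi - my) > 0)
    opposite = sum(1 for xi, yi in zip(x, y) if (xi - mx) * (yi - my) < 0)
    total = same + opposite          # points lying on a median are ignored
    return (same - opposite) / total if total else 0.0

# Perfectly aligned distributions -> beta = 1.0
print(blomqvist_beta([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0
# Perfectly inverted distributions -> beta = -1.0
print(blomqvist_beta([1, 2, 3, 4], [40, 30, 20, 10]))   # -1.0
```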
Based on the relative order of corresponding elements in the two distributions, the Goodman-Kruskal measure of monotonic association is applied in a pairwise manner. The two histograms are transformed into a single set of pairs in a one-to-one fashion (containing pairs of the form
$({H}_{1}\left(n\right),{H}_{2}\left(n\right))$), comparing two pairs (
$({H}_{1}\left(i\right),{H}_{2}\left(i\right))$ and
$({H}_{1}\left(j\right),{H}_{2}\left(j\right))$, for example), each comparison either supports the linear correlation (a concordant pair) or opposes it (a discordant pair). Counting the concordant and discordant pairs provides two counts,
${n}_{c}$ and
${n}_{d}$, respectively. Given these two counts, the Goodman-Kruskal correlation is computed as:

$\gamma =\frac{{n}_{c}-{n}_{d}}{{n}_{c}+{n}_{d}}.$
Based on the same concept of concordant pairs and discordant pairs, Kendall evaluates correlation with respect to the sample size
n, equating
$\tau $ as follows:

$\tau =\frac{{n}_{c}-{n}_{d}}{n\left(n-1\right)/2}.$
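Since both measures share the concordant/discordant pair counts, they can be sketched together in Python. This is an illustrative implementation under the tie conventions of the basic (tau-a) variant, not the authors' code.

```python
from itertools import combinations

def concordant_discordant(pairs):
    """Count concordant (same ordering in both coordinates) and discordant
    (opposite ordering) pairs among all C(n, 2) pair combinations."""
    nc = nd = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            nc += 1
        elif s < 0:
            nd += 1
    return nc, nd

def goodman_kruskal_gamma(pairs):
    nc, nd = concordant_discordant(pairs)
    return (nc - nd) / (nc + nd) if nc + nd else 0.0

def kendall_tau(pairs):
    # Basic tau-a: normalize by the total number of pairs, no tie correction
    n = len(pairs)
    nc, nd = concordant_discordant(pairs)
    return (nc - nd) / (n * (n - 1) / 2)

pairs = [(1, 2), (2, 1), (3, 4), (4, 3)]   # 4 concordant, 2 discordant
print(goodman_kruskal_gamma(pairs))        # 1/3
print(kendall_tau(pairs))                  # 1/3 (here n_c + n_d = n(n-1)/2)
```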
As a rank-based correlation test, the Spearman rank correlation relates the rank of an element (its position if the list were sorted) to the mean rank value. It is computed as:

$\rho =\frac{{\sum }_{i}\left({R}_{ix}-{\overline{R}}_{x}\right)\left({R}_{iy}-{\overline{R}}_{y}\right)}{\sqrt{{\sum }_{i}{\left({R}_{ix}-{\overline{R}}_{x}\right)}^{2}{\sum }_{i}{\left({R}_{iy}-{\overline{R}}_{y}\right)}^{2}}},$
where
x and
y are the two evaluated variables,
${R}_{il}$ is the rank of element
i in list
l, and
${\overline{R}}_{l}$ is the mean of ranks of
l.
Finally, the Pearson correlation, being the most popular and straightforward correlation technique, relates elements of the distributions directly to their means. It is computed as:

$r=\frac{{\sum }_{i}\left({X}_{i}-\overline{X}\right)\left({Y}_{i}-\overline{Y}\right)}{\sqrt{{\sum }_{i}{\left({X}_{i}-\overline{X}\right)}^{2}{\sum }_{i}{\left({Y}_{i}-\overline{Y}\right)}^{2}}},$
where
$\overline{X}$ and
$\overline{Y}$ are the means of the distributions
X and
Y, respectively.
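As an illustrative sketch (not the authors' implementation), the two remaining measures can be written in a few lines of Python; Spearman's rho is simply the Pearson correlation applied to the rank vectors, with tied values assigned their average rank.

```python
def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5

def ranks(v):
    """1-based rank of each element (its position in the sorted list),
    averaging ranks over ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1                       # extend over a run of tied values
        avg = (i + j) / 2 + 1            # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(ranks(x), ranks(y))

h1 = [5, 3, 8, 1, 4]        # toy stand-ins for two histograms
h2 = [10, 6, 16, 2, 8]      # h2 = 2 * h1: a perfect linear relation
print(pearson(h1, h2))      # ~1.0 (up to floating-point rounding)
print(spearman(h1, h2))     # 1.0
```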
Table 17 shows the results of performing the five tests on various images. As all scores approach 0, there is very minimal dependency between the input and encrypted images in terms of histograms over all color channels.
3.12. Execution Time Analysis
Encryption and decryption times are used to determine an algorithm’s complexity and suitability for real-time applications.
Table 18 displays those values for the Lena image at various dimensions,
$N\times N$, where
$N\in \{64,128,256,512,1024\}$. Depending on the image dimensions, the overall encryption and decryption time varies from 0.228822 s, for
$N=64$, to just under a minute, for
$N=1024$. In addition,
Table 19 presents a comparison of encryption times between the proposed algorithm and its counterparts from the literature. Note that differences in encryption time depend on numerous factors, including the algorithm’s complexity, the specifications of the machine on which it is executed (i.e., processing power and available memory), and the software package or programming language used to implement it. In this work, Wolfram Mathematica
^{®} is employed, while in [
39,
66,
69,
70,
71] Mathworks Matlab
^{®} was the software of choice. The average encryption speed for the proposed scheme is
$1.015$ Mbps.
3.13. S-Box Performance Analysis
As a stable component at the core of almost any image encryption technique, and as it is responsible for implementing Shannon’s property of confusion in a cryptosystem, an S-box should be evaluated in isolation from the rest of the encryption process. Five tests are most commonly performed to evaluate the confusion capability of an S-box. The first is nonlinearity [
72], which measures how many bits in the truth table of a Boolean function need to be changed to reach the closest affine function (theoretical optimum of 120, with 112 being the value most commonly achieved in practice). The second test is the linear approximation probability (LAP) [
73], which quantifies the probability of bias for a given S-box (optimal value of 0.0625). The third test is the differential approximation probability (DAP) [
74], which examines the effect of specific variations in the input on the encrypted output (optimal value of 0.0156). Fourth is the bit independence criterion (BIC) [
75], which evaluates the relation between the encryption procedures and repeated patterns in the encrypted output (optimal value of 112). Finally, the strict avalanche criterion (SAC) [
75] calculates the rate of change in the encrypted output with respect to changes in the input at the bit level (optimal value of 0.5).
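As a hedged illustration of the last criterion, the SAC score of an 8-bit bijective S-box (stored as a 256-entry permutation list) can be computed as below. The randomly shuffled S-box here is a stand-in for demonstration, not the Lorenz-generated S-box of the proposed scheme.

```python
import random

def sac_score(sbox):
    """Average probability that each output bit flips when a single
    input bit of an 8-bit S-box is complemented (ideal value: 0.5)."""
    total_flips = 0
    for x in range(256):
        for bit in range(8):
            # XOR the outputs for inputs differing in exactly one bit,
            # then count how many output bits changed
            diff = sbox[x] ^ sbox[x ^ (1 << bit)]
            total_flips += bin(diff).count("1")
    # 256 inputs * 8 single-bit flips * 8 output bits observed per flip
    return total_flips / (256 * 8 * 8)

# A randomly shuffled bijective S-box (illustrative stand-in)
random.seed(42)
sbox = list(range(256))
random.shuffle(sbox)
print(round(sac_score(sbox), 3))   # typically close to the ideal 0.5
```

By contrast, the identity S-box scores exactly 0.125, since flipping one input bit flips exactly one of the eight output bits.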
Commonly used S-box generation methods take these evaluation criteria into consideration during the generation process. In this work, as demonstrated in
Section 2.2, a PRNG-based approach is followed, which introduces some advantages alongside some disadvantages. The main advantage is that the overall encryption process requires more keys, increasing the overall key space and thus the resistance to attacks. On the other hand, since the S-box depends on the keys provided to its generation process, the evaluation scores are not fixed across key choices. Investigating the relation between the various keys in the key space and the encryption strength of the generated S-boxes is left as future work.
Evaluating an S-box generated using keys:
$\sigma =10,\beta =8/3,$ and
$\rho =28$ (shown in
Table 3) utilizing the aforementioned evaluation methods results in the findings presented in
Table 20. As the evaluations demonstrate, not all optimal values were met. More precisely, while nonlinearity and SAC showed near-optimal scores, DAP scored average, and LAP was far from optimal. For better reference,
Table 21 shows a score comparison with popular S-boxes from the literature, indicating acceptable scores overall. These shortcomings are a natural result of the adopted S-box generation mechanism, which is completely random and disregards major S-box design criteria, such as avoiding fixed points and short ring cycles [
76,
77]. On the other hand, they can be regarded as trade-offs for increasing the key space of the overall encryption process, due to the addition of the three new tunability parameters (
$\sigma ,\rho $ and
$\beta $ of the Lorenz system). In other words, instead of using a fixed (well-performing) S-box in one of the encryption stages, a randomly generated S-box increases the number of possible encrypted images per single input plain image. In the proposed work, this factor of increase is
${2}^{19}$, as thoroughly discussed in
Section 3.10.