1. Introduction
Given the current volume of digital image transactions over the internet, image compression has become necessary for reducing transmission times and storage requirements [1]. JPEG (Joint Photographic Experts Group) is the most widely used format for such purposes [2], making it especially suitable for platforms like social media, where efficient image handling is critical. Another relevant context is the Internet of Things, where images generated by connected devices must be processed and stored efficiently [3].
However, digital images often contain sensitive information, making them vulnerable when transmitted over untrusted channels such as the Internet [4]. Cryptography provides a solution to safeguard the privacy of such content through encryption, using a wide range of technologies and methods, such as neural networks [5]. According to the encryption taxonomy proposed by Ahmad et al. [6], algorithms based on number theory and chaos theory (e.g., [7,8]) are among the most secure, as they are capable of concealing all the information contained in the image. Another category is perceptual encryption, which aims to obscure only the human-perceivable information of the image. In this category, pixel-level operations generally render the image incompressible, whereas block-based approaches maintain compressibility by performing encryption on pixel blocks. Ahmad et al. [6] conclude that this represents a trade-off between security and the ability to maintain compatibility with standard image storage formats.
On the other hand, while JPEG compression effectively addresses the need for reduced transmission time and storage space, it does not inherently address privacy concerns. Consequently, there is a need for encryption methods that are compatible with JPEG’s lossy compression [9], ensuring data security without compromising the usability of the JPEG file format.
Generally, image encryption schemes that account for compression can be categorized into three types [10]:
1. Encryption-then-Compression (EtC), where encryption is performed before compression.
2. Compression-then-Encryption (CtE), where the compression precedes encryption.
3. Simultaneous Compression and Encryption (SCE), where both processes are integrated into a unified framework.
SCE algorithms require specialized decoding mechanisms because the standard compression steps are altered to incorporate encryption, which differs from conventional JPEG processes. For instance, Li and Lo modified the standard Discrete Cosine Transform (DCT) by introducing new orthogonal transforms [11], an area that continues to be actively explored [12]. Another example is the work of Wang and Lo, who used a deep learning-based compression network that likewise requires a specialized decoder [13]. The present work does not adopt such approaches, as it aims to remain fully compatible with the standard JPEG compression and decompression process.
Zhang et al. proposed a JPEG-compatible encryption scheme that simultaneously performs compression and encryption across three stages of image processing: DCT transformation, quantization, and entropy encoding [14]. In this work, however, the primary objective is to preserve the original JPEG compression process, allowing the use of existing, time-optimized JPEG implementations while avoiding numerical overflows during compression. Therefore, the proposed method applies encryption techniques from both the EtC and CtE types, integrating their key features while maintaining full compatibility with the JPEG file format.
Some EtC algorithms, however, are tailored to non-standard compressed formats; for instance, Singh et al. employed wavelet-based compression [15] to ensure correct subsequent decryption. Moreover, in EtC schemes, reconstruction must be performed on encrypted, lossy data, which complicates the process. This calls for encryption algorithms that account for such complexity, but it also adds steps to image decryption. For example, Jian et al. analyzed the relationships between compressed images and discarded pixels to enable accurate reconstruction [16].
In the context of JPEG-specific EtC approaches, Kurihara et al. introduced a perceptual encryption method that conceals the visible content of the image through four block-based encryption steps [17]. However, their work does not include an analysis of security metrics such as entropy, correlation, or resistance to differential attacks, nor the kind of quantitative evaluation of encryption carried out in this work. Subsequently, Chuman et al. proposed a block-scrambling-based encryption scheme that supports JPEG compression through grayscale-based encryption of originally color images to mitigate color-compression issues, and demonstrated its applicability on social media platforms [18]. However, this approach requires modifying the image dimensions, resulting in grayscale encrypted content, and no security measures such as entropy were reported to assess the benefits of this modification. In contrast, our proposal preserves the original image dimensions and color information, avoiding the applicability limitations of grayscale encryption. Another related study was conducted by Imaizumi and Kiya, who applied encryption independently to each color channel within image blocks [19], rather than encrypting the three color channels simultaneously as complete pixels. Encrypting channels independently reduces compression efficiency compared to the conventional block-based approach. By contrast, our proposal follows the traditional block-based strategy while introducing a second encryption layer that does not further compromise JPEG compression quality.
Overall, block-based perceptual encryption algorithms have been widely reported as suitable for JPEG image encryption. For example, Ahmad and Shin enhanced this approach by incorporating both inter- and intra-block processing [20]. Unlike our proposal, their scheme includes intra-block encryption techniques designed to improve the security of typical inter-block EtC methods. However, this additional encryption layer is also applied before compression, which further reduces compression efficiency. In contrast, our proposal introduces a second encryption layer applied after compression, which does not directly impact compression performance, aside from small variations due to JPEG markers. Finally, in several of these prior works, the improvements in security are applied only before compression, meaning that the added encryption stages directly degrade compression performance in terms of visual quality and storage. Moreover, some of these studies omit essential security metrics, limiting the completeness of their evaluation.
Regarding the CtE works, He et al. proposed a method that permutes quantized DC coefficient differences directly within the bitstream domain, in alignment with the way these values are stored [21]. In contrast, Su et al. inverted DC coefficients and then applied the differential pulse-code modulation (DPCM) step to the modified values, which requires additional compression time for DPCM [22]. He’s method has the advantage of preserving file size, maintaining parity with plain JPEG images. Similarly, Peng et al. introduced a scheme that permutes DC coefficients while also preserving both file size and format [23]. However, because these methods perform encryption only after compression, they additionally require encryption of the AC coefficients to prevent attacks capable of revealing image edges. By contrast, our proposal incorporates encryption prior to compression, thereby preventing the preservation of edge structures in the first place. As a result, encryption of AC coefficients and additional DPCM recompression steps are unnecessary.
In CtE schemes, storage efficiency is typically prioritized alongside security. For example, Yuan et al. proposed an approach that further reduces storage requirements after compression by removing AC coefficients and applying permutation steps [24]. While this reduces file size, it introduces additional computational cost. In comparison, our proposal applies only permutation after compression. Another example is the work by Hirose et al., who preserved the JPEG file format and size by strategically inserting restart markers between Minimum Coded Units (MCUs) for encryption purposes [25]. However, this approach requires selecting specific regions of interest within the image. In contrast, our method applies encryption across the entire image.
In summary, existing proposals for JPEG image encryption present the following characteristics. Some emphasize preserving the file format and maintaining compression efficiency, yet traditional security evaluations, such as entropy and correlation, are often missing [26,27]. To address this limitation, the present work incorporates not only compression performance and visual quality assessments but also entropy and correlation metrics. Moreover, most prior studies implement encryption at a single stage. Exploring encryption across multiple stages offers the potential to combine the strengths of both approaches in terms of security, storage efficiency, and visual quality. Introducing multiple encryption phases can also enhance resistance against targeted attacks, whether they occur before or after compression [28,29]. A central challenge, however, lies in ensuring compatibility between encryption schemes applied at different stages. Many existing methods assume that encryption is performed on a plain image, or that compression is the final step before the image is stored, which complicates their integration into multi-stage encryption.
Motivation: Most existing encryption algorithms designed to support compression apply encryption at a single stage (either before, during, or after compression). However, to achieve enhanced security it may be beneficial to combine encryption techniques across multiple stages. A primary challenge in doing so lies in preserving compatibility with the JPEG format, which imposes specific structural and encoding constraints. The main objective of this work is to design and implement an encryption algorithm that integrates both encryption-then-compression (EtC) and compression-then-encryption (CtE) techniques, while maintaining full compliance with the JPEG file format. This integration aims to enhance security, preserve visual quality, and sustain efficient storage through JPEG lossy compression.
Contribution: We propose an encryption algorithm that maintains JPEG file format compatibility while applying encryption both before and after compression, forming an encryption–compression–encryption structure. This dual-stage encryption approach combines the strengths of EtC and CtE techniques, resulting in improved security and efficient compression (in terms of storage and visual-quality degradation).
The structure of this paper is organized as follows: Section 2 presents the materials and methods used in this work, including the JPEG compression algorithm, the baseline mode, and the evaluation metrics used to assess security, storage efficiency, and visual quality. Section 3 describes the decoding process of the JPEG bitstream, including the generation of Huffman codes, followed by the proposed encryption algorithm and information about the key and permutations; the encryption is applied in two stages, prior to compression in the pixel domain and subsequently after compression in the bitstream domain. Section 4 presents the experimental results, structured according to the three types of evaluation: security, storage, and visual quality. Section 5 provides a discussion and analysis of the results, highlighting key findings and comparing them with existing approaches. Finally, Section 6 concludes the paper.
2. Materials and Methods
This section provides an overview of JPEG compression, from the image pixels to the compressed JPEG file. It also describes the JPEG baseline mode, which is the operating mode of the proposed method, and outlines the metrics used to evaluate performance: entropy, linear correlation, and differential-attack measures for security assessment; bits per pixel (bpp) for storage efficiency; and peak signal-to-noise ratio (PSNR) and the structural similarity index for visual quality.
2.1. JPEG Overview
The process of generating a JPEG bitstream involves distinct steps, each critical to the image compression procedure. Below is a concise explanation (with an example) of the key steps leading to the creation of a JPEG image.
2.1.1. Color Space Transformation
Any plain image can be represented in a three-dimensional space, known as the RGB color space, defined by the three primary colors: red (R), green (G), and blue (B), each ranging from 0 to 255. However, JPEG converts the image from the RGB color space to the YCbCr color space using the system of three equations defined in Equation (1).
The purpose of this step is to separate the image into its luminance and chrominance components, each ranging from 0 to 255. Since the human visual system is more sensitive to variations in brightness (luminance) than to color (chrominance), this separation allows the chrominance components to be compressed more heavily than the luminance component.
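For reference, a commonly used form of this conversion is the JFIF variant shown below; the paper's Equation (1) is assumed to be equivalent, although its exact coefficients are not reproduced here:

```latex
\begin{aligned}
Y  &=  0.299\,R + 0.587\,G + 0.114\,B,\\
Cb &= -0.1687\,R - 0.3313\,G + 0.5\,B + 128,\\
Cr &=  0.5\,R - 0.4187\,G - 0.0813\,B + 128.
\end{aligned}
```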
2.1.2. Subsampling and Block Division
In this stage, the chrominance components undergo subsampling, while both the luminance and the subsampled chrominance components are divided into blocks of 8 × 8 elements, as the subsequent processing stages operate exclusively on 8 × 8 blocks. The sampling factors specify how the subsampling is performed. The values used in this work are:
1. Luminance: a vertical sampling factor of 2 and a horizontal sampling factor of 2.
2. Chrominance (Cb and Cr): a vertical sampling factor of 1 and a horizontal sampling factor of 1.
A vertical sampling factor of one indicates that for every chrominance block read in this direction, two luminance blocks are processed vertically, since the vertical sampling factor for luminance is set to two. The same proportion applies in the horizontal direction. As a result, for every 16 × 16-pixel area (256 pixels) of the image, four 8 × 8 luminance blocks are processed, while only one 8 × 8 block each of blue and red chrominance is retained.
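As a rough illustration (not the authors' implementation), the following Python sketch shows how a 16 × 16 region yields four 8 × 8 luminance blocks but only one 8 × 8 block per chrominance channel when the chrominance planes are downsampled, here by averaging 2 × 2 neighborhoods (one common choice):

```python
import numpy as np

def subsample_chroma(y, cb, cr):
    """Downsample the chrominance planes by averaging 2x2 neighborhoods (illustrative choice)."""
    def down(c):
        h, w = c.shape
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, down(cb), down(cr)

# A 16x16 area: Y stays 16x16 (four 8x8 blocks); Cb and Cr each become a single 8x8 block.
y  = np.random.randint(0, 256, (16, 16)).astype(float)
cb = np.random.randint(0, 256, (16, 16)).astype(float)
cr = np.random.randint(0, 256, (16, 16)).astype(float)
print([a.shape for a in subsample_chroma(y, cb, cr)])  # [(16, 16), (8, 8), (8, 8)]
```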
2.1.3. Discrete Cosine Transform
After dividing the image into blocks of size 8 × 8, 128 is subtracted from each of the 64 elements in every block. This operation shifts the original intensity range of Y, Cb, and Cr from [0, 255] to [−128, 127], resulting in a new block whose entries for each channel are denoted as f(x, y), for x, y = 0, 1, …, 7. Subsequently, each modified block undergoes the Discrete Cosine Transform (DCT), as defined in Equation (2). This transformation produces a new 8 × 8 block consisting of 64 frequency-domain coefficients, denoted as F(u, v) for u, v = 0, 1, …, 7. Each coefficient is derived by combining all spatial-domain values f(x, y) weighted by the cosine basis functions,
where:
f(x, y): The intensity values within an 8 × 8 block (64 elements) of the Y, Cb, or Cr channel after subtracting 128. The elements are indexed from 0 to 7 along both the x and y directions.
F(u, v): The Discrete Cosine Transform (DCT) coefficients corresponding to the block of the Y, Cb, or Cr channel, indexed from 0 to 7 in the u and v directions.
cos: The cosine function, evaluated in radians.
C(u), C(v): Normalization factors defined in Equation (3).
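For reference, the Type-II DCT to which Equations (2) and (3) correspond can be written as follows; this is the standard JPEG definition rather than a reproduction of the paper's typesetting:

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v)
         \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\!\left[\frac{(2x+1)u\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)v\pi}{16}\right],
\qquad
C(u) = \begin{cases} 1/\sqrt{2}, & u = 0,\\ 1, & u > 0, \end{cases}
```
and likewise for C(v). With this normalization, F(0, 0) equals eight times the average of the 64 input values, consistent with the statement in the next subsection.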
2.1.4. Quantization
In the block resulting from the application of Equation (2), the values F(u, v) are divided by a fixed quantization table, denoted as Q(u, v), which is applied entry by entry to all blocks. This quantization step serves two primary purposes. First, it introduces zeros among the DCT coefficients F(u, v) with low magnitudes, which typically represent less perceptually significant information. Second, it reduces the magnitude of the remaining significant coefficients, enabling more efficient compression by requiring fewer bits for storage. The result of the division, denoted as FQ(u, v), is computed by rounding the quotient to the nearest integer, ensuring that subsequent steps work exclusively with integers. This process is mathematically represented by Equation (4),
where:
F(u, v): The DCT coefficient located at coordinates (u, v).
Q(u, v): The quantization value corresponding to the frequency position (u, v).
FQ(u, v): The quantized DCT coefficient obtained after dividing F(u, v) by Q(u, v) and rounding to the nearest integer.
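A minimal sketch of Equation (4), assuming the DCT block F and the quantization table Q are given as 8 × 8 NumPy arrays (the actual table values depend on the chosen quality setting and are not reproduced here):

```python
import numpy as np

def quantize(F, Q):
    """Quantize a block of DCT coefficients by the table Q, rounding to the nearest integer."""
    return np.rint(F / Q).astype(int)

# Example: a coefficient of -23 divided by a quantization value of 10 is stored as -2.
print(quantize(np.array([[-23.0]]), np.array([[10.0]])))  # [[-2]]
```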
2.1.5. Zig-Zag Permutation
Next, the 64 quantized DCT coefficients FQ(u, v) are rearranged according to the Zig-Zag permutation, as illustrated in Figure 1a. The first element in this sequence is FQ(0, 0), referred to as the DC coefficient, which captures the most significant information in the block. Following Equation (2), the unquantized coefficient F(0, 0) corresponds to eight times the average of the input values f(x, y). The remaining 63 coefficients are known as AC coefficients, representing the higher-frequency components of the block. To facilitate the compression, each AC coefficient is assigned an index from 1 to 63, following the order imposed by the Zig-Zag pattern. This indexing is illustrated in Figure 1c.
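For illustration, the Zig-Zag index order of Figure 1 can be generated programmatically by walking the anti-diagonals of the 8 × 8 block and alternating direction (a sketch; the standard order can equally be stored as a constant table):

```python
def zigzag_order(n=8):
    """Return the (row, col) coordinates of an n x n block in Zig-Zag order."""
    order = []
    for d in range(2 * n - 1):                             # d = row + col indexes each anti-diagonal
        rng = range(d + 1) if d % 2 else range(d, -1, -1)  # odd diagonals go down, even ones go up
        for i in rng:
            j = d - i
            if i < n and j < n:                            # skip positions outside the block
                order.append((i, j))
    return order

print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```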
2.1.6. Run-Length Encoding
This compression technique, known as Run-Length Encoding (RLE), is applied exclusively to the AC coefficients. The method involves forming pairs of values in the following manner (see Figure 2):
1. The first value s indicates the number of consecutive zero-valued coefficients preceding a non-zero coefficient.
2. The second value t represents the non-zero AC coefficient that terminates the sequence of zeros.
Rather than storing each zero individually, this approach compactly represents long runs of zeros, which are common due to the Zig-Zag ordering introduced in the previous stage. To illustrate this process, the AC coefficients from Figure 1c are compressed using Run-Length Encoding. The resulting encoded output of Figure 1 is shown in Figure 3.
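A minimal sketch of this pairing for the 63 AC coefficients of one block (the special zero-run-length and end-of-block symbols of a full JPEG encoder are omitted):

```python
def run_length_encode(ac_coeffs):
    """Encode the AC coefficients as (s, t) pairs: s preceding zeros, then the non-zero value t."""
    pairs, zeros = [], 0
    for t in ac_coeffs:
        if t == 0:
            zeros += 1
        else:
            pairs.append((zeros, t))
            zeros = 0
    return pairs  # trailing zeros are represented by the end-of-block symbol in a real encoder

print(run_length_encode([5, 0, 0, -3, 0, 1] + [0] * 57))  # [(0, 5), (2, -3), (1, 1)]
```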
2.1.7. Differential Pulse Code Modulation
The encoding of DC coefficients differs from that of AC coefficients because each 8 × 8 block contains only a single DC coefficient. In JPEG compression, DC coefficients are not stored using their original values (except for the first block’s DC coefficient, DC_1). Instead, each DC coefficient is encoded as the difference ΔDC_i between its value DC_i and the DC value of the previous adjacent block, DC_{i−1}. This technique is known as Differential Pulse Code Modulation (DPCM). The differential value, denoted as ΔDC_i, is computed according to Equation (5). For the first block, since no previous DC coefficient exists, its value is stored directly: ΔDC_1 = DC_1,
where:
DC_i: The DC coefficient of block i.
DC_{i−1}: The DC coefficient of the block immediately preceding block i.
ΔDC_i: The differential value of DC_i with respect to the previous DC coefficient DC_{i−1}.
Thus, the value actually stored in the JPEG file is ΔDC_i, not the original DC_i. The advantage of this method lies in the fact that adjacent blocks often have similar values, resulting in small differences. This makes ΔDC_i typically smaller in magnitude than DC_i itself, allowing for more efficient compression. This process is illustrated in Figure 4, which shows the transformation of four DC coefficients into their corresponding differential values via DPCM.
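A minimal sketch of the DPCM step of Equation (5), applied to a list of DC coefficients as in Figure 4 (the coefficient values below are illustrative only):

```python
def dpcm(dc_coeffs):
    """Replace each DC coefficient (except the first) with its difference from the previous one."""
    return [dc_coeffs[0]] + [dc_coeffs[i] - dc_coeffs[i - 1] for i in range(1, len(dc_coeffs))]

print(dpcm([52, 55, 54, 50]))  # [52, 3, -1, -4]
```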
2.1.8. Huffman Symbols and Additional Bits
Huffman symbols: In this step, the AC coefficients encoded with RLE are processed as follows: the first number in each pair, s, indicating the number of preceding zeros, is now represented with four bits. These four bits are then concatenated with four more bits that indicate the number of bits required to represent the absolute value (|t|) of the following non-zero coefficient. The result of this concatenation is known as the Huffman symbol. In summary, the Huffman symbol (HS) is always 8 bits long and represents a byte value x, as illustrated in Figure 5.
Since DC coefficients are not preceded by sequences of zeros, the first four bits of their Huffman symbol are always set to zero. The remaining four bits represent the number of bits required to encode the absolute value of the differential coefficient ΔDC_i. Thus, the complete Huffman symbol for a DC coefficient consists of four zero bits followed by these four length bits (see Figure 6).
Additional bits: To complete the AC coefficient representation, Additional Bits (AB) are appended immediately after the Huffman symbol. These bits are denoted by the value y, as shown in Figure 5. Unlike the Huffman symbol, which always consists of 8 bits, the number of additional bits y varies depending on the binary length required to encode the coefficient t. The key issue in encoding t is that AC coefficients can be either positive or negative, and the binary representation must reflect the sign. To handle this, a sign convention is applied according to Equation (6). If t > 0, then y = t. If t < 0, the convention involves a power of 2 whose exponent is the number of bits needed to represent the positive number |t|, and ⊕, the XOR operation. The procedure for encoding the additional bits (y) for DC coefficients follows the same approach as for AC coefficients: instead of encoding the number t, the differential value ΔDC_i is used, and the result is assigned to the variable y.
An example of encoding the 8 × 8 block, following the application of RLE to the AC coefficients (Figure 3) and DPCM to the DC coefficient (Figure 4), is presented in Figure 7. The encoded output is organized sequentially, starting with the Huffman symbol x followed by its corresponding additional bits y. Together, these components represent both the DC and AC coefficients of the block,
where:
t: Either the differential DC value or a nonzero AC coefficient.
y: The binary representation used to encode the coefficient t as a positive value.
⊕: The bitwise XOR (exclusive OR) operation.
|t|: The absolute value of the coefficient t.
log2: The base-2 logarithm function.
After this stage, the Huffman code is applied only to the Huffman symbols [30]; the additional bits remain unchanged.
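The following sketch summarizes how a Huffman symbol and its additional bits could be formed for one (s, t) pair, using the sign convention described above (a simplified illustration, not the authors' implementation):

```python
def additional_bits(t):
    """Return (size, y): the bit length of |t| and its value under the sign convention of Equation (6)."""
    if t == 0:
        return 0, 0
    size = abs(t).bit_length()                        # number of bits needed to represent |t|
    y = t if t > 0 else ((1 << size) - 1) ^ abs(t)    # negative values: XOR |t| with an all-ones mask
    return size, y

def ac_huffman_symbol(s, t):
    """Concatenate the 4-bit zero run s with the 4-bit size of t to form the 8-bit Huffman symbol."""
    size, y = additional_bits(t)
    return (s << 4) | size, y

symbol, y = ac_huffman_symbol(2, -3)
print(f"HS = {symbol:08b}, additional bits = {y:02b}")  # HS = 00100010, additional bits = 00
```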
2.2. Bitstream Decoding of Coefficients
Since the second stage of the proposed encryption scheme operates on the bitstream domain, it is essential to understand how AC and DC coefficients are decoded from the JPEG bitstream. For AC components, we refer to the quantized AC coefficients, and for DC components, to the quantized values of the differential DC coefficients. The first step is decoding the Huffman symbols, which were initially encoded using Huffman codes prior to storage. In other words, the JPEG bitstream does not directly store the Huffman symbols themselves; instead, it stores their corresponding Huffman codes, which are variable-length binary sequences.
As the bitstream is composed of a continuous sequence of bits without explicit delimiters between symbols and coefficient values, the start and end of each Huffman code must be identified. Nor does the JPEG format explicitly store which Huffman code corresponds to each Huffman symbol. Rather, it provides the code-length counts, a list giving the number of Huffman codes of each length from 1 to 16 bits, together with a list of the Huffman symbols used. The mapping between Huffman codes and symbols is determined by the order in which the symbols appear: they are assigned in sequence to the codes of increasing length, as specified by the code-length counts. This information is sufficient to reconstruct the Huffman codes and their interpretation as Huffman symbols in the bitstream.
Throughout the JPEG bitstream, specific markers indicate the start of structural elements. One such marker is the Define Huffman Table (DHT) marker, composed of the two bytes “FFC4”, which signals the beginning of a Huffman table. This segment contains all the information about the Huffman code lengths and the Huffman symbols. Immediately following the marker are two additional bytes, which together define the length e (in bytes) of the Huffman table segment. This length includes the two bytes used to specify it, so the actual content of the table spans e − 2 bytes (see Figure 8a). The first byte after the length field is divided in two: the first four bits (from left to right) indicate the Table ID, where a value of 0 designates the Huffman table for the luminance channel and a value of 1 designates the chrominance channel. The remaining four bits specify the table type: a value of 0 indicates a DC coefficient table, and a value of 1 indicates an AC coefficient table.
The next 16 bytes define the number (#) of Huffman codes for each possible code length, from 1 to 16 bits. For example, if the first byte has a value of 0, it means there are no Huffman codes of length 1. If the second byte has a value of 2, it indicates there are two Huffman codes of length 2, and so on. However, this section does not specify which Huffman codes correspond to those lengths, only how many codes exist for each length. The sum of the 16 values gives the total number w of Huffman symbols, each of which is represented by a single byte. Therefore, the next w bytes in the bitstream contain the Huffman symbols, ordered according to the lengths defined in the previous 16 bytes. All this information can be found in Figure 8a.
An example of this process is illustrated in Figure 8b, which shows a part of the JPEG bitstream beginning with the DHT marker ff c4. The two subsequent bytes define the Huffman table length: 00 1d, indicating a total of 29 bytes. The following byte contains 00, where the first half indicates Table ID = 0 (luminance) and the second half (0) indicates a DC coefficient table. This configuration is summarized in Figure 8c. The next 16 bytes define the number of Huffman codes for each length, and their total sum is 10. Thus, 10 Huffman symbols follow, as illustrated in Figure 8d, which summarizes the mapping between Huffman symbols and their corresponding Huffman code lengths.
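As an illustration of the layout in Figure 8a, the following sketch parses one Huffman table from a bytes object positioned at the DHT marker (simplified; a real decoder must also handle several tables within one segment). The example bytes are illustrative, not taken from the paper's figures:

```python
def parse_dht(data, pos):
    """Parse a single Huffman table starting at the FF C4 marker located at data[pos]."""
    assert data[pos:pos + 2] == b"\xff\xc4"                   # DHT marker
    length = int.from_bytes(data[pos + 2:pos + 4], "big")     # includes the two length bytes
    table_id = data[pos + 4] >> 4                             # 0 = luminance, 1 = chrominance
    table_type = data[pos + 4] & 0x0F                         # 0 = DC table, 1 = AC table
    counts = list(data[pos + 5:pos + 21])                     # number of codes of each length 1..16
    w = sum(counts)                                           # total number of Huffman symbols
    symbols = list(data[pos + 21:pos + 21 + w])               # the w Huffman symbols, in order
    return length, table_id, table_type, counts, symbols

seg = bytes.fromhex(
    "ffc4" "001d" "00"                                        # marker, length = 29, luminance DC table
    "0002030101010101" "0000000000000000"                     # code-length counts (sum = 10)
    "00010203040506070809"                                    # 10 Huffman symbols
)
print(parse_dht(seg, 0)[:3])  # (29, 0, 0)
```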
It is important to note that the data for decoding the DC and AC coefficients follows the Start of Scan (SOS) marker, which is identified by the hexadecimal value “FFDA”. In the JPEG baseline mode, there is only one scan. Immediately following the marker, the length of the scan header is specified using two bytes. This two-byte value does not count the two bytes used to indicate the marker (“FFDA”), although it does include the two bytes used to specify the length itself.
The next byte indicates the number of color components involved in the scan, typically three for color images (Y, Cb, and Cr). Subsequently, for each component, two bytes follow:
1. The first byte identifies the component number (1 for Y, 2 for Cb, and 3 for Cr).
2. The second byte is divided into two parts: the upper 4 bits indicate the Huffman table used for DC coefficients, and the lower 4 bits indicate the Huffman table used for AC coefficients.
Then, the next three bytes "00 3F 00" are standard in the JPEG baseline scan header. Following this header, the compressed image data is presented, encoding the quantized DC and AC coefficients.
2.3. Huffman Code Generation Procedure
Below, we describe the process for generating Huffman codes based on the number of codes for each bit length. The 16 bytes that store this information in the JPEG bitstream compose the Number of Huffman Codes array, denoted as NHC. In this array, the entry NHC[i] indicates the number of Huffman codes of length i bits, for i = 1 to 16.
1. Step 1. Initialize the variable cd to 0. This variable represents the current Huffman code being generated and is updated iteratively throughout the process.
2. Step 2. For each code length i from 1 to 16:
(a) If NHC[i] = 0, there are no Huffman codes of length i, and the variable cd is updated as cd = cd × 2 (i.e., cd = cd << 1) to prepare for the next code length.
(b) If NHC[i] > 0, then for each of the NHC[i] Huffman codes of length i (from 1 to NHC[i]):
i. Assign the current value of cd as a Huffman code of length i.
ii. Increment cd by 1 (i.e., cd = cd + 1).
(c) After all codes of length i have been assigned, update cd = cd × 2 to ensure the correct prefix property for the next code length.
The generated Huffman codes are assigned to the Huffman symbols in the order in which the symbols appear in the JPEG bitstream. In this way, each Huffman symbol is paired with a unique Huffman code of the specified length. The Huffman codes corresponding to the lengths defined in Figure 8d are illustrated in Figure 9a, where the mapping between Huffman symbols and their generated Huffman codes is shown.
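The procedure above can be sketched in Python as follows, where nhc holds the 16 code-length counts and the generated codes are paired with the symbols in their stored order (the counts and symbols below are illustrative):

```python
def generate_huffman_codes(nhc, symbols):
    """Build {symbol: (code, length)} from the code-length counts, assigning codes to symbols in order."""
    codes, cd, k = {}, 0, 0
    for i in range(1, 17):                # code lengths from 1 to 16 bits
        for _ in range(nhc[i - 1]):       # NHC[i] Huffman codes of length i
            codes[symbols[k]] = (cd, i)   # assign the current code, then increment it
            cd += 1
            k += 1
        cd <<= 1                          # double cd before moving to the next code length
    return codes

nhc = [0, 2, 3, 1, 1, 1, 1, 1] + [0] * 8     # illustrative counts (10 codes in total)
symbols = [0, 6, 7, 3, 4, 2, 8, 1, 5, 9]     # illustrative symbols, in stored order
for sym, (code, length) in generate_huffman_codes(nhc, symbols).items():
    print(f"symbol {sym:2d} -> code {code:0{length}b}")
```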
Additionally, Figure 9b presents an example bitstream as it appears in a JPEG file, where the bits are arranged sequentially. In this example, the first bit does not correspond to any valid Huffman code, so two bits are considered. Referring to Figure 9a, this two-bit sequence matches the second Huffman code, which maps to the second stored Huffman symbol (with a value of 6 in this case).
Decoding involves replacing each Huffman code in the bitstream with its corresponding Huffman symbol. For DC coefficients, the decoded symbol indicates how many additional bits should be read. These additional bits follow immediately after the Huffman code and represent the actual DC value. This decoding process is illustrated in Figure 9c.
2.4. JPEG Baseline Mode
The JPEG (Joint Photographic Experts Group) standard, developed by a committee of the same name, defines two main types of image compression: lossless and lossy [31]. For the lossy compression scheme, the baseline mode is the simplest and most widely supported JPEG variant. It is characterized by a straightforward encoding and decoding process and is particularly favored for its balance between compression efficiency and computational simplicity. Its key features are outlined below [32,33]:
- Processes the image in non-overlapping blocks of size 8 × 8 pixels.
- Employs a sequential rather than progressive encoding approach, meaning the image is encoded in a single scan from left to right and top to bottom.
- Uses the Type-II Discrete Cosine Transform, as defined in Equation (2), instead of wavelet-based prediction techniques.
- Relies on Huffman coding for encoding Huffman symbols, rather than arithmetic coding.
- Utilizes default Huffman tables whose codes are reconstructed by the decoder, as the Huffman codes themselves are not included within the compressed image file.
- Supports compression in various color spaces, with better performance in those that separate chromatic (color) and achromatic (luminance) components (e.g., YCbCr).
- Employs at most two sets of Huffman tables: one for the luminance (achromatic) component and another for the chrominance (chromatic) components. Each set includes one table for DC coefficients and another for AC coefficients.
- Marks the beginning of a baseline-compressed frame with the Start of Frame (SOF0) marker, which has a hexadecimal value of FFC0.
2.5. Information Entropy
Information entropy, denoted as H and defined by Equation (7), quantifies the degree of randomness or unpredictability within a sequence of bits. In the context of cryptography, it is widely used to assess the security of encrypted data [34]. In this study, entropy is calculated independently for each color channel (red, green, and blue) of the compressed and encrypted images.
In Equation (7), log2 represents the base-2 logarithm. An ideal encrypted image should exhibit an entropy value close to the maximum of 8.0, which indicates a highly random distribution of pixel values and, therefore, stronger resistance to statistical attacks.
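A minimal sketch of the entropy computation for one 8-bit color channel, assuming the channel is given as an integer NumPy array:

```python
import numpy as np

def entropy(channel):
    """Shannon entropy (in bits) of an 8-bit image channel, per Equation (7)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                     # ignore intensity values that never occur
    return float(-np.sum(p * np.log2(p)))
```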
2.6. Correlation Coefficient
The correlation coefficient r is a statistical measure (ranging from −1 to 1) used to evaluate the linear dependency between adjacent pixels in an image. In encryption analysis, lower correlation values are desirable, as they suggest a greater disruption of spatial relationships, thereby indicating stronger encryption [35]. A correlation value close to zero indicates a minimal linear relationship, which is ideal for effective image encryption. It is calculated according to Equation (8) and applied separately to each color channel (red, green, and blue),
where:
w: The number of pixels randomly selected from the image.
x_i: The value of pixel i.
y_i: The value of the adjacent neighbor of pixel i.
x̄: The mean of the sampled pixels.
ȳ: The mean of the corresponding neighbor pixels.
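A minimal sketch of Equation (8) for horizontally adjacent pixels of one channel, with w randomly sampled positions (the sampling strategy and neighborhood direction are assumptions for illustration):

```python
import numpy as np

def adjacent_correlation(channel, w=3000, seed=0):
    """Correlation coefficient between w randomly sampled pixels and their right-hand neighbors."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, channel.shape[0], w)
    cols = rng.integers(0, channel.shape[1] - 1, w)
    x = channel[rows, cols].astype(float)
    y = channel[rows, cols + 1].astype(float)
    return float(np.corrcoef(x, y)[0, 1])
```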
2.7. Number of Pixel Change Rate and Unified Average Changing Intensity
The robustness of the proposed encryption scheme against differential attacks is evaluated using two metrics: the Number of Pixel Change Rate (NPCR) and the Unified Average Changing Intensity (UACI). Both metrics measure the sensitivity of the encryption algorithm to variations in the input image, specifically analyzing how a one-pixel change in the plain image affects the resulting encrypted image at each position (i, j) [36].
In this evaluation, two plain images of identical dimensions are used, differing only in the value of a single pixel while all other pixel values are equal, where:
C is the encrypted image of the initial plain image.
C′ is the encrypted image of the plain image that differs in one pixel.
p is the number of pixel rows.
q is the number of pixel columns.
The NPCR metric measures the percentage of pixels whose values differ between the two encrypted images and is defined in Equation (9),
where:
C: The encrypted image of the initial plain image.
C′: The encrypted image of the plain image that differs in one pixel.
p: The number of pixel rows.
q: The number of pixel columns.
D(i, j): The variable defined in Equation (10).
On the other hand, UACI measures the average intensity difference between the original encrypted image C and the encrypted image C′ using Equation (11). A desirable NPCR value is 99.5693%, and the UACI value should fall within its theoretical acceptance range.
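A minimal sketch of Equations (9)–(11) for one channel of the two encrypted images C and C′, assuming both are 8-bit NumPy arrays of identical shape:

```python
import numpy as np

def npcr_uaci(c1, c2):
    """Return NPCR (% of differing pixels) and UACI (% average intensity change) for 8-bit channels."""
    npcr = 100.0 * np.mean(c1 != c2)                                               # Equations (9) and (10)
    uaci = 100.0 * np.mean(np.abs(c1.astype(float) - c2.astype(float))) / 255.0    # Equation (11)
    return npcr, uaci
```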
2.8. Peak Signal-to-Noise Ratio
To assess the visual quality of images before and after lossy compression, the Peak Signal-to-Noise Ratio (PSNR) is employed, as shown in Equation (12),
where:
I(i, j): The value of pixel (i, j) of the original uncompressed image I.
K(i, j): The value of pixel (i, j) of the decompressed image K.
This metric is derived from the Mean Squared Error (MSE) between the corresponding pixel intensities of the two images. The MSE quantifies the average of the squared differences between corresponding pixel values in I and K [37], and its square root constitutes the denominator in the PSNR expression. The numerator reflects the dynamic range of pixel values in the image, which for 8-bit images is 255. The PSNR is then computed as the base-10 logarithm of the ratio between the dynamic range and the root of the MSE, scaled by a factor of 20. A higher PSNR value indicates better preservation of image quality after compression. In this work, PSNR is computed separately for each color channel (red, green, and blue). In addition, the PSNR is used to measure encryption quality [36], where a lower PSNR value indicates better encryption quality.
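A minimal sketch of Equation (12) for one 8-bit channel of the images I and K:

```python
import numpy as np

def psnr(i_img, k_img):
    """Peak signal-to-noise ratio (in dB) between two 8-bit image channels."""
    mse = np.mean((i_img.astype(float) - k_img.astype(float)) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return 20.0 * np.log10(255.0 / np.sqrt(mse))
```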
2.9. Bits per Pixel
Bits per pixel (bpp) represents the average number of bits required to store a single pixel in an image. It is calculated by dividing the total number of bits used to store the image by the total number of pixels, as shown in Equation (13). This metric is particularly useful for evaluating the efficiency of compression algorithms by comparing the storage requirements of compressed images with those of uncompressed formats. For instance, a standard uncompressed 24-bit image uses 24 bits per pixel. Alternative metrics such as bytes per pixel or bits per channel can be derived from the bpp value if needed [38].
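For example, the bpp of a stored JPEG file can be obtained directly from its size on disk (a sketch; the file name is hypothetical):

```python
import os

def bits_per_pixel(path, width, height):
    """Average number of bits used per pixel, per Equation (13)."""
    return os.path.getsize(path) * 8 / (width * height)

# bits_per_pixel("baboon_ece.jpg", 512, 512)  # hypothetical file
```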
4. Results
To evaluate the proposed ECEA methodology, five images were used: Airplane (Figure 12a), Baboon (Figure 12b), Donkey (Figure 12c), Peppers (Figure 12d), and Sailboat (Figure 12e). All are color images of 512 × 512 pixels. They are standard benchmarks in image encryption, are used here for academic and non-commercial research purposes, and all copyrights belong to the original holders [42]. JPEG compression was performed using the built-in functionality of C++ Builder 12 Community Edition, which applies a compression quality of 75%. For encryption, three different block sizes were tested: 8 × 8, 16 × 16, and 32 × 32 pixels. It is important to note that, to maintain compatibility with JPEG compression, block sizes must be multiples of 8.
Figure 12f shows the Airplane image encrypted with ECEA using an 8 × 8 block size. Similarly, Figure 12g–j present the encrypted versions of the Baboon, Donkey, Peppers, and Sailboat images, respectively.
It is important to emphasize that all results, even those labeled as “plain image,” correspond to JPEG images. The notation used in the tables of this section is as follows:
1. “C” refers to an image that has only been JPEG-compressed.
2. “EC” indicates an image that was first encrypted and then JPEG-compressed.
3. “ECE” refers to an image that underwent encryption, followed by JPEG compression, and then a second encryption stage on the compressed bitstream.
Results are reported for each block size: 8 × 8, 16 × 16, and 32 × 32. Additionally, the results are presented per image and per color channel. The rows labeled “R,” “G,” and “B” refer to the red, green, and blue color channels, respectively, while the row labeled “A” reports the average across all three color channels. The results are organized into three categories: security analysis, storage analysis, and visual quality assessment.
4.1. Security Results
Table 1 presents the entropy results, where values closer to 8.0 indicate higher entropy and stronger encryption security, while Table 2 provides a comparison with other works. Additionally, the Baboon image was also encrypted using three alternative keys different from the one employed in Table 1, and the resulting entropy values are reported in Table 3.
Table 4 shows the correlation coefficients, where values closer to 0.0 indicate weaker linear correlations between adjacent pixels, which enhances security. Conversely, values approaching 1.0 or −1.0 suggest strong linear dependencies, which are less desirable for encryption. In addition, Table 5 presents a correlation comparison. Finally, Table 6 and Table 7 present the distribution of consecutive DC coefficients with identical signs for both the Baboon and Donkey images, illustrating the effect of block size and of encryption before compression on DC coefficient grouping.
4.2. Storage Results
Table 8 reports the file size (in bytes) of each JPEG image, including the plain image, the encrypted and compressed image for various block sizes, and the final encrypted–compressed–encrypted (ECE) image.
Table 9 presents the number of bits per pixel (bpp), representing the number of bits required to store a pixel with three color components.
4.3. Visual Quality Results
Table 10 presents the Peak Signal-to-Noise Ratio (PSNR) values, which indicate the quality of the visual information following JPEG lossy compression. The PSNR values are computed after full decryption of the DC coefficients and decompression back to the pixel domain. These results are reported for each color channel and each block size. It is important to note that the lossy compression occurs only after the first stage of pixel-domain encryption; the second encryption stage operates on the compressed bitstream and does not affect the PSNR results, so the visual quality of EC is the same as that of ECE. In addition, Table 11 presents the PSNR values comparing the original uncompressed image with the encrypted ones to measure visual security.