1. Introduction
The secure transmission of images across public networks is becoming increasingly critical due to the rise of cyberattacks and unauthorized access to visual information. With the rapid expansion of image exchange across personal, corporate, and governmental settings, the risks of interception, manipulation, and exposure have become increasingly significant. These threats motivate the development of protection strategies that maintain confidentiality, integrity, and authenticity, especially for image data. Unlike text, images have a large size and exhibit redundancy and strong interpixel correlations; therefore, a direct transfer of techniques designed for text encryption is often inefficient.
DES and AES are examples of classical block ciphers that are essential for secure communication. However, when they are used directly to encrypt images, they typically fail to exploit intrinsic image statistics and the parallel structure of pixel arrays [1,2,3]. To address these limitations, image-focused cryptosystems have emerged around two interrelated approaches. The first approach is based on chaotic dynamics: the sensitivity of chaotic systems to initial conditions and their ergodic behavior yield unpredictable, key-dependent sequences suitable for diffusion and keystream generation [4,5]. The second approach is the substitution–permutation network, which realizes Shannon’s principles of confusion and diffusion through iterated S-box and P-box operations [6,7,8,9].
These approaches are often integrated by using chaotic maps, such as the chaotic logistic map (CLM), to parameterize or key the components of the substitution–permutation network. This integration produces lightweight schemes with strong resistance to cryptanalysis [7,10]. To achieve higher security levels, researchers have explored modified logistic maps and higher-dimensional chaotic systems. These enhancements increase the key sensitivity and further strengthen the system’s unpredictability. At the same time, surveys continue to organize and synthesize this line of work [3].
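The sensitivity to initial conditions mentioned above can be illustrated with a minimal sketch of the logistic map. The control parameter `r = 3.99` and the two seeds are illustrative assumptions, not values taken from the proposed method.

```python
def logistic_sequence(x0, r=3.99, n=100):
    """Iterate the logistic map x_{k+1} = r * x_k * (1 - x_k) for n steps."""
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

# Two seeds that differ only in the tenth decimal place (assumed values).
a = logistic_sequence(0.4)
b = logistic_sequence(0.4 + 1e-10)

# The trajectories start out almost identical and later become unrelated.
early_gap = max(abs(p - q) for p, q in zip(a[:5], b[:5]))
late_gap = max(abs(p - q) for p, q in zip(a[50:], b[50:]))
print(f"largest gap in the first 5 steps: {early_gap:.2e}")
print(f"largest gap in the last 50 steps: {late_gap:.2e}")
```

In a keyed scheme the seed and control parameter would play the role of key material, which is why a tiny key change is expected to produce an unrelated sequence.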
Advances in data digitization mean that large amounts of data are being generated and transmitted. This makes the storage and transmission of sensitive data over open and insecure communication channels a complex challenge [11]. Steganography, the process of hiding data within cover media such as images, audio, and video files, offers an effective solution to this challenge. It conceals secret information, making its presence difficult for unauthorized individuals or attackers to detect [12].
Steganography schemes can be divided into three main domains: the spatial, transform, and adaptive domains. The spatial domain, being the simplest, involves modifying the pixel values of the cover image without considering the features and textures of the image. This results in a uniform data-hiding capacity across the whole cover image, regardless of how smooth or complex an area is [13]. The least significant bit (LSB) method is a spatial domain method that modifies the least significant bits of pixel values, making it a lightweight and efficient way to embed encryption keys or other data within an image without significantly altering its appearance [14].
Hiding information within the least significant bits (LSBs) of the cover image is a particularly easy and common practice in image steganography [15,16]. This method changes the least significant bit planes of the image pixels so that no obvious difference is introduced, apart from a tonal variation that is not discernible to the human eye. Usually, the process uses the R, G, and B channels of the image, each with a value range of 0–255, giving a depth of 8 × 3 bits per pixel and offering more space for data to be hidden. After the RGB channels of the cover image have been separated and the secret data bits have been embedded into the LSBs of any of the channels, the channels are merged to form the final steganography image, effectively hiding the data.
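As a minimal sketch of the LSB substitution just described, the following example embeds a short message into the least significant bits of a flat list of 8-bit channel values and recovers it. The function names and the sample pixel values are hypothetical; encryption and pixel shuffling are deliberately left out.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def bits_to_bytes(bits):
    """Pack a list of bits (MSB first, length a multiple of 8) into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def embed_lsb(pixels, bits):
    """Return a copy of `pixels` whose first len(bits) LSBs carry `bits`."""
    if len(bits) > len(pixels):
        raise ValueError("payload exceeds the 1-LSB capacity of the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it
    return stego

def extract_lsb(stego, n_bits):
    """Read back the first n_bits least significant bits."""
    return [p & 1 for p in stego[:n_bits]]

# Sixteen sample channel values (assumed) hold exactly two bytes at 1 bpp.
cover = [52, 55, 61, 66, 70, 61, 64, 73, 63, 59, 55, 90, 109, 85, 69, 72]
message = b"Hi"
bits = bytes_to_bits(message)
stego = embed_lsb(cover, bits)
recovered = bits_to_bytes(extract_lsb(stego, len(bits)))
print(recovered, max(abs(c - s) for c, s in zip(cover, stego)))
```

Because only the lowest bit of each value is touched, no channel value changes by more than one intensity level.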
On the other hand, the Pixel Value Differencing (PVD) method exploits the characteristics of the human visual system (HVS) by embedding information using the difference between consecutive pixels of the cover image. This allows the PVD method to embed data in the smooth and edge regions of the image with low and high embedding capacities, respectively. It is also able to keep the visual content intact, thereby improving secrecy.
Spatial domain techniques such as LSB substitution and PVD are widely used because they make it easy to embed a large amount of information within a cover image with little visual change. PVD methods in particular work better because they embed more information in regions with large pixel intensity variations, where changes are less noticeable, and less in smooth regions. Throughout the development of steganography methods, both LSB substitution and PVD have remained relevant thanks to their effectiveness, speed, and low perceptual impact on images, which helps keep the embedded information concealed.
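To make the capacity behavior of PVD concrete, the sketch below maps the difference between two neighboring pixels to a number of embeddable bits. The range table is one commonly cited choice from the PVD literature and is an assumption here; the text above does not specify a table.

```python
# Assumed range table: wider ranges (edge regions) carry more bits.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def pvd_capacity_bits(p1, p2):
    """Bits embeddable in a pixel pair, based on their absolute difference."""
    d = abs(p1 - p2)
    for lo, hi in RANGES:
        if lo <= d <= hi:
            width = hi - lo + 1           # always a power of two in this table
            return width.bit_length() - 1  # log2(width)
    raise ValueError("pixel values must lie in 0..255")

print(pvd_capacity_bits(100, 102))  # smooth pair
print(pvd_capacity_bits(100, 200))  # edge pair
```

Only the capacity decision is shown; the actual adjustment of the pixel pair to encode the bits is omitted.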
The transform domain of steganography involves the transformation of the cover image into the frequency domain using mathematical functions such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), or Discrete Fourier Transform (DFT). The embedding process manipulates the frequency components of the cover image instead of the pixel values themselves. The secret data is embedded into specific frequency coefficients, which are usually in the middle or high frequency ranges [17]. The transform domain usually achieves a high level of imperceptibility because modifications in this domain are not very noticeable to the human eye. Transform domain techniques are also highly resistant to steganalysis attacks, as they are robust against many image processing operations, including scaling, compression, and noise addition [18].
DCT-based steganography is one of the more common transform domain techniques and is frequently applied to images. It involves the embedding of data into quantized DCT coefficients, focusing on the frequency components that have much less impact on the visual quality of the image. The DWT is another transform-based technique in which an image is decomposed into multiple frequency sub-bands (low and high frequencies). The data embedding is performed in the high-frequency sub-bands, where changes are less likely to be detected by the human visual system [19,20].
The field of image encryption has undergone significant evolution over the years, with numerous algorithms being developed to enhance both security and efficiency. This literature review examines various encryption techniques, with a focus on those that incorporate chaotic maps, substitution–permutation networks (SPNs), and least-significant-bit (LSB) substitution. It also illustrates the evolution of image encryption and steganography techniques from traditional methods to advanced chaotic and combined approaches. These advancements not only enhance data security against a wide range of attacks but also address the need for maintaining visual integrity in encrypted images.
This paper presents an innovative hybrid approach that merges encryption and steganography through the implementation of an SPN, chaotic maps, and LSB substitution. We develop a fast, efficient, and secure steganographic method that first encrypts the data and then embeds the encrypted data within an image while maintaining the image’s imperceptibility and integrity. We evaluate the effectiveness of the proposed methodology through comprehensive experiments using the mean square error (MSE), peak signal-to-noise ratio (PSNR), and pixel difference histogram (PDH) analysis to quantify distortion, perceptual quality, and statistical regularity, respectively. The key contributions are as follows:
We develop a hybrid approach that integrates encryption and steganography by utilizing a combination of SPN, CLM, and LSB substitution techniques. The integration of these methods results in a solution characterized by enhanced efficiency, robust security, and significant resilience against statistical attacks.
In the proposed method, we incorporate a SHA-256 hash to ensure that the encryption algorithm responds to even minor changes in the plaintext. A small modification in the plaintext results in a different hash value, which affects all encryption and steganography processes.
The proposed algorithm allows the embedding of various data types, including text, images, and audio, by reshaping them into 2D matrices. This transformation is essential for leveraging a lightweight and efficient encryption algorithm through optimized matrix operations, thereby enhancing performance and scalability.
We also perform a detailed evaluation of the proposed hybrid data encryption and image steganography algorithm, comparing it with other methods in the literature. The results show significant improvements in execution time, MSE, and PSNR, highlighting the method’s superior performance and effectiveness.
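The plaintext sensitivity claimed for the hash-based contribution above can be illustrated with a short sketch that counts how many SHA-256 digest bits change when one character of the message changes. How the digest is fed into the encryption and embedding stages is not shown, and the second message is an assumed example.

```python
import hashlib

def sha256_bits(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 256-character bit string."""
    digest = hashlib.sha256(data).digest()
    return "".join(f"{byte:08b}" for byte in digest)

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the digest bit positions in which two messages differ."""
    return sum(x != y for x, y in zip(sha256_bits(a), sha256_bits(b)))

# The two messages differ only in their final character.
changed = differing_bits(b"Hello Steganography", b"Hello Steganographz")
print(f"{changed} of 256 digest bits differ")
```

For a well-behaved hash, roughly half of the digest bits are expected to flip, so any key material derived from the digest changes substantially as well.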
The remainder of the paper is organized as follows:
Section 2 presents the proposed hybrid encryption and steganography method.
Section 3 reports the experimental results.
Section 4 concludes the paper.
3. Experimental Results and Performance Analysis
This section presents various experiments conducted to evaluate the performance of the proposed algorithm. Several performance metrics are considered, including the visual quality, embedding capacity, mean square error (MSE), and PSNR. The performance is evaluated on a Windows 10 Pro system equipped with an Intel® Xeon® CPU E5-1620 v4 running at 3.50 GHz and featuring 32.0 GB of RAM. We use MATLAB R2023b to run the proposed algorithm, performing all tests 20 times.
A selection of cover images with dimensions of
,
, and
pixels are used for testing in both grayscale (8-bit) and color (24-bit) formats. These images were sourced from the standard USC Image Database [
25], as illustrated in Figure 5. Because the LSB substitution method can replace more than one bit per pixel, the maximum number of bits that can be embedded depends on the number of LSBs substituted per pixel (k-LSB). The properties of these images and their maximum 1-bit capacities are summarized in Table 1. The payload for 1-LSB is 1 bit per pixel (bpp), so the payload for k-LSB is k bpp.
Table 1 shows that the Baboon image has a 1-LSB capacity of 786,432 bits (98,304 bytes), which is less than 100 KB of data. When the Clock image is loaded as pixel data, it contains 524,288 bits. The Clock image can therefore be embedded inside the Baboon image with capacity to spare. Alternatively, the payload can be compressed and read as a binary file in uint8 format, which reduces its size and increases the effective embedding capacity.
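The capacity figures quoted above follow from simple arithmetic, sketched below. The image dimensions used (a 512 × 512 color cover and a 256 × 256 8-bit grayscale secret) are assumptions chosen because they reproduce the 786,432-bit and 524,288-bit values; the function names are hypothetical.

```python
def capacity_bits(height, width, channels=1, k=1):
    """Maximum payload in bits when k LSBs of every channel value are used."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8 for 8-bit channels")
    return height * width * channels * k

def image_payload_bits(height, width, bit_depth=8):
    """Size in bits of a secret image embedded as raw pixel data."""
    return height * width * bit_depth

cover_capacity = capacity_bits(512, 512, channels=3, k=1)  # assumed cover size
secret_size = image_payload_bits(256, 256)                 # assumed secret size
remaining = cover_capacity - secret_size
print(cover_capacity, secret_size, remaining)
```

Under these assumptions, about one third of the 1-LSB capacity remains free after the secret image has been embedded.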
3.1. Execution Time
The embedding process involves converting the secret data into a binary stream, modifying pixels using the LSB substitution method, and reshuffling the pixels using the CLM. The extraction process reverses these operations, retrieving the hidden data while maintaining the integrity of the cover image.
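The reversibility of the shuffling step can be sketched as follows. This is one common construction (ranking the outputs of a logistic map to obtain a permutation) and is an assumption, not necessarily the exact CLM shuffling used by the proposed algorithm; the seed `x0` and parameter `r` stand in for key material.

```python
def logistic_permutation(n, x0=0.4, r=3.99):
    """Derive a permutation of range(n) by ranking logistic-map outputs."""
    x = x0
    values = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    # Indices sorted by their chaotic value form the permutation.
    return sorted(range(n), key=values.__getitem__)

def shuffle(seq, perm):
    """Place seq[perm[i]] at position i."""
    return [seq[p] for p in perm]

def unshuffle(shuffled, perm):
    """Invert shuffle() using the same permutation."""
    out = [None] * len(perm)
    for dst, src in enumerate(perm):
        out[src] = shuffled[dst]
    return out

pixels = list(range(100, 116))  # sixteen sample values
perm = logistic_permutation(len(pixels))
mixed = shuffle(pixels, perm)
restored = unshuffle(mixed, perm)
print(mixed)
print(restored == pixels)
```

A receiver who knows the same seed and parameter regenerates the same permutation, which is what allows extraction to reverse the embedding steps.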
Table 2 presents the execution time for embedding the ASCII characters of "Hello Steganography" in different cover images. The reported execution times are averages over 20 runs. To evaluate the algorithm’s performance for larger data sizes, Table 3 reports the execution time when embedding a 10 KB randomly generated file containing 10,240 alphanumeric characters into the cover images.
The results indicate that embedding time varies depending on the image content and complexity. Higher embedding times are observed for more detailed images due to pixel variations affecting LSB substitution and chaotic shuffling. The extraction process, however, is consistently faster since it only involves reversing the pixel modifications and retrieving the hidden bits. The average total execution time across all tested images is 0.0113 s. Also, the average total execution time when embedding 10 KB of data is 0.0162 s. These results demonstrate the efficiency of the proposed algorithm.
From Table 2 and Table 3, it can be observed that the execution time for embedding is longer than for extraction. This is expected because the embedding process involves more data processing steps than the extraction process. The execution times in Table 3 are significantly larger than those in Table 2, primarily due to the difference in data size. A comparison of the execution time with other methods from the literature is shown in Table 4. This runtime is the average runtime of embedding 10 KB across all of the test images.
Table 4 shows that the proposed algorithm is faster than the other methods from the literature. We attribute this to the absence of long for loops in the embedding process: all computations are performed using bit manipulation and vectorized operations. In addition, larger data is reshaped into two dimensions during encryption to take advantage of the fast, efficient, and secure image encryption method employed.
Table 5 presents the execution time when embedding the Clock and Tree images as secret data into the different cover images. This experiment requires embedding more bits than there are pixels in a cover image. Thus, k-LSB is used: when no more data can be embedded at the current bit position, the next higher bit position is used. However, since the steganography images degrade faster as higher bits are modified, the limit is set to 4-LSB.
From Table 5, it can be observed that some values are unavailable. This is because the available embedding capacity of the cover image is insufficient. Since a limit of 4-LSB is set for this experiment, any secret data that would require more than 4 LSBs per pixel value of the cover image is treated as an invalid operation and skipped.
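The rule that decides whether an entry is available can be sketched as a small helper that computes the smallest k needed for a payload and rejects it when k exceeds the 4-LSB limit. The image sizes in the example are assumptions (a 256 × 256 8-bit grayscale secret), and the function name is hypothetical.

```python
import math

MAX_K = 4  # limit stated for this experiment

def required_k(payload_bits, height, width, channels=1):
    """Smallest k such that k-LSB embedding fits the payload, or None."""
    samples = height * width * channels
    k = max(1, math.ceil(payload_bits / samples))
    return k if k <= MAX_K else None

secret_bits = 256 * 256 * 8  # assumed 8-bit grayscale secret image

print(required_k(secret_bits, 512, 512, channels=3))  # large color cover
print(required_k(secret_bits, 256, 256, channels=3))  # small color cover
print(required_k(secret_bits, 256, 256, channels=1))  # same-size gray cover
```

In the last case the payload would need all 8 bits of every pixel, so the helper returns `None`, mirroring an unavailable table entry.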
Figure 6 presents the resulting steganography images after the Clock and Tree images have been embedded.
From Figure 6, it can be seen that there is very little distortion in the steganographic images. This shows that the changes introduced by the proposed algorithm are small enough to escape human visual perception. However, the Clock cover image is absent from the result because it cannot contain the Clock secret image. This is expected, since the payload is eight times the number of pixels in the cover, whereas the predefined maximum is four times (4-LSB).
Figure 7 shows the result of embedding the secret Tree image within the various cover images.
Figure 7 shows that only slight distortion can be noticed after embedding the Tree image. This suggests that embedding the Tree image does not significantly impact human visual perception of the four images. No result is shown for the Tree, Boat, and Cameraman cover images, as the number of bits to be embedded exceeds the 4-LSB maximum set previously.
3.2. Mean Square Error (MSE) Analysis
The MSE is a metric widely used in image processing, cryptography, and steganography to quantify the difference between an original and a modified image. It computes the average squared difference between corresponding pixel values, effectively penalizing larger discrepancies more than smaller ones. This characteristic makes the MSE advantageous in some applications while limiting in others. A key benefit of the MSE is its computational efficiency, allowing for rapid image quality assessment.
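The MSE between the cover and its corresponding steganographic images is given by the following:
$$\mathrm{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( C(i,j) - S(i,j) \right)^2$$
where $H$ and $W$ denote the image height and width, respectively, while $C(i,j)$ and $S(i,j)$ represent the pixel values of the cover and steganographic images at position $(i,j)$.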
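A direct implementation of this definition, the average squared difference between corresponding pixel values, might look as follows. The sample arrays are made up for illustration.

```python
def mse(cover, stego):
    """Mean square error between two equally sized 2D lists of pixel values."""
    height, width = len(cover), len(cover[0])
    if len(stego) != height or any(len(row) != width for row in stego):
        raise ValueError("images must have identical dimensions")
    total = sum(
        (c - s) ** 2
        for cover_row, stego_row in zip(cover, stego)
        for c, s in zip(cover_row, stego_row)
    )
    return total / (height * width)

cover = [[10, 20], [30, 40]]
stego = [[10, 21], [30, 39]]  # two of four pixels changed by one level
print(mse(cover, stego))      # → 0.5
```

With 1-LSB embedding of random bits, about half of the touched pixels change by one level, which is why the MSE stays well below 1 for small payloads.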
Table 6 presents the computed MSE values for various cover images when embedding 19 bytes (19 B) and 10 KB of data. Additionally, Table 7 compares the MSE values of the proposed method against those reported by Emam et al. [16], highlighting the method’s effectiveness.
The results indicate that the proposed method consistently achieves lower MSE values than the method presented by Emam et al. [16], demonstrating its superior ability to preserve image quality while embedding secret data. This improvement suggests that the proposed technique effectively maintains imperceptibility, a critical requirement for secure steganographic applications.
3.3. Peak Signal-to-Noise Ratio (PSNR) Analysis
The PSNR is a metric widely used for evaluating image quality by measuring the ratio between the maximum possible signal power and the noise introduced by image modifications [28]. The PSNR is derived from the MSE between the original and modified images, where higher PSNR values indicate less perceptible distortion. The PSNR is computed as follows:
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right)$$
where $\mathrm{MAX}^2$ represents the squared maximum possible pixel value, typically $255^2$ for 8-bit images.
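As a small sketch, the PSNR in decibels can be computed from an MSE value as ten times the base-10 logarithm of the squared peak value divided by the MSE. Returning infinity for identical images is a convention assumed here.

```python
import math

def psnr(mse_value, max_value=255):
    """PSNR in dB for a given MSE and peak pixel value."""
    if mse_value < 0:
        raise ValueError("MSE cannot be negative")
    if mse_value == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse_value)

# An MSE of 0.5 corresponds to half of the pixels changing by one level.
print(round(psnr(0.5), 2))
print(psnr(0.0))
```

Because the MSE appears in the denominator, halving the distortion raises the PSNR by about 3 dB.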
Table 8 presents the PSNR results for various cover images when embedding 19 B and 10 KB of data. The results indicate that embedding a smaller payload (19 B) results in significantly higher PSNR values, implying minimal perceptible changes to the cover image.
The observed trends indicate that lower embedding rates result in better image quality. For example, the PSNR for the Baboon image is 86.875 dB with a 19 B payload but drops to 60.979 dB when embedding 10 KB. Similar patterns are evident in all the images tested.
Table 9 compares the PSNR values of the proposed method with those reported in previous studies. The results indicate that the proposed method achieves a higher PSNR than previous techniques, reflecting its ability to embed data while maintaining image quality. Table 10 and Table 11 further validate this result by comparing the performance with Wu and Tsai [29] and Hameed et al. [26].
The findings indicate that the proposed method achieves notable improvements in PSNR, confirming its effectiveness in preserving image quality while embedding data. Even when compared to existing methods, the approach maintains higher imperceptibility and lower distortion, making it a promising choice for secure image steganography.
3.4. Pixel Difference Histogram Security Analysis
The pixel difference histogram is a steganalysis method that examines the distribution of pixel intensity differences to detect hidden information in images. This method operates by computing the differences between adjacent pixels, which, in natural images, are expected to follow specific statistical patterns [26]. When secret data is embedded using methods such as LSB steganography, these intensity differences often appear more random and deviate from their typical distribution. Plotting a histogram of these pixel differences allows forensic analysts to identify anomalies that can point to the existence of concealed data [31].
The main goal of the pixel difference histogram analysis is to look at how regular and smooth the pixel variations are. For example, pixel differences in natural images typically follow a predictable pattern, with smaller variations between neighboring pixels occurring more frequently. However, when steganography is used, the embedded secret data changes this pattern and causes odd spikes or histogram shifts. By showing how much the pixel difference values have changed, the analysis can also be used to distinguish between different steganographic methods. This technique is helpful since it is reasonably easy to use and efficient at identifying even minute alterations in the pixel-level structure of an image [32].
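The histogram described above can be computed in a few lines. The sketch below uses differences between horizontally adjacent pixels, which is an assumption about the adjacency used, and compares two histograms with a simple sum of absolute count differences (a hypothetical helper, not a metric reported in this paper).

```python
from collections import Counter

def pixel_difference_histogram(image):
    """Count the differences between horizontally adjacent pixels."""
    hist = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            hist[left - right] += 1
    return hist

def histogram_distance(h1, h2):
    """Sum of absolute differences between two histograms' bin counts."""
    keys = set(h1) | set(h2)
    return sum(abs(h1[k] - h2[k]) for k in keys)

cover = [[10, 12, 12], [5, 5, 9]]
stego = [[11, 12, 13], [5, 4, 9]]  # a few LSBs flipped
cover_hist = pixel_difference_histogram(cover)
stego_hist = pixel_difference_histogram(stego)
print(dict(cover_hist))
print(histogram_distance(cover_hist, stego_hist))
```

Plotting the two histograms over the same difference axis gives the kind of cover versus steganography curves that the analysis compares.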
The Baboon and Peppers images are used as test images for the pixel difference histogram (PDH). Three different tests are performed based on the data capacity to be embedded. These are 10 KB, half the complete image size, and the full image size of the cover. It should be noted that ‘full size’ refers to the modification of all pixels in an image. The PDH curves of the cover image (the plaintext image) and the steganography image (the image with embedded data) are analyzed to detect any visible histogram deviations, which would suggest the influence of data embedding on the image.
As illustrated in Figure 8, the results of the embedding test with 10 KB show that there is essentially no discernible difference between the cover image and the steganography image curves. This similarity suggests that the pixel difference histogram is not substantially altered by embedding a small amount of data (10 KB), thereby enabling the image to maintain its original structural and visual characteristics. This result shows that small data sizes are well suited for secure and imperceptible steganographic applications, as they preserve the integrity of the pixel relationships within the cover image.
As can be observed in Figure 9, the curves of the cover and steganography images exhibit a slight divergence in the half-size embedding (embedding into half of the pixels). Although this discrepancy suggests a minor influence on the pixel relationships within the image, the variation is negligible and does not significantly disrupt the histogram pattern. This subtle deviation implies that the PDH curve maintains a high degree of similarity to the original even with a moderate embedding size, indicating the method’s robustness for moderate data payloads.
The differences between the cover and steganography curves become more apparent when data is embedded into the full image size, as illustrated in Figure 10. This increase in deviation indicates that the pixel difference histogram is more significantly influenced by higher embedding capacities, as expected. However, the steganography curve still closely resembles the original cover curve despite the greater variation, to the extent that the discrepancies could be regarded as minor. This implies that the visual quality of the image remains acceptable in most applications even though full-size embedding introduces detectable modifications. The impact is relatively contained, suggesting that the proposed method offers resistance to PDH-based steganalysis.
3.5. Effect of Changing the b-LSB Values on the PSNR
Altering the value of b-LSB, that is, embedding multiple bits per pixel with the LSB method of image steganography, also impacts the PSNR value. The parameter b defines the number of bits modified in each pixel of the cover image. When b = 1, only the least significant bit is adjusted, resulting in minimal perceptual distortion. Conversely, as b increases, the changes become more pronounced, potentially degrading image quality.
Figure 11 illustrates the degradation in PSNR as the number of modified bits increases, specifically for the Baboon image.
As shown in Figure 11, the PSNR decreases approximately linearly as b increases. Meanwhile, the MSE also rises, indicating a greater discrepancy between the original and steganographic images. This highlights a trade-off between higher embedding capacity and visual imperceptibility.
Notably, when b exceeds 3, the MSE increases sharply, indicating that higher bit plane embedding introduces substantial distortion. We stress that this behavior is well established in the steganography literature and is not claimed as a novel finding in this work. Here, we report it only as an empirical confirmation under the proposed hybrid pipeline (SPN + chaotic permutation + LSB embedding) and to justify the choice of a practical operating range (capped at ) that balances data hiding capacity and steganographic image quality in our evaluation.
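The trend reported in this subsection can be checked against a simple model. The sketch below assumes that every pixel is modified and that both the original low b bits and the embedded bits are uniformly random and independent; under those assumptions the expected MSE is the average squared difference between two random b-bit values. These are modeling assumptions, not measurements from this paper.

```python
import math

def expected_mse(b):
    """Expected MSE when the b lowest bits of every pixel are replaced."""
    levels = range(2 ** b)
    total = sum((old - new) ** 2 for old in levels for new in levels)
    return total / 4 ** b  # average over all (old, new) combinations

def expected_psnr(b, max_value=255):
    """PSNR in dB corresponding to expected_mse(b)."""
    return 10.0 * math.log10(max_value ** 2 / expected_mse(b))

for b in range(1, 5):
    print(b, expected_mse(b), round(expected_psnr(b), 2))
```

In this model the MSE roughly quadruples with each additional bit, which corresponds to a PSNR drop of about 6 dB per bit and is consistent with a near-linear PSNR curve alongside a sharply rising MSE.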
4. Conclusions and Future Work
This paper presents a novel approach to secure data transmission by integrating encryption and steganography. The proposed method encrypts the message using the CLM and SPN encryption algorithms, employing row and column permutations to improve security. The encrypted data is then embedded into a cover image using LSB substitution steganography, ensuring imperceptibility while maintaining robust protection. The experimental results demonstrate that the proposed technique effectively balances image quality, embedding capacity, and security. The PSNR values indicate minimal perceptual distortion, while the efficient execution time further underscores the method’s practicality. Pixel difference analysis also supports the method’s effectiveness, as histogram variations remain nearly imperceptible when embedding data up to half the size of the cover image. We also note that a large payload can be compressed and read as a binary file in uint8 format, which reduces its size and increases the effective embedding capacity.
In future research, adaptive embedding techniques can be leveraged by integrating deep learning models or transformers to identify optimal embedding regions. This dynamic approach can further enhance imperceptibility and security. Additionally, exploring multidimensional chaotic systems could further improve unpredictability and robustness. Another possible direction for further research is to adapt the proposed method towards creating a highly efficient hybrid video encryption and steganography algorithm which utilizes the same lightweight chaotic map for random pixel selection.
The use of advanced steganalysis tools is also necessary to verify the imperceptibility of the approach and its level of resistance to steganalysis attacks. Such tools include XuNet (structural design of convolutional neural networks) [33,34], and more are being developed. There is also a need for a unified benchmark that supports fair and transparent evaluation of different steganography methods.