Abstract
In this paper, we first study the security enhancement of three steganographic methods by using a proposed chaotic system. The first method, namely the Enhanced Edge Adaptive Image Steganography Based on LSB Matching Revisited (EEALSBMR), operates in the spatial domain. The two other methods, the Enhanced Discrete Cosine Transform (EDCT) and the Enhanced Discrete Wavelet Transform (EDWT), operate in the frequency domain. The chaotic system is extremely robust and consists of a strong chaotic generator and a 2-D Cat map. Its main role is to secure the content of a message in case the message is detected. Secondly, three blind steganalysis methods, based on multi-resolution wavelet decomposition, are used to detect whether an embedded message is hidden in the tested image (stego image) or not (cover image). The steganalysis approach is based on the hypothesis that message-embedding schemes leave statistical evidence or structure in images that can be exploited for detection. The simulation results show that the Support Vector Machine (SVM) classifier and the Fisher Linear Discriminant (FLD) cannot distinguish between cover and stego images if the message size is smaller than 20% for the EEALSBMR steganographic method and smaller than 15% for the EDCT steganographic method. However, SVM and FLD can distinguish between cover and stego images with reasonable accuracy for the EDWT steganographic method, irrespective of the message size.
1. Introduction
Steganography is an increasingly important security domain; it aims to hide a message (secret information) in a digital cover medium without causing perceptual degradation (in this study, we use images as cover media). Many steganographic methods have been proposed in the spatial and frequency domains. In the spatial domain, pixels are directly used to hide secret messages; these techniques are normally easy to implement and have a high capacity. However, they are generally not robust against statistical attacks [1,2]. In the transform domain, coefficients of frequency transforms, such as the DCT (Discrete Cosine Transform), FFT (Fast Fourier Transform), and DWT (Discrete Wavelet Transform), are used to hide secret data. Generally, these techniques are more complex, but they are more robust against steganalysis as well as against noise and image-processing operations.
The main steganographic methods in the spatial domain [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] are LSB-based (Least Significant Bit). Recently, entropy has also been extensively used to support data-hiding algorithms [18,19,20]. The LSB methods replace the least significant bit of pixels with a bit of the secret data. Among these methods, the EALSBMR method [3] is an edge-adaptive scheme with respect to the message size and embeds data according to the difference between two consecutive pixels in the cover image. To the best of our knowledge, this method offers the best trade-off (good PSNR, high embedding capacity, and, above all, adaptivity), but it suffers from low security in terms of message detection. For this reason, we have enhanced its security.
Frequency-domain steganography, as in the watermarking domain [21,22,23,24,25,26,27,28,29], is widely based on the DCT and DWT transforms. The DCT transforms an image representation into a frequency representation by grouping pixels into 8 × 8 pixel blocks and transforming each block, using the DCT, into 64 DCT coefficients. A message is then embedded into the DCT coefficients. The forward Discrete Wavelet Transform is, in general, suitable for identifying areas of the cover image where a secret message can be effectively embedded, thanks to its excellent space-frequency localization properties. In particular, these properties allow the masking effect of the human visual system to be exploited: if a DWT coefficient is modified, only the region that corresponds to that coefficient is modified. The Haar wavelet is the simplest wavelet with which the DWT can be computed.
However, the aforementioned steganographic methods are not secure in terms of message detection. To protect the content of messages, chaos can be used. Indeed, chaotic sequences play an important role in information hiding and in security domains such as cryptography, steganography, and watermarking, because of properties such as sensitivity to the initial conditions and parameters of the system, ergodicity, uniformity, and pseudo-randomness. Steganography generally leaves traces that can be detected in stego images. This can allow an adversary, using steganalysis techniques, to reveal a hidden secret message. There are two types of opponents: passive and active. A passive adversary only examines the communication to detect whether it contains hidden messages; in this case, the content of the communication is not modified by the adversary. An active adversary can intentionally cause disruption, distortion, or destruction of the communication, even in the absence of evidence of secret communication. The main steganographic methods have been designed for the case of a passive adversary. In general, there are two kinds of steganalysis: specific and universal. Specific steganalysis is designed to attack a specific steganography algorithm. It can generally produce more accurate results, but it fails to produce satisfactory results if the secret messages are embedded with a modified version of the algorithm. Universal steganalysis, on the other hand, can be regarded as a generic technique that detects various types of steganography. Moreover, it can be used to detect new steganographic techniques for which no specific steganalysis yet exists. In other words, universal steganalysis is an irreplaceable tool for detection when the embedding algorithm is unknown or secret.
In this paper, we first integrate an efficient chaotic system into the three steganographic methods mentioned above to make them more secure. The chaotic system pseudo-chaotically chooses the pixel positions in the cover image where the bits of the secret message will be embedded. Thus, the inserted bits of the secret message become secure against message-recovery attacks because their positions are unknown.
Second, we study and apply three universal steganalysis methods to the aforementioned chaos-based steganographic methods. The first steganalysis method, developed by Farid [30], uses higher-order statistics of high-frequency wavelet sub-bands and their prediction errors to form the feature vectors. In the second steganalysis method, as formulated by Shi et al. [31], the statistical moments of the characteristic functions of the prediction-error image, the test image, and their wavelet sub-bands are selected as the feature vectors. The third steganalysis method, introduced by Wang et al. [32], uses the features that are extracted from both the empirical probability density function (PDF) moments and the normalized absolute characteristic function (CF). For the three steganalysis algorithms, we applied FLD analysis and the SVM method with the RBF kernel as classifiers between cover images and stego images.
The paper is organized as follows: In Section 2, we describe the proposed chaotic system. In Section 3, we present the three enhanced steganographic algorithms. In Section 4, we illustrate the experimental results and analyze the enhanced algorithms. In Section 5, we develop, in detail, the steganalysis techniques for the previous algorithms. In Section 6, we report the results of the steganalysis, and in the last section, we conclude our work.
2. Description of the Proposed Chaotic System
This system is made of a perturbed chaotic generator and a 2-D Cat map. The chaotic generator supplies the dynamic keys for the permutation process and provides the positions of the pixels to be used for embedding (see Figure 1). The chaotic system allows a message to be inserted in both a secret and uniform manner [33,34,35,36,37,38,39,40].
Figure 1.
Proposed chaotic generator.
The generator of discrete chaotic sequences exhibits orbits with very large lengths. It is based on two connected non-linear digital IIR filters (cells). The discrete PWLCM and SKEW TENT maps (non-linear functions) are used. A linear feedback shift register (m-LFSR) is then used to disturb each cell (Figure 2). The disturbing technique is associated with the cascading technique, which allows controlling and increasing the length of the orbits that are produced. The minimum orbit length of the generator output is calculated using Equation (1):
Figure 2.
Chaotic generator.
In Equation (1), lcm denotes the least common multiple; 23 and 21 are the degrees of the primitive polynomials of the two LFSRs; and the two remaining terms are the lengths of the orbits produced by the outputs of the two cells, respectively, without disturbance. The equations of the chaotic generator are formulated as follows:
The two previously mentioned non-linear functions, the PWLCM map and the Skew tent map, are defined according to the following relations:
The PWLCM map and the Skew tent map each use one control parameter, and N = 32 is the word length (in bits) used for the simulations. The size of the secret key K, formed by all the initial conditions and parameters of the chaotic generator, is (6 × 32 + 5 × 32 + 31 + 23 + 21) = 427 bits, which is large enough to resist a brute-force attack.
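For illustration, the sketch below implements the standard real-valued forms of the PWLCM and Skew tent maps; the generator described above uses N-bit integer (finite-precision) versions of these maps inside two coupled, LFSR-perturbed filter cells, which are not reproduced here, and the parameter values chosen below are placeholders.

```python
import numpy as np

def pwlcm(x, p):
    """Piecewise linear chaotic map on [0, 1); control parameter p in (0, 0.5)."""
    x = x if x <= 0.5 else 1.0 - x     # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)

def skew_tent(x, p):
    """Skew tent map on [0, 1]; control parameter p in (0, 1)."""
    return x / p if x <= p else (1.0 - x) / (1.0 - p)

def orbit(chaotic_map, x0, p, n):
    """Iterate a 1-D map n times from x0 and return the generated orbit."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = chaotic_map(x, p)
        xs[i] = x
    return xs

print(orbit(pwlcm, 0.123456, 0.31, 5))
print(orbit(skew_tent, 0.123456, 0.63, 5))
```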
Description of the Cat Map Used
The permutation process is based on the modified Cat map and is calculated in a very efficient manner using the equation below [37]:
In the above equation, and are the original and permuted square matrices of size , from which we calculate the matrix as follows:
The dynamic key is structured as follows:
In the above equations, are the parameters of the Cat map and r is the number of rounds.
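To illustrate how a 2-D Cat map permutes pixel positions, the sketch below applies the classical generalized Arnold Cat map for r rounds; the modified Cat map of [37] used in this paper adds further key-dependent parameters supplied by the dynamic key, which are not reproduced here, so the values of u, v, and rounds below are placeholders.

```python
import numpy as np

def cat_map_permute(mat, u, v, rounds):
    """Permute a square matrix with the generalized 2-D Cat map:
    i' = (i + u*j) mod N,  j' = (v*i + (u*v + 1)*j) mod N (a bijection on the positions)."""
    n = mat.shape[0]
    assert mat.shape[0] == mat.shape[1], "a square matrix is required"
    out = mat.copy()
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    for _ in range(rounds):
        i2 = (i + u * j) % n
        j2 = (v * i + (u * v + 1) * j) % n
        permuted = np.empty_like(out)
        permuted[i2, j2] = out[i, j]      # move each element to its new position
        out = permuted
    return out

block = np.arange(64).reshape(8, 8)
print(cat_map_permute(block, u=3, v=5, rounds=2))
```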
3. Enhanced Steganographic Algorithms
In this section, we describe three enhanced steganographic algorithms by using an efficient chaotic system.
3.1. Enhanced EALSBMR (EEALSBMR)
Below, we present the insertion procedure and the extraction procedure of the proposed enhancement of the EALSBMR method (EEALSBMR) [41].
3.1.1. Insertion Procedure
The flow diagram of the embedding scheme can be found in Figure 3.
Figure 3.
EEALSBMR insertion procedure.
The detailed embedding steps for this algorithm have been explained as follows:
- Step 1:
- Capacity estimation
- To estimate the insertion capacity, we arrange the cover image into a 1-D vector V and divide its content into non-overlapping embedding units (blocks) of two consecutive pixels. We then calculate the absolute difference between the two pixels of each block and increment the corresponding element of a 31-element difference vector; accumulating these counts from the largest difference downwards gives, for each threshold t, |EU(t)|, the number of blocks belonging to EU(t), the set of pixel pairs whose absolute difference is greater than or equal to t, as shown below:
- For a given secret message M of size |M| bits, the threshold T used in the embedding process is determined by the following pseudo-code (Algorithm 1):
Algorithm 1 Pseudo-code determining the value of the threshold T
- 1: procedure
- 2: sum = 0;
- 3: for t = 31:-1:1 do
- 4: sum = sum + diff(t);  (diff(t) is the t-th element of the difference vector computed above)
- 5: if (2 × sum ≥ |M|) then
- 6: T = t;
- 7: break;
- 8: end if;
- 9: end for;
- 10: end procedure
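A possible Python reading of the capacity-estimation step and of Algorithm 1 is sketched below. It assumes that the 31-element difference vector is a histogram of the block differences, with differences of 31 or more counted in the last bin, so that the running sum computed by the loop equals |EU(t)|; the variable names are illustrative only.

```python
import numpy as np

def estimate_threshold(cover, msg_len_bits):
    """Capacity estimation of EEALSBMR: return the threshold T for a message of msg_len_bits bits."""
    v = cover.astype(np.int32).ravel()
    pairs = v[: len(v) // 2 * 2].reshape(-1, 2)      # non-overlapping units of two consecutive pixels
    diff = np.abs(pairs[:, 0] - pairs[:, 1])
    diff = np.minimum(diff, 31)                      # differences >= 31 fall into the last bin
    counts = np.bincount(diff, minlength=32)         # counts[t] = number of units with difference t
    running_sum = 0
    for t in range(31, 0, -1):                       # Algorithm 1: scan thresholds from 31 down to 1
        running_sum += counts[t]                     # running_sum = |EU(t)| = units with difference >= t
        if 2 * running_sum >= msg_len_bits:          # each unit can carry two message bits
            return t
    return 1                                         # fallback: smallest threshold

cover = np.random.default_rng(0).integers(0, 256, size=(512, 512), dtype=np.uint8)
print(estimate_threshold(cover, msg_len_bits=13120))
```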
- Step 2:
- Embedding process
- The embedding process is achieved as follows: we divide the cover image into two sub-images; one includes the odd columns, and the other includes the even columns.
- Following this, the chaotic system chooses a pixel position from the odd sub-image; the second pixel of the corresponding unit is the pixel at the same position in the even sub-image. If the pixel unit satisfies Equation (8), then a 2-bit message can be hidden (one bit per pixel); otherwise, the chaotic system chooses another position.
- For each unit , we perform data-hiding based on the following four cases [42]:
- Case 1:
- if and
- Case 2:
- if and
- Case 3:
- if and
- Case 4:
- if and
In the above equations, the two bits being embedded are two consecutive secret bits of the message; r is a random value, and the primed pair denotes the pixel pair after data-hiding. The function f is defined as follows: - Readjustment if necessary: after hiding, a new pixel value may fall outside the range [0, 255], or the new difference value may become smaller than the threshold T. In these cases, the two pixel values need to be readjusted, and the readjusted values are calculated as in [3], using two arbitrary adjustment numbers chosen so that the readjusted pair stays within range and keeps a difference of at least T. The sequence continues in this manner for each new block position.
- Finally, we embed the parameter T into the stego image, for example, in its first five or last five pixels.
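The four cases listed above follow the LSB Matching Revisited rules of [42]. A minimal sketch of the corresponding per-unit embedding and extraction is given below, using the standard LSBMR binary function f(a, b) = LSB(⌊a/2⌋ + b); the chaotic position selection and the T-dependent readjustment step are omitted, and the ±1 modification in Case 2 is one common choice.

```python
import random

def lsb(x):
    return x & 1

def f(a, b):
    """Binary function of LSBMR [42]: LSB(floor(a/2) + b)."""
    return ((a >> 1) + b) & 1

def embed_unit(x1, x2, m1, m2):
    """Embed the two bits (m1, m2) into the pixel pair (x1, x2) following the four LSBMR cases."""
    if lsb(x1) == m1:
        if f(x1, x2) != m2:
            x2 += random.choice((-1, 1))   # Case 2: modify the second pixel by +/-1
        # Case 1: nothing to modify
    elif f(x1 - 1, x2) == m2:
        x1 -= 1                            # Case 3
    else:
        x1 += 1                            # Case 4
    return x1, x2

def extract_unit(y1, y2):
    """Recover the two hidden bits from a stego pixel pair."""
    return lsb(y1), f(y1, y2)

y1, y2 = embed_unit(141, 129, 1, 0)
assert extract_unit(y1, y2) == (1, 0)
```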
3.1.2. Extraction Procedure
- Extract the parameter T from the stego image.
- Divide the stego image into two sub-images; one includes the odd columns, and the other includes the even columns.
- Generate the pseudo-chaotic positions (using the same secret key K), as done in the insertion procedure, to obtain the same order of pixel-unit positions in the odd sub-image. The second pixel of each unit is the pixel at the same position in the even sub-image.
- Verify whether the unit satisfies the embedding condition; if so, extract the two secret bits of M as follows. Otherwise, the chaotic system chooses another pseudo-chaotic position. The sequence continues in this manner for each unit position until the whole message has been extracted.
- Example of insertion: the cover image is the “Peppers” image shown in Figure 4.
Figure 4. “Peppers” as cover image.
The embedded message is the 40 × 40 pixel image shown in Figure 5:
Figure 5. “Bike” as the embedded message.
The corresponding sequence of message bits is given as follows. The length of the binary message is 13,120 bits. Capacity estimation produces the threshold T. Suppose that the pseudo-chaotic positions of the block used to embed the two message bits are (354, 375) and (354, 376), which correspond to the gray values 141 and 129 (see Figure 6).
Figure 6. Pseudo-chaotic block selection and its corresponding gray values.
Hiding the message bits: we are in Case 2; therefore, the new pixel values are as follows. The difference between the new pixel values is then computed, and the new pixel values need to be readjusted.
- Extraction of the message bits for the previous insertion example: the extraction is performed using the following equation:
3.2. Enhanced DCT Steganographic Method (EDCT)
The DCT transforms a signal or image from the spatial domain into the frequency domain [43,44]. A DCT expresses a sequence of finitely many data points in terms of a sum of cosine functions, oscillating at different frequencies. The 2D DCT is calculated as follows:
where:
The block diagram of the proposed enhanced steganographic-based DCT transform has been shown in Figure 7.
Figure 7.
Diagram of the enhanced steganographic-based DCT transform.
3.2.1. Insertion Procedure
The embedding process consists of the following steps:
- Read the cover image and the secret message.
- Convert the secret message into a 1-D binary vector.
- Divide the cover image into 8 × 8 blocks. Then apply the 2D DCT transformation to each block (from left to right, top to bottom).
- Use the same chaotic system to generate a pseudo-chaotic position of the DCT coefficient to be modified.
- Replace the LSB of each pseudo-chaotically selected DCT coefficient with one bit of the secret message (a sketch of this step is given after this list).
- Apply the 2D Inverse DCT transform to produce the stego image.
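A minimal sketch of this block-DCT LSB embedding is given below. The fixed coefficient position used here is only a placeholder for the pseudo-chaotic selection, the coefficient is rounded to an integer before its LSB is replaced, and the exact coefficient-selection rule of the EDCT method is not reproduced.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_dct_lsb(cover, bits, coeff_pos=(4, 3)):
    """Hide one bit per 8x8 block in the LSB of one rounded DCT coefficient (sketch)."""
    img = cover.astype(np.float64)
    stego = img.copy()
    h, w = img.shape
    k = 0
    for r in range(0, h - 7, 8):
        for c in range(0, w - 7, 8):
            if k == len(bits):
                return np.clip(np.round(stego), 0, 255).astype(np.uint8)
            block = dctn(img[r:r + 8, c:c + 8], norm="ortho")
            u, v = coeff_pos                             # placeholder for the pseudo-chaotic position
            coef = int(round(block[u, v]))
            block[u, v] = (coef & ~1) | int(bits[k])     # replace the LSB of the selected coefficient
            stego[r:r + 8, c:c + 8] = idctn(block, norm="ortho")
            k += 1
    return np.clip(np.round(stego), 0, 255).astype(np.uint8)

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
stego = embed_dct_lsb(cover, rng.integers(0, 2, size=32))
```

Note that rounding the stego image back to 8-bit integers can perturb the embedded LSBs, which is why practical schemes usually embed in quantized (JPEG-style) integer coefficients or verify the bits after reconstruction.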
3.2.2. Extraction Procedure
The extraction procedure consists of the following steps:
- Read the stego image.
- Divide the stego image into 8 × 8 blocks and then apply the 2D DCT to each block.
- Use the same chaotic system to generate the pseudo-chaotic coefficient positions.
- Extract the LSB of each pseudo-located coefficient.
- Construct the secret image.
3.3. Enhanced DWT Steganographic Method (EDWT)
Embedding the secret image in the low-frequency sub-band (LL) is generally more robust than embedding in the other sub-bands, but it significantly decreases the visual quality of the image because, normally, most of the image energy is concentrated in this sub-band. In contrast, the human eye is generally not sensitive to changes in the edges and textures carried by the high-frequency sub-band (HH); this allows secret information to be embedded without being perceived by the human eye. However, the HH sub-band is not robust against active attacks (filtering, compression, etc.). The compromise adopted by many DWT-based algorithms to achieve acceptable imperceptibility and robustness is to embed the secret image in the middle-frequency sub-bands (LH or HL). In the block diagram of the proposed steganographic EDWT method shown in Figure 8, we embed the secret image in one sub-band of the cover image (the size of the secret message must be at most equal to the size of that sub-band).
Figure 8.
Diagram of the EDWT algorithm.
3.3.1. Insertion Procedure
The embedding process consists of the following steps:
- Read the cover image and the secret image.
- Transform the cover image into one level of decomposition using Haar Wavelet.
- Permute the secret image in a pseudo-chaotic manner.
- Fuse the DWT coefficients of the cover image and the permuted secret image as follows [45] (see the sketch after this list): in the above equation, X′ is the modified DWT coefficient and X is the original DWT coefficient; the embedding strength factors are chosen such that the resulting stego image has a large PSNR. In our experiments, we tested several values of the embedding strength, and the best value was found to be approximately 0.01.
- Apply Inverse Discrete Wavelet Transform (IDWT) to produce the stego image in the spatial domain.
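The sketch below illustrates the fusion step with PyWavelets, under the assumption (one plausible reading of the fusion equation above) that the chosen detail sub-band is replaced by the sum of the cover coefficients and the permuted secret image weighted by the strength factor; the use of the diagonal detail sub-band and this exact fusion rule are assumptions, and beta = 0.01 is the strength value mentioned above.

```python
import numpy as np
import pywt

def edwt_embed(cover, secret_permuted, beta=0.01):
    """One-level Haar DWT of the cover, additive fusion of the permuted secret into a detail sub-band."""
    cA, (cH, cV, cD) = pywt.dwt2(cover.astype(np.float64), "haar")
    assert secret_permuted.shape == cD.shape, "the secret must fit the chosen sub-band"
    cD_stego = cD + beta * secret_permuted        # assumed fusion rule: X' = X + beta * S
    return pywt.idwt2((cA, (cH, cV, cD_stego)), "haar")

cover = np.random.default_rng(2).integers(0, 256, size=(256, 256)).astype(np.float64)
secret_permuted = np.random.default_rng(3).integers(0, 256, size=(128, 128)).astype(np.float64)
stego = edwt_embed(cover, secret_permuted)
```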
3.3.2. Extraction Procedure
The extraction procedure involves the following steps:
- Read the stego image.
- Transform the stego image into one level of decomposition using Haar Wavelet.
- Apply the inverse fusion transform to extract the permuted secret image as follows. Note that the extraction procedure is not blind, as the cover image is needed to extract the permuted secret message.
- Apply the inverse permutation procedure using the same chaotic system to obtain the secret image.
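Under the same assumed fusion rule, the non-blind extraction sketched below recovers the permuted secret from the difference between the stego and cover detail sub-bands; the inverse chaotic permutation (not shown) is then applied to obtain the secret image.

```python
import numpy as np
import pywt

def edwt_extract(stego, cover, beta=0.01):
    """Non-blind inverse fusion: recover the permuted secret from the detail sub-bands (sketch)."""
    _, (_, _, cD_stego) = pywt.dwt2(stego.astype(np.float64), "haar")
    _, (_, _, cD_cover) = pywt.dwt2(cover.astype(np.float64), "haar")
    return (cD_stego - cD_cover) / beta           # inverse of the assumed rule X' = X + beta * S
```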
4. Experimental Results and Analysis
In the experiments, we first create the stego images by applying the implemented steganographic methods to the standard gray-level cover images “Lena”, “Peppers”, and “Baboon” (512 × 512 pixels), using “Boat” as the secret message with different sizes (embedding rates ranging from 5% to 40%). The six criteria used to evaluate the quality of the stego images are the following: Peak Signal-to-Noise Ratio (PSNR) [46], Image Fidelity (IF), Structural Similarity (SSIM), the entropy (E), the redundancy (R), and the image redundancy (IR). They can be represented by the following equations:
In the above equations, the two pixel values at the i-th row and j-th column belong to the cover and stego images, respectively; M and N are the width and height of the considered cover image.
The means of the cover and stego images, their variances, and the cover-stego covariance are used in the SSIM expression; two constants stabilize the division when the denominator is weak; L is the dynamic range of the pixel values, and k1 and k2 are two constants that are much smaller than 1. We considered k1 = k2 = 0.05.
The higher the PSNR, IF, and SSIM, the better the quality of the stego image. PSNR values below 40 dB indicate fairly low quality; therefore, a high-quality stego image should have a PSNR above 40 dB.
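Since the metric equations are referenced above, a compact sketch of the PSNR and Image Fidelity computations for a cover/stego pair is given below (SSIM is omitted here; ready-made implementations exist, for example in scikit-image).

```python
import numpy as np

def psnr(cover, stego):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def image_fidelity(cover, stego):
    """Image Fidelity IF = 1 - sum((cover - stego)^2) / sum(cover^2); the closer to 1, the better."""
    c = cover.astype(np.float64)
    s = stego.astype(np.float64)
    return 1.0 - np.sum((c - s) ** 2) / np.sum(c ** 2)
```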
Additionally, we used three other parameters to estimate the qualities of the stego images. These parameters have been listed as follows:
- -
- The entropy E, given by the following relation, in which L is already defined and p(i) denotes the probability of the pixel value i:
- -
- The redundancy R is usually represented by the following formula. However, this relationship is problematic because the value of the minimal entropy is not known. For this reason, Omrani [47] proposed using the following relationship, called Image Redundancy (IR), which seems to be more precise, with:
- S being the size of the image under test;
- being the number of occurrences of each pixel value;
- being the optimal number of occurrences that each pixel value should have to get a non-redundant image.
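The sketch below computes the entropy E from the gray-level histogram, the classical redundancy R = 1 − E / log2(256) (one common definition), and an occurrence-based image redundancy that compares each histogram count with the optimal count S / 256; the exact normalization of the IR measure proposed in [47] is not reproduced, so the last function is an assumption.

```python
import numpy as np

def entropy(img):
    """Shannon entropy E of the gray-level histogram, in bits per pixel."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def redundancy(img):
    """Classical redundancy R = 1 - E / log2(256) (one common definition)."""
    return 1.0 - entropy(img) / 8.0

def image_redundancy(img):
    """Occurrence-based redundancy (assumed variant): normalized deviation of the
    histogram counts from the optimal count n_opt = S / 256 of a non-redundant image."""
    counts = np.bincount(img.ravel(), minlength=256)
    s = img.size
    n_opt = s / 256.0
    return float(np.sum(np.abs(counts - n_opt)) / (2.0 * s))   # 0 for a perfectly flat histogram

img = np.random.default_rng(4).integers(0, 256, size=(512, 512), dtype=np.uint8)
print(entropy(img), redundancy(img), image_redundancy(img))
```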
In the following section, we present and compare the performance of the three implemented steganographic methods.
4.1. Enhanced EALSBMR
The results obtained for the parameters PSNR, IF, and SSIM with the EEALSBMR algorithm are presented in Table 1; their values indicate the high quality of the stego images, even with a high embedding rate of 40%. We observe that the PSNR, IF, and SSIM values decrease, as expected, when the size of the secret message increases.
Table 1.
PSNR, IF, and SSIM values for the EEALSBMR method.
In Figure 9a–c, we show the “Baboon” cover image and the corresponding stego images for 5% and 40% embedding rates, respectively. The visual quality obtained from the “Baboon” stego images is very high because visually, it is impossible to discriminate between the cover and stego images.
Figure 9.
(a) Cover image, (b) Stego image with embedding rate of 5%, (c) Stego image with embedding rate of 40%.
To put these results in perspective, using the Lena image as the cover and an approximately identical capacity, we compared the PSNR obtained by the EEALSBMR method with those obtained by the methods of [4,5,6,17]. We observed that only the method proposed by Stoyanov et al. [17] produces a better PSNR than the EEALSBMR method; however, that method is not adaptive.
4.2. Enhanced DCT Steganographic Method
The results obtained from this method, as presented in Table 2, indicate the high quality of the stego images, even with a high embedding rate. Additionally, even the visual quality obtained is very high, as shown in Figure 10.
Table 2.
PSNR, IF, and SSIM values for the EDCT method.
Figure 10.
(a) Cover image, (b) Stego image with embedding rate of 5%, (c) Stego image with embedding rate of 40%.
4.3. Enhanced DWT Steganographic Method
Table 3 presents the results obtained from the EDWT algorithm, which indicate that the steganographic algorithm exhibits good performance. Furthermore, no visual trace can be found in the resulting stego images, as shown in Figure 11a–c.
Table 3.
PSNR, IF, and SSIM values for the EDWT method.
Figure 11.
(a) Cover image, (b) Stego image with embedding rate of 5%, (c) Stego image with embedding rate of 40%.
4.4. Performance Comparison of the Three Steganographic Methods
Table 1, Table 2 and Table 3, which report the PSNR, IF, and SSIM of the three methods, show that the EEALSBMR and EDCT methods, in comparison with the EDWT method, ensure a better quality of the stego images at the different embedding rates. There is approximately a 10 dB difference in PSNR at a 5% embedding rate and a 5 to 8 dB difference in PSNR at a 40% embedding rate.
4.5. Performance Using Parameters E, R, and IR
The results obtained for the parameters E, R, and IR for the three algorithms on the stego images with different embedding rates are presented in Table 4, Table 5 and Table 6. As we can see, these values are very close to the values obtained for the original cover images, given in Table 7. This is consistent with the previous results obtained from the parameters PSNR, IF, and SSIM regarding the high quality of the stego images.
Table 4.
E, R, and IR values for the EEALSBMR method.
Table 5.
E, R, and IR values for the EDCT method.
Table 6.
E, R, and IR values for the EDWT method.
Table 7.
E, R, and IR values for the cover images.
5. Universal Steganalysis
A good steganographic method should be imperceptible not only to human vision systems but also to computer analysis. Steganalysis is the art and science that detects whether a given image has a message hidden in it [1,48]. The extensive range of natural images and the wide range of data embedding algorithms make steganalysis a difficult task. In this work, we consider universal steganalysis to be based on statistical analysis.
Universal (blind) steganalysis attempts to detect hidden information without any knowledge about the steganographic algorithm. The idea is to extract the features of cover images and the features of stego images and then use them as the feature vectors that are used by a supervised classifier (SVM, FLD, neural networks…) to distinguish whether the image under test is a stego image. This procedure is illustrated in Figure 12. The left side of the flowchart displays the different steps of the learning process while the right side illustrates the different steps of the testing process.
Figure 12.
Flowchart of the blind steganalysis process.
5.1. Multi-Resolution Wavelet Decomposition
The DWT, which uses a sub-band coding algorithm, computes the wavelet transform efficiently. Furthermore, it is easy to implement and reduces the computation time and the number of resources required. The DWT analyses the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detailed information. The decomposition of the signal into different frequencies is achieved by applying separable low-pass and high-pass filters along the image axes. The DWT computes the approximation coefficient matrix A and the detail coefficient matrices H, V, and D (horizontal, vertical, and diagonal, respectively) of the input matrix X, as illustrated in Figure 13.
Figure 13.
Multi-resolution wavelet decomposition.
5.2. Feature Vector Extraction
As the amount of image data is enormous, it is not feasible to directly use the complete image data for analysis. Therefore, for steganalysis, it is useful to extract a certain amount of useful data features that represent the image instead of the image itself. The addition of a message to a cover image may not affect the visual appearance of the image, but it will affect some statistics. The features required for steganalysis should be able to detect these minor statistical disorders that are created during the data-hiding process.
Three feature-extraction techniques are used in this paper to detect the presence of a secret message; these methods calculate the statistical properties of the images by employing multi-resolution wavelet decomposition.
5.2.1. Method 1: Feature Vectors Extracted from the Empirical Moments of the PDF-Based Multi-Resolution Coefficients and Their Prediction Error
The multi-resolution wavelet decomposition employed here is based on separable quadrature mirror filters (QMFs). This decomposition splits the frequency space into multiple scales and orientations. This is accomplished by applying separable low-pass and high-pass filters along the image axes, generating vertical, horizontal, diagonal, and low-pass sub-bands. The horizontal, vertical, and diagonal sub-bands at scale m = 1, 2, ..., n are denoted Hm, Vm, and Dm, respectively.
In our work, the first set of features is extracted from the statistics of the coefficients (x, y) of each sub-band for the levels (scales) m = 1 to n = 3. These statistics are the mean, variance, skewness, and kurtosis. They can be represented as follows:
From Equation (24), we can build the first feature vector of 4 × 3 × 3 = 36 elements, where 4, 3, and n = 3 are, respectively, the number of moments, sub-bands, and scales. The feature vector is represented as follows:
where:
The second set of statistics is based on the prediction errors of the coefficients given by an optimal linear predictor. The sub-band coefficients are correlated with their spatial, orientation, and scale neighbors. Several techniques for predicting the coefficients of the Hm, Vm, and Dm sub-bands (m = 1, 2, 3) may be used. In this work, we used a linear predictor, specifically the one proposed by Farid in [30], as shown below:
For more clarity, in Figure 14, we provide the block diagram for the prediction of coefficient .
Figure 14.
Block diagram for the prediction of coefficient .
The parameters (scalar weighting values) of the error prediction coefficients of each sub-band for a given level m are adjusted to minimize the prediction error by minimizing the quadratic error function, as shown below:
The columns of the matrix Q contain the neighboring coefficient magnitudes, as specified in Equations (25)–(27). The quadratic error function is minimized analytically as follows:
Then, we obtain:
For the optimal predictor, we use the log error given by the following equation to predict error coefficients of each sub-band for a given level m:
By using Equation (31), additional statistics are collected, namely the mean, variance, skewness, and kurtosis (see Equation (24)). The feature vector is similar to ; it is represented as follows:
where:
Finally, the feature vector that will be used by the learning classifier is the concatenation of these two sets; it contains 72 components.
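A sketch of the first half of this feature vector (the 36 PDF-moment features of Equation (24)) is shown below, using a three-level Haar decomposition; the method itself uses QMF filters and adds 36 further moments computed on the log prediction errors of the optimal linear predictor (Equations (25)–(31)), which are omitted here.

```python
import numpy as np
import pywt
from scipy.stats import skew, kurtosis

def subband_moment_features(img, levels=3, wavelet="haar"):
    """Mean, variance, skewness and kurtosis of the H, V, D sub-bands at each scale (36 features)."""
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=levels)
    feats = []
    # coeffs[0] is the approximation; coeffs[1:] are the (H, V, D) detail tuples, coarsest first
    for cH, cV, cD in coeffs[1:]:
        for band in (cH, cV, cD):
            x = band.ravel()
            feats += [x.mean(), x.var(), skew(x), kurtosis(x)]
    return np.array(feats)

img = np.random.default_rng(5).integers(0, 256, size=(256, 256))
print(subband_moment_features(img).shape)   # (36,)
```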
5.2.2. Method 2: Feature Vectors Extracted from Empirical Moments of CF-Based Multi-Resolution
The first set of feature vectors is extracted based on the CF and the wavelet decomposition, as proposed by Shi et al. [31]. The statistical moments of the characteristic function of order 1 to 3 are represented for each sub-band at different levels m = 1, 2, and 3 of the wavelet decomposition as follows:
In Equation (32), the characteristic function component at frequency k is calculated from the histogram of the corresponding sub-band (the characteristic function being the discrete Fourier transform of the histogram), and N is the total number of points of the histogram. Equation (32) allows us to build the first feature vector of 12 × 3 = 36 components, plus 3 moments of the initial image. The feature vector is listed as follows:
In the above equation, are the moments of the initial image.
The second category of features is calculated from the moments of prediction-error image and its wavelet decomposition.
Prediction-error image:
In steganalysis, we only care about the distortion caused by data-hiding. This type of distortion may be rather weak and, hence, masked by other types of noise, including the noise caused by the peculiarities of the image itself. To make the steganalysis more effective, it is necessary to keep the noise introduced by the data-hiding and to eliminate most of the other noise sources. For this purpose, we calculate the moments of the characteristic functions of order 1 to 3 of the prediction-error image and of its wavelet decomposition at levels 1, 2, and 3 (see Equation (32)). The prediction-error image is obtained by subtracting the predicted image (in which each pixel grayscale value of the cover image is predicted from its neighboring pixels' grayscale values (see Equation (34))) from the cover image. Such features make the steganalysis more efficient because the hidden data is usually unrelated to the cover media. The predicted pixel is expressed as follows:
In the above equation, a, b, and c form the context of the pixel x under consideration, and the equation gives the prediction value of x. The locations of a, b, and c are illustrated in Figure 15.
Figure 15.
Prediction context of a pixel x.
The feature vector is represented as follows:
In the above equation, are the 1st, 2nd , and 3rd order moments of the corresponding CFs, from the sub-band of the 1st level decomposition on the error image.
Finally, the feature vector that will be used for learning-based classification contains 78 components.
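A sketch of the CF-moment computation for a single sub-band is given below: the characteristic function is taken as the magnitude of the DFT of the 256-bin histogram, and moments of order 1 to 3 are computed over half of the spectrum, in the spirit of Equation (32); the prediction-error image and the assembly of the full 78-component vector are omitted.

```python
import numpy as np

def cf_moments(band, orders=(1, 2, 3), bins=256):
    """Statistical moments of the characteristic function (DFT of the histogram) of a sub-band."""
    hist, _ = np.histogram(band.ravel(), bins=bins)
    cf = np.abs(np.fft.fft(hist))
    half = cf[1:bins // 2 + 1]                       # components at frequencies k = 1 .. N/2
    freqs = np.arange(1, bins // 2 + 1) / bins       # normalized frequencies
    return [float(np.sum(freqs ** n * half) / np.sum(half)) for n in orders]

band = np.random.default_rng(6).normal(size=(128, 128))
print(cf_moments(band))
```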
5.2.3. Method 3: Feature Vector Extracted from Empirical Moments Based on the CF and the PDF of the Image Prediction Error and Its Different Sub-Bands of the Multi-Resolution Decomposition
The first feature vector combines two types of normalized moments: moments based on the probability density function and moments based on the characteristic function of the various sub-bands of the three-level multi-resolution decomposition of the gray image. We use the expression of Wang and Moulin [32] to calculate the moments of order 1 to 6 of the initial image and of its sub-bands of the three-level wavelet decomposition (levels 1 to 3), as shown below:
In Equation (35), the characteristic function component at frequency k is estimated from the histogram. Equation (35) already gives a feature vector of 6 × 1 + 6 × (4 × 3) = 78 components. Also, to improve the performance of the learning system, we calculate the moments of the four sub-bands obtained from a further decomposition of the diagonal sub-band. Therefore, the total size of the vector is 78 + (6 × 4) = 102 components.
For example, the first six components are the first six moments of the original image.
The second category of features consists of the first six moments of the prediction errors of the coefficients of each sub-band at a given level m, as shown below:
The feature vector of the second category is defined as shown below, for each level m:
The size of this vector is 3 × 6 × 3 = 54 components.
Finally, the feature vector to be used for learning-based classification is the concatenation of the two categories; it has 156 components.
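For completeness, the sketch below computes, for a single sub-band, the absolute central moments of order 1 to 6 of the empirical PDF (the normalized histogram); this is a simplified reading of the normalized-moment expressions of Wang and Moulin [32], whose exact normalization in Equation (35) is not reproduced here.

```python
import numpy as np

def pdf_moments(band, orders=range(1, 7), bins=256):
    """Absolute central moments of the empirical PDF (normalized histogram) of a sub-band."""
    hist, edges = np.histogram(band.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    mean = np.sum(centers * p)
    return [float(np.sum(np.abs(centers - mean) ** n * p)) for n in orders]

band = np.random.default_rng(7).normal(size=(128, 128))
print(pdf_moments(band))
```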
5.3. Classification
The last stage of the learning and testing process of the universal steganalysis is classification (see Figure 12). Its objective is to separate the images into two classes, the class of cover images and the class of stego images, according to their feature vectors. We adopt the Fisher Linear Discriminant (FLD) and the Support Vector Machine (SVM) for training and testing.
5.3.1. FLD Classifier
Below, we reformulate the FLD classifier for our application and apply it to two classes. Let us consider a set of N feature vectors of equal dimension. Among these vectors, N1 are labeled 1, indicating cover images, and N2 are labeled 2, indicating stego images, with N1 + N2 = N. We want to form the N projected values (one scalar per feature vector) through linear combinations of the feature-vector components, as follows:
In the above equation, the orientation vector has the same dimension as the feature vectors.
In our study, each feature vector is projected onto a direction chosen to discriminate the two classes. This projection tends to maximize the distance between the projected class means while minimizing the projected class scatters.
- Learning process: the learning process involves optimizing the Fisher criterion, which is built from the following quantities: the mean feature vector of the cover class after projection and the corresponding mean feature vector of the cover class in the original feature space; the mean feature vector of the stego class after projection and the corresponding mean feature vector of the stego class in the original feature space; the scatter of the cover class after projection and the corresponding scatter matrix of the cover class in the original feature space; and the scatter of the stego class after projection and the corresponding scatter matrix of the stego class in the original feature space.
- Testing process: the testing process (classification step) is conducted as follows. Let Z be the matrix containing the feature vectors of the cover and stego test images. The projection of Z onto the orientation vector gives all the projected values. A discrimination threshold b between the two classes can be fixed to the value halfway between the projected means of the cover and stego classes. The comparison of each projected value with the threshold b then determines the cover or stego class of every test image: if the projected value lies on the cover side of b, the image under test is classified as cover; otherwise, it is classified as stego.
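A minimal two-class FLD sketch corresponding to the learning and testing steps above is given below: the orientation vector is obtained as w = Sw⁻¹ (m1 − m2), where Sw is the sum of the two within-class scatter matrices, and the discrimination threshold b is placed halfway between the projected class means; the feature matrices used here are random placeholders.

```python
import numpy as np

def fld_train(X_cover, X_stego):
    """Return the orientation vector w and the threshold b of a two-class FLD."""
    m1, m2 = X_cover.mean(axis=0), X_stego.mean(axis=0)
    S1 = (X_cover - m1).T @ (X_cover - m1)            # scatter matrix of the cover class
    S2 = (X_stego - m2).T @ (X_stego - m2)            # scatter matrix of the stego class
    Sw = S1 + S2 + 1e-6 * np.eye(len(m1))             # small ridge term for numerical stability
    w = np.linalg.solve(Sw, m1 - m2)
    b = 0.5 * (w @ m1 + w @ m2)                       # halfway between the projected class means
    return w, b

def fld_predict(X, w, b):
    """Classify each row of X: 'cover' if its projection lies on the cover side of b, else 'stego'."""
    return np.where(X @ w >= b, "cover", "stego")

rng = np.random.default_rng(8)
X_cover = rng.normal(0.0, 1.0, size=(200, 36))
X_stego = rng.normal(0.3, 1.0, size=(200, 36))
w, b = fld_train(X_cover, X_stego)
print(fld_predict(np.vstack([X_cover, X_stego]), w, b)[:5])
```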
5.3.2. SVM Classifier
According to numerous recent studies, the SVM classification method outperforms other data classification algorithms in terms of classification accuracy [50]. SVM performs classification by creating a hyper-plane that separates the data into two categories in an optimal way.
Let us consider a set of training examples, each of which is a vector of the input space belonging to a class labeled +1 or −1. SVM classification constructs a hyper-plane that best separates the data by solving the following minimization problem:
The slack variables measure the error made at each training point. The parameter C can be viewed as a way to control overfitting; it sets the trade-off between regularization and constraint violation.
Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them. The solution involves constructing a dual problem in which a Lagrange multiplier is associated with every constraint of the primal problem, as shown below:
The Lagrange multipliers are also known as support values.
The linear classifier presented previously is very limited. In most cases, the classes not only overlap, but the genuine separation functions are non-linear hyper-surfaces. The motivation for such an extension is that an SVM that can create a non-linear decision hyper-surface will be able to classify non-linearly separable data.
The idea is that the input space can always be mapped onto a higher-dimensional feature space in which the training set is separable.
The linear classifier relies on the dot product between vectors. If every data point is mapped into a high-dimensional space via some transformation Φ, the dot product becomes a kernel evaluation K(xi, xj) = Φ(xi)·Φ(xj). Then, in the dual formulation, we maximize the following:
Subsequently, the decision function turns into the following:
It should be noted that the dual formulation only requires access to the kernel function and not the features , allowing one to solve the formulation in very high-dimensional feature spaces efficiently. This is also called the kernel trick.
There are many kernel functions in SVM. Therefore, determining how to select a good kernel function is also a research issue. However, for general purposes, there are some popular kernel functions [50,51], which have been listed as follows:
- Linear Kernel: K(xi, xj) = xi · xj
- Polynomial Kernel: K(xi, xj) = (γ xi · xj + r)^d, γ > 0
- RBF Kernel: K(xi, xj) = exp(−γ ‖xi − xj‖^2), γ > 0
- Sigmoid Kernel: K(xi, xj) = tanh(γ xi · xj + r)
Here, γ, r, and d are kernel parameters.
In our work, we used the RBF kernel function.
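With scikit-learn, the RBF-kernel classification used in this work can be sketched as below; the feature matrices are random placeholders standing in for the extracted cover/stego feature vectors, and the hyper-parameters C and gamma would normally be tuned by cross-validation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X_cover = rng.normal(0.0, 1.0, size=(200, 78))        # placeholder cover-image feature vectors
X_stego = rng.normal(0.3, 1.0, size=(200, 78))        # placeholder stego-image feature vectors
X = np.vstack([X_cover, X_stego])
y = np.array([0] * 200 + [1] * 200)                   # 0 = cover, 1 = stego

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X, y)                                         # training (learning) step
print(clf.predict(X[:5]))                             # testing (classification) step
```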
6. Experimental Results of Steganalysis
In this section, we present some experimental results that were obtained from the studied steganalysis system that was applied to the enhanced steganographic methods in the spatial and frequency domain. For this purpose, the image dataset UCID [52,53] is used, which includes 1338 uncompressed color images, and all the images were converted to grayscale before conducting the experiments.
In our experiments, we first created the stego images using the following steganographic methods: Enhanced EALSBMR (EEALSBMR), Enhanced DCT steganography (EDCT), and Enhanced DWT steganography (EDWT). We used these methods with different embedding rates of 5%, 10%, and 20%. Following this, we extracted the image features using the three feature-extraction techniques described above (the Farid, Shi, and Moulin techniques) for both the cover and stego images. Finally, we employed the classifiers FLD and SVM to classify the images as either containing a hidden message or not. The evaluation of the classification (binary classification) and of the steganalysis (and, indirectly, of the efficiency of the insertion methods) is performed by calculating the following parameters: the sensitivity, specificity, and precision derived from the confusion matrix, and the Kappa coefficient (see Table 8 and Equation (64))
with:
Table 8.
Confusion matrix.
In the above equation, the first term is the total agreement probability (related to the accuracy), and the second is the agreement probability that arises by chance.
Here is one possible interpretation of Kappa values:
- Poor agreement = Less than 0.20
- Fair agreement = 0.20 to 0.40
- Moderate agreement = 0.40 to 0.60
- Good agreement = 0.60 to 0.80
- Very good agreement = 0.80 to 1.00
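The evaluation measures can be computed directly from the confusion-matrix counts, as sketched below, with the stego class taken as the positive class; the Kappa value follows the definition above, with the observed accuracy as the total agreement probability and the chance agreement estimated from the marginals.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Sensitivity, specificity, precision, accuracy and Cohen's Kappa (stego = positive class 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n                 # observed (total) agreement probability
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2   # chance agreement
    kappa = (accuracy - p_e) / (1 - p_e)
    return sensitivity, specificity, precision, accuracy, kappa

print(evaluate([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```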
6.1. Classification Results Applied to the Steganographic Method EEALSBMR
In Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14, we present the classification results (steganalysis) based on the classifiers FLD and SVM and the features of Farid, Shi, and Moulin for the EEALSBMR insertion method with different insertion rates of 5%, 10%, and 20%. The results show that the steganalysis is not effective for any of the insertion rates. Indeed, the sensitivity, specificity, and precision values vary around 50%, so they are not informative and give no indication about the nature of the data. The value of the Kappa coefficient (lower than 0.2) confirms these results. The EEALSBMR steganographic method is therefore robust against these statistical steganalysis techniques.
Table 9.
FLD classification evaluation of EEALSBMR algorithm using Farid features.
Table 10.
FLD classification evaluation of EEALSBMR algorithm using Shi features.
Table 11.
FLD classification evaluation of EEALSBMR algorithm using Moulin features.
Table 12.
SVM classification evaluation of EEALSBMR algorithm using Farid features.
Table 13.
SVM classification evaluation of EEALSBMR algorithm using Shi features.
Table 14.
SVM classification evaluation of EEALSBMR algorithm using Moulin features.
6.2. Classification Results Applied to the Steganographic Method EDCT
The classification results (steganalysis) provided in Table 15, Table 16, Table 17, Table 18, Table 19 and Table 20 for the EDCT insertion method show that with the FLD classifier, when the insertion rate is equal to or higher than 20%, steganalysis is very effective with Shi features and Moulin features, but it is less effective with Farid features. With the SVM classifier, except in the case of Shi features, when an insertion rate of 20% is applied, the results obtained are quite similar to those obtained from the EEALSBMR algorithm and, therefore, steganalysis is not effective. It should be noted that the FLD classifier is more effective for a feature vector of a high dimension than the SVM classifier.
Table 15.
FLD classification evaluation of EDCT algorithm using Farid features.
Table 16.
FLD classification evaluation of EDCT algorithm using Shi features.
Table 17.
FLD classification evaluation of EDCT algorithm using Moulin features.
Table 18.
SVM classification evaluation of EDCT algorithm using Farid features.
Table 19.
SVM classification evaluation of EDCT algorithm using Shi features.
Table 20.
SVM classification evaluation of EDCT algorithm using Moulin features.
6.3. Classification Results Applied to the Steganographic Method EDWT
With respect to the EDWT method, the results are provided in Table 21, Table 22, Table 23, Table 24, Table 25 and Table 26. These results, obtained with the classifiers FLD and SVM, indicate that the values of the sensitivity, specificity, precision, accuracy, and Kappa coefficient are high for all insertion rates and feature vectors (Farid, Shi, and Moulin). These results can easily reveal the presence of hidden information; therefore, the steganalysis can be considered very effective, and, as a result, the insertion method is not robust. It should be noted that the steganalysis is very effective here because both the steganographic method and the feature vectors are based on multi-resolution wavelet decomposition.
Table 21.
FLD classification evaluation of EDWT algorithm using Farid features.
Table 22.
FLD classification evaluation of EDWT algorithm using Shi features.
Table 23.
FLD classification evaluation of EDWT algorithm using Moulin features.
Table 24.
SVM classification evaluation of EDWT algorithm using Farid features.
Table 25.
SVM classification evaluation of EDWT algorithm using Shi features.
Table 26.
SVM classification evaluation of EDWT algorithm using Moulin features.
6.4. Discussion
The enhanced adaptive LSB method of steganography in the spatial domain (EEALSBMR) and the enhanced frequency-domain methods (EDCT and EDWT) provide stego images with a good visual quality up to an embedding rate of 40%: the PSNR is over 50 dB, and the distortion is not visible to the naked eye. The security of the message contents, in case the message is detected by an opponent, is ensured by the chaotic system. On the other hand, we applied universal steganalysis methods that can work with all known and unknown steganography algorithms. Universal steganalysis methods exploit the changes in certain inherent features of the cover images when a message is embedded. The accuracy of the classification (discrimination between the two classes, cover and stego) greatly relies on several factors, such as the choice of the right feature vectors, the classifier, and its parameters.
7. Conclusions
In this work, we first improved the structure and security of three steganographic methods, studied in the spatial and frequency domains, by integrating a robust proposed chaotic system into them. Following this, we built a statistical steganalysis system to evaluate the robustness of the three enhanced steganographic methods. In this system, we selected three different feature vectors, namely higher-order statistics of high-frequency wavelet sub-bands and their prediction errors; statistical moments of the characteristic functions of the prediction-error image, the test image, and their wavelet sub-bands; and both empirical PDF moments and normalized absolute CF moments. After this, we applied two types of classifiers, namely FLD and SVM with the RBF kernel.
Extensive experimental work has demonstrated that the proposed steganalysis system based on the multi-dimensional feature vectors can detect hidden messages using the EDWT steganographic method, irrespective of the message size. However, it cannot distinguish between cover and stego images using the EEALSBMR steganographic and EDCT methods if the message size is smaller than 20% and 15%, respectively.
Author Contributions
Funding acquisition, T.M.H.; Supervision, B.B., O.D. and M.K.; Writing—original draft preparation, D.B.; Writing—review & editing, S.E.A., T.M.H.
Funding
This work is supported by the National Foundation for Science and Technology Development (NAFOSTED) of Vietnam through the grant number 102.04-2018.06.
Acknowledgments
The authors thank the anonymous reviewers for useful comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Xia, Z.; Wang, X.; Sun, X.; Liu, Q.; Xiong, N. Steganalysis of LSB matching using differences between nonadjacent pixels. Multimed. Tools Appl. 2016, 75, 1947–1962. [Google Scholar] [CrossRef]
- Mohammadi, F.G.; Abadeh, M.S. Image steganalysis using a bee colony based feature selection algorithm. Eng. Appl. Artif. Intell. 2014, 31, 35–43. [Google Scholar] [CrossRef]
- Luo, W.; Huang, F.; Huang, J. Edge Adaptive Image Steganography Based on LSB Matching Revisited. IEEE Trans. Inf. Forensics Secur. 2010, 5, 201–214. [Google Scholar]
- Chan, C.K.; Cheng, L. Hiding data in images by simple LSB substitution. Pattern Recognit. 2004, 37, 469–474. [Google Scholar] [CrossRef]
- Wu, H.C.; Wu, N.I.; Tsai, C.S.; Hwang, M.S. Image steganographic scheme based on pixel-value differencing and LSB replacement methods. IEE Proc.-Vis. Image Signal Process. 2005, 152, 611–615. [Google Scholar] [CrossRef]
- Jung, K.; Ha, K.; Yoo, K. Image Data Hiding Method Based on Multi-Pixel Differencing and LSB Substitution Methods. In Proceedings of the 2008 International Conference on Convergence and Hybrid Information Technology, Daejeon, Korea, 28–30 August 2008; pp. 355–358. [Google Scholar] [CrossRef]
- Huang, Q.; Ouyang, W. Protect fragile regions in steganography LSB embedding. In Proceedings of the 2010 Third International Symposium on Knowledge Acquisition and Modeling, Wuhan, China, 20–21 October 2010; pp. 175–178. [Google Scholar]
- Xi, L.; Ping, X.; Zhang, T. Improved LSB matching steganography resisting histogram attacks. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; Volume 1, pp. 203–206. [Google Scholar]
- Swain, G.; Lenka, S.K. Steganography using two sided, three sided, and four sided side match methods. CSI Trans. ICT 2013, 1, 127–133. [Google Scholar] [CrossRef][Green Version]
- Islam, S.; Modi, M.R.; Gupta, P. Edge-based image steganography. EURASIP J. Inf. Secur. 2014, 2014, 1–14. [Google Scholar] [CrossRef]
- Mungmode, S.; Sedamkar, R.; Kulkarni, N. A Modified High Frequency Adaptive Security Approach using Steganography for Region Selection based on Threshold Value. Procedia Comput. Sci. 2016, 79, 912–921. [Google Scholar] [CrossRef][Green Version]
- Akhter, F. A Novel Approach for Image Steganography in Spatial Domain. arXiv 2015, arXiv:1506.03681. [Google Scholar]
- Iranpour, M.; Rahmati, M. An efficient steganographic framework based on dynamic blocking and genetic algorithm. Multimed. Tools Appl. 2015, 74, 11429–11450. [Google Scholar] [CrossRef]
- Kumar, R.; Chand, S. A reversible high capacity data hiding scheme using pixel value adjusting feature. Multimed. Tools Appl. 2016, 75, 241–259. [Google Scholar] [CrossRef]
- Muhammad, K.; Ahmad, J.; Farman, H.; Jan, Z. A new image steganographic technique using pattern based bits shuffling and magic LSB for grayscale images. arXiv 2016, arXiv:1601.01386. [Google Scholar]
- Kordov, K.; Stoyanov, B. Least Significant Bit Steganography using Hitzl-Zele Chaotic Map. Int. J. Electron. Telecommun. 2017, 63, 417–422. [Google Scholar] [CrossRef]
- Stoyanov, B.P.; Zhelezov, S.K.; Kordov, K.M. Least significant bit image steganography algorithm based on chaotic rotation equations. C. R. L’Academie Bulgare Sci. 2016, 69, 845–850. [Google Scholar]
- Taleby Ahvanooey, M.; Li, Q.; Hou, J.; Rajput, A.R.; Chen, Y. Modern Text Hiding, Text Steganalysis, and Applications: A Comparative Analysis. Entropy 2019, 21, 355. [Google Scholar] [CrossRef]
- Sadat, E.S.; Faez, K.; Saffari Pour, M. Entropy-Based Video Steganalysis of Motion Vectors. Entropy 2018, 20, 244. [Google Scholar] [CrossRef]
- Yu, C.; Li, X.; Chen, X.; Li, J. An Adaptive and Secure Holographic Image Watermarking Scheme. Entropy 2019, 21, 460. [Google Scholar] [CrossRef]
- Hashad, A.; Madani, A.S.; Wahdan, A.E.M.A. A robust steganography technique using discrete cosine transform insertion. In Proceedings of the 2005 International Conference on Information and Communication Technology, Cairo, Egypt, 5–6 December 2005; pp. 255–264. [Google Scholar]
- Fard, A.M.; Akbarzadeh-T, M.R.; Varasteh-A, F. A new genetic algorithm approach for secure JPEG steganography. In Proceedings of the 2006 IEEE International Conference on Engineering of Intelligent Systems, Islamabad, Pakistan, 22–23 April 2006; pp. 1–6. [Google Scholar]
- McKeon, R.T. Strange Fourier steganography in movies. In Proceedings of the 2007 IEEE International Conference on Electro/Information Technology, Chicago, IL, USA, 17–20 May 2007; pp. 178–182. [Google Scholar]
- Abdelwahab, A.; Hassaan, L. A discrete wavelet transform based technique for image data hiding. In Proceedings of the 2008 National Radio Science Conference, Tanta, Egypt, 18–20 March 2008; pp. 1–9. [Google Scholar]
- Singh, I.; Khullar, S.; Laroiya, D.S. DFT based image enhancement and steganography. Int. J. Comput. Sci. Commun. Eng. 2013, 2, 5–7. [Google Scholar]
- Samata, R.; Parghi, N.; Vekariya, D. An Enhanced Image Steganography Technique using DCT, Jsteg and Data Mining Bayesian Classification Algorithm. Int. J. Sci. Technol. Eng. (IJSTE) 2015, 2, 9–13. [Google Scholar]
- Karri, S.; Sur, A. Steganographic algorithm based on randomization of DCT kernel. Multimed. Tools Appl. 2015, 74, 9207–9230. [Google Scholar] [CrossRef]
- Pan, J.S.; Li, W.; Yang, C.S.; Yan, L.J. Image steganography based on subsampling and compressive sensing. Multimed. Tools Appl. 2015, 74, 9191–9205. [Google Scholar] [CrossRef]
- Ali, M.; Ahn, C.W.; Siarry, P. Differential evolution algorithm for the selection of optimal scaling factors in image watermarking. Eng. Appl. Artif. Intell. 2014, 31, 15–26. [Google Scholar] [CrossRef]
- Farid, H. Detecting hidden messages using higher-order statistical models. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; Volume 2. [Google Scholar]
- Shi, Y.Q.; Zou, D.; Chen, W.; Chen, C. Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 6 July 2005; p. 4. [Google Scholar]
- Wang, Y.; Moulin, P. Optimized Feature Extraction for Learning-Based Image Steganalysis. IEEE Trans. Inf. Forensics Secur. 2007, 2, 31–45. [Google Scholar] [CrossRef]
- Abutaha, M. Real-Time and Portable Chaos-Based Crypto-Compression Systems for Efficient Embedded Architectures. Ph.D. Thesis, University of Nantes, Nantes, France, 2017. [Google Scholar]
- Abu Taha, M.; El Assad, S.; Queudet, A.; Deforges, O. Design and efficient implementation of a chaos-based stream cipher. Int. J. Internet Technol. Secur. Trans. 2017, 7, 89–114. [Google Scholar] [CrossRef]
- El Assad, S. Chaos based information hiding and security. In Proceedings of the 2012 International Conference for Internet Technology and Secured Transactions, London, UK, 10–12 December 2012; pp. 67–72. [Google Scholar]
- Song, C.Y.; Qiao, Y.L.; Zhang, X.Z. An image encryption scheme based on new spatiotemporal chaos. Opt.-Int. J. Light Electron Opt. 2013, 124, 3329–3334. [Google Scholar] [CrossRef]
- Tataru, R.L.; Battikh, D.; Assad, S.E.; Noura, H.; Déforges, O. Enhanced adaptive data hiding in spatial LSB domain by using chaotic sequences. In Proceedings of the 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Piraeus, Greece, 18–20 July 2012; pp. 85–88. [Google Scholar]
- Assad, S.E.; Noura, H. Generator of Chaotic Sequences and Corresponding Generating System. International Patent No. WO2011121218A1, 28 March 2011. [Google Scholar]
- Farajallah, M.; El Assad, S.; Deforges, O. Fast and secure chaos-based cryptosystem for images. Int. J. Bifurc. Chaos 2015. [Google Scholar] [CrossRef]
- El Assad, S.; Farajallah, M. A new chaos-based image encryption system. Signal Proc. Image Commun. 2015. [Google Scholar] [CrossRef]
- Battikh, D.; El Assad, S.; Bakhache, B.; Déforges, O.; Khalil, M. Enhancement of two spatial steganography algorithms by using a chaotic system: Comparative analysis. In Proceedings of the 8th International Conference for Internet Technology and Secured Transactions (ICITST-2013), London, UK, 9–12 December 2013; pp. 20–25. [Google Scholar]
- Mielikainen, J. LSB matching revisited. IEEE Signal Process. Lett. 2006, 13, 285–287. [Google Scholar] [CrossRef]
- Habib, M.; Bakhache, B.; Battikh, D.; El Assad, S. Enhancement using chaos of a Steganography method in DCT domain. In Proceedings of the 2015 Fifth International Conference on Digital Information and Communication Technology and its Applications (DICTAP), Beirut, Lebanon, 29 April–1 May 2015; pp. 204–209. [Google Scholar]
- Danti, A.; Acharya, P. Randomized embedding scheme based on dct coefficients for image steganography. IJCA Spec. Issue Recent Trends Image Process. Pattern Recognit 2010, 2, 97–103. [Google Scholar]
- Boora, M.; Gambhir, M. Arnold Transform Based Steganography. Int. J. Soft Comput. Eng. (IJSCE) 2013, 3, 136–140. [Google Scholar]
- Walia, E.; Jain, P.; Navdeep, N. An analysis of LSB & DCT based steganography. Glob. J. Comput. Sci. Technol. 2010, 10, 4–8. [Google Scholar]
- Omrani, T. Conception et Cryptanalyse des Cryptosystèmes Légers Pour l'IoT. Ph.D. Thesis, El Manar University, Tunis, Tunisia, 2019. [Google Scholar]
- Song, X.; Liu, F.; Luo, X.; Lu, J.; Zhang, Y. Steganalysis of perturbed quantization steganography based on the enhanced histogram features. Multimed. Tools Appl. 2015, 74, 11045–11071. [Google Scholar] [CrossRef]
- Lee, C.K. Infrared Face Recognition. 2004. Available online: https://apps.dtic.mil/dtic/tr/fulltext/u2/a424713.pdf (accessed on 26 July 2019).
- Vapnik, V.N. Statistical Learning Theory; Adaptive and Learning Systems for Signal Processing, Communications, and Control; Wiley: Hoboken, NJ, USA, 1998. [Google Scholar]
- Vapnik, V.N. An overview of statistical learning theory. Neural Netw. IEEE Trans. 1999, 10, 988–999. [Google Scholar] [CrossRef] [PubMed]
- Schaefer, G.; Stich, M. UCID: An uncompressed color image database. In Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2004, San Jose, CA, USA, 18–22 January 2004; pp. 472–480. [Google Scholar]
- Battikh, D.; El Assad, S.; Deforges, O.; Bakhache, B.; Khalil, M. Stéganographie Basée Chaos Pour Assurer la Sécurité de L’information; Presses Académiques Francophones: Sarrebruck, France, 2015. (In French) [Google Scholar]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).