Comparative Study of Three Steganographic Methods Using a Chaotic System and Their Universal Steganalysis Based on Three Feature Vectors

In this paper, we first study the security enhancement of three steganographic methods using a proposed chaotic system. The first method, the Enhanced Edge Adaptive Image Steganography Based on LSB Matching Revisited (EEALSBMR), operates in the spatial domain. The other two methods, the Enhanced Discrete Cosine Transform (EDCT) and the Enhanced Discrete Wavelet Transform (EDWT), operate in the frequency domain. The chaotic system is extremely robust and consists of a strong chaotic generator and a 2-D Cat map; its main role is to secure the content of a message in case the message is detected. Second, three blind steganalysis methods, based on multi-resolution wavelet decomposition, are used to detect whether an embedded message is hidden in the tested image (stego image) or not (cover image). The steganalysis approach is based on the hypothesis that message-embedding schemes leave statistical evidence or structure in images that can be exploited for detection. The simulation results show that the Support Vector Machine (SVM) classifier and the Fisher Linear Discriminant (FLD) cannot distinguish between cover and stego images if the message size is smaller than 20% for the EEALSBMR steganographic method and smaller than 15% for the EDCT method. However, SVM and FLD can distinguish between cover and stego images with reasonable accuracy for the EDWT method, irrespective of the message size.


Introduction
Steganography is an increasingly important security domain; it aims to hide a message (secret information) in a digital cover medium without causing perceptual degradation (in this study, we use images as cover media). Many steganographic methods have been proposed in the spatial and frequency domains. In the spatial domain, pixels are directly used to hide secret messages; these techniques are normally easy to implement and have a high capacity, but they are generally not robust against statistical attacks [1,2]. In the transform domain, coefficients of frequency transforms, such as the DCT (Discrete Cosine Transform), FFT (Fast Fourier Transform), and DWT (Discrete Wavelet Transform), are used to hide secret data. These techniques are generally more complex, but they are more robust against steganalysis, noise, and image processing.

Description of the Chaotic Generator Used
The generator of discrete chaotic sequences exhibits orbits with very large lengths. It is based on two coupled non-linear digital IIR filters (cells), using the discrete PWLCM and skew tent maps as non-linear functions. A linear feedback shift register (m-LFSR) is then used to disturb each cell (Figure 2). The disturbance technique is combined with a cascading technique, which makes it possible to control and increase the length of the produced orbits. The minimum orbit length of the generator output is calculated using Equation (1), where lcm is the least common multiple, k_1 = 23 and k_2 = 21 are the degrees of the LFSRs' primitive polynomials, and Δ_1 and Δ_2 are the orbit lengths of the outputs s_1 and s_2 of the two cells, respectively, without disturbance.
The equations of the chaotic generator are formulated as follows:

s_i(n) = NLF_i[u_i(n − 1), p_i], i = 1, 2
u_i(n − 1) = mod(s_i(n − 1) × c_i,1 + s_i(n − 2) × c_i,2, 2^N), i = 1, 2
s(n) = s_1(n) + s_2(n) (2)

The two previously mentioned non-linear functions, the PWLCM map and the skew tent map, are defined by relations of the following form; for the skew tent map:

s_2(n) = NLF_2[u_2(n − 1), p_2] = 2^N × u_2(n − 1)/p_2 if 0 ≤ u_2(n − 1) < p_2, and 2^N × (2^N − u_2(n − 1))/(2^N − p_2) otherwise.

The control parameter p_1 used for the PWLCM map ranges from 1 to 2^(N−1) − 1, and the control parameter p_2 used for the skew tent map ranges from 1 to 2^N − 1. N = 32 is the word length used for the simulations. The size of the secret key K, formed by all initial conditions and parameters of the chaotic generator, is (6 × 32 + 5 × 32 + 31 + 23 + 21) = 427 bits, which is large enough to resist a brute-force attack.
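As an illustration, the two non-linear maps and the cell recurrence can be sketched in N-bit integer arithmetic. This is a minimal sketch: the skew tent map follows the two-branch definition above, while the 4-branch discrete PWLCM is a standard discretization that may differ from the authors' exact version; the coupling coefficients, the m-LFSR disturbance, and the cascading are omitted.

```python
# Sketch of the two non-linear maps (NLF_1, NLF_2) in N-bit integer
# arithmetic, with N = 32 as in the paper.

N = 32
MOD = 1 << N  # 2^N

def skew_tent(u, p):
    """Skew tent map; control parameter 1 <= p <= 2^N - 1."""
    if u < p:
        return ((u << N) // p) % MOD
    return (((MOD - u) << N) // (MOD - p)) % MOD

def pwlcm(u, p):
    """Discrete PWLCM (standard 4-branch discretization, an assumption
    here); control parameter 1 <= p <= 2^(N-1) - 1."""
    if u < p:
        return ((u << N) // p) % MOD
    if u < MOD // 2:
        return (((u - p) << N) // (MOD // 2 - p)) % MOD
    if u < MOD - p:
        return (((MOD - p - u) << N) // (MOD // 2 - p)) % MOD
    return (((MOD - u) << N) // p) % MOD

def cell_step(s_prev, s_prev2, c1, c2, p, nlf):
    """One recurrence step of a cell: u(n-1) = mod(s(n-1)*c1 + s(n-2)*c2,
    2^N), then s(n) = NLF(u, p). The m-LFSR disturbance is omitted."""
    u = (s_prev * c1 + s_prev2 * c2) % MOD
    return nlf(u, p)
```

All values stay in the N-bit range, so the generator state never overflows the word length used in hardware or software implementations.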

Description of the Cat Map Used
The permutation process is based on the modified Cat map and is computed very efficiently using the equation below [37]. The dynamic key K_p is structured as follows:

K_p = {k_p1, k_p2, ..., k_pr}, with k_pi = {u_i, v_i, rl_i, rc_i}, i = 1, 2, ..., r

In the above equations, 0 ≤ u_i, v_i, rl_i, rc_i ≤ M − 1 are the parameters of the Cat map and r is the number of rounds.
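One round of the permutation can be sketched as follows, assuming a standard generalized 2-D Cat map with extra row/column shifts rl and rc; the exact modified map of [37] may differ in detail.

```python
import numpy as np

def cat_map_permute(img, u, v, rl, rc):
    """One round of a generalized 2-D Cat map over a square M x M image.
    The affine matrix [[1, u], [v, u*v + 1]] has determinant 1, so the
    mapping is a bijection mod M; rl and rc are extra row/column shifts
    (our reading of the 'modified' map)."""
    M = img.shape[0]
    out = np.empty_like(img)
    for x in range(M):
        for y in range(M):
            xn = (x + u * y + rl) % M
            yn = (v * x + (u * v + 1) * y + rc) % M
            out[xn, yn] = img[x, y]
    return out
```

Because the map is a bijection, applying it r times with the sub-keys k_p1, ..., k_pr scrambles the pixel positions without losing any pixel value, and the inverse map with the same key recovers the original image.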

Enhanced Steganographic Algorithms
In this section, we describe three steganographic algorithms enhanced by an efficient chaotic system.

Enhanced EALSBMR (EEALSBMR)
Below, we present the insertion procedure and the extraction procedure of the proposed enhancement of the EALSBMR method (EEALSBMR) [41].

Insertion Procedure
The flow diagram of the embedding scheme can be found in Figure 3. The detailed embedding steps are as follows:

Step 1: Capacity estimation

• To estimate the insertion capacity, we arrange the cover image into a 1-D vector V and divide its content into non-overlapping embedding units (blocks) of two consecutive pixels (p_i, p_i+1). We then calculate the difference between the pixels of each block and increment the corresponding element of the 31-element difference vector VD, t ∈ {1, 2, 3, ..., 31}, so that each element yields |EU(t)|, where EU(t) is the set of pixel pairs whose absolute differences are greater than or equal to t.
• For a given secret message M of size |M| bits, the threshold T used in the embedding process is determined by the following pseudo-code (Algorithm 1):

Algorithm 1 Pseudo-code determining the value of the threshold T
 1: procedure
 2:   number_pixels = 0;
 3:   for t = 31:-1:1 do
 4:     number_pixels = number_pixels + VD(t);
 5:     if (2*number_pixels >= |M|) then
 6:       T = t;
 7:       break;
 8:     end if;
 9:   end for;
10: end procedure

Step 2: Embedding process

• The embedding process is achieved as follows: we divide the cover image into two sub-images, one containing the odd columns and the other the even columns.

• Next, the chaotic system chooses a pixel position (Ind) from the odd sub-image; the second pixel of the corresponding block has the same Ind in the even sub-image. If a pixel unit (p_i, p_i+1) satisfies Equation (8), then a 2-bit message can be hidden (one bit per pixel); otherwise, the chaotic system chooses another Ind.
• For each unit (p_i, p_i+1), we perform data-hiding according to four cases [42]. In the corresponding equations, m_i and m_i+1 are the ith and (i + 1)th secret bits of the message to be embedded; r is a random value belonging to {−1, 1}; and (p′_i, p′_i+1) denotes the pixel pair after data-hiding. The function f is defined accordingly.
• Readjustment if necessary: after hiding, (p′_i, p′_i+1) may fall out of the range [0, 255], or the new difference |p′_i − p′_i+1| may be less than the threshold T. In these cases, we readjust p′_i and p′_i+1; the new readjusted values, p″_i and p″_i+1, are calculated as in [3], where k_1 and k_2 are two arbitrary integers from Z chosen such that 0 ≤ e_1, e_2 ≤ 255 and |e_1 − e_2| ≥ T. The procedure continues in the same way for each new block position.
• Finally, we embed the parameter T in the stego image, for example in the first five or last five pixels.

Extraction Procedure
• Extract the parameter T from the stego image.
• Divide the stego image into two sub-images, one containing the odd columns and the other the even columns.
• Generate a pseudo-chaotic position (using the same secret key K), as in the insertion procedure, to obtain the same order of pixel-unit positions in the odd sub-image. The second pixel of the block has the same Ind in the even sub-image.
• Verify that |p^s_i − p^s_i+1| ≥ T, and then extract the two secret bits (m_i, m_i+1) of M, with p^s_i = p′_i or p″_i. Otherwise, the chaotic system chooses another pseudo-chaotic position. The procedure continues in the same way for each unit position until the whole message has been extracted.

• Example of insertion: the cover image is the "Peppers" image shown in Figure 4. The embedded message is a 40 × 40 pixel image, shown in Figure 5; the length of the corresponding binary message is 13,120 bits.
Capacity estimation produces the threshold T = 12. Suppose that the pseudo-chaotic positions of a block used to embed the two message bits m_1 = 1 and m_2 = 0 are (354, 375) and (354, 376), which correspond to the gray values 141 and 129 (see Figure 6). Hiding the message bits falls under Case 2, which gives the new pixel values. The difference between the new pixel values is below the threshold, so we need to readjust them: we test the values −50 < k_1 < 50 and −50 < k_2 < 50 until we obtain the smallest difference between the initial values p_1 and p_2 and the corresponding values e_1 and e_2 obtained using Equations (12) and (13). In this example, we find k_1 = 0 and k_2 = −1. The extraction of the message bits in this example is then performed using the corresponding extraction equation.
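The capacity estimation of Algorithm 1 can be sketched in Python as follows; binning differences larger than 31 into VD(31) is our reading of the scheme, so that the cumulative sum from t = 31 downward counts all pairs usable at threshold t.

```python
def estimate_threshold(pixels, msg_len):
    """Threshold T of Algorithm 1. pixels: flattened cover image (vector V);
    msg_len: message size |M| in bits."""
    # VD(t): pairs whose absolute difference falls in bin t = 1..31
    # (differences above 31 are counted in VD(31), an assumption here).
    vd = [0] * 32
    for i in range(0, len(pixels) - 1, 2):
        d = abs(int(pixels[i]) - int(pixels[i + 1]))
        if d >= 1:
            vd[min(d, 31)] += 1
    number_pixels = 0
    for t in range(31, 0, -1):
        number_pixels += vd[t]            # pairs usable at threshold t
        if 2 * number_pixels >= msg_len:  # each pair hides 2 bits
            return t
    return 1
```

For example, a cover made only of high-contrast pairs yields the largest threshold T = 31, since every pair is already usable at the strictest setting.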

Enhanced DCT Steganographic Method (EDCT)
The DCT transforms a signal or image from the spatial domain into the frequency domain [43,44]. A DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. The block diagram of the proposed enhanced DCT-based steganographic method is shown in Figure 7.

Insertion Procedure
The embedding process consists of the following steps:
• Read the cover image and the secret message.
• Convert the secret message into a 1-D binary vector.
• Divide the cover image into 8 × 8 blocks, then apply the 2-D DCT to each block (from left to right, top to bottom).
• Use the same chaotic system to generate a pseudo-chaotic index Ind.
• Replace the LSB of each located DCT coefficient with one bit of the secret message.
• Apply the 2-D inverse DCT to produce the stego image.

Extraction Procedure
The extraction procedure consists of the following steps:
• Read the stego image.
• Divide the stego image into 8 × 8 blocks, then apply the 2-D DCT to each block.
• Use the same chaotic system to generate the pseudo-chaotic index Ind.
• Extract the LSB of each pseudo-located coefficient.
• Construct the secret message.
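The per-block embedding and extraction steps above can be sketched as follows. The sketch operates on rounded DCT coefficients; the `positions` argument stands in for the pseudo-chaotic indices Ind produced by the chaotic system, and the rounding effects of going back to the pixel domain, which the paper does not detail, are ignored.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_in_block(block, bits, positions):
    """Embed message bits into the LSBs of rounded DCT coefficients of
    one 8 x 8 block. positions: list of (row, col) coefficient indices
    (stand-in for the pseudo-chaotic Ind)."""
    coeffs = np.round(dctn(block.astype(float), norm='ortho')).astype(int)
    for (r, c), bit in zip(positions, bits):
        coeffs[r, c] = (coeffs[r, c] & ~1) | bit  # overwrite the LSB
    return coeffs

def extract_from_block(coeffs, positions):
    """Read back the LSBs of the located coefficients."""
    return [int(coeffs[r, c]) & 1 for (r, c) in positions]
```

The stego block is then obtained with `idctn(coeffs.astype(float), norm='ortho')`; in a full implementation the coefficients must survive the rounding of pixels back to integers, which this sketch does not address.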

Enhanced DWT Steganographic Method (EDWT)
Embedding the secret image in the low-frequency sub-band (A) is generally more robust than embedding in the other sub-bands, but it significantly degrades the visual quality of the image, since most of the image energy is concentrated in this sub-band. In contrast, the human eye is generally not sensitive to changes in the high-frequency sub-band (D), which contains the edges and textures of the image; this allows secret information to be embedded without being perceived. However, sub-band (D) is not robust against active attacks (filtering, compression, etc.). The compromise adopted by many DWT-based algorithms to balance imperceptibility and robustness is to embed the secret image in the middle-frequency sub-bands (H) or (V). In the block diagram of the proposed EDWT steganographic method shown in Figure 8, we embed the secret image in sub-band (H) of the cover image (the size of the secret message must be at most equal to the size of sub-band (H) of the cover image).

Insertion Procedure
The embedding process consists of the following steps:
• Read the cover image and the secret image.
• Apply one level of Haar wavelet decomposition to the cover image.
• Permute the secret image in a pseudo-chaotic manner.
• Fuse the DWT coefficients (H) of the cover image with the permuted secret image PSI as in [45], where X′ is the modified DWT coefficient (H), X is the original DWT coefficient (H), and α and β are the embedding strength factors, chosen such that the resulting stego image has a large PSNR. In our experiments, we tested several values of β, and the best value was found to be approximately 0.01.
• Apply the inverse discrete wavelet transform (IDWT) to produce the stego image in the spatial domain.

Extraction Procedure
The extraction procedure involves the following steps:
• Read the stego image.
• Apply one level of Haar wavelet decomposition to the stego image.
• Apply the inverse fusion transform to extract the permuted secret image. Note that the extraction procedure is not blind, as the cover image is needed to extract the permuted secret message.
• Apply the inverse permutation using the same chaotic system to recover the secret image.
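The insertion and extraction procedures can be sketched together, assuming the fusion rule H′ = α·H + β·PSI (our reading of [45]; the exact rule may differ) and a hand-rolled one-level Haar decomposition; the pseudo-chaotic permutation of the secret image is omitted.

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar decomposition (exactly invertible here)."""
    x = img.astype(float)
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    A = (a + b + c + d) / 4.0  # approximation
    H = (a + b - c - d) / 4.0  # horizontal detail
    V = (a - b + c - d) / 4.0  # vertical detail
    D = (a - b - c + d) / 4.0  # diagonal detail
    return A, H, V, D

def haar_idwt2(A, H, V, D):
    """Exact inverse of haar_dwt2."""
    M, N = A.shape
    out = np.empty((2 * M, 2 * N))
    out[0::2, 0::2] = A + H + V + D
    out[0::2, 1::2] = A + H - V - D
    out[1::2, 0::2] = A - H + V - D
    out[1::2, 1::2] = A - H - V + D
    return out

def edwt_embed(cover, psi, alpha=1.0, beta=0.01):
    """Fuse the (already permuted) secret image PSI into sub-band H."""
    A, H, V, D = haar_dwt2(cover)
    return haar_idwt2(A, alpha * H + beta * psi, V, D)

def edwt_extract(stego, cover, alpha=1.0, beta=0.01):
    """Non-blind inverse fusion: the cover image is required."""
    _, Hs, _, _ = haar_dwt2(stego)
    _, Hc, _, _ = haar_dwt2(cover)
    return (Hs - alpha * Hc) / beta
```

Because the forward and inverse Haar steps here are exact inverses, the round trip recovers PSI up to floating-point error; a real implementation must also cope with rounding the stego image to 8-bit pixels.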

Experimental Results and Analysis
In the experiments, we first create the stego images using the implemented steganographic methods, applied to the standard grayscale cover images "Lena", "Peppers", and "Baboon" (512 × 512 pixels), using "Boat" as the secret message with different sizes (embedding rates ranging from 5% to 40%). The six criteria used to evaluate the quality of the stego images are the Peak Signal-to-Noise Ratio (PSNR) [46], the Image Fidelity (IF), the Structural Similarity (SSIM), the entropy (E), the redundancy (R), and the image redundancy (IR). In the corresponding equations, p_c(i, j) and p_s(i, j) are the pixel values at the ith row and jth column of the cover and stego images; M and N are the width and height of the cover image; µ_c and µ_s are the averages of the cover and stego images; σ²_c and σ²_s are their variances; µ_cs is the covariance between the cover and stego images; c_1 = (k_1 L)² and c_2 = (k_2 L)² are two variables used to stabilize the division with a weak denominator; L is the dynamic range of the pixel values; and k_1 and k_2 are two constants much smaller than 1. We used k_1 = k_2 = 0.05.
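The first three criteria can be sketched as follows, using the standard PSNR and IF definitions and a global (single-window) SSIM with k_1 = k_2 = 0.05 as above; the windowed SSIM variant commonly used in practice would average this computation over local patches.

```python
import numpy as np

def psnr(cover, stego):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images."""
    mse = np.mean((cover.astype(float) - stego.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def image_fidelity(cover, stego):
    """IF = 1 - sum((c - s)^2) / sum(c^2)."""
    c = cover.astype(float); s = stego.astype(float)
    return 1.0 - np.sum((c - s) ** 2) / np.sum(c ** 2)

def ssim(cover, stego, k1=0.05, k2=0.05, L=255):
    """Global (single-window) SSIM with the paper's k1 = k2 = 0.05."""
    c = cover.astype(float); s = stego.astype(float)
    mu_c, mu_s = c.mean(), s.mean()
    var_c, var_s = c.var(), s.var()
    cov = ((c - mu_c) * (s - mu_s)).mean()
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    return ((2 * mu_c * mu_s + c1) * (2 * cov + c2)) / \
           ((mu_c ** 2 + mu_s ** 2 + c1) * (var_c + var_s + c2))
```

For identical images these give PSNR = ∞, IF = 1, and SSIM = 1; embedding noise lowers all three.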
The higher the PSNR, IF, and SSIM values, the better the quality of the stego image. PSNR values below 40 dB indicate fairly low quality, so a high-quality stego image should remain above 40 dB.
Additionally, we used three other parameters to estimate the quality of the stego images:
- The entropy E, given by the following relation, where p(P_i) is the probability of the pixel value P_i and L is as defined above.
- The redundancy R, usually represented by the following formula, with E_max = 8. However, this relationship is problematic because the value of the minimal entropy is not known. For this reason, Tasnime [47] proposed the following, more precise relationship, called the Image Redundancy (IR), with:
• S being the size of the image under test;
• R_i being the number of occurrences of each pixel value;
• R_opt being the optimal number of occurrences that each pixel value should have to obtain a non-redundant image.
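The entropy and redundancy measures can be sketched as follows; for IR, the uniform count R_opt = S/256 and the normalization by 2S are our assumptions, since the exact formula of [47] is not reproduced here.

```python
import numpy as np

def entropy(img):
    """Shannon entropy of the gray-level histogram (bits/pixel)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def redundancy(img, e_max=8.0):
    """R = 1 - E / E_max with E_max = 8 for 8-bit images."""
    return 1.0 - entropy(img) / e_max

def image_redundancy(img):
    """IR sketch: mean absolute deviation of each gray level's count from
    the uniform count R_opt = S/256 (assumed), normalized into [0, 1)."""
    s = img.size
    counts = np.bincount(img.ravel(), minlength=256).astype(float)
    r_opt = s / 256.0
    return np.sum(np.abs(counts - r_opt)) / (2.0 * s)
```

A perfectly uniform histogram gives E = 8, R = 0, and IR = 0; a constant image gives E = 0 and maximal redundancy.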
In the following section, we present and compare the performance of the three implemented steganographic methods.

Enhanced EALSBMR
The results obtained from the parameters PSNR, IF, and SSIM for this algorithm are presented in Table 1; their values indicate the high quality of the stego images, even at a high embedding rate of 40%. We observe that the PSNR, IF, and SSIM values decrease, as expected, when the size of the secret message increases. In Figure 9a-c, we show the "Baboon" cover image and the corresponding stego images for 5% and 40% embedding rates, respectively. The visual quality of the "Baboon" stego images is very high: it is visually impossible to discriminate between the cover and stego images. To put these results in perspective, using the "Lena" image as the cover and approximately identical capacity, we compared the PSNR obtained by the EEALSBMR method with that obtained by the methods of [4-6,17]. We observed that only the method proposed by Borislav et al. [17] produces a better PSNR than the EEALSBMR method; however, this method cannot be adapted.

Enhanced DCT Steganographic Method
The results obtained from this method, presented in Table 2, indicate the high quality of the stego images, even at a high embedding rate. The visual quality obtained is also very high, as shown in Figure 10.

Enhanced DWT Steganographic Method
Table 3 presents the results obtained from the EDWT algorithm, which indicate that the steganographic algorithm performs well. Furthermore, no visual trace can be found in the resulting stego images, as shown in Figure 11a-c. Comparing the PSNR, IF, and SSIM values in Tables 1-3 shows that the EEALSBMR and EDCT methods ensure better stego-image quality than the EDWT method at the different embedding rates: there is approximately a 10 dB difference in PSNR at a 5% embedding rate and a 5 to 8 dB difference at a 40% embedding rate.

Performance Using Parameters E, R and IR
The results obtained from the parameters E, R, and IR for the three algorithms on the stego images with different embedding rates are presented in Tables 4-6. As we can see, these values, summarized in Table 7, are very close to the values obtained on the original images. This is consistent with the previous results obtained from the parameters PSNR, IF, and SSIM regarding the high quality of the stego images.

Universal Steganalysis
A good steganographic method should be imperceptible not only to human vision but also to computer analysis. Steganalysis is the art and science of detecting whether a given image has a message hidden in it [1,48]. The extensive range of natural images and the wide variety of data-embedding algorithms make steganalysis a difficult task. In this work, we consider universal steganalysis based on statistical analysis.
Universal (blind) steganalysis attempts to detect hidden information without any knowledge of the steganographic algorithm. The idea is to extract features from cover images and stego images and use them as feature vectors for a supervised classifier (SVM, FLD, neural networks, etc.) to decide whether the image under test is a stego image. This procedure is illustrated in Figure 12: the left side of the flowchart displays the steps of the learning process, while the right side illustrates the steps of the testing process.

Multi-Resolution Wavelet Decomposition
The DWT, implemented with a sub-band coding algorithm, provides a fast computation of the wavelet transform; furthermore, it is easy to implement and reduces the computation time and the resources required. The DWT analyses the signal in different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The decomposition into different frequency bands is achieved by applying separable low-pass ĝ(n) and high-pass ĥ(n) filters along the image axes. The DWT computes the approximation coefficients matrix A and the detail coefficients matrices H, V, and D (horizontal, vertical, and diagonal, respectively) of the input matrix X, as illustrated in Figure 13.

Feature Vector Extraction
As the amount of image data is enormous, it is not feasible to directly use the complete image data for analysis. Therefore, for steganalysis, it is useful to extract a certain amount of useful data features that represent the image instead of the image itself. The addition of a message to a cover image may not affect the visual appearance of the image, but it will affect some statistics. The features required for steganalysis should be able to detect these minor statistical disorders that are created during the data-hiding process.
Three feature-extraction techniques are used in this paper to detect the presence of a secret message; these methods calculate the statistical properties of the images by employing multi-resolution wavelet decomposition.

Method 1: Feature Vectors Extracted from the Empirical Moments of the PDF-Based Multi-Resolution Coefficients and Their Prediction Error
The multi-resolution wavelet decomposition employed here is based on separable quadrature mirror filters (QMFs). This decomposition splits the frequency space into multiple scales and orientations. This is accomplished by applying separable low-pass and high-pass filters along the image axes, generating a vertical, horizontal, diagonal, and low-pass sub-band. The horizontal, vertical, and diagonal sub-bands at scale m = 1, 2, ..., n are denoted as H m , V m and D m .
In our work, the first set of features is extracted from the statistics of the coefficients S_m(x, y) of each sub-band at scales m = 1 to n = 3. These statistics are the mean µ, variance σ², skewness ξ, and kurtosis κ. From Equation (24), we can build the first feature vector Z_s of N_m × N_bd × n = 4 × 3 × 3 = 36 elements, where N_m, N_bd, and n are the number of moments, sub-bands, and scales, respectively. The second set of statistics is based on the errors of an optimal linear predictor of the coefficients S_m(x, y). The sub-band coefficients are correlated with their spatial, orientation, and scale neighbors, and several techniques may be used to predict the coefficients S^p_Hm(x, y), S^p_Vm(x, y), and S^p_Dm(x, y) (m = 1, 2, 3). In this work, we used the linear predictor proposed by Farid in [30]. For clarity, Figure 14 provides the block diagram for the prediction of the coefficient S^p_V1(x, y). The parameters w_i (scalar weighting values) of each sub-band at a given level m are adjusted to minimize the prediction error by minimizing a quadratic error function, where the columns of the matrix Q contain the neighboring coefficient magnitudes specified in Equations (25)-(27). The quadratic error function is minimized analytically. For the optimal predictor, we use the log error p_m = log_2(S_m) − log_2(|Q w_opt|) to obtain the prediction-error coefficients of each sub-band at a given level m. Using Equation (31), additional statistics are collected, namely the mean, variance, skewness, and kurtosis (see Equation (24)); the resulting feature vector Z_p is analogous to Z_s. Finally, the feature vector used by the learning classifier is Z = [Z_s | Z_p].
It contains 72 components.
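The first feature set Z_s can be sketched as follows, using a plain Haar decomposition in place of the QMF filters of [30] and omitting the prediction-error features Z_p; only the 36-element moment vector over H, V, and D at three scales is shown.

```python
import numpy as np

def haar_subbands(img, levels=3):
    """Per level, return the (H, V, D) detail sub-bands of a Haar
    decomposition (a stand-in for the paper's QMF decomposition)."""
    out = []
    x = img.astype(float)
    for _ in range(levels):
        a = x[0::2, 0::2]; b = x[0::2, 1::2]
        c = x[1::2, 0::2]; d = x[1::2, 1::2]
        A = (a + b + c + d) / 4.0
        out.append(((a + b - c - d) / 4.0,   # H
                    (a - b + c - d) / 4.0,   # V
                    (a - b - c + d) / 4.0))  # D
        x = A  # recurse on the approximation
    return out

def moments(s):
    """Mean, variance, skewness, kurtosis of one sub-band."""
    s = s.ravel()
    mu, var = s.mean(), s.var()
    sd = np.sqrt(var) if var > 0 else 1.0
    z = (s - mu) / sd
    return [mu, var, np.mean(z ** 3), np.mean(z ** 4)]

def farid_features(img):
    """36-element vector Z_s: 4 moments x 3 sub-bands x 3 scales."""
    feats = []
    for H, V, D in haar_subbands(img, 3):
        for s in (H, V, D):
            feats.extend(moments(s))
    return np.array(feats)
```

The prediction-error vector Z_p would repeat the same moment computation on the log errors p_m, doubling the vector to 72 components.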

Method 2: Feature Vectors Extracted from Empirical Moments of the CF-Based Multi-Resolution Decomposition
The first set of feature vectors Z_s is extracted from the CF and the wavelet decomposition, as proposed by Shi et al. [31]. The statistical moments of the characteristic function φ(k), of order n = 1 to 3, are computed for each sub-band (A_m, H_m, V_m, D_m) at levels m = 1, 2, and 3 of the wavelet decomposition, where φ(k) is a component of the characteristic function at frequency k, calculated from the histogram of the sub-band S_m, and N is the total number of points of the histogram. Equation (32) thus builds the first feature vector Z_s from 12 × 3 = 36 components plus the three moments M_1^I, M_2^I, M_3^I of the initial image.
The second category of features is calculated from the moments of the prediction-error image and its wavelet decomposition.
Prediction-error image: in steganalysis, we only care about the distortion caused by data-hiding. This distortion may be rather weak and, hence, masked by other types of noise, including noise due to the peculiar features of the image itself. To make the steganalysis more effective, it is necessary to keep the noise due to the hiding process and eliminate most of the other noise. For this purpose, we calculate the moments of the characteristic functions of order n = 1 to 3 of the prediction-error image and of its wavelet decomposition at levels m = 1, 2, and 3 (see Equation (32)). The prediction-error image is obtained by subtracting the predicted image, in which each pixel grayscale value is predicted from its neighboring pixels' grayscale values (see Equation (34)), from the cover image. Such features make the steganalysis more efficient because the hidden data is usually unrelated to the cover medium. In Equation (34), a, b, and c form the context of the pixel x under consideration, and x̂ is the prediction value of x; the locations of a, b, and c are illustrated in Figure 15. The feature vector Z_p is built from the 1st, 2nd, and 3rd order moments of the corresponding CFs, starting with the sub-band A_1 of the first-level decomposition of the error image.
Finally, the feature vector used for learning and classification is Z = [Z_s | Z_p], containing 78 components.
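The CF moments of one sub-band can be sketched as follows, computing φ(k) as the DFT of the sub-band histogram; the exact normalization of Equation (32) is our reading of Shi et al.

```python
import numpy as np

def cf_moments(subband, n_max=3):
    """First n_max statistical moments of the characteristic function,
    taken as the DFT magnitude of the sub-band histogram. Moments are
    normalized by the total CF mass over the half-spectrum (assumed)."""
    hist, _ = np.histogram(subband.ravel(), bins=256)
    phi = np.abs(np.fft.fft(hist))   # |CF| at discrete frequencies k
    half = len(phi) // 2
    k = np.arange(1, half)           # skip the DC component k = 0
    mag = phi[1:half]
    denom = mag.sum()
    return [float((k ** n * mag).sum() / denom) for n in range(1, n_max + 1)]
```

Applying this to each of the 12 sub-bands (plus the image itself and the prediction-error pyramid) and concatenating the moments yields the 78-component vector described above.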

Method 3: Feature Vector Extracted from Empirical Moments Based on the CF and the PDF of the Image Prediction Error and Its Sub-Bands of the Multi-Resolution Decomposition
The first feature vector Z_s combines two types of normalized moments: moments based on the probability density function and moments based on the characteristic function of the various sub-bands of the three-level multi-resolution decomposition of the gray image. We use the expression of Wang and Moulin [32] to calculate the moments of order n = 1 to 6 of the initial image and its sub-bands (A_m, H_m, V_m, D_m) of the three-level (m = 1 to 3) wavelet decomposition, where φ(k) is a component of the characteristic function at frequency k, estimated from the histogram. Equation (35) yields a feature vector of 6 × 1 + 6 × (4 × 3) = 78 components. In addition, to improve the performance of the learning system, we calculate the moments of the sub-bands A_2, H_2, V_2, D_2 obtained from the decomposition of the diagonal sub-band D_1. Therefore, the total size of the vector Z_s is 78 + (6 × 4) = 102 components. The second category of characteristics consists of the first six moments of the prediction error p_m = log_2(S_m) − log_2(|Q w_opt|) of the coefficients of each sub-band at a given level m. The corresponding vector Z_p contains these moments for each m = {1, 2, 3} and i = 1, 2, ..., 6; its size is 3 × 6 × 3 = 54 components.
Finally, the feature vector used for classification by learning is Z = [Z_s | Z_p]. It has 156 components.

Classification
The last stage of the learning and testing process of universal steganalysis is classification (see Figure 12). Its objective is to group the images into two classes, the class of cover images and the class of stego images, according to their feature vectors. We adopt the Fisher Linear Discriminant (FLD) and the Support Vector Machine (SVM) for training and testing.

FLD Classifier
Below, we reformulate the FLD classifier for our application with two classes. Let Z = {Z_1, Z_2, ..., Z_N} be a set of feature vectors, each of dimension nd. Among these vectors, N_1 vectors Z_c are labeled 1, indicating cover images, and N_2 vectors Z_s are labeled 2, indicating stego images, with N = N_1 + N_2. We form the N projection values Z_p = {Z_p1, Z_p2, ..., Z_pN} through linear combinations of the feature vectors, where W is an orientation vector of dimension nd.
In our study, the feature vector Z is projected onto a one-dimensional space separating the two classes. This projection tends to maximize the distance between the projected class means (M_cp, M_sp) while minimizing the projected class scatters S_cp, S_sp.

• Learning process
The learning process involves optimizing the Fisher criterion, which is expressed in terms of the mean feature vectors of the cover and stego classes (each of dimension nd) and their projections (M_cp, M_sp), the scatter matrices (of dimension nd × nd) of the cover and stego classes in the original feature space and after projection (S_cp, S_sp), the within-class scatter matrix S_W, and the between-class scatter matrix S_B obtained from the difference between the projected means. The Fisher criterion (Equation (39)) can finally be expressed in terms of S_B and S_W; the solution of Equation (52) is given in [49].
• Testing process
The testing process (classification step) is conducted as follows. Let Z be the matrix containing the feature vectors of cover and stego images. The projection of Z onto the orientation vector W_opt gives all projected values Z_p. A threshold b discriminates between the two classes; it can be fixed to the value halfway between the projected averages of the cover and stego classes. Here, W^t_opt is the transpose of W_opt. The result Z_p(j), j = 1, ..., N, determines the cover or stego class of each test image: if Z_p(j) ≥ 0, the image under test is a cover image; otherwise, it is a stego image.
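The FLD learning and testing steps can be sketched as follows, using the standard closed-form solution W_opt ∝ S_W^{-1}(M_c − M_s) referenced by [49] and the midway threshold b described above.

```python
import numpy as np

def fld_train(Zc, Zs):
    """Fisher linear discriminant for two classes.
    Zc, Zs: (n_samples, nd) feature matrices of cover / stego images.
    Returns the orientation vector W_opt and the threshold b."""
    mc, ms = Zc.mean(axis=0), Zs.mean(axis=0)
    # Within-class scatter: sum of per-class scatter matrices.
    Sw = np.cov(Zc.T, bias=True) * len(Zc) + np.cov(Zs.T, bias=True) * len(Zs)
    w = np.linalg.solve(Sw, mc - ms)  # W_opt, up to scale
    b = -0.5 * (w @ mc + w @ ms)      # midway between projected means
    return w, b

def fld_classify(Z, w, b):
    """Projected values: >= 0 -> cover, < 0 -> stego (the paper's rule)."""
    return Z @ w + b
```

On two well-separated synthetic clusters this recovers the expected decision rule; with real feature vectors, S_W may be near-singular and is typically regularized before inversion.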

SVM Classifier
According to numerous recent studies, the SVM classification method outperforms other data classification algorithms in terms of classification accuracy [50]. SVM performs classification by constructing a hyperplane that separates the data into two categories in an optimal way. Let (Z_i, y_i), 1 ≤ i ≤ N, be a set of training examples, where each example Z_i ∈ R^nd (nd being the dimension of the input space) belongs to a class labeled y_i ∈ {−1, 1}. SVM classification constructs a hyperplane W^T Z + b = 0 that best separates the data through a minimization process. The variables ζ_i are called slack variables; they measure the error made at point (Z_i, y_i).
The parameter C can be viewed as a way to control overfitting: ζ_i ≥ 0, and C > 0 sets the trade-off between regularization and constraint violation. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them. Solutions involve constructing a dual problem in which a Lagrange multiplier α_i, also known as a support value, is associated with every constraint of the primal problem. The linear classifier presented above is very limited: in most cases, classes not only overlap, but the genuine separation functions are non-linear hyper-surfaces. The motivation for the following extension is that an SVM that can create a non-linear decision hyper-surface is able to classify non-linearly separable data.
The idea is that the input space can always be mapped onto a higher-dimensional feature space where the training set is separable.
The linear classifier relies on the dot product between vectors, K(Z_i, Z_j) = Z_i^T Z_j. If every data point is mapped onto a high-dimensional space via some transformation Φ: Z → φ(Z), the dot product becomes K(Z_i, Z_j) = φ(Z_i)^T φ(Z_j). In the dual formulation, we then maximize the corresponding objective, and the decision function follows. Note that the dual formulation only requires access to the kernel function and not the features Φ(·), allowing one to efficiently solve the formulation in very high-dimensional feature spaces. This is known as the kernel trick.
There are many kernel functions in SVM, so selecting a good kernel function is itself a research issue. For general purposes, however, some popular kernel functions [50,51] are the following:
• Linear kernel: K(Z_i, Z_j) = Z_i^T Z_j;
• Polynomial kernel: K(Z_i, Z_j) = (γ Z_i^T Z_j + r)^d;
• RBF kernel: K(Z_i, Z_j) = exp(−γ‖Z_i − Z_j‖²);
• Sigmoid kernel: K(Z_i, Z_j) = tanh(γ Z_i^T Z_j + r).
Here, γ, r, and d are kernel parameters. In our work, we used the RBF kernel function.
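A minimal sketch of RBF-kernel SVM classification, using scikit-learn's `SVC` on synthetic Gaussian data as hypothetical stand-ins for cover/stego feature vectors (the actual Farid, Shi, and Moulin features and dataset are not reproduced here):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical 24-D feature vectors: class -1 mimics "cover", class +1 "stego"
cover = rng.normal(0.0, 1.0, size=(100, 24))
stego = rng.normal(1.5, 1.0, size=(100, 24))
X = np.vstack([cover, stego])
y = np.array([-1] * 100 + [1] * 100)

# Soft-margin SVM with the RBF kernel; C sets the regularization/violation
# trade-off, gamma is the RBF kernel parameter
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
train_acc = clf.score(X, y)  # fraction of training points classified correctly
```

In a real steganalysis experiment, C and γ would be tuned by cross-validation on held-out images rather than evaluated on the training set as in this toy example.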

Experimental Results of Steganalysis
In this section, we present some experimental results obtained from the studied steganalysis system applied to the enhanced steganographic methods in the spatial and frequency domains. For this purpose, the UCID image dataset [52,53] is used, which includes 1338 uncompressed color images; all the images were converted to grayscale before conducting the experiments.
In our experiments, we first created the stego images using the following steganographic methods: Enhanced EALSBMR (EEALSBMR), Enhanced DCT steganography (EDCT), and Enhanced DWT steganography (EDWT). We used these methods with different embedding rates of 5%, 10%, and 20%. Following this, we extracted the image features using the three feature-extraction techniques described above (the Farid, Shi, and Moulin techniques) for both the cover and stego images. Finally, we employed the FLD and SVM classifiers to classify the images as either containing a hidden message or not. The evaluation of the classification (binary classification) and of the steganalysis (and, indirectly, of the efficiency of the insertion methods) is performed by calculating the following parameters: the sensitivity (Se), specificity (Sp), and precision (Pr) of the confusion matrix, together with the Kappa coefficient (see Table 8 and Equation (64)): Kappa = (P_0 − P_a) / (1 − P_a). In the above equation, P_0 is the total agreement probability (related to the accuracy), and P_a is the agreement probability that arises by chance.
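A short sketch of these evaluation measures, computed from the four cells of a binary confusion matrix (the function name and cell layout are illustrative, not from the paper):

```python
def steganalysis_metrics(tp, fn, fp, tn):
    # tp: stego images correctly flagged, tn: cover images correctly cleared
    # (hypothetical layout: "stego" is the positive class)
    n = tp + fn + fp + tn
    se = tp / (tp + fn)   # sensitivity: detected stego / all stego
    sp = tn / (tn + fp)   # specificity: cleared cover / all cover
    pr = tp / (tp + fp)   # precision: true stego among all flagged
    p0 = (tp + tn) / n    # total agreement probability (accuracy)
    # chance agreement P_a, from the marginals of the confusion matrix
    pa = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p0 - pa) / (1 - pa)
    return se, sp, pr, p0, kappa

# A near-chance classifier, as observed for EEALSBMR: Se/Sp/Pr around 50%
se, sp, pr, p0, kappa = steganalysis_metrics(tp=52, fn=48, fp=49, tn=51)
```

For these counts, Kappa is about 0.03, illustrating how accuracy near 50% maps to a Kappa near zero.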
Here is one possible interpretation of Kappa values: a coefficient below 0.2 indicates only slight agreement beyond chance, while values close to 1 indicate almost perfect agreement. In Tables 9-14, we present the classification results (steganalysis) based on the FLD and SVM classifiers and the Farid, Shi, and Moulin features for the EEALSBMR insertion method with insertion rates of 5%, 10%, and 20%. The results show that steganalysis is not effective at any insertion rate. Indeed, the values of Se, Sp, and Pr vary around 50%, so they are not informative and give no indication about the nature of the data. The value of the Kappa coefficient (lower than 0.2) confirms these results. The EEALSBMR steganographic method is therefore robust against statistical steganalysis techniques.

Classification Results Applied to the Steganographic Method EDCT
The classification results (steganalysis) provided in Tables 15-20 for the EDCT insertion method show that, with the FLD classifier, when the insertion rate is equal to or higher than 20%, steganalysis is very effective with the Shi and Moulin features but less effective with the Farid features. With the SVM classifier, except for the Shi features at an insertion rate of 20%, the results obtained are quite similar to those obtained for the EEALSBMR algorithm and, therefore, steganalysis is not effective. It should be noted that the FLD classifier is more effective than the SVM classifier for high-dimensional feature vectors.
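For comparison with the SVM, the Fisher Linear Discriminant can be sketched in a few lines of NumPy; the data here is synthetic and hypothetical (Gaussian stand-ins for cover/stego feature vectors), not the paper's actual features.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical 10-D feature vectors: class 0 mimics "cover", class 1 "stego"
X0 = rng.normal(0.0, 1.0, size=(200, 10))
X1 = rng.normal(1.0, 1.0, size=(200, 10))

def fld_train(X0, X1):
    # FLD projection direction: w = S_w^{-1} (mu1 - mu0),
    # where S_w is the within-class scatter (sum of the class covariances)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, mu1 - mu0)
    t = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())  # midpoint threshold
    return w, t

w, t = fld_train(X0, X1)
pred = np.concatenate([X0 @ w > t, X1 @ w > t])    # True means "stego"
truth = np.array([0] * 200 + [1] * 200)
accuracy = (pred == truth).mean()
```

Unlike the kernelized SVM, this classifier is purely linear, but it requires no quadratic optimization, which is one reason it remains practical for high-dimensional feature vectors.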

Classification Results Applied to the Steganographic Method EDWT
With respect to the EDWT method, the results are provided in Tables 21-26. These results, obtained with the FLD and SVM classifiers, indicate that the values of the parameters Se, Sp, Pr, Ac, and Kappa are high for all insertion rates and feature vectors (Farid, Shi, and Moulin). These results can easily reveal the presence of hidden information; therefore, steganalysis can be concluded to be very effective, and, as a result, the insertion method is not robust. It should be noted that steganalysis is very effective here because both the steganographic method and the feature vectors are based on multi-resolution wavelet decomposition.

Discussion
The enhanced steganographic methods in the spatial domain (the adaptive LSB method EEALSBMR) and the frequency domain (EDCT and EDWT) provide stego images with good visual quality up to an embedding rate of 40%: the PSNR is over 50 dB, and the distortion is not visible to the naked eye. The security of the message contents, in case they are detected by an opponent, is ensured by the chaotic system. On the other hand, we applied a universal steganalysis method that can work with all known and unknown steganography algorithms. Universal steganalysis methods exploit the changes that message embedding induces in certain inherent features of the cover images. The classification accuracy of the system (discrimination between the two classes, cover and stego) relies greatly on several factors, such as the choice of the right feature vectors, the classifier, and its parameters.

Conclusions
In this work, we first improved the structure and security of three steganographic methods in the spatial and frequency domains by integrating them with a robust proposed chaotic system. Following this, we built a statistical steganalysis system to evaluate the robustness of the three enhanced steganographic methods. In this system, we selected three different feature vectors, namely higher-order statistics of high-frequency wavelet sub-bands and their prediction errors; statistical moments of the characteristic functions of the prediction-error image, the test image, and their wavelet sub-bands; and both empirical PDF moments and the normalized absolute CF. We then applied two types of classifiers, namely FLD and SVM with the RBF kernel.
Extensive experimental work has demonstrated that the proposed steganalysis system, based on the multi-dimensional feature vectors, can detect messages hidden with the EDWT steganographic method, irrespective of the message size. However, it cannot distinguish between cover and stego images produced by the EEALSBMR and EDCT steganographic methods if the message size is smaller than 20% and 15%, respectively.