A Fast Selection Based on Similar Cross-Entropy for Steganalytic Feature

Abstract: The mutual confrontation between image steganography and steganalysis causes both to iterate continuously. As a result, the dimensionality of steganalytic features keeps increasing, leading to a growing spatio-temporal overhead. To this end, this paper proposes a fast steganalytic feature selection method based on similar cross-entropy. Firstly, the properties of cross-entropy are investigated through the discussion of different models, and the intra-class similarity criterion and inter-class similarity criterion based on cross-entropy are presented for the first time. Then, referring to the design principles of Fisher's criterion, the criterion of feature contribution degree is further proposed. Secondly, the variation of the cross-entropy function of a univariate variable is analyzed in principle, thus determining the normalized range and simplifying the subsequent analysis. Then, within the normalized range, the variation of the cross-entropy function of a binary variable is investigated and the setting of important parameters is determined. Thirdly, the concept of similar cross-entropy is presented by analyzing the changes in the value of the feature contribution measure under different circumstances, and based on this, the criterion for the feature contribution measure is updated to decrease the complexity of the calculation. Notably, the contribution measure criterion devised in this paper has a symmetrical structure, which equitably measures the contribution of features in different situations. Fourth, the feature component with the highest contribution is selected as the final selected feature based on the result of the feature metric. Finally, feature selection is carried out on 8 kinds of low- and high-dimensional steganalytic features based on the Bossbase 1.01 image database, a standard and widely recognized database in steganalysis.
Through extensive experiments and comparison with several classic and state-of-the-art methods, the method designed in this paper attains competitive or even better performance in detection accuracy, calculation cost


Introduction
Image steganography [1][2][3][4] refers to the use of an algorithm to embed secret information in an image for covert communication. In this regard, the original image is called the cover image and the image embedded with secret information is called the stego image. On the one hand, steganography ensures the privacy of subscribers and special confidential units, but on the other hand, it can be used by unscrupulous elements to compromise public security [5][6][7]. This has led to the advent of steganalysis [8][9][10][11], which uses feature extraction algorithms to analyze image characteristics and then uses classifiers to distinguish between cover images and stego images, thereby safeguarding national and public security. Common steganalytic features include the 548-D CC-PEV feature (the PEV features enhanced by Cartesian Calibration) proposed by Jan et al. [11], the 8000-D DCTR feature (Discrete Cosine Transform Residual feature) proposed by Holub et al. [12], the 17000-D GFR feature (JPEG Rich model utilizing Gabor Filters) proposed by Song et al. [9], the 22510-D CC-JRM feature (Cartesian Calibrated JPEG domain Rich Model) proposed by Kodovský et al. [13], and the 34671-D SRM feature (full Spatial domain Rich Model) proposed by Fridrich et al. [14].
Nevertheless, along with the rapid development of adaptive steganography [15][16][17], in order to improve the detection accuracy of stego images, steganalysis needs to extract features from different scales and orientations [18][19][20], leading to an increasing number of feature dimensions [13,21,22], resulting in huge computational and storage overheads.
To efficiently solve the problem of distinguishing stego images from cover images, researchers have devised feature selection methods [23][24][25]. Depending on the application of the methods, the existing feature selection methods can be divided into: specific feature selection methods and general selection methods.
The specific feature selection method [25,26] means that the algorithm only works on one or a few steganalytic features and generalizes weakly. For example, Yang et al. [25] proposed a feature subspace selection method based on Fisher's criterion (SSFC). It first calculates the Fisher value and probability value of individual feature components, then calculates the weight of each feature component, and finally selects the feature components whose probability value is proportional to the weight as the final selected features, which improves the detection accuracy of GFR features to a certain extent. Yu et al. [26] proposed a multi-scale feature selection method for the steganalytic feature GFR (SRGS). It first uses the SNR criterion to measure the uselessness of features and removes the useless features, then uses an improved Relief algorithm to measure the importance of features, and finally takes the important features as the final selected features. The experiments verified that the algorithm reduced a certain number of dimensions and improved detection accuracy at the same time.
The generic feature selection method [24,27–32] means that the algorithm achieves excellent results for most existing steganalytic features. For example, Qin et al. [27] devised a principal component analysis-based feature selection method (PCA-based), which first calculates the mean value of each feature component, and then calculates the covariance matrix as well as the eigenvalues. It then arranges the feature components in descending order according to the eigenvalues, and finally selects the feature components of a specified dimension as the final selected features according to the requirements. Wang et al. [28] devised a comprehensive criterion-based feature selection method (CGSM), which is guided by the disparity function and Pearson coefficients. It first selects the feature components with large disparity and, on this basis, removes the interference of redundant features, reducing the feature dimensionality while slightly improving the detection accuracy for stego images. Ma et al. [24] proposed a feature selection method based on decision rough set α-positive domain approximation. It first applies rough set theory to steganalytic feature selection, then uses the attribute separability measure (ASM) criterion to measure the divisibility of feature components, which is extended to measure the divisibility of feature vectors, and finally selects the dominant features based on the classifier. The experiments demonstrated that the method significantly reduced the dimensionality of some features.
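The PCA-style reduction described above (mean, covariance matrix, eigenvalues sorted in descending order, keep a specified dimension) can be sketched as follows. This is a minimal illustration of standard PCA, not necessarily the exact procedure of Qin et al. [27]; the function name `pca_reduce` and the toy data are assumptions.

```python
import numpy as np

def pca_reduce(features, k):
    """Project an (M, N) feature matrix onto its top-k principal components.

    Hypothetical helper: rows are images, columns are feature components.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)      # subtract per-component mean
    cov = np.cov(centered, rowvar=False)             # (N, N) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]            # indices of the k largest eigenvalues
    return centered @ eigvecs[:, order], eigvals[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(50, 6))                   # 50 toy "images", 6 components
    reduced, kept = pca_reduce(toy, 3)
    print(reduced.shape, kept)
```

Note that this projects onto new axes rather than keeping original feature components, which is one way such a method differs from component-wise selection.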
Even though the above feature selection methods have attained certain results, problems remain: the feature dimensionality is still high, the selection time is still long and the detection accuracy is still low, which limits their application in practice [33][34][35].
In order to solve the above problems, this paper devises a fast selection method for steganalytic features based on similar cross-entropy (FSCE). Specifically, firstly, the properties of cross-entropy are investigated, and the intra-class similarity criterion and inter-class similarity criterion are proposed, followed by a feature contribution measure criterion modeled on Fisher's criterion. Secondly, the variation of the cross-entropy functions of univariate and binary variables is analyzed separately to determine the setting of important parameters. Finally, the feature components with high contribution are selected as the final selected features based on the results of the feature metric.
Remarkably, the feature selection in image steganalysis differs from the regular feature selection method with two manifestations: on the one hand, we use two symmetrical images, i.e., cover and stego images, during training and testing. On the other hand, in feature selection, we analyze two sets of symmetrical features, i.e., cover and stego features, for the convenience of calculation.
In order to verify the effectiveness and efficiency of FSCE, a large number of experiments are carried out on the Bossbase 1.01 image database [36] (which contains 10,000 grayscale images of size 512 × 512), which is the only standard and recognized image base in the steganalysis field. Firstly, a comparison was made between the features selected under different thresholds to determine the correctness of the final selection threshold in this paper. Secondly, a comparison was made with the original steganalytic features as well as randomly selected features of the same dimension. Finally, a comparison was made with several classical and state-of-the-art fast feature selection methods. The effectiveness, efficiency and generality of FSCE are verified by these experiments.
The rest of this paper is organized as follows. Section 2 introduces the related work. In Section 3, the variation of univariate and binary cross-entropy is investigated and analyzed, the concept of similar cross-entropy is devised for the first time, and the FSCE method is introduced. A series of comparative experiments are conducted in Section 4 to verify the effectiveness, efficiency and generality of FSCE. Finally, a summary of the whole paper is presented.

Materials and Methods
To devise an appropriate and rational feature selection method, in this section, a bit of background knowledge needed for the methods in this paper is presented. In particular, Section 2.1 introduces the Fisher criterion, which is the most popular today. In addition, Section 2.2 presents the concept of information entropy, from which the principle of cross-entropy and its properties are introduced and presented.

Fisher Criterion
The Fisher criterion was first introduced to the field of image steganalysis by Yang et al. [25]. It uses the idea of "intra-class aggregation and inter-class dispersion" to measure the separability of feature components for the selection of important features. The feature selection process requires measuring the performance of each feature component, which is related to the rate of change of feature values and the statistical dispersion in different feature classes. The Fisher criterion, the classical method for measuring feature discrimination in pattern recognition, considers not only the deviation between different classes of images but also the dispersion of features within each class of images. As a result, it is widely used in feature selection [25,29]. The formula is as follows.
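For illustration, a common textbook form of a per-component Fisher score is $F = (\mu_c - \mu_s)^2 / (\sigma_c^2 + \sigma_s^2)$, where the subscripts denote the cover and stego classes. The sketch below assumes this form; it is not necessarily the exact expression used in [25], and the function name, the small `eps` guard and the toy data are assumptions.

```python
import numpy as np

def fisher_scores(cover, stego, eps=1e-12):
    """Per-component Fisher score for two (M, N) feature matrices.

    Assumed textbook form: squared mean gap over the sum of class variances.
    `eps` is a hypothetical guard against division by zero.
    """
    cover = np.asarray(cover, dtype=float)
    stego = np.asarray(stego, dtype=float)
    gap = (cover.mean(axis=0) - stego.mean(axis=0)) ** 2       # inter-class dispersion
    spread = cover.var(axis=0) + stego.var(axis=0)             # intra-class dispersion
    return gap / (spread + eps)

if __name__ == "__main__":
    cover = np.array([[0.0, 1.0], [2.0, 1.2]])
    stego = np.array([[10.0, 1.1], [12.0, 1.3]])
    # The first component separates the classes far better than the second.
    print(fisher_scores(cover, stego))
```

A larger score under this form indicates a component whose class means are far apart relative to the within-class spread.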

 
$H(P,Q)$ represents the cross-entropy of $P$ to $Q$, $p_i$ represents the $i$th value of $P$, and $q_i$ represents the $i$th value of $Q$. Remarkably, the larger the $H(P,Q)$, the less information there is about the difference between $P$ and $Q$.
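As a small numerical aid, the sketch below evaluates cross-entropy under the usual discrete definition $H(P,Q) = -\sum_i p_i \ln q_i$. The natural logarithm and this exact form are assumptions here, and the function name and toy vectors are hypothetical. The example mainly shows that cross-entropy is not symmetric in its two arguments, which is why a criterion may need both the $P$-to-$Q$ and the $Q$-to-$P$ direction.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * ln(q_i), assuming all q_i > 0."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing and are skipped.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print("H(P,Q) =", cross_entropy(p, q))
    print("H(Q,P) =", cross_entropy(q, p))   # differs from H(P,Q)
```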

FSCE
Feature selection, also known as feature dimensionality reduction, aims to reduce the number of features while maintaining or even improving the detection accuracy of stego images and speeding up feature selection. In order to measure the contribution of a single feature component, this paper considers the cross-entropy principle as a guide to constructing a similar cross-entropy-based feature contribution criterion. Among them, Section 3.1 devises a criterion for the feature contribution measure, which provides a sound basis for the selection of feature components with a high contribution. Section 3.2 discusses the problem of setting some important parameters. In addition, Section 3.3 presents the overall process of the algorithm with performance analysis. Section 3.4 illustrates the advantages of the FSCE.

Contribution Probing
Drawing on the idea of the construction for Fisher's criterion, in this section, we attempt to construct intra-class similarity and inter-class similarity criteria for the feature components utilizing the cross-entropy principle. Specifically, Section 3.1.1 introduces some important notations in this paper. Section 3.1.2 proposes the construction of the intra-class similarity criterion. Section 3.1.3 introduces the construction of the inter-class similarity criterion, and Section 3.1.4 introduces the criterion for the feature contribution measure.

Symbol Description
represents all steganalytic feature in this case.  i.e., the more it should be retained.

Parameter Setting
For Equation (8) proposed in this paper, in order to satisfy the idea of "intra-class aggregation and inter-class dispersion", we must set the parameters appropriately to ensure that the inter-class similarity does not conflict with the intra-class similarity, so as to better determine the extent to which a feature component contributes to the classification.
To this end, we first investigated the variation of the simpler monadic function . Based on this statistic, and for simplicity of calculation, we intend to normalize the steganalytic feature values via Equation (13) to restrict them to between , H X X is monotonically increasing. From this, it is given that is monotonically increasing. Then we just need to make sure that  to perfectly satisfy the "intra-class aggregation, inter-class dispersion" principle.
where $f_i$ represents the $i$th feature component. Then, on the basis of the above normalization, we investigated the variation of the binary function; only two cases exist. Subsequently, we set the change step for both $X$ and $Y$ to 0.01 and plotted Figure 1 using meshgrid(), subplot() and surf() in Matlab 2016.
certain. Yet when X and Y increase simultaneously, is not monotonically decreasing, i.e., it does not satisfy our objective. Then when 1 1 is, when X is certain. Yet when X and Y increase simultaneously, is not monotonically decreasing, i.e., such a case again does not satisfy our objective.
To this end, after reviewing the literature, we found that one could try to replace   ln x with , which led to the concept of similar cross-entropy, as shown in Equation (14). Correspondingly, Equations (8) and (10) can be transformed into Equations (15) and (16), respectively.
where $SH(P,Q)$ represents the similar cross-entropy between $P$ and $Q$. Since the resulting function is monotonically decreasing, it fully meets the fundamental requirements of our design guidelines.
where, M represents the total number of cover/stego images, and M/2 represents the number of cover/stego images used for testing. Notably, the smaller the value of , the more important it is to be selected.

Overall Process and Performance Analysis
The FSCE algorithm principally consists of the following steps. Firstly, normalize the feature values via Equation (13) to restrict their range. Secondly, calculate the similar cross-entropy of cover to stego and stego to cover, respectively, using Equation (16). To illustrate the working principle of the FSCE method in more detail, we give a specific algorithm based on the major steps outlined above, which is shown in Algorithm 1.
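The overall flow (normalize, score each feature component, keep the best-ranked components) can be sketched as below. This is only a skeleton under stated assumptions: the scoring function of Equation (16) is not reproduced and is passed in as a hypothetical `score_fn`; the min-max normalization and the bounds `low` and `high` are placeholders for Equation (13); and components with smaller scores are kept, following the statement that a smaller measure value marks a more important component.

```python
import numpy as np

def min_max_normalize(features, low, high):
    """Rescale each column of an (M, N) matrix to [low, high] (assumed normalization)."""
    features = np.asarray(features, dtype=float)
    fmin = features.min(axis=0)
    span = features.max(axis=0) - fmin
    span[span == 0] = 1.0                      # constant columns map to `low`
    return low + (features - fmin) / span * (high - low)

def select_features(cover, stego, score_fn, keep_ratio=0.58, low=0.1, high=1.0):
    """Return sorted indices of the kept feature components.

    score_fn(cover_column, stego_column) -> float is a hypothetical stand-in
    for the paper's contribution measure; smaller is treated as better.
    """
    stacked = np.vstack([cover, stego])
    normed = min_max_normalize(stacked, low, high)
    c, s = normed[:len(cover)], normed[len(cover):]
    scores = np.array([score_fn(c[:, j], s[:, j]) for j in range(c.shape[1])])
    n_keep = int(round(keep_ratio * c.shape[1]))
    return np.sort(np.argsort(scores)[:n_keep])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cover = rng.normal(size=(20, 5))
    stego = cover.copy()
    stego[:, 1] += 5.0                         # only components 1 and 3 change
    stego[:, 3] -= 5.0
    toy_score = lambda c, s: -abs(c.mean() - s.mean())   # toy measure, not Equation (16)
    print(select_features(cover, stego, toy_score, keep_ratio=0.4))
```

Passing the measure as a function keeps the ranking and selection logic independent of the particular criterion being evaluated.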

The Merits of FSCE
The merits of the FSCE method can be summarized as follows. Firstly, an innovative improvement in the normalization range simplifies the algorithm design process. Compared to the traditional method of normalizing features to [0,1], the normalization range used in this paper makes $H(X,X)$ monotonically increasing in the interval, simplifying the subsequent analysis and making it easier to determine the parameters. Secondly, cross-entropy applicable to image steganalysis is investigated for the first time and a feature contribution measure criterion similar to Fisher's criterion is constructed. Based on the advantage that cross-entropy can determine the difference information of two classes, we classify the models of different cases and then propose the intra-class similarity criterion and inter-class similarity criterion. On this basis, we further propose the feature contribution measure criterion with reference to the design principle of Fisher's criterion.
Thirdly, the concept of similar cross-entropy is theoretically proposed and proved, based on which the complexity of the calculation is considerably reduced. In determining the values of the parameters, we analyzed the variation of the feature contribution measure under different situations and found that it did not meet the original intention of our design. For this reason, after searching for a large amount of literature, we proposed the concept of similar cross-entropy and updated the feature contribution measure criterion based on it. The analysis revealed that the new feature contribution measure criterion not only meets the requirements of the design but also decreases the computational complexity by changing the logarithmic operation into an exponential operation.
Fourth, FSCE is more general. The experiments in Section 4 show that the FSCE method is relatively effective in selecting many steganalytic features, attaining the goal of reducing the feature dimensionality by 40% while maintaining or even improving the detection accuracy of stego images.
Finally, the FSCE method has a low time complexity. The performance analysis in Section 3.3 shows that the time complexity $O(NM)$ of the FSCE method is significantly lower than the time complexity of the integrated classifier-based feature selection method. FSCE is therefore more efficient, which enables it to be used in time-critical applications.

Experiment
To verify the performance of the FSCE method proposed in this paper, in this section, we conduct experiments on the selection of different image steganalytic features. Specifically, Section 4.1 describes the experimental setup; Section 4.2 compares the features selected with different thresholds to determine the correctness of the final selection threshold in this paper; Section 4.3 compares FSCE with the original features as well as randomly selected features to verify the effectiveness of FSCE; and finally, Section 4.4 compares FSCE with several classical and state-of-the-art feature selection methods to verify the efficiency and generality of FSCE.
All experiments in this paper were performed in Matlab 2016B. All algorithms were executed on a PC with 4 Intel(R) Core(TM) i7-8700 @ 3.20 GHz CPUs and 8 GB of memory.

Experiment Setup
The images used in this paper are taken from the only recognized image library in steganography and steganalysis, BOSSbase 1.01 (http://dde.binghamton.edu/download/ImageDB/BOSSbase_1.01.zip, accessed on 7 November 2014), which contains 10,000 grey-scale images. To acquire the image steganalytic features, we performed the following operations on the downloaded images.
(1) Set a specified quality factor QF and then transform the PGM images in Bossbase 1.01 into JPEG images of that QF.
(2) Set the embedding rate Payload, and then use the steganography algorithm to embed secret information into the JPEG images to acquire the stego images under the current Payload.
(3) Based on the set QF and Payload, use the steganalysis algorithm to extract the corresponding steganalytic features for the cover/stego images.
(4) Depending on the steganography algorithm, steganalysis algorithm, QF and Payload (whose specific settings are shown in Table 1), by repeating (1)-(3), we eventually construct a steganography detection image library containing 80,000 cover images and 400,000 stego images, and acquire a library containing 8 different steganalytic features.
Meanwhile, we trained and tested on the sample data with the Fisher linear discriminant (FLD) integrated classifier [40], selecting 5000 cover and stego images as the training set and the remaining 5000 cover and stego images as the testing set, and then calculated the detection accuracy using Equation (18).
where $P_A$ denotes the average detection accuracy, $P_E$ denotes the average detection error rate, $P_{FA}$ denotes the false alarm rate, and $P_{MD}$ denotes the missed detection rate. To ensure that the experimental results are fair and reliable, we took the average detection accuracy of the ten-fold cross-check as the final result of each feature selection method. The experiments in this paper consist of three main parts.
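The quantities named above can be related in code as follows. The sketch assumes the form commonly used in steganalysis, $P_E = (P_{FA} + P_{MD})/2$ and $P_A = 1 - P_E$, evaluated at the classifier's fixed decisions; whether Equation (18) additionally minimizes over a decision threshold is not shown here. The function name and the label convention (0 for cover, 1 for stego) are assumptions.

```python
def detection_accuracy(true_labels, predicted_labels):
    """Return (P_A, P_E, P_FA, P_MD) for labels 0 = cover, 1 = stego."""
    pairs = list(zip(true_labels, predicted_labels))
    n_cover = sum(1 for t, _ in pairs if t == 0)
    n_stego = sum(1 for t, _ in pairs if t == 1)
    if n_cover == 0 or n_stego == 0:
        raise ValueError("both classes must be present")
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)   # cover flagged as stego
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)         # stego passed as cover
    p_fa = false_alarms / n_cover
    p_md = missed / n_stego
    p_e = (p_fa + p_md) / 2
    return 1 - p_e, p_e, p_fa, p_md

if __name__ == "__main__":
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0, 0, 0, 1, 1, 1, 0, 0]
    print(detection_accuracy(truth, preds))
```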

Comparison Experiments with Features Selected under Different Thresholds
In order to determine the value of Ds in the FSCE method, in this subsection, we conducted a large number of experiments on the image steganalytic features extracted in Section 4.1, and then determined a relatively appropriate Ds based on the experimental results under different Ds. As for the range of Ds and the iteration step, after analyzing a large amount of literature, we found that most of the existing feature selection methods reduce the dimensionality to 70%, while a few feature selection methods can reduce the dimensionality of some features to about 50%. For example, the SRGS algorithm reduces the GFR feature to roughly 50%, and the CGSM algorithm reduces the GFR feature to 65% and the CC-PEV feature to 67%. The steganalysis-α method reduces the J+SRM (union of SRMQ1 (SRM with the fixed quantization q = 1c) and CC-JRM) dimension to about 70%. To make the FSCE method more generalizable and effective for most steganalytic features, the range of Ds is specified between 0.5N and 0.7N in this paper, and the iteration step is 0.02N. In fact, based on this strategy (setting the same selected dimension for different Payloads of the same feature), the dominant features can be selected more efficiently. At the same time, this helps to propose new feature extraction methods. The results of feature selection under different Ds are shown in Table 2.
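The candidate values of Ds implied by this range and step can be enumerated as in the sketch below. It works in whole percentages to avoid floating-point drift; rounding to the nearest integer dimension is an assumption, as is the function name. N = 8000 (the DCTR dimensionality quoted in the Introduction) is used purely as an example.

```python
def candidate_dimensions(n_features, start_pct=50, stop_pct=70, step_pct=2):
    """List (percentage, Ds) pairs for Ds between 0.5N and 0.7N in steps of 0.02N."""
    return [
        (pct, round(n_features * pct / 100))       # rounding rule is an assumption
        for pct in range(start_pct, stop_pct + 1, step_pct)
    ]

if __name__ == "__main__":
    for pct, ds in candidate_dimensions(8000):
        print(f"Ds = 0.{pct}N -> {ds}-D")
```

This gives 11 candidate thresholds per feature type, each of which is evaluated with the classifier.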
In summary, based on a combination of the feature dimensions selected and the detection accuracy of the stego images, we found that FSCE was better at selecting many features when Ds = 0.58N. For example, for the F2 feature, when Payload = 0.1, the detection accuracy of FSCE is improved by 0.86% compared to the original feature, and when Payload = 0.2, the detection accuracy of FSCE is also improved by 1.20%. For the F4 feature, when Payload = 0.1, the detection accuracy of FSCE is improved by 0.92% compared to the original feature and when Payload = 0.3, the detection accuracy of FSCE is also improved by 0.80%. For the F5 feature, when Payload = 0.2, the detection accuracy of FSCE is improved by 0.18% compared to the original feature. For the F6 feature, when Payload = 0.5, the detection accuracy of FSCE is improved by 0.22% compared to the original feature. For the F7 feature, when Payload = 0.3, the detection accuracy of FSCE is improved by 0.24% compared to the original feature.

Comparison Experiments with Original Features and Randomly Selected Features
To verify the effectiveness of the FSCE, in this subsection we compare the features selected by the FSCE with the original features and the randomly selected features. Notably, the dimensionality of the "randomly selected features" is the same as the dimensionality of the features selected by the FSCE, so as to demonstrate the effectiveness of the FSCE. The results of the experiments are shown in Table 3.
Table 3. Comparison results of FSCE-feature with Origin-feature and Random-feature.

As can be seen from the table, the FSCE-feature has the best performance compared to the Random-feature and Origin-feature. Specifically, FSCE reduces the dimensionality by 42% while maintaining or even improving the detection accuracy. For example, for the F1 feature, when Payload = 0.5, FSCE improved the detection accuracy by 0.94% compared to the Random-feature. For the F2 feature, when Payload = 0.2, FSCE improved the detection accuracy by 3.85% compared to the Random-feature and by 1.20% compared to the Origin-feature. For the F3 feature, when Payload = 0.1, FSCE improved the detection accuracy by 5.91% compared to the Random-feature. For the F4 feature, when Payload = 0.1, FSCE improved the detection accuracy by 4.65% compared to the Random-feature and by 0.92% compared to the Origin-feature. For the F5 feature, when Payload = 0.4, FSCE improved the detection accuracy by 0.65% compared to the Random-feature and by 0.16% compared to the Origin-feature. For the F6 feature, when Payload = 0.5, FSCE improved the detection accuracy by 0.82% compared to the Random-feature and by 0.22% compared to the Origin-feature. For the F7 feature, when Payload = 0.3, FSCE improved the detection accuracy by 0.48% compared to the Random-feature and by 0.24% compared to the Origin-feature. For the F8 feature, when Payload = 0.4, FSCE improved the detection accuracy by 0.40% compared to the Random-feature.
For other Payloads, FSCE also achieved excellent results compared to Random-feature and Origin-feature, thus verifying the effectiveness of FSCE.
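A random baseline of matching dimensionality, as used for the Random-feature comparison, can be drawn as in the following sketch. The sampling procedure is an assumption (uniform sampling of component indices without replacement), as are the function name, the seed and the example sizes, which correspond to Ds = 0.58N for an 8000-D feature.

```python
import numpy as np

def random_feature_subset(n_features, n_keep, seed=0):
    """Pick n_keep distinct component indices uniformly at random (assumed baseline)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_features, size=n_keep, replace=False))

if __name__ == "__main__":
    idx = random_feature_subset(8000, 4640)
    print(len(idx), idx[:5])
    # A feature matrix `features` of shape (M, 8000) would then be reduced
    # with features[:, idx] before training the classifier.
```

Matching the dimensionality isolates the effect of which components are kept from the effect of how many are kept.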

Comparison Experiments with Several Classical and State-of-the-Art Feature Selection Methods
To validate the efficiency and generality of FSCE, we conducted comparison experiments with the PCA-based method [27], the SRGS method [26] and the CGSM method [28], where the PCA-based method is a classical method, SRGS is a more recent method for specific feature selection and CGSM is a more recent method for general feature selection. The results of the comparison between the algorithm proposed in this paper and the above three methods are shown in Table 4. From Table 4, it is clear that the performance of the proposed FSCE method is superior to that of the PCA-based method, SRGS method and CGSM method. For example, for the F1 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 13.00% compared to PCA-feature and by 1.32% compared to SRGS-feature, while FSCE further reduces the dimensionality by 2363-D (about 29.54%) on SRGS-feature. In addition, the detection accuracy is further improved by 0.22% compared to CGSM-feature. For the F2 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 29.19% compared to PCA-feature, 4.41% compared to SRGS-feature and 1.09% compared to CGSM-feature, while the dimensionality is further reduced by 1109-D (about 13.86%). For the F3 feature, when Payload = 0.2, the detection accuracy of FSCE is further improved by 48.77% compared to PCA-feature and 0.28% compared to SRGS-feature, and the dimensionality is further reduced by 1623-D (about 20.29%) compared to CGSM-feature. For the F4 feature, when Payload = 0.3, the detection accuracy of FSCE is further improved by 49.29% compared to the PCA-feature, the dimensionality is further reduced by 894-D (about 11.18%) compared to the SRGS-feature while maintaining comparable detection accuracy, and the detection accuracy is further improved by 0.10% compared to the CGSM-feature, while the dimensionality is further reduced by 2194-D (about 27.43%).
From Table 5, for the F5 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 12.87% compared to PCA-feature and 0.19% compared to SRGS-feature, and the dimensionality is further reduced by 3681-D (about 21.65%) compared to CGSM-feature while maintaining comparable detection accuracy. For the F6 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 17.54% compared to the PCA-feature and by 0.25% compared to the SRGS-feature, while the dimensionality is further reduced by 6183-D (about 36.37%). For the F7 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 7.18% compared to the PCA-feature and by 0.37% compared to the SRGS-feature, while the dimensionality is further reduced by 6482-D (about 38.13%). For the F8 feature, when Payload = 0.1, the detection accuracy of FSCE is further improved by 9.76% compared to PCA-feature and 0.59% compared to SRGS-feature, while the dimensionality is further reduced by 2112-D (about 6.10%); compared to CGSM-feature, the detection accuracy is further improved by 0.18%, while the dimensionality is further reduced by 8046-D (about 23.21%). For other situations, FSCE has achieved equally excellent results.

Conclusions
To decrease the feature dimensions and the spatio-temporal overhead, this paper presents a fast selection method for image steganalytic features based on similar cross-entropy (FSCE). Firstly, the innovative improvement of the normalization range simplifies the analysis of the changing trend of binary information entropy and lays the foundation for the overall algorithm design process. Secondly, the cross-entropy applicable to image steganalysis is investigated, and a feature contribution measure criterion akin to Fisher's criterion is constructed. Thirdly, after analyzing the constructed feature contribution criterion, it is found that the criterion has certain shortcomings. For this reason, after reviewing a large amount of literature, the concept of similar cross-entropy is presented for the first time and, after analyzing its change trend, the feature contribution criterion is updated and its reliability is verified theoretically.
As a result, the contribution degree of each feature component is better measured. Finally, the feature component with the highest contribution is selected as the final selected feature. Based on the above operations, FSCE has exceptional usability, which facilitates its use in real-world applications with strict memory footprint constraints and high-efficiency requirements.
The effectiveness and efficiency of FSCE have been demonstrated through extensive experiments on the only standard and widely utilized Bossbase 1.01 image library. For example, for the F2 feature, when Payload = 0.5, the detection accuracy of the FSCE is further improved by 4.41% compared to the SRGS-feature. For the F3 feature, when Payload = 0.1, the detection accuracy of the FSCE is improved by 5.91% compared to the Random-feature. For the F4 feature, when Payload = 0.3, the detection accuracy of FSCE is further improved by 49.29% compared to PCA-feature. For the F8 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 4.33% compared to CGSM-feature.
For the future, we will continue to devote ourselves to steganography and steganalysis, focusing on two aspects: On the one hand, analyzing the properties of each deleted feature component lays the foundation for new, more secure steganography techniques. On the other hand, investigating the characteristics of each retained feature component lays the foundation for new effective, efficient and low-dimensional steganalysis techniques.
Author Contributions: Conceptualization, R.J. and X.Y.; methodology, X.Y. and Y.M.; software, validation, X.Y.; formal analysis, Y.M.; writing-original draft preparation, X.Y.; writing-review and editing, X.Y. and S.Y.; visualization, X.Y. and L.X.; project administration, X.Y. All authors have read and agreed to the published version of the manuscript.