Article

Image Steganalysis of Low Embedding Rate Based on the Attention Mechanism and Transfer Learning

Shouyue Liu, Chunying Zhang, Liya Wang, Pengchao Yang, Shaona Hua and Tong Zhang
1 College of Science, North China University of Science and Technology, Tangshan 063210, China
2 Key Laboratory of Data Science and Application of Hebei Province, Tangshan 063210, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(4), 969; https://doi.org/10.3390/electronics12040969
Submission received: 18 January 2023 / Revised: 10 February 2023 / Accepted: 13 February 2023 / Published: 15 February 2023
(This article belongs to the Special Issue Intelligent Analysis and Security Calculation of Multisource Data)

Abstract

In recent years, notable research progress has been made in image steganalysis. However, extracting steganographic features from images with low embedding rates remains difficult, and detection performance is unsatisfactory. In this paper, we propose an image steganalysis method based on the attention mechanism and transfer learning. The method constructs a network model based on a convolutional neural network, comprising a preprocessing layer, a transposed convolutional layer, ordinary convolutional layers, and fully connected layers. We introduce the efficient channel attention module after the ordinary convolutional layers to focus on the steganographic region of the image, capture local cross-channel interaction information, adaptively adjust feature weights, and enhance the extraction of steganographic features. Meanwhile, we apply transfer learning, using the parameters of the model trained on high-embedding-rate images to initialize the model trained on low-embedding-rate images, achieving feature migration and further improving steganalysis performance at low embedding rates. The experimental results show that, at embedding rates of 0.05 bpp, 0.1 bpp, and 0.2 bpp, the detection accuracy of the proposed method is improved by 16.36% to 30.66% over the typical Xu-Net model and by 35.59% to 37.83% over Yedroudj-Net. Compared to the state-of-the-art low-embedding-rate model Shen-Net, the detection accuracy is improved by 3.43% to 6.41%. This demonstrates the superior detection performance of the proposed method for steganalysis of low-embedding-rate images.

1. Introduction

With the rapid development of image steganography, image steganalysis has become a key research direction in network information security. Image steganalysis detects whether an image contains secret information. It is mainly used to prevent the illegal use of steganography to transmit information for terrorism, commercial espionage, and other unlawful acts, and thus helps maintain Internet security and safeguard national and social stability.
The development of image steganalysis can be roughly divided into two stages. In the first stage, images were detected using traditional steganalysis methods, which relied mainly on manual experience to extract useful steganographic features; these methods have clear limitations. With the development of deep learning, steganalysis entered the second stage, which combines convolutional neural networks (CNNs) with steganalysis and has achieved good detection results. Combining convolutional neural networks with image-based models is now common [1,2,3]. Xu et al. [4] proposed a 5-layer convolutional neural network steganalysis model, Xu-Net, which differed completely from CNN structures in computer vision and optimizes the critical steganalysis steps of residual computation, feature extraction, and binary classification. Ye et al. [5] constructed Ye-Net, proposed the truncated linear unit (TLU) activation function to enhance the representation of steganographic features, and combined it with channel selection awareness to improve the model convergence speed. In [6], Yedroudj-Net was proposed by combining the advantages of Xu-Net [4], Ye-Net [5], and ResNet [7]; it used thirty preprocessing filters from the SRM [8] and comprised five convolutional layers, three fully connected layers, and one softmax layer. Shen et al. [9] designed a new convolutional neural network structure, Shen-Net, which used the TanH [10] and ReLU [11] activation functions along with transfer learning to improve steganalysis accuracy.
The amount of steganographic content in an image is measured by the embedding rate: the ratio of the number of embedded secret-information bits to the number of pixels in the image, expressed in bits per pixel (bpp). Currently, most deep-learning-based image steganalysis models detect images with high embedding rates, while steganalysis at low embedding rates performs poorly or is less studied. The main reason is that a lower embedding rate leaves fewer traces in the image, making model training harder, or even preventing convergence, and thus making stego images more difficult to detect.
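As a worked example of this definition (our own illustration, using the 256 × 256 image size adopted in Section 4.1), the payload implied by each embedding rate is:

```python
# Payload implied by an embedding rate (bpp = embedded bits / pixels).
width, height = 256, 256            # image size used in the experiments
pixels = width * height             # 65,536 pixels

for bpp in (0.05, 0.1, 0.2, 0.3, 0.4):
    bits = bpp * pixels
    print(f"{bpp:>4} bpp -> {bits:7.0f} bits (~{bits / 8:5.0f} bytes)")
# e.g., 0.05 bpp -> 3277 bits (~410 bytes) hidden in a 65,536-pixel image
```

At 0.05 bpp, only a few hundred bytes are spread over the whole image, which is why so few pixels change and why detection is hard.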
To address this problem, we propose an image steganalysis method for low embedding rates based on the attention mechanism and transfer learning, named TCSI-ECA-Transfer. The method combines convolutional neural networks with Efficient Channel Attention (ECA) [12] to construct the network model structure. ECA, as an attention mechanism, strengthens attention to the steganographic features of low-embedding-rate images, making it easier for the network to extract steganographic features at a fine-grained level. Then, since stego images with high and low embedding rates share common features, the parameters of the model trained on high-embedding-rate images (the source domain) are migrated to initialize the model for low-embedding-rate images (the target domain). This avoids training the low-embedding-rate model from randomly initialized parameters and further improves steganalysis performance. Extensive experimental results show that, at low embedding rates and using three common adaptive image steganographic algorithms, S-UNIWARD [13], WOW [14], and HUGO [15], the proposed method achieves significant improvements over existing steganalysis methods based on convolutional neural networks. Compared to the typical models Xu-Net [4] and Yedroudj-Net [6], the detection accuracy is improved by 16.36% to 30.66% and by 35.59% to 37.83%, respectively. Compared with the advanced model Shen-Net [9], the detection accuracy is improved by 3.43% to 6.41%.
The remainder of this paper is structured as follows. Section 2 covers related work on attention mechanisms and transfer learning in recent years. Section 3 introduces the structure of the proposed network model, the detailed internal structure of ECA, and the specific process of feature migration. Section 4 describes the experimental environment and a complete analysis of the experimental results to verify the effectiveness of the method. Section 5 concludes and recommends next steps.

2. Related Work

2.1. Attention Mechanism

The attention mechanism is a way to focus on salient regions and effectively generate feature weights [16]. The concept originates from the human visual system and simulates the perception of the human eye: when confronted with a scene containing a large amount of information, humans fuse local visual structures, selectively focus on the important information, and ignore the rest. The attention mechanism improves the performance of classification network models by the same means.
In recent years, the attention mechanism has become a mainstream approach in deep learning and has been widely used in computer vision [17,18], speech recognition [19,20], natural language processing [21,22], etc. Fukui et al. [23] proposed the attention branch network (ABN), which extends a visual-explanation model with a branch structure in an end-to-end manner; ABN can also be introduced into various baseline models and applied to image recognition tasks. Yan et al. [24] introduced a spatio-temporal attention mechanism into an encoder–decoder neural network for video captioning. This approach takes into account the temporal and spatial structure of the video and selectively focuses on the important regions of the frames so that the decoder can decode the input information accurately. Liu et al. [25] proposed a bidirectional LSTM network structure in which the attention mechanism focuses on the important information in the forward and backward hidden layers to adjust the weights of variable-length sequences and improve the accuracy of text classification. Li et al. [26] introduced the attention mechanism into convolutional neural networks for identifying facial expressions with partial occlusion; the proposed network model can automatically perceive occluded facial regions and focus on recognizing unoccluded and informative areas. In [27], a facial expression recognition network was designed that combines LBP features with an attention mechanism, using attention to increase the weights of key facial features such as the eyes, mouth, and nose and thereby enhance recognition. Cai et al. [28] proposed a new graph convolution algorithm with a cross-attention mechanism, which increases the feature variance by assigning weight coefficients to each row and column of the features via horizontal and vertical attention mechanisms.
In the field of image steganalysis, the importance of each feature map differs when the cover image and the secret image are input to the convolutional neural network for training, and the attention mechanism can obtain the important information of the feature map and allocate the computational resources of the neural network more reasonably. In this paper, we also introduce the attention mechanism into the network model and demonstrate its excellent effect in Section 4.3.1.

2.2. Transfer Learning

Transfer learning is an important machine learning method that improves the learning of a new task B by reusing a model already trained on a related task A. It is formally defined as follows: assume that the source domain $D_s$ consists of a feature space $X_s$ and a marginal probability distribution $P(X_s)$, and the target domain $D_t$ consists of a feature space $X_t$ and a marginal probability distribution $P(X_t)$; then

$$D_s = \{X_s, P(X_s)\} \quad (1)$$

$$D_t = \{X_t, P(X_t)\} \quad (2)$$

Transfer learning is the process of first learning in $D_s$ and then transferring the learned knowledge to $D_t$, where $D_s \neq D_t$. Niu et al. [29] argue that transfer learning can solve the problem of significant degradation in model performance by fusing knowledge from one or more different domains. Zhuang et al. [30] argue that the purpose of transfer learning is to transfer knowledge contained in different but related domains from the source domain to improve learning in the target domain, improving the efficiency of the target learner and reducing the dependence on target-domain data. The data in the source and target domains need to be sufficiently related or similar for transfer learning to achieve the desired effect; if the two domains are uncorrelated, negative transfer may occur.
Transfer learning can be divided into four types: sample-based, feature-based, model-based, and relation-based. In this paper, we perform model-based transfer learning: the image steganalysis model trained on high-embedding-rate images from the BOSSbase dataset [31] is used to initialize the low-embedding-rate model, thereby improving steganalysis in the low-embedding-rate case.

3. Proposed Method

3.1. Model Structure

To address the difficulty of improving the accuracy of image steganalysis at low embedding rates, we build a new steganalysis model for low-embedding-rate images, TCSI-ECA-Transfer, based on a convolutional neural network structure, introducing the attention mechanism and using transfer learning to further improve detection. The detailed structure of the model is shown in Figure 1.
The model consists of one preprocessing layer, one transposed convolutional layer, five ordinary convolutional layers (each followed by an ECA module), and two fully connected layers. The preprocessing layer filters the image using 30 high-pass filters; the weights of the high-pass filters are all taken from the SRM [8] and are kept constant during training. The filtered image is input to the transposed convolutional layer for upsampling, doubling the size of the feature map. This also amplifies the steganographic noise, which facilitates the subsequent extraction of steganographic features by the ordinary convolutional layers. The transposed convolutional layer uses a 3 × 3 convolutional kernel with a stride of 2. The feature map then enters the ordinary convolutional layers. The first and third convolutional layers have the same kernel size of 3 × 3, and the second and fourth convolutional layers have a kernel size of 1 × 1. The fifth is a global convolution whose kernel size matches the 64 × 64 feature map output by the fourth convolutional layer. Among the five ordinary convolutional layers, Conv1, Conv3, and Conv4 all have 64 channels, while Conv2 and Conv5 have 32 and 128 channels, respectively. The feature maps are finally classified by two fully connected layers and a softmax function. Throughout the model, the preprocessing layer is followed by the TLU activation function, and both the transposed convolutional layer and the ordinary convolutional layers are followed by batch normalization (BN) and the ReLU activation function. The ECA module assigns weights to the feature maps after each convolutional layer to enhance the steganographic features.
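To make this layout concrete, the following PyTorch sketch mirrors the described structure. The text does not specify the SRM filter values, the strides of the ordinary convolutions, or the fully connected widths, so the choices below (strides placed so that Conv4 outputs a 64 × 64 map from a 256 × 256 input, and a 128-unit hidden FC layer) are our assumptions; the ECA module is stubbed with nn.Identity here and sketched in Section 3.2.

```python
import torch
import torch.nn as nn

class TLU(nn.Module):
    """Truncated linear unit: clamps activations to [-T, T] (threshold assumed)."""
    def __init__(self, t=3.0):
        super().__init__()
        self.t = t

    def forward(self, x):
        return torch.clamp(x, -self.t, self.t)

def conv_block(c_in, c_out, k, stride=1):
    """Ordinary convolution -> BN -> ReLU -> ECA, as described in the text."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.Identity(),  # placeholder for the ECA module (see Section 3.2)
    )

class TCSIECASketch(nn.Module):
    """Layer layout of TCSI-ECA-Transfer; strides and FC widths are assumptions."""
    def __init__(self):
        super().__init__()
        # Preprocessing: 30 fixed SRM high-pass filters (5x5 kernels assumed);
        # the weights are frozen during training and not normalized.
        self.pre = nn.Conv2d(1, 30, 5, padding=2, bias=False)
        self.pre.weight.requires_grad = False
        self.tlu = TLU()
        # Transposed convolution: 3x3 kernel, stride 2, doubles the map size.
        self.up = nn.Sequential(
            nn.ConvTranspose2d(30, 30, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(30), nn.ReLU(inplace=True))
        self.conv1 = conv_block(30, 64, 3, stride=2)  # 512 -> 256
        self.conv2 = conv_block(64, 32, 1)
        self.conv3 = conv_block(32, 64, 3, stride=2)  # 256 -> 128
        self.conv4 = conv_block(64, 64, 1, stride=2)  # 128 -> 64
        # Global convolution: kernel equals the 64x64 map from Conv4.
        self.conv5 = nn.Sequential(
            nn.Conv2d(64, 128, 64, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True))
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(128, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2))  # softmax is applied inside the loss

    def forward(self, x):              # x: (N, 1, 256, 256)
        x = self.tlu(self.pre(x))
        x = self.up(x)
        for blk in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = blk(x)
        return self.fc(self.conv5(x))  # logits of shape (N, 2)
```

Under these assumptions, a forward pass on a dummy batch, torch.randn(8, 1, 256, 256), produces logits of shape (8, 2).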
We select three models for the experimental comparison in Section 4.3.3; their parameters are shown in Table 1. Compared with the preprocessing layer of Xu-Net [4], which uses a single high-pass filter, our model uses 30 high-pass filters and can extract noise residual information by exploiting the correlation between adjacent pixels. In both Xu-Net [4] and Yedroudj-Net [6], there are 5 convolutional layers; the kernels in the first two layers are 5 × 5, and those in the last two layers are 3 × 3. In Shen-Net [9], there are 3 convolutional layers; the kernels in the first two layers are 3 × 3, and the third kernel is 62 × 62. Compared to Xu-Net [4] and Yedroudj-Net [6], our model omits average pooling layers to avoid the loss of steganographic features caused by downsampling the images. None of the three models uses transposed convolution to enlarge the feature maps.

3.2. ECA Module

Convolutional neural network models usually extract features through multiple layers of convolutional kernels, and the features extracted at each layer differ; they include both primary and secondary features. Therefore, we add the ECA module after each of the five ordinary convolutional layers to assign channel-specific weights to the features. The ECA module is a channel attention mechanism that focuses on the salient steganographic features while fusing the feature maps of multiple channels, achieving adaptive adjustment of feature weights. It can also capture local cross-channel interaction information and learn channel attention effectively.
The internal structure of the ECA module is shown in Figure 2. First, global average pooling (GAP) is performed on the convolved feature maps $\chi \in \mathbb{R}^{W \times H \times C}$, as shown in Equation (3). The purpose of this operation is to capture the global distribution over the channels and aggregate the feature information, so that all inputs have a global receptive field.

$$G(\chi) = \frac{1}{WH} \sum_{i=1, j=1}^{W, H} \chi_{ij} \quad (3)$$

In Equation (3), $W$ and $H$ are the width and height of the feature map. Let the number of channels in the input feature map be $C$; after global average pooling, a feature map of dimension $1 \times 1 \times C$ is output. Then, we capture the local cross-channel interaction information using a one-dimensional convolution with a kernel of size $k$ and generate the channel weights using a Sigmoid nonlinear activation function, as shown in Equation (4), where $\sigma$ is the Sigmoid function and $\mathrm{C1D}_k$ is the one-dimensional convolution.

$$\omega = \sigma(\mathrm{C1D}_k(G(\chi))) \quad (4)$$

The coverage of the interaction, i.e., the kernel size $k$, is determined adaptively from the number of channels $C$, as shown in Equation (5).

$$k = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \quad (5)$$

where $|\cdot|_{\mathrm{odd}}$ denotes taking the nearest odd value. $\gamma$ and $b$ are set to 2 and 1, respectively, to adjust the ratio between the number of channels $C$ and the kernel size $k$. Finally, we multiply the channel weights with the input feature map $\chi$ to complete the channel attention weighting. By introducing the efficient channel attention module into the model structure, we recalibrate the feature maps to stimulate the main features and suppress the unimportant ones, which strengthens the directionality of the steganalysis features at low embedding rates and effectively improves the detection performance of the model.
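A minimal PyTorch implementation of this module, following Equations (3)–(5) (our sketch of the published ECA design [12], not the authors' exact code), might look as follows:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: GAP -> 1D convolution -> Sigmoid -> reweight."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size from Equation (5), rounded up to the next odd value.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)                       # GAP, Equation (3)
        self.conv = nn.Conv1d(1, 1, kernel_size=k,
                              padding=k // 2, bias=False)         # C1D_k, Equation (4)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (N, C, H, W)
        y = self.pool(x)                   # (N, C, 1, 1) aggregated channel statistics
        y = y.squeeze(-1).transpose(1, 2)  # (N, 1, C) for the 1D convolution
        y = self.sigmoid(self.conv(y))     # channel weights in (0, 1)
        y = y.transpose(1, 2).unsqueeze(-1)  # back to (N, C, 1, 1)
        return x * y                       # reweight the input feature map
```

With this rounding convention, Equation (5) gives k = 3 for the 64-channel layers in our model and k = 5 for the 128-channel output of Conv5.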

3.3. Model Migration

Currently proposed adaptive image steganographic algorithms, such as S-UNIWARD, WOW, and HUGO, preferentially embed secret information into the most texturally complex regions of the image, while the smooth regions remain unchanged. At low embedding rates, such as 0.1 bpp, only about 2% of the pixel values of the stego image are changed relative to the cover image. Therefore, it is difficult for the network model to learn the steganographic features effectively during low-embedding-rate training, and the detection accuracy of image steganalysis cannot achieve the desired effect. To further improve detection performance at low embedding rates, we introduce transfer learning into the proposed network model through three processes: source domain training, feature migration, and target domain training. The architecture of the transfer learning method is shown in Figure 3.
1. Source domain training.
Source domain training is the pre-training stage of the network model, in which steganalysis is performed on images with high embedding rates. High-embedding-rate steganography is applied to the selected dataset to obtain the stego images; the cover and stego images are input as the source-domain training and validation sets into the constructed network model, TCSI-ECA-Transfer, and the pre-trained model is generated by continuously updating the parameters through the backpropagation of the convolutional neural network and the stochastic gradient descent algorithm.
2. Feature migration.
Feature migration transfers the features extracted by the pre-trained high-embedding-rate model to the low-embedding-rate target domain. Since the source- and target-domain training share the same overall model structure, feature migration can also be interpreted as using the parameter information saved by the pre-trained model as the initialization parameters for target-domain training, which improves the learning efficiency and reduces the training cost of the target domain (a code sketch of this step follows this section).
3. Target domain training.
After feature migration, target domain training is performed; this is the training stage for low-embedding-rate images. The training and validation sets consist of low-embedding-rate cover and stego images. Target domain training is equivalent to fine-tuning the parameters of the model trained with the same steganographic algorithm at a high embedding rate, further improving the accuracy of low-embedding-rate image steganalysis.
For the same steganographic algorithm, the steganographic features of high-embedding-rate images subsume those of low-embedding-rate images, which makes the source and target domains similar and transfer learning effective. In Section 4.3.2, we verify the effectiveness of the transfer learning method experimentally.
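In code, the migration step amounts to saving the source-domain weights and loading them as the initialization of the target-domain model. A minimal sketch, assuming the TCSIECASketch class from the Section 3.1 sketch and hypothetical checkpoint file names:

```python
import torch

# 1. Source domain: train on high-embedding-rate cover/stego pairs,
#    then save the learned parameters (training loop omitted).
src_model = TCSIECASketch()
torch.save(src_model.state_dict(), "src_0.4bpp.pth")  # hypothetical name

# 2. Feature migration: initialize the low-rate model from the saved
#    parameters instead of random (Xavier) initialization.
tgt_model = TCSIECASketch()
tgt_model.load_state_dict(torch.load("src_0.4bpp.pth"))

# 3. Target domain: fine-tune all layers on low-embedding-rate pairs
#    (training loop omitted).
```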

4. Experiments and Results Analysis

4.1. Software Platform and Dataset

Our experiments are implemented using the PyTorch 1.8.1 deep learning framework in a Linux environment; the GPU is an NVIDIA GeForce RTX 2080 SUPER. We use the BOSSbase dataset [31], which is widely used in image steganalysis and information hiding. It contains 10,000 grayscale images of size 512 × 512. Due to GPU memory limitations, all images are resampled to 256 × 256 using Matlab tools. We use the S-UNIWARD, WOW, and HUGO steganographic algorithms to randomly embed secret information in the images, yielding 10,000 cover/stego image pairs per algorithm. The embedding rates are 0.05 bpp, 0.1 bpp, 0.2 bpp, 0.3 bpp, and 0.4 bpp. The images are divided into training, validation, and test sets in the ratio 4:1:5.

4.2. Hyper-Parameters

The network model is trained using the SGD algorithm, and the convolutional and fully connected layers are initialized using the Xavier method. The batch size in the training phase is set to 8, the initial learning rate is 0.001, the momentum is 0.9, the weight decay is 0.004, and the maximum number of iterations is 200,000. During training, we use early stopping to avoid overfitting of the network model. The preprocessing layer does not participate in learning, and the 30 high-pass filters are not normalized.
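A sketch of this configuration in PyTorch follows, assuming the TCSIECASketch class from Section 3.1; reading the reported 0.9 as SGD momentum is our interpretation:

```python
import torch
import torch.nn as nn

model = TCSIECASketch()

# Xavier initialization for the learnable convolutional and FC layers;
# the frozen SRM preprocessing filters are skipped.
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)) and m.weight.requires_grad:
        nn.init.xavier_uniform_(m.weight)

# SGD with the reported hyper-parameters; frozen filters are excluded.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001, momentum=0.9, weight_decay=0.004)

batch_size = 8
max_iterations = 200_000  # training additionally uses early stopping
```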

4.3. Experimental Analysis

With adaptive image steganographic algorithms, as the embedding rate increases, the difference between the cover image and the stego image becomes more and more obvious. Figure 4a shows a cover image from the BOSSbase dataset [31]. Figure 4b–f shows the changes caused by embedding secret information with the S-UNIWARD algorithm at 0.05 bpp, 0.1 bpp, 0.2 bpp, 0.3 bpp, and 0.4 bpp. The black dots in the images are unchanged pixels, and the white dots are the pixels altered by the algorithm.

4.3.1. Effect of Attentional Mechanisms on Model Performance

In the experiments, we introduce the ECA module into the convolutional neural network to study the effect of attention on the image steganographic features. To verify the effectiveness of the ECA module, we train and test the model structure described in Section 3.1 and record its detection accuracy, and as a comparison we also test the model with the five ECA modules removed. The detection accuracies of the experiments, using two steganographic algorithms and five embedding rates, are shown in Table 2 and Table 3.
From Table 2 and Table 3, it can be seen that introducing the ECA module improves the accuracy of steganalysis at both high and low embedding rates. In particular, for both steganographic algorithms, the detection accuracy improves more significantly at embedding rates of 0.05 bpp, 0.1 bpp, and 0.2 bpp than at 0.3 bpp and 0.4 bpp, indicating that the ECA module can effectively improve the detection performance of image steganalysis at low embedding rates by adjusting the feature weights and strengthening attention to important features. There are also shortcomings: in Table 3, the detection results of the model using the WOW algorithm at 0.05 bpp are not satisfactory even with the ECA module. This may be caused by the model becoming trapped in a local optimum, a problem that we effectively solve using transfer learning (Section 4.3.2).

4.3.2. Role of Transfer Learning

In this section, we use the S-UNIWARD, WOW, and HUGO steganographic algorithms for image steganalysis with the network model that includes the ECA module. To verify that transfer learning can further improve steganalysis detection at low embedding rates, we compare the accuracy of the model with and without transfer learning at the low embedding rates of 0.05 bpp, 0.1 bpp, and 0.2 bpp. The experimental results are shown in Figure 5.
When the models are transferred, a "proximity" migration principle is adopted: the model trained at an embedding rate of 0.4 bpp is transferred to initialize the 0.3 bpp model, the 0.3 bpp model is transferred to the 0.2 bpp model, and so on down to 0.1 bpp and 0.05 bpp (a pseudocode sketch follows). As Figure 5 shows, at 0.05 bpp all three steganographic algorithms gain clearly in accuracy from transfer learning, with the improvement most obvious for the WOW and HUGO algorithms, indicating that transfer learning enables the models to escape local optima at lower embedding rates. At 0.1 bpp and 0.2 bpp, the detection accuracy with transfer learning is also higher than without it. However, at 0.2 bpp the model trained without transfer learning can already learn enough steganographic features, so the improvement from transfer learning is smaller. In summary, the experiments show that the proposed model with transfer learning can further improve detection at low embedding rates.
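Expressed as pseudocode, the proximity chain is a simple fold over the embedding rates; train_at is a hypothetical helper standing in for a full fine-tuning run:

```python
# "Proximity" migration: each lower-rate model is initialized from the
# model trained at the next higher rate; 0.4 bpp trains from scratch.
def train_at(bpp, init_state=None):
    """Hypothetical helper: fine-tune at the given rate, return a state_dict."""
    ...

state = None
for bpp in (0.4, 0.3, 0.2, 0.1, 0.05):
    state = train_at(bpp, init_state=state)
```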

4.3.3. Comparison with Other Models

At the three low embedding rates of 0.05 bpp, 0.1 bpp, and 0.2 bpp, we compare the detection accuracy of the Xu-Net [4], Yedroudj-Net [6], and Shen-Net [9] network models with that of TCSI-ECA-Transfer. The experimental results are shown in Table 4, Table 5 and Table 6, where "--" indicates that the model did not converge.
The results in Table 4, Table 5 and Table 6 show that the proposed TCSI-ECA-Transfer model improves the accuracy by 16.36% to 30.66% at low embedding rates compared to the Xu-Net [4] model, with the improvement most obvious for the S-UNIWARD algorithm. The Yedroudj-Net [6] model does not converge at 0.05 bpp or 0.1 bpp with any of the three steganographic algorithms; it converges at 0.2 bpp, but its detection performance is unsatisfactory. From the perspective of network structure, compared with Xu-Net [4] and Yedroudj-Net [6], the TCSI-ECA-Transfer model omits the pooling layers to avoid downsampling the image, which effectively reduces the loss of steganographic features and improves the detection accuracy of steganalysis. From the perspective of network depth, compared with Shen-Net [9], the TCSI-ECA-Transfer model appropriately increases the number of convolutional layers and reasonably sets the kernel sizes to extract richer steganographic information and learn more complex and subtle features; its accuracy is improved by 3.43% to 6.41%, with the improvement most obvious for the HUGO algorithm. Therefore, compared with existing network models, the proposed TCSI-ECA-Transfer model achieves a significant improvement. The experimental results demonstrate that introducing the ECA module and transfer learning can significantly improve the performance of image steganalysis at low embedding rates.

5. Conclusions

In this paper, we address the poor performance of image steganalysis at low embedding rates. To this end, we propose an image steganalysis method (TCSI-ECA-Transfer) based on the attention mechanism and transfer learning, which introduces the ECA module into the convolutional neural network to strengthen attention to the steganographic features of low-embedding-rate images. The model trained on high-embedding-rate images is then used as the source domain, the model for lower-embedding-rate images is used as the target domain, and the trained parameters of the source domain are transferred to the target domain to further enhance image steganalysis performance. The experimental results show that the accuracy of TCSI-ECA-Transfer is significantly improved at low embedding rates; in particular, compared to existing typical and advanced models, its accuracy is improved by 5.03% to 37.83% at the very low embedding rate of 0.05 bpp. In the future, we will further optimize the network model from multiple perspectives, such as Inception structures and residual networks, and we will focus on enhancing the generalization capability of image steganalysis at low embedding rates.

Author Contributions

Conceptualization, S.L. and C.Z.; data curation, S.L.; formal analysis, S.L. and L.W.; funding acquisition, C.Z.; investigation, S.L. and L.W.; methodology, S.L.; project administration, S.L. and C.Z.; resources, S.L. and L.W.; software, S.L.; supervision, P.Y. and C.Z.; validation, P.Y., T.Z. and S.L.; visualization, S.H.; writing—original draft, S.L.; writing—review and editing, S.L., S.H. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Hebei Province Professional Degree Teaching Case Establishment and Construction Project (Chunying Zhang: No. KCJSZ2022073), the Hebei Postgraduate Course Civic Politics Model Course and Teaching Master Project (Chunying Zhang: No. YKCSZ2021091), the Basic Scientific Research Business Expenses of Hebei Provincial Universities (Liya Wang: No. JST2022001) and the Tangshan Science and Technology Project (Liya Wang: No. 22130225G).

Acknowledgments

Support by colleagues and the university is acknowledged.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, C.; Ma, T.; Huyan, J.; Huang, X.; Zhang, Y. CrackW-Net: A Novel Pavement Crack Image Segmentation Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22135–22144.
  2. Ghaderizadeh, S.; Abbasi-Moghadam, D.; Sharifi, A.; Zhao, N.; Tariq, A. Hyperspectral Image Classification Using a Hybrid 3D-2D Convolutional Neural Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7570–7588.
  3. Ansari, S.U.; Javed, K.; Qaisar, S.M.; Jillani, R.; Haider, U. Multiple sclerosis lesion segmentation in brain MRI using inception modules embedded in a convolutional neural network. J. Healthc. Eng. 2021, 2021, 4138137.
  4. Xu, G.; Wu, H.; Shi, Y. Structural design of convolutional neural networks for steganalysis. IEEE Signal Process. Lett. 2016, 23, 708–712.
  5. Ye, J.; Ni, J.; Yi, Y. Deep learning hierarchical representations for image steganalysis. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2545–2557.
  6. Yedroudj, M.; Comby, F.; Chaumont, M. Yedroudj-Net: An efficient CNN for spatial steganalysis. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2092–2096.
  7. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  8. Fridrich, J.; Kodovsky, J. Rich Models for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2012, 7, 868–882.
  9. Shen, J.; Liao, X.; Qin, Z.; Liu, X.-C. Spatial Steganalysis of Low Embedding Rate Based on Convolutional Neural Network. J. Softw. 2021, 32, 2901–2915.
  10. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 2010, 9, 249–256.
  11. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
  12. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
  13. Holub, V.; Fridrich, J.; Denemark, T. Universal distortion function for steganography in an arbitrary domain. EURASIP J. Inf. Secur. 2014, 2014, 1.
  14. Holub, V.; Fridrich, J. Designing Steganographic Distortion Using Directional Filters. In Proceedings of the IEEE International Workshop on Information Forensics and Security, Costa Adeje, Spain, 2–5 December 2012.
  15. Pevný, T.; Filler, T.; Bas, P. Using high-dimensional image models to perform highly undetectable steganography. Int. Workshop Inf. Hiding 2010, 6387, 161–177.
  16. Niu, Z.Y.; Zhong, G.Q.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
  17. Huang, Z.L.; Wang, X.G.; Huang, L.C.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
  18. Wang, X.; Girshick, R.B.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
  19. Chan, W.; Jaitly, N.; Le, Q.; Vinyals, O. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 4960–4964.
  20. Sperber, M.; Niehues, J.; Neubig, G.; Stüker, S.; Waibel, A. Self-attentional acoustic models. arXiv 2018, arXiv:1803.09519.
  21. Letarte, G.; Paradis, F.; Giguère, P.; Laviolette, F. Importance of self-attention for sentiment analysis. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 267–275.
  22. Shen, T.; Zhou, T.; Long, G.; Jiang, J.; Pan, S.; Zhang, C. DiSAN: Directional self-attention network for RNN/CNN-free language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 4–6 February 2018.
  23. Fukui, H.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Attention branch network: Learning of attention mechanism for visual explanation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
  24. Yan, C.; Tu, Y.; Wang, X.; Zhang, Y.; Hao, X.; Zhang, Y.; Dai, Q. STAT: Spatial-temporal attention mechanism for video captioning. IEEE Trans. Multimed. 2019, 22, 229–241.
  25. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338.
  26. Li, Y.; Zeng, J.; Shan, S.; Chen, X. Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans. Image Process. 2018, 28, 2439–2450.
  27. Li, J.; Jin, K.; Zhou, D.; Kubota, N.; Ju, Z. Attention mechanism-based CNN for facial expression recognition. Neurocomputing 2020, 411, 340–350.
  28. Cai, W.; Wei, Z. Remote sensing image classification based on a cross-attention mechanism and graph convolution. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5.
  29. Niu, S.; Liu, Y.; Wang, J.; Song, H. A decade survey of transfer learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1, 151–166.
  30. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
  31. Bas, P.; Filler, T.; Pevný, T. "Break our steganographic system": The ins and outs of organizing BOSS. Int. Workshop Inf. Hiding 2011, 6958, 59–70.
Figure 1. The detailed structure of the proposed TCSI-ECA-Transfer model. Each grid represents the details of each layer. “a × (b × b)” means that the size of the convolution kernel is b × b and the number of input channels is a.
Figure 2. The detailed internal structure of the efficient channel attention (ECA) module. "C × (H × W)" means that the number of channels is C and the height and width of the feature map are H and W. "C × 1 × 1" means that the feature map after global average pooling has C channels of size 1 × 1.
Figure 3. Detailed architecture of the transfer learning method. The top half (i.e., blue area) is the source domain and the bottom half (i.e., yellow area) is the target domain. The features are transferred to the target domain after training in the source domain.
Figure 4. (a) One of the cover images in the BOSSbase dataset. (b–f) The stego images obtained by passing the image in (a) through the S-UNIWARD steganographic algorithm at embedding rates of 0.05 bpp, 0.1 bpp, 0.2 bpp, 0.3 bpp, and 0.4 bpp, respectively.
Figure 5. Detection accuracy obtained with each of the three steganographic algorithms. (a) Detection accuracy using the S-UNIWARD algorithm; (b) detection accuracy using the WOW algorithm; (c) detection accuracy using the HUGO algorithm.
Table 1. Parameters of the Xu-Net, Yedroudj-Net, and Shen-Net models.

Models           | HPFs | Conv. Layers | Avg. Pooling Layers | FC Layers | Transposed Convolution
Xu-Net [4]       | 1    | 5            | 5                   | 1         | No
Yedroudj-Net [6] | 30   | 5            | 4                   | 3         | No
Shen-Net [9]     | 30   | 3            | 0                   | 2         | No
Table 2. Effect of the attention mechanism on model detection accuracy (%) when using the S-UNIWARD algorithm.

                  | 0.05 bpp | 0.1 bpp | 0.2 bpp | 0.3 bpp | 0.4 bpp
No ECA introduced | 72.82    | 81.77   | 88.65   | 94.65   | 96.91
ECA introduced    | 73.62    | 82.18   | 90.59   | 95.10   | 97.16
Table 3. Effect of the attention mechanism on model detection accuracy (%) when using the WOW algorithm.

                  | 0.05 bpp | 0.1 bpp | 0.2 bpp | 0.3 bpp | 0.4 bpp
No ECA introduced | 50.22    | 77.70   | 87.79   | 93.02   | 95.87
ECA introduced    | 50.67    | 78.39   | 88.61   | 93.65   | 96.20
Table 4. Detection accuracy (%) comparison with other models when using the S-UNIWARD algorithm.

Network Models    | 0.05 bpp | 0.1 bpp | 0.2 bpp
Xu-Net [4]        | 50.55    | 53.82   | 60.94
Yedroudj-Net [6]  | --       | --      | 56.01
Shen-Net [9]      | 69.17    | 78.90   | 88.17
TCSI-ECA-Transfer | 74.20    | 84.00   | 91.60
Table 5. Detection accuracy (%) comparison with other models when using the WOW algorithm.

Network Models    | 0.05 bpp | 0.1 bpp | 0.2 bpp
Xu-Net [4]        | 50.76    | 55.86   | 65.43
Yedroudj-Net [6]  | --       | --      | 51.20
Shen-Net [9]      | 63.90    | 74.05   | 85.49
TCSI-ECA-Transfer | 69.73    | 79.84   | 89.00
Table 6. Detection accuracy (%) comparison with other models when using the HUGO algorithm.

Network Models    | 0.05 bpp | 0.1 bpp | 0.2 bpp
Xu-Net [4]        | 52.83    | 59.49   | 66.79
Yedroudj-Net [6]  | --       | --      | 51.32
Shen-Net [9]      | 62.78    | 72.93   | 84.66
TCSI-ECA-Transfer | 69.19    | 79.30   | 89.15
