Deep-Learning Steganalysis for Removing Document Images on the Basis of Geometric Median Pruning

Abstract: Deep-learning steganography, a current research hotspot, can conceal a secret image in a cover image of the same size, and such steganographic secret messages are primarily removed via active steganalysis. When a document image is used as the secret message, deep-learning steganography can deliver a considerable amount of effective information during secret communication. Based on the idea of adversarial perturbation removal, this study builds and implements two deep-learning steganography removal models for document-image secret messages: the feed-forward denoising convolutional neural network (DnCNN) and the high-level representation guided denoiser (HGD). Because these models have large computation costs and storage overheads, we use document image-quality assessment (DIQA) as a threshold, calculate filter importance using the geometric median, and prune as many redundant filters as possible through overall iterative pruning and artificial bee colony (ABC) automatic pruning. This shrinks the existing large, over-parameterized removal models while maintaining good removal performance throughout the pruning process. Experimental results show that the models generated by this method have better adaptability and scalability. Compared with the unpruned deep-learning steganography removal models in this paper, the classic indicators params and flops are reduced by more than 75%.


Introduction
Image steganography is a technique for concealing secret messages in cover images and transmitting the resulting stego images over a common channel; the receiving end then extracts the secret message. In recent years, deep-learning frameworks, especially convolutional neural networks (CNNs), have achieved remarkable superiority over conventional approaches in many fields, with architectures progressing from the early AlexNet [1] and VGGNet [2] to the more advanced Inception models [3] and ResNet [4]. Deep learning has also been introduced to the field of information hiding. Deep-learning steganography has progressed considerably: it carries larger secret-message payloads than traditional steganography and successfully distributes secret messages across the available bits of the cover image. However, because deep steganography is lossy, its secret messages are limited to images. Baluja [5] proposed a CNN based on an encoder-decoder structure, in which the encoder network conceals a secret image in a same-size cover image and the decoder network reveals the secret image completely. Wu et al. [6] put forward a deep-learning steganography method based on a CNN architecture that provides better payload and performance than traditional steganography methods. Dong et al. [7] offered a deep-learning steganography method called ISGAN, which introduces generative adversarial networks (GANs) into CNNs and enhances security and the invisibility of the secret message by minimizing the difference between the empirical probability distributions of stego and natural images. Existing studies on steganography have not used document images as secret messages. Unlike general images, document images can carry a large amount of text, so their privacy and credibility need to be effectively protected in social networks [8].
The document image, which has practical and research significance, is used in this study as the secret message of deep-learning steganography. General image-quality assessment methods, such as peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [9], are full-reference objective evaluation algorithms that fail to measure the removal quality of document images. Hence, this study uses document image-quality assessment (DIQA) [10] to measure how well the document-image secret message is removed.
Steganalysis is an attack on steganography algorithms: a listener intercepts images on a common channel and analyzes whether they carry confidential hidden information. Since the introduction of steganalysis, steganography and steganalysis have been mutually reinforcing. Steganalysis [11] is a technique for detecting or deleting steganographic secret messages. However, detecting hidden information and extracting it without knowing the steganography method are both difficult; hence, the removal of steganographic secret messages has become an important research area. Few studies address the removal of secret messages in deep-learning steganography. Jung et al. [12] proposed a framework called PixelSteganalysis, which is inspired by PixelCNN [13]. The deep-learning steganography removal method proposed in this study is inspired by the adversarial examples of Szegedy et al. [14]: it treats the hidden secret message as a tiny adversarial perturbation and applies adversarial perturbation removal to deep-learning steganography removal. As an example of adversarial perturbation removal, Jia et al. [15] put forward an end-to-end image model called ComDefend, consisting of a compression convolutional neural network (ComCNN) and a reconstruction convolutional neural network (ResCNN), to protect against adversarial attacks. ComCNN maintains the structural information of the original image, and ResCNN reconstructs the original image with high quality. This study proposes a deep-learning steganography removal model with better adaptability and scalability that combines deep-learning steganography and adversarial perturbation removal methods.
Although deep neural networks have progressed considerably in the field of steganography, they require large data sets to perform well and are difficult to deploy because of their large number of parameters and storage overheads [16,17]. The deep-learning steganography removal framework proposed in this study is also based on deep neural networks and therefore has large params and flops, which makes deploying steganography firewalls in network nodes difficult. Model optimization has therefore become an important research topic. Network pruning [18,19] can effectively compress and accelerate CNNs and allows users to deploy efficient networks on hardware devices with limited storage and computing resources. For example, Han et al. [20] pruned a deep-learning model by removing weights below a certain threshold, and Han et al. [21] further combined weight pruning with Huffman coding; however, weight pruning may produce unstructured sparsity in the filters, which can be inefficient in saving memory and computation. Filter pruning, in contrast, yields structured sparsity and efficient memory usage, which makes it effective for accelerating deep neural networks. Accordingly, this study uses filter pruning to compress the model, reducing the params and flops of the deep-learning steganography removal model while preserving its performance.
The main contributions of this article are as follows:
The idea of adversarial perturbation removal is adopted for deep-learning steganography removal, and two deep-learning steganography removal models, namely DnCNN and HGD, are implemented.
The document image is used as the secret message of the deep-learning steganography method ISGAN [7], and the document image-quality assessment DIQA [10] is used to evaluate the secret message of deep-learning steganography.
Geometric median pruning is used to analyze the filters and the performance of each convolutional layer of the deep-learning steganography removal models. DIQA and image-quality assessment [9] are used as thresholds, and iterative pruning is applied to the two proposed removal models with satisfactory results.
DIQA, image-quality assessment, params and flops are used as the nectar-source fitness, and artificial bee colony (ABC) automatic pruning is applied to the two proposed removal models with satisfactory results.
The remainder of this paper is organized as follows: Work related to this study is discussed in Section 2. Two types of deep-learning steganography removal frameworks are proposed in Section 3. Deep-learning steganalysis for removing document images based on geometric median pruning is put forward in Section 4. The experimental results are presented in Section 5. Finally, Section 6 summarizes the conclusions of this study and discusses future research directions.

Active Steganalysis
Steganographic secret messages are primarily removed via active steganalysis, which falls into two categories: conventional active steganalysis and deep-learning active steganalysis.
Fridrich et al. [22] proposed a conventional active steganalysis method that overwrites the bits where messages may reside with random values, such as Gaussian noise. Guo et al. [23] proposed a median filter for removing noise in an image, especially sporadic noise of large variance. Amritha et al. [24] removed the hidden secret message by applying a denoising filter that degrades the image and then restoring the image to some extent through deconvolution.
Jung et al. [12] proposed PixelSteganalysis, a deep-learning steganalysis framework based on PixelCNN [13], which uses a deep neural network to exploit sophisticated pixel distributions and the edge areas of images and thereby adaptively removes secret information at the pixel level. Corley et al. [25] offered a GAN-based deep digital steganography purifier that destroys steganographic content without compromising the perceived quality of the original image.
This study proposes using adversarial perturbation removal for deep-learning steganography removal. Zhang et al. [26] put forward an end-to-end trainable Gaussian denoising architecture called the denoising convolutional neural network (DnCNN), which integrates batch normalization and residual learning to accelerate training and boost denoising performance. Song et al. [27] offered a defense method based on a pixel-level CNN, called PixelDefend, after finding that adversarial examples primarily lie in low-probability regions of the training distribution; PixelDefend reconstructs adversarial examples from these low-probability regions into clean images that lie in high-probability regions. Liao et al. [28] proposed high-level representation guided denoiser (HGD) methods that treat adversarial perturbations as noise in order to defend against adversarial examples.

Pruning Methods
Network pruning methods for CNNs can be divided into two categories: unstructured weight pruning [20] and channel-based structured pruning. Unstructured pruning typically produces irregular network structures that require dedicated hardware and software to achieve actual acceleration; without such support, the network complexity cannot be reduced in practice. Structured pruning is therefore the more practical choice.
The channel-pruning method compresses the model structure by deleting entire filters and is well supported by general hardware. Several works have evaluated the importance of filter weights. Li et al. [29] assessed filter importance using the $L_p$ norm and deleted unimportant filters in each convolutional layer according to a manually set pruning ratio. Yang et al. [30] proposed a soft filter pruning method that allows pruned filters to be updated while the model is trained; a network with good learning ability can thus be trained from scratch and pruned at the same time, achieving good results without fine-tuning. He et al. [31] offered a geometric median pruning method that prunes redundant filters by regarding filters near the geometric center as similar and unimportant. Chen et al. [32] proposed a self-adaptive network pruning method (SANP) to reduce the cost of CNNs; it introduces a general Saliency-and-Pruning Module (SPM) for each convolutional layer that learns to predict saliency scores and applies pruning to each channel.
Several works have focused on pruning based on reconstruction errors, treating channel pruning as an optimization problem and selecting representative filters. Madaan et al. [33] proposed a new loss for adversarial learning that minimizes feature-level vulnerability during training, together with a Bayesian framework that prunes features with high vulnerability to reduce both vulnerability and loss on adversarial samples. Luo et al. [34] proposed ThiNet, which uses a greedy method to delete the channels with minimal effect on the activation values of the next layer. He et al. [35] used lasso regression to select channels for pruning. Yu et al. [36] selected filters to prune by minimizing the reconstruction error of the penultimate layer of the network while considering cumulative back-propagation errors.
Some studies have focused on regularization-based pruning. Liu et al. [37] imposed sparsity constraints on the scaling factor γ of the batch normalization layers to measure channel importance during training, filtered out channels with low scores and pruned layer by layer. Huang et al. [38] and Lin et al. [39] proposed sparse regularization mask methods for channel pruning, in which the mask is optimized via data-driven selection or generative adversarial learning. Zhao et al. [40] further developed norm-based importance estimation by taking the dependency between adjacent layers into consideration and proposed a novel mechanism that dynamically controls the sparsity-inducing regularization to achieve the desired sparsity.
Other studies have focused on searchable automatic pruning methods. Guerra et al. [41] proposed an effective pruning strategy that selects redundant low-precision filters by combining neural network quantization and pruning in an automated process. Dong et al. [42] proposed transformable architecture search, which combines knowledge distillation and architecture search to find a good network structure. Liu et al. [43] proposed a heuristic search algorithm that trains the remaining weights while pruning to obtain a structurally sparse weight distribution, and then searches for and deletes a small portion of redundant weights through network structure purification. Lin et al. [44] proposed a channel-pruning method based on the artificial bee colony (ABC) algorithm, which treats the search for the optimal pruning structure as an optimization problem and uses ABC to automatically select the pruned structure with the best fitness.

Deep-Learning Steganography Removal Model
This study adopts the idea of adversarial perturbation removal for deep-learning steganography removal and implements two deep-learning steganography removal models, namely DnCNN [26] and HGD [28]. Approximately 50,000 ISGAN-generated [7] stego images with a resolution of 256 × 256 and their corresponding cover images are used as the data set to train the models. SSIM and PSNR [9] are used to measure image quality. SSIM measures image similarity in terms of luminance, contrast and structure, and its value typically lies in the range [0, 1]. PSNR is a widely used objective image-quality metric based on the errors between corresponding pixels.
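Because PSNR is later used as a hard threshold, a small sketch of its usual definition may help. It assumes grayscale images stored as lists of pixel rows and an 8-bit peak value of 255; the function name `psnr` is illustrative, not taken from the paper.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-size grayscale images.

    Images are lists of pixel rows; max_value is the largest possible pixel
    value (255 assumed for 8-bit images).
    """
    ref = [p for row in reference for p in row]
    out = [p for row in test for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


cover = [[10, 20], [30, 40]]
purified = [[11, 19], [31, 39]]  # every pixel off by one
print(round(psnr(cover, purified), 2))  # prints 48.13
```

An error of one gray level per pixel already caps PSNR at about 48 dB, so the 26 dB condition used in this paper tolerates noticeably larger pixel changes.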
Self-made black-and-white document images are scored with Google's Tesseract OCR engine. We sort approximately 40,000 such document images with a resolution of 256 × 256, together with their labels, into rating segments, which form the data set for training the document evaluation model [10]. The trained model serves as DIQA [10] to measure the degree of destruction of document images used as secret messages.
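The paper does not specify how OCR output becomes a score or how the rating segments are defined. The sketch below is one hypothetical way to do it: it assumes the OCR text (for example, from Tesseract) has already been obtained as a string, compares it with the known document text using Python's `difflib.SequenceMatcher`, and maps the similarity to equal-width segments. The function names and the choice of five segments are assumptions.

```python
from difflib import SequenceMatcher


def ocr_similarity(ground_truth: str, ocr_text: str) -> float:
    """Similarity in [0, 1] between the known document text and OCR output."""
    return SequenceMatcher(None, ground_truth, ocr_text).ratio()


def rating_segment(score: float, n_segments: int = 5) -> int:
    """Map a score in [0, 1] to a label 0..n_segments-1 using equal-width bins."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(int(score * n_segments), n_segments - 1)


# Hypothetical OCR result for a slightly damaged document image.
score = ocr_similarity("abcd", "abxd")
print(score, rating_segment(score))  # prints 0.75 3
```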

DnCNN Model
The DnCNN model modifies the VGG network [2] to make it suitable for image denoising. We use the DnCNN denoising model for deep-learning steganography removal (see Figure 1). Figure 1 illustrates an end-to-end deep convolutional steganography removal network. For a network of depth D, the layers are composed as follows:
Convolutional layer (Conv) + rectified linear unit (ReLU): The first layer uses 64 filters of size 3 × 3 × 3 to generate 64 feature maps. ReLU sets the outputs of some neurons to zero, which yields a sparse network that is better at mining relevant features.
Conv + batch normalization (BN) + ReLU: Layers 2 to (D − 1) use 64 filters of size 3 × 3 × 64. A BN layer is added between Conv and ReLU to accelerate training and improve steganography removal performance. The document image hidden as a secret image in the stego image is gradually removed as noise through these successive layers.
Conv: The last layer uses three filters of size 3 × 3 × 64 to reconstruct the image as the output of the deep-learning steganography removal model.
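The layer description above fixes the convolution shapes, so params and flops can be estimated from D alone. The following sketch assumes stride 1 with "same" padding, counts only convolution weights (no biases or BN parameters) and reports flops as multiply-accumulates; D = 17 is purely illustrative, and the counting convention behind Table 1 may differ.

```python
def dncnn_layers(depth, image_channels=3, features=64):
    """(in_channels, out_channels) of each conv layer in a depth-D DnCNN."""
    if depth < 3:
        raise ValueError("depth must be at least 3")
    layers = [(image_channels, features)]           # layer 1: Conv + ReLU
    layers += [(features, features)] * (depth - 2)  # layers 2..D-1: Conv + BN + ReLU
    layers.append((features, image_channels))       # layer D: Conv
    return layers


def conv_weights(layers, k=3):
    """Convolution weights only (biases and BN parameters are ignored)."""
    return sum(c_in * c_out * k * k for c_in, c_out in layers)


def conv_macs(layers, height, width, k=3):
    """Multiply-accumulates, assuming stride 1 and 'same' padding."""
    return sum(c_in * c_out * k * k * height * width for c_in, c_out in layers)


layers = dncnn_layers(depth=17)  # D = 17 is an illustrative choice
print(conv_weights(layers))      # prints 556416
```

Almost all of the cost sits in the D − 2 middle layers of 64 × 64 filters, which is why pruning those filters matters most.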
The loss function of DnCNN is expressed as Formula (1), where C is the original image, R is the image after the steganographic secret message has been removed, and I and J are the length and width of the image, respectively. Using the difference between the original and purified images as the loss keeps the purified image as close as possible to the original image after the secret message is removed, and it improves the SSIM and PSNR values both between the original and purified images and between the stego and purified images.
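Formula (1) itself is not reproduced in this text. Assuming it is the mean squared difference over the I × J pixels, $\frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}(C_{ij}-R_{ij})^2$, a direct sketch looks as follows; the function name and the single-channel layout are assumptions.

```python
def removal_loss(cover, purified):
    """Mean squared difference between cover image C and purified image R.

    Assumed form of Formula (1): (1 / (I * J)) * sum_ij (C_ij - R_ij) ** 2,
    with images given as I rows of J pixels.
    """
    i_len, j_len = len(cover), len(cover[0])
    if len(purified) != i_len or any(len(row) != j_len for row in purified + cover):
        raise ValueError("C and R must both be I x J")
    total = sum((c - r) ** 2
                for c_row, r_row in zip(cover, purified)
                for c, r in zip(c_row, r_row))
    return total / (i_len * j_len)


print(removal_loss([[0, 0], [0, 0]], [[1, 1], [1, 3]]))  # prints 3.0
```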

HGD Model
Denoising autoencoder (DAE) [45] is a popular denoising model. A DAE has a bottleneck structure between the encoder and decoder, which may constrain the transmission of the fine-scale information needed to reconstruct high-resolution images. Hence, the HGD model was proposed by modifying the DAE with a U-net [46] structure to overcome the bottleneck. We use the HGD denoising model for deep-learning steganography removal (see Figure 2).
The entire model consists of a feedforward path and a feedback path and is divided into upper and lower layers (see Figure 2). HGD is primarily composed of blocks of a 3 × 3 convolutional layer, BN and ReLU (Conv + BN + ReLU), with a final 1 × 1 convolutional layer as the output layer. The upper network, called the Com layer, takes the 256 × 256 image $X^*$ as input, generates a set of feature maps with progressively lower resolution and is composed of 14 blocks.
The lower network, called the Res layer, enlarges the feature maps through upsampling and merges them with the output of the corresponding level of the Com layer as the input of the feedback path, thereby connecting the upper and lower layers; it is composed of 11 blocks and a convolutional layer. The resolution of the feature maps increases along the feedback path until it is restored to the original image size.
Symmetry 2020, 12, x FOR PEER REVIEW 6 of 24
The last output layer of HGD produces a residual image $-Y^*$, and the final output $\hat{X} = X^* - Y^*$ is the purified image after steganography removal. The secret message of the hidden document image in the stego image is removed as a small perturbation through compression and reconstruction. HGD uses the same loss function as DnCNN to ensure the quality of the purified image.
The DnCNN and HGD models are trained for about three days on a single GeForce GTX 1080 Ti graphics card. The performance of the final DnCNN and HGD models is presented in Table 1. As Table 1 shows, the trained DnCNN and HGD steganography removal models have excessively large params and flops, which hinders their deployment in network nodes. Therefore, the trained removal models are pruned: as many redundant filters as possible are removed to shrink the existing large, over-parameterized networks while keeping the secret message of the hidden document image invisible to a certain extent. The specific method is discussed in Section 4.

Pruning Strategy for the Deep-Learning Steganography Removal Model
Filter pruning gives a model structured sparsity and efficient memory usage, which is why it is preferred for accelerating deep neural networks. This study therefore applies geometric median filter pruning to compress the deep-learning steganography removal models.
Pruning the DnCNN and HGD models involves two cases, corresponding to the network structures shown in Figures 3 and 4. Let $P_i \in \mathbb{R}^{n_i \times x_i \times y_i}$ denote the input feature map of the $i$th convolutional layer, where $\mathbb{R}$ denotes the real numbers, $x_i$ and $y_i$ are the width and height of the feature map, and $n_i$ is the number of input channels of the $i$th convolutional layer; $n_{i+1}$ is the number of output channels of that layer. Each convolution kernel is $K \in \mathbb{R}^{k \times k}$, where $k$ is generally 1 or 3. $F_{i,j}$ denotes the $j$th filter of the $i$th convolutional layer, and all filters of the layer constitute the kernel matrix $Q_i \in \mathbb{R}^{n_i \times n_{i+1} \times k \times k}$.
In the network structure shown in Figure 3, pruning a filter $F_{i,j}$ from the kernel matrix $Q_i$ removes the corresponding channel of the output feature map $P_{i+1}$, which is the input of the next convolutional layer. Hence, the corresponding kernels of the next layer's kernel matrix $Q_{i+1}$ must also be removed. Pruning a filter $F_{i,j}$ therefore reduces the computation of the $i$th layer by $n_i \times k^2 \times x_{i+1} \times y_{i+1}$ flops.
The HGD network structure is illustrated in Figure 4. Pruning a filter $F_{i,j}$ from the kernel matrix $Q_i$ of the Com layer affects the corresponding channel of the output feature map $P_{i+1}$ in the Com layer and the corresponding kernels of $Q_{i+1}$ in the next convolutional layer. Because this Com-layer output is also merged into the Res layer, the corresponding weights in $n_{i+2}$-$n_{i+3}$ of the kernel matrix $Q_{i+3}$ in the Res layer must be removed as well.
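The channel bookkeeping for the chain structure of Figure 3 can be sketched with nested lists laid out as $Q_i[n_i][n_{i+1}][k][k]$, following the notation above. The function names are hypothetical. The second flops helper adds the saving in layer $i+1$ (its removed input channel), which the formula in the text does not count; for HGD, the same input-channel removal would also be applied to the Res-layer kernel that consumes the merged feature map.

```python
def prune_filter(q_i, q_next, j):
    """Remove filter F_{i,j} from layer i and the matching input channel of layer i+1.

    Kernel matrices follow the text's layout Q_i[n_i][n_{i+1}][k][k]
    (input channel first, output channel second), as nested lists.
    """
    if not 0 <= j < len(q_next):
        raise IndexError("filter index out of range")
    if any(len(row) != len(q_next) for row in q_i):
        raise ValueError("output channels of Q_i must match input channels of Q_{i+1}")
    new_q_i = [row[:j] + row[j + 1:] for row in q_i]  # drop output channel j of layer i
    new_q_next = q_next[:j] + q_next[j + 1:]          # drop input channel j of layer i+1
    return new_q_i, new_q_next


def flops_saved_in_layer_i(n_i, k, x_next, y_next):
    """n_i * k^2 * x_{i+1} * y_{i+1}, as stated in the text."""
    return n_i * k * k * x_next * y_next


def flops_saved_in_layer_next(n_after, k, x_after, y_after):
    """Extra saving in layer i+1: n_{i+2} * k^2 * x_{i+2} * y_{i+2}."""
    return n_after * k * k * x_after * y_after


q_i = [[[[1]], [[2]], [[3]]], [[[4]], [[5]], [[6]]]]            # n_i=2, n_{i+1}=3, k=1
q_next = [[[[7]], [[8]]], [[[9]], [[10]]], [[[11]], [[12]]]]    # n_{i+1}=3, n_{i+2}=2
new_q_i, new_q_next = prune_filter(q_i, q_next, 1)
print(new_q_i)     # prints [[[[1]], [[3]]], [[[4]], [[6]]]]
print(new_q_next)  # prints [[[[7]], [[8]]], [[[11]], [[12]]]]
print(flops_saved_in_layer_i(64, 3, 256, 256))  # prints 37748736
```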

Geometric Median Pruning
We analyze all filters of each convolutional layer of the proposed DnCNN and HGD models via the geometric median. For the $i$th convolutional layer, the geometric median is the point that minimizes the sum of Euclidean distances to all filters $F_{i,j}$ of that layer; it acts as the information center of the layer's filters and is expressed as Formula (2).
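Formulas (2) and (3) are not reproduced in this text. Assuming they follow the standard geometric-median formulation of He et al. [31], with each filter $F_{i,j'} \in \mathbb{R}^{n_i \times k \times k}$ of the $i$th layer treated as a vector and $n_{i+1}$ filters per layer, the geometric median and the selection of the filters nearest to it can be written as:

```latex
x^{GM} = \arg\min_{x \in \mathbb{R}^{n_i \times k \times k}} \sum_{j'=1}^{n_{i+1}} \left\| x - F_{i,j'} \right\|_2

F_{i,j^*} = \arg\min_{j} \left\| F_{i,j} - x^{GM} \right\|_2
          \approx \arg\min_{j} \sum_{j'=1}^{n_{i+1}} \left\| F_{i,j} - F_{i,j'} \right\|_2
```

The right-hand approximation avoids solving for $x^{GM}$ explicitly by restricting the candidate center to the filters themselves.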
Filters at or near the geometric median carry redundant information, because they can be represented by the remaining filters; they are identified as shown in Formula (3).
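A minimal sketch of the selection step, assuming the distance-sum approximation used by He et al. [31] and filters already flattened into plain lists of their $n_i \times k \times k$ weights; the helper names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two flattened filters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filters_near_geometric_median(filters, n_prune):
    """Indices of the n_prune filters closest to the layer's geometric median.

    Distance-sum approximation: a filter's score is the sum of its Euclidean
    distances to all filters of the layer; the lowest scores lie nearest the
    geometric median and are treated as redundant.
    """
    if not 0 <= n_prune <= len(filters):
        raise ValueError("n_prune must be between 0 and the number of filters")
    scores = [sum(distance(f, g) for g in filters) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda idx: scores[idx])
    return sorted(ranked[:n_prune])


# Four "corner" filters and one central filter, each flattened to 2 weights.
layer = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(filters_near_geometric_median(layer, 1))  # prints [4]
```

The central filter is selected because the corner filters can stand in for it, which is the sense in which filters near the geometric median are redundant.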

Overall Iterative Pruning of Deep-Learning Steganography Removal Model Based on Geometric Median
Geometric median is used in this study to prune the pre-trained deep-learning steganography removal model, as shown in Algorithm 1.
The document image can carry a large amount of information when used as a secret message in deep-learning steganography. Hence, the deep-learning steganography removal model is used as a steganography firewall to remove the secret messages hidden in stego images. The proposed removal model uses DIQA as a threshold to evaluate the degree of secret-message removal, ensuring that the steganographic secret message is invisible to a certain extent. In addition, image quality must be considered, that is, the PSNR and SSIM between the original and purified images and between the stego and purified images. Algorithm 2 sets the DIQA threshold together with SSIM > 0.9 and PSNR > 26 dB and iteratively prunes the filters of each convolutional layer of the proposed DnCNN and HGD models.
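The acceptance test for a pruned model can be sketched as a single predicate. The SSIM and PSNR bounds come from the text; the field names are hypothetical, and the direction of the DIQA test is an assumption: here the secret counts as removed when the DIQA score of the document revealed from the purified image does not exceed the threshold.

```python
from dataclasses import dataclass


@dataclass
class RemovalMetrics:
    ssim_original_purified: float
    ssim_stego_purified: float
    psnr_original_purified: float  # dB
    psnr_stego_purified: float     # dB
    diqa_revealed: float           # DIQA score of the document revealed after purification


def meets_conditions(m: RemovalMetrics, diqa_threshold: float,
                     ssim_min: float = 0.9, psnr_min: float = 26.0) -> bool:
    """True if image quality is preserved and the secret document is destroyed."""
    image_quality_ok = (m.ssim_original_purified > ssim_min
                        and m.ssim_stego_purified > ssim_min
                        and m.psnr_original_purified > psnr_min
                        and m.psnr_stego_purified > psnr_min)
    # Assumption: a lower DIQA score means a less readable (better removed) document.
    removal_ok = m.diqa_revealed <= diqa_threshold
    return image_quality_ok and removal_ok


print(meets_conditions(RemovalMetrics(0.95, 0.93, 30.0, 28.0, 0.1), diqa_threshold=0.2))  # prints True
```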
Algorithm 1 Pruning the deep-learning steganography removal model via geometric median
1: Prepare the pre-trained DnCNN and HGD models;
2: Calculate the geometric median of all filters $F_{i,j}$ of a convolutional layer in the deep-learning steganography removal model, as in Formula (2), to find the data center point $f$ of the filters in that layer;
3: According to Formula (3), filters $F_{i,j}$ that are close to the geometric median contain redundant information. Prune these redundant filters while maintaining the performance of the model;
4: Pruning a filter in a convolutional layer affects the kernel matrix and the corresponding feature-map channels of the next convolutional layer, as analyzed in Section 4.1. Therefore, remove the pruned feature-map channels and the corresponding weights so that the numbers of input and output channels of the related convolutional layers match;
5: Retain the remaining kernel matrix after removing the filters of the convolutional layer, which completes one geometric median filter-pruning operation.
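Steps 2 to 5 of Algorithm 1 can be sketched for a plain pair of consecutive convolutions. This is an illustration under assumptions, not the authors' code. The kernel layout `(out_channels, in_channels, k, k)`, the distance-sum score used to rank closeness to the geometric median, and the name `prune_layer_by_gm` are all assumptions. The biases and batch-norm parameters of layer $i$ would need the same index selection, which the sketch omits.

```python
import numpy as np


def prune_layer_by_gm(q_i, q_next, n_prune):
    """One geometric-median pruning step on two consecutive conv kernels.

    q_i:    (n_{i+1}, n_i, k, k) kernel matrix Q_i of layer i.
    q_next: (n_{i+2}, n_{i+1}, k, k) kernel matrix Q_{i+1} of layer i+1.
    Removes the n_prune filters of q_i with the smallest summed distance to
    the other filters (closest to the layer's geometric median) and the
    matching input channels of q_next, so channel counts stay consistent.
    """
    flat = q_i.reshape(q_i.shape[0], -1)
    scores = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2).sum(axis=1)
    pruned = np.argsort(scores)[:n_prune]
    keep = np.setdiff1d(np.arange(q_i.shape[0]), pruned)  # sorted surviving indices
    return q_i[keep], q_next[:, keep], keep


rng = np.random.default_rng(0)
q_i = rng.normal(size=(8, 3, 3, 3))
q_next = rng.normal(size=(4, 8, 3, 3))
new_q_i, new_q_next, keep = prune_layer_by_gm(q_i, q_next, n_prune=3)
print(new_q_i.shape, new_q_next.shape)  # (5, 3, 3, 3) (4, 5, 3, 3)
```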

Algorithm 2
Iterative pruning process of the deep-learning steganography removal model
1: Input: Prepare the pre-trained DnCNN and HGD models;
2: Initialization: Set m pruning layers; set the maximum channel pruning rate r; set the iterative channel pruning rate p of the deep-learning steganography removal model. According to r and p, set the numbers of channels to prune iteratively in each convolutional layer, $C = (c_1, c_2, \dots, c_n)$, $n = 1/p$, $0 \le c \le \text{channel} \times r$;
3: Conditions: Set the DIQA threshold while ensuring image quality, that is, SSIM > 0.9 for the original and purified images, SSIM > 0.9 for the stego and purified images, PSNR > 26 for the original and purified images, and PSNR > 26 for the stego and purified images;
4: for i = 1:m do
     for j in C do
       Call Algorithm 1 to perform geometric median pruning;
       Verify the network after each pruning. If the pruned network meets the DIQA threshold and the image-quality conditions, prune more channels of this convolutional layer. Otherwise, exit the loop for this layer, fix the final pruning result of the layer, save the pruned model and start pruning the next convolutional layer;
     end
   end
5: Output: The pruned models.
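The control flow of Algorithm 2 can be sketched as follows. Two parts are assumptions. First, the schedule $C$ is read as $n = 1/p$ candidate counts that grow evenly up to $\text{channel} \times r$. Second, `try_prune(layer, c)` is a hypothetical callback that prunes `c` channels of `layer` with Algorithm 1 and returns whether the DIQA, SSIM and PSNR conditions still hold. Saving the accepted model between layers is left to that callback.

```python
import math


def pruning_schedule(channels, r, p):
    """Candidate channel counts C = (c_1, ..., c_n), n = 1/p, c_n <= channels * r."""
    n = round(1 / p)
    return [math.floor(channels * r * k / n) for k in range(1, n + 1)]


def iterative_prune(layer_channels, r, p, try_prune):
    """Return the accepted number of pruned channels for each layer.

    try_prune(layer, c) -> bool is a hypothetical callback wrapping
    Algorithm 1 plus the quality checks.
    """
    result = {}
    for layer, channels in enumerate(layer_channels):   # for i = 1:m
        accepted = 0
        for c in pruning_schedule(channels, r, p):      # for j in C
            if try_prune(layer, c):
                accepted = c                            # conditions hold: prune more
            else:
                break                                   # stop at the last passing count
        result[layer] = accepted
    return result


print(pruning_schedule(64, 0.9, 0.1))  # [5, 11, 17, 23, 28, 34, 40, 46, 51, 57]
```

The double loop over the m layers and the n candidate counts is what gives the $O(m \times n)$ cost discussed in the algorithm analysis.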

ABC Automatic Pruning of Deep-Learning Steganography Removal Model Based on Geometric Median
Section 4.3 performs brute-force iterative pruning within a space of up to 90% channel pruning for each convolutional layer of the deep-learning steganography removal model. That process treats each convolutional layer separately and is not automated. Therefore, this section restricts the combinations of preserved channels of the deep-learning steganography removal model to a specific space based on geometric median. We then formulate the search for the optimal pruned structure as an optimization problem and solve it automatically with the ABC algorithm.
The ABC algorithm contains three basic elements (nectar sources, employed bees and unemployed bees) and three basic behaviors (searching for nectar sources, recruiting bees for nectar sources and abandoning a nectar source).

• Nectar sources: The value of a nectar source depends on many factors, such as the amount of nectar, the distance from the hive and the difficulty of obtaining the nectar. These factors are expressed by the fitness of the nectar source;
• Employed bees: The number of employed bees usually equals the number of nectar sources. Employed bees memorize information about a particular nectar source, including its distance, direction and abundance, and share this information with other bees with a certain probability;
• Unemployed bees: Unemployed bees, which comprise onlooker bees and scout bees, are responsible for finding nectar sources to exploit. Onlooker bees observe the waggle dance of the employed bees to obtain information about nectar sources and choose which employed bee to follow. The number of onlooker bees equals the number of employed bees. Scout bees, which account for 5-20% of the colony, do not follow other bees and search randomly for nectar sources around the hive.
Table 2 shows the correspondence between the foraging behavior of ABC and the channel pruning problem of the deep-learning steganography removal model. In this way, the structure search of the deep-learning steganography removal model is abstracted as the foraging behavior of bees.
Table 2. The corresponding relationship between the foraging behavior of ABC and the channel pruning problem of the deep-learning steganography removal model.

Foraging Behavior of ABC | Channel Pruning Problem of the Deep-Learning Steganography Removal Model
Nectar sources | The combinations of the channels of each convolutional layer of the model.
Quality of nectar sources | The quality of a combination, given by its fitness value: the DIQA threshold, SSIM > 0.9 and PSNR > 26 are met, and the params and flops of the model are as small as possible.
Optimal quality of nectar sources | The params and flops of the model are the smallest while image quality and document image quality are guaranteed.
Pick nectar | Search the pruning structure of the model.

Initialization of Nectar Sources
The channel combinations of the convolutional layers of the deep-learning steganography removal model are regarded as nectar sources, and the quality of a nectar source corresponds to the fitness value $f_i$ of the combination. Let $D$ denote the number of convolutional layers participating in the combinations. The position of nectar source $i \in \{1, 2, \dots, n\}$ at iteration $t$ is expressed as $X_i^t = (x_{i1}^t, x_{i2}^t, \dots, x_{iD}^t)$, where $x_{id} \in (L_d, U_d)$ corresponds to the $d$-th convolutional layer of the model, $d \in \{1, 2, \dots, D\}$, and $L_d$ and $U_d$ are the lower and upper limits of the search space, respectively. The lower limit means that the convolutional layer is not pruned, and the upper limit means that 90% of its channels are pruned. The initial position of nectar source $i$ is generated randomly in the search space according to Formula (4).
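The initialization can be sketched with the classic ABC rule $x_{id} = L_d + \text{rand}(0,1)(U_d - L_d)$. Formula (4) is not reproduced in this text, so this rule is an assumption. The sketch also rounds positions to integer pruning levels and assumes that level $l$ means "prune $l \times 10\%$ of the layer's channels" (so $L_d = 0$ and $U_d = 9$), consistent with the $\phi = 9$ vectors reported later. `init_nectar_sources` is a hypothetical name.

```python
import numpy as np


def init_nectar_sources(n, D, lower, upper, rng):
    """Classic ABC initialization x_id = L_d + rand(0, 1) * (U_d - L_d).

    lower/upper: scalars or length-D arrays holding L_d and U_d. Positions are
    rounded to integer pruning levels (assumed: level l prunes l * 10% of a
    layer's channels, so L_d = 0 means no pruning and U_d = 9 means 90%).
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = lower + rng.random((n, D)) * (upper - lower)
    return np.rint(x).astype(int)


rng = np.random.default_rng(0)
sources = init_nectar_sources(n=10, D=16, lower=0, upper=9, rng=rng)
print(sources.shape)  # (10, 16)
```

Rounding a continuous draw makes the two end levels half as likely as the interior ones. Using `rng.integers(lower, upper + 1, ...)` instead would sample levels uniformly.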

Search Process of Employed Bees
At the beginning of the search, each employed bee searches for a new nectar source near the old nectar source $i$ according to Formula (5). Here, $j \in \{1, 2, \dots, n\}$, $j \ne i$, is a randomly selected nectar source different from $i$, and $\phi_{id}$ is a uniformly distributed random number in $[-1, 1]$ that determines the degree of channel pruning of each convolutional layer. The new nectar source $x'_{id}$ is computed from the previous nectar source $x_{id}$ and the neighbor nectar source $x_{jd}$. If the quality $f(X'_i)$ of the new nectar source is better than the quality $f(X_i)$ of the previous one, the employed bee memorizes the new nectar source; otherwise, it keeps the old one.
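The employed-bee step can be sketched with the classic ABC neighbor move $v_{id} = x_{id} + \phi_{id}(x_{id} - x_{jd})$ followed by greedy selection. This is an assumed form, since Formula (5) itself is not reproduced here. Like classic ABC, the sketch perturbs one randomly chosen layer $d$ per move. It also rounds and clips the result to the integer level range. `neighbor_source` and `greedy_update` are hypothetical names, and `trial` counts failed improvements for the scout phase.

```python
import numpy as np


def neighbor_source(X, i, lower, upper, rng):
    """Classic ABC move v_id = x_id + phi_id * (x_id - x_jd) on one random layer d."""
    n, D = X.shape
    j = rng.choice([k for k in range(n) if k != i])   # neighbor source, j != i
    d = rng.integers(D)                               # layer whose pruning level changes
    phi = rng.uniform(-1.0, 1.0)                      # phi_id ~ U[-1, 1]
    v = X[i].copy()
    v[d] = np.clip(np.rint(X[i, d] + phi * (X[i, d] - X[j, d])), lower, upper)
    return v


def greedy_update(X, fit, trial, i, v, fitness):
    """Memorize v if it beats X[i]; otherwise count one more failed trial."""
    fv = fitness(v)
    if fv > fit[i]:
        X[i], fit[i], trial[i] = v, fv, 0
    else:
        trial[i] += 1
```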

Search Process of Onlooker Bees
Onlooker bees select nectar sources by roulette-wheel selection from those found by the employed bees. The probability that a nectar source is selected is given by Formula (6), where $f(X_i)$ is the quality of nectar source $i$. After choosing a nectar source, the onlooker bee also searches for a new nectar source according to Formula (5).
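Roulette-wheel selection can be sketched with the classic ABC probability $p_i = f(X_i) / \sum_{n} f(X_n)$. This is assumed to match Formula (6), which is not reproduced here. The rule requires non-negative fitness values. The uniform fallback for an all-zero population (every structure infeasible) is a design choice of this sketch. The helper names are hypothetical.

```python
import numpy as np


def selection_probabilities(fit):
    """Classic ABC probability p_i = f(X_i) / sum_n f(X_n); fitness must be >= 0."""
    fit = np.asarray(fit, dtype=float)
    total = fit.sum()
    if total == 0:                       # every source infeasible: choose uniformly
        return np.full(len(fit), 1.0 / len(fit))
    return fit / total


def onlooker_choices(fit, n_onlookers, rng):
    """Roulette-wheel selection of one nectar source per onlooker bee."""
    prob = selection_probabilities(fit)
    return rng.choice(len(prob), size=n_onlookers, p=prob)
```

Each chosen index is then refined with the same neighbor move the employed bees use.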

Search Process of Scout Bees
To prevent the algorithm from falling into a local optimum, the scout bee search uses an important parameter, limit, which bounds the number of iterations for which the quality of a nectar source may fail to improve. If no better nectar source has been found after $trial_i$ iterations and the limit threshold is reached, nectar source $X_i$ is abandoned: the corresponding employed bee becomes a scout bee, which randomly generates a new nectar source in the search space to replace $X_i$. If the limit threshold is not reached, the employed bee continues its search, as in Formula (7). Because scout bees search for new nectar sources around the hive, the ABC automatic pruning algorithm based on geometric median is unlikely to become trapped in local extrema and can converge to an optimal nectar source with high probability, which yields an optimized deep-learning steganography removal model. The deep-learning steganography removal algorithm based on ABC pruning is given in Algorithm 3.
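The scout phase can be sketched as a pass over the trial counters. Each source whose counter has reached `limit` is replaced by a uniformly random integer position in the search space, and its counter is reset. This follows classic ABC. Because Formula (7) is not reproduced here, the reset rule and the name `scout_phase` are assumptions, and `fitness` is a hypothetical callback that evaluates a pruned structure.

```python
import numpy as np


def scout_phase(X, fit, trial, limit, lower, upper, fitness, rng):
    """Abandon sources that have not improved for `limit` trials and replace
    each with a random position in [lower, upper] (the scout bee's search)."""
    for i in range(X.shape[0]):
        if trial[i] >= limit:
            X[i] = rng.integers(lower, upper + 1, size=X.shape[1])
            fit[i] = fitness(X[i])
            trial[i] = 0
```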

Algorithm 3 ABC automatic pruning of deep-learning steganography removal model based on geometric median
1: Input: Prepare the pre-trained DnCNN and HGD models.
2: Initialization: Set t pruning rounds; initialize the nectar sources according to Formula (4); set n pruning layers; set the maximum channel pruning ϕ of each convolutional layer of the model; set limit, the maximum number of iterations a nectar source may go without improvement; trial counts the iterations for which a nectar source has not improved.
3: Conditions: Set the DIQA threshold while ensuring image quality, that is, SSIM > 0.9 for the original and purified images, SSIM > 0.9 for the stego and purified images, PSNR > 26 for the original and purified images, and PSNR > 26 for the stego and purified images, with params and flops as small as possible. Use these conditions as the nectar source fitness value $f(X_i)$.
4: for i = 1:t do
     for j = 1:D do
       The employed bee searches for a new nectar source $X'_i$ around the nectar source $X_i$ through Formula (5), calls Algorithm 1 to obtain the channel combination of each convolutional layer of the model, and calculates the fitness value $f(X'_i)$;
       Calculate the probability of each nectar source being selected through Formula (6);
       Generate a random number; the onlooker bee searches for a new nectar source $X'_i$ around the selected nectar source $X_i$ through Formula (5), calls Algorithm 1 to obtain the channel combination of each convolutional layer of the model, and calculates the fitness value $f(X'_i)$;
       Perform the scout bee search according to Formula (7);
     end
   end
5: Output: The pruned models.
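Step 3 of Algorithm 3 turns the constraints and the "params and flops as small as possible" goal into one fitness value, but the exact formula is not stated. The sketch below is one hypothetical choice. Any infeasible structure gets 0. Otherwise, the score is the reciprocal of the summed params and flops ratios relative to the unpruned model, so it is 0.5 for the unpruned model and grows as the model shrinks. Keeping the fitness non-negative is what roulette-wheel selection requires. `nectar_fitness` and its arguments are illustrative names.

```python
def nectar_fitness(diqa, ssim_cover, ssim_stego, psnr_cover, psnr_stego,
                   params, flops, params_orig, flops_orig,
                   diqa_threshold=0.4, ssim_min=0.9, psnr_min=26.0):
    """Hypothetical fitness f(X_i) for one pruned structure.

    Returns 0.0 when any DIQA / SSIM / PSNR condition fails; otherwise a
    positive score that grows as params and flops shrink relative to the
    unpruned model (0.5 when unpruned, 1.0 when both are halved).
    """
    feasible = (diqa < diqa_threshold
                and min(ssim_cover, ssim_stego) > ssim_min
                and min(psnr_cover, psnr_stego) > psnr_min)
    if not feasible:
        return 0.0
    return 1.0 / (params / params_orig + flops / flops_orig)
```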

Analysis of Algorithm
All convolutional layers except the last output layer are pruned, layer by layer, subject to the following conditions: the DIQA threshold and the image-quality assessment SSIM > 0.9 and PSNR > 26. In addition, the number of pruned channels of each convolutional layer is searched iteratively. This ensures that each convolutional layer of the deep-learning steganography removal model prunes redundant filters to the maximum extent, so that the params and flops of the models are remarkably reduced while the secret message remains invisible to a certain degree and image quality is maintained. Because iterative pruning is a double loop over the m pruned layers and the n = 1/p candidate channel counts per layer, the time complexity of the algorithm is O(m × n). The specific experimental results are presented in Section 5.
The ABC automatic pruning algorithm uses the channel-pruning combinations of the convolutional layers of the deep-learning steganography removal model as nectar sources and performs a random target search guided by the probability rule of Formula (6) without prior knowledge, which makes it robust and adaptable. In addition, the employed bees and onlooker bees form a positive feedback mechanism when searching for the optimal deep-learning steganography removal structure, which speeds up the convergence of the algorithm.

Experimental Preparation and Environment
The experiments use a trained deep-learning steganography model, ISGAN. For cover images, 50,000 images are selected from the ILSVRC2012 data set, scaled and randomly cropped to 256 × 256 resolution. For secret images, 50,000 self-made black-and-white document images are used.
A trained document image-quality assessment (DIQA) model is prepared; details are presented in Section 3. The pre-trained deep-learning steganography removal models DnCNN and HGD are also prepared. For comparability, both removal models are trained on the same data set, which consists of the 50,000 stego images generated by ISGAN and the corresponding original images, all at 256 × 256 resolution. Both models are trained with the same loss function and the SGD optimizer. Details are presented in Section 3.
All experiments are implemented on the PyTorch platform in Python 3.6 and accelerated with one NVIDIA GTX 1080Ti graphics card (NVIDIA, Santa Clara, CA, USA).

Results of Pruning Experiments
DnCNN has 17 convolutional layers (counted from left to right in Figure 1); the last layer is the output layer and is not pruned. HGD has 26 convolutional layers (counted from left to right in the Com layers and then from right to left in the Res layers; see Figure 2); its last layer is not pruned either.
Pruning experiments in this study are conducted primarily on the DnCNN and HGD models, which we analyze via individual pruning, overall iterative pruning and ABC automatic pruning. Individual pruning gradually prunes up to 90% of the filters of each convolutional layer to analyze the sensitivity of each layer of the deep-learning steganography removal model. Overall iterative pruning prunes each layer as much as possible under the SSIM, PSNR and DIQA threshold conditions, so that image quality and the invisibility of the secret message are still ensured to a certain extent after pruning. ABC automatic pruning uses the DIQA threshold, SSIM, PSNR, params and flops as the fitness of a nectar source, uses the channel-pruning combinations of the convolutional layers of the model as nectar sources, and automatically searches the structures of the DnCNN and HGD models.

Sensitivity Analysis of the Individual Pruning
During geometric median pruning of the 16 prunable convolutional layers of the DnCNN model, the layers differ in their pruning sensitivity. For most layers, PSNR, SSIM and DIQA begin to degrade to a certain extent once 50% of the filters are pruned. A few convolutional layers behave differently, such as conv16 in Figure 5: even with 90% of its filters pruned, the PSNR, SSIM and DIQA of the pruned model remain similar to those of the original model, which indicates that the filters of conv16 are highly redundant. Results for the DnCNN model are summarized in Table 3. When the 25 prunable convolutional layers of the HGD model are pruned via geometric median, several layers are very sensitive, such as conv2 in Figure 6, for which PSNR, SSIM and DIQA begin to be affected once 30% of the filters are pruned. Geometric median pruning of the HGD model reduces params from the initial 11.034 M to 0.721 M and flops from 50.937 G to 5.731 G, while DIQA changes from 0.069 to 0.160 and the SSIM and PSNR between the stego and purified images are better than those of the original model. Hence, geometric median pruning has a significant effect on the HGD deep-learning steganography removal model (Table 4). Both the DnCNN and HGD models achieve significant results with geometric median pruning. Figures 7 and 8 show that, after pruning, most purified images are basically indistinguishable from the cover and stego images to the naked eye, although the color of a few purified images may change during pruning.
However, in secret communication the receiver may never have seen the stego image, so this degree of color change remains acceptable after steganography removal. After pruning based on the DIQA threshold, the removal effect on the document image secret message remains within a range acceptable to the naked eye. Therefore, pruning the deep-learning steganography removal model is reliable and effective to a certain extent.

Analysis of the Overall Iterative Pruning Threshold
For deep steganography removal with the DnCNN model, the filters pruned in each convolutional layer are the same for DIQA < 0.2, DIQA < 0.4 and DIQA < 0.6, as shown in Table 5.
For deep steganography removal with the HGD model, pruning differs significantly across DIQA thresholds. As shown in Table 6, the model compression rate and the model performance vary with the DIQA threshold, and compression degradation occurs when the DIQA threshold is set too large. Therefore, HGD achieves the best compression effect with a DIQA threshold of 0.4. The renderings are illustrated in Figures 8 and 9.
Symmetry 2020, 12, x FOR PEER REVIEW 18 of 24 Figure 8. Renderings after the geometric median pruning in the HGD model when DIQA is equal to 0.2. First column of figure represents the cover image, the second column represents the stego image, the third column represents the purified image, the fourth column represents the document image secret message, and the fifth column represents the document image after removing.

On the basis of the geometric median, the ABC automatic pruning algorithm arranges the channel combinations of all convolutional layers of the deep-learning steganography removal model except the last layer, subject to the DIQA condition and the maximum channel pruning ϕ of each convolutional layer. ϕ = 9 means that at most 90% of the channels of each convolutional layer can be pruned, and ϕ = 6 means at most 60%. Each arrangement is a compressed structure of the model. Through nectar source collection and the role transitions of the three types of bees, the optimal pruning structure that meets the conditions is found within t = 150 rounds, as shown in Tables 7 and 8. Compared with overall iterative pruning, ABC pruning is an automated process.
DIQA threshold | ϕ | Pruning structure
DIQA < 0.4 | 9 | [6, 2, 9, 3, 9, 4, 7, 4, 9, 5, 9, 7, 2, 2, 9, 7, 3, 8, 3, 3, 5, 8, 4, 5, 4]
DIQA < 0.4 | 6 | [4, 2, 6, 6, 5, 3, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6, 3, 6, 6, 6, 6, 1, 5, 4, 4]
DIQA < 0.6 | 9 | [3, 3, 7, 7, 6, 5, 9, 9, 9, 4, 3, 9, 4, 7, 9, 8, ...
Tables 9 and 10 list the best nectar source quality found within t = 150 rounds. They show that the ABC algorithm based on geometric median pruning can efficiently search for nectar sources under the given fitness constraints and eventually finds a lightweight deep-learning steganography removal model with robustness and adaptability.
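To make the searched vectors in this section concrete, the sketch below decodes a nectar source into the number of channels kept per layer. It assumes that each entry is a pruning level in tenths: level $l$ prunes $\lfloor c \cdot l / 10 \rfloor$ of a layer's $c$ channels, so the entries never exceed ϕ. The channel count of 64 per layer is hypothetical, and so is the name `kept_channels`.

```python
def kept_channels(levels, channels):
    """Channels kept per layer for a searched nectar source.

    Assumes level l prunes floor(channels * l / 10) channels of that layer,
    so l = 9 prunes roughly 90% and l = 0 leaves the layer untouched.
    """
    return [c - (c * l) // 10 for c, l in zip(channels, levels)]


levels = [6, 2, 9, 3, 9]            # first entries of the DIQA < 0.4, phi = 9 vector
channels = [64] * len(levels)       # hypothetical: 64 channels per layer
kept = kept_channels(levels, channels)
print(kept)  # [26, 52, 7, 45, 7]
pruned_fraction = 1 - sum(kept) / sum(channels)
```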
Under the DIQA threshold and the maximum channel pruning ϕ of each convolutional layer, the params and flops of the model are greatly reduced compared with the original model, while image quality and document image removal remain within an acceptable range. The original DnCNN model has 1030 channels, 0.558 M params and 36.591 G flops. Overall iterative pruning based on geometric median is a brute-force process that prunes channels to the maximum extent: under the DIQA threshold and the maximum channel pruning ϕ, about 65% of the channels are pruned and the classic indicators params and flops are reduced by more than 85%. ABC pruning based on geometric median is an automatic process that searches for the required channels: under the same conditions, about 50% of the channels are pruned and params and flops are reduced by about 75% (Table 11). The original HGD model has 4870 channels, 11.034 M params and 50.937 G flops. Under the DIQA threshold and the maximum channel pruning ϕ, overall iterative pruning based on geometric median prunes about 75% of the channels and reduces params by about 93% and flops by about 88%, whereas ABC pruning based on geometric median prunes more than 44% of the channels and reduces params by more than 69% and flops by more than 68% (Table 12).

Conclusions
A considerable amount of resources is consumed when the pre-trained DnCNN and HGD deep-learning steganography removal models are used to deploy network steganography firewalls. This study proposes a geometric median pruning method for multiparameter, large-scale, pre-trained deep-learning steganography removal models that searches for effective network structures, prunes redundant filters, and thus compresses the deep-learning steganography removal model.
The pruning method significantly reduces the params and flops of the model while preserving the robustness of the deep-learning steganography removal model. The quality of the purified images remains within an acceptable range, and the document image secret message is removed to a certain extent. However, we have explored lightweight deep-learning steganography removal models only from the perspective of pruning, so our future work will focus on the following aspects: (1) exploring a fast adaptive structure-adjustment algorithm; (2) exploring the feasibility of deep-learning steganography removal based on knowledge distillation [47] and quantization; and (3) applying dimensionality reduction techniques [48] to document images to improve model performance.

Conflicts of Interest:
The authors declare no conflict of interest.