Dual-Domain Fusion Convolutional Neural Network for Contrast Enhancement Forensics

Contrast enhancement forensics techniques have long been of great interest to the image forensics community, as they can be an effective tool for recovering image history and identifying tampered images. Although several contrast enhancement forensic algorithms have been proposed, their accuracy and robustness against some kinds of processing are still unsatisfactory. To address this deficiency, we propose in this paper a new framework based on a dual-domain fusion convolutional neural network that fuses features from the pixel and histogram domains for contrast enhancement forensics. Specifically, we first present a pixel-domain convolutional neural network to automatically capture the patterns of contrast-enhanced images in the pixel domain. Then, we present a histogram-domain convolutional neural network to extract features in the histogram domain. The feature representations of the pixel and histogram domains are fused and fed into two fully connected layers for the classification of contrast-enhanced images. Experimental results show that the proposed method achieves better performance and is robust against pre-JPEG compression and anti-forensic attacks, obtaining over 99% detection accuracy for JPEG-compressed images with different quality factors (QFs) and anti-forensic attacks. In addition, a strategy for improving the performance of CNN-based forensics is explored, which could provide guidance for the design of CNN-based forensic tools.


Introduction
Being a simple yet efficient image processing operation, contrast enhancement (CE) is typically used by malicious attackers to eliminate inconsistent brightness when generating visually imperceptible tampered images. CE detection algorithms therefore play an important role in analyzing the authenticity and integrity of digital images. Although some schemes have been proposed to detect contrast-enhanced images, their performance is limited in the cases of pre-JPEG compression and anti-forensic attacks. Therefore, it is critical to develop robust and effective CE forensic algorithms.
Thanks to the efforts of researchers in the past decade, a number of schemes [1,2,3,4,5,6,7,8,9] have been proposed to discriminate contrast-enhanced images in uncompressed format. Stamm et al. [1,2,3] found that contrast enhancement introduces peaks and gaps into the image's gray-level histogram, which lead to unusually high values in the high-frequency components of the histogram's spectrum. Lin et al. [6,7] revealed that contrast enhancement disturbs the inter-channel correlation left by color image interpolation, and they measured this correlation to distinguish enhanced images from original images. Furthermore, to recover the image processing history, many algorithms for estimating the parameters of contrast enhancement have been developed [10,11,12,13].
Despite the good performance of the above-mentioned algorithms, their robustness can be unsatisfactory in some cases, such as CE applied to JPEG images (pre-JPEG compression) and anti-forensic attacks [14,15,16,17,18,19]. The reason is that these operations alter the fingerprint left by the CE operation. Based on this observation, some researchers have proposed more robust CE forensic algorithms, which can be divided into two major branches: overcoming pre-JPEG compression [4] and defending against anti-forensic attacks [9]. Unfortunately, neither of these methods can handle both pre-JPEG compression and anti-forensic attacks. To date, there is no satisfactory solution to these problems.
With the rapid development of deep learning, and especially convolutional neural networks (CNNs), some researchers have recently attempted to use them for digital image forensics. A number of preliminary works exploring CNNs in a single domain (such as the pixel domain [20], the histogram domain [21], and the gray-level co-occurrence matrix (GLCM) [22,23]) have been proposed for CE forensics. According to [22], deep learning-based CE forensic schemes achieve better performance than traditional ones. These schemes tackle the CE forensics task by feeding single-domain information to CNNs. However, each domain has its own advantages and disadvantages. For example, according to our experiments, a CNN working in the pixel domain is robust to post-processing but struggles to reach satisfactory accuracy. In addition, it is well known that the histogram domain is effective for the CE forensics task but fails to resist anti-forensic attacks. These observations give us a strong incentive to explore a deep learning-based fusion algorithm across multiple domains that resists pre-JPEG compression and anti-forensic attacks.
In this paper, we propose a novel framework based on a dual-domain fusion convolutional neural network for CE forensics. Specifically, a pixel-domain CNN (P-CNN) is designed to extract the patterns of contrast-enhanced images in the pixel domain. In P-CNN, a high-pass filter is used to reduce the effect of image content and, together with batch normalization [24], to keep the data distribution balanced. In addition, a histogram-domain CNN (H-CNN) is constructed by feeding a 256-dimensional histogram into a convolutional neural network.
The features obtained from P-CNN and H-CNN are fused and fed into a classifier with two fully connected layers. Experimental results show that our proposed method outperforms state-of-the-art schemes on uncompressed images and obtains comparable performance in the cases of pre-JPEG compression, anti-forensic attacks, and CE level variation.
The main contributions of this paper are: 1) we present a dual-domain fusion framework for CE forensics; 2) we propose and evaluate two simple yet effective convolutional neural networks based on the pixel and histogram domains; 3) we explore design principles of CNNs for CE forensics, specifically adding preprocessing, increasing the complexity of the architecture, and selecting a training strategy that includes fine-tuning and data augmentation.
The rest of this paper is organized as follows. Section 2 describes related work in the field of CE forensics. In Section 3, we formulate the problem, and in Section 4 we present the proposed dual-domain fusion CNN framework. Experimental results are reported in Section 5. Section 6 concludes the paper.

Related Works
CE forensics, a popular topic in the image forensics community, has been studied for a long time. Early works extract features from the histogram domain. Stamm et al. [1,2,3] observed that the histogram of a contrast-enhanced image exhibits peak/gap artifacts, whereas that of an unenhanced image does not, as shown in Fig 1. Based on this observation, they proposed a histogram-based scheme that computes a high-frequency energy metric of the histogram and makes the decision by thresholding it. However, this method fails to detect CE in images previously compressed by JPEG at middle/low quality, in which peak/gap artifacts also exist [4]. Cao et al. [4] studied this issue and found a notable difference between the peak/gap artifacts caused by contrast enhancement and those caused by JPEG compression: gap bins with zero height always appear in contrast-enhanced images. However, this phenomenon does not hold under anti-forensic attacks. As can be seen in Fig 1, the histogram of an enhanced image after an anti-forensic attack follows a smooth envelope, similar to that of an unenhanced image.
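To make the two histogram-domain cues above concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not the exact detectors of [1,2,3] or [4]: `high_freq_energy` measures the share of the histogram's DFT magnitude outside a low-frequency band (the `cutoff` of 32 is an arbitrary assumption, and the published metric includes further steps omitted here), while `zero_gap_bins` counts empty bins between the lowest and highest occupied gray levels, i.e. the zero-height gaps used by Cao et al. All function names are hypothetical.

```python
import cmath
import math


def histogram(pixels, levels=256):
    """Count occurrences of each gray level 0..levels-1."""
    h = [0] * levels
    for p in pixels:
        h[p] += 1
    return h


def high_freq_energy(h, cutoff=32):
    """Share of the histogram's DFT magnitude in the band cutoff..n-cutoff.

    Peaks and gaps make the histogram spiky, which moves energy into high
    frequencies; a smooth histogram keeps its energy near k = 0.
    """
    n = len(h)
    mags = [abs(sum(h[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n)))
            for k in range(n)]
    # For a real signal, bins k and n - k mirror each other, so the band is symmetric.
    high = sum(mags[cutoff:n - cutoff + 1])
    total = sum(mags)
    return high / total if total else 0.0


def zero_gap_bins(h):
    """Count empty bins strictly inside the occupied gray-level range."""
    occupied = [i for i, c in enumerate(h) if c > 0]
    if not occupied:
        return 0
    return sum(1 for i in range(occupied[0], occupied[-1] + 1) if h[i] == 0)


# Usage: a flat histogram versus the same levels after gamma correction (gamma = 0.6).
flat = histogram(range(256))
enhanced = histogram(round(255 * (x / 255) ** 0.6) for x in range(256))
print(zero_gap_bins(flat), zero_gap_bins(enhanced) > 0)
print(high_freq_energy(enhanced) > high_freq_energy(flat))
```

A real detector would compare such scores against a threshold learned on training data; the sketch only shows why the spiky histogram of an enhanced image separates from a smooth one.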
Instead of exploring features in the histogram domain, De Rosa et al. [9] studied the possibility of using second-order statistics to detect contrast-enhanced images even under anti-forensic attacks. Specifically, they explored the co-occurrence matrix of a gray-level image. According to [9], several empty rows and columns appear in the GLCM of contrast-enhanced images, as shown in Fig 2, even after the anti-forensic attack [14] is applied. Based on this observation, the authors extracted features from the standard deviation of each column of the GLCM. However, its performance is still not satisfactory, especially against another powerful anti-forensic attack [12]. These algorithms rely on handcrafted low-level features, which makes it difficult to deal with the above problems simultaneously. With the development of data-driven techniques, some researchers have recently started to study deep feature representations for CE forensics, and existing methods [22,20,21,23] focus on a single domain. Barni et al. [20] present a CNN with a total of 9 convolutional layers in the pixel domain, similar to the typical CNNs used in computer vision. Cong et al. [21] explore the histogram domain and feed the 256-dimensional histogram into a VGG-based multi-path network. Sun et al. [22] propose to calculate the gray-level co-occurrence matrix (GLCM) and feed it to a CNN with 3 convolutional layers. Although these approaches based on single-domain deep features have improved CE forensic performance, they ignore multi-domain information, which could be useful when some single-domain features are destroyed.
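The GLCM cue can be sketched in a few lines. This is a hypothetical illustration, assuming co-occurrences are counted only for horizontally adjacent pixels (one of several possible offsets); instead of the per-column standard deviation used in [9], it simply counts all-zero rows and columns within the occupied gray-level range.

```python
def glcm(img, levels=256):
    """Co-occurrence counts m[a][b] of horizontally adjacent pixel pairs (a, b)."""
    m = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1
    return m


def empty_rows_and_cols(m):
    """Count all-zero rows and columns between the lowest and highest used level."""
    n = len(m)
    row_used = [any(m[i]) for i in range(n)]
    col_used = [any(m[i][j] for i in range(n)) for j in range(n)]
    used = [i for i in range(n) if row_used[i] or col_used[i]]
    if not used:
        return 0, 0
    lo, hi = used[0], used[-1]
    rows = sum(1 for i in range(lo, hi + 1) if not row_used[i])
    cols = sum(1 for j in range(lo, hi + 1) if not col_used[j])
    return rows, cols


# Usage: two rows that together visit every gray level, before and after gamma = 0.6.
lut = [round(255 * (x / 255) ** 0.6) for x in range(256)]
img = [list(range(256)), list(range(255, -1, -1))]
enhanced = [[lut[p] for p in row] for row in img]
print(empty_rows_and_cols(glcm(img)), empty_rows_and_cols(glcm(enhanced)))
```

Gray levels that gamma correction never produces become empty rows and columns of the GLCM, which is the structure Fig 2 illustrates.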
To overcome these limitations of existing works, we propose a new deep learning-based framework that extracts and fuses feature representations in the pixel and histogram domains for CE forensics.

Problem Formulation
As a common form of contrast enhancement, gamma correction is available in many image-editing tools. In addition, according to [20], images enhanced by gamma correction are harder to detect than images enhanced in other ways. Therefore, in this paper we mainly focus on detecting gamma correction, which is typically defined as

$$Y = \mathrm{round}\left(255 \cdot T^{\gamma}\right), \qquad T = X/255 \in [0, 1],$$

where X denotes an input gray level, Y represents the remapped value, and γ is the gamma parameter.
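The mapping above can be implemented as a 256-entry lookup table. The sketch below assumes 8-bit grayscale input and uses Python's built-in `round` (which rounds halves to even; the rounding rule of real editing tools may differ). It also reports which output levels receive no input (gaps) and which receive several (peaks), the histogram artifacts discussed in Section 2. The helper names are hypothetical.

```python
def gamma_lut(gamma, peak=255):
    """Y = round(peak * (X / peak) ** gamma) for every input level X in 0..peak."""
    return [round(peak * (x / peak) ** gamma) for x in range(peak + 1)]


def apply_lut(img, lut):
    """Remap every pixel of a 2-D list image through the lookup table."""
    return [[lut[p] for p in row] for row in img]


def peaks_and_gaps(lut):
    """Output levels hit by several inputs (peaks) or by none (gaps)."""
    hits = [0] * len(lut)
    for y in lut:
        hits[y] += 1
    lo, hi = min(lut), max(lut)
    peaks = [y for y in range(lo, hi + 1) if hits[y] > 1]
    gaps = [y for y in range(lo, hi + 1) if hits[y] == 0]
    return peaks, gaps


lut = gamma_lut(0.6)
peaks, gaps = peaks_and_gaps(lut)
print(lut[:3], len(peaks) > 0, gaps[:3])
```

With γ < 1 the curve is steeper than the identity for dark levels, so consecutive inputs jump over some outputs (gaps); for bright levels it is flatter, so several inputs collapse onto one output (peaks).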
The problem addressed in this paper is how to classify a given image as contrast-enhanced or non-enhanced. In particular, we evaluate the robustness of the proposed method against pre-JPEG compression and anti-forensic attacks.

Proposed Method
In this section, we first give an overview of the proposed dual-domain fusion convolutional neural network framework, and then introduce its major components in detail.

Framework Overview
The proposed dual-domain fusion convolutional neural network is shown in Fig 3. It extracts features from the pixel and histogram domains by P-CNN and H-CNN, respectively, and then fuses them before feeding them into a classifier with two fully connected layers. The end-to-end system predicts whether the image is contrast-enhanced or non-enhanced.

Pixel-Domain Convolutional Neural Network
Convolutional neural networks (CNNs) in the pixel domain have recently been applied to image forensics and developed for specific forensic tasks. Firstly, a high-pass filter is added at the front end of the architecture to eliminate the interference of image content. Another advantage of the high-pass filter could be that it accelerates training in cooperation with batch normalization, because the histogram of high-pass filtered images approximately follows a generalized Gaussian distribution, which is similar to the normalized distribution targeted by batch normalization [24]. In particular, we find experimentally that the first-order difference filter along the horizontal direction performs best. The high-pass filtering layer is followed by four conventional convolutional layers. Each layer consists of four operations: convolution, batch normalization, ReLU, and average pooling. The numbers of feature maps in the four layers are 64, 16, 32, and 128, respectively. The convolution kernels are 3x3 with stride 1, and the pooling kernels are 5x5 with stride 2. Two points should be noted: 1) we find experimentally that the number of feature maps in the first convolutional layer is important for CE detection, and performance is better with 64 feature maps, which suggests that low-level features are more helpful; 2) instead of average pooling, a spatial pyramid pooling (SPP) layer [26] is used in the last convolutional layer to fuse multi-scale features. The convolutional layers are computed as

$$X_l = P\left(R\left(F\left(W_l * X_{l-1} + b_l\right)\right)\right), \quad l = 1, 2, 3, \qquad X_4 = S\left(R\left(F\left(W_4 * X_3 + b_4\right)\right)\right),$$

where $X_0$ is the high-pass filtered image, $W_l$ and $b_l$ are the kernels and biases of the $l$-th layer, $*$ denotes convolution, and F, R, P, S represent batch normalization, ReLU, average pooling, and spatial pyramid pooling, respectively. For spatial pyramid pooling, three scales are chosen, leading to a 2688-dimensional output.
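Two details of P-CNN can be checked with a small sketch. First, the H1 preprocessing filter: we assume it computes x[i][j+1] - x[i][j] without padding (the exact sign and border handling are defined in Table 5, which is not reproduced here). Second, the SPP output size: if the three pyramid scales are 1x1, 2x2, and 4x4 (an assumption, but one consistent with the stated size, since 128 x (1 + 4 + 16) = 2688), the last layer's 128 feature maps yield a 2688-dimensional vector regardless of their spatial size. The sketch max-pools inside each SPP bin, as in the original SPP design [26]; the pooling type used in P-CNN is not stated.

```python
def h1_filter(img):
    """Horizontal first-order difference r[i][j] = x[i][j+1] - x[i][j] (no padding)."""
    return [[b - a for a, b in zip(row, row[1:])] for row in img]


def spp(feature_maps, levels=(1, 2, 4)):
    """Spatial pyramid pooling: max-pool each map into k x k bins for every level k.

    The output length is len(feature_maps) * sum(k * k), independent of map size.
    """
    out = []
    for k in levels:
        for fmap in feature_maps:
            h, w = len(fmap), len(fmap[0])
            for i in range(k):
                r0, r1 = i * h // k, ((i + 1) * h + k - 1) // k  # floor / ceil bin edges
                for j in range(k):
                    c0, c1 = j * w // k, ((j + 1) * w + k - 1) // k
                    out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


residual = h1_filter([[10, 10, 12, 15], [7, 7, 7, 7]])
print(residual)
maps = [[[float(r * 5 + c) for c in range(5)] for r in range(5)]] * 128
print(len(spp(maps)), len(spp(maps, levels=(1,))))
```

Flat image regions map to zero residuals, which is how the filter suppresses content; the single-scale call shows the 128-dimensional output that a one-level pyramid would give.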
Finally, the fully connected layer and softmax are followed by a multinomial logistic loss, defined as

$$L = -\log \frac{e^{z_j}}{\sum_{i=1}^{n} e^{z_i}},$$

where $z_i$ is the output of the fully connected layer for class $i$, $n$ is the number of classes, and $j$ denotes the true label. In our experimental setup, mini-batch stochastic gradient descent is applied with a batch size of 120. The learning rate is initialized to 0.001 and scheduled to decrease by 10% every 10000 iterations. The maximum number of iterations is 100000. The momentum and weight decay are fixed to 0.9 and 0.0005, respectively.

Histogram-Domain Convolutional Neural Network
The input of H-CNN is the gray-level histogram of the image, a vector with 1x256 dimensions. This input layer is followed by two convolutional layers and two fully connected layers, with 64, 64, 512, and 1024 feature maps (units), respectively. Finally, a softmax layer followed by a multinomial logistic loss is added to classify original and enhanced images. The parameters of the convolutional layers and the hyper-parameters are the same as in P-CNN.
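The training objective and optimizer settings above (shared by P-CNN and H-CNN) can be written out without a deep learning framework. In this sketch the loss is the standard softmax cross-entropy with the usual max-subtraction for numerical stability; the momentum update with L2 weight decay follows the common form v ← μv − lr(g + λw), w ← w + v, which is an assumption about the exact solver. "Decrease by 10%" is read here as multiplying the rate by 0.9 every 10000 iterations; if the intended meaning is dividing it by 10, pass `factor=0.1`. Function names are hypothetical.

```python
import math


def multinomial_logistic_loss(logits, label):
    """Softmax cross-entropy: -log(exp(z_label) / sum_i exp(z_i))."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def step_lr(iteration, base_lr=0.001, step=10000, factor=0.9):
    """Learning rate multiplied by `factor` after every `step` iterations."""
    return base_lr * factor ** (iteration // step)


def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.9, weight_decay=0.0005):
    """One in-place mini-batch SGD update with momentum and L2 weight decay."""
    for i, (w, g) in enumerate(zip(weights, grads)):
        velocity[i] = momentum * velocity[i] - lr * (g + weight_decay * w)
        weights[i] = w + velocity[i]


print(multinomial_logistic_loss([2.0, 0.5], label=0))
print([step_lr(t) for t in (0, 9999, 10000, 99999)])
```

In practice these pieces are provided by the training framework; writing them out mainly makes the stated hyper-parameters (batch size aside) explicit.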

Dual-domain Fusion Convolutional Neural Network
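As described in Sections 1 and 2, the performance of CE forensic systems designed in a single domain is still not satisfactory. Fortunately, fusion strategies [27] provide a good way to obtain higher performance and have been adopted in the digital image forensics community [28,29]. In this work, we assume that the features extracted from P-CNN and H-CNN are complementary for CE forensics; thus, we propose a simple yet effective feature fusion framework for deep learning-based CE forensics that integrates multiple domains, and construct the dual-domain fusion CNN (DM-CNN), as shown in Fig 3. Firstly, the high-pass filtered image and the histogram are extracted from the input image. Then the filtered image is fed into P-CNN with four 2D convolutional layers, and the histogram is fed into H-CNN with two 1D convolutional layers. Note that for the purpose of fusion, P-CNN and H-CNN are slightly modified: the P-CNN of DM-CNN is composed of the convolutional layers of the standalone P-CNN. Besides, to ensure that the outputs of P-CNN and H-CNN have the same dimension, only one scale of spatial pyramid pooling is used in P-CNN, and the number of feature maps in the second convolutional layer of H-CNN is set to 128. The features output by P-CNN and H-CNN are concatenated and then fed into the classification unit, which consists of two fully connected layers and one softmax layer followed by a multinomial logistic loss. It is worth noting that, due to the limitations of our hardware configuration, only two domains are fused in our system; fusing features from other domains may also be useful.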
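The data flow of DM-CNN can be sketched without the CNN branches themselves. The hypothetical helper `dual_domain_inputs` prepares the two inputs from one 8-bit grayscale image (an H1 residual, assumed to be x[i][j+1] - x[i][j], and the 256-bin histogram), and `fuse` concatenates the branch outputs before the classifier, rejecting unequal lengths to mirror the equal-dimension requirement above. The 128 + 128 feature sizes in the usage line are an assumption: they presume that the single SPP scale is 1x1 over P-CNN's 128 maps and that H-CNN's 128 maps are likewise reduced to one value each; the text only states that both outputs have the same dimension.

```python
def dual_domain_inputs(img, levels=256):
    """Pixel-domain input (H1 residual) and histogram-domain input for one image."""
    residual = [[b - a for a, b in zip(row, row[1:])] for row in img]
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    return residual, hist


def fuse(pixel_features, histogram_features):
    """Concatenate the P-CNN and H-CNN feature vectors for the classification unit."""
    if len(pixel_features) != len(histogram_features):
        raise ValueError("branch outputs are expected to have equal length")
    return list(pixel_features) + list(histogram_features)


residual, hist = dual_domain_inputs([[0, 1, 1], [2, 2, 5]])
fused = fuse([0.0] * 128, [1.0] * 128)  # hypothetical branch outputs
print(residual, hist[:6], len(fused))
```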

Experimental Results
To verify the validity of the proposed methods, we compared them with four other methods. De Rosa [9], Cao [4] and Sun [22]
For anti-forensic attacks, Cao's method does not work, and the performance of H-CNN degrades, especially when the anti-forensic method [12] is applied, because anti-forensic attacks conceal the peak/gap features in the histogram domain. In addition, histogram-based anti-forensic attacks may have only a slight effect on the pixel domain. Therefore, P-CNN performs better than H-CNN in this case. When the fusion framework is used to merge the pixel and histogram domains, DM-CNN obtains the best detection accuracy. When pre-JPEG compression and anti-forensic attacks are combined, as shown in Table 4, the proposed CNN achieves performance comparable to Li's and Sun's schemes.
In conclusion, De Rosa's method is not robust to pre-JPEG compression and anti-forensic attacks, and Cao's method is vulnerable to anti-forensic attacks.
Furthermore, these prior algorithms are unstable across different gamma values. Although Li's method, based on high-dimensional features, is better than previous works in the cases of pre-JPEG compression and anti-forensic attacks, its performance is unsatisfactory when no other operation is applied. The deep learning-based method proposed by Sun obtains slightly lower detection accuracy than the proposed DM-CNN, but it has a much higher computational cost due to the GLCM feature extraction in preprocessing. Compared with the above schemes, the proposed DM-CNN achieves good robustness against pre-JPEG compression, anti-forensic attacks, and CE level variation, and obtains the best average detection accuracy in all cases studied.
architecture in the community of image forensics. To fill this gap, we make a preliminary exploration in this work, consisting of three parts: adding preprocessing, increasing the complexity of the architecture, and selecting the training strategy, which includes fine-tuning and data augmentation.

Conclusion
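The existing schemes for contrast enhancement forensics have unsatisfactory performance, especially in the cases of pre-JPEG compression and anti-forensic attacks. To deal with these problems, this paper proposes a new deep learning-based framework, the dual-domain fusion convolutional neural network (DM-CNN), which achieves end-to-end classification based on the pixel and histogram domains. Experimental results show that the proposed DM-CNN achieves better performance than state-of-the-art methods and is robust against pre-JPEG compression, anti-forensic attacks, and CE level variation. Besides, we explored strategies to improve the performance of CNN-based CE forensics, which could provide guidance for the design of CNN-based forensic tools. In spite of the good performance of existing schemes, it is still hard to detect CE images that undergo post-JPEG compression with low quality factors, and new algorithms should be designed to deal with this problem. In addition, the security of CNNs has drawn considerable attention; therefore, improving the security of CNNs is worth studying in the future.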

Figure 1: Histograms of an uncompressed image, a contrast-enhanced image with γ = 0.6, a contrast-enhanced image after an anti-forensic attack, and a JPEG image with quality factor 70, respectively.

Figure 2: GLCMs of an uncompressed image, a contrast-enhanced image with γ = 0.6, a contrast-enhanced image after an anti-forensic attack, and a JPEG image with quality factor 70, respectively.


Figure 3: The proposed dual-domain fusion convolutional neural network.

Figure 4: The architecture of the proposed pixel-domain convolutional neural network (P-CNN).

Figure 5: The architecture of the proposed histogram-domain convolutional neural network (H-CNN).

5.3.1. Preprocessing
Through the sustained efforts of researchers, deep learning techniques developed for computer vision (CV) tasks have been successfully applied to image forensics. Unlike CV tasks, image forensic classification has little relation to image content. Therefore, preprocessing has become a common way to improve the signal-to-noise ratio (SNR), that is, to strengthen the forensic trace relative to the image content, and high-pass filtering has become one of the most popular preprocessing means. In this part, taking P-CNN with γ = 0.6 as an example, we evaluate six high-pass filters widely applied in image forensics, namely H1, V1, H2, V2, LAP, and HP, and compare them with the case without preprocessing. The definitions of these filters are given in Table 5, and the performance in the above cases is presented in Fig 7. NON denotes the case without preprocessing. It can be seen that performance is poor when no preprocessing is used. In addition, the first-order difference along the horizontal direction (H1) performs best. Meanwhile, the HP and LAP filters, which were proposed for other forensic tasks, perform worse, which indicates that high-pass filters should be designed for the specific forensic task.
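The filtering step can be sketched as a small "valid" 2-D correlation. Table 5 is not reproduced in this text, so the kernels below are assumptions based on the filter names: H1/V1 as first-order and H2/V2 as second-order differences along the horizontal/vertical directions, and LAP as the 4-neighbour Laplacian. HP is omitted because its definition is not given here.

```python
FILTERS = {  # assumed kernels, not copied from Table 5
    "H1": [[-1, 1]],
    "V1": [[-1], [1]],
    "H2": [[1, -2, 1]],
    "V2": [[1], [-2], [1]],
    "LAP": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
}


def correlate_valid(img, kernel):
    """2-D correlation without padding; output shrinks by (kernel size - 1)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[u][v] * img[i + u][j + v] for u in range(kh) for v in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


# A horizontal ramp: H1 maps it to a constant; the other filters map it to zero.
ramp = [[3 * j for j in range(6)] for _ in range(4)]
for name, kernel in FILTERS.items():
    out = correlate_valid(ramp, kernel)
    print(name, len(out), len(out[0]), out[0][0])
```

Smooth intensity trends are what these filters remove, leaving the fine-scale residual in which CE traces are expected to live.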

5.3.2. Powerful Convolutional Neural Networks
Thanks to the development of deep learning in CV, more powerful CNNs (ResNet, XceptionNet, SENet) have emerged rapidly in recent years. However, because of limitations in the forensics community, such as insufficient training data and hardware configuration, it is difficult to evaluate all of them. To verify the effectiveness of more powerful CNNs for CE forensics, we replace the conventional convolutional layers of P-CNN with the residual blocks proposed in ResNet18. The result is shown in Fig 7. Compared with the H1 case, the detection accuracy of Res H1 increases by 0.65%. From the above discussion, we conclude that for CE forensics, more powerful CNNs can enhance performance, but preprocessing plays a more important role.
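The residual-block replacement can be illustrated with a toy, single-channel 1-D version of a ResNet-18 basic block, relu(F(x) + x) with F = conv, ReLU, conv. This sketch only demonstrates the identity-shortcut idea: the actual blocks use 3x3 multi-channel 2-D convolutions with batch normalization, which are omitted here, and the kernels in the usage lines are arbitrary.

```python
def conv1d_same(x, kernel):
    """1-D correlation with zero padding (odd kernel length) so len(out) == len(x)."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k * padded[i + t] for t, k in enumerate(kernel)) for i in range(len(x))]


def relu(v):
    return [max(0.0, a) for a in v]


def basic_residual_block(x, k1, k2):
    """relu(F(x) + x) with F = conv -> ReLU -> conv (batch normalization omitted)."""
    f = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(f, x)])  # identity shortcut adds the input back


x = [1.0, -2.0, 3.0, 0.5]
print(basic_residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # F(x) = 0: block reduces to relu(x)
print(basic_residual_block(x, [0.25, 0.5, 0.25], [0.0, 1.0, 0.0]))
```

Because the shortcut passes the input through unchanged, a block whose residual branch learns nothing still forwards the signal, which is what makes deeper stacks easier to train.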

Figure 6: Performance of P-CNN with/without preprocessing and with a more powerful network. NON denotes P-CNN without preprocessing; the others denote P-CNN with the LAP, V2, H2, V1, and H1 filters in preprocessing, respectively. Res H1 denotes P-CNN with the H1 filter and residual blocks.

5.3.3. Training Strategy
It is well known that the scale of data has an important effect on the performance of deep learning-based methods, and transfer learning [32] also provides an effective strategy for training CNN models. In this part, we conduct experiments to evaluate the effects of the data scale and of transfer learning, respectively, on CNN performance. For the former, the images from BOSSBase are first cropped into non-overlapping 128x128 patches, which are then enhanced with γ = 0.6. We randomly choose 80000 image pairs as test data and 5000, 20000, 40000, and 80000 image pairs as training data. Four groups of H-CNN and P-CNN models are trained on these four training sets, and the same test data is used in all experiments. The results are shown in Figure 8. It can be seen that the scale of training data has only a slight effect on H-CNN, which has few parameters, while the opposite holds for P-CNN. Therefore, a larger training set benefits P-CNN, which has more parameters, and its performance can be improved by enlarging the training data. For the latter, we compare the performance of P-CNN with and without transfer learning for γ = {0.8, 1.2, 1.4}. The transfer-learning variant (P-CNN-FT) is obtained by fine-tuning the model trained for γ = 0.6 on data with γ = {0.8, 1.2, 1.4}. As shown in Fig 9, P-CNN-FT achieves better performance than P-CNN.
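The data-scale experiment can be set up as sketched below. It assumes 512x512 BOSSBase images, so each yields 16 non-overlapping 128x128 patches and the 10000 images give 160000 original/enhanced pairs, enough for the 80000 test pairs plus up to 80000 training pairs. Drawing the smaller training sets as nested prefixes of one shuffled pool is a design assumption (the text only says the pairs are chosen randomly); it keeps every training set disjoint from the test set. `split_pairs` is a hypothetical helper.

```python
import random


def non_overlapping_patches(height, width, size=128):
    """Top-left corners of all non-overlapping size x size patches."""
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def split_pairs(num_pairs, test_size, train_sizes, seed=0):
    """Random disjoint test set plus nested training subsets of the requested sizes."""
    rng = random.Random(seed)
    ids = list(range(num_pairs))
    rng.shuffle(ids)
    test_ids, pool = ids[:test_size], ids[test_size:]
    if max(train_sizes) > len(pool):
        raise ValueError("not enough pairs left for the largest training set")
    return test_ids, {n: pool[:n] for n in train_sizes}


patches_per_image = len(non_overlapping_patches(512, 512))
num_pairs = 10000 * patches_per_image
test_ids, train_sets = split_pairs(num_pairs, 80000, (5000, 20000, 40000, 80000))
print(patches_per_image, num_pairs, {n: len(ids) for n, ids in train_sets.items()})
```

Fine-tuning for the other γ values would then start from the weights trained on the γ = 0.6 data rather than from a random initialization.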

Figure 7: Effect of the scale of training data.

Figure 8: Performance of P-CNN and P-CNN with fine-tuning (P-CNN-FT).

Table 1: CE detection accuracy for contrast-enhanced images in the case of ORG vs P-CE. AVE is the average accuracy. Best results are marked in bold.
P-CE and JPEG-CE denote enhanced versions of ORG and JPEG-ORG, respectively, and Anti-CE and JPEG-CE-Anti-CE denote enhanced images with anti-forensic attacks applied to P-CE and JPEG-CE, respectively. BOSSBase [30] with 10000 images is chosen to construct the dataset. Firstly, the images are centrally cropped into 128x128 pixel patches as ORG. Then, JPEG compression with Q = 70 and 50 is applied to ORG to build JPEG-ORG. Next, gamma correction with γ = {0.6, 0.8, 1.2, 1.4} is applied to ORG and JPEG-ORG to constitute P-CE and JPEG-CE. Finally, Anti-CE is produced by applying the anti-forensic attacks [12,14] to P-CE and JPEG-CE. We chose this patch size because 1) detection is much harder for lower-resolution images than for higher-resolution ones; 2) 128x128 is a suitable size for tamper localization based on CE forensics; and 3) our hardware configuration is limited.
As seen from Table 1, for Cao's method, the detection accuracy for γ = {0.6, 0.8} is much higher than that for γ = {1.2, 1.4}. The reason is that the gap feature is unstable across CE parameters, which is consistent with our analysis in Section III. In addition, H-CNN performs better than the above four schemes. These results demonstrate that histogram-domain features are effective for CE detection. Besides, the proposed fusion framework, DM-CNN, obtains the best average detection accuracy. It should be mentioned that although the deep learning-based method proposed by Sun obtains slightly lower detection accuracy than DM-CNN, it has a much higher computational cost due to the GLCM feature extraction in preprocessing.
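The processing chains used to build the dataset described above can be listed programmatically. The sketch only records the order of operations for each class (crop, optional JPEG, gamma correction, optional anti-forensic attack); the string labels, the `center_crop_box` and `processing_chains` helpers, and the assumption that each attack [12,14] is applied separately to every enhanced variant are illustrative, not taken from the authors' code. With 512x512 BOSSBase images, the central 128x128 crop starts at offset 192.

```python
def center_crop_box(height, width, size=128):
    """(top, left, bottom, right) of the central size x size crop."""
    top, left = (height - size) // 2, (width - size) // 2
    return top, left, top + size, left + size


def processing_chains(gammas=(0.6, 0.8, 1.2, 1.4), qualities=(70, 50), attacks=("[12]", "[14]")):
    """Map each dataset class name to its ordered list of processing steps."""
    chains = {"ORG": ["crop"]}
    for q in qualities:
        chains[f"JPEG-ORG Q={q}"] = ["crop", f"jpeg Q={q}"]
    for g in gammas:
        chains[f"P-CE gamma={g}"] = ["crop", f"gamma {g}"]
        for q in qualities:
            # Pre-JPEG compression: the image is compressed first, then enhanced.
            chains[f"JPEG-CE Q={q} gamma={g}"] = ["crop", f"jpeg Q={q}", f"gamma {g}"]
    enhanced = [name for name in chains if "CE" in name]
    for name in enhanced:
        for a in attacks:
            chains[f"{name} anti{a}"] = chains[name] + [f"anti-forensic {a}"]
    return chains


chains = processing_chains()
print(center_crop_box(512, 512))
print(chains["JPEG-CE Q=50 gamma=0.6"])
```

Keeping the step order explicit matters here: JPEG-CE applies compression before enhancement, which is exactly the pre-JPEG compression scenario studied in Table 2.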

Table 2: CE detection accuracy for pre-JPEG compressed images with different QFs. AVE is the average accuracy. Best results are marked in bold.

Table 3: CE detection accuracy in the case of anti-forensic attacks. − denotes that the method does not work in this case. AVE is the average accuracy. Best results are marked in bold.

Table 4: Detection accuracy for JPEG-compressed images with different QFs and the anti-forensic attack [12]. − denotes that the method does not work in this case. AVE is the average accuracy. Best results are marked in bold.

Table 5: The filters evaluated in this work.