An Image Hashing Algorithm for Authentication with Multi-Attack Reference Generation and Adaptive Thresholding

Abstract: Image hashing-based authentication methods have been widely studied, with continuous advancements owing to their speed and memory efficiency. However, reference hash generation and threshold setting, which are used for similarity measures between original images and their corresponding distorted versions, are important yet under-considered by most existing models. In this paper, we propose an image hashing method based on multi-attack reference generation and adaptive thresholding for image authentication. We build a prior information set with the help of multiple virtual prior attacks, and present a multi-attack reference generation method based on hashing clusters. A perceptual hashing algorithm is applied to the reference/queried image to obtain the hashing codes for authentication. Furthermore, we introduce the concept of adaptive thresholding to account for variations in hashing distance. Extensive experiments on benchmark datasets have validated the effectiveness of our proposed method.


Introduction
With the aid of sophisticated photo-editing software, multimedia content authentication is becoming increasingly prominent. Images edited with Photoshop may mislead people and cause social crises of confidence. In recent years, image manipulation has drawn much criticism for altering the appearance of image content to the point of making it unrealistic. Hence, tampering detection, a scheme that verifies the integrity and authenticity of digital multimedia data, has emerged as an important research topic. Perceptual image hashing [1][2][3][4] supports image content authentication by representing the semantic content in a compact signature, which should be sensitive to content-altering modifications but robust against content-preserving manipulations such as blur, noise and illumination correction [5][6][7].
A perceptual image hashing system generally consists of three pipeline stages: pre-processing, hashing generation and decision making. The major purpose of pre-processing is to enhance the robustness of features by mitigating the effects of some distortions. After that, the reference hashes are generated and transmitted through a secure channel. The same perceptual hashing process is applied to the queried image to be authenticated. Once the image hash is generated, image authentication is carried out in the decision-making stage: the reference hash is compared with image hashes in the test database for content authentication based on a selected distance metric, such as the Hamming distance. Currently, the majority of perceptual hashing algorithms for authentication applications can be roughly divided into five categories: invariant feature transform-based methods [8][9][10][11][12][13], local feature points-based schemes [14][15][16][17][18][19][20][21][22][23], dimension reduction-based hashing [24][25][26][27][28][29], statistical features-based hashing [30][31][32][33][34][35] and learning-based hashing [36][37][38][39].
For the decision-making stage of a perceptual hashing-based image authentication framework, only a few studies have been devoted to reference generation and threshold selection. For reference hash generation, Lv et al. [36] proposed obtaining an optimal estimate of the hash centroid using kernel density estimation (KDE), taking the centroid to be the value that yields the maximum estimated density. Its major drawback is that the binary codes are obtained with a data-independent method; since hash generation is independent of the data distribution, such methods may not capture the characteristics of the data distribution. Currently, more researchers are focusing on data-dependent, learning-based methods for image tamper detection. Data-dependent methods with learning [40][41][42][43] can be trained to optimally fit data distributions and specific objective functions, producing hashing codes that better preserve local similarity. In our previous work [44], we proposed a reference hashing method based on clustering, built on the observation that the hash of the original image may actually not be the centroid of its cluster set. Therefore, how to learn the reference hashing code for solving multimedia security problems is an important topic of current research. As for authentication decision making, the simple way is to use threshold-based classifiers. In practice, perceptual differences under image manipulations vary with the textural content of different images. Traditional authentication tasks aim to identify tampered results from the distance values among image codes, treating the threshold as a fixed value. However, in a number of real-world cases, the ground truth cannot be identified by one fixed threshold for every image.
In this paper, we extend our previous work [44] and propose an image hashing framework for authentication with multi-attack reference generation and adaptive thresholding. According to the requirements of the authentication application, we build the prior information set with the help of multiple virtual prior attacks, produced by applying virtual prior distortions and attacks to the original images. Differently from the traditional image authentication task, we address this uncertainty and introduce the concept of adaptive thresholding to account for variations in hashing distance. The main difference here is that a different threshold value is computed for each image. This technique provides more robustness to changes in image manipulations. We propose a data-dependent, semi-supervised image authentication scheme using an attack-specific, adaptive threshold to generate a hashing code. This threshold tag is embedded in the transmitted hashing code and can be reliably extracted at the receiver. The framework of our algorithm is shown in Figure 1. We first introduce the proposed multi-attack reference hashing algorithm. Then, we describe how the reference images are generated. After that, the perceptual hashing process is applied to the reference/queried image to be authenticated, so as to obtain the hashing codes. Finally, the reference hashes are compared with the queried image hashes in the test database for content authentication.

Problem Statement and Contributions
Authentication is an important issue in multimedia data protection; it makes it possible to trace the author of multimedia data and to determine whether the original content was altered in any way after its recording. The hash value is a compact abstract of the content. We can re-generate a hash value from the received content and compare it with the original hash value; if they match, the content is considered authentic. In the proposed algorithm, we aim to compute a common hashing function h_k(·) for image authentication. Let D(·, ·) denote a decision-making function for comparing two hash values. For a given threshold τ, perceptual image hashing for tamper detection should satisfy the following criteria: if two images x and y are perceptually similar, their corresponding hashes should be highly correlated, i.e., D(h_k(x), h_k(y)) ≤ τ; if they are perceptually distinct, D(h_k(x), h_k(y)) should exceed τ. The main contributions can be summarized as follows: (1) We propose building the prior information set with the help of multiple virtual prior attacks, produced by applying virtual prior distortions and attacks to the original images. On the basis of this prior image set, we infer the clustering centroids for reference hashing generation, which is used for the similarity measure. (2) We effectively exploit semi-supervised information in the perceptual image hashing learning. Instead of determining a metric distance from training results alone, we explore the hashing distance for thresholding by considering its effect on different images. (3) In order to account for variations in the extracted features of different images, we take into account the pairwise variations among different original-received image pairs. These adaptive thresholding improvements maximally discriminate malicious tampering from content-preserving operations, leading to an excellent tamper detection rate.

Multi-Attack Reference Hashing
Currently, most image hashing methods take the original image as the reference. However, the image hashes arising from the original image may not be the hash centroid of its distorted copies. As shown in Figure 2a, we applied 15 classes of attacks to five original images and represented the hashes of both the original images and their distorted copies in a 2-dimensional space. From Figure 2a, we can observe five clusters in the hashing space. Zooming into one hash cluster in Figure 2b, we observe that the hash of the original image may not be the centroid of its cluster.
For the l original images in the dataset, we apply V types of content-preserving attacks with different parameter settings to generate simulated distorted copies. Let us denote the feature matrix of the attacked instances in set Ψ_v as X_v ∈ R^{m×t}. Here, v = 1, 2, ..., V, m is the dimensionality of the data features and t is the number of instances for attack v. Finally, we obtain the feature matrices for the total of n instances as X = {X_1, ..., X_V}, where n = tV. Note that the feature matrices are normalized to be zero-centered.
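As a minimal sketch, the construction of the multi-attack feature matrices described above might look like the following. The function names (`build_multiattack_features`, `extract`) and the attack implementations are illustrative assumptions, not the paper's code:

```python
import numpy as np

def build_multiattack_features(extract, originals, attacks):
    """Apply each content-preserving attack to every original image,
    extract an m-dimensional feature per attacked copy, and stack the
    results into per-attack matrices X_v of shape (m, t)."""
    X = []
    for attack in attacks:                      # V content-preserving attacks
        cols = [extract(attack(img)) for img in originals]
        Xv = np.stack(cols, axis=1)             # shape (m, t), t = len(originals)
        X.append(Xv)
    Xall = np.concatenate(X, axis=1)            # (m, n) with n = t * V
    Xall -= Xall.mean(axis=1, keepdims=True)    # zero-center each feature dim
    return Xall
```

With V attacks and t instances per attack this yields the zero-centered matrix X ∈ R^{m×n}, n = tV, used in the objective below.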
Figure 2. (a) Multiple hash clusters; (b) a single hash cluster.

By considering the total reconstruction errors of all the training objects, we have the following minimization problem in matrix form, which jointly exploits the information from the various content-preserving multi-attack data.
where X̃ is the shared latent multi-attack feature representation, and the matrix Ũ can be viewed as the basis matrix that maps the input multi-attack features onto the corresponding latent features. The parameters α and β are nonnegative weights that balance the significance of the terms. From an information-theoretic point of view, the variance over all data is measured and taken as a regularization term, weighted by a nonnegative constant parameter γ. The image reference for authentication is essentially a clustering problem, and the reference is usually generated from the cluster centroid image. Therefore, we also keep the cluster structures by adding a clustering term, where C ∈ R^{k×l} and G ∈ {0, 1}^{l×n} are the clustering centroid matrix and the indicator matrix. The final formulation simultaneously learns the feature representation X̃ and finds the mapping matrix Ũ, the cluster centroids C and the indicator G. The iterative optimization algorithm is as follows.
Fixing all variables but Ũ: the optimization problem (Equation (4)) reduces to a least-squares subproblem; setting the derivative ∂J(Ũ)/∂Ũ = 0 gives the update for Ũ. Fixing all variables but X̃: we solve the corresponding subproblem, which has the closed-form optimal solution X̃ = αŨX + λCG (Equation (8)). Fixing all variables but C and G: for the cluster centroid C and indicator G, inspired by the adaptive discrete proximal linear method (ADPLM) [45], we initialize C = X̃G^T and update C iteratively, where H(x_i, c_s) denotes the distance between the i-th feature code x_i and the s-th cluster centroid c_s. After inferring the cluster centroids C and the multi-attack feature representation X̃, the corresponding l reference images are generated. The basic idea is to compare the hashing distances among the nearest content-preserving attacked neighbors of each original image and the corresponding cluster centroid.
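A heavily simplified sketch of this alternating scheme is given below. It substitutes a plain least-squares Ũ-step and a k-means-style C/G-step for the paper's exact update rules (and a pseudoinverse blend for the X̃-step), so it should be read as an illustration of the alternation, not a faithful implementation:

```python
import numpy as np

def multiattack_reference(X, k, l, alpha=1.0, lam=1.0, iters=10, seed=0):
    """Alternately update the basis U, latent codes Xt, cluster centroids C
    and hard assignments g; each cluster centroid then serves as the
    reference for one original image."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Xt = rng.normal(size=(k, n))                    # latent representation
    C = Xt[:, rng.choice(n, l, replace=False)]      # init centroids (copy)
    for _ in range(iters):
        # U-step: least-squares basis so that U @ Xt approximates X
        U, *_ = np.linalg.lstsq(Xt.T, X.T, rcond=None)
        U = U.T                                      # shape (m, k)
        # G-step: assign each instance to its nearest centroid
        d = ((Xt[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)  # (n, l)
        g = d.argmin(axis=1)
        # Xt-step: blend feature reconstruction with the cluster pull
        Xt = (alpha * np.linalg.pinv(U) @ X + lam * C[:, g]) / (alpha + lam)
        # C-step: recompute centroids as cluster means
        for s in range(l):
            if (g == s).any():
                C[:, s] = Xt[:, g == s].mean(axis=1)
    return C, g
```

The returned centroids C play the role of the multi-attack reference hashes; g records which cluster each attacked instance belongs to.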

Semi-Supervised Hashing Code Learning
For the reference and received images, we use a semi-supervised learning algorithm for hashing code generation and image authentication. Firstly, every input image is converted to a normalized size of 256 × 256 by bilinear interpolation. The resizing operation makes our hashing robust against image rescaling. Then, a Gaussian low-pass filter is used to blur the resized image, which reduces the influence of high-frequency components on the image, such as noise contamination or filtering. Let F(i, j) be the element in the i-th row and j-th column of the convolution mask. It is calculated by

F(i, j) = F^(1)(i, j) / Σ_i Σ_j F^(1)(i, j),

in which F^(1)(x, y) is defined as

F^(1)(x, y) = exp(−(x² + y²) / (2σ²)),

where σ is the standard deviation of all elements in the convolution mask. Next, the RGB color image is converted into the CIE LAB space and the image is represented by the L component, since the L component closely matches human perception of lightness. The RGB color image is first converted into the XYZ color space by the following formula:

X = 0.412453 R + 0.357580 G + 0.180423 B,
Y = 0.212671 R + 0.715160 G + 0.072169 B,
Z = 0.019334 R + 0.119193 G + 0.950227 B,

where R, G and B are the red, green and blue components of the color pixel. We then convert it into the CIE LAB space by the following equations:

L = 116 f(Y/Y_w) − 16,
a = 500 [f(X/X_w) − f(Y/Y_w)],
b = 200 [f(Y/Y_w) − f(Z/Z_w)],

where X_w = 0.950456, Y_w = 1.0 and Z_w = 1.088754 are the CIE XYZ tristimulus values of the reference white point, and f(t) is determined by:

f(t) = t^(1/3) if t > (6/29)³; otherwise f(t) = t / (3(6/29)²) + 4/29.

Figure 3 illustrates an example of the preprocessing.
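The Gaussian mask and the color-space conversion above can be sketched as follows (standard formulas with the white-point constants from the text; the function names are illustrative and the input is assumed to be linear RGB in [0, 1]):

```python
import numpy as np

def gaussian_mask(size=5, sigma=1.0):
    """Normalized Gaussian convolution mask F(i, j)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    f1 = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return f1 / f1.sum()                         # mask sums to 1

def rgb_to_lab(rgb):
    """Convert linear RGB in [0, 1] to CIE LAB via the XYZ space."""
    M = np.array([[0.412453, 0.357580, 0.180423],
                  [0.212671, 0.715160, 0.072169],
                  [0.019334, 0.119193, 0.950227]])
    xyz = np.asarray(rgb) @ M.T
    xyz = xyz / np.array([0.950456, 1.0, 1.088754])  # reference white point
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

For example, a pure white pixel (1, 1, 1) maps to L = 100 with a = b = 0, matching the reference white point.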
The features of the labeled images are represented as X_l ∈ R^{M×L}. Note that these feature matrices are normalized to be zero-centered. The goal of our algorithm is to learn hash functions that map X ∈ R^{M×N} to a compact representation H ∈ R^{K×N} in a low-dimensional Hamming space, where K is the code length. The hash function of a single image is defined as h(x) = sgn(W^T x), where W is the projection matrix to be learned. In order to learn a W that simultaneously maximizes the empirical accuracy on the labeled images and the variance of hash bits over all images, the empirical accuracy on the labeled images is defined over a pairwise label matrix E, with E_ij = 1 for a pair (x_i, x_j) ∈ S, E_ij = −1 for a pair (x_i, x_j) ∈ D, and E_ij = 0 otherwise. Specifically, a pair (x_i, x_j) ∈ S is denoted as a perceptually similar pair when the two images are the same image or attacked versions of the same image, and a pair (x_i, x_j) ∈ D is denoted as a perceptually different pair when the two images are different images or when one has suffered malicious manipulations or perceptually significant attacks.
Equation (19) can also be represented in a relaxed form by dropping the sign function. This relaxation is quite intuitive: similar images are desired to not only have the same sign but also large projection magnitudes, while the projections for dissimilar images should not only have different signs but also be as different as possible.
Moreover, to maximize the amount of information carried per hash bit, we maximize the variance over all hash bits of all images and use it as a regularization term for the hash function.
Due to the non-differentiability of the above function, it is difficult to compute its extremum directly. However, the maximum variance of the hash function is lower-bounded by the scaled variance of the projected data, so the variance is used as the information-theoretic regularization term. Finally, the overall semi-supervised objective function combines the relaxed empirical fitness term from Equation (21) and the regularization term from Equation (23).
where η = 0.25 is a tradeoff parameter. The resulting optimization problem maximizes the objective subject to the constraint WW^T = I, which makes the projection directions orthogonal. We learn the optimal projection W by means of the eigenvalue decomposition of the matrix M.
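A compact sketch of this eigendecomposition-based learning step follows. It assumes the standard semi-supervised hashing form M = X_l E X_l^T + η X X^T (consistent with the empirical-fitness and variance terms above); the function names are illustrative:

```python
import numpy as np

def learn_projection(Xl, E, X, K, eta=0.25):
    """Take the top-K eigenvectors of M = Xl E Xl^T + eta X X^T as the
    orthogonal projection directions W (one column per hash bit)."""
    M = Xl @ E @ Xl.T + eta * (X @ X.T)
    M = (M + M.T) / 2                        # symmetrize for numerical safety
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:K]]  # top-K eigenvectors, shape (m, K)
    return W

def hash_codes(W, X):
    """Binary codes from the sign of the projections, H = sgn(W^T X)."""
    return (W.T @ X > 0).astype(np.int8)
```

Because the eigenvectors of a symmetric matrix are orthonormal, the learned projection directions satisfy the orthogonality constraint by construction.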

Adaptive Thresholds-Based Decision Making
To measure the similarity between the hashes of original and attacked/tampered images, the distance between two hashing codes h_1 and h_2 is calculated with a selected metric, such as the normalized Hamming distance. In general, the more similar the images, the smaller the distance; the greater the difference, the greater the distance. A threshold T is then defined to judge whether the queried image is a similar image or a tampered image.
If the distance is less than a given threshold, the two images are judged as visually identical images. Otherwise, they are judged as distinct images.
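The distance computation and threshold decision above can be sketched as follows (the normalized Hamming distance is one common choice for binary codes; the function names are illustrative):

```python
import numpy as np

def hash_distance(h1, h2):
    """Normalized Hamming distance between two binary hash codes."""
    h1, h2 = np.asarray(h1), np.asarray(h2)
    return np.count_nonzero(h1 != h2) / h1.size

def is_authentic(h_ref, h_query, T):
    """True if the queried image is judged visually identical to the
    reference (distance below threshold), False if judged distinct."""
    return hash_distance(h_ref, h_query) < T
```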
Traditional image tamper detection algorithms take a fixed value as the threshold for judging similar versus tampered images. However, due to the differing characteristics of images, some images cannot be correctly judged with a fixed threshold value. In our adaptive thresholds algorithm, we first find the maximum distance value among the similar images and the minimum distance value among the tampered images, where dist1 is the distance between a similar image and the original image, and dist2 is the distance between a tampered image and the original image. In order to prevent the two values from being too extreme, we limit them using two constants, ψ and ξ, set experimentally. Then, the resulting maximum and minimum values are compared with a fixed threshold τ, obtained experimentally, to derive the adaptive threshold suitable for each image. Thus, every image has its own threshold. Finally, we place the adaptive threshold at the head of the hash code and transfer it along with the hash code, so the final hash code carries its own threshold tag.
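One plausible realization of this scheme is sketched below. The exact limiting rules and the combination with the fixed threshold τ are experiment-specific in the paper, so the clamping and fallback logic here (and all names) are illustrative assumptions:

```python
def adaptive_threshold(dist_similar, dist_tampered, tau, psi=0.02, xi=0.0):
    """Per-image threshold: bound the largest similar-image distance and
    the smallest tampered-image distance with psi/xi, split the gap if the
    two sides are separable, and fall back to the fixed tau otherwise."""
    d_max = min(max(dist_similar), tau + psi)   # cap the similar side
    d_min = max(min(dist_tampered), tau - xi)   # floor the tampered side
    if d_max < d_min:
        return (d_max + d_min) / 2              # separable: split the gap
    return tau                                  # overlap: keep fixed threshold

def pack_hash(threshold, code):
    """Prepend the per-image threshold tag to the hash code for transmission."""
    return [threshold] + list(code)
```

At the receiver, the first entry of the packed code is read back as the per-image threshold before the Hamming comparison is made.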

Data
Our experiments were carried out on two real-world datasets. The first came from CASIA [46], which contains 918 image pairs: 384 × 256 real images and corresponding distorted images with different texture characteristics. The other was RTD [47,48], which contains 220 real images and corresponding distorted images at a resolution of 1920 × 1080.
To ensure that the images of the training set were different from the images of the testing set, we selected 301 non-repetitive original images and their corresponding tampered images to generate 66,231 images as our training data. Furthermore, 10,000 images were randomly selected from 66,231 images as a labeled subset. We adopted 226 repetitive original images and their corresponding set of tampered images to determine the threshold value of each image. The remaining images in CASIA and RTD datasets were used to test performance.

Baselines
We compared our proposed algorithm with a number of baselines. In particular, we compared it with: Wavelet-based image hashing [49]: an invariant feature transform-based method, which develops an image hash from the various sub-bands in a wavelet decomposition of the image, making it convenient to transform from the space-time domain to the frequency domain.
SVD-based image hashing [24]: It belongs to dimension reduction-based hashing and uses spectral matrix invariants as embodied by singular value decomposition. The invariant features based on matrix decomposition show good robustness against noise addition, blurring and compression attacks.

RPIVD-based image hashing [30]: It incorporates ring partition and invariant vector distance into image hashing by calculating image statistics, including the mean, variance, standard deviation, kurtosis, etc.

Quaternion-based image hashing [12]: This method considers multiple features, and constructs a quaternion image to implement a quaternion Fourier transform for hashing generation.

Perceptual Robustness
To validate the perceptual robustness of the proposed algorithm, we applied twelve types of content-preserving operations. We extracted the reference hashing code based on the original image (ORH) and our proposed multi-attack reference hashing (MRH). For the content-preserving distorted images, we calculated the corresponding distances between the reference hashing codes and the distorted images' hashing codes. The statistical results under different attacks are presented in Table 1. As shown, the hashing distances for the four baseline methods were small enough. In our experiments, we set the threshold τ = 0.12 to distinguish similar images from forged images in the CASIA dataset for the RPIVD method. Similarly, for the other three methods, we set the thresholds to 1.2, 0.0012 and 0.008, respectively, for their best results.

Discriminative Capability
The discriminative capability of an image hashing scheme means that visually distinct images should have significantly different hashes. In other words, two images that are visually distinct should have a very low probability of generating similar hashes. Here, the RTD dataset, consisting of 220 different uncompressed color images, was adopted to validate the discriminative capability of our proposed multi-attack reference hashing algorithm. We first extracted reference hashing codes for all 220 images in RTD and then calculated the hashing distance for each image with the other 219 images. Thus, we finally obtained 220 × (220 − 1)/2 = 24,090 hashing distances. Figure 4 shows the distribution of these 24,090 hashing distances between hashing pairs with varying thresholds, where the abscissa is the hashing distance and the ordinate represents the frequency of the hashing distance. It can be seen clearly from the histogram that the proposed method has good discriminative capability. For instance, we set τ = 0.12 as the threshold on the CASIA dataset when extracting the reference hashing with the RPIVD method. The minimum hashing distance was 0.1389, which is above the threshold. The results show that the multi-attack reference hashing can replace the original image-based reference hashing with good discrimination.

Authentication Results
As for the reference hashing performance for authentication, we compared the proposed multi-attack reference hashing (MRH) with original image-based reference hashing (ORH) on four baseline image hashing methods, i.e., wavelet-based, SVD-based, RPIVD-based and QFT-based image hashing, under twelve content-preserving operations. The results are shown in Tables 2 and 3. Note that higher values indicate better performance for all metrics. It was observed that the proposed MRH algorithm outperformed the ORH algorithm by a clear margin, irrespective of the content-preserving operation and image dataset (RTD and CASIA). This is particularly evident for illumination correction. For instance, in contrast to original image-based reference hashing, the multi-attack reference hashing increased the AUC for illumination correction by 21.98% on the RTD image dataset when generating the reference hashing with the wavelet-based method, as shown in Table 2. For the QFT approach, our multi-attack reference hashing was more stable and outstanding than the corresponding reference hashings. Since the QFT robust image hashing technique processes the three channels of the color image, the chrominance information of the color image is not lost and the image features are more distinctive. Therefore, the multi-attack reference hashing is better able to resist geometric attacks and content-preserving operations. For instance, the multi-attack reference hashing increased the precision for Gaussian noise by 3.28% on the RTD images. For performance analysis, we took wavelet-based and SVD-based image hashing to extract features and used the semi-supervised method to train W for each content-preserving manipulation. The experimental results are summarized in Table 4.
They show the true authentication capability of the proposed method compared to the following methods: wavelet-based features, SVD-based features and the corresponding semi-supervised method. Here, for the wavelet-based method, ψ = 0.02 and ξ = 0; for the SVD-based method, ψ = 0.005 and ξ = 0. The similar-image column represents the true authentication capability on similar images, which indicates the robustness of the algorithm. The tampered-image column represents the true authentication capability on tampered images, which indicates the discrimination of the algorithm. Higher values mean better robustness and discrimination. Only our approach selects adaptive thresholds, whereas the other approaches choose a fixed threshold that balances robustness and discrimination.

Domains of Application
With the aid of sophisticated photo-editing software, multimedia content security is becoming increasingly prominent. By using image editing tools such as Photoshop, counterfeiters can easily tamper with color attributes to distort the actual meaning of images. Figure 5 shows some real examples of image tampering. These edited images spread over social networks, which not only disturbs our daily lives, but also seriously threatens social harmony and stability. If tampered images are extensively used in official media, scientific discovery, and even forensic evidence, the degree of trustworthiness will undoubtedly be reduced, thereby having a serious impact on various aspects of society. Many image hashing algorithms are widely used in image authentication, image copy detection, digital watermarking, image quality assessment and other fields, as shown in Figure 6. Perceptual image hashing aims to be smoothly invariant to small changes in the image (rotation, crop, gamma correction, noise addition, adding a border). This is in contrast to cryptographic hash functions, which are designed for non-smoothness and change entirely if any single bit changes. Our proposed perceptual image hashing algorithm is mainly intended for image authentication applications. Our technique is suitable for processing large image data, making it a valuable tool for image authentication applications. Figure 6. A generic framework of image hashing and an application perspective.

Conclusions
In this paper, we have proposed a hashing algorithm based on multi-attack reference generation and adaptive thresholding for image authentication. We effectively exploited the supervised content-preserving images and multiple attacks simultaneously for feature generation and hashing learning. We specifically took into account the pairwise variations among different original-received image pairs, which makes the threshold more adaptable and its value more reasonable. We performed extensive experiments on two image datasets and compared our results with state-of-the-art hashing baselines. The experimental results demonstrated that the proposed method yields superior performance. For image hashing-based authentication, a scheme with not only high computational efficiency but also reasonable authentication performance is expected. Compared with other original image-based reference generation methods, the limitation of our work is that the clustering operation is time-consuming. In future work, we will design co-regularized hashing for multiple features, which is expected to show even better performance.