Content-Based Image Copy Detection Using Convolutional Neural Network

Abstract: With the rapid development of network technology, concerns pertaining to the enhancement of security and protection against violations of digital images have become critical over the past decade. In this paper, an image copy detection scheme based on the Inception convolutional neural network (CNN) model in deep learning is proposed. The image dataset is transformed by a number of image processing manipulations, and the feature values in the images are automatically extracted for learning and for detecting suspected unauthorized digital images. The experimental results show that the proposed scheme performs an extraordinary role in detecting duplicated images with rotation, scaling, and other content manipulations. Moreover, detecting duplicate images via a convolutional neural network model with different combinations of original images and manipulated images can improve the accuracy and efficiency of image copy detection compared with existing schemes.


Introduction
Multimedia forensics is one of the key technologies for digital evidence authentication in cybersecurity. The rapid expansion in the amount of digital content in social networks has also brought about a significant increase in the number of copyright infringements [1]. Nowadays, new image processing manipulations are rapidly developing and are incorporated into image processing software such as Photo Impact and Adobe Photoshop. Digital images are more likely to be copied and tampered while transmitting over the Internet. Therefore, concerns pertaining to the enhancement of security and protection against violations of digital images have become critical over the past decade [2]. Researchers are devoted to designing associated forensics algorithms, detecting the unauthorized manipulation, and then protecting the copyrights of original images.
In general, current techniques for image copyright protection can be divided into digital watermarking [3][4][5] and content-based copy detection [6,7]. Digital watermarking is the mechanism that embeds digital watermarks as copyright information into digital images, and the embedded watermark is later extracted to verify ownership. In contrast, transfer learning reuses the parameters of all convolutional layers in a trained model and replaces the last fully connected layer for the next training task. The target image may use the trained neural network to extract the features of the image. The extracted features are used as input to solve the classification problem caused by insufficient data for the neural network, which also shortens the time required for training and improves the accuracy of classification.
In this paper, to study a high-performance convolutional neural network model for protection against violations of digital images by adaptively learning manipulation features, a content-based image copy detection scheme based on Inception V3 is proposed. The image dataset was transformed by a number of image processing manipulations, and the feature values were automatically extracted for learning and detecting suspected unauthorized digital images. A Google Inception Net training model was used to automatically establish a convolutional layer to train the dataset. The performance under different training parameters is studied to design the optimum training model. The experimental results show that the proposed scheme plays an extraordinary role in detecting duplicate digital images under different manipulations, such as rotation, scaling, and other content modifications. The detection accuracy and efficiency are greatly improved compared with other content-based image copy detection schemes.
The remainder of the paper is organized as follows: The related work of content-based image copy detection and the overview of CNNs are presented in Section 2. Section 3 provides a detailed description of the proposed scheme. The experimental results and comparison of related literature are presented in Section 4. Lastly, Section 5 concludes our work.

Content-Based Image Copy Detection
Content-based copy detection applies the architecture of content-based image retrieval, which has higher requirements in recognition [18]. It is a technology that focuses on picking out unauthorized images from a dataset of suspected unauthorized digital images with rotation, scaling, and other content modifications. Researchers have focused on developing general-purpose image copy detection techniques to determine if an image has undergone malicious image processing manipulation [19]. Kim et al. [11] proposed the first representative content-based image copy detection mechanism based on the DCT. The feature values of each digital image in the database are generated from the AC coefficients, which are regarded as an ordinal measure. To carry out this mechanism, the image is divided into 8 × 8 non-overlapping sub-image blocks, and an average intensity value is calculated for each block. The AC coefficients are determined from the sub-images and sorted by the ordinal measure to represent the feature values. The experimental results show that this mechanism can successfully detect images manipulated by equalization and contrast enhancement, as well as image processing such as watercolor, dynamic fuzzy mosaic, and Gaussian blur techniques. However, the performance is not satisfactory when dealing with rotated images.
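Kim et al.'s block-average DCT feature can be sketched as follows. This is a hypothetical numpy-only reconstruction built purely from the description above: the 8 × 8 partition grid, the orthonormal DCT matrix, and the rank-based ordinal measure of the AC coefficients are our reading of the scheme, not code from [11].

```python
import numpy as np

def dct2(block):
    # 2-D DCT-II built from the orthonormal DCT matrix: C @ B @ C.T
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def ordinal_dct_feature(image, grid=8):
    # Partition the image into grid x grid sub-images and take each block's mean intensity.
    h, w = image.shape
    bh, bw = h // grid, w // grid
    means = image[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    coeffs = dct2(means).flatten()[1:]      # drop the DC term, keep the 63 AC coefficients
    return np.argsort(np.argsort(coeffs))   # rank (ordinal measure) of each AC coefficient

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(256, 384))  # stand-in for a database image
feat = ordinal_dct_feature(img)
print(feat.shape)   # (63,)
```

Because the feature is a ranking rather than the raw coefficient values, it is insensitive to global intensity changes such as contrast enhancement, which matches the detection results reported for the scheme.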
To detect images manipulated at different rotation angles, an improved content-based image copy detection scheme was proposed by Wu et al. [12]. They found that after an image is rotated, some of its edge features move to other locations. In their scheme, the picture is first converted into the YUV color model, and only the Y plane is used to generate the feature values of the image. After dividing the digital image into 8 × 8 sub-image blocks, the average value of each block is calculated. The correlation coefficient between each block and the surrounding eight blocks is extracted as the feature value. During the detection procedure, the image feature values in the Y channel of the YUV color model are extracted to detect unauthorized images. The experimental results show that more image processing manipulations can be detected compared with Kim et al.'s scheme, even if the picture is rotated by 90° or 270°. However, digital images with more complicated rotation angles, such as 22.5° or 45°, are undetectable by this scheme.
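A minimal sketch of the Y-plane block-correlation idea described above. The RGB-to-Y weights, the 8 × 8-pixel block size, and the use of Pearson correlation over raw block pixels are illustrative assumptions, not Wu et al.'s exact algorithm:

```python
import numpy as np

def rgb_to_y(rgb):
    # Luma (Y) plane of the YUV colour model (standard BT.601 weights).
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def neighbour_correlations(y, bs=8):
    # Split the Y plane into bs x bs pixel blocks, then correlate each interior
    # block with its eight surrounding blocks (Pearson correlation of raw pixels).
    h, w = y.shape[0] // bs, y.shape[1] // bs
    blocks = y[:h * bs, :w * bs].reshape(h, bs, w, bs).swapaxes(1, 2)  # (h, w, bs, bs)
    feats = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            centre = blocks[i, j].ravel()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di or dj:
                        nb = blocks[i + di, j + dj].ravel()
                        feats.append(np.corrcoef(centre, nb)[0, 1])
    return np.array(feats)

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, size=(64, 64, 3))   # stand-in for an RGB image
f = neighbour_correlations(rgb_to_y(img))
print(f.shape)   # 6 x 6 interior blocks x 8 neighbours = (288,)
```

Correlating each block with its neighbours explains why 90° and 270° rotations are detectable (the neighbour set is merely permuted), while arbitrary angles break the block grid entirely.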
Lin et al. [13] then proposed a mechanism based on the feature values of the digital image's edge information under different color models. In order to find each edge point, Lin et al.'s scheme scans the image with a 3 × 3 mask in zig-zag order. Their scheme can completely capture the edge features of the entire image and can detect various rotation, shifting, and cropping attacks, in contrast to the image copy detection mechanisms proposed by Kim et al. and Wu et al. In order to detect arbitrary rotation of the image while achieving the desired accuracy and performance, Zhou et al. [14] proposed an efficient image copy detection scheme based on gradient magnitude and direction. In their method, the image is first preprocessed to suppress noise in order to improve the detection efficiency. Secondly, an image segmentation strategy based on a pixel intensity sequence is used to divide the image into several rotation-invariant partitions. Finally, the gradient magnitude and direction features of the image are extracted as the feature values. This scheme can detect arbitrary rotation of the image. However, it requires experts to analyze the digital images in advance to find effective feature values.
Although the schemes mentioned above have resulted in significant gains in accuracy when detecting unauthorized images that have undergone image processing manipulations, the specific image feature values must be manually extracted for detection, which is time consuming. Deep learning approaches, particularly CNNs, which can automatically learn a classification model directly from data, open a new area of study for improving detection accuracy and efficiency: once a query image is input into the detection model, the model automatically determines whether it is an unauthorized duplicated image.

Convolutional Neural Network
A CNN [20] is a deep learning model constructed by referring to the visual nerve of neural networks. Its main function is to let the computer find out the correlation between data and information on its own, and extract useful features to replace manually defined features. It is a kind of feedforward neural network, whose artificial neurons can respond to a part of the surrounding units in the coverage area and meanwhile preserve the features in the scope, which exhibits an excellent performance for a large number of image processing methods. Figure 1 shows a representative structure of CNN [21], which usually contains a convolutional layer, pooling layer, and fully connected layer.
The convolution layer extracts the feature values of the inputs by performing a convolution operation between the preprocessed image and a specific filter. The filter coefficients in each layer are initially seeded with random values and then learned using back-propagation algorithms [22,23]. The set of convolutional layers yields a large volume of feature maps. The pooling layer plays an extraordinary role in reducing the dimensionality of these features, compressing the data volume and mitigating overfitting. Max pooling is one of the most common pooling operations; it divides the input digital image into several rectangular areas and takes the maximum value of each area.
The fully connected layer automatically identifies the target based on the feature maps produced by multiple convolutional and pooling layers, using the learned weights and connection strengths.
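The convolution → pooling → fully connected pipeline described above can be sketched with a numpy-only forward pass. All shapes (the 28 × 28 input, the single 3 × 3 kernel, and the 10-class output) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv2d(x, k):
    # Valid convolution (strictly, cross-correlation, as in most CNN libraries).
    kh, kw = k.shape
    H = x.shape[0] - kh + 1
    W = x.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    # Divide the feature map into s x s areas and keep the maximum of each.
    H, W = x.shape[0] // s, x.shape[1] // s
    return x[:H * s, :W * s].reshape(H, s, W, s).max(axis=(1, 3))

# Forward pass: convolution -> ReLU -> max pooling -> fully connected layer.
image  = rng.standard_normal((28, 28))
kernel = rng.standard_normal((3, 3))            # learned by back-propagation in practice
fmap   = np.maximum(conv2d(image, kernel), 0)   # (26, 26) feature map
pooled = max_pool(fmap)                         # (13, 13) after 2x2 max pooling
w      = rng.standard_normal((10, pooled.size)) # fully connected weights, 10 classes
logits = w @ pooled.ravel()
print(fmap.shape, pooled.shape, logits.shape)   # (26, 26) (13, 13) (10,)
```

The max-pooling step shows the dimensionality reduction mentioned above directly: the 26 × 26 feature map is compressed to 13 × 13 before the fully connected layer.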
Recently, CNNs have been successfully applied in computer vision fields such as image recognition, object detection, and semantic segmentation. Bayar et al. [24] developed an innovative convolutional layer designed to suppress image content and adaptively learn manipulation detection features. They selected convolution filtering to establish the first layer of the CNN, since the higher layers of the CNN play an extraordinary role in learning appropriate methods for extracting low-dimensional detection features from high-dimensional prediction errors. The images in the dataset were divided into 256 × 256 blocks, and a total of 87,000 blocks were used for model training, shortening the training time to under one hour. The results demonstrated that CNNs perform an extraordinary role in terms of detection efficiency compared with traditional schemes. Yan et al. [25] rapidly improved the accuracy of digital image detection by changing the feature values: the image dataset was fed into a CNN model to extract feature values, which were then optimized based on a Bag-of-Words model. Their scheme improved the accuracy of digital image detection, with an accuracy rate of up to 94.59%. Cozzolino et al. [26] transferred training images out of multiple processed images of the same type (for example, Gaussian noise) for repeated training to improve the stability of the deep neural network, and the accuracy rate reached 93.9% based on the original Inception training model. Zheng et al. [27] converted digital images into two color modes (grayscale and color), stitched the color images from top to bottom, and established a convolutional neural network model to separately learn the feature values of the grayscale and color images.
Their scheme combined the two outputs via fully connected layers and extracted feature values from a pseudo-Siamese model so that the stitched image maintained the feature values of the two original images. It could not only prevent brightness features from affecting identification, but also find the most recognizable features in the color images.
Nowadays, several CNN models, such as Google Inception Net, ResNet, MobileNet, NASNet, and their different versions, have been developed in various fields of engineering for pattern recognition, image classification, and text categorization [28][29][30], as well as in many traffic safety-related studies [31,32]. Researchers no longer need to spend a lot of time designing effective feature-value alignment between the original image and the query image, which accelerates the study of image copy detection. To gain ideal training results, the training model of a CNN performs an extraordinary role in learning the relationship between the data information and the specimens in the dataset. Therefore, in this work we use a CNN as the basic model to effectively find the list of suspected unauthorized images and improve the performance of intellectual property protection. To see whether CNNs can demonstrate outstanding detection performance on copied images, we collected 43 image processing manipulations that are commonly seen in image processing applications to generate copied images. Moreover, to present more insight into CNNs for image copy detection, in this work not only is the detection accuracy under different training models such as Inception_v3, ResNet_v2, MobileNet_v2, and NASNet_large presented, but the different datasets used for training are also explored.

Methodology
In this study, the convolutional neural network (CNN) method was adopted to develop image copy detection based on training a detection CNN model to determine whether a query digital image is a suspected unauthorized image. The image copy detection model of this scheme consists of image preprocessing and detection model training procedures. To gain an expanded digital image dataset, each image was first converted into 44 different forms, including the original image, during image preprocessing. After that, the digital images were divided into two datasets by selecting 70% of the digital images as the training dataset; the remaining 30% were regarded as the test dataset. The image copy detection model was then trained by automatically extracting the feature values of each image based on Inception V3. Finally, the detection model is used to detect whether the query image is a suspected unauthorized image or not. The flowchart of the proposed image copy detection scheme is shown in Figure 2.

Figure 2. Flowchart of the proposed image copy detection scheme.

Image Preprocessing
At the image preprocessing stage, various common image processing manipulations, such as translation, rotation, and other content modifications, were first performed on each image via image processing software to generate 44 different forms of the image, which are regarded as manipulated digital images. It is noted that the original image and its corresponding manipulated images form a group; in other words, they have the same label. This is because our proposed image copy detection scheme aims to shrink the training time and the scale of the training dataset. When image owners want to verify whether similar images collected from the Internet contain their image, they only need to generate various manipulated images in advance and feed them into our proposed image copy detection scheme to train on the features of their image. Then, the image copy detection scheme identifies the most suspicious image among the similar images. To prevent overfitting, 70% of the images belonging to the same group were selected to form the training dataset, and 10% of the training dataset was randomly extracted for validation during model training. The remaining 30% of the images were regarded as the test dataset.
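The grouping and 70%/30% split with a 10% validation hold-out described above can be sketched as follows; the group and file naming is hypothetical:

```python
import random

N_MANIPULATIONS = 43   # plus the original image -> 44 forms per group

def build_splits(image_ids, seed=0):
    # Each original image and its 43 manipulated forms share one label (one group).
    groups = {img: [f"{img}_form{k}" for k in range(N_MANIPULATIONS + 1)]
              for img in image_ids}
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, forms in groups.items():
        forms = forms[:]
        rng.shuffle(forms)
        cut = int(0.7 * len(forms))            # 70% of each group for training
        tr, te = forms[:cut], forms[cut:]
        n_val = int(0.1 * len(tr))             # 10% of the training set held out for validation
        val   += [(label, f) for f in tr[:n_val]]
        train += [(label, f) for f in tr[n_val:]]
        test  += [(label, f) for f in te]
    return train, val, test

train, val, test = build_splits(["img001", "img002"])
print(len(train), len(val), len(test))   # 54 6 28 (27 train, 3 val, 14 test per group)
```

Splitting within each group (rather than across whole groups) mirrors the paper's setup: every label appears in both the training and test sets, but no individual image form appears in both.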

Detection Model Training
Tensorflow was selected as the development environment to train the dataset, and the Google Net Inception V3 [33] convolutional neural network architecture was utilized to learn the training model, which contains four significant parts-i.e., 1 × 1 convolution, 3 × 3 convolution, 5 × 5 convolution, and 3 × 3 maximum pooling, as shown in Figure 3.

Christian Szegedy et al. [33] further explored different Inception models (Figures 4b and 5a,b) based on Figure 4a. To gain optimized feature values, different information from the digital images is extracted by multiple convolutional layers, and the outputs of these convolutional layers are combined as the input of the next layer [34]. Inception V3 improves on the idea of factorization, decomposing a relatively large two-dimensional convolutional layer into small one-dimensional convolutional layers to accelerate computation and deepen the network. This convolution structure performs a significant role in handling spatial feature values and increases the diversity of feature values. The layout of our network is shown in Table 1. The output size of each module is the input size of the next one.
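The parameter saving behind the factorization idea mentioned above (decomposing an n × n convolution into a 1 × n followed by an n × 1 convolution) can be checked in a few lines:

```python
# Factorizing an n x n convolution into 1 x n followed by n x 1 (Inception V3 idea):
# the parameter count per filter drops from n*n to 2*n.
for n in (3, 5, 7):
    full = n * n
    factored = 2 * n
    print(n, full, factored, f"{100 * (1 - factored / full):.0f}% fewer weights")
```

For a 3 × 3 convolution this saves a third of the weights, and the saving grows with the kernel size, which is why the factorized layers accelerate computation while allowing a deeper network.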
The variations of the reduction technique are depicted in [33] and are also used in our proposed network architecture to increase the accuracy while maintaining a constant amount of computational complexity. The classifier in our network is set to softmax. It is noted that both sigmoid and softmax can be used for multiclass classification; however, the sigmoid looks at each raw output value separately, whereas the outputs of softmax are all interrelated. Based on the scenario of image copy detection, softmax is selected as the classifier in our proposed scheme. To determine the relationship between images and labels, the preprocessed digital image dataset with corresponding labels is put into the model for training. The detection model training consists of two stages: (a) training parameter setting and (b) model accuracy detection. During the training parameter setting stage, the training amplitude, time, and size of the input digital image dataset are adjusted, and the size of the feature map and the number of extracted feature values are also determined.
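The difference between sigmoid and softmax outputs noted above can be demonstrated directly (the logit values are arbitrary examples):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.1])
p_soft = softmax(logits)
p_sig  = sigmoid(logits)
print(np.round(p_soft, 3), p_soft.sum())   # interrelated: always sums to 1
print(np.round(p_sig, 3), p_sig.sum())     # each value independent: sum is not 1
```

Because softmax forces the class probabilities to compete for a total mass of 1, it suits the copy detection scenario, where each query image should be assigned to exactly one image group.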
In the Inception training model, the input size of the digital image was first set and divided into three depths based on the three RGB channels as the input layer. The convolution layer depth was set to 32, which gives 32 filters with a depth, height, and width of 3. The convolution results are 32 feature maps with heights and widths of 149 based on the following equations:

W2 = (W1 - F + 2P)/S + 1,
H2 = (H1 - F + 2P)/S + 1,

where W1 is the width of the unconvoluted digital image; W2 is the width of the convoluted feature map; F is the width of the filter; P is the number of zeros padded around the original digital image; S is the stride; H1 is the height of the unconvoluted digital image; and H2 is the height of the convoluted feature map. Besides the image size, the training steps and learning rate can also be set as parameters to adjust the training results. The default number of training steps was set to 4000, because more training steps may improve the training accuracy but lower the validation accuracy. The learning rate controls the training level of the final layer, and its default value was set to 0.01. After each training stage, the accuracy of the model is evaluated on the validation set. During the image copy detection procedure, the system automatically finds the five images in the dataset that are most similar to the query image as the suspected image list. The suspected image list is then used to compare and verify whether the query image is a suspected unauthorized image or not.
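The output-size formula can be verified numerically; the 299 × 299 input below is the standard Inception V3 input size and reproduces the 149 × 149 feature maps mentioned above:

```python
def conv_out(size, f, p, s):
    # W2 = (W1 - F + 2P) / S + 1  (the same formula applies to the height)
    return (size - f + 2 * p) // s + 1

# Inception V3 first layer: 299 x 299 input, 3 x 3 filter, stride 2, no padding.
print(conv_out(299, f=3, p=0, s=2))   # 149
```

With a 3 × 3 filter, no padding, and stride 2, each spatial dimension shrinks from 299 to (299 − 3)/2 + 1 = 149, matching the 32 feature maps of width and height 149 described in the text.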

Results
To test the effectiveness and accuracy of the proposed method, the WBIIS database [35,36], which contains 10,000 open-source and copyright-free digital images in JPEG format with sizes of 384 × 256 and 256 × 384, was chosen as the test database in the experiment; examples of the test images are shown in Figure 7. We selected 70% of the digital images as the training dataset and the remaining 30% as the test dataset. To test whether the proposed scheme can detect unauthorized duplicated images under various attacks, each image was transformed into different forms based on 43 image processing manipulations, as shown in Figure 8. It is noted that the watermarked image was not included in the manipulated images shown in Figure 8. This is because we assumed that once the original image carries the hidden watermark, its corresponding manipulations will also carry the hidden watermark. Therefore, we only need to focus on discussing whether our proposed copy detection with CNNs can identify the manipulations or not.
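A minimal sketch of building the five-image suspected list used during detection, assuming the model emits a per-database-image similarity score (the scores below are random stand-ins for the model's outputs):

```python
import numpy as np

def top5_suspects(scores, image_ids):
    # Return the five database images most similar to the query,
    # ordered from most to least similar.
    order = np.argsort(scores)[::-1][:5]
    return [image_ids[i] for i in order]

rng = np.random.default_rng(3)
ids = [f"img{i:04d}" for i in range(10000)]   # one id per WBIIS database image
scores = rng.random(10000)                    # stand-in similarity scores from the model
print(top5_suspects(scores, ids))
```

The returned shortlist is then compared against the query image to verify whether it is a suspected unauthorized copy.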


Results in Different Learning Rates
To study the relationship between the learning rate and the training results, the initial value of the learning rate of the trained Inception V3 image feature vector model was set to 0.005, 0.01, 0.5, and 1.0 to compare the training results, as shown in Table 2. Table 2 indicates that the training time is inversely proportional to the learning rate. It also shows that the training time and accuracy at the default learning rate (i.e., 0.01) are satisfactory: 431 s and 96.23%, respectively. The training time increases and the accuracy decreases if the learning rate is lowered to 0.005. Therefore, we speculate that when the initial value of the learning rate is less than 0.01, the longer training time leads to overfitting and decreases the accuracy. To obtain higher accuracy, the initial value of the learning rate should be adjusted upward; the accuracy reaches up to 99.55% when the learning rate is 1.0.

Comparison Results in Different Training Models
To compare the results of the Inception_v3 training model in Tensorflow with other training models, we further trained the detection model with ResNet_v2, MobileNet_v2, and NASNet_large; the comparison results are shown in Table 3. The results show that all of the training models achieve very high accuracy, whereas the training time varies widely. The smallest neural network, MobileNet_v2, has the shortest training time, and its accuracy rate reaches up to 98.77%. Although the accuracy rate of NASNet_large is 99.34%, its training time is also the longest. This is because NASNet_large not only trains for the most effective feature values, but also trains to find the best neural network architecture, which results in more training time. With respect to Inception_v3, the accuracy is 99.55%, the best of the four models when the learning rate is set to 1, and the training time (315 s) is also short. Therefore, in general, the comparison results demonstrate that the Inception_v3 model used in the proposed scheme achieves the best performance compared with the other models.

Comparison Results in Different Training Datasets
Certainly, as the results in Table 3 demonstrate, with all manipulated images, including the original image, serving as the training dataset, our proposed method works with Inception_v3 and can offer a detection accuracy of up to 99.55%. However, this is impractical, because such accuracy depends on all manipulated images being included in the training dataset. Therefore, a second experiment was conducted that uses only the original images in the training dataset, excluding the 43 manipulated images. Unfortunately, the detection accuracy rate then decreases to 90.06%. Under such circumstances, we can see that 7 of the 10 test images with the Crayon effect cannot be detected, as is the case for six rotation manipulations: 22.5°, 45°, 67°, 90°, 180°, and 270° rotations. The number of detected manipulated images decreases significantly; in the worst case, 4 of the 6 rotation-manipulated images are undetectable, and for 5 of the 10 test images, the number of undetectable rotation-manipulated images ranges from 3 to 4. To increase the detection accuracy rate, a third experiment was conducted, in which the training dataset included the original images and manipulated images with 45° rotation. With the assistance of the manipulated images with a 45° rotation, the CNN obtains extra features regarding the rotation manipulation. It is noted that the detection accuracy rate rises to 96.47%, which is about 6% more than that trained with the dataset excluding manipulated images.
Based on the experimental data in Table 4, we can see that the supplementary features offered by the manipulated images with a 45° rotation significantly increase the detection accuracy rate of our proposed scheme. For 8 of the 10 test images, the manipulated versions with various rotations are successfully detected. As for the manipulated images with the Crayon effect, 4 of the 10 test images still cannot be correctly detected. However, the corresponding similarities between the manipulated images with the Crayon effect and their query images are significantly increased. After carefully observing the test images whose Crayon-effect versions cannot be identified, we found that this only occurs when the test images are complex and contain vivid colors.

Comparison with Content-Based Image Copy Detection Schemes
To evaluate the performance and accuracy of the proposed scheme, the image copy detection results are compared with some representative content-based image copy detection schemes, i.e., Lin et al.'s scheme [13], Wu et al.'s scheme [12], and Kim et al.'s scheme [11]. We randomly selected 10 images from the image database, and each image was processed with the 44 image processing manipulations to generate the query images. We then counted, for each manipulation, how many of the 10 images could be detected. Only if all 10 images of a manipulation are detected is the detection result marked as "Yes"; otherwise, it is marked as "No". Table 5 shows the detection results of the proposed scheme and the other content-based image copy detection schemes under the different image processing manipulations.
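The all-or-nothing marking rule used for Table 5 can be stated in a few lines. The detection counts in the example are illustrative only, not the paper's actual per-manipulation results.

```python
# Minimal sketch of the "Yes"/"No" marking rule for Table 5: a
# manipulation is marked "Yes" only when all 10 query images processed
# with it are detected.  The counts below are illustrative placeholders.
def mark(detected_counts, total=10):
    """Map each manipulation to 'Yes' iff every test image was detected."""
    return {m: ("Yes" if n == total else "No")
            for m, n in detected_counts.items()}

counts = {"45-degree rotation": 10, "Crayon effect": 6}  # illustrative
print(mark(counts))  # {'45-degree rotation': 'Yes', 'Crayon effect': 'No'}
```

Note that this rule is strict: a manipulation detected on 9 of 10 images is marked "No", the same as one detected on none, which is why the per-image counts discussed above carry extra information beyond the table.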
In the experiment, the proposed image copy detection scheme with the training dataset including all manipulated images could successfully detect all query images with an accuracy of 100%, whereas the compared schemes failed on several manipulations. Moreover, the compared schemes must search the image database for each query; on the contrary, the proposed scheme automatically determines whether the query image is an unauthorized duplicate once it is input into the detection model. Therefore, the comparison results demonstrate that the proposed scheme outperforms the compared image copy detection schemes in terms of detection time and accuracy. When the training dataset includes only the original images and the manipulated images with 45° rotation, the detection accuracy rate of the proposed scheme is the same as that of Lin et al.'s scheme. Although neither scheme can fully deal with manipulated images with the Crayon effect, our proposed scheme can still identify 4 of the 10 test images. As for the manipulated images with 67°, 90°, 270°, and 180° rotations, the average detection accuracy rate still remains about 80%. Moreover, our proposed scheme can resist "45° Rotation", "Mosaic", "Central Rotation", and "Sharpened", which cannot be handled by Lin et al.'s scheme. To further evaluate whether our proposed image copy detection with a CNN successfully learns the features of the training set images, and can then identify manipulated images that are slightly different from those in the training set, five extra manipulations were generated, namely 10° rotation, 15% noise, 25° twist, widening by a factor of 1.4, and horizontal shifting, and the resulting images were added to the query images.
From the accuracies listed in Table 7, we can see that the accuracies on the training set are nearly the same as in the previous experimental results. The accuracies on the testing set are slightly lower than those on the training set; we believe this is mainly caused by either the Crayon effect or a few rotation manipulations. Such results are consistent with the previous experiments and show that our proposed image copy detection with a CNN can take advantage of CNNs to learn the features of images, so that some manipulations can still be identified even if they are outside the scope of the training set.

Conclusions
An image copy detection scheme based on the Inception_v3 convolutional neural network is proposed in this paper. The image dataset is first transformed by a number of image processing manipulations to train the detection model with feature values. During the image copy detection procedure, the system automatically produces the suspect image list to compare and verify whether the query image is a suspected unauthorized image or not. The experimental results show that the training accuracy reaches up to 99.55% when the training dataset includes all manipulated images and the learning rate is set to 1.0, which is superior to the ResNet_v2, MobileNet_v2, and NASNet_large models. Even when the training dataset includes only the original images and the manipulated images with 45° rotations, our proposed scheme still outperforms Lin et al.'s scheme in the number of manipulation types that are successfully identified. Certainly, 80% of the manipulated images with rotations can be detected, but 40% of the manipulated images with the Crayon effect are hard to identify when the training dataset contains only the original images and the manipulated images with 45° rotation. Such a result indicates the minimum requirement for the training set of our proposed scheme. Moreover, the experimental results also show that our proposed image copy detection scheme can identify some manipulations that are slightly different from those in the training set. We believe this benefit arises because deep learning can effectively extract supplementary features from the images in the training set. Certainly, the experiments conducted in this work also show that the additional information gained by feeding only the manipulated images with 45° rotations is still limited.
In the future, we will explore how to improve the detection performance of CNNs on the Crayon effect, and investigate the trade-off between the number of manipulation types that must be included in the training set and the accuracy on the testing set, so as to increase the practicability of the image copy detection scheme.