Deep Learning-Based Digital Image Forgery Detection System

Abstract: The advancement of technology in every aspect of the current age is leading to the misuse of data. Researchers, therefore, face the challenging task of identifying these manipulated forms of data and distinguishing real data from manipulated data. Splicing is one of the most common techniques used for digital image tampering: a selected area copied from the same or another image is pasted into an image. Image forgery detection is considered a reliable way to verify the authenticity of digital images. In this study, we propose an approach based on the state-of-the-art deep learning architecture of ResNet50v2. The proposed model takes image batches as input and utilizes the weights of a YOLO convolutional neural network (CNN) with the architecture of ResNet50v2. We used the CASIA_v1 and CASIA_v2 benchmark datasets, which contain two distinct categories, original and forgery, to detect image splicing. We used 80% of the data for training and the remaining 20% for testing. We also performed a comparative analysis between existing approaches and our proposed system, evaluating the performance of our technique on the CASIA_v1 and CASIA_v2 datasets. Since the CASIA_v2 dataset is more comprehensive than the CASIA_v1 dataset, we obtained 99.3% accuracy with the fine-tuned model using transfer learning and 81% accuracy without transfer learning on the CASIA_v2 dataset. The results show the superiority of the proposed system.


Introduction
Digital images play an important role in many fields, such as newspapers, digital forensics, scientific research, and medicine. Nowadays, the usage and sharing of digital images on social media platforms is also widespread. Digital images are considered one of the main sources of information. Considering the excessive sharing of images through various social media platforms such as WhatsApp, Instagram, Telegram, and Reddit, differentiating between real and forged images is a challenging task. The availability of many image editing software applications is making it more difficult to detect the authenticity of an image day by day. Image manipulation detection can generally be categorized into two approaches, as follows:

1. Active approach;
2. Passive approach.
With the active approach, a watermark or digital signature is embedded when the image is created. Using these embeddings, whether the image has been tampered with or not can be analyzed at later stages.
In the passive approach, any pre-embedded information, such as a watermark embedded for the detection of image forgery, cannot be relied upon. This approach is also known as the blind approach because there is no additional information for image forgery detection. This approach is based on features that are extracted directly from the images.
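As a toy illustration of the active approach (a fragile least-significant-bit watermark, not any production watermarking or signature scheme), a mark can be embedded at creation time and checked later; editing any watermarked pixel breaks the check:

```python
import numpy as np

def embed_lsb_watermark(image, mark):
    """Embed a binary watermark into the least-significant bits
    of the first len(mark) pixels (toy fragile watermark)."""
    flat = image.flatten()
    flat[:len(mark)] = (flat[:len(mark)] & 0xFE) | mark
    return flat.reshape(image.shape)

def verify_lsb_watermark(image, mark):
    """Return True if the embedded watermark is intact."""
    flat = image.flatten()
    return np.array_equal(flat[:len(mark)] & 1, mark)

# Toy 8-bit grayscale image and a 32-bit watermark.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
mark = rng.integers(0, 2, size=32, dtype=np.uint8)

marked = embed_lsb_watermark(img.copy(), mark)
print(verify_lsb_watermark(marked, mark))    # watermark intact -> True

tampered = marked.copy()
tampered[0, 0] ^= 1                          # flip one LSB (simulated edit)
print(verify_lsb_watermark(tampered, mark))  # tampering detected -> False
```

Real active schemes use robust or cryptographic embeddings; this sketch only shows the embed-then-verify workflow.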
Furthermore, the passive approach can be categorized into two types: independent and dependent. The independent approach detects resampling and compression forgeries, while the dependent approach detects splicing and copy/move forgeries. Figure 1 shows the hierarchy of the above approaches.

In the copy/move type of manipulation, a particular part of an image is selected, copied, and pasted to another part of the same image. As a result, the correlation between these two parts of the image is relatively high compared to other parts. The goal of copy/move forgery detection is to correctly identify these duplicated regions by comparing the attributes extracted from the features using distance measures. Two approaches, as follows, are commonly used to extract patch-wise features from images:

1. Images are divided into blocks, and features are extracted from these blocks, as discussed in [1];
2. Key points are identified in the image, and features are extracted from these key points.

The features extracted from blocks or key points are compared one by one to generate matched pairs. If a match is found between two blocks, the duplication is confirmed and the image can be categorized as manipulated. The steps of this process are shown in Figure 2.

Digital image splicing is a method of extracting objects from one image and inserting them into another image. Manipulations are easier to detect in copy/move forgery than in image splicing, because the similar contours of an object can be easily detected within the same image, as the duplicated regions have the same sizes, transitions, and textures. In the case of image splicing, different objects are introduced, with different textures, sizes, and transition attributes, which makes the forgery difficult to identify [2].
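The block-based matching pipeline above can be sketched as follows (an illustrative toy that uses simple mean/variance block features and a Euclidean distance threshold, not the exact features of [1]):

```python
import numpy as np
from itertools import combinations

def find_duplicate_blocks(image, block=4, threshold=1e-6):
    """Divide the image into non-overlapping blocks, extract a small
    feature vector per block, and flag block pairs whose feature
    distance falls below the threshold (candidate copy/move pairs)."""
    h, w = image.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = image[y:y + block, x:x + block].astype(float)
            feats.append([patch.mean(), patch.var()])
            coords.append((y, x))
    feats = np.array(feats)
    matches = []
    for i, j in combinations(range(len(feats)), 2):
        if np.linalg.norm(feats[i] - feats[j]) < threshold:
            matches.append((coords[i], coords[j]))
    return matches

# Simulate copy/move: duplicate one block of a random image elsewhere.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(16, 16)).astype(float)
img[8:12, 8:12] = img[0:4, 0:4]   # the forged (duplicated) region

# The pair ((0, 0), (8, 8)) is reported as a duplicate.
print(find_duplicate_blocks(img))
```

Production methods use richer features (e.g. DCT coefficients) and sorting instead of all-pairs comparison, but the match-by-distance structure is the same.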
Image splicing forgery detection depends on the clues that are left after the manipulation of images. Some common image splicing clues include inconsistency and edge discontinuity caused by the camera, as well as geometric and lighting conditions. Capturing an image with different cameras results in different attributes, from which tampering can be confirmed [3]. There can also be lighting inconsistencies, which arise due to different lighting conditions. A double quantization effect can also arise when saving JPEG images, because of two consecutive compression operations performed on the tampered image [2,3].
Image tampering usually does not leave any visible clues through which one can tell whether the image has been tampered with or not; however, some statistics of the image may be altered. Christlein et al. [4] experimented on the copy/move approach, and a cut/paste-based detection methodology was discussed by Zampoglou et al. [5]; some of these approaches are shown in Figure 1.
The development of deep learning has led to improved methodologies, where state-of-the-art models such as CNNs, MobileNet, and ResNet50v2 automatically extract the potential features, having been trained on large datasets. Some examples of CNN-based feature extraction are deep features utilized for image quality assessment [6], skin lesion classification [7], and person re-identification [8]. These extracted features are adapted to the inherent structural patterns of the data. This is the main reason behind their discriminative and robust nature compared to hand-engineered features.
In this paper, motivated by deep learning techniques, we propose a transfer learning-based approach. It is an effective architecture with which we incorporated the weights of a model previously trained on a large database, and hence, it benefitted from meaningful weights without having to train the model from scratch. We present an architecture based on ResNet50v2 that employs transfer learning for the detection of tampered images, specifically, spliced images. We used the pre-trained weights of a YOLO CNN model to detect images that were tampered with using the image splicing technique.

The rest of the paper is organized as follows. Section 2 discusses the literature review and Section 3 explains the proposed system architecture. In Section 4, we present the dataset details. Section 5 presents the experimental results, discussion, and future work. Section 6 presents the conclusion.

Literature Review
Recent developments in image forensic techniques have led to the emergence of state-of-the-art techniques with which manipulations made in digital images can be detected. Previously, some research studies [10][11][12] proposed approaches that rely on observations made during each phase of the image's history, from its acquisition to its saving in a compressed format. The processing of the image leaves a trace on the image for the verification of digital authenticity. The image is then determined to be authentic or inauthentic by the verification of a digital signature.
Yerushalmy et al. [2] suggested a new approach for the detection of image forgery. This technique does not add digital watermarking to the images and does not compare images for training and testing. The authors proposed that image features introduced during the acquisition phase are themselves proof of the authenticity of the image. These features are often visible to the naked eye. Specifically, the technique uses image artifacts caused by various irregularities as markers to determine image validity. Ahmet et al. [3] proposed a technique for detecting image tampering using a color filter array. It computes a single feature and uses a simple threshold-based classifier. The authors tested their approach with authentic, computer-generated, and tampered images, and the experimental analysis showed low error rates.
Barad et al. [13] performed a research survey based on deep learning techniques for the task of image forgery detection, presenting an analysis of the approaches used to detect the authenticity of images on publicly available datasets. Yue et al. [14] introduced a deep learning-based architecture for copy/move image forgery detection using BusterNet, an end-to-end trainable approach. BusterNet uses a two-branch architecture: the first branch identifies manipulated areas using visual artifacts, whereas the second branch identifies copy/move areas using visual similarities. For effective BusterNet training, they proposed simple techniques for out-of-domain datasets and a stepwise approach. Their extensive study demonstrated that BusterNet outperformed traditional copy/move algorithms by a large margin. The proposed architecture was evaluated with the CASIA and CoMoFoD datasets.
Manjunatha et al. [15] discussed the importance of detecting tampering in images using deep learning-based techniques on publicly available datasets such as CASIA, UCID, and MICC [9,16,17]. They covered passive image forensic analysis methodology and highlighted future challenges in developing a mechanism for the detection of tampered images. In another study, Belhassen et al. [18] proposed a unique IDF technique based on a CNN. The goal of this technique is to automatically learn how image modification could be carried out. The proposed technique takes image-altering features as input, generated after destroying the contents of an image. Since tampering alters some resident associations, this technique focuses on examining the local operational associations among pixels rather than the look and feel of the image, and it then detects forgery in the image. In another study, Rao et al. [19] proposed a CNN-based architecture for the detection of digital image forgery. In their design, the first layer of the CNN model is directly involved in the preprocessing stage; it searches for the issues that occur after tampering. They trained the CNN model on trial images, whereas an SVM was used for the detection of manipulations. Bi et al. [20] proposed a ringed residual U-Net (RRU-Net) for forgery detection in image splicing, in which forgery detection is performed by an end-to-end image segmentation network. The goal of the RRU-Net study was to use human brain mechanisms, which generally work on recall and consolidation, to develop an approach that can detect manipulations without pre- and post-processing. The purpose of this technique is thus to optimize the learning capacity of a CNN, inspired by attributes of the human brain. They solved the gradient degradation problem, as residual propagation is used to recall the input feature information in a CNN.
Finally, the network differentiates between the original and forged regions, as the residual response is merged with the response features. The experimental results showed that the proposed technique gave better results than state-of-the-art traditional methods.
In another study, Zhan et al. [21] proposed a transfer learning-based methodology that has the benefit of gaining prior knowledge using the steganalysis model. With this approach, they were able to obtain an average accuracy of 97.36% on the BOSSBase and BOW datasets. Amit et al. [22] proposed a transfer learning-based mechanism that utilizes the pre-trained weights of the AlexNet model, which saves training time. This approach uses SVM as a classifier. The overall performance of the model was satisfactory.
Salloum et al. [23] suggested the use of a multi-task fully convolutional network. Since a single-task fully convolutional network has irregular output, the proposed technique performed better in comparison. The authors' multi-task network comprises a collection of output streams: one of these streams acquires the surface label, while the interface section edge is acquired by the other. D. Cozzolino et al. [24] proposed a new technique for the detection of image splicing using a feature-based algorithm. In this technique, the co-occurrences of an image are used to compute local features, which are then used to extract feature parameters. Since spliced and host images can exhibit different properties, the expectation-maximization algorithm, together with segmentation, is used for learning purposes.
In view of the above studies, most of the techniques used for forgery detection are based on handcrafted methods for feature extraction, which are highly dependent on the individual undertaking the task. The development of deep learning-based methods has led to automatic feature extraction. The use of deep learning thus removes possible human errors, increases the efficiency, and reduces the time complexity of the model.

Proposed System Architecture
In this study, we proposed a deep learning-based approach for the identification of forged images. We proposed an architecture using ResNet50v2 as our base model, and we used the YOLO CNN weights for transfer learning. Initializing our ResNet50v2-based architecture with the pre-trained weights of the YOLO CNN object detection model enabled us to train the model with meaningful weights and saved a considerable amount of training cost. Figure 3 presents the basic architecture of ResNet50v2: batch normalization is performed first, followed by the ReLU activation function, after which the weights are updated. The basic difference in the ResNet50v2 architecture is that it uses pre-activation of the weight layers instead of post-activation: batch normalization and the activation function are applied before the weights are multiplied. ResNet50v2 was developed in such a way that it removes the nonlinearity from the identity connection, hence clearing a path from the input to the output. The overall proposed system is shown in Figure 4.
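The pre-activation ordering that distinguishes ResNet v2 from v1 can be illustrated with a minimal numerical sketch (toy fully connected "layers" and a simplified batch norm stand in for the real convolutional blocks):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Simplified batch normalization (no learned scale/shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def post_activation_unit(x, w):
    """ResNet v1 ordering: weights -> batch norm -> ReLU."""
    return relu(batch_norm(x @ w))

def pre_activation_unit(x, w):
    """ResNet v2 ordering: batch norm -> ReLU -> weights,
    leaving the identity path free of nonlinearities."""
    return relu(batch_norm(x)) @ w

x = np.array([[1.0, -2.0, 3.0]])
w = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

v1 = post_activation_unit(x, w)
v2 = pre_activation_unit(x, w)
print(v1, v2)  # the two orderings give different outputs
```

Note that only the post-activation output is guaranteed non-negative; in the pre-activation form, the weight multiplication comes last, so the block's output can carry through the clean identity path unchanged.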

Since the input and output dimensions are not the same, the residual block function is defined in Equation (1), as follows:

y = F(x, {W}) + W_s · x (1)

In Equation (1), F is the residual block function, x represents the input image, and W represents the pre-trained weights of the YOLO CNN; W_s projects the shortcut to the new dimensions. Since the residual block function performs block mapping with zero extra padding when changing the dimensions, ResNet produces significantly better results.
We also addressed the degradation problem by utilizing a deep residual learning framework, in which the desired underlying mapping is fitted by the stacked layers. Formally, if we denote the underlying mapping as H(x), then the stacked nonlinear layers fit another mapping F(x) := H(x) − x. We determined that optimizing the original mapping is more difficult than optimizing the residual mapping: it is easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers. The formulation F(x) + x is realized by a shortcut connection, which refers to skipping one or more layers [25]. If identity mappings are generated by the added layers, information can flow through the network, allowing any layer to reproduce its original input and reducing the training error.
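A minimal numerical sketch of the residual block of Equation (1) (toy fully connected weights stand in for the convolutional layers; the names below are illustrative, with W_s projecting the shortcut when input and output dimensions differ):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2, ws):
    """y = F(x, {W}) + Ws @ x: the stacked layers learn the residual
    F(x) = H(x) - x, while the shortcut Ws matches the dimensions."""
    f = relu(x @ w1) @ w2      # residual branch F(x, {W1, W2})
    shortcut = x @ ws          # projection shortcut (dimension change)
    return f + shortcut

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))            # input with 4 features
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 6))           # residual branch maps 4 -> 6
ws = rng.normal(size=(4, 6))           # shortcut also maps 4 -> 6

y = residual_block(x, w1, w2, ws)
print(y.shape)                         # (1, 6)
```

If the residual branch weights are pushed to zero, the block reduces to the pure shortcut x @ ws, which is exactly why zero residuals are easy to learn.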

Transfer Learning
Transfer learning is an application of deep learning that enabled us to incorporate the pre-trained weights of an existing model trained on large data containing hundreds, if not thousands, of classes. Since these weights are pre-trained on large and challenging datasets, a high-end computing machine with a GPU is required for this purpose, and it can take days or weeks to train and validate such a model. The transfer learning approach reduces the cost of training a model from scratch and allows more accurate results to be achieved in less time. As stated in [26], transfer learning is an approach with which we could optimize our model without training it from scratch, and hence improve performance. Figure 5 shows a comparison of using transfer learning in a CNN versus no transfer learning. From this graph, we can see that from the start, the transfer learning-based deep learning model performed better than a model trained from scratch. In this paper, we present a deep learning-based architecture that uses a transfer learning technique to utilize the weights of a YOLO CNN.

In our proposed system, ResNet50v2 is used as the basic convolution model, which comprises five stages with separate convolution and identity blocks; each convolution block consists of three convolution layers, and each identity block also has three convolution layers. There are over 23 million trainable parameters in the ResNet50v2 model.

We used the CASIA ITDE v1 and v2 datasets for this purpose, which consist of two classes of original and forged images. The dataset was divided into training and testing sets. Figure 4 shows the deep learning-based architecture of our proposed system. The proposed model takes an input image and uses pre-trained YOLO CNN weights to detect the authenticity of the image.
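The transfer-learning setup can be sketched with a toy numpy stand-in: a frozen "pre-trained" feature extractor and a small trainable classification head. The real system fine-tunes ResNet50v2 initialized with pre-trained weights; this sketch only shows the freeze-the-backbone, train-the-head workflow on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor (a stand-in for the backbone;
# in the real system this role is played by the pre-trained network).
W_frozen = rng.normal(size=(10, 5))

def extract_features(x):
    return np.tanh(x @ W_frozen)       # these weights are never updated

# Trainable head: logistic regression on top of the frozen features.
w_head = np.zeros(5)

def predict(x):
    return 1.0 / (1.0 + np.exp(-extract_features(x) @ w_head))

# Tiny synthetic binary task (standing in for original vs. forged labels).
X = rng.normal(size=(64, 10))
y = (X[:, 0] > 0).astype(float)

lr = 0.5
for _ in range(200):                   # gradient descent on the head only
    p = predict(X)
    w_head -= lr * extract_features(X).T @ (p - y) / len(y)

acc = ((predict(X) > 0.5) == y).mean()
print(acc)
```

Because only the head's few weights are updated, training is far cheaper than optimizing the whole backbone, which is the cost saving transfer learning provides.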


Dataset
Some software applications, such as Adobe Photoshop, are readily available for tampering with images. Since a public standard dataset was not available before the CASIA datasets, researchers worked and experimented with their proposed approaches on limited examples [27][28][29]. Since there was no benchmark dataset, it was very difficult to compare the accuracy and effectiveness of a technique. Now, some benchmark datasets for forgery detection are available, such as CASIA_v1 and CASIA_v2. In this study, we performed our experiments and analysis on the two benchmark datasets CASIA_v1 and CASIA_v2 [9].

Preparation of Dataset
Dong et al. [9] collected a dataset and named it the CASIA Image Tampering Evaluation Database. Adobe Photoshop CS3 version 10.0.1 was used to generate all the color images for the tampered database. This dataset comprises CASIA_v1 and CASIA_v2 for image tampering detection evaluation. CASIA_v1 contains 1721 color images, and CASIA_v2 contains 12,323 color images. CASIA_v1 focuses only on splicing as a tampering technique; hence, all the tampered images in this dataset are classified as spliced tampered images. The size of the images in the CASIA_v1 database is fixed at 384 × 256, and they are stored in JPEG format. The tampered images in CASIA_v2 are more comprehensive than those in CASIA_v1. We discuss the construction of CASIA_v1 in detail in the following section.

CASIA ITDE v1
The CASIA ITDE v1 dataset is a collection of 1721 color images that are 384 × 256 pixels in size. The images are in JPEG format, as shown in Table 1. These images are further divided into two subsets, a spliced (forged) set and an authentic set. After the division, the forged set contained 921 images, whereas there were 800 images in the authentic set; thus, the authentic set contained 46% of the images and the rest belonged to the forged set. We used two sources to generate the authentic dataset; most of the images were taken from the Corel image dataset [30]. The Corel database is a well-known image database used for the development of many professional applications. Based on image content, the authentic set contains images of eight types (scene, texture, nature, plant, article, character, animal, and architecture). Since the generation of forged images requires the modification of original images, the forged images also cover the eight types mentioned above. The crop-and-paste tool in Adobe Photoshop was used to generate the forged set from authentic images. Figures 6 and 7 show some examples taken from the CASIA_v1 dataset. After a spliced image is generated, it is stored using the same filename. Table 2 shows the statistical features of the spliced images. Spliced images are generated based on the following criteria:

1. Spliced image regions are either generated from the same authentic image or from a combination of different authentic images.
2. Spliced region shapes can be changed and customized using the Adobe Photoshop palette.
3. Rotation, scaling, and other operations can be applied to cropped images before they are added to spliced images.
4. Spliced regions are generated with different sizes.
5. The authentic set also contains texture images, since forgery can easily be noticeable in text; thus, a random region was cropped for the texture images.
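The crop-and-paste generation of a spliced sample described by these criteria can be sketched as follows (a toy numpy illustration with arrays standing in for authentic images; no Photoshop-style editing or blending):

```python
import numpy as np

def make_spliced(donor, host, src, dst, size):
    """Copy a size x size region from the donor image at src and
    paste it into the host image at dst, producing a spliced image."""
    sy, sx = src
    dy, dx = dst
    spliced = host.copy()
    spliced[dy:dy + size, dx:dx + size] = donor[sy:sy + size, sx:sx + size]
    return spliced

rng = np.random.default_rng(0)
donor = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
host = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

spliced = make_spliced(donor, host, src=(10, 10), dst=(30, 30), size=16)

# Pasted region matches the donor; the rest of the host is untouched.
print(np.array_equal(spliced[30:46, 30:46], donor[10:26, 10:26]))  # True
print(np.array_equal(spliced[:30, :], host[:30, :]))               # True
```

Passing the same array as donor and host yields a copy/move sample instead of a spliced one, which mirrors criterion 1 above.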

CASIA ITDE v2
The CASIA_v2 dataset is an extended version of the CASIA_v1 dataset. This dataset contains 12,323 samples and these samples are divided into two subsets.
The tampered set contains 5123 images, whereas the authentic set has 7200 images. The CASIA_v2 dataset is more comprehensive than the CASIA_v1 dataset: it contains images with different dimensions, ranging from 320 × 240 to 800 × 600, and includes uncompressed images in TIFF and BMP formats. The authentic subset was constructed from the Corel image dataset [30], and the tampered subset was generated after blurring the authentic subset.
The spliced region's edge or any other region can be used with the blurring technique for the generation of tampered images. This is the unique difference between the CASIA_v1 and CASIA_v2 tampered sets. Figures 8 and 9 show examples of tampered images in the CASIA_v2 dataset [9].
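The blurring operation that distinguishes CASIA_v2 tampered samples can be sketched with a simple box blur restricted to a chosen region, such as a spliced region's edge (an illustrative stand-in for the actual Photoshop post-processing):

```python
import numpy as np

def box_blur_region(image, top, left, h, w, k=3):
    """Apply a k x k box blur only inside the given region,
    e.g. along a spliced region's edge."""
    out = image.astype(float).copy()
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    for y in range(top, top + h):
        for x in range(left, left + w):
            # padded[y:y+k, x:x+k] is the k x k window centred on (y, x).
            out[y, x] = padded[y:y + k, x:x + k].mean()
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
blurred = box_blur_region(img, top=8, left=8, h=8, w=8)

# Pixels outside the blurred region are unchanged.
print(np.array_equal(blurred[:8, :], img[:8, :]))   # True
```

Localizing the blur to the tampered boundary is what makes CASIA_v2 samples harder to spot than the un-blurred CASIA_v1 splices.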
The following rules were considered while generating tampered images to make this dataset more comprehensive:

1. Photoshop was used to define realistic images as close to human vision as possible.
2. Tampered images were generated either from two different authentic images or from the same authentic image.
3. Cropped images were further processed with distortion, rotation, and scaling before being inserted, to generate a realistic image.
While generating forged images, images of different sizes were generated for the tampered sets. Table 1 shows the detailed differences between the CASIA_v1 and CASIA_v2 datasets.

Dataset Evaluation
A test was designed to evaluate the quality of the generated tampered dataset. Thirty (30) people were given 100 images each and asked to identify, with the naked eye, whether each image was tampered with or not. They correctly identified the tampered images with an average accuracy of 59%, which illustrates that these tampered images are more realistic compared to the Columbia uncompressed images dataset. Table 1 shows the comparison of the CASIA_v1, CASIA_v2, and Columbia (compressed and uncompressed) datasets.

Experimental Results and Discussion
To evaluate the proposed system, we conducted several experiments demonstrating the effectiveness of the proposed deep learning-based approach. We also compared our proposed technique with several generic tampering detection techniques on different publicly available datasets.
We performed all the experiments on the publicly available CASIA_v1 and CASIA_v2 datasets. Table 3 shows the system specifications of the hardware that was used for conducting the experiments. Each dataset contains copy/move and splicing images. CASIA_v2 is a more comprehensive and challenging dataset compared to the CASIA_v1 dataset because it contains images of different sizes and formats.
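The abstract states that 80% of the data was used for training and the remaining 20% for testing. A deterministic split of the image paths might look like the following sketch (the helper name and seed are assumptions):

```python
import random

def train_test_split(paths, test_ratio=0.2, seed=42):
    """Deterministically shuffle the image paths and hold out
    test_ratio of them (20% here) for testing."""
    paths = sorted(paths)            # stable order before shuffling
    rng = random.Random(seed)
    rng.shuffle(paths)
    n_test = int(len(paths) * test_ratio)
    return paths[n_test:], paths[:n_test]   # (train, test)
```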

System Specification
We used the Windows 10 operating system and ASUS ROG 702 VM for conducting the experiments. All the system details are provided briefly in Table 3.

Results Discussion and Comparison
We compared the convergence performance of the proposed deep learning-based approach with and without transfer learning. Figure 10 shows the comparison of results on the CASIA_v1 and CASIA_v2 datasets with transfer learning. The blue color represents the accuracy per epoch for the CASIA_v1 dataset, whereas the orange color represents the accuracy for the CASIA_v2 dataset. A total of 25 epochs were executed to check the effectiveness of our proposed technique. After the execution of the first epoch, we saw a strong positive trend for both datasets. We achieved an accuracy of 99.3% with CASIA_v2, and 81% with the CASIA_v1 dataset. Since CASIA_v2 is considered the more comprehensive benchmark dataset, the experimental results obtained with it compare favorably against those from the CASIA_v1 dataset. The equation for calculating the accuracy is given in Equation (2).
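Equation (2) itself is outside this excerpt; assuming the standard definition of classification accuracy for the binary authentic/forged task, it can be computed as:

```python
def accuracy(y_true, y_pred):
    """Classification accuracy: (TP + TN) / (TP + TN + FP + FN),
    i.e., the fraction of images labeled correctly as
    authentic (0) or forged (1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```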
The comparison of the results of our proposed architecture without using transfer learning is shown in Figure 11. Training without transfer learning had adverse effects on our proposed system architecture, as the model learned from scratch with randomly initialized weights. Random weight initialization tends to increase the complexity, cost, and training time; the initial weights carry no meaning, so the model has to do a lot of work to update them. The use of transfer learning, on the other hand, enabled us to start from pre-trained weights that are meaningful. The model did not have to be trained from scratch, which decreased the training time, cost, and complexity while increasing the accuracy. In our proposed deep learning-based architecture, we used the pre-trained YOLO CNN weights. You Only Look Once (YOLO) is a real-time, state-of-the-art object detection system that is trained on thousands of different objects. Figures 10 and 11 present the performance evaluation of the proposed technique with and without transfer learning, respectively.
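The with/without transfer learning comparison can be sketched in Keras. Note that the paper derives its pre-trained weights from a YOLO CNN, whereas this sketch uses Keras's stock ImageNet weights for ResNet50V2 as an illustrative stand-in; the function name and input shape are also assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50V2

def build_detector(pretrained, input_shape=(224, 224, 3)):
    """ResNet50V2 backbone with a binary authentic/forged head.
    pretrained=True starts from existing weights (stock ImageNet
    weights here, standing in for the paper's YOLO-derived weights);
    pretrained=False trains from random initialization."""
    backbone = ResNet50V2(
        weights="imagenet" if pretrained else None,
        include_top=False,
        input_shape=input_shape,
        pooling="avg",
    )
    outputs = layers.Dense(1, activation="sigmoid")(backbone.output)
    model = models.Model(backbone.input, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

With `pretrained=True`, only the convergence speed and final accuracy change; the architecture is identical in both settings, which is what makes the comparison in Figures 10 and 11 meaningful.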
We achieved an accuracy of 80% without the use of transfer learning on the CASIA_v2 dataset, while we obtained an accuracy of 69.3% on the CASIA_v1 dataset.

Figure 11. Testing process visualization without transfer learning.

Figures 12 and 13 present the comparison of the losses with and without transfer learning on the two datasets. An increase in loss is seen when we did not use the pre-trained weights of the YOLO CNN model. The loss was slightly higher for the CASIA_v1 dataset compared to CASIA_v2.
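The loss function behind these curves is not stated in this excerpt; assuming the standard binary cross-entropy objective for a two-class detector, it can be computed as:

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy:
    -(1/N) * sum(y*log(p) + (1-y)*log(1-p)),
    where p is the predicted probability of the forged class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Confident, correct predictions drive this loss toward zero, while a randomly initialized model produces near-uniform probabilities and therefore a higher loss, consistent with the gap seen in Figures 12 and 13.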
Furthermore, we compared our proposed method with existing architectures in Table 5, which presents the type of tampering targeted along with the methodology used to detect the tampering. We also list the advantages and disadvantages of the respective architectures and their obtained accuracies.
Figure 13. Comparison between losses with CASIA_v1 dataset.

Cross-validation is a technique used to divide the data into a given number of sets and train the model on each set. Table 4 shows the evaluations and performance of five folds. The average accuracy obtained was 99.33% and the average loss was 0.076. According to the comparative analysis with the methodologies mentioned in Table 5, the proposed architecture performed better than the traditional tampering detection techniques.
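The five-fold procedure can be sketched in plain Python; `kfold_indices` is an illustrative helper, not code from the paper:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k-fold cross-validation;
    each fold serves exactly once as the held-out test set."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, folds[i]
```

Each of the five held-out folds is evaluated once, and the per-fold accuracies and losses are then averaged, as reported in Table 4.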

Future Work
Forgery detection is an ever-growing problem that needs constant improvement of the mechanisms used to detect tampered images. There are multiple techniques used for creating forged images, such as copy/paste, lighting manipulation, image splicing, and retouching. In this research, we focused on the detection of spliced images. In the future, our proposed technique can be extended to the detection of multiple types of forged images and can be tested on additional datasets as well. Furthermore, this work can be taken forward specifically to improve the effectiveness of detecting multiple types of forgery with a single model.

Conclusions
Image forgery detection is a very challenging problem. In this era of technological advancement, we need to be able to distinguish between real and tampered images. In this study, we proposed a deep learning-based approach for image forgery detection. The proposed model is based on the ResNet50v2 architecture, whose residual layers increase the detection rate of tampered images. This approach also provides the benefit of transfer learning through the pre-trained weights of the YOLO CNN model. The use of transfer learning enabled us to train our model more efficiently, as we initialized the proposed model with meaningful weights. This reduced the training time and complexity of the model and made the architecture more efficient. We evaluated our proposed architecture on the benchmark CASIA_v1 and CASIA_v2 datasets. We also compared the performance of our system with and without the use of transfer learning. We obtained an accuracy of 99.30% on the CASIA_v2 dataset for the forgery detection problem. The results of the comparison with existing methods show the superiority of the proposed system. The proposed system will help in the image manipulation detection domain and also paves the way for future research in detecting multiple types of image forgery manipulations.