Mathematics
  • Article
  • Open Access

3 November 2023

General Image Manipulation Detection Using Feature Engineering and a Deep Feed-Forward Neural Network

1 School of Computer Science Engineering, VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore 466114, Madhya Pradesh, India
2 Department of Industrial & System Engineering, Dongguk University, Seoul 04620, Republic of Korea
3 Department of Computer Science Engineering, National Institute of Technology Srinagar, Srinagar 190001, Jammu and Kashmir, India
4 Department of AI and Big Data, Woosong University, Seoul 34606, Republic of Korea

Abstract

Within digital forensics, considerable emphasis is placed on detecting the application of fundamental image-editing operators. These include, but are not limited to, median filters, average filters, contrast enhancement, resampling, and other closely related operations. When analyzing the processing history of an image that may have undergone several modifications, a logical first step is to search for alterations made by fundamental operators. This paper presents a deep-learning-based system designed to detect fundamental manipulation operations. A multilayer perceptron was trained on a 36-dimensional feature set derived from the gray-level co-occurrence matrix, the gray-level run-length matrix, and the normalized streak area. The system detects median filtering, mean filtering, the addition of white Gaussian noise, and JPEG compression in digital images. Our system achieved an accuracy of 99.46%, outperforming state-of-the-art deep-learning-based solutions, which achieved an accuracy of 97.89%.

1. Introduction

Creating multimedia content such as digital images and videos for various platforms is comparatively easy today. Multimedia security challenges have multiplied significantly for two reasons: computing hardware and software are widely available, and digital information is easy to create and alter. Determining the authenticity of digital content from an illegitimate or unknown source can be challenging, and such content must have its validity confirmed before it is consumed. Digital images are an important form of digital content [1]. A digital image is an electronic representation of a two-dimensional visual scene, object, or subject. It consists of individual picture elements, or “pixels”, each a tiny square or dot that holds gray-value or color information. The pixels are organized in a grid and can be displayed on a screen or printed on paper. Although digital images are a crucial type of digital content, confirming their legitimacy is quite challenging [2,3].
The authenticity of digital photos is increasingly questioned because of modern image processing tools and the ease with which information is shared and altered. As a result, there is a growing preference for blind image forensic techniques, which do not rely on information embedded in the image in advance.
Digital image forensics is a specialized field within digital forensics. It focuses on analyzing and authenticating digital images to determine their origin and integrity and to detect any alterations or forgeries. It involves using various techniques and tools to examine digital images for signs of manipulation, tampering, or other forms of digital deception. Digital image forensics experts employ methods such as metadata analysis, image compression analysis, noise pattern analysis, and error-level analysis to uncover inconsistencies and anomalies in images. This discipline is crucial in a world where digital images play a significant role in both legal and non-legal contexts. It ensures the credibility and trustworthiness of visual information in domains such as criminal investigations, journalism, and the verification of digital evidence. Forensic examiners benefit from being able to reconstruct how much, and in what way, a digital image has been processed.
Digital image forensics (DIF) aims to restore trust in digital images. DIF confirms an image’s legitimacy using the fingerprints left by image-editing operations. Unlike active techniques such as watermarking, it requires no prior knowledge or embedded information [4].
General-purpose image manipulation operations are operations that do not change the semantics or meaning of an image. Instead, they can be used to remove traces left by other operations, which makes certain operators difficult to detect. Median filtering, resampling, JPEG compression, and contrast enhancement are among the general-purpose image manipulations that must be detected as part of the digital image forensics process [5,6].
In digital image forensics, the inherent characteristic signatures left behind by image-editing methods are used to detect image changes. The same approach is used to detect general-purpose image manipulation operations. Detecting general-purpose image manipulation operators is a reasonable first step in investigating the processing history of an image that may have gone through several transformations. Image forgers use these fundamental operators, such as the intrinsically nonlinear median filter, to remove traces of evidence left behind by linear operations carried out on the images. An image’s processing history is also important in the fields of watermarking and steganography [4,7]. The research literature offers a variety of techniques for detecting fundamental operators applied to digital images. Most of these methods detect each basic operator individually. Comparatively little effort has been devoted to designing procedures that can detect numerous operators.
The main contributions of our work are summarized as follows:
  • We present an image classification method for image forensics that exploits existing domain knowledge through feature engineering.
  • We developed a 36-dimensional feature vector based on texture features for general-purpose image manipulation detection.
  • We designed a system in which a CNN-based solution is replaced with an MLP-based solution. The MLP-based solution was found to perform better than state-of-the-art methods.
  • Furthermore, we propose GIMP-FCN, a multilayer perceptron (MLP) consisting of fully connected layers followed by activation layers. It accepts texture-based features, learns further from them, and ultimately performs general-purpose image manipulation detection.
  • The performance of our approach is superior to that of the most recent cutting-edge methods.
  • Our work shows that a multilayer perceptron combined with feature engineering can be employed for digital image forensics.

3. The Proposed Method

In this section, we describe the proposed feature vector and the deep fully connected network developed for general-purpose image operation detection. The texture-based features are described in Section 3.1. The proposed deep fully connected feed-forward neural network, shown in Figure 1, is described in Section 3.2. Figure 2 shows how the system is trained. The GLCM-, GLRLM-, and streak-area-based feature extractors extract features from the training images, and these features are used to train the model depicted in Figure 1. At test time, the trained model receives features extracted from the testing images and identifies the image manipulation operator that was applied.
Figure 1. Neural network architecture: The input feature layer accepts the feature vector extracted from the dataset images. Two fully connected layers with a width of 100 and two with a width of 80 follow, each followed by an elu activation layer. Next come four fully connected layers with a width of 36, each followed by a tanh activation layer. These are followed by four fully connected layers with a width of 25, each followed by a tanh activation layer except the last one. The width of the final fully connected layer equals the number of classes, and a softmax layer completes the classification.
Figure 2. Training of the proposed deep fully connected neural network.

3.1. Proposed Features

We designed our feature set based on texture features. The Maximum Relevance Minimum Redundancy (MRMR) algorithm was applied to the texture features, and the 36 top-ranked features were selected. Of these, 22 features are extracted from the GLCM, as described in Section 3.1.1, and 11 from the GLRLM, as described in Section 3.1.2. The remaining three are novel features based on streak-area analysis, inspired by [34] and described in detail in Section 3.1.3. The final feature vector was generated by concatenating the three sets of features extracted from the GLCM, GLRLM, and normalized streak area of the images:
$fv_{final} = [f_{GLCM}, f_{GLRLM}, f_{nsa}]$

3.1.1. GLCM-Based Features

The gray-level co-occurrence matrix (GLCM) was first suggested by Haralick [35] in 1979 for interpreting satellite images. It is one of the most researched and widely used generic methodologies for texture analysis, and it has recently attracted the attention of several research groups. The GLCM captures second-order statistics: it records how often pairs of pixels in a given spatial relationship take particular combinations of gray levels. The technique has certain advantages but also disadvantages, such as the high dimensionality of the matrix. For this reason, a set of scalar features is usually computed from the GLCM for use in image processing applications. For our study, 14 GLCM-based features were taken from the original work of Haralick et al. [35], 4 from Soh et al. [36], and 4 from Clausi et al. [37]. Let $p(i,j)$ be the $(i,j)$th entry of a normalized GLCM with $N_g$ gray levels. Let $p_x(i) = \sum_j p(i,j)$ and $p_y(j) = \sum_i p(i,j)$ be its marginals. Let $p_{x+y}(k) = \sum_{i+j=k} p(i,j)$ for $k = 2, \dots, 2N_g$, and $p_{x-y}(k) = \sum_{|i-j|=k} p(i,j)$ for $k = 0, \dots, N_g - 1$. The 22 features, F1–F22, calculated from the GLCM are as follows:
  • Energy (angular second moment):
    $F_1 = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)^2$
  • Contrast:
    $F_2 = \sum_{n=0}^{N_g-1} n^2 \left\{ \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j) \;\middle|\; |i-j| = n \right\}$
  • Correlation:
    $F_3 = \dfrac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (ij)\,p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}$
    where $\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$ are the means and standard deviations of $p_x$ and $p_y$, respectively.
  • Sum of squares (variance):
    $F_4 = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-\mu)^2\,p(i,j)$
  • Inverse difference moment:
    $F_5 = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} \dfrac{1}{1+(i-j)^2}\,p(i,j)$
  • Sum average:
    $F_6 = \sum_{i=2}^{2N_g} i\,p_{x+y}(i)$
  • Sum variance:
    $F_7 = \sum_{i=2}^{2N_g} (i - F_8)^2\,p_{x+y}(i)$
  • Sum entropy:
    $F_8 = -\sum_{i=2}^{2N_g} p_{x+y}(i)\log\big(p_{x+y}(i)\big)$
  • Entropy:
    $F_9 = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)\log\big(p(i,j)\big)$
  • Difference variance:
    $F_{10}$ = variance of $p_{x-y}$
  • Difference entropy:
    $F_{11} = -\sum_{i=0}^{N_g-1} p_{x-y}(i)\log\big(p_{x-y}(i)\big)$
  • Information measure of correlation 1:
    $F_{12} = \dfrac{HXY - HXY1}{\max\{HX, HY\}}$
  • Information measure of correlation 2:
    $F_{13} = \big(1 - \exp[-2.0\,(HXY2 - HXY)]\big)^{1/2}$
    where $HX$ and $HY$ are the entropies of $p_x$ and $p_y$, respectively, $HXY = F_9$, and
    $HXY1 = -\sum_i\sum_j p(i,j)\log\big(p_x(i)\,p_y(j)\big)$
    $HXY2 = -\sum_i\sum_j p_x(i)\,p_y(j)\log\big(p_x(i)\,p_y(j)\big)$
  • Maximal correlation coefficient (MCC):
    $F_{14} = (\text{second-largest eigenvalue of } Q)^{1/2}$
    where
    $Q(i,j) = \sum_k \dfrac{p(i,k)\,p(j,k)}{p_x(i)\,p_y(k)}$
  • Homogeneity/inverse difference moment:
    $F_{15} = \sum_i\sum_j \dfrac{1}{1+(i-j)^2}\,p(i,j)$
  • Autocorrelation:
    $F_{16} = \sum_i\sum_j (ij)\,p(i,j)$
  • Dissimilarity:
    $F_{17} = \sum_i\sum_j |i-j|\,p(i,j)$
  • Cluster shade:
    $F_{18} = \sum_i\sum_j (i + j - \mu_x - \mu_y)^3\,p(i,j)$
  • Cluster prominence:
    $F_{19} = \sum_i\sum_j (i + j - \mu_x - \mu_y)^4\,p(i,j)$
  • Maximum probability:
    $F_{20} = \max_{i,j}\, p(i,j)$
  • Inverse difference normalized (IDN):
    $F_{21} = \sum_i\sum_j \dfrac{C(i,j)}{1+|i-j|}$
  • Inverse difference moment normalized (IDMN):
    $F_{22} = \sum_i\sum_j \dfrac{C(i,j)}{1+|i-j|^2}$
where $C(i,j) = \dfrac{P(i,j)}{\sum_{i,j=1}^{N_g} P(i,j)}$.
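The GLCM definitions above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation. The image is assumed to be already quantized to a small number of gray levels. A single non-symmetric offset is used (one pixel to the right by default), the natural logarithm is assumed, and only six of the 22 features (F1, F2, F3, F9, F17, F20) are computed. The helper names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized, non-symmetric GLCM p(i, j) for one non-negative offset (dy, dx).

    `img` must already be quantized to integers in [0, levels).
    """
    img = np.asarray(img, dtype=np.int64)
    rows, cols = img.shape
    ref = img[0:rows - dy, 0:cols - dx]   # reference pixels
    nbr = img[dy:rows, dx:cols]           # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)  # count each (ref, nbr) pair
    return counts / counts.sum()

def glcm_features(p):
    """A subset of the GLCM features F1-F22, using 1-based gray levels i, j."""
    n = p.shape[0]
    g = np.arange(1, n + 1)
    i, j = np.meshgrid(g, g, indexing="ij")
    px, py = p.sum(axis=1), p.sum(axis=0)          # marginals p_x, p_y
    mu_x, mu_y = (g * px).sum(), (g * py).sum()
    sd_x = np.sqrt((((g - mu_x) ** 2) * px).sum())
    sd_y = np.sqrt((((g - mu_y) ** 2) * py).sum())
    nz = p > 0                                     # 0 * log(0) is taken as 0
    return {
        "F1_energy": (p ** 2).sum(),
        "F2_contrast": (((i - j) ** 2) * p).sum(),  # equals sum_n n^2 * P(|i-j| = n)
        "F3_correlation": ((i * j * p).sum() - mu_x * mu_y) / (sd_x * sd_y),
        "F9_entropy": -(p[nz] * np.log(p[nz])).sum(),
        "F17_dissimilarity": (np.abs(i - j) * p).sum(),
        "F20_max_probability": p.max(),
    }
```

For an 8-bit patch, `glcm(patch8 // 32, levels=8)` gives an 8-level matrix. Features from several offsets could be averaged or concatenated, but which offsets the study used is not stated in this section.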

3.1.2. GLRLM-Based Features

Galloway [38] first presented the gray-level run-length matrix (GLRLM), a statistical approach to texture analysis. A large number of GLRLM-based features have been proposed in the research literature, all based on the properties of the gray-level runs present in the image. The GLRLM concept rests on the observation that coarse textures contain long runs of adjacent pixels with the same gray level, whereas fine textures contain only short runs. The GLRLM technique has been used in a variety of applications, as described in [39].
The 11-feature set, denoted by $f_{GLRLM}$, was selected for image forensics purposes. The features were extracted as described in [40] and are defined by the equations below.
In a run-length matrix $p$, the entry $p(i,j)$ is the number of runs of pixels with gray level $i$ and run length $j$. A run-length matrix of dimensions $M \times N$, where $M$ is the number of gray levels and $N$ is the maximum run length, can then be used to extract a variety of texture attributes. The first five features, f1–f5, are the run-length statistics derived by Galloway [41]. Features f6 and f7 were proposed by Chu et al. [42] to extract more gray-level information from the matrix. Based on the concept of a joint statistical measure of gray level and run length, Dasarathy and Holder [43] presented four further feature extraction functions, f8 through f11.
The final 11-dimensional feature vector used in our study is described mathematically in [40] as follows:
  • Short-run emphasis (SRE):
    $f_1 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} \dfrac{p(i,j)}{j^2}$
  • Long-run emphasis (LRE):
    $f_2 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} p(i,j)\cdot j^2$
  • Gray-level non-uniformity (GLN):
    $f_3 = \dfrac{1}{n_r}\sum_{i=1}^{M}\Big(\sum_{j=1}^{N} p(i,j)\Big)^2$
  • Run-length non-uniformity (RLN):
    $f_4 = \dfrac{1}{n_r}\sum_{j=1}^{N}\Big(\sum_{i=1}^{M} p(i,j)\Big)^2$
  • Run percentage (RP):
    $f_5 = \dfrac{n_r}{n_p}$
  • Low gray-level run emphasis (LGRE):
    $f_6 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} \dfrac{p(i,j)}{i^2}$
  • High gray-level run emphasis (HGRE):
    $f_7 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} p(i,j)\cdot i^2$
  • Short-run, low-gray-level run emphasis (SRLGE):
    $f_8 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} \dfrac{p(i,j)}{i^2 \cdot j^2}$
  • Short-run, high-gray-level run emphasis (SRHGE):
    $f_9 = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} \dfrac{p(i,j)\cdot i^2}{j^2}$
  • Long-run, low-gray-level run emphasis (LRLGE):
    $f_{10} = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} \dfrac{p(i,j)\cdot j^2}{i^2}$
  • Long-run, high-gray-level run emphasis (LRHGE):
    $f_{11} = \dfrac{1}{n_r}\sum_{i=1}^{M}\sum_{j=1}^{N} p(i,j)\cdot j^2 \cdot i^2$
where $n_r$ is the total number of runs, and $n_p$ is the number of pixels in the image.
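The run-length statistics above can be computed from a horizontal GLRLM as in the following sketch. It assumes integer gray levels in `[0, levels)` and considers horizontal runs only. Gray level g is stored at row g of `p` but enters the formulas as i = g + 1, an assumption made so that LGRE stays finite for gray level 0. Only f1–f7 are shown, and the function names are illustrative rather than taken from the study.

```python
import numpy as np

def glrlm_horizontal(img, levels):
    """Run-length matrix for horizontal runs.

    p[g, r - 1] counts runs of gray level g (0-based) with run length r.
    """
    img = np.asarray(img, dtype=np.int64)
    p = np.zeros((levels, img.shape[1]), dtype=np.float64)
    for row in img:
        value, length = row[0], 1
        for v in row[1:]:
            if v == value:
                length += 1            # run continues
            else:
                p[value, length - 1] += 1
                value, length = v, 1   # start a new run
        p[value, length - 1] += 1      # close the last run of the row
    return p

def glrlm_features(p, n_pixels):
    """GLRLM features f1-f7; gray level g enters the formulas as i = g + 1."""
    m, n = p.shape
    i = np.arange(1, m + 1)[:, None]   # gray-level index (1-based)
    j = np.arange(1, n + 1)[None, :]   # run length
    n_r = p.sum()                      # total number of runs
    return {
        "f1_SRE": (p / j ** 2).sum() / n_r,
        "f2_LRE": (p * j ** 2).sum() / n_r,
        "f3_GLN": (p.sum(axis=1) ** 2).sum() / n_r,
        "f4_RLN": (p.sum(axis=0) ** 2).sum() / n_r,
        "f5_RP": n_r / n_pixels,
        "f6_LGRE": (p / i ** 2).sum() / n_r,
        "f7_HGRE": (p * i ** 2).sum() / n_r,
    }
```

The remaining features f8–f11 follow the same pattern, combining the `i` and `j` weights as in the equations.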

3.1.3. Normalized Streak-Area-Based Feature

A streak is a series of consecutive pixels with the same or almost the same intensity value. If the image contains a run of exactly n consecutive pixels that all have the same intensity value, then it contains a streak of length n. The amount of streaking in an image after median filtering differs noticeably from the amount present before filtering. The streaking effect was quantified in [34] and further improved in [44] to differentiate between median-filtered and unfiltered images. The last three features investigated in this work are inspired by [44].
Let $I$ be a gray-scale digital image with dimensions of $M \times N$. The total number of pixels in the image is $A = M \times N$.
Let $\xi_h(j)$ represent the number of horizontal streaks of length $j$ pixels present in image $I$ when measured from left to right. Let $\zeta_h(I)$ be the total number of pixels involved in the row-wise streaks of $I$:
$\zeta_h(I) = \sum_{j=2}^{N} j\,\xi_h(j)$
The normalized streak area of image $I$ measured from left to right is then
$\eta_h(I) = \dfrac{\zeta_h(I)}{A}$
In a similar manner, the normalized column-wise streak area, computed from the number of pixels in column-wise streaks $\zeta_v(I)$, is
$\eta_v(I) = \dfrac{\zeta_v(I)}{A}$
Similarly, the normalized diagonal streak area, computed from the number of pixels in diagonal streaks $\zeta_d(I)$, is
$\eta_d(I) = \dfrac{\zeta_d(I)}{A}$
Finally, a three-dimensional feature vector is extracted by applying Equations (35)–(37) as follows:
$f_{nsa} = [\eta_h(I), \eta_v(I), \eta_d(I)]$
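The three normalized streak areas can be sketched with NumPy as shown below. The sketch counts only runs of exactly equal values, so the "almost the same intensity" tolerance mentioned above is not modeled. The diagonal direction is assumed to be top-left to bottom-right. It relies on a simple observation: a pixel belongs to a streak of length at least 2 along a direction exactly when it equals one of its two neighbours along that direction. Counting those pixels therefore gives $\zeta(I) = \sum_{j \ge 2} j\,\xi(j)$ without enumerating runs. The function names are hypothetical.

```python
import numpy as np

def _streak_mask(img, dr, dc):
    """Mark pixels that equal their neighbour at offset (dr, dc) or (-dr, -dc).

    A pixel lies in a streak of length >= 2 along a direction exactly when it
    equals at least one of its two neighbours along that direction.
    """
    rows, cols = img.shape
    same = img[dr:, dc:] == img[:rows - dr, :cols - dc]
    mask = np.zeros(img.shape, dtype=bool)
    mask[dr:, dc:] |= same                 # pixel equals its predecessor
    mask[:rows - dr, :cols - dc] |= same   # pixel equals its successor
    return mask

def normalized_streak_areas(img):
    """Return [eta_h, eta_v, eta_d]: streak pixels / total pixels per direction."""
    img = np.asarray(img)
    area = img.size                          # A = M x N
    zeta_h = _streak_mask(img, 0, 1).sum()   # row-wise, left to right
    zeta_v = _streak_mask(img, 1, 0).sum()   # column-wise
    zeta_d = _streak_mask(img, 1, 1).sum()   # main-diagonal direction
    return np.array([zeta_h, zeta_v, zeta_d], dtype=np.float64) / area
```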

3.2. Neural Network Architecture

We developed a deep feed-forward network using fully connected layers with activation functions at appropriate places. The network ends in a fully connected layer with an output size of two for binary classification or five for multiclass classification, followed by a softmax layer for the classification task. The input layer takes the input feature vector; in our case, it was configured to accept 36 features. The input is preprocessed by applying z-score normalization, which has been shown to improve the performance of machine learning methods [45]. Z-score normalization adjusts the values of each feature so that their mean is zero and their standard deviation is one. For a feature with mean $\mu$ and standard deviation $\sigma$, the normalized output feature ($zfv$) corresponding to the input feature ($fv$) is expressed as:
$zfv = \dfrac{fv - \mu}{\sigma}$
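A minimal sketch of this normalization, applied column by column to a feature matrix, is shown below. Two points are assumptions not stated in the text: the statistics are computed on the training features only and reused for validation and test data, and the population standard deviation is used. Constant features are left unscaled to avoid division by zero. The function names are hypothetical.

```python
import numpy as np

def fit_zscore(train_features):
    """Per-feature mean and (population) standard deviation from training data."""
    x = np.asarray(train_features, dtype=np.float64)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma[sigma == 0] = 1.0   # leave constant features centred but unscaled
    return mu, sigma

def apply_zscore(features, mu, sigma):
    """zfv = (fv - mu) / sigma, applied column by column."""
    return (np.asarray(features, dtype=np.float64) - mu) / sigma
```

Typical use: `mu, sigma = fit_zscore(X_train)`, then `apply_zscore(X_test, mu, sigma)`. This keeps information from the test set out of the normalization.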
The first fully connected layer of the neural network is connected to the network input, and each subsequent layer is fully connected to the layer before it. Each fully connected layer multiplies its input by a weight matrix and then adds a bias vector. An activation layer follows each hidden fully connected layer except the one immediately preceding the final (output) fully connected layer, which has no activation. The softmax function then converts the output of the final layer into classification scores.
We designed our neural network by searching over several parameters. The number of fully connected layers ranged from 1 to 100. The activation function was chosen from ‘relulayer’, ‘tanhlayer’, ‘sigmoidlayer’, ‘swishlayer’, ‘elulayer’, ‘gelulayer’, and ‘none’; a detailed survey of activation functions can be found in [46]. The width of each fully connected layer ranged from 10 to 300.
Three weight initialization schemes were considered [47,48,49], and the initial layer biases were chosen from ‘zero’ and ‘one’. The maximum number of training iterations was set to 8000, and the loss tolerance to $10^{-8}$. The learning rate was selected from $\{0.1, 0.01, 0.001, 0.0001, 0.00001\}$. Every network was trained for 80 epochs. The optimization algorithm is responsible for minimizing the training loss; we used the adaptive moment estimation (Adam) solver. The parameters searched when designing the MLP are summarized in Table 3, and the optimized neural network is shown in Figure 1.
Table 3. Search space for the design of an MLP.
The final optimized neural network contained 12 fully connected layers of different widths. The ‘elu’ activation function was the best choice for the first four layers. The ‘tanh’ activation function was used for the next seven fully connected layers, and the twelfth layer had no activation. The fully connected layers were initialized with orthogonal weights [49], the initial layer biases were set to zero, and the optimal learning rate was found to be 0.0001. The processed features can then be used for general-purpose image manipulation detection by applying an appropriate classification layer, as shown in Figure 1.
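To make the layer layout concrete, the following NumPy sketch implements only the forward pass of the network as read from the Figure 1 caption. It uses hidden widths 100, 100, 80, 80, four layers of 36, and four layers of 25. The activations are elu on the first four layers, tanh on the next seven, none on the twelfth, and then an output layer and softmax. The sketch uses orthogonal weights and zero biases as described above, and it assumes ELU with α = 1. It is not the authors' MATLAB Deep Learning Toolbox model, and training with Adam is not shown. All names are hypothetical.

```python
import numpy as np

# Hidden layer widths and activations as read from the Figure 1 caption.
HIDDEN_WIDTHS = [100, 100, 80, 80, 36, 36, 36, 36, 25, 25, 25, 25]
ACTIVATIONS = ["elu"] * 4 + ["tanh"] * 7 + [None]  # 12th hidden layer: no activation

def elu(x, alpha=1.0):
    # np.minimum avoids overflow in exp for large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def orthogonal(rng, fan_in, fan_out):
    """Weight matrix of shape (fan_in, fan_out) with orthonormal rows or columns."""
    a = rng.standard_normal((max(fan_in, fan_out), min(fan_in, fan_out)))
    q, r = np.linalg.qr(a)
    q = q * np.sign(np.diag(r))   # fix the sign ambiguity of the QR factorization
    return q if fan_in >= fan_out else q.T

def init_mlp(n_features=36, n_classes=5, seed=0):
    rng = np.random.default_rng(seed)
    sizes = [n_features] + HIDDEN_WIDTHS + [n_classes]
    # one (weights, zero biases) pair per fully connected layer
    return [(orthogonal(rng, a, b), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, x):
    """x: (batch, n_features) z-scored features -> (batch, n_classes) class scores."""
    h = x
    for (w, b), act in zip(params[:-1], ACTIVATIONS):
        h = h @ w + b
        if act == "elu":
            h = elu(h)
        elif act == "tanh":
            h = np.tanh(h)
    w, b = params[-1]                      # output layer: width = number of classes
    return softmax(h @ w + b)
```

Switching between the binary and multiclass settings only changes `n_classes` (2 or 5); the hidden stack stays the same.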

4. Experimental Setup

To test the performance of the proposed method in identifying various image processing operations, we performed an extensive set of experiments. The standard image datasets UCID [50], BOSSbase [51], RAISE [52], and the Dresden Image Dataset (DID) [53] were used to generate the training and testing sets for the experiments.
$DS_{orig} = \{UCID, RAISE, BOSSbase, DID\}$
UCID [50] (Uncompressed Colour Image Database) is a frequently used dataset containing 1338 color images with resolutions of 512 × 384 and 384 × 512. Its main feature is that the images are uncompressed. UCID can therefore serve as a base for creating training and testing datasets for benchmarking detectors on uncompressed images, from which additional processed datasets can be generated. The UCID dataset, which consists of images in the TIFF format, was initially created for content-based image retrieval (CBIR). It is now used by a wide range of image-based algorithms and is one of the primary datasets on which researchers test operator detectors.
Released in May 2011, the BOSSbase 1.1 [51] dataset (Break Our Steganographic System) consists of 10,000 uncompressed 512 × 512 images from the BOSS competition, taken by seven different cameras. The images were produced from full-resolution color RAW images. The BOSSbase dataset has been updated over time and is widely used in the evaluation of CNN-based techniques.
The Dresden Image Dataset was initially created for camera-based digital forensic methods. It contains over 14,000 photos taken with roughly 73 different cameras and covers many different scenes.
In total, 1338 images with dimensions of 512 × 384 from UCID, 10,000 images with dimensions of 512 × 512 from BOSSbase, 1448 images of varying dimensions from DID, and 4000 images from the RAISE dataset were collected, giving a total of 16,786 images. From these, 16,000 images of varying sizes were selected to construct $DS_{orig}$. This original image set was then used as the base for generating the training and testing datasets. All images in $DS_{orig}$ were cropped into patches with dimensions of 256 × 256 to create $DS_{orig}^{256}$. Large images, such as those in DID, were cropped around the center into multiple non-overlapping 256 × 256 patches. Small images, such as those in UCID and BOSSbase, contributed one or two patches each.
All color images were converted to gray scale following Rec. ITU-R BT.601-7 [54] by computing a weighted sum of the red (R), green (G), and blue (B) components:
$gray\ value = 0.2989\,R + 0.5870\,G + 0.1140\,B$
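The conversion can be expressed as a single matrix product over the color axis, as in the following NumPy sketch. The weights are those in the equation above. The function name is hypothetical, and the result is left as floating-point values.

```python
import numpy as np

# Weights as given in the equation above.
BT601_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def to_gray(rgb):
    """Weighted sum of R, G, B along the last axis; returns float gray values."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ BT601_WEIGHTS
```

If 8-bit output is needed, the result can be rounded and clipped, for example with `np.clip(np.rint(g), 0, 255).astype(np.uint8)`. Whether the study rounded at this stage is not stated.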

Dataset Generation

$DS_{orig}^{256}$ was used to generate the datasets for this study. To construct the median-filtered datasets ($DS_{mf}^{w}$), the $DS_{orig}^{256}$ images were filtered with window sizes $w \in \{3, 5, 7\}$, producing three datasets ($DS_{mf}^{3}$, $DS_{mf}^{5}$, $DS_{mf}^{7}$). Similarly, the additive white Gaussian noise (AWGN) datasets, denoted by $DS_{AWGN}^{\sigma}$, were created with $\sigma \in \{0.1, 0.6, 1.2, 1.8\}$. The JPEG-compressed datasets ($DS_{JPEG}^{QF}$) were created by compressing the $DS_{orig}^{256}$ images with quality factors $QF \in \{30, 50, 70, 90\}$. The mean-filtered datasets ($DS_{MeanF}^{w}$) were created by mean filtering each image in $DS_{orig}^{256}$ with filter windows of $w \in \{3 \times 3, 5 \times 5, 7 \times 7\}$. Figure 3 shows sample images from the generated datasets. Figure 3a shows an image from the original gray-scale dataset, and Figure 3b shows the same image median-filtered with a 3 × 3 window. Figure 3c shows the image after JPEG compression, Figure 3d with AWGN added, and Figure 3e after mean filtering. Table 4 summarizes the parameters used for dataset generation.
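The dataset-generation step can be sketched as follows, using NumPy and the Pillow imaging library. This is an illustration under several assumptions. Filtering uses symmetric border padding so that outputs keep the patch size. The noise level σ is interpreted in gray levels on a 0–255 scale. Noisy values are clipped to [0, 255], and JPEG compression is applied through an 8-bit grayscale encode/decode round trip. The study's own processing was done in MATLAB, so border handling and rounding may differ. All function names are hypothetical.

```python
import io

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from PIL import Image

def _windows(img, w):
    """All w x w neighbourhoods, with symmetric padding so the output keeps img's shape."""
    padded = np.pad(img, w // 2, mode="symmetric")
    return sliding_window_view(padded, (w, w))

def median_filter(img, w):
    return np.median(_windows(img, w), axis=(-2, -1))

def mean_filter(img, w):
    return _windows(img, w).mean(axis=(-2, -1))

def add_awgn(img, sigma, rng):
    # sigma is taken in gray levels on a 0-255 scale; clipping is an assumption
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 255.0)

def jpeg_compress(img, quality):
    """Encode as 8-bit grayscale JPEG at the given quality factor, then decode."""
    buf = io.BytesIO()
    img8 = np.clip(np.rint(img), 0, 255).astype(np.uint8)
    Image.fromarray(img8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf)).astype(np.float64)

def generate_variants(patch, rng):
    """Original patch plus every operator/parameter combination from the text."""
    patch = np.asarray(patch, dtype=np.float64)
    out = {"orig": patch}
    for w in (3, 5, 7):
        out[f"mf{w}"] = median_filter(patch, w)
        out[f"meanf{w}"] = mean_filter(patch, w)
    for s in (0.1, 0.6, 1.2, 1.8):
        out[f"awgn{s}"] = add_awgn(patch, s, rng)
    for qf in (30, 50, 70, 90):
        out[f"jpeg{qf}"] = jpeg_compress(patch, qf)
    return out
```

Calling `generate_variants(patch, np.random.default_rng(0))` on a 256 × 256 patch returns 15 arrays of the same shape: the original plus 14 manipulated versions.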
Figure 3. Sample images from the dataset used in this study: (a) original image; (b) median-filtered image; (c) JPEG-compressed image; (d) AWGN-added image; (e) mean-filtered image.
Table 4. Dataset generation parameters for experimentation.
Finally, results are reported in terms of the metrics defined in Equations (42)–(49):
Accuracy is defined as
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$.
Recall is defined as
$Recall = \dfrac{TP}{TP + FN}$.
Specificity is defined as
$Specificity = \dfrac{TN}{FP + TN}$.
Precision is defined as
$Precision = \dfrac{TP}{TP + FP}$.
The false-positive rate (FPR) is defined as
$FPR = \dfrac{FP}{FP + TN}$.
The F1 score is defined as
$F1\ score = \dfrac{2\,TP}{2\,TP + FP + FN}$.
The error (misclassification rate) is defined as
$Error = \dfrac{FP + FN}{FP + FN + TP + TN}$.
The Matthews correlation coefficient (MCC) is defined as
$MCC = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$.
The MCC takes values between −1 and 1. A value of −1 indicates total disagreement between the predicted and actual classes. A value of 0 indicates prediction no better than random guessing, and a value of 1 indicates that the predicted classes exactly match the actual classes. The MCC is a more reliable statistical rate because it yields a high score only if the prediction obtains good results in all four confusion matrix categories [55].
In the equations above, TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
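The following sketch computes all eight metrics directly from the confusion-matrix counts, mirroring the equations above. The function name is hypothetical. Returning 0 for the MCC when its denominator vanishes is a common convention assumed here, not something the text specifies.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Evaluation metrics from the confusion-matrix counts defined above."""
    total = tp + tn + fp + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),
        "fpr": fp / (fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "error": (fp + fn) / total,
        # MCC is undefined when a marginal is empty; 0 is a common convention
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den > 0 else 0.0,
    }
```

For example, with TP = 90, TN = 80, FP = 20, and FN = 10, accuracy is 0.85 and error is 0.15. The FPR (0.2) is 1 minus the specificity (0.8), as the definitions imply.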
We compared our work with two significant state-of-the-art works: those reported by Bayar [14] and Rana [17]. Bayar's network was implemented and trained as described by the authors. Rana's [17] method was also implemented and simulated. For a fairer comparison, we used the same dataset described in Equation (40).
We implemented the experiments using MATLAB 2021 [56] on a system with an Intel Core i7 processor, an NVIDIA GeForce GTX 1080 graphics processing unit (GPU) with 8 GB of dedicated memory, and 16 GB of RAM. The Deep Learning Toolbox [57] was employed to design the networks, and the Experiment Manager app [58] was used to manage the experiments and to test the models thoroughly.

5. Results and Discussion

The state-of-the-art methods are based on deep learning and are dominated by one particular type of deep learning model, the convolutional neural network (CNN). Table 1 and Table 2 summarize such methods. A drawback of deep-learning-based solutions is that they require large amounts of data to outperform other approaches. Designing them also requires choosing the topology, training method, and other characteristics, and there is no universally accepted theory to guide these choices. It is therefore challenging for those with limited experience in deep neural network design to adapt and design neural network models. In addition, because of the complexity of the models, training can be extremely expensive. This cost includes both the time and effort needed to research and select suitable model parameters and the amount of computation required. Compared with deep learning methods, techniques such as feature engineering require significantly less effort from the user and give domain specialists a significant advantage in finding answers faster. It is also much simpler to analyze and understand the relationship between the inputs and outputs of such a system than that of a system based on deep learning. Our proposed method combines the strengths of deep learning and feature engineering. We applied domain knowledge from image forensics to engineer features. We then developed a deep neural network that automatically learns classification information from these features to detect general image manipulation operations.
Figure 4 shows the strategy for testing the proposed model for single- and multiple-manipulation detection.
Figure 4. Implementation of the operator manipulation identifier using the proposed deep fully connected neural network.

5.1. Single-Manipulation Detection

Detecting the use of a single operator is very important. Single-manipulation detection involves the binary classification of original images versus images tampered with by applying one of the operators investigated in this study. We trained our model for this binary classification. To detect single-operator modification, we created datasets from the original image patches and the operator datasets. All binary classifications were performed with original images vs. operator datasets, i.e., {$DS_{mf}^{3}$, $DS_{mf}^{5}$, $MeanF^{3}$, $MeanF^{5}$, $AWGN^{1.2}$, $AWGN^{1.8}$, $RS^{1.4}$, $RS^{1.4}$, $JPEG^{70}$, $JPEG^{80}$, and $JPEG^{90}$}. Next, the image features described in Section 3.1 were extracted from the corresponding image datasets. Finally, training, validation, and testing sets were generated for each original vs. operation binary classification. Of the data, 70% was used for training, 10% for validation, and the remaining 20% for testing. The deep neural network described in Section 3.2 and Figure 1 was configured for binary classification by setting the width of the last fully connected layer to two, followed by a ‘softmax’ layer. The input layer was a feature input layer that accepts a feature vector of length 36. The hidden network contained 12 fully connected layers. The first four were each followed by an ‘elu’ activation layer and the next seven by a ‘tanh’ activation layer; the 12th was not followed by any activation layer. Each network was trained for 80 epochs. The initial layer weights were adopted from Xavier [47], the initial layer biases were set to ‘zero’, and the optimal learning rate was found to be 0.0001.
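The following sketch shows one way to assemble a binary original-vs-operator dataset and apply the 70/10/20 split. Label 0 is used for original patches and 1 for manipulated ones. The split is a plain random permutation. Stratification and the random seed are assumptions, since the text does not specify either. The function names are hypothetical.

```python
import numpy as np

def binary_dataset(orig_features, op_features):
    """Stack features; label 0 = original patch, 1 = manipulated patch."""
    x = np.vstack([orig_features, op_features])
    y = np.concatenate([np.zeros(len(orig_features), dtype=int),
                        np.ones(len(op_features), dtype=int)])
    return x, y

def split_70_10_20(x, y, seed=0):
    """Random (non-stratified) 70/10/20 split into train, validation, test."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(0.7 * len(y)))
    n_val = int(round(0.1 * len(y)))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])
```

With equal numbers of original and manipulated patches, a random split keeps the classes roughly balanced. A per-class split would keep them exactly balanced.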
Figure 1 shows the proposed neural network architecture. The Adam optimization solver, a popular extension of stochastic gradient descent, was used to train our deep neural network model. Table 5 shows the results obtained using the proposed method; its first column gives the accuracy obtained for the binary classification of original vs. median-filtered images with a 3 × 3 filter. For comparison, the transfer learning models provided by Bayar were used. The results are presented as confusion matrices, together with key assessment criteria for machine learning algorithms. As Table 5 shows, our combined approach of feature engineering and deep neural network design outperformed the approach proposed by Rana [17] for single-operation detection.
Table 5. Testing accuracy for single-operator detection.
Table 6. Image datasets used in the study.

5.2. Multiple-Manipulation Detection

Identifying the precise operation that was used to alter an image is a key challenge in image forensics, because several operations leave behind forensic traces that are quite similar to one another and are therefore hard to tell apart. Multiple-manipulation detection requires a trained model that can identify each of several operators. To accomplish this, we used multiclass classification so that our model could distinguish between original images, median-filtered images, mean-filtered images, JPEG-compressed images, and images with AWGN added.
To verify the efficacy of our technique for general-purpose image operation detection, we compiled a large dataset of 16,000 original images of varying sizes, denoted by $DS_{orig}$. From smaller images, such as those in UCID and BOSSbase, only one or two patches were cropped from the central region of the image. From large images, such as those in Dresden and RAISE, 8–10 patches of 256 × 256 pixels were cropped relative to the image’s center; the number of patches cropped from a single image depended on its size. The resulting patch dataset was named $DS_{orig}^{256}$. The entire $DS_{orig}^{256}$ dataset, which comprises 30,000 randomly selected image patches of 256 × 256 pixels, was then processed with the five distinct manipulations detailed in Table 4 and Section 4, producing a total of 150,000 image patches. For training, 105,000 patches were randomly drawn from the operation datasets, and a further 15,000 patches were kept for validation of the model; the same number of patches was selected from each operation dataset. Similarly, the test set comprised 30,000 patches, 6000 taken from each operation dataset. Finally, the training set for multiclass classification was fed to the deep feed-forward neural network, customized for multiclass classification by changing the width of the final fully connected layer to 5. Our single- and multiple-operation detection results were compared with the technique of Bayar and Stamm [14] and that proposed by Rana et al. [17]. For texture-based feature extraction with the proposed method, we first extracted features from the dataset images; the extracted features were then fed to the proposed fully connected network for further processing.
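The class-balanced 70/10/20 split described above can be sketched as follows. The patch identifiers are hypothetical placeholders for the 30,000 patches in each of the five classes, and shuffling each class separately is an assumption; the text only states that patches were drawn randomly and in equal numbers from each operation dataset.

```python
import random


def balanced_split(items_by_class, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle each class separately and cut it into train/validation/test parts."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(fractions[0] * len(items))
        n_val = round(fractions[1] * len(items))
        train += [(item, label) for item in items[:n_train]]
        val += [(item, label) for item in items[n_train:n_train + n_val]]
        test += [(item, label) for item in items[n_train + n_val:]]  # remaining ~20%
    rng.shuffle(train)  # mix classes before batching
    return train, val, test


# Hypothetical patch IDs: 30,000 per class, using the five classes named in this section.
classes = ["original", "median", "mean", "jpeg", "awgn"]
patches = {c: [f"{c}_{i:05d}" for i in range(30000)] for c in classes}
train, val, test = balanced_split(patches)
print(len(train), len(val), len(test))  # 105000 15000 30000
```

Splitting per class rather than over the pooled data guarantees the 21,000/3,000/6,000 patches per class that make the totals above add up.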
The results of multiclass classification are presented as confusion matrices in Tables 7 and 8. These matrices show that the proposed method outperforms both the Bayar method and the Rana method for every class. As Table 9 shows, our method surpassed the state-of-the-art benchmark approaches of Bayar [14] and Rana [17] in differentiating between original, median-filtered, and mean-filtered images, as well as between images with added AWGN and JPEG-compressed images. This advantage also holds for the MCC and kappa statistics, which are more reliable measures than simple accuracy.
Table 7. Confusion matrix for the proposed method and Bayar's method for operator detection.
Table 8. Confusion matrix for the proposed method and Rana's method for operator detection.
Table 9. Multiclass classification results for general image manipulation detection.
Table 9 presents the results obtained using our method in terms of the most commonly used classification metrics. Bayar’s technique obtained an accuracy of 97.89%, whereas our proposed solution reached a macro-averaged accuracy of 99.46%. The proposed method performed well on all of the following evaluation parameters: accuracy, error, recall, specificity, precision, false-positive rate, F1 score, kappa, and Matthews correlation coefficient (MCC). Both kappa and MCC compared favorably with those of the methods in [14,17].
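All of the metrics listed above can be derived from a multiclass confusion matrix. The sketch below is a generic implementation, not the authors' MATLAB code: it reports overall accuracy and error, macro-averaged per-class metrics (one-vs-rest), Cohen's kappa, and the multiclass generalization of MCC (Gorodkin's $R_K$). Which averaging convention Table 9 uses for each metric is an assumption here.

```python
import math


def multiclass_metrics(cm):
    """Metrics from a k x k confusion matrix; cm[i][j] counts true class i predicted as j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    true_tot = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    correct = sum(cm[i][i] for i in range(k))

    per_class = {"accuracy": [], "recall": [], "specificity": [],
                 "precision": [], "fpr": [], "f1": []}
    for c in range(k):  # one-vs-rest counts for class c
        tp = cm[c][c]
        fp = pred_tot[c] - tp
        fn = true_tot[c] - tp
        tn = n - tp - fp - fn
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class["accuracy"].append((tp + tn) / n)
        per_class["recall"].append(rec)
        per_class["specificity"].append(tn / (tn + fp) if tn + fp else 0.0)
        per_class["precision"].append(prec)
        per_class["fpr"].append(fp / (fp + tn) if fp + tn else 0.0)
        per_class["f1"].append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)

    metrics = {"macro_" + name: sum(vals) / k for name, vals in per_class.items()}
    metrics["accuracy"] = correct / n
    metrics["error"] = 1.0 - metrics["accuracy"]

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = sum(true_tot[c] * pred_tot[c] for c in range(k)) / (n * n)
    metrics["kappa"] = (metrics["accuracy"] - p_e) / (1.0 - p_e) if p_e < 1.0 else 0.0

    # Multiclass MCC; reduces to the usual binary MCC when k = 2.
    num = correct * n - sum(true_tot[c] * pred_tot[c] for c in range(k))
    den = math.sqrt((n * n - sum(p * p for p in pred_tot)) *
                    (n * n - sum(t * t for t in true_tot)))
    metrics["mcc"] = num / den if den else 0.0
    return metrics


m = multiclass_metrics([[5, 1], [2, 4]])
print(m["accuracy"], m["kappa"], round(m["mcc"], 4))  # 0.75 0.5 0.5071
```

Kappa and MCC both discount agreement that would arise by chance from the class marginals, which is why they are more informative than raw accuracy when errors are concentrated in a few classes.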
The choice of initial parameter values is one of the most important hyperparameter decisions in deep learning systems, affecting both convergence time and model performance. We experimented with various weight initialization algorithms and combinations of activation layers. We studied the weight initializers proposed by Xavier (Glorot) [47] and He [48] and orthogonal initialization [49], combined with bias initialization to ‘zero’ or ‘one’. Figure 5a shows the training accuracy of the proposed model for general-purpose image manipulation detection over different numbers of epochs.
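The three weight initializers compared in this experiment can be sketched in plain Python using their standard definitions: Glorot uniform, He normal, and orthogonal initialization built by Gram–Schmidt orthonormalization of Gaussian vectors. This is not taken from the authors' code; the uniform/normal variants and the unit gain for the orthogonal matrix are assumptions, and frameworks such as MATLAB may differ in these details.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot [47]: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He [48]: N(0, 2 / fan_in), designed for rectifier activations."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def orthogonal(fan_in, fan_out, rng, gain=1.0):
    """Orthogonal [49]: orthonormal rows, or orthonormal columns if fan_out > fan_in."""
    n_vec, dim = min(fan_in, fan_out), max(fan_in, fan_out)
    basis = []
    while len(basis) < n_vec:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in basis:  # modified Gram-Schmidt: remove projections one at a time
            d = sum(x * y for x, y in zip(v, u))
            v = [x - d * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip a (very unlikely) nearly dependent draw
            basis.append([x / norm for x in v])
    # basis has shape n_vec x dim; the weight matrix must be fan_out x fan_in.
    W = basis if fan_out <= fan_in else [list(col) for col in zip(*basis)]
    return [[gain * w for w in row] for row in W]


def zero_bias(fan_out):
    return [0.0] * fan_out


rng = random.Random(0)
W = orthogonal(36, 16, rng)  # e.g., a 36-input layer of hypothetical width 16
b = zero_bias(16)
```

Orthogonal matrices preserve the norm of the signal passing through a linear layer, which is the usual argument for why they help deep stacks of fully connected layers train stably.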
Figure 5. Experimentation with weight, bias and activation layers. (a) Accuracy for various weight and bias combinations; (b) Accuracy for various activation layers.
The system performed best with orthogonal weights and bias set to ‘zero’, compared with the other combinations of weight and bias initialization schemes, as summarized in Table 10, which reports the accuracy of six of the tested combinations. We also tested narrow-normal bias initialization, but the results were not satisfactory.
Table 10. Testing accuracy for various combinations of layer weight and bias initialization methods.
All these experiments were conducted with the number and width of the fully connected layers held fixed and with a mix of ‘elu’ and ‘tanh’ activation layers.
We also experimented with various activation methods, i.e., ‘ReLU’, ‘tanh’, ‘elu’, ‘gelu’, ‘swish’, ‘leaky ReLU’, and ‘none’, as well as mixtures of activation layers placed at different positions. A ‘ReLU’ layer applies a threshold operation to every element of its input, setting any value less than zero to zero. A ‘tanh’ layer applies the hyperbolic tangent function to its inputs. An ‘elu’ layer performs the identity operation on positive inputs and an exponential nonlinearity on negative inputs. A ‘leaky ReLU’ layer performs a threshold operation in which any input value less than zero is multiplied by a fixed scalar. A ‘swish’ layer applies the swish function $x\,\sigma(x)$, where $\sigma$ is the logistic sigmoid, to its inputs. A ‘gelu’ (Gaussian error linear unit) layer weights each input by the standard normal cumulative distribution function, $x\,\Phi(x)$. Activation methods are discussed in detail in [46]. Table 11 summarizes the results obtained, and Figure 5b shows the testing accuracy over different epochs for the activation functions employed between fully connected layers. The mixture of activation layers in which the first four fully connected layers were followed by an ‘elu’ layer and the next nine by a ‘tanh’ layer performed better than architectures that used a single activation function throughout the network. The configuration without activation layers performed poorly compared with the others. The ‘ReLU’, ‘elu’, ‘tanh’, and ‘gelu’ configurations produced similar results, but the mixed combination performed best. The ‘swish’ and ‘leaky ReLU’ configurations are not reported, as their results were not satisfactory.
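The activation functions compared above can be written as scalar Python functions following their standard definitions. This is an illustrative sketch, not the toolbox layers used in the experiments; the leaky-ReLU scale of 0.01 and the elu α of 1 are common defaults assumed here, since the values used are not stated in the text.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    return max(0.0, x)                       # negative values reset to zero


def leaky_relu(x, scale=0.01):
    return x if x >= 0 else scale * x        # negatives multiplied by a fixed scalar


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)  # identity / exponential branch


def swish(x):
    return x * sigmoid(x)


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))  # x * Phi(x), exact form


ACTIVATIONS = {"relu": relu, "leaky_relu": leaky_relu, "elu": elu,
               "tanh": math.tanh, "swish": swish, "gelu": gelu,
               "none": lambda x: x}

for name in ["relu", "elu", "tanh", "gelu", "swish", "leaky_relu"]:
    print(name, [round(ACTIVATIONS[name](x), 4) for x in (-2.0, 0.0, 2.0)])
```

Printing the values at -2, 0, and 2 makes the qualitative differences visible: ‘tanh’ saturates on both sides, ‘elu’ saturates only for negative inputs, and ‘ReLU’ discards negative inputs entirely.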
We experimented with different mixes of layers for our network architecture and found that the best combination comprised four ‘elu’ and nine ‘tanh’ activation layers, as shown in Figure 1 and Table 11.
Table 11. Testing accuracy for various activation functions.
A limitation of this work is its two-stage development. Whereas a CNN is supplied with images directly and performs feature extraction itself, our method first performs feature selection and only then designs an MLP to solve the problem. On the other hand, an MLP has a much smaller design search space than a CNN, whose search space is effectively unbounded.

6. Conclusions and Future Work

We developed a method for general-purpose image alteration detection that applies a feature engineering methodology and combines it with a deep neural network design strategy. First, we established a set of features for image modification detection. Next, we designed a deep neural network that uses fully connected layers, with activation layers at appropriate points, to differentiate between original images and a variety of image alteration procedures. Finally, we optimized the performance of the network by comparing a large number of configurations of its hyperparameters, such as the weight and bias initialization schemes and the activation functions.
We conducted a number of experiments to evaluate how well the proposed system performs. The results show that the proposed system, which employs a multilayer perceptron (MLP) trained with a 36-feature set, attained an accuracy of 99.46%, outperforming the state-of-the-art deep-learning-based solution of Bayar [14], which achieved an accuracy of 97.89%.
In future work, a larger number of operators should be included in research on operation detection, together with a more thorough feature engineering approach that extracts larger feature sets from images for operator detection. This work can also be extended by experimenting with recent innovations in deep learning models.
Compared with state-of-the-art methods, a real-world implementation of this work is expected to be faster, because it requires less computation and has fewer parameters. In the future, we will perform a detailed time and space complexity analysis of the proposed system.
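The parameter argument can be made concrete with simple arithmetic: a fully connected layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ outputs has $n_{\text{in}} n_{\text{out}} + n_{\text{out}}$ learnable weights and biases. The sketch below totals this over a stack of layers. The hidden width of 16 is a hypothetical placeholder (the actual widths are given in Section 3.2), so the total is illustrative and is not the parameter count of the proposed network.

```python
def fc_parameter_count(widths):
    """Total weights plus biases of fully connected layers with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


# 36 input features, 11 hidden layers of hypothetical width 16, 5 output classes.
widths = [36] + [16] * 11 + [5]
print(fc_parameter_count(widths))  # 3397
```

Substituting the real layer widths into `widths` would give the exact count for the proposed model, which is a natural starting point for the planned complexity analysis.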

Author Contributions

Conceptualization, S.A. and S.S. (Sparsh Sharma); methodology, S.S. (Sparsh Sharma); software, S.S. (Saurabh Singh); validation, S.I., B.Y. and S.A.; formal analysis, S.A.; investigation, S.S. (Sparsh Sharma); resources, S.S. (Saurabh Singh); data curation, S.I.; writing—original draft preparation, S.A.; writing—review and editing, B.Y.; visualization, S.I.; supervision, S.S. (Sparsh Sharma); project administration, S.S. (Sparsh Sharma); funding acquisition, B.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea [under Grant NRF-2021R1I1A2045721]. The work was also supported by the Woosong University Academic Research Fund in 2023.

Data Availability Statement

All datasets utilized in this study are publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
MCC: Matthews Correlation Coefficient
MF: Median Filter
MLP: Multilayer Perceptron
MnF: Mean Filter
GLCM: Gray-Level Co-occurrence Matrix
GLRLM: Gray-Level Run-Length Matrix
GB: Gaussian Blurring
AWGN: Additive White Gaussian Noise
RS: Resampling

References

  1. Piva, A. An Overview on Image Forensics. ISRN Signal Process. 2013, 2013, 496701.
  2. Stamm, M.C.; Wu, M.; Liu, K.J.R. Information Forensics: An Overview of the First Decade. IEEE Access 2013, 1, 167–200.
  3. Qureshi, M.A.; Deriche, M. A bibliography of pixel-based blind image forgery detection techniques. Signal Process. Image Commun. 2015, 39, 46–74.
  4. Farid, H. Digital doctoring: How to tell the real from the fake. Significance 2006, 3, 162–166.
  5. Kujur, A.; Raza, Z.; Khan, A.A.; Wechtaisong, C. Data Complexity Based Evaluation of the Model Dependence of Brain MRI Images for Classification of Brain Tumor and Alzheimer’s Disease. IEEE Access 2022, 10, 112117–112133.
  6. Khan, A.A.; Madendran, R.K.; Thirunavukkarasu, U.; Faheem, M. D2PAM: Epileptic seizures prediction using adversarial deep dual patch attention mechanism. CAAI Trans. Intell. Technol. 2023, 8, 755–769.
  7. Zhu, B.B.; Swanson, M.D.; Tewfik, A.H. When seeing isn’t believing [multimedia authentication technologies]. IEEE Signal Process. Mag. 2004, 21, 40–49.
  8. Qiu, X.; Li, H.; Luo, W.; Huang, J. A Universal Image Forensic Strategy Based on Steganalytic Model. In Proceedings of the 2nd ACM Workshop on Information Hiding and Multimedia Security, New York, NY, USA, 11–13 June 2014; MMSec ’14. pp. 165–170.
  9. Fridrich, J.; Kodovsky, J. Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 2011, 7, 868–882.
  10. Shi, Y.Q.; Sutthiwan, P.; Chen, L. Textural Features for Steganalysis. In Proceedings of the Information Hiding; Kirchner, M., Ghosal, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 63–77.
  11. Fan, W.; Wang, K.; Cayre, F. General-purpose image forensics using patch likelihood under image statistical models. In Proceedings of the 2015 IEEE International Workshop on Information Forensics and Security (WIFS), Rome, Italy, 16–19 November 2015; pp. 1–6.
  12. Bayar, B.; Stamm, M.C. A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, New York, NY, USA, 20–22 June 2016; MMSec ’16. pp. 5–10.
  13. Mazumdar, A.; Singh, J.; Tomar, Y.S.; Bora, P.K. Universal image manipulation detection using deep siamese convolutional neural network. arXiv 2018, arXiv:1808.06323.
  14. Bayar, B.; Stamm, M.C. Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2691–2706.
  15. Chen, Y.; Kang, X.; Shi, Y.Q.; Wang, Z.J. A multi-purpose image forensic method using densely connected convolutional neural networks. J. Real-Time Image Process. 2019, 16, 725–740.
  16. Yang, L.; Yang, P.; Ni, R.; Zhao, Y. Xception-Based General Forensic Method on Small-Size Images. In Advances in Intelligent Information Hiding and Multimedia Signal Processing; Pan, J.S., Li, J., Tsai, P.W., Jain, L.C., Eds.; Springer: Singapore, 2020; pp. 361–369.
  17. Rana, K.; Singh, G.; Goyal, P. MSRD-CNN: Multi-Scale Residual Deep CNN for General-Purpose Image Manipulation Detection. IEEE Access 2022, 10, 41267–41275.
  18. Mehta, R.; Kumar, K.; Alhudhaif, A.; Alenezi, F.; Polat, K. An ensemble learning approach for resampling forgery detection using Markov process. Appl. Soft Comput. 2023, 147, 110734.
  19. Singh, D.; Jain, T.; Gupta, N.; Tolani, B.; Seeja, K.R. Fake Image Detection Using Ensemble Learning. In Proceedings on International Conference on Data Analytics and Computing; Yadav, A., Gupta, G., Rana, P., Kim, J.H., Eds.; Springer: Singapore, 2023; pp. 383–393.
  20. Yeganeh, A.; Pourpanah, F.; Shadman, A. An ANN-based ensemble model for change point estimation in control charts. Appl. Soft Comput. 2021, 110, 107604.
  21. Weeraddana, D.; Khoa, N.L.D.; Mahdavi, N. Machine learning based novel ensemble learning framework for electricity operational forecasting. Electr. Power Syst. Res. 2021, 201, 107477.
  22. Li, X.; Zhang, G.; Huang, H.H.; Wang, Z.; Zheng, W. Performance Analysis of GPU-Based Convolutional Neural Networks. In Proceedings of the 2016 45th International Conference on Parallel Processing (ICPP), Philadelphia, PA, USA, 16–19 August 2016; pp. 67–76.
  23. Marcus, G. Deep learning: A critical appraisal. arXiv 2018, arXiv:1801.00631.
  24. Amerini, I.; Anagnostopoulos, A.; Maiano, L.; Celsi, L.R. Deep Learning for Multimedia Forensics. Found. Trends Comput. Graph. Vis. 2021, 12, 309–457.
  25. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 1–74.
  26. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 1989, 2, 303–314.
  27. Lin, Z.; Memisevic, R.; Konda, K. How far can we go without convolution: Improving fully-connected networks. arXiv 2015, arXiv:1511.02580.
  28. Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard, G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5314–5321.
  29. Melas-Kyriazi, L. Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet. arXiv 2021, arXiv:2105.02723.
  30. Liu, H.; Dai, Z.; So, D.; Le, Q.V. Pay Attention to MLPs. In Advances in Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 9204–9215.
  31. Shi, S.; Wang, Q.; Xu, P.; Chu, X. Benchmarking State-of-the-Art Deep Learning Software Tools. In Proceedings of the 2016 7th International Conference on Cloud Computing and Big Data (CCBD), Macau, China, 16–18 November 2016; pp. 99–104.
  32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  33. Zhao, Y.; Wang, G.; Tang, C.; Luo, C.; Zeng, W.; Zha, Z.J. A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP. arXiv 2021, arXiv:2108.13002.
  34. Ahmed, S.; Islam, S. Median filter detection through streak area analysis. Digit. Investig. 2018, 26, 100–106.
  35. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
  36. Soh, L.K.; Tsatsoulis, C. Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Trans. Geosci. Remote Sens. 1999, 37, 780–795.
  37. Clausi, D.A. An analysis of co-occurrence texture statistics as a function of grey level quantization. Can. J. Remote Sens. 2002, 28, 45–62.
  38. Galloway, M.M. Texture analysis using gray level run lengths. Comput. Graph. Image Process. 1975, 4, 172–179.
  39. Castellano, G.; Bonilha, L.; Li, L.; Cendes, F. Texture analysis of medical images. Clin. Radiol. 2004, 59, 1061–1069.
  40. Tang, X. Texture information in run-length matrices. IEEE Trans. Image Process. 1998, 7, 1602–1609.
  41. Gallagher, N.; Wise, G. A theoretical analysis of the properties of median filters. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1136–1141.
  42. Chu, A.; Sehgal, C.M.; Greenleaf, J.F. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit. Lett. 1990, 11, 415–419.
  43. Dasarathy, B.V.; Holder, E.B. Image characterizations based on joint gray level—Run length distributions. Pattern Recognit. Lett. 1991, 12, 497–502.
  44. Ahmed, S.; Islam, S. Median filtering detection using improved percentage Streak Area. In Proceedings of the Virtual International Research Conference on IoT, Cloud and Data Science, Online, 23–24 April 2021; p. 11.
  45. Fei, N.; Gao, Y.; Lu, Z.; Xiang, T. Z-Score Normalization, Hubness, and Few-Shot Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 142–151.
  46. Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for deep learning. In Proceedings of the second International Conference on Computational Sciences and Technology, Jamshoro, Pakistan, 17–19 December 2020; pp. 124–133.
  47. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 249–256.
  48. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
  49. Saxe, A.M.; McClelland, J.L.; Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv 2013, arXiv:1312.6120.
  50. Schaefer, G.; Stich, M. UCID: An uncompressed color image database. In Electronic Imaging 2004; SPIE: San Jose, CA, USA, 2003; pp. 472–480.
  51. Bas, P.; Filler, T.; Pevný, T. “Break Our Steganographic System”: The Ins and Outs of Organizing BOSS. In Information Hiding; Filler, T., Pevný, T., Craver, S., Ker, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 59–70. ISBN 978-3-642-24177-2.
  52. Dang-Nguyen, D.T.; Pasquini, C.; Conotter, V.; Boato, G. RAISE: A Raw Images Dataset for Digital Image Forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys 15, New York, NY, USA, 18–20 March 2015; pp. 219–224.
  53. Gloe, T.; Böhme, R. The Dresden image database for benchmarking digital image forensics. J. Digit. Forensic Pract. 2010, 3, 150–159.
  54. Union, I.T. Green BT.601: Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide Screen 16:9 Aspect Ratios. Status: In force (Main). 2011. Available online: https://www.itu.int/rec/R-REC-BT.601-7-201103-I/en (accessed on 8 March 2011).
  55. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
  56. The MathWorks, Inc. MATLAB Version: 9.13.0 (R2021a). 2021. Available online: https://in.mathworks.com/products/new_products/release2021a.html (accessed on 1 November 2023).
  57. The MathWorks, Inc. Deep-learning Toolbox: 9.4 (R2021a). 2021. Available online: https://in.mathworks.com/solutions/deep-learning.html (accessed on 1 November 2023).
  58. The MathWorks, Inc. Experiment Application (R2021a). 2021. Available online: https://in.mathworks.com/help/deeplearning/manage-experiments (accessed on 1 November 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
