Article

Integrating Convolutional Neural Networks with a Firefly Algorithm for Enhanced Digital Image Forensics

by Abed Al Raoof Bsoul * and Yazan Alshboul
Information Technology Department, Yarmouk University, Irbid 21163, Jordan
* Author to whom correspondence should be addressed.
AI 2025, 6(12), 321; https://doi.org/10.3390/ai6120321
Submission received: 30 October 2025 / Revised: 26 November 2025 / Accepted: 2 December 2025 / Published: 8 December 2025

Abstract

Digital images play an increasingly central role in journalism, legal investigations, and cybersecurity. However, modern editing tools make image manipulation difficult to detect with traditional forensic methods. This research addresses the challenge of improving the accuracy and stability of deep-learning-based forgery detection by developing a convolutional neural network enhanced through automated hyperparameter optimisation. The framework integrates a Firefly-based search strategy to optimise key network settings such as learning rate, filter size, depth, dropout, and batch configuration, reducing reliance on manual tuning and the risk of suboptimal model performance. The model is trained and evaluated on a large raster dataset of tampered and authentic images, as well as a custom vector-based dataset containing manipulations involving geometric distortion, object removal, and gradient editing. The Firefly-optimised model achieves higher accuracy, faster convergence, and improved robustness than baseline networks and traditional machine-learning classifiers. Cross-domain evaluation demonstrates that these gains extend across both raster and vector image types, even when vector files are rasterised for deep-learning analysis. The findings highlight the value of metaheuristic optimisation for enhancing the reliability of deep forensic systems and underscore the potential of combining deep learning with nature-inspired search methods to support more trustworthy image authentication in real-world environments.

1. Introduction

In today’s information-driven world, digital images have become one of the most persuasive forms of evidence presented in courtrooms, news outlets, and cybersecurity cases. However, the same technologies that make images easy to share have made manipulation equally trivial. With modern editing programmes, tampering can range from subtle object removal to highly convincing deepfakes, undermining the credibility of visual media and complicating online investigations [1,2,3]. In this setting, digital image forensics (DIF) is not only a technical tool but also a societal safeguard, a necessary means of ensuring the authenticity, accountability, and reliability of evidence in digital communication. Although it has been researched for decades, the discipline still faces a long-standing and evolving challenge: how to identify, effectively and reliably, forgeries that are becoming more sophisticated and context-specific.
DIF comprises three interconnected categories of methods: authenticity and source verification, image enhancement and manipulation detection, and content analysis and documentation, as shown in Figure 1. Each category has its own analytical requirements and specific methodologies for assessing the accuracy and reliability of images across various uses.
1.
Authenticity and Source Verification.
This category involves verifying the originality of a digital image and establishing its source. It comprises several essential techniques:
  • Image Acquisition Analysis: Camera sensor patterns, lens defects, and embedded metadata (EXIF) are examined to assess the authenticity of the image source and the likelihood of alteration.
  • Image Authentication: Methods such as error level analysis (ELA), pixel-based techniques, and statistical methods are used to identify alterations such as splicing, cloning, or retouching.
  • Image Source Identification: Sensor noise-pattern fingerprints (camera fingerprints) and distinctive software artefacts are used to link an image to the camera that captured it or the software that edited it.
2.
Image Enhancement and Manipulation Detection.
This category covers the detection of manipulations and the enhancement of images under forensic examination; it comprises methods aimed at revealing hidden or distorted parts of an image:
  • Image Tampering Detection: Identifies sophisticated manipulations (copy-move forgery, splicing, and retouching) using convolutional neural networks (CNNs) and statistical models.
  • Image Enhancement and Recovery: Applies noise reduction, sharpening, contrast adjustment, and recovery techniques to improve the clarity of images and objects of interest, for example in images and video frames damaged or corrupted by noise.
3.
Documentation and Content Analysis.
The final category focuses on interpreting image content and documenting the results so that they can support forensic and investigative use:
  • Content Analysis and Interpretation: Extracts relevant information from the image by means such as face recognition, object recognition, and scene analysis.
  • Steganography and Steganalysis: Detects hidden messages embedded in images, an essential capability in computer security.
  • Reporting and Documentation: Produces formal, concise, and reproducible reports so that forensic findings are adequately documented for use in courts of law, investigative reporting, and cybersecurity research.
Digital images are generally produced in two formats: raster and vector. Raster images are composed of pixels. Vector graphics, on the other hand, encode shapes and colours through mathematical instructions rather than pixel values. To analyse such graphics in a deep-learning model, they must first be rasterised so that their visual content is converted into a pixel-based form. Because of this requirement, most forensic studies focus on raster images [4] and treat vector files only after this conversion step. This approach does not provide native analysis of vector structures, but it offers a practical way to include graphics created in design software or exported from illustration tools. Using rasterised versions of vector files ensures that the model processes all inputs in a consistent pixel format, allowing the same deep-learning pipeline to handle both types of imagery without overstating cross-format capabilities.
To address the limitations of traditional forensic methods, deep-learning systems such as convolutional neural networks (CNNs) have proven useful for tampering detection [4,5]. CNNs can automatically extract hierarchical and discriminative image features that enable the detection of various types of forgery, including, but not limited to, splicing, copy-move, and inpainting. The ability of CNNs to handle complex statistical cues, such as illumination flaws, sensor noise anomalies, and compression artefacts, makes them highly effective for image forensics [6]. Nevertheless, their effectiveness depends heavily on hyperparameter settings, such as the learning rate, filter dimensions, and network depth. These parameters are costly to select manually, and manual selection may yield suboptimal results.
To overcome this difficulty, nature-inspired optimisation algorithms are combined with deep-learning models to improve model performance and generalisation. Among them, the Firefly Algorithm (FA) [7] stands out for balancing global search with efficient convergence. The FA emulates the bioluminescent attraction of fireflies: brighter fireflies represent better solutions and attract the others, allowing the population to explore and exploit the search space [8]. In DIF, the FA can be used to optimise CNN hyperparameters, thereby enhancing training effectiveness and detection accuracy [9].
This research project is premised on the idea that deep feature learning and bio-inspired optimisation can work in synergy. CNNs are strong discriminative pattern extractors that can detect illumination, geometric, and texture inconsistencies, whereas the FA offers a stochastic method for jointly evolving network configurations. Together, they form a self-adaptive forensic model that can balance accuracy and computational efficiency across manipulation types. This combination reflects a conceptual shift in digital forensics from static architectures to self-optimising intelligent systems.
The main objectives of this study are as follows:
  • Improve forgery detection precision by incorporating Firefly-optimised CNN configurations that strengthen the model’s ability to identify various tampering types, including splicing, cloning, and content removal.
  • Use the Firefly Algorithm to optimise CNN hyperparameters such as learning rate, number of layers, and kernel dimensions to obtain greater precision and reduced computational cost.
  • Ensure flexibility across a variety of image formats and image domains so that the model can be applied in cybersecurity, law enforcement, journalism, and intellectual property protection.
By integrating CNNs with the Firefly Algorithm, this research aims to develop a robust, efficient, and generalisable DIF tool that delivers improved accuracy and performance across complex manipulation scenarios.

2. Literature Review

2.1. Significance of Digital Image Forensics

Verifying the authenticity of digital images has become a defining problem of the information age. Confidence in visual evidence has been undermined by the accessibility of high-quality editing software and the spread of synthetic content such as deepfakes, affecting network security, journalism, and fairness in the courts [1,2,3]. DIF has therefore emerged as a multidisciplinary field concerned with detecting, certifying, and analysing images to identify manipulation and fraud.
However, the growing variety of image formats used in social media, digital design, and legal contexts has placed increasing pressure on forensic analysis systems. In raster images, manipulations tend to appear as statistical anomalies or compression artefacts. Vector graphics, however, encode content using mathematical instructions and must be rasterised before a deep-learning model or similar pixel-based models can examine them. Because most forensic tools are built for raster inputs, the literature offers limited approaches for incorporating vector-origin content once converted to pixel form. This gap highlights the importance of developing detection methods that remain computationally efficient while operating consistently on pixel-based representations derived from both native raster images and rasterised-vector files.

2.2. Deep-Learning Paradigms vs. Handcrafted Features

Initial studies on DIF relied on manual feature extraction and intrinsic signal analysis based on physical camera properties or compression cues. The photo-response non-uniformity (PRNU) technique was one of the most effective, leveraging the distinctive noise patterns of camera sensors to determine image provenance [10]. PRNU-based techniques can be used for source verification. Nevertheless, they do not generalise across different types of manipulation and post-processing [11]. Likewise, classical methods such as error level analysis [12] and block-based statistical modelling [13] were successful to some extent in detecting JPEG tampering. However, they failed to reveal subtle or local manipulations.
The shortcomings of manual methods led to the development of deep-learning methods that learn hierarchical representations directly from raw images. In contrast to earlier models, which rely on features designed by experts, CNNs capture spatial and frequency patterns associated with manipulation artefacts such as lighting inconsistencies, texture discontinuities, and pixel correlations [6]. This move from rule-based to data-driven forensics was a paradigm shift and aligned with wider developments in computer vision and artificial intelligence.

2.3. Convolutional Neural Networks in Image Forgery Detection

CNNs now serve as the standard approach for recognising tampering methods, including splicing, copy-move, and inpainting. Because of their hierarchical nature, CNNs can capture low-level features, e.g., edges and noise residuals, as well as high-level semantic anomalies introduced by tampering. Early CNN designs achieved higher classification accuracy than handcrafted designs [6].
Recent comprehensive analyses, such as [14], highlight the importance of evaluating AI-based forensic tools across diverse operational constraints, including localisation capability, device-level efficiency, and multi-format media complexity. Such perspectives reinforce the need for broader evaluation protocols that future extensions of this work will address.
As image manipulation became more advanced, researchers began optimising CNN architectures to enhance localisation and robustness. The model presented in [5] provided better separation of prominent splicing boundaries by using noise-residual streams in multi-branch CNNs alongside the red, green, and blue channels. The authors in [15] presented lightweight CNN variants for effective forgery detection with reduced computational complexity, achieving results almost on par with the state of the art. Similarly, Ref. [16] developed Faster R-CNN-based architectures that combine multi-scale feature fusion to detect weak ground targets in remote sensing images. The research in [17] made a further contribution to this field by proposing uncertainty-guided refinement for domain-specific scientific image splicing. All of these contributions point to the ongoing development of multi-stream, attention-directed CNNs capable of fine-grained manipulation detection.
A common weakness of most of these CNN-based systems is their sensitivity to manually chosen hyperparameters, including the learning rate, filter size, batch size, and layer count. Poor parameter tuning can lead to convergence to local optima, unstable training, and reduced generalisation across datasets. Additionally, these models can perform well on benchmark datasets such as CASIA v2.0 [18] but lose accuracy when presented with new image domains or previously unseen forms of tampering. Therefore, although CNNs have transformed DIF, their dependence on heuristic configuration remains a significant limitation to the goals of accuracy and efficiency.

2.4. Adaptive CNNs

To overcome the constraints of manual tuning, researchers have turned to metaheuristic algorithms, bio-inspired techniques that emulate natural phenomena such as swarming, evolution, or attraction to search for the global optimum. These include genetic algorithms (GAs), particle swarm optimisation (PSO), ant colony optimisation (ACO), and, more recently, the Firefly Algorithm (FA). Metaheuristics automate the search for CNN hyperparameter values by balancing exploration (searching the solution space broadly) and exploitation (refining promising regions).
The comprehensive review in [19] has stressed that metaheuristic hyperparameter optimisation (HPO) can significantly improve the robustness and efficiency of deep-learning models by reducing overfitting and computational costs. Not all algorithms, however, are equally efficient in high-dimensional search spaces. For example, GA-based tuning can converge very quickly, whereas PSO can be sensitive to initialisation and less diverse in later iterations [20].
The work in [7] proposed an alternative, the Firefly Algorithm, which is promising because its light-intensity-based attraction mechanism supports both global search (exploration) and local refinement (exploitation). The FA iteratively updates candidate solutions (fireflies) based on the brightness of their neighbours, which helps avoid local optima while maintaining quick convergence. The authors in [21] showed that FA-tuned CNNs achieved higher accuracy and faster convergence than manually tuned models in image classification and medical imaging. Such results indicate that the FA can be applied effectively to deep learning in settings where accuracy and robustness are the most important qualities, as is the case in forensic analysis.

2.5. Integrating Firefly Optimisation with CNNs for Image Forensics

Empirical evidence for such integration is found in recent research on hybrid CNN-FA models applied to broader computer vision. The work in [22] used the FA to train CNNs for image segmentation, achieving improved performance metrics with fewer epochs. Likewise, FA-based tuning improved model stability and reduced computational burden in related applications such as object recognition and defect detection [23]. Despite these achievements, the literature shows a significant lack of FA-CNN hybridisation in digital image forensics. The available forensic studies focus primarily on CNN architecture design or feature-level optimisation rather than on automated optimisation of model parameters.
Furthermore, the majority of studies test their models on raster datasets only (e.g., CASIA v2.0, Columbia, or COVERAGE). Few studies deal with the forensic issues raised by vector images, which involve non-pixel-based geometric and structural analysis. This limits model generalisability to diverse real-world problems, particularly in areas such as graphic design authentication, digital watermark verification, and intellectual property protection. Thus, incorporating FA-based optimisation into a CNN framework that can accept raster as well as vector inputs is a largely unexplored direction, yet one that is needed to achieve full forensic flexibility.

2.6. Literature Trends and Research Gap in Image Forgery Detection

The comparison with previous research shows some general tendencies. To begin with, CNNs are consistently more effective than conventional approaches for detecting typical forms of tampering, but they are sensitive to dataset bias and parameterisation. Second, metaheuristic algorithms have been demonstrated to perform well in overall optimisation, but their use in image forensics, particularly to optimise CNNs for tampering detection, is seldom explored. Third, most of the available literature focuses on raster images as the universal forensic format and puts insufficient emphasis on vector formats, even though these have become popular in digital design and fabrication chains [4].
Together, these shortcomings highlight a knowledge gap at the intersection of deep learning, optimisation, and cross-domain image analysis. The lack of a model capable of dynamically optimising CNN configurations and handling heterogeneous image formats slows the development of a well-functioning, generalisable forensic tool. Moreover, current models may prioritise accuracy over efficiency, resulting in computationally intensive systems that are impractical for large-scale or real-time forensic investigations.
To address these shortcomings, this paper builds on a Firefly-optimised CNN to improve forgery detection in both raster and vector domains. The hybrid approach aims to overcome the core problems identified in the literature by automating hyperparameter selection, converging more quickly and with lower computational cost, and performing global optimisation. Furthermore, the proposed model achieves our third goal: providing flexibility across a wide range of image types and real-world applications, including cybersecurity, journalism, and law enforcement.

3. Methodology

The study used an experimental computational design to develop a hybrid convolutional neural network–Firefly Algorithm (CNN-FA) system to improve digital image forensics. The system was designed to enhance forgery detection, optimise CNN hyperparameters, and scale to both raster and vector image domains. The design suited the study objectives because it allowed model performance to be measured quantitatively on standardised datasets, in line with common practice in machine-learning experimentation.
An overview of the study design is illustrated in Figure 2. The methodology consists of four main stages: data collection and preparation, CNN model design, Firefly-based optimisation, and model evaluation and validation.

3.1. Data Collection and Preparation

3.1.1. Dataset Selection

The CASIA Image Tampering Detection Evaluation Database (CASIA v2.0) [18] is used as the primary raster dataset for the experimental analysis. CASIA v2.0 contains 12,614 images, comprising 7491 authentic and 5123 tampered images, across various manipulation types, including splicing, copy-move, and retouching. The images are stored in JPEG format with variable resolution and compression levels, providing a realistic range for model training and testing.
For vector images, a database of 8712 images was created, comprising 5326 authentic and 3386 tampered images. Tampered samples were produced by deliberately altering scalable vector graphics (SVG) and Adobe Illustrator (AI) files to mimic real-life forgery settings. The images were then rasterised so that the CNN model could read them. The manipulations included geometric structure changes, the addition or removal of objects, and adjustments to colours, gradients, and shapes to mimic typical editing functions found in professional design software. This dataset offers a broad and realistic basis for assessing the proposed hybrid CNN-Firefly Algorithm framework on the forensic analysis of vector-origin images.
The combination of the CASIA v2.0 raster dataset and the custom rasterised-vector dataset provides a comprehensive experimental basis for testing the proposed hybrid CNN-Firefly Algorithm framework. The standardised benchmark data and simulated rasterised-vector forgeries help ensure a balanced representation of both pixel-based and structure-based manipulations. This dual-dataset scheme allows a fuller evaluation of the model’s ability to generalise to other image types, manipulation types, and levels of complexity, enhancing the validity and applicability of the experimental results. Table 1 summarises the datasets used in the study. Appendix A illustrates the steps taken in data collection and preparation.

3.1.2. Image Preprocessing

To standardise the input data and enhance the model’s generalisation, all images were rescaled to a standard resolution and normalised to [0, 1]. Data augmentation, comprising rotation, horizontal and vertical flipping, scaling, and brightness adjustment, was also applied to make the samples more diverse and the model less prone to overfitting during training.
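The normalisation and flip-based augmentation steps can be sketched in a few lines. This is a minimal illustration, assuming 8-bit greyscale images stored as nested Python lists; the function names and the brightness offset of 0.1 are hypothetical, and the study’s own TensorFlow/Keras pipeline (including resizing, rotation, and scaling) is not reproduced here.

```python
def normalise(image):
    """Scale 8-bit pixel values (0..255) into the [0, 1] range."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top to bottom)."""
    return [list(row) for row in image[::-1]]


def adjust_brightness(image, delta):
    """Shift normalised pixel values by delta, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]


def augment(image, brightness_delta=0.1):
    """Return the normalised image plus three augmented variants.

    brightness_delta is an illustrative value, not one taken from the paper.
    """
    base = normalise(image)
    return [
        base,
        flip_horizontal(base),
        flip_vertical(base),
        adjust_brightness(base, brightness_delta),
    ]
```

Each call to `augment` turns one input image into four training samples, which is the sense in which augmentation makes the training set more diverse.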
Vector images were processed in two complementary ways:
(1)
Direct structural analysis, in which the intrinsic mathematical elements of the vector graphics (e.g., Bézier curves, coordinates, and geometric transformations) were examined.
(2)
Rasterised analysis, in which the vector files were converted into pixel-based representations and fed into the CNN, enabling the model to learn feature patterns similar to those observed in raster imagery.

3.2. CNN Model Design

3.2.1. Architecture Selection

Figure 3 shows the CNN model, which was trained as a binary classifier to distinguish authentic from tampered images. The base architecture drew on well-established CNN families, specifically VGGNet and ResNet, due to their demonstrated feature-extraction ability and stability in image classification. The model combines the sequential, deep design of VGGNet with the skip connections of ResNet, which are vital for training very deep networks and mitigating vanishing gradients. The lower convolutional layers captured texture, edges, and gradient transitions, whereas the deeper layers learned higher-level spatial dependencies that reflect manipulation artefacts.
The CNN architecture consisted of seven convolutional blocks:
  • Block 1 & 2: 32 and 64 filters (3 × 3), ReLU activation, batch normalisation, max-pooling
  • Block 3 & 4: 128 filters (3 × 3), ReLU, batch normalisation, max-pooling
  • Block 5 & 6: 256 filters (3 × 3), ReLU, batch normalisation, skip connection between blocks
  • Block 7: 512 filters (3 × 3), dropout 0.5
The classifier head used global average pooling, a dense layer of 256 units with ReLU, and a final sigmoid unit.
To clarify how the two families are combined, the CNN begins with VGG-style sequential blocks (Blocks 1–4), where each block uses a single convolutional layer followed by batch normalisation and max-pooling. These blocks follow the strictly sequential design philosophy of VGGNet. In Blocks 5 and 6, lightweight ResNet-inspired skip connections are introduced by adding the output of Block 4 to the output of Block 6 prior to max-pooling. This hybrid structure preserves the simplicity of VGGNet in the early layers while leveraging residual learning in deeper layers to stabilise training and mitigate gradient vanishing.
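To give a feel for the size of the network described above, the sketch below counts trainable weights in the seven 3 × 3 convolutional layers and the classifier head. It is a back-of-the-envelope estimate under stated assumptions: one convolution per block, a 3-channel (RGB) input, and no batch-normalisation parameters. The text does not say how the 128-channel output of Block 4 is matched to the 256-channel output of Block 6 for the skip connection, so any projection layer that would require is left out.

```python
# Output filters of Blocks 1..7 as listed in the architecture description.
BLOCK_FILTERS = [32, 64, 128, 128, 256, 256, 512]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def count_parameters(input_channels=3):
    """Return (per-block conv counts, head count, total).

    input_channels=3 is an assumption (RGB input); batch normalisation
    and any skip-connection projection are not counted.
    """
    per_block = []
    in_ch = input_channels
    for out_ch in BLOCK_FILTERS:
        per_block.append(conv_params(in_ch, out_ch))
        in_ch = out_ch
    # Global average pooling yields one value per channel of Block 7 (512),
    # followed by Dense(256) and a single sigmoid unit.
    head = dense_params(in_ch, 256) + dense_params(256, 1)
    return per_block, head, sum(per_block) + head
```

Under these assumptions the total comes to roughly 2.4 million parameters, almost half of which sit in Block 7.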

3.2.2. CNN Training

The CNN is trained with supervised learning on images labelled as authentic or tampered. The Adam adaptive optimiser is used to minimise the binary cross-entropy loss via backpropagation. The cross-entropy loss is defined as
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right],$$
where $N$ is the number of training samples, $y_i \in \{0, 1\}$ is the ground-truth label of image $i$, and $\hat{y}_i$ is the predicted probability of the positive class.
Gradient propagation was stabilised using batch normalisation, while early stopping and dropout were used to prevent overfitting.
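As a worked illustration of the loss defined in this subsection, the following sketch evaluates binary cross-entropy for a small batch in plain Python. The clipping constant `eps` is an assumption added here to avoid `log(0)`; it is not part of the formula in the text.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch.

    y_true: iterable of 0/1 labels.
    y_prob: iterable of predicted positive-class probabilities.
    eps:    clipping constant (an assumption, for numerical safety only).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

For example, two confident and correct predictions, `binary_cross_entropy([1, 0], [0.9, 0.1])`, give a loss of about 0.105, whereas swapping the probabilities raises it to about 2.303.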

3.2.3. Feature Learning

During this phase, the CNN automatically learns hierarchical and discriminative features that capture the inherent differences between authentic and tampered images at both the pixel and structural levels. For the images in the study, the network extracts fine-grained statistical and spatial signals, such as sensor noise patterns, compression artefacts, illumination inconsistencies, and texture breaks, which tend to be subtle indicators of manipulation.
Through its stacked convolutional layers, the CNN builds a multi-level representation that spans low-level signal irregularities and high-level structural anomalies, thereby enabling effective tampering detection across diverse images.

3.3. Firefly Algorithm for CNN Optimisation

3.3.1. Firefly Algorithm Implementation

The Firefly Algorithm was incorporated into the CNN training process to optimise hyperparameters automatically and remove the barrier of manual configuration. Figure 4 presents the FA applied in the current research. Each firefly in the population represented a candidate CNN configuration, that is, a setting of the hyperparameters: learning rate (α), number of convolutional layers (L), filter size (f), dropout rate (d), and batch size (b). The fitness criterion to be maximised was
$$F(x_i) = \mathrm{Acc}_{val}(x_i),$$
where $\mathrm{Acc}_{val}(x_i)$ is the validation accuracy of the CNN trained with configuration $x_i$.
The Firefly Algorithm update rule used in this work follows the standard formulation, where each firefly’s position $x_i$ (representing a candidate hyperparameter set) is updated based on the attractiveness of brighter fireflies. The movement is computed as $x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i)$, where $F(x_i)$ denotes the fitness (validation accuracy) that serves as brightness, and $r_{ij}$ is the Euclidean distance between fireflies $i$ and $j$. The position $x_i$ is thus kept distinct from its fitness score $F(x_i)$. Figure 4 reflects this update rule.
The algorithm repeatedly revised the population until a convergence criterion was met, either the maximum number of iterations or a negligible change in accuracy. This procedure allowed the FA to determine the hyperparameters that maximise the model’s precision and minimise overall computational cost.
The Firefly Algorithm was configured with a population size of 30 fireflies, with initial attractiveness β0 = 1, absorption coefficient γ = 1, randomisation parameter α = 0.15 (not to be confused with the CNN learning rate, which is also denoted α), and a maximum of 60 iterations. Search boundaries for each hyperparameter followed Table 2. The stopping criterion was either convergence of the population or completion of the iteration limit. All FA runs were initialised with random seed 42 for reproducibility.
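The search loop can be sketched as follows, using the stated settings (30 fireflies, β0 = 1, γ = 1, α = 0.15, 60 iterations, seed 42) as defaults. This is a minimal illustration rather than the authors’ implementation: positions are assumed to live in the unit hypercube, the random step `alpha * (u - 0.5)` is taken from the standard Firefly formulation rather than from the update rule as written in this section, and `fitness` is a hypothetical callable that would, in the study’s setting, train a CNN and return its validation accuracy.

```python
import math
import random


def firefly_maximise(fitness, dim, n_fireflies=30, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.15, seed=42):
    """Maximise fitness over [0, 1]^dim; return (best_position, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    light = [fitness(x) for x in pop]

    best_idx = max(range(n_fireflies), key=light.__getitem__)
    best_x, best_f = list(pop[best_idx]), light[best_idx]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # only move towards brighter fireflies
                # Squared Euclidean distance r_ij^2 between fireflies i and j.
                r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                beta = beta0 * math.exp(-gamma * r2)
                # Attraction step plus a small random walk, clamped to [0, 1].
                pop[i] = [
                    min(1.0, max(0.0, a + beta * (b - a)
                                 + alpha * (rng.random() - 0.5)))
                    for a, b in zip(pop[i], pop[j])
                ]
                light[i] = fitness(pop[i])
                if light[i] > best_f:
                    best_x, best_f = list(pop[i]), light[i]
    return best_x, best_f


def toy_fitness(x):
    """Stand-in objective (NOT a CNN): peaks at 0 when every coordinate is 0.3."""
    return -sum((v - 0.3) ** 2 for v in x)
```

Calling `firefly_maximise(toy_fitness, dim=5)` searches a five-dimensional space, matching the five hyperparameters tuned in the study. With a real CNN each fitness evaluation is a full training run, so the population size and iteration count dominate the cost of the search.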

3.3.2. Parameter Tuning

The FA optimises the following CNN hyperparameters:
  • Learning rate (α): controls update magnitude during weight adjustment.
  • Number of convolutional layers (L): determines model depth for complex-feature extraction.
  • Filter sizes (f): influence spatial receptive field and fine-detail detection.
  • Dropout rate (d): regulates overfitting by randomly deactivating neurons.
  • Batch size (b): sets the number of samples processed per gradient update.
Through this optimisation, the hybrid framework dynamically identifies the optimal hyperparameter configuration that maximises classification accuracy while minimising computational cost.
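Because the five hyperparameters mix continuous, integer, and categorical values, a firefly’s position has to be decoded into a concrete configuration before a CNN can be trained. One possible decoding is sketched below, assuming positions in the unit hypercube. The bounds are hypothetical placeholders chosen for illustration only; they are not the values of Table 2, and the log-scale treatment of the learning rate is likewise an assumption.

```python
import math

# HYPOTHETICAL search bounds for illustration; not the paper's Table 2.
BOUNDS = {
    "learning_rate": (1e-4, 1e-2),    # continuous, decoded on a log scale
    "num_conv_layers": (3, 7),        # integer range
    "filter_size": (3, 5, 7),         # categorical choices
    "dropout": (0.2, 0.6),            # continuous
    "batch_size": (16, 32, 64, 128),  # categorical choices
}


def _pick(choices, u):
    """Map u in [0, 1] onto one of the listed choices."""
    return choices[min(int(u * len(choices)), len(choices) - 1)]


def decode(position):
    """Turn a 5-dimensional position in [0, 1]^5 into a hyperparameter dict."""
    u_lr, u_layers, u_filter, u_drop, u_batch = position
    lr_lo, lr_hi = BOUNDS["learning_rate"]
    log_lr = math.log10(lr_lo) + u_lr * (math.log10(lr_hi) - math.log10(lr_lo))
    layers_lo, layers_hi = BOUNDS["num_conv_layers"]
    drop_lo, drop_hi = BOUNDS["dropout"]
    return {
        "learning_rate": 10 ** log_lr,
        "num_conv_layers": layers_lo + round(u_layers * (layers_hi - layers_lo)),
        "filter_size": _pick(BOUNDS["filter_size"], u_filter),
        "dropout": drop_lo + u_drop * (drop_hi - drop_lo),
        "batch_size": _pick(BOUNDS["batch_size"], u_batch),
    }
```

Keeping the search in a normalised space means the distance $r_{ij}$ treats all five dimensions on a comparable scale, which would not be the case if batch size and learning rate were compared in their raw units.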

3.3.3. Firefly Optimisation Objective

The Firefly Algorithm (FA) used in the current framework primarily aims at global optimisation of the CNN hyperparameters by balancing exploration (searching new, potentially promising areas of the solution space) and exploitation (refining promising candidate solutions). This adaptive balance helps the algorithm avoid premature convergence to local optima and efficiently select the best-performing CNN configuration. Consequently, the FA helps accelerate convergence, improve the accuracy of forgery detection, and strengthen the model’s robustness against varied and complex image manipulations.

3.4. Model Evaluation and Validation

All experiments were implemented in Python (v3.10) with TensorFlow and Keras on a high-performance workstation equipped with an NVIDIA RTX 3080 GPU (10 GB VRAM), 32 GB RAM, and a 13th-generation Intel i7 processor.

3.4.1. Precision and Performance Indices

Model performance was measured using several quantitative metrics, including accuracy (ACC), precision (P), recall (R), F1-score, and the area under the ROC curve (AUC), to provide a multifaceted evaluation of detection reliability and discrimination behaviour. The metrics employed in this study are defined as follows:
  • ACC: proportion of correctly classified authentic and tampered images.
  • P: proportion of correctly identified tampered images among all predicted as tampered.
  • R: proportion of actual tampered images correctly identified.
  • F1-score: harmonic mean of precision and recall, providing a balanced performance measure.
  • AUC: evaluates discrimination capability between authentic and forged classes.
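The definitions above translate directly into code. The sketch below computes the four threshold-based metrics from hard predictions and estimates AUC as the fraction of (tampered, authentic) pairs ranked correctly by score. It assumes that label 1 denotes a tampered image and that both classes are present; the function names are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with label 1 = tampered (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_score(y_true, scores):
    """AUC as the probability that a tampered image outscores an authentic one.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration on small samples only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, with labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, accuracy is 0.6 while precision, recall, and F1 are all 2/3.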

3.4.2. Cross-Validation

A k-fold cross-validation strategy (typically k = 5 or 10) is adopted to assess model generalisation.
The dataset is divided into k subsets; in each iteration, one subset is used for validation while the remaining subsets train the model.
The final performance metrics are obtained by averaging across all folds.
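The fold procedure described above can be sketched as follows. This is a plain-Python illustration under stated assumptions rather than the authors' pipeline: `k_fold_indices` and `cross_validate` are hypothetical helper names, and `evaluate` stands in for a routine that would train the CNN on the training indices and return a validation score.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_indices, val_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint subsets
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(n_samples, evaluate, k=5, seed=42):
    """Average the score returned by evaluate(train, val) over all folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Placeholder evaluation: returns the validation fraction instead of training a model.
print(cross_validate(10, lambda train, val: len(val) / 10, k=5))
```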

3.4.3. Comparison with Baseline Models

To benchmark the hybrid model’s performance, three comparative baselines were implemented:
(1) A standard CNN without optimisation,
(2) A CNN optimised with conventional manual tuning, and
(3) Traditional machine-learning classifiers, such as support vector machine (SVM) and random forest (RF) models.
This comparative analysis demonstrates the relative improvements in accuracy, efficiency, and robustness achieved through Firefly-based optimisation.
Advanced forensic models such as ManTra-Net [24] and Noiseprint [25] represent specialised architectures designed explicitly for learning manipulation traces, but they require extensive retraining and computational resources not aligned with the optimisation-focused scope of this study. For this reason, classical baselines and a standard CNN were selected to isolate the effect of Firefly-based hyperparameter optimisation on common architectures.

3.5. Ethical Considerations

All data were publicly accessible or simulated and did not contain personal or identifiable data. In particular, the CASIA Image Tampering Detection Evaluation Database (CASIA v2.0) and a tailor-made vector image database were used in accordance with open-data norms, research-integrity principles, and the Committee on Publication Ethics (COPE) requirements. The synthetic vector dataset was built from freely available design resources with no copyrighted or private material. All procedures followed institutional policies on digital ethics and data protection.

3.6. Statistical Analysis and Outcome Measures

The primary outcome was the accuracy of the CNN-FA model in recognising tampering on both raster and rasterised-vector datasets. The secondary outcomes were computational efficiency (training time and convergence rate) and cross-domain generalisation (transfer of model performance between raster and rasterised-vector inputs). These outcomes correspond directly to the study’s objectives, namely improved accuracy, low computational cost, and greater forensic adaptability.
NumPy, SciPy, and Pandas libraries were used to conduct all the statistical analyses. Validation measures were averaged across folds, and 95% confidence intervals were derived to assess stability. The statistical significance of performance differences between the hybrid model and the baseline methods was assessed using paired t-tests at the 0.05 significance level. To support reproducibility, random seeds were fixed and detailed configuration files were stored to enable future replication of the experiments.
All experiments were performed using fixed random seeds for reproducibility (Python: 42, NumPy: 42, TensorFlow: 42). These seeds control dataset shuffling, weight initialisation, FA randomisation, and batch ordering.
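A minimal sketch of the fold-level statistics described in this section is given below, using only the Python standard library so that it runs without extra dependencies. The per-fold accuracies are hypothetical placeholders, not the study's results, and the critical values are the standard two-sided 95% Student t values for 4 and 9 degrees of freedom (k = 5 or k = 10 folds). In practice, a routine such as `scipy.stats.ttest_rel` would also return the exact p-value; the article does not state which calls were used.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom (k - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262}


def mean_ci95(values):
    """Return (mean, lower, upper) of a 95% confidence interval over folds."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def paired_t(a, b):
    """Return (t statistic, significant at 0.05) for paired fold scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, abs(t_stat) > T_CRIT_95[n - 1]


# Hypothetical per-fold accuracies for two models evaluated on the same folds.
hybrid = [0.98, 0.99, 0.97, 0.98, 0.99]
baseline = [0.95, 0.96, 0.95, 0.96, 0.96]
print(mean_ci95(hybrid))
print(paired_t(hybrid, baseline))
```

The test is paired because both models are scored on identical folds, so fold-to-fold difficulty cancels out in the differences.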

4. Results

This section reports the results for the study’s two aims, detection performance and computational optimisation, in the order of the research objectives:
  • Enhancing forgery detection accuracy across manipulation types.
  • Optimising CNN hyperparameters using the Firefly Algorithm.
  • Ensuring adaptability and robustness across raster and rasterised-vector image formats.

4.1. Overview of Experimental Data

The experimental study was based on two complementary datasets, chosen to determine the capability of the hybrid CNN-Firefly Algorithm (CNN-FA) model to identify image tampering in the raster and rasterised-vector domains, as summarised in Table 1. The variety of forgery types provided a balanced basis for experimentation. The overall structure of the datasets provided sufficient diversity in image characteristics to assess the model’s generalisation properties.
Combining both datasets further enabled cross-domain evaluation, with the model trained on raster data and evaluated on rasterised-vector data, and vice versa. This experimental setup aimed to determine how the hybrid framework could maintain performance across heterogeneous image modalities. Taken together, the datasets provided a stringent test bed to assess the quality, flexibility, and robustness of the CNN-FA hybrid method.

4.2. Optimisation of Hyperparameters Through Firefly Algorithm

Optimising the CNN hyperparameters with the Firefly Algorithm was a crucial step toward global convergence and improved model performance. Table 2 shows that the FA automatically adjusted the most important hyperparameters, namely the learning rate, the dropout rate, the number of convolutional layers, the filter size, and the batch size, using validation accuracy as the fitness function. The best configuration found by the FA used a learning rate of 0.0017, a dropout rate of 0.25, and a batch size of 64, resulting in the highest validation accuracy of 98.7%. This configuration balanced training stability and convergence efficiency.
The optimisation path is shown in Figure 5, which presents the FA convergence curve over 60 iterations. The fitness values rose during the early iterations and then gradually stabilised as the population approached the optimum. The declining variability of fitness across the population suggests that the FA maintained a good balance between exploration and exploitation. The FA converged 18% faster than manual grid-search tuning, which indicates that it can traverse the high-dimensional CNN hyperparameter space efficiently.
In summary, the FA successfully identified an optimal parameter configuration that minimised computational cost while maximising detection accuracy, providing a robust foundation for subsequent model training and evaluation.
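To make the search space concrete, the sketch below shows one way a firefly position in [0, 1]^5 could be decoded into the five hyperparameters named in this section. Every range and choice set here (the learning-rate bounds, the dropout interval, `BATCH_SIZES`, `FILTER_SIZES`, and the layer counts) is an assumption for illustration; the article does not list the bounds used. The fitness of a decoded configuration would then be the validation accuracy of a CNN trained with it, which is not shown.

```python
BATCH_SIZES = (16, 32, 64, 128)  # assumed candidate batch sizes
FILTER_SIZES = (3, 5, 7)         # assumed candidate kernel sizes


def pick(options, u):
    """Map u in [0, 1] to one entry of a discrete option tuple."""
    return options[min(int(u * len(options)), len(options) - 1)]


def decode(position):
    """Decode a 5-dimensional position in [0, 1] into a hyperparameter dict."""
    u_lr, u_drop, u_batch, u_layers, u_filter = position
    return {
        "learning_rate": 10 ** (-4 + 2 * u_lr),  # log scale, assumed 1e-4 to 1e-2
        "dropout": 0.1 + 0.4 * u_drop,           # assumed 0.1 to 0.5
        "batch_size": pick(BATCH_SIZES, u_batch),
        "conv_layers": pick((2, 3, 4, 5), u_layers),  # assumed depth range
        "filter_size": pick(FILTER_SIZES, u_filter),
    }


# Hypothetical candidate position.
print(decode([0.5, 0.0, 0.5, 0.0, 0.0]))
```

Sampling the learning rate on a log scale is a common design choice because useful values span several orders of magnitude, while discrete settings are obtained by binning the continuous coordinate.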

4.3. Forgery Detection Accuracy on Raster Images (CASIA v2.0)

The CNN–FA hybrid model demonstrated superior performance in raster-based forgery detection when compared with the baseline CNN and traditional classifiers. As presented in Table 3, the proposed model achieved a classification accuracy of 98.4%, substantially higher than the manually tuned CNN (95.8%) and the support vector machine (SVM) classifier (91.2%). The CNN–FA also achieved a precision of 98.1%, a recall of 97.9%, and an F1-score of 98.0%, which provides evidence of its reliability in detecting both genuine and falsified images.
To complement the quantitative evaluation, Figure 6 presents representative examples of true positive, true negative, false positive, and false negative outcomes that were produced by the proposed CNN–FA model in this research. The samples illustrate how the model responds to both tampered and authentic images under usual visual conditions. The true positive and true negative cases demonstrate the model’s ability to capture manipulation cues and preserve accuracy when no tampering is present. The false positive example reflects instances in which complex textures or irregular illumination patterns may be misinterpreted as indicators of manipulation, while the false negative case highlights scenarios where subtle or low-contrast edits can remain difficult for the detector to distinguish from authentic content. These images provide an insight into the strengths of the model performance, supporting a more comprehensive understanding of the reported performance metrics.
Figure 7 shows the receiver operating characteristic (ROC) curves of all tested models for raster-image forgery detection. The CNN-FA hybrid achieved an AUC of 0.992, indicating near-perfect discrimination. The baseline CNN obtained an AUC of 0.964, whereas the traditional classifiers were less sensitive to the introduced manipulations. The ROC curve of the CNN-FA model rises more sharply towards the upper-left corner, indicating higher true-positive rates at low false-positive rates across decision thresholds.
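For readers less familiar with the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen tampered image receives a higher score than a randomly chosen authentic one, with ties counted as one half. The function name and the scores are hypothetical and do not reproduce the reported values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (tampered, authentic) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    # Count correctly ordered pairs; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical tampering scores (1 = tampered, 0 = authentic).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC close to 1 therefore means that almost every tampered image is scored above almost every authentic one, independently of the decision threshold.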
The better performance on raster images can be attributed to the FA-optimised hyperparameters, which improved feature extraction and convergence stability. Together, these findings indicate that the CNN-FA hybrid framework significantly improves accuracy, precision, and detection consistency over conventional raster-based forensic approaches.

4.4. Forgery Detection Accuracy on Rasterised-Vector Images

The hybrid CNN-FA model also performed well on the custom rasterised-vector image dataset, which supports its suitability for detecting geometric and structure-based manipulations.
As shown in Table 4, the CNN-FA achieved an overall classification accuracy of 96.9%, whereas the baseline CNN reached 93.5% and the random forest classifier 89.6%. The CNN-FA also showed high detection sensitivity, with a precision of 96.4% and a recall of 97.1%, giving an F1-score of 96.8% and indicating a low false-positive rate.
Figure 8 presents the confusion matrix, which gives a detailed view of the model’s classification behaviour. According to the figure, the CNN-FA hybrid correctly identified 97% of authentic and 96.8% of tampered rasterised-vector images, and false positives fell to about 2.3%, lower than for the manually tuned CNN. This improved balance between precision and recall reflects the hybrid model’s capacity to generalise to non-pixel-based representations, where manipulations tend to appear as discontinuities in shape or colour, or as geometric transformations.
These findings indicate that the FA-optimised CNN is sensitive to the characteristics of rasterised-vector images and can detect structural distortions that are hard to recognise with traditional analysis.

4.5. Cross-Domain and Combined Dataset Evaluation

Cross-domain and combined-domain experiments were performed to evaluate the robustness of the hybrid model across image types.
The model trained on CASIA v2.0 was tested on the rasterised-vector dataset, and vice versa. Table 5, which summarises the results of the CNN-FA model, shows that under cross-domain conditions the model achieved an average accuracy of 97.8%, with precision (97.4%) and recall (97.6%) remaining stable regardless of the image representation.
Figure 9 compares model performance in single-domain and cross-domain setups. As shown in the bar chart, the detection accuracy of the CNN-FA framework dropped by less than 1% across domains, whereas the baseline CNN models dropped by 4.6% on average. These outcomes suggest that the hybrid model learned not only manipulation-specific features but also general forensic representations that transfer between raster and vector image formats.
When trained on the combined dataset of both image types, the CNN–FA achieved an accuracy of 98.1% and an F1-score of 97.9%, confirming its strong ability to generalise across diverse input modalities. This result establishes the hybrid approach as an adaptable forensic framework suitable for multi-format digital evidence analysis.

4.6. Evaluation on Author-Generated Tampered Images

To further examine the model’s generalisation ability beyond the benchmark datasets, a supplementary evaluation was performed using author-generated tampered images that introduce manipulation patterns not present in the original training data. These edits involved controlled geometric modifications, structural adjustments, and localised alterations applied to vector and rasterised illustrations. This qualitative test was designed to simulate novel, real-world editing behaviours and assess how the CNN–FA framework responds to unfamiliar distortion cues.
Figure 10 presents two authentic images and their corresponding tampered versions, offering insight into the model’s behaviour when encountering previously unseen manipulations. The model successfully differentiated authentic and tampered samples with accuracy consistent with its performance on the main datasets (within ±1–2%), indicating stable detection capability even under novel editing conditions. These qualitative examples complement the quantitative results and demonstrate the model’s robustness when applied to new, manually created forgeries.

4.7. Computational Efficiency and Convergence Performance

In addition to classification performance, computational efficiency was assessed to evaluate the practical advantages of FA-based optimisation. The CNN-FA hybrid shortened the average training time by 18.4% relative to the manually tuned CNN and by 23.7% relative to conventional optimisation methods (Table 6). The FA-optimised model converged in 57 epochs on average, whereas the manually tuned CNN took 70 epochs to stabilise.
The comparative convergence curves in Figure 11 illustrate the difference in learning behaviour between baseline and FA-optimised CNNs.
The FA-integrated model showed less lag and took less time to reduce the loss, which indicates increased stability and reduced oscillation in training. The improved convergence is attributed to the FA global optimisation procedure, which reduced unproductive parameter adjustments and supported a better weight distribution.
Together, these results indicate that the Firefly Algorithm not only improves classification results but also contributes substantially to computational efficiency, a crucial factor for forensic practice operating in real time or at large scale.

5. Discussion

5.1. Overview of Findings and Theoretical Context

The results of the present work indicate that the proposed hybrid convolutional neural network–Firefly Algorithm (CNN-FA) framework improves the accuracy, efficiency, and flexibility of digital image forgery detection in both the raster and the rasterised-vector domains. Combining the global search of metaheuristic optimisation with deep learning reduced training time while preserving high discriminative performance. This accords with the theoretical assumptions of swarm intelligence, according to which decentralised search and self-organising behaviour can provide near-optimal solutions in high-dimensional problem spaces [26]. By applying this principle to CNN hyperparameter optimisation, the current study shows how nature-inspired algorithms can systematically mitigate the local-minima issues associated with gradient-based training.
Theoretically, the research builds on existing knowledge of digital image forensics by operationalising a dual-domain paradigm that covers both pixel-based and structure-based manipulations. Whereas conventional forensic analysis has relied on handcrafted features such as photo-response non-uniformity (PRNU) noise patterns [10] or discrete wavelet transform (DWT) coefficients [27], this study moves towards a single representation model that captures both sensor-based inconsistencies and geometric distortions in vector graphics. This fine-grained modelling capability suggests that well-tuned CNNs can generalise beyond pixel-intensity distributions to recognise localised, higher-order anomalies in rasterised-vector graphics, an aspect that has frequently been overlooked in the earlier literature.

5.2. Comparison to Past Studies

The findings align well with the existing literature on deep learning as the dominant paradigm in image forensics [6,23]. Like the work in [5], which obtained high localisation accuracy in splicing detection using improved mask regional CNNs, this experiment confirms that deep networks are useful for extracting hierarchical visual information for detecting manipulation. However, it goes beyond those works by showing that accuracy gains can be attained not only through architectural complexity but also through algorithmic optimisation. FA-based tuning yielded an accuracy gain of 2–3% over manually tuned CNNs and 5–7% over traditional classifiers. Although this improvement appears incremental, marginal gains matter in forensic settings, where a subtle difference can determine whether a forgery is detected at all.
By comparison, earlier works applying metaheuristic optimisation to deep learning, such as particle swarm optimisation (PSO) and genetic algorithms (GAs), have been found to exhibit slower convergence and greater sensitivity to parameter initialisation [28,29]. The present findings suggest that the Firefly Algorithm has a more stable convergence profile because its adaptive attractiveness function and stochastic movement strategy balance exploration and exploitation more evenly. This can be seen in the convergence curve (Figure 4), which stabilised after around 55 iterations, versus the 70–90 iterations reported for earlier PSO-based tuning experiments [29]. These results provide empirical support for theoretical arguments about the stronger global search ability of the FA.
Another point of comparison is performance across image domains. Although CNN-based forensic models have predominantly been applied to raster images [5,16,17,28], few studies have investigated their adaptability to vector graphics. The CNN-FA hybrid achieved an accuracy of 96.9% on rasterised-vector forgeries, a result not previously reported in the literature. This indicates that the proposed framework was able to learn discriminative structural features, which supports the argument that deep models can detect high-level geometric inconsistencies when trained carefully [6]. This observation positions the CNN-FA hybrid as a bridge between two hitherto separate areas of forensic study.
Benchmark studies consistently report that dedicated forensic models such as ManTra-Net [24] and Noiseprint [25] achieve strong performance on tampering detection, particularly in localisation and noise-residual-based tasks. While our hybrid CNN–FA framework is not tailored to mimic these specialised architectures, its performance aligns with general-purpose CNN-based detectors and demonstrates improved optimisation efficiency. A direct comparison with these systems is identified as a key direction for extending the present work.

5.3. New Contributions and Implications

The novelty of this study lies not only in performance enhancement but also in conceptual integration. The article presents a methodological synthesis of deep learning and swarm intelligence that goes beyond heuristic parameter tuning. The FA acts as a self-directed control unit that tunes hyperparameters, including the learning rate, the batch size, and the depth of the convolutional layers, in response to feedback from the model’s fitness landscape. This functionality transforms CNN training from a non-adaptive, designer-dependent procedure into an adaptive, data-driven optimisation pipeline. Such integration marks a shift in how deep-learning models can be tuned to support forensic intelligence operations.
Theoretically, these findings support the view that forensic detection is an optimisation problem rather than only a classification problem. The FA-based method realises this perspective by searching for global optima of the loss surface, which leads to stronger generalisation. Practically, this advances digital forensic science by providing an architecture that can process large image archives more accurately and efficiently, a key capability for cybersecurity agencies, media image verification, and judicial use of digital evidence.
The model was not evaluated under common real-world distortions such as varying compression strengths, additive noise, rescaling artefacts, or adversarial attacks designed to evade forensic detectors. These factors often appear in operational environments and may influence detection performance. As such, the robustness demonstrated in this study applies only to the controlled conditions of the experimental datasets.
Moreover, the cross-domain generalisation observed in this study has important policy and governance implications. As legal and journalistic integrity is increasingly tested by the growing volume of synthetic and manipulated media, AI-based forensic systems should be able to analyse not only raster photographs but also graphical material such as infographics, company logos, and design documents. The ability of the CNN-FA model to identify anomalies across these formats supports the development of standardised, algorithmic audit systems for digital evidence validation, a concern reflected in regulatory and institutional initiatives such as the EU AI Act and the UNESCO guidelines on media authenticity.

5.4. Critical Reflection of Limitations

Although the research produced good outcomes, it has several limitations that are worth considering. First, although the hybrid framework generalised well on the experimental datasets, both CASIA v2.0 and the custom vector dataset are controlled environments. Real-world forgeries usually involve compounded manipulations, variable compression ratios, and metadata tampering. The absence of these factors may lead to an overestimation of the model’s robustness in operational settings. Further investigations should therefore validate the model on in-the-field datasets, mixed-content manipulations, and varying degrees of post-processing.
Second, the Firefly Algorithm, despite its ability to optimise important hyperparameters, is computationally non-trivial. A single FA iteration requires training a series of candidate CNNs, which leads to a large total training cost. Although GPU acceleration mitigates this burden, it is still unclear how the approach will scale to larger architectures, including vision transformers or hybrid CNN–RNN models. This drawback illustrates the trade-off between optimisation accuracy and computational feasibility in metaheuristic-guided deep learning.
Third, the study does not include comparisons against specialised state-of-the-art forensic networks such as ManTra-Net [24] or Noiseprint [25], which are designed to capture fine-grained manipulation traces using noise-residual learning or multi-branch feature extraction. Incorporating these models would provide a clearer benchmark for high-resolution tampering localisation and global detection performance. Future work will integrate these architectures to more directly compare the effects of Firefly-based hyperparameter optimisation against systems that employ dedicated forensic feature engineering.
Fourth, while the optimised model achieved an 18.4% reduction in per-epoch training time compared with the manually tuned CNNs, the Firefly optimisation stage introduces non-trivial computational overhead. Each FA iteration requires training candidate CNN configurations, resulting in a total optimisation cost approximately 20–30 times higher than a single CNN training pass. This one-time cost yields a final model that is more efficient and stable for deployment, but it does not reduce total computation during the optimisation phase.
Finally, the study focused on binary classification (authentic vs. tampered). Although this approach can detect the presence of manipulation, it cannot determine the type or degree of forgery. For practical deployment, future systems need to combine localisation with fine-grained tampering classification, extending the existing structure with semantic segmentation or multi-label classification.

5.5. Implications Concerning Theory, Policy, and Practice

Theoretically, this study makes the case that optimisation is central to effective deep learning in forensics. The global search behaviour of the FA supports theories of adaptive intelligence and self-organisation in computational models and is consistent with ideas from complex systems and evolutionary computation. By illustrating that metaheuristic optimisation can improve convergence in CNNs, the study gives empirical support to emerging models of hybrid intelligence, in which algorithmic evolution supplements gradient-based learning.
Regarding policy and institutional practice, the results highlight the need for AI-based forensic validation frameworks. Hybrid networks such as CNN-FA can be integrated into automated authenticity-checking pipelines by governments, law enforcement, and content platforms, whether for bulk datasets, continuous media streams, or legal document collections, to detect and report forgeries in the content they collect. In addition, the model’s demonstrated cross-domain flexibility is consistent with the ethical requirements of transparency and accountability in digital governance, provided that AI-based verification is also shown to be sound across various types of manipulation.
In practical terms, the research offers a reproducible model architecture that may be employed in cybersecurity systems, image authentication programmes, or cloud-based media forensics platforms. The roughly 18% reduction in training time and 3% improvement in accuracy indicate that optimisation-based frameworks are not only theoretically efficient but can also be deployed faster and with lower resource consumption.
These findings should inform future research in the following areas. First, dataset diversity needs to be increased: including real-world manipulated content from social media, synthetic image generators, and professional editing suites will improve ecological validity. Second, multimodal integration of visual, metadata, and sensor-based cues may result in holistic forensic systems that can detect both visible and invisible tampering. Third, interpretable optimisation schemes need to be developed that combine the global search capability of the FA with interpretable feature attribution, thereby strengthening the forensic and legal applicability of the model.
Furthermore, transfer learning and domain adaptation should be investigated to improve CNN-FA performance on unseen manipulation styles and new types of images, such as 3D-rendered and AI-generated images. Finally, hybrid optimisation that combines Firefly with adaptive learning methods (e.g., FA–Bayesian optimisation) may further improve convergence efficiency while maintaining model stability.
Future work will include robustness testing across a range of practical distortions, including JPEG quality degradation, Gaussian noise, impulse noise, resampling effects, and adversarial perturbations. Incorporating such evaluation protocols will enable a more comprehensive assessment of operational reliability and align the framework with emerging robustness benchmarks in image forensics.

6. Conclusions

The aim of this work was to create and test a hybrid convolutional neural network–Firefly Algorithm (CNN-FA) system that improves the accuracy, efficiency, and flexibility of forged digital image detection. The study was intended to address major limitations of conventional CNN-based forensic frameworks, specifically their reliance on manual hyperparameter optimisation, their domain-specific pitfalls, and their inconsistent generalisation across image categories. To this end, the Firefly Algorithm was integrated into the CNN training workflow to optimise the learning parameters, speed up convergence, and improve detection quality and robustness on both raster and vector images.
The experimental results showed that the proposed CNN-FA hybrid outperforms the baseline CNN models and classical machine-learning classifiers on several evaluation criteria. The FA-optimised model reached 98.4% accuracy on the CASIA v2.0 raster dataset and 96.9% accuracy on the custom vector dataset, a notable improvement in accuracy, recall, and F1-score over manually tuned models. Further, the cross-domain validation indicated that the model generalises well between pixel-based and structure-based image representations without substantial performance degradation. Computationally, FA optimisation reduced training time by about 18% and increased convergence stability, demonstrating its potential for resource-efficient use in large-scale forensic applications.
These findings have implications for both the theory and the practice of digital image forensics. Conceptually, the methodology demonstrates that image forgery detection can be reframed as an optimisation problem, in which a direct search for global hyperparameter settings enhances model generalisation and discriminative power. The combination of deep learning and swarm intelligence is a promising avenue for forensic models, since it suggests that nature-inspired optimisation can improve both precision and the interpretability of model behaviour. Practically, the proposed architecture offers a scalable and computationally efficient model for a broad spectrum of forensic applications, including cybersecurity, media identity verification, law enforcement, and intellectual property protection. The strong performance of the hybrid model on both raster and vector inputs contributes to efforts to build integrated forensic tools that can handle heterogeneous digital evidence.
Nevertheless, several limitations need to be acknowledged. The datasets used, however extensive, were mostly controlled and might not fully represent the complexity of real-world manipulations, such as layered forgeries, compression artefacts, or adversarial perturbations. Also, the computational effort of metaheuristic optimisation is still high, which suggests implementing the FA adaptively or in parallel. Additionally, the study did not include explainability mechanisms, which are increasingly important for transparency in algorithmic forensic decision-making. Addressing these constraints by incorporating real-world and adversarial data, multi-objective optimisation algorithms, and explainable AI components will make the framework more robust and more viable in practice.
The findings should be interpreted within the controlled conditions of the datasets used, as real-world distortions and adversarial conditions were not included in the current experimental design.
Going forward, future studies must address the cross-mode extension or expansion of this method through the incorporation of visual, metadata, and contextual cues in the same forensic models. Hybrid schemes of managing optimisation, such as FA in conjunction with Bayesian reinforcement theories, can bring more efficiency and stability. The current development of multimedia manipulation models, such as AI-generated images and deepfakes, suggests the topicality of adaptive forensics based on smart optimisation.
In summary, the CNN–FA hybrid framework presented in this paper is an important step for digital image forensics. By showing how metaheuristic-guided learning can narrow the gap between accuracy, efficiency, and adaptability, this study contributes to the theoretical literature and offers a viable route towards more reliable and trustworthy forensic intelligence systems for digital security.

Author Contributions

Conceptualization, A.A.R.B.; methodology, A.A.R.B.; software, A.A.R.B.; validation, A.A.R.B. and Y.A.; formal analysis, A.A.R.B.; investigation, A.A.R.B.; resources, A.A.R.B.; data curation, A.A.R.B. and Y.A.; writing—original draft preparation, A.A.R.B.; writing—review and editing, A.A.R.B. and Y.A.; visualization, A.A.R.B.; supervision, A.A.R.B.; project administration, A.A.R.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Custom Vector Dataset Construction and Manipulation Protocol

This appendix provides the full procedural details needed to replicate the custom vector-image dataset referenced in Section 3.1.1. Although the dataset cannot be publicly released due to the licensing constraints of some source materials, the following documentation specifies the complete workflow for re-creating a functionally equivalent dataset.

Appendix A.1. Source Repositories and Image Selection

Vector files were collected exclusively from openly accessible, free-licence repositories:
  • OpenClipArt Library (public domain)
  • Freepik Free Licence Collections
  • PublicDomainVectors.org
Files were chosen according to the following conditions:
  • Formats accepted: SVG and AI
  • Minimum complexity: At least three distinct path objects or shapes
  • No embedded raster images (pure vector content only)
  • Variety of themes: icons, illustrations, badges, geometric shapes, and simple artwork
  • Original files without prior manipulation
A total of 8712 vectors were selected, of which 5326 were used as authentic samples.

Appendix A.2. Manipulation Tools and Software Versions

All manipulations were conducted using the following tools:
  • Inkscape 1.3
  • Adobe Illustrator 2024 (v28)
  • Custom batch-editing scripts (Python 3.10 with svgpathtools and cairosvg)
Each tool’s settings and parameter ranges are documented below.

Appendix A.3. Tampering Operations and Parameters

Each authentic vector file was modified to produce one or more tampered versions (total: 3386).
1. Geometric Distortion
Performed in Inkscape and Illustrator.
  • Scaling: ±10%, ±20%, ±30%, ±40%
  • Aspect ratio distortion: 0.8–1.2
  • Rotation: 15°, 30°, 45°, 60°
  • Skew: horizontal ±15°, vertical ±15°
2. Object Removal
  • Manual deletion of 1–3 significant objects
  • Automated removal using a custom script that identifies top-level nodes and deletes them randomly
3. Object Insertion
  • Insertion of irrelevant shapes, icons, or cloned paths
  • Inserted objects scaled to 5–20% of total canvas area
4. Path Manipulation
  • Bézier control point displacement: 5–25 px
  • Path smoothing or sharpness alteration via Illustrator’s “Simplify Path” tool
5. Gradient and Colour Alteration
  • Gradient angle changes (90° increments)
  • Colour shifts: ±10 to ±35 hue units
  • Gradient stop count: increased or decreased by 1–2
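The automated object-removal step listed under item 2 can be illustrated with a short Python sketch. It is not the authors' script: it uses the standard-library `xml.etree.ElementTree` parser rather than svgpathtools, and the function name `remove_random_objects`, the `DRAWABLE` tag set, and the rule of always leaving at least one object in the file are assumptions made for illustration.

```python
import random
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Tags treated as top-level drawable objects (an assumption for this sketch).
DRAWABLE = {"path", "rect", "circle", "ellipse", "polygon", "polyline", "line", "g"}


def remove_random_objects(svg_text, rng, min_remove=1, max_remove=3):
    """Delete 1-3 randomly chosen top-level drawable nodes from an SVG string."""
    ET.register_namespace("", SVG_NS)  # keep the default SVG namespace on output
    root = ET.fromstring(svg_text)
    # Only direct children of <svg> count as "top-level" nodes here.
    candidates = [el for el in list(root) if el.tag.split("}")[-1] in DRAWABLE]
    # Assumption: never empty the drawing, so cap removals at len(candidates) - 1.
    k = min(rng.randint(min_remove, max_remove), max(len(candidates) - 1, 0))
    for el in rng.sample(candidates, k):
        root.remove(el)
    return ET.tostring(root, encoding="unicode"), k
```

Passing a seeded `random.Random(42)` instance as `rng` keeps the selection repeatable, in line with the fixed seed reported in Appendix A.5.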

Appendix A.4. Rasterisation Settings (For CNN Compatibility)

All vector files (authentic and tampered) were rasterised using identical parameters to create consistent pixel-based inputs for model training:
  • Renderer: CairoSVG 2.7
  • Output size: 512 × 512 px
  • Background: white (RGB 255, 255, 255)
  • DPI: 96
  • Anti-aliasing: enabled (default)
Each rasterised file was saved as a JPEG.
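The settings above can be expressed as a small Python sketch. CairoSVG's converters produce PNG, PDF, PS, EPS, or SVG output rather than JPEG, so this sketch assumes an intermediate PNG that is re-encoded with Pillow; the paper does not state how the JPEG files were written. The helper name `rasterise_to_jpeg` and the `JPEG_QUALITY` value are assumptions, and only SVG input is covered (AI files would need a separate conversion step).

```python
import io

# Values taken from Appendix A.4.
RASTER_KWARGS = {
    "output_width": 512,
    "output_height": 512,
    "dpi": 96,
    "background_color": "white",
}
JPEG_QUALITY = 95  # assumption: the paper does not report a JPEG quality setting


def rasterise_to_jpeg(svg_path, jpeg_path):
    """Render one SVG file to a 512 x 512 JPEG on a white background."""
    import cairosvg        # third-party: pip install cairosvg
    from PIL import Image  # third-party: pip install pillow

    # svg2png returns the PNG bytes when no write_to target is given.
    png_bytes = cairosvg.svg2png(url=svg_path, **RASTER_KWARGS)
    # Drop the alpha channel before saving, since JPEG has no transparency.
    image = Image.open(io.BytesIO(png_bytes)).convert("RGB")
    image.save(jpeg_path, "JPEG", quality=JPEG_QUALITY)
```

Because JPEG compression is lossy, the chosen quality setting affects the pixel-level traces the CNN sees, so it is worth recording alongside the other parameters when rebuilding the dataset.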

Appendix A.5. Randomisation and Reproducibility

  • Random seed: 42
  • Used for the following:
    selecting files for manipulation
    choosing manipulation types
    parameter randomisation in scripts
  • Python libraries: NumPy (seed 42), random (seed 42)
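A minimal sketch of how a single seed can drive file selection, manipulation choice, and parameter sampling is given below. It uses only the standard-library `random` module (a NumPy generator would be seeded in the same way). The names `SAMPLERS` and `plan_manipulations` are hypothetical, the parameter ranges are transcribed from Appendix A.3 with skew omitted for brevity, and treating the hue shift as a signed integer and the area fraction as uniformly distributed are assumptions.

```python
import random

SEED = 42  # seed value stated in Appendix A.5


def _signed(rng, low, high):
    """Draw an integer magnitude in [low, high] with a random sign."""
    return rng.choice([-1, 1]) * rng.randint(low, high)


# One sampler per tampering operation; ranges follow Appendix A.3.
SAMPLERS = {
    "geometric_distortion": lambda r: {
        "scale_pct": r.choice([-40, -30, -20, -10, 10, 20, 30, 40]),
        "aspect_ratio": round(r.uniform(0.8, 1.2), 2),
        "rotation_deg": r.choice([15, 30, 45, 60]),
    },
    "object_removal": lambda r: {"n_objects": r.randint(1, 3)},
    "object_insertion": lambda r: {"area_fraction": round(r.uniform(0.05, 0.20), 3)},
    "path_manipulation": lambda r: {"control_point_shift_px": r.randint(5, 25)},
    "gradient_colour": lambda r: {
        "angle_change_deg": r.choice([90, 180, 270]),
        "hue_shift": _signed(r, 10, 35),
        "stop_count_delta": r.choice([-2, -1, 1, 2]),
    },
}


def plan_manipulations(file_names, n_tampered, seed=SEED):
    """Return a repeatable list of (file, operation, parameters) tuples."""
    rng = random.Random(seed)  # private generator, independent of global state
    # Sort first so the plan does not depend on directory listing order.
    chosen = rng.sample(sorted(file_names), n_tampered)
    plan = []
    for name in chosen:
        kind = rng.choice(sorted(SAMPLERS))
        plan.append((name, kind, SAMPLERS[kind](rng)))
    return plan
```

Sorting the file names before sampling matters in practice: directory listings are not guaranteed to come back in the same order on every system, and an unsorted input would change the plan even with the same seed.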

Appendix A.6. Summary

This appendix provides the details needed (source repositories, manipulation tools, parameter ranges, rasterisation settings, and randomisation controls) to allow independent reconstruction of the dataset or generation of a comparable benchmark dataset.

References

  1. Nastasi, C. Advisor Sebastiano Battiato. Multimedia Forensics: From Image Manipulation to the Deep Fake. New Threats in the Social Media Era. 2021. Available online: https://tesidottorato.depositolegale.it/handle/20.500.14242/124097 (accessed on 29 October 2025).
  2. Amerini, I.; Barni, M.; Battiato, S.; Bestagini, P.; Boato, G.; Bruni, V.; Caldelli, R.; De Natale, F.; De Nicola, R.; Guarnera, L.; et al. Deepfake media forensics: Status and future challenges. J. Imaging 2025, 11, 73.
  3. Gupta, G.; Raja, K.; Gupta, M.; Jan, T.; Whiteside, S.T.; Prasad, M. A comprehensive review of deepfake detection using advanced machine learning and fusion methods. Electronics 2023, 13, 95.
  4. Bourouis, S.; Alroobaea, R.; Alharbi, A.M.; Andejany, M.; Rubaiee, S. Recent Advances in Digital Multimedia Tampering Detection for Forensics Analysis. Symmetry 2020, 12, 1811.
  5. Wang, X.; Wang, H.; Niu, S.; Zhang, J. Detection and localization of image forgeries using improved mask regional convolutional neural network. Math. Biosci. Eng. 2019, 16, 4581–4593.
  6. Camacho, I.C.; Wang, K. A Comprehensive Review of Deep-Learning-Based Methods for Image Forensics. J. Imaging 2021, 7, 69.
  7. Yang, X. Firefly Algorithms for Multimodal Optimization; International Symposium on Stochastic Algorithms; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5792, pp. 169–178.
  8. Ghasemi, M.; Mohammadi, S.K.; Zare, M.; Mirjalili, S.; Gil, M.; Hemmati, R. A new firefly algorithm with improved global exploration and convergence with application to engineering optimization. Decis. Anal. J. 2022, 5, 100125.
  9. Strumberger, I.; Tuba, E.; Bacanin, N.; Zivkovic, M.; Beko, M.; Tuba, M. Designing convolutional neural network architecture by the firefly algorithm. In Proceedings of the 2019 International Young Engineers Forum (YEF-ECE), Costa da Caparica, Portugal, 10 May 2019.
  10. Lukas, J.; Fridrich, J.; Goljan, M. Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 2006, 1, 205–214.
  11. Manisha; Li, C.-T.; Lin, X.; Kotegar, K.A. Beyond PRNU: Learning Robust Device-Specific Fingerprint for Source Camera Identification. Sensors 2022, 22, 7871.
  12. Rafique, R.; Gantassi, R.; Amin, R.; Frnda, J.; Mustapha, A.; Alshehri, A.H. Deep fake detection and classification using error-level analysis and deep learning. Sci. Rep. 2023, 13, 7422.
  13. Soni, B.; Biswas, D. Image forensic using block-based copy-move forgery detection. In Proceedings of the 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 22–23 February 2018.
  14. Dunsin, D.; Ghanem, M.C.; Ouazzane, K.; Vassilev, V. A comprehensive analysis of the role of artificial intelligence and machine learning in modern digital forensics and incident response. Forensic Sci. Int. Digit. Investig. 2024, 48, 301675.
  15. Jabbarlı, G.; Kurt, M. LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection. arXiv 2024, arXiv:2411.11826.
  16. He, Y. Improved Faster R-CNN Based on Multi-Scale Module for Detecting Weak Ground Targets in Remote Sensing Images. World Sci. 2025, 34, 2025.
  17. Lin, X.; Tang, W.; Wang, H.; Liu, Y.; Ju, Y.; Wang, S.; Yu, Z. Exposing image splicing traces in scientific publications via uncertainty-guided refinement. Patterns 2024, 5, 101038.
  18. Dong, J.; Wang, W.; Tan, T. Casia image tampering detection evaluation database. In Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China, 6–10 July 2013.
  19. Kaveh, M.; Mesgari, M.S. Application of meta-heuristic algorithms for training neural networks and deep learning architectures: A comprehensive review. Neural Process. Lett. 2023, 55, 4519–4622.
  20. Hassani, S.; Dackermann, U. A systematic review of optimization algorithms for structural health monitoring and optimal sensor placement. Sensors 2023, 23, 3293.
  21. Bacanin, N.; Bezdan, T.; Venkatachalam, K.; Al-Turjman, F. Optimized convolutional neural network by firefly algorithm for magnetic resonance image classification of glioma brain tumor grade. J. Real Time Image Process. 2021, 18, 1085–1098.
  22. Dhivya, P.; Kumaresan, T.; Subramanian, P.; Gunasekaran, K.; Kumar, G.S. Hybrid Firefly Meta Optimization for Bio Medical Image Processing Using Deep Learning. J. Pharm. Negat. Results 2022, 13, 1199–1209.
  23. Akay, B.; Karaboga, D.; Akay, R. A comprehensive survey on optimizing deep learning models by metaheuristics. Artif. Intell. Rev. 2022, 55, 829–894.
  24. Wu, Y.; AbdAlmageed, W.; Natarajan, P. Mantra-net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
  25. Cozzolino, D.; Verdoliva, L. Noiseprint: A CNN-based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 2019, 15, 144–159.
  26. Kennedy, J. Swarm intelligence. In Handbook of Nature-Inspired and Innovative Computing; Springer: Berlin/Heidelberg, Germany, 2006; pp. 187–219.
  27. Yang, J.; Zhu, G.; Luo, Y.; Kwong, S.; Zhang, X.; Zhou, Y. Forensic analysis of JPEG-domain enhanced images via coefficient likelihood modeling. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1006–1019.
  28. Hussain, W.; Mushtaq, M.F.; Shahroz, M.; Akram, U.; Ghith, E.S.; Tlija, M.; Kim, T.-H.; Ashraf, I. Ensemble genetic and CNN model-based image classification by enhancing hyperparameter tuning. Sci. Rep. 2025, 15, 1003.
  29. Inik, Ö. SwarmCNN: An efficient method for CNN hyperparameter optimization using PSO and ABC metaheuristic algorithms. J. Supercomput. 2025, 81, 874.
Figure 1. Different processes that form the operational backbone of DIF.
Figure 2. Proposed hybrid CNN–Firefly Algorithm framework for digital image forensics.
Figure 3. Hybrid CNN model architecture for tampered image detection.
Figure 4. Firefly Algorithm for CNN hyperparameter optimisation cycle.
Figure 5. Convergence curve of Firefly optimisation process.
Figure 6. Visual examples of correct and incorrect classification outcomes produced by the proposed CNN–FA model: (a) True Positive; (b) True Negative; (c) False Positive; (d) False Negative.
Figure 7. ROC curves for raster-based forgery detection.
Figure 8. Confusion matrix for vector-image classification.
Figure 9. Comparative performance across domains.
Figure 10. Visual examples of novel authentic and manually created tampered samples: (a) Authentic Image 1; (b) Tampered Image 1; (c) Authentic Image 2; (d) Tampered Image 2.
Figure 11. Convergence curves of baseline vs. FA-optimised CNNs.
Table 1. Summary of datasets used in the study.

| Dataset Name | Image Type | Authentic Samples | Tampered Samples | Format(s) | Manipulation Types | Source |
|---|---|---|---|---|---|---|
| CASIA v2.0 | Raster | 7491 | 5123 | JPEG | Splicing, Copy-Move, Retouching | Dong et al. [18] |
| Custom Vector Dataset | Rasterised-Vector | 5326 | 3386 | SVG, AI (JPEG after rasterisation) | Object Insertion, Deletion, Geometric Distortion, Colour and Gradient Editing | Created from open repositories (2025) |
Table 2. Optimised hyperparameters and corresponding fitness scores.

| Hyperparameter | Range Tested | Optimal Value (FA) | Baseline (Manual) | Relative Improvement |
|---|---|---|---|---|
| Learning Rate (α) | 0.0001–0.01 | 0.0017 | 0.003 | +13.2% stability |
| Conv Layers (L) | 3–10 | 7 | 6 | |
| Filter Size (f) | 3 × 3–7 × 7 | 5 × 5 | 3 × 3 | |
| Dropout Rate (d) | 0.1–0.5 | 0.25 | 0.3 | +4.5% accuracy |
| Batch Size (b) | 16–256 | 64 | 128 | −18% training time |
| Validation Accuracy (%) | | 98.7 ± 0.2 | 96.4 ± 0.4 | +2.3% |
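To make the search behind Table 2 concrete, the following Python sketch runs a basic Firefly Algorithm over two of the listed ranges (learning rate on a log scale, and dropout rate). It is a simplified illustration, not the authors' implementation: the fitness function is a toy stand-in for validation accuracy with an arbitrarily placed peak, and the swarm size, iteration count, `beta0`, `gamma`, `alpha`, and the step-decay factor are all assumed values.

```python
import math
import random

# Search box: log10(learning rate) and dropout rate, using the ranges in Table 2.
BOUNDS = [(-4.0, -2.0), (0.1, 0.5)]


def toy_fitness(params):
    """Stand-in for validation accuracy; the peak location is arbitrary."""
    log_lr, dropout = params
    return -((log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2)


def firefly_search(fitness, bounds, n_fireflies=15, n_iter=40,
                   beta0=1.0, gamma=1.0, alpha=0.2, seed=42):
    """Maximise `fitness` over a box with a basic Firefly Algorithm."""
    rng = random.Random(seed)
    dim = len(bounds)

    def decode(unit):
        # Map a point in the unit cube back to the real parameter ranges.
        return [lo + u * (hi - lo) for u, (lo, hi) in zip(unit, bounds)]

    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    light = [fitness(decode(u)) for u in swarm]

    for t in range(n_iter):
        step = alpha * (0.97 ** t)  # shrink the random walk over time
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:  # move i towards the brighter j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a)
                                     + step * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    light[i] = fitness(decode(swarm[i]))

    best = max(range(n_fireflies), key=light.__getitem__)
    return decode(swarm[best]), light[best]


best_params, best_score = firefly_search(toy_fitness, BOUNDS)
```

In a real run, each call to `fitness` would train and validate a CNN with the decoded hyperparameters, which is where the computational cost of the search comes from. Integer settings such as layer count or batch size would need rounding inside the decode step.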
Table 3. Performance comparison on CASIA v2.0 dataset (raster images).

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC |
|---|---|---|---|---|---|
| SVM (Classical) | 91.2 | 90.8 | 91.5 | 91.1 | 0.926 |
| Random Forest | 92.6 | 91.9 | 92.4 | 92.1 | 0.941 |
| CNN (Baseline) | 95.8 | 95.3 | 95.6 | 95.4 | 0.964 |
| CNN (Manual Tuning) | 96.2 | 96.0 | 96.1 | 96.0 | 0.971 |
| CNN–FA (Proposed) | 98.4 | 98.1 | 97.9 | 98.0 | 0.992 |
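As a quick consistency check on Table 3, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below does this from the rounded table values, so it can only confirm agreement to one decimal place; the dictionary name `TABLE_3` is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) as listed in Table 3.
TABLE_3 = {
    "SVM (Classical)": (90.8, 91.5, 91.1),
    "Random Forest": (91.9, 92.4, 92.1),
    "CNN (Baseline)": (95.3, 95.6, 95.4),
    "CNN (Manual Tuning)": (96.0, 96.1, 96.0),
    "CNN–FA (Proposed)": (98.1, 97.9, 98.0),
}

recomputed = {
    model: round(f1_score(p, r), 1) for model, (p, r, _) in TABLE_3.items()
}
```

For every row, the recomputed value matches the reported F1-score after rounding to one decimal place.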
Table 4. Performance metrics for vector-image tampering detection.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC |
|---|---|---|---|---|---|
| Random Forest | 89.6 | 90.4 | 88.9 | 89.6 | 0.918 |
| CNN (Baseline) | 93.5 | 93.0 | 93.3 | 93.1 | 0.955 |
| CNN (Manual Tuning) | 94.2 | 94.0 | 94.1 | 94.0 | 0.961 |
| CNN–FA (Proposed) | 96.9 | 96.4 | 97.1 | 96.8 | 0.982 |
Table 5. Cross-domain generalisation and combined dataset results.

| Training → Testing Domain | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Raster → Vector | 97.5 | 97.3 | 97.6 | 97.4 |
| Vector → Raster | 98.0 | 97.5 | 98.1 | 97.8 |
| Combined (Train + Test) | 98.1 | 97.9 | 98.0 | 97.9 |
Table 6. Training time and computational cost comparison.

| Model | Training Time (h) | Epochs to Converge | GPU Utilisation (%) | Relative Speed Gain |
|---|---|---|---|---|
| CNN (Baseline) | 9.8 | 70 | 83 | |
| CNN (Manual Tuning) | 8.7 | 68 | 81 | +11.2% |
| CNN–FA (Proposed) | 7.1 | 57 | 78 | +18.4% |
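The table does not state the reference point for the "Relative Speed Gain" column. The short sketch below shows that the reported figures are reproduced when each gain is computed against the training time of the preceding row; this reading is an inference from the numbers, not something the source states, and the function name `relative_gain` is illustrative.

```python
def relative_gain(previous_hours, current_hours):
    """Percentage reduction in training time relative to the previous model."""
    return (previous_hours - current_hours) / previous_hours * 100.0


# Training times in hours, as listed in Table 6.
times = [
    ("CNN (Baseline)", 9.8),
    ("CNN (Manual Tuning)", 8.7),
    ("CNN–FA (Proposed)", 7.1),
]

# Gain of each model over the row directly above it.
gains = {
    cur[0]: round(relative_gain(prev[1], cur[1]), 1)
    for prev, cur in zip(times, times[1:])
}

# Gain of the proposed model measured against the untuned baseline instead.
fa_vs_baseline = round(relative_gain(times[0][1], times[2][1]), 1)
```

Under this reading, the +18.4% figure is measured against the manually tuned CNN. Measured against the baseline CNN, the same training times give a reduction of about 27.6%.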
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bsoul, A.A.R.; Alshboul, Y. Integrating Convolutional Neural Networks with a Firefly Algorithm for Enhanced Digital Image Forensics. AI 2025, 6, 321. https://doi.org/10.3390/ai6120321
