Figure 1.
Example of image splicing forgery. (a) The authentic image. (b) The forged image. (c) The ground truth.
Figure 2.
The overall architecture of the proposed AMSEANet. The network follows a standard encoder–decoder structure, incorporating three main innovative modules: the Cross-Scale Dense Residual Fusion Block (CSDRFBlock) for multi-scale feature enhancement, the Edge-Aware Spatial-Frequency Fusion Module (ESFFM) for guided trace perception, and the MGFA mechanism in the skip connections.
Figure 3.
The architecture of the proposed MGFA module.
Figure 4.
The architecture of the proposed CSDRFBlock. The blocks on the left represent the input multi-scale feature maps. The module processes them through its three main components: a multi-scale feature generation layer, a top-down cross-scale interaction branch, and a bottom-up residual enhancement path to produce the fused output. The red arrows indicate downsampling operations, while the dark red arrows within the residual path represent upsampling operations.
Figure 5.
A diagram of the structure of SimpleGate.
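As described for Figure 5, SimpleGate (the NAFNet-style gate) replaces a nonlinear activation by splitting the feature map in half along the channel dimension and multiplying the two halves elementwise. A minimal NumPy sketch of that split-and-multiply design (array layout and names are illustrative):

```python
import numpy as np

def simple_gate(x: np.ndarray) -> np.ndarray:
    """Split the channel axis (axis 0 here) in half and multiply the halves.

    x: feature map of shape (C, H, W) with C even.
    Returns an array of shape (C // 2, H, W).
    """
    c = x.shape[0]
    assert c % 2 == 0, "channel count must be even"
    x1, x2 = x[: c // 2], x[c // 2 :]
    return x1 * x2

# Example: 4 channels of 2x2 features -> 2 gated channels.
x = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
y = simple_gate(x)
print(y.shape)  # (2, 2, 2)
```

The gate is parameter-free, which is why it can replace GELU/ReLU without adding weights.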
Figure 6.
The architecture of the proposed Simplified Channel Attention Block (SCABlock) and the Multi-Receptive Field Block (MRFBlock).
Figure 7.
The architecture of the Edge-Aware Feature Enhancement Block (EFEBlock). The block is composed of two ESFFM modules and a ConvBlock in series, with a weight-sharing strategy applied between corresponding modules.
Figure 8.
Detailed architecture of the Edge-Aware Module (ESFFM) and its core component, the Adaptive Smooth Filter (ASF). The diagram illustrates how ESFFM uses frequency decomposition to generate an attention map for feature enhancement. The dashed box details the working principle of its core component, the ASF, which dynamically generates a smoothing kernel from learnable parameters.
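The ASF in Figure 8 dynamically generates a smoothing kernel from learnable parameters; subtracting the smoothed map from the input then isolates high-frequency residuals. The sketch below is one plausible reading of that mechanism, not the paper's exact formulation: a softmax over nine learnable scalars yields a normalized 3 × 3 kernel (all names and the kernel size are assumptions):

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())
    return e / e.sum()

def adaptive_smooth(x: np.ndarray, raw_params: np.ndarray):
    """Illustrative ASF-style filter: softmax over learnable parameters
    yields a normalized 3x3 smoothing kernel; the high-frequency residual
    is the input minus its smoothed version."""
    kernel = softmax(raw_params).reshape(3, 3)  # weights sum to 1
    pad = np.pad(x, 1, mode="edge")
    smooth = np.zeros_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            smooth[i, j] = (pad[i : i + 3, j : j + 3] * kernel).sum()
    return smooth, x - smooth

x = np.random.rand(8, 8)
params = np.zeros(9)  # equal params -> uniform 3x3 averaging kernel
smooth, high_freq = adaptive_smooth(x, params)
```

Because the kernel is produced by a softmax, it always sums to one, so the low-frequency branch stays a proper weighted average regardless of the learned parameter values.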
Figure 9.
Diagram illustrating the principle of the traditional Dice loss function. The method measures the quality of a segmentation result by comparing the Predicted Forgery Mask and the Ground Truth Forgery Mask based on their intersection and non-overlapping regions.
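The Dice loss illustrated in Figure 9 scores a predicted forgery mask against the ground truth by their overlap: loss = 1 − 2|P ∩ G| / (|P| + |G|). A minimal NumPy sketch of the standard (soft) form, with a small epsilon for numerical stability:

```python
import numpy as np

def dice_loss(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss between a predicted mask and a ground-truth mask:
    1 - 2|P ∩ G| / (|P| + |G|)."""
    inter = (pred * gt).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

gt = np.zeros((4, 4)); gt[:2, :] = 1        # top half forged
perfect = gt.copy()
half = np.zeros((4, 4)); half[0, :] = 1     # only the first row detected
print(round(dice_loss(perfect, gt), 3))     # 0.0
print(round(dice_loss(half, gt), 3))        # 1 - 2*4/(4+8) = 0.333
```

Note that the plain Dice loss weights all pixels equally, which is exactly the limitation the paper's edge-guided ME-Dice variant is designed to address.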
Figure 10.
Tamper localization results of AMSEANet and other methods on public datasets. The rows, from top to bottom, show two representative samples each from the CASIA v2.0, NIST’16, and COLUMBIA datasets, respectively.
Figure 11.
Visual comparison of the loss function ablation study. (a) Input image and (g) ground truth. (b–f) show the prediction results under different loss function configurations, where (f) is the result from our final Multi-Scale Edge Dice loss (ME-Dice).
Figure 12.
Comparative diagrams of different frequency-domain information fusion strategies. The figure illustrates the six different feature fusion methods based on fixed filters (SRM, Gaussian filter) and the adaptive filter (ASFilter) used in the comparative experiments.
Figure 13.
Comparison results under noise addition attack. The three columns represent Precision, Recall, and F-measure, respectively. (a–c) show the experimental results on the CASIA v2.0, COLUMBIA, and NIST'16 datasets, respectively.
Figure 14.
Comparison results under JPEG compression attack. The three columns represent Precision, Recall, and F-measure, respectively. (a–c) show the experimental results on the CASIA v2.0, COLUMBIA, and NIST'16 datasets, respectively.
Figure 15.
Comparison results under resize operation attack. The three columns represent Precision, Recall, and F-measure, respectively. (a–c) show the experimental results on the CASIA v2.0, COLUMBIA, and NIST'16 datasets, respectively.
Figure 16.
Successful detection results of AMSEANet on natural background replacement samples. Subfigure (a) shows the original images, while the model’s prediction (b) is highly consistent with the ground truth (c), demonstrating its effectiveness.
Figure 17.
Limitation analysis of AMSEANet on highly challenging samples. Subfigure (a) shows the original images, while red boxes highlight discrepancies between the prediction (b) and the ground truth (c), showing that the model may produce incomplete detections on contours with extremely fine-grained details.
Table 1.
Dataset partitioning and parameter settings for robustness tests.
| Sets | Subset | Param | Range | Step | CASIA v2.0 | COLUMBIA | NIST'16 |
|---|---|---|---|---|---|---|---|
| Training | - | - | - | - | 1300 | 125 | 180 |
| Validation | - | - | - | - | 100 | 10 | 15 |
| Testing | Plain | - | - | - | 200 | 45 | 60 |
| | JPEG compression | QF | 50–90 | 10 | 200 × 5 | 45 × 5 | 60 × 5 |
| | Noise addition | Var | 0.002–0.01 | 0.002 | 200 × 5 | 45 × 5 | 60 × 5 |
| | Resize operation | Ratio | 0.5–0.9 | 0.1 | 200 × 5 | 45 × 5 | 60 × 5 |
| All images | - | - | - | - | 4600 | 855 | 1155 |
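The "All images" row in Table 1 follows from the splits above: each dataset's total is training + validation + plain test, plus the plain test set replicated under 3 attack types × 5 parameter settings. A quick arithmetic check:

```python
# Sanity check of the "All images" row in Table 1: total per dataset =
# training + validation + plain test + (3 attack types x 5 settings) x plain test.
def total_images(train: int, val: int, test: int) -> int:
    return train + val + test + 3 * 5 * test

print(total_images(1300, 100, 200))  # 4600 (CASIA v2.0)
print(total_images(125, 10, 45))     # 855  (COLUMBIA)
print(total_images(180, 15, 60))     # 1155 (NIST'16)
```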
Table 2.
Performance comparison of our proposed method (Ours) against several state-of-the-art forgery detection methods. The best results are highlighted in bold.
| Method | CASIA v2 (Precision / Recall / F) | COLUMBIA (Precision / Recall / F) | NIST'16 (Precision / Recall / F) |
|---|---|---|---|
| NOI | 0.281 / 0.787 / 0.414 | 0.394 / 0.932 / 0.554 | 0.159 / 0.643 / 0.255 |
| CFA | 0.255 / 0.450 / 0.325 | 0.497 / 0.512 / 0.504 | 0.196 / 0.218 / 0.206 |
| ManTra-Net | 0.809 / 0.786 / 0.797 | 0.836 / 0.894 / 0.864 | 0.819 / 0.861 / 0.839 |
| RRU-Net | 0.851 / 0.836 / 0.843 | 0.924 / 0.822 / 0.871 | 0.789 / 0.742 / 0.765 |
| C2R-Net | 0.611 / 0.741 / 0.669 | 0.567 / 0.848 / 0.680 | 0.477 / 0.675 / 0.559 |
| CAT-Net | 0.833 / **0.869** / 0.851 | 0.941 / 0.952 / 0.946 | **0.924** / 0.893 / 0.908 |
| PSCC-Net | 0.821 / 0.806 / 0.813 | 0.919 / 0.949 / 0.934 | 0.747 / 0.782 / 0.764 |
| Ours | **0.887** / 0.865 / **0.877** | **0.965** / **0.963** / **0.964** | 0.917 / **0.921** / **0.919** |
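The F column in Table 2 is presumably the F-measure, i.e., the harmonic mean of precision and recall. As a sanity check on the "Ours" CASIA v2 row (the dataset-level harmonic mean of the reported P and R lands close to the reported 0.877; small gaps can arise when F is averaged per image rather than computed from aggregate P and R):

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)

# "Ours" on CASIA v2: reported P = 0.887, R = 0.865, F = 0.877.
print(round(f_measure(0.887, 0.865), 3))  # 0.876, close to the reported 0.877
```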
Table 3.
Results of the ablation study on the core components of AMSEANet, conducted on the CASIA v2.0 dataset. The checkmark (🗸) indicates that the module is included in the configuration, while the cross (×) indicates its exclusion.
| Method Set | ME-Dice | CSDRF | ESFFM | MGFA | Precision | Recall | F |
|---|---|---|---|---|---|---|---|
| Baseline | × | × | × | × | 0.703 | 0.715 | 0.709 |
| Edgebase | 🗸 | × | × | × | 0.794 | 0.781 | 0.787 |
| Edgebase + ESFFM | 🗸 | × | 🗸 | × | 0.801 | 0.821 | 0.811 |
| Edgebase + MGFA | 🗸 | × | × | 🗸 | 0.810 | 0.798 | 0.804 |
| Edgebase + CSDRF | 🗸 | 🗸 | × | × | 0.837 | 0.832 | 0.834 |
| Edgebase + CSDRF + MGFA | 🗸 | 🗸 | × | 🗸 | 0.859 | 0.845 | 0.852 |
| Edgebase + CSDRF + ESFFM | 🗸 | 🗸 | 🗸 | × | 0.869 | 0.858 | 0.863 |
| Edgebase + MGFA + ESFFM | 🗸 | × | 🗸 | 🗸 | 0.835 | 0.843 | 0.839 |
| Ours | 🗸 | 🗸 | 🗸 | 🗸 | 0.887 | 0.865 | 0.877 |
Table 4.
Ablation study for the edge-guided loss function. L_Sobel denotes the edge loss using the standard 2D Sobel operator. L_1×3 and L_1×5 represent the edge losses using only the 1 × 3 and 1 × 5 1D gradient kernels, respectively. L_edge is the combination of L_1×3 and L_1×5.
| Loss Composition | Precision | Recall | F |
|---|---|---|---|
| Dice | 0.881 | 0.720 | 0.792 |
| Dice + L_Sobel | 0.917 | 0.759 | 0.831 |
| Dice + L_1×3 | 0.843 | 0.876 | 0.859 |
| Dice + L_1×5 | 0.924 | 0.813 | 0.865 |
| Dice + L_edge (Ours) | 0.887 | 0.865 | 0.877 |
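The edge losses in Table 4 compare gradient maps of the predicted and ground-truth masks, extracted with 1D kernels applied along rows and columns. The sketch below illustrates such 1D edge extraction on a binary mask; the specific kernel weights ([-1, 0, 1] for 1 × 3 and [-1, -2, 0, 2, 1] for 1 × 5) are illustrative assumptions, since the paper's exact coefficients are not given here:

```python
import numpy as np

def edge_map_1d(mask: np.ndarray, kernel) -> np.ndarray:
    """Gradient magnitude of a mask using a 1D kernel applied along rows
    and columns (illustrative stand-in for the edge-loss extraction)."""
    k = np.asarray(kernel, dtype=np.float32)
    pad = len(k) // 2
    gx = np.zeros(mask.shape, dtype=np.float32)
    gy = np.zeros(mask.shape, dtype=np.float32)
    m = np.pad(mask.astype(np.float32), pad, mode="edge")
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            gx[i, j] = (m[i + pad, j : j + len(k)] * k).sum()
            gy[i, j] = (m[i : i + len(k), j + pad] * k).sum()
    return np.abs(gx) + np.abs(gy)

mask = np.zeros((6, 6)); mask[2:4, 2:4] = 1       # small forged square
edges3 = edge_map_1d(mask, [-1, 0, 1])            # assumed 1x3 kernel
edges5 = edge_map_1d(mask, [-1, -2, 0, 2, 1])     # assumed 1x5 kernel
print(bool(edges3.max() > 0 and edges3[0, 0] == 0))  # True: edges fire only near the boundary
```

An edge loss would then apply the Dice formulation to these edge maps, so boundary pixels are weighted explicitly; the wider 1 × 5 kernel responds in a larger band around the contour.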
Table 5.
Performance comparison of the ME-Dice loss against other advanced loss functions. The best results are highlighted in bold.
| Loss Composition | Precision | Recall | F-Measure |
|---|---|---|---|
| Dice + Boundary Loss | 0.851 | 0.847 | 0.849 |
| Dice + Focal Loss | 0.854 | 0.821 | 0.837 |
| Dice + Active Contour Loss | **0.894** | 0.827 | 0.859 |
| Dice + L_edge (ME-Dice, Ours) | 0.887 | **0.865** | **0.877** |
Table 6.
Performance comparison of different frequency-domain information fusion strategies. The best results are highlighted in bold.
| Method | CASIA v1 (Precision / Recall / F) | COLUMBIA (Precision / Recall / F) |
|---|---|---|
| Normal | 0.835 / 0.672 / 0.745 | 0.962 / 0.922 / 0.941 |
| SRM-AddHF | 0.783 / 0.698 / 0.738 | 0.923 / 0.941 / 0.932 |
| SRM-CatDiff | 0.828 / 0.695 / 0.755 | **0.964** / 0.922 / 0.942 |
| GaussFix-AddHF | 0.805 / 0.676 / 0.735 | 0.878 / 0.930 / 0.903 |
| LS-AddHF | 0.833 / 0.702 / 0.762 | 0.963 / 0.923 / 0.943 |
| LS-AddLF | 0.806 / **0.713** / 0.757 | 0.950 / 0.939 / 0.945 |
| Ours | **0.867** / 0.687 / **0.767** | **0.964** / **0.942** / **0.953** |
Table 7.
Computational complexity comparison of different methods.
| Method | Parameters (M) | FLOPs (G) | FPS |
|---|---|---|---|
| MVSS-Net [5] | 146.88 | 40.89 | 37.3 |
| ManTra-Net [14] | 3.80 | 67.90 | 26.4 |
| PSCC-Net [40] | 2.75 | 105.72 | 40.7 |
| AMSEANet | 52.08 | 124.86 | 35.1 |