Communication

Interpretability of Deep High-Frequency Residuals: A Case Study on SAR Splicing Localization

Edoardo Daniele Cannas *, Sara Mandelli, Paolo Bestagini and Stefano Tubaro
Image and Sound Processing Lab (ISPL), Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
* Author to whom correspondence should be addressed.
J. Imaging 2025, 11(10), 338; https://doi.org/10.3390/jimaging11100338
Submission received: 14 July 2025 / Revised: 22 September 2025 / Accepted: 23 September 2025 / Published: 28 September 2025

Abstract

Multimedia Forensics (MMF) investigates techniques to automatically assess the integrity of multimedia content, e.g., images, videos, or audio clips. Data-driven methodologies like Neural Networks (NNs) represent the state of the art in the field. Despite their efficacy, NNs are often considered “black boxes” due to their lack of transparency, which limits their usage in critical applications. In this work, we assess the interpretability properties of Deep High-Frequency Residuals (DHFRs), i.e., noise residuals extracted from images by NNs for forensic purposes, which nowadays represent a powerful tool for image splicing localization. Our research demonstrates that DHFRs not only serve as a visual aid in identifying manipulated regions in the image but also reveal the nature of the editing techniques applied to tamper with the sample under analysis. Through extensive experimentation on spliced amplitude Synthetic Aperture Radar (SAR) images, we establish a correlation between the appearance of the DHFRs in the tampered-with zones and their high-frequency energy content. Our findings suggest that, despite their deep learning nature, DHFRs possess significant interpretability properties, encouraging further exploration in other forensic applications.

1. Introduction

Multimedia Forensics (MMF) focuses on assessing the integrity of multimedia objects, such as digital pictures, audio clips, videos, or satellite imagery [1,2]. Historically, researchers looked for forensic footprints, i.e., traces left by editing operations at a signal processing level, to unveil the history of the digital object at hand. However, data-driven solutions like Neural Networks (NNs) and Convolutional Neural Networks (CNNs) now represent the state of the art in the field [3]. NNs can automatically extract meaningful features from data corpora, relieving researchers of the burden of searching for specific footprints, and nowadays outperform standard signal processing techniques on almost every forensic task.
One main criticism against data-driven techniques is their lack of interpretability and explainability [4,5,6]. NNs are often regarded as “black box” tools, meaning that it is unclear what elements in their input are important for making a decision or how NNs process them [6]. These properties are relevant whenever data-driven tools make high-stakes choices, e.g., in criminal justice or healthcare domains. The same applies to MMF, whose relevance in law, justice, and fighting misinformation is paramount [7,8].
Over the years, high-frequency residuals extracted from images by CNNs, here referred to for simplicity as “Deep High-Frequency Residuals (DHFRs)”, have been demonstrated to be a powerful instrument for MMF, especially for image splicing localization [9,10,11]. This task focuses on spatially localizing traces left by editing operations locally applied to the image, e.g., inserting a portion of an image into another one or deleting a pixel area from the sample under attack. State-of-the-art solutions based on DHFRs, like Noiseprint [10], TruFor [11], and the detector developed in [9], provide a heatmap localizing spliced areas as inconsistencies in the editing traces of the image under analysis. This heatmap can be inspected by the naked eye, giving immediate feedback to the user about the clues found by the CNN. Therefore, we can assert to a certain extent that such solutions are interpretable.
This work aims to dig deeper into the interpretability capabilities of DHFRs. We claim not only that DHFRs provide visual feedback but that the appearance of the tampered-with area indicates the nature of the editing executed on it.
We focus on the specific scenario of amplitude Synthetic Aperture Radar (SAR) image splicing localization for our experiments. SAR images are bi-dimensional representations of backscattered radar waves with numerous applications, especially in intelligence and military operations [12,13,14,15,16]. Spliced amplitude SAR images are images in which a malicious actor has substituted a pixel region with another one coming from a different sample, and additional processing has been applied to hinder this manipulation [9]. SAR images are characterized by phenomena like speckle noise, layover, or multi-path effects [13]. Moreover, their generation process involves many signal processing operations, e.g., resampling [17], ground-range projection [18], etc. To make a spliced sample more plausible, a malicious user can resort to a wider range of editing operations with respect to natural images, such as noise addition, blurring, resizing, etc. All these operations have different impacts on the high-frequency residual extracted from these images, making SAR imagery splicing localization particularly fascinating for our study.
Figure 1 reports an example of our findings. We show that whenever a malicious actor applies a blurring or a resampling edit on the tampered-with area, the resulting pixels in the DHFR present low amplitude, i.e., they appear as “dark spots”; conversely, whenever the user applies noise-based editing, the tampered-with area appears as a “bright” spot. We conjecture this behavior is related to the nature of DHFRs and to the considered editing operations: blur-like editing works as low-pass filtering, hence diminishing the power of the signal in the high-frequency range and resulting in low-brightness pixels; noise-based editing increases the power of the signal and widens its spectral content, enhancing the DHFR brightness.
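To make this intuition concrete, the following minimal sketch (our own illustration, not code from the paper; all parameters, names, and the synthetic tile are assumptions chosen for demonstration) compares the high-frequency energy of a blurred and a noise-corrupted tile:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
img = rng.rayleigh(scale=1.0, size=(256, 256))  # crude stand-in for an amplitude SAR tile

def high_freq_energy(x, cutoff=0.25):
    """Sum |FFT|^2 outside a central low-frequency square (cutoff as a fraction of Nyquist)."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    h, w = x.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(cutoff * cy), int(cutoff * cx)
    power[cy - ry:cy + ry, cx - rx:cx + rx] = 0.0  # discard low frequencies
    return power.sum()

blurred = uniform_filter(img, size=5)                # average blur, i.e., low-pass filtering
noisy = img + rng.normal(scale=0.5, size=img.shape)  # AWGN, i.e., broadband noise

# Expected ordering: blur suppresses, noise boosts high-frequency energy.
print(high_freq_energy(blurred) < high_freq_energy(img) < high_freq_energy(noisy))
```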
We evaluate our findings on a dataset of spliced amplitude SAR images presented in [9], confirming that DHFRs of blurring and noise-based edited samples appear consistently across the whole dataset. Moreover, we experimentally check our previously reported intuition, i.e., that the appearance of spliced areas in the DHFR links to their power spectral content.
The main contributions of our work are the following:
  • We show that DHFRs extracted from spliced amplitude SAR images present different appearance depending on the nature of the editing operation executed on them;
  • We link this phenomenon to the ability of DHFR to capture high-frequency-related traces, in particular, the energy content of the image in the high-frequency range.

2. Background

2.1. Multimedia Forensics and High-Pass Frequency Residuals

Historically, MMF focused on analyzing digital pictures. Stamm et al. [19] provide a detailed overview of all the techniques and tasks undertaken in recent years. For instance, many works propose techniques to detect Forensic Footprints (FFs) left by operations executed on the whole picture, e.g., resampling operations [20,21,22], application of median filters [23,24], or multiple image compressions [25,26,27].
Another lively research topic is the localization of splicing editing. As mentioned earlier, splicing refers to inserting a portion of an image into another one, with the possible execution of further editing, aiming to conceal a specific pixel area. Splicing localization means spatially identifying (i.e., at a pixel level) which areas of an image have undergone editing. Many works in the literature rely on the information carried by the so-called noise or high-frequency residual to accomplish this task [28,29,30]. This residual is a picture obtained by removing the high-level semantic content from the image through high-pass filtering, assuming that editing traces leave distinctive marks in high-frequency ranges.
Thanks to the automatic extraction of forensic traces executed by data-driven methods, deep learning approaches have become popular in MMF. CNNs in particular are now the State Of The Art (SOTA) for the task of image tampering localization. Some works combine CNNs with the idea of high-frequency residuals by either using a fixed high-pass convolutional filter as the first layer of the network [31,32] or making the network learn the most appropriate filter during training [33,34].
More recent contributions further elaborate on this idea by exploiting the representation capabilities of Denoiser-Convolutional Neural Networks (Dn-CNNs), i.e., CNNs developed for image denoising, whose basic functioning consists of estimating a DHFR from the image under analysis. In particular, the forensic community heavily exploited the Dn-CNN [35], e.g., for PRNU anonymization [36]. This is also the case with Noiseprint [10], i.e., a Dn-CNN that extracts a noise residual, suppressing the vast majority of the image content, and exposes editing-related artifacts due to local image forgery. When analyzing pristine images, the Noiseprint is self-consistent, whereas in the case of spliced images, it highlights the edited regions. Table 1 reports an overview of all solutions exploiting the DHFR and its applications.

2.2. SAR Imagery and Forensics

In recent years, due to the diffusion of portals that offer them for free, satellite images have been targeted by malicious manipulations [1], including splicing editing [38,39]. Among the different modalities of imagery available, Synthetic Aperture Radar (SAR) refers to a particular modality of satellite data consisting of bi-dimensional representations of backscattered radar waves. Such data have a wide range of applications [40] thanks to their independence from cloud coverage, weather conditions, and daylight [12,13,41]. These characteristics are particularly appealing in intelligence and military operations, e.g., to detect sensitive military targets like airports [42] and aircraft [14], ships [15], tanks, or other vehicles [16].
SAR images are affected by some characteristic phenomena, like layover, speckle noise, etc., that, together with their peculiar lifecycle, make standard FFs not suitable for their analysis [1]. However, especially when provided as amplitude products, they are easy to manipulate with standard image editing software like Photoshop or GIMP [9], or even synthetic image manipulation tools [1,43].
For these reasons and their centrality in sensitive applications, the forensic community developed tailored solutions. An example is [9], where the authors adapted the concept of DHFR to the SAR scenario to localize splicing editing on amplitude imagery. Similarly to [10], their core idea is to train a Dn-CNN to extract a consistent DHFR only for pristine amplitude SAR tiles, i.e., whenever a SAR image is spliced, the tampered-with area should present a different appearance due to the inhomogeneity of local processing traces. The authors then tested their solution against a dataset of spliced amplitude SAR images, showing good localization performance across various editing operations applied to the spliced area.

3. Amplitude SAR Imagery Splicing Localization

The lifecycle of SAR data is characterized by processing pipelines and degradation phenomena that vary from simple resampling procedures to different kinds of noises [9,43]. A malicious actor realizing a splicing attack may rely on various editing operations to make it more plausible, from noise additions and speckle noise injection to blurring or affine transformations. All these operations have different impacts on the frequency content of the tampered-with area, making amplitude SAR imagery splicing localization a relevant case study.
Formally, we define a spliced amplitude SAR image as follows. Let us consider two amplitude SAR images, a donor image $D$ and a target image $T$. A splicing operation modifies the pixels in a target region $\mathcal{T}$ of $T$ using pixels from a donor region $\mathcal{D}$ of $D$. As explained previously, a malicious user might also apply some editing to make the splicing more plausible. Without loss of generality, we hypothesize the user first creates an edited version of the donor image $E(D)$, with $E(\cdot)$ being a suitable editing function, and then selects $\mathcal{D}$ accordingly. Notice that $\mathcal{T}$ and $\mathcal{D}$ have the same shape and orientation but might differ in their position in $T$ and $E(D)$, respectively. The final spliced image $S$ is defined as [9]

$$S(x, y) = \begin{cases} E(D)(x', y'), & \text{if } (x, y) \in \mathcal{T} \\ T(x, y), & \text{if } (x, y) \notin \mathcal{T}, \end{cases} \qquad (1)$$

with $(x', y')$ being the point coordinates of the donor region corresponding to the target coordinate system $(x, y)$.
We can define the integrity of the spliced sample $S$ with a tampering mask $M$, taking values 0 or 1 depending on the pixels being pristine or spliced. More formally [9],

$$M(x, y) = \begin{cases} 1, & \text{if } (x, y) \in \mathcal{T} \\ 0, & \text{if } (x, y) \notin \mathcal{T}. \end{cases} \qquad (2)$$
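As a hedged sketch (our own illustration, not code from [9]), Equations (1) and (2) translate to a simple masked composition; the `shift` argument is a hypothetical, crude stand-in for the coordinate change between $(x', y')$ and $(x, y)$:

```python
import numpy as np

def splice(target, edited_donor, mask, shift=(0, 0)):
    """Compose S: wherever mask == 1, take pixels from the (shifted) edited
    donor E(D); elsewhere, keep the target pixels T(x, y)."""
    donor = np.roll(edited_donor, shift, axis=(0, 1))  # crude alignment of (x', y') to (x, y)
    return np.where(mask == 1, donor, target)
```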
In [9], the authors have recently proposed a forensic detector for localizing splicing areas on amplitude SAR images. This detector highlights local inconsistencies between the donor and target pixel areas and proves superior to state-of-the-art tools developed for natural pictures. Its functioning is based on a denoising CNN [35] which extracts a real-valued fingerprint map, i.e., a Deep High-Frequency Residual (DHFR), exposing local inconsistencies between pixels. Formally, we define the DHFR extracted from the spliced image as [9]
$$R = f(S), \qquad (3)$$
where $f(\cdot)$ represents the detector operator, while $R$ is a real-valued matrix with the same resolution as the input image. We can consider the function $f(\cdot)$ as a high-pass filtering operator that captures traces relative to the processing pipeline of the considered image. As a result, the DHFR highlights spliced pixels as inconsistencies in these traces [9].
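As an illustration only (not the authors' released code), a minimal sketch of Equation (3), assuming a pretrained Dn-CNN-style network `net` that directly predicts the residual in the residual-learning fashion of [35]; both `net` and the tensor layout are assumptions:

```python
import torch

def extract_dhfr(net: torch.nn.Module, tile: torch.Tensor) -> torch.Tensor:
    """f(S): run a denoiser-style CNN on a (1, 1, H, W) amplitude tile and
    return the real-valued residual map R at the same resolution."""
    net.eval()
    with torch.no_grad():
        residual = net(tile)  # the network is trained to output the high-frequency residual
    return residual.squeeze()
```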

4. SAR DHFR Interpretability Analysis

As mentioned previously, the discriminative power of high-frequency residuals relies on their capability to highlight local processing traces. Indeed, many editing operations that commonly affect photographs, such as resampling or compression, leave peculiar artifacts in the high-frequency range that become visible after a simple high-pass filtering [28,29,30]. We argue that this property also holds for DHFRs extracted from amplitude SAR images and that we can obtain further insight into the nature of the editing operation executed by simple visual inspection.

4.1. Experimental Setup

For our investigations, we rely on a portion of the dataset presented in [9] and available at https://github.com/polimi-ispl/dhfr_interpretability (accessed on 22 September 2025). The selected dataset is defined as Spliced Dataset 2 (SD2), which comprises around 7000 spliced amplitude SAR images with various editing operations modifying the manipulated area, i.e., Additive White Gaussian Noise (AWGN), Additive Laplacian Noise (ALN), Speckle Noise (SN) injection, Average Blur (AB), Median Blur (MB), and affine transforms such as random Rotation & Resize (R&R). We also consider the case where no additional editing has been executed on the spliced region.
As for the forensic detector, we rely on the Augmented SAR Adapted Extractor (ASAE), which is the best performing localization model presented in [9]. We use the detector exactly as released in our original work, without retraining or fine-tuning it (please refer to [9] for more details).

4.2. DHFR Visual Inspection

After extracting the DHFR from all the images in SD2, we first visually inspect a few of them and compare the effects of different editing. Figure 2 reports four examples of spliced images undergoing diverse editing, while the DHFRs extracted by the ASAE detector are shown in Figure 3. In particular, in Figure 3a,b, we report the DHFRs obtained from amplitude images spliced using SN and AWGN (noise-based manipulations) as editing operations. In Figure 3c,d, the operations are instead AB and R&R (blur-based manipulations). We can immediately notice that the first two columns’ DHFRs present bright pixels in the manipulated area, while the last two columns show a dark spot.

4.3. Consistency Across the Dataset

We perform two additional investigations to confirm our visual intuition, that is, that bright spots occur in DHFRs extracted from images spliced with noise-based editing, and dark spots in the case of blur-based editing.
As a first analysis, we extract the pixels in correspondence with the manipulated area from each DHFR. We then compute a Multi-Dimensional Scaling (MDS) [44] embedding of these pixel vectors. MDS is a dimensionality reduction technique, i.e., a technique to visualize high-dimensional data points such as vectors in a low-dimensional space. Roughly speaking, if two data points are close in their MDS representation, their original Euclidean distance is small, i.e., they are adjacent in their high-dimensional definition space.
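For reference, a minimal sketch of such an embedding using scikit-learn (our own illustration; `patches` and `labels` are hypothetical variables holding the flattened tampered-with pixels and the corresponding editing operations):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import MDS

# `patches`: (n_samples, n_pixels) array of flattened tampered-with DHFR pixels;
# `labels`: editing operation of each sample. Both are assumed inputs here.
embedding = MDS(n_components=2, random_state=0).fit_transform(patches)  # preserves Euclidean distances
for op in sorted(set(labels)):
    idx = np.flatnonzero(np.asarray(labels) == op)
    plt.scatter(embedding[idx, 0], embedding[idx, 1], label=op, s=10)
plt.legend()
plt.show()
```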
Figure 4 and Figure 5 show the MDS visualization mentioned above. While SD2 presents tampered-with areas of various sizes, for brevity’s sake, here we report the visualization only for samples with a manipulated area of 128 × 128 and 256 × 256 pixels. However, we observed similar behaviors for the other tampering sizes. Results for the additional resolutions of 160 × 160, 192 × 192, and 224 × 224 are available in the code repository https://github.com/polimi-ispl/dhfr_interpretability (accessed on 22 September 2025).
Pixels spliced with noise-based editing like AWGN, ALN, and SN cluster together. Similarly, pixels spliced with blur-based operations like R&R, MB, and AB cluster together but separately from noise-based manipulations. Moreover, the “No editing” case forms a final third group separated from the others. The MDS visualization seems to confirm our previous intuition: different editing operations appear in the DHFR with different brightness levels. This phenomenon is not only visible to the naked eye, but it is also confirmed at the Euclidean distance level.
As a second experiment, we also check if the appearance of the different tampered-with areas is consistent in brightness values, i.e., that noise-based manipulated areas always appear as white spots and, conversely, blur-based as black spots. Given the DHFRs extracted from SD2 through the ASAE detector, we compare them with their ground-truth tampering masks by computing the Receiver Operating Characteristic (ROC) curves and the relative Area Under the Curve (AUC).
Due to the chosen ground-truth values for the tampering mask (i.e., tampered-with pixels are equal to 1), DHFRs with brighter pixels in the tampered-with area and darker pixels outside will achieve AUC values above 0.5. Conversely, DHFRs with darker pixels in the manipulated area will show “swapped” AUC values below 0.5. For instance, in the DHFRs shown in Figure 3a,b the AUCs are larger than 0.5; for Figure 3c,d, the AUCs are lower than 0.5.
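In practice, this pixel-wise check can be computed as follows (a hedged sketch with hypothetical variable names, using scikit-learn):

```python
from sklearn.metrics import roc_auc_score

# `dhfr`: real-valued residual map; `mask`: binary ground-truth tampering mask
# (1 = spliced pixel). Both variable names are assumptions for illustration.
auc = roc_auc_score(mask.ravel(), dhfr.ravel())
# auc > 0.5: tampered-with area brighter than its surroundings (noise-based edits);
# auc < 0.5: tampered-with area darker (blur-based edits).
```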
Figure 6 and Figure 7 depict the distribution of AUC values for the different samples divided by editing operation. As we can see, blur-based operations lead to AUCs below 0.5 , while noise-based AUCs are above this value, hence confirming the consistency of the brightness of the DHFRs across the dataset.

4.4. Interpretation of DHFR Appearance

As a last experiment, we seek a more profound understanding of why DHFRs present such behavior, to strengthen their interpretability. We conjecture the behavior in question results from a combination of the specific types of editing considered and the DHFR capabilities of capturing high-frequency details.
Specifically, we believe DHFRs offer insights into the high-frequency domain content of the sample being analyzed, including its energy. For instance, SN increases the power in the image’s high frequencies; hence, the DHFR conveys this information with a bright spot (as visible in Figure 3a). On the contrary, blurring acts as a low-pass filter, eliminating energy from the image’s high frequencies; as a result, the tampered-with area in Figure 3c appears as a dark spot due to its reduced energy content. The behaviors of R&R and AWGN are similar to the above-mentioned editing.
To further validate our hypothesis, we analyze the Fourier spectrum of the tampered-with areas of the different spliced samples. We divide the spliced images into three categories, namely “No editing”, “Blur-based” (AB, MB, R&R), and “Noise-based” (AWGN, ALN, SN). We then average the spectra from 25 target pixel regions $\mathcal{T}$ of each editing category. Figure 8 shows our results. As we can easily inspect, blur-based edited images, on average, present lower energy in the high-frequency range, while noise-based editing operations have more widespread energy at all frequencies with respect to the average “No editing” spectrum.
Given the considerations relative to the spectrum of the tampered-with areas, we deepen our investigations by computing a specific scalar feature related to their Power Spectral Density (PSD). Let us define the PSD of a target pixel region $\mathcal{T}$ as $P$, represented over the spatial-frequency coordinates $\omega_x, \omega_y$ as a 2D matrix of size $X \times Y$. We define a scalar PSD integral feature as [45]

$$f_{\mathcal{X}, \mathcal{Y}} = \sum_{\omega_x \in \mathcal{X}} \sum_{\omega_y \in \mathcal{Y}} P(\omega_x, \omega_y), \qquad (4)$$

where $\mathcal{X}$ and $\mathcal{Y}$ define a set of points in the $\omega_x$ and $\omega_y$ coordinates of $P$, respectively.
After shifting the spectrum matrix so that the DC component is at the center of $P$, we analyze the Azimuthal Integral PSD (AA-PSD) feature, a commonly used descriptor in the forensic literature [45,46]. To compute the AA-PSD feature, the set of points defined by $\mathcal{X}$ and $\mathcal{Y}$ in (4) should draw a circle with a specific radius $r$ from the origin. To clarify, Figure 8 represents such points. By concatenating the features computed over $R$ different radii, we obtain the feature vector $\mathbf{f}_{\text{AA-PSD}} = [f_{\mathcal{X}_1, \mathcal{Y}_1}, \ldots, f_{\mathcal{X}_R, \mathcal{Y}_R}]$.
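The following sketch illustrates one possible computation of this feature (our own code, not the paper's implementation; the choice of radii and the half-pixel ring tolerance are illustrative assumptions):

```python
import numpy as np

def aa_psd(region, radii):
    """Return the AA-PSD vector [f_1, ..., f_R]: the shifted PSD of a 2D pixel
    region summed over circles of radius r around the DC component."""
    psd = np.abs(np.fft.fftshift(np.fft.fft2(region))) ** 2  # PSD with DC at the center
    h, w = psd.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    feats = [psd[np.abs(dist - r) < 0.5].sum() for r in radii]  # ring of points at radius r
    return np.array(feats)

# Usage example: 63 radii spanning the spectrum of a hypothetical 128 x 128 region.
# features = aa_psd(spliced_region, radii=np.arange(1, 64))
```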
Figure 9 and Figure 10 report the average $\mathbf{f}_{\text{AA-PSD}}$ vector for all the spliced areas of SD2, differentiated again by the editing operation applied to them. For brevity, as performed for the MDS plots, we only consider spliced samples with a target area of 128 × 128 and 256 × 256 pixels. As we can inspect, the results confirm our previous hypothesis: noise-based manipulations (blue lines) present on average a higher power content in high-frequency ranges with respect to non-edited spliced areas (green line), while blur-based edits (orange lines) show a lower power content.
Table 2 and Table 3 report a numerical quantification of this insight. In particular, we averaged the content of the AA-PSD vectors into three separate frequency ranges, namely low, medium, and high. As the reader can quickly inspect, in the high-frequency range, the PSD is considerably lower for blur-based editing, while it is considerably higher in noise-based attacks.
Please note that all the experiments conducted in this work are based on real SAR amplitude images that have been manipulated with realistic editing operations, simulating potential attacks by a malicious user. However, one limitation of the proposed study is the lack of experiments on real-world examples of locally manipulated SAR data. We believe the reliability of the SAR detector proposed in [9] and the insights provided by our interpretability analysis would be further strengthened if publicly accessible manipulated data were available.

5. Conclusions

In this paper, we investigated the interpretability properties of Deep High-Frequency Residuals (DHFRs) in the context of amplitude SAR image splicing localization. We linked the appearance of the DHFR in terms of brightness to its ability to represent the PSD of the image under analysis. This behavior is consistent for operations that produce similar modifications to the tampered-with areas in the high-frequency ranges of the Fourier domain.
While criticisms of the explainability and interpretability of NNs are well founded, we think our results highlight how the well-known combination of forensic footprints and data-driven solutions already guarantees a certain degree of interpretability. In light of the limitations of explainability techniques [6], we believe our results can be extremely valuable for forensic analysts. A simple inspection of a DHFR gives practitioners a rough estimate of what kind of operations have been executed on the image, providing more insight into its history and more confidence in using data-driven tools.
We are confident that the present study offers a valuable contribution to the forensic community, by fostering the development of more interpretable detectors and advancing towards greater explainability of results. In particular, as deep learning now enables increasingly powerful detectors, one of the most pressing challenges is no longer solely to reach very high accuracy but rather to obtain reliable and interpretable results that can also withstand examination in environments with strong reliability requirements, e.g., law enforcement investigations.
Future works will focus on verifying our results on other data modalities like standard digital pictures and with more advanced editing operations that are known to leave artifacts in the high-frequency range, e.g., GAN and diffusion models’ image generation [46,47,48].

Author Contributions

Conceptualization, E.D.C.; methodology, E.D.C. and S.M.; software, E.D.C.; validation, E.D.C. and S.M.; formal analysis, E.D.C. and S.M.; investigation, E.D.C., S.M. and P.B.; resources, S.T.; data curation, E.D.C. and S.M.; writing—original draft preparation, E.D.C.; writing—review and editing, E.D.C. and S.M.; visualization, E.D.C. and S.M.; supervision, P.B. and S.T.; project administration, P.B. and S.T.; funding acquisition, P.B. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the FOSTERER project, funded by the Italian Ministry of Education, University, and Research within the PRIN 2022 program. This work was partially supported by the European Union—Next Generation EU under the Italian National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.3: CUP D43C22003080001, partnership on “Telecommunications of the Future” (PE00000001—program “RESTART”), and CUP D43C22003050001, partnership on “SEcurity and RIghts in the CyberSpace” (PE00000014—program “FF4ALL-SERICS”).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abady, L.; Cannas, E.D.; Bestagini, P.; Tondi, B.; Tubaro, S.; Barni, M. An Overview on the Generation and Detection of Synthetic and Manipulated Satellite Images. APSIPA Trans. Signal Inf. Process. 2022, 11, e36. [Google Scholar] [CrossRef]
  2. Piva, A. An Overview on Image Forensics. Int. Sch. Res. Not. 2013, 2013, 496701. [Google Scholar] [CrossRef]
  3. Verdoliva, L. Media Forensics and DeepFakes: An Overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932. [Google Scholar] [CrossRef]
  4. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  5. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
  6. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
  7. Kraetzer, C.; Hildebrandt, M. Explainability and interpretability for media forensic methods: Illustrated on the example of the steganalysis tool stegdetect. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Rome, Italy, 27–29 February 2024; SCITEPRESS—Science and Technology Publications: Setúbal, Portugal, 2024. [Google Scholar]
  8. Hall, S.W.; Sakzad, A.; Choo, K.K.R. Explainable artificial intelligence for digital forensics. WIREs Forensic Sci. 2022, 4, e1434. [Google Scholar] [CrossRef]
  9. Cannas, E.D.; Bonettini, N.; Mandelli, S.; Bestagini, P.; Tubaro, S. Amplitude SAR Imagery Splicing Localization. IEEE Access 2022, 10, 33882–33899. [Google Scholar] [CrossRef]
  10. Cozzolino, D.; Verdoliva, L. Noiseprint: A CNN-Based Camera Model Fingerprint. IEEE Trans. Inf. Forensics Secur. 2020, 15, 144–159. [Google Scholar] [CrossRef]
  11. Guillaro, F.; Cozzolino, D.; Sud, A.; Dufour, N.; Verdoliva, L. Trufor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 20606–20615. [Google Scholar]
  12. Tomiyasu, K. Tutorial review of synthetic-aperture radar (SAR) with applications to imaging of the ocean surface. Proc. IEEE 1978, 66, 563–583. [Google Scholar] [CrossRef]
  13. Oliver, C.; Quegan, S. Understanding Synthetic Aperture Radar Images; SciTech Publishing: Raleigh, NC, USA, 2004. [Google Scholar]
  14. Wang, Z.; Li, Y.; Yu, F.; Yu, W.; Jiang, Z.; Ding, Y. Object detection capability evaluation for SAR image. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016. [Google Scholar] [CrossRef]
  15. Chang, Y.L.; Anagaw, A.; Chang, L.; Wang, Y.C.; Hsiao, C.Y.; Lee, W.H. Ship Detection Based on YOLOv2 for SAR Imagery. Remote Sens. 2019, 11, 786. [Google Scholar] [CrossRef]
  16. Hummel, R. Model-based ATR using synthetic aperture radar. In Proceedings of the Record of the IEEE 2000 International Radar Conference [Cat. No. 00CH37037], Alexandria, VA, USA, 12 May 2000. [Google Scholar] [CrossRef]
  17. European Space Agency. Product Slicing. Available online: https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-1-sar/products-algorithms/product-slice-handling (accessed on 28 June 2021).
  18. Copernicus Program. Sentinel-1 Mission User Guide. Available online: https://sentinel.esa.int/web/sentinel/user-guides/sentinel-1-sar (accessed on 2 January 2022).
  19. Stamm, M.C.; Wu, M.; Liu, K.J.R. Information Forensics: An Overview of the First Decade. IEEE Access 2013, 1, 167–200. [Google Scholar] [CrossRef]
  20. Popescu, A.; Farid, H. Exposing digital forgeries in color filter array interpolated images. IEEE Trans. Signal Process. 2005, 53, 3948–3959. [Google Scholar] [CrossRef]
  21. Kirchner, M. Fast and reliable resampling detection by spectral analysis of fixed linear predictor residue. In Proceedings of the ACM Workshop on Multimedia and Security (MM&Sec), Oxford, UK, 22–23 September 2008. [Google Scholar] [CrossRef]
  22. Vázquez-Padín, D.; Pérez-González, F. Prefilter design for forensic resampling estimation. In Proceedings of the 2011 IEEE International Workshop on Information Forensics and Security, Iguacu Falls, Brazil, 29 November–2 December 2011; pp. 1–6. [Google Scholar]
  23. Cao, G.; Zhao, Y.; Ni, R.; Yu, L.; Tian, H. Forensic detection of median filtering in digital images. In Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, Singapore, 19–23 July 2010; pp. 89–94. [Google Scholar]
  24. Kirchner, M.; Fridrich, J. On detection of median filtering in digital images. In Proceedings of the Media Forensics and Security II. International Society for Optics and Photonics, San Jose, CA, USA, 17–21 January 2010; Volume 7541, p. 754110. [Google Scholar]
  25. Bianchi, T.; Piva, A. Detection of nonaligned double JPEG compression based on integer periodicity maps. IEEE Trans. Inf. Forensics Secur. 2011, 7, 842–848. [Google Scholar] [CrossRef]
  26. Thai, T.H.; Cogranne, R.; Retraint, F.; Doan, T.N.C. JPEG quantization step estimation and its applications to digital image forensics. IEEE Trans. Inf. Forensics Secur. 2016, 12, 123–133. [Google Scholar] [CrossRef]
  27. Mandelli, S.; Bonettini, N.; Bestagini, P.; Lipari, V.; Tubaro, S. Multiple JPEG compression detection through task-driven non-negative matrix factorization. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2106–2110. [Google Scholar]
  28. Lyu, S.; Pan, X.; Zhang, X. Exposing region splicing forgeries with blind local noise estimation. Int. J. Comput. Vis. 2014, 110, 202–221. [Google Scholar] [CrossRef]
  29. Cozzolino, D.; Poggi, G.; Verdoliva, L. Splicebuster: A new blind image splicing detector. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), Rome, Italy, 16–19 November 2015. [Google Scholar] [CrossRef]
  30. Cozzolino, D.; Verdoliva, L. Single-image splicing localization through autoencoder-based anomaly detection. In Proceedings of the IEEE International Workshop on Information Forensics and Security, Abu Dhabi, United Arab Emirates, 4–7 December 2016; pp. 1–6. [Google Scholar]
  31. Rao, Y.; Ni, J. A deep learning approach to detection of splicing and copy-move forgeries in images. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), Abu Dhabi, United Arab Emirates, 4–7 December 2016. [Google Scholar] [CrossRef]
  32. Liu, Y.; Guan, Q.; Zhao, X.; Cao, Y. Image Forgery Localization Based on Multi-Scale Convolutional Neural Networks. In Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security, Innsbruck, Austria, 20–22 June 2018. [Google Scholar] [CrossRef]
  33. Bayar, B.; Stamm, M.C. A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, Vigo, Spain, 20–22 June 2016; pp. 5–10. [Google Scholar]
  34. Bayar, B.; Stamm, M.C. Design Principles of Convolutional Neural Networks for Multimedia Forensics. In Proceedings of the IS&T International Symposium on Electronic Imaging: Media Watermarking, Security, and Forensics, Burlingame, CA, USA, 29 January–2 February 2017. [Google Scholar]
  35. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  36. Bonettini, N.; Bondi, L.; Güera, D.; Mandelli, S.; Bestagini, P.; Tubaro, S.; Delp, E.J. Fooling PRNU-Based Detectors Through Convolutional Neural Networks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 957–961. [Google Scholar] [CrossRef]
  37. Cannas, E.D.; Baireddy, S.; Bestagini, P.; Tubaro, S.; Delp, E.J. Enhancement Strategies For Copy-Paste Generation & Localization in RGB Satellite Imagery. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), Nürnberg, Germany, 4–7 December 2023. [Google Scholar]
  38. Mashable. Satellite Images Show Clearly That Russia Faked Its MH17 Report. Available online: http://mashable.com/2015/05/31/russia-fake-mh17-report (accessed on 11 August 2023).
  39. BBC. Conspiracy Files: Who Shot Down MH17? April 2016. Available online: https://www.bbc.com/news/magazine-35706048 (accessed on 11 August 2023).
  40. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef]
  41. Henderson, F.M.; Lewis, A.J. Principles and Applications of Imaging Radar. Manual of Remote Sensing, 3rd ed.; Wiley Publisher: Hoboken, NJ, USA, 1998; Volume 2. [Google Scholar]
  42. Chen, L.; Tan, S.; Pan, Z.; Xing, J.; Yuan, Z.; Xing, X.; Zhang, P. A New Framework for Automatic Airports Extraction from SAR Images Using Multi-Level Dual Attention Mechanism. Remote Sens. 2020, 12, 560. [Google Scholar] [CrossRef]
  43. Cannas, E.D.; Mandelli, S.; Bestagini, P.; Tubaro, S.; Delp, E.J. Deep Image Prior Amplitude SAR Image Anonymization. Remote Sens. 2023, 15, 3750. [Google Scholar] [CrossRef]
  44. Borg, I.; Groenen, P. Modern Multidimensional Scaling: Theory and Applications; Springer Series in Statistics: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
  45. Cannas, E.D.; Beaus, P.; Bestagini, P.; Marques, F.; Tubaro, S. A One-Class Approach to Detect Super-Resolution Satellite Imagery with Spectral Features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2024, Seoul, Republic of Korea, 14–19 April 2024. [Google Scholar]
  46. Durall, R.; Keuper, M.; Keuper, J. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020. [Google Scholar]
  47. Corvi, R.; Cozzolino, D.; Poggi, G.; Nagano, K.; Verdoliva, L. Intriguing Properties of Synthetic Images: From Generative Adversarial Networks to Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Vancouver, BC, Canada, 18–22 June 2023; pp. 973–982. [Google Scholar]
  48. Mandelli, S.; Bestagini, P.; Tubaro, S. When synthetic traces hide real content: Analysis of stable diffusion image laundering. In Proceedings of the 2024 IEEE International Workshop on Information Forensics and Security (WIFS), Rome, Italy, 2–5 December 2024; pp. 1–6. [Google Scholar]
Figure 1. A schematic illustration of the proposed investigations on the interpretability of DHFRs in the context of amplitude SAR image splicing localization. In our experiments, we show how the appearance of DHFRs (extracted by state-of-the-art detectors working on SAR amplitude data [9]) depends on the nature of the editing operations executed on the spliced region (indicated with a red contour in the figure).
Figure 2. Four different examples of spliced images. We provide the target image T in the first row, first column, and the tampering mask M in the first row, second column. The first row, third column, and the second row, first column show a close-up around the manipulated area edited with SN and AWGN, respectively. In the last figures, i.e., the second row, second and third columns, the editing is AB and R&R. Once again, we highlight the spliced area with a red contour.
Figure 3. DHFRs extracted from the spliced samples of Figure 2. Once again, the spliced area is highlighted with red contours. (a,b) present different brightness values in the spliced area with respect to (c,d). This difference is also reflected in the AUC values, as explained in Section 4.
Figure 4. MDS visualization of the pixels belonging to the tampered-with area extracted from the DHFRs of 128 × 128 spliced samples.
Figure 5. MDS visualization of the pixels belonging to the tampered-with area extracted from the DHFRs of 256 × 256 spliced samples.
Figure 6. AUC values distribution of DHFRs for different editing operations on samples spliced with a 128 × 128 area.
Figure 7. AUC values distribution of DHFRs for different editing operations on samples spliced with a 256 × 256 area.
Figure 8. The effect of different editing operations on the average Fourier spectrum of the target region $\mathcal{T}$. We also show the set of points selected for the Azimuthal Integral PSD (AA-PSD) descriptor. All spectra are centered on the DC component and plotted in the same dynamic range.
Figure 9. The mean AA-PSD vector extracted from the pixels belonging to the tampered-with areas of 128 × 128 spliced samples.
Figure 10. The mean AA-PSD vector extracted from the pixels belonging to the tampered-with areas of 256 × 256 spliced samples.
Table 1. State-of-the-art techniques relying on DHFRs, their applications, and modality of analysis.

Detector | Task | Modality
Bonettini [36] | PRNU anonymization | Natural images
Noiseprint [10] | Image splicing localization | Natural images
Noiseprint++ [11] | Image splicing localization and detection | Natural images
ASAE [9] | Image splicing localization | SAR
SatNoiseprint [37] | Image splicing localization | Satellite RGB
Table 2. AA-PSD average content divided into low-, middle-, and high-frequency ranges for samples with a 128 × 128 spliced area. Similarly to the other plots, orange is associated with blur-based operations, while blue is associated with noise-based editing. In the high-frequency ranges, we use the color red to show lower power content than the “No editing” scenario; green shows higher content instead.

Operation | Low Frequencies | Medium Frequencies | High Frequencies
No editing | 3731 | 3888 | 2132
AB | 1725 | 346 | 247
MB | 2708 | 956 | 732
RR | 3828 | 2527 | 971
AWGN | 4014 | 4741 | 4449
ALN | 4147 | 5273 | 5778
SN | 3980 | 5555 | 6817
Table 3. AA-PSD average content divided into low-, middle-, and high-frequency ranges for samples with a 256 × 256 spliced area. Similarly to other plots, orange is associated with blur-based operations, while blue with noise-based editing. In the high-frequency range, we use the color red to show a lower power content than the “No editing” scenario; green shows higher content instead.

Operation | Low Frequencies | Medium Frequencies | High Frequencies
No editing | 7400 | 7755 | 4166
AB | 3016 | 560 | 394
MB | 5137 | 1774 | 1334
RR | 7518 | 5070 | 1782
AWGN | 7353 | 9093 | 8559
ALN | 7797 | 10,411 | 11,694
SN | 7596 | 11,391 | 14,687

