1. Introduction
In recent years, the rapid advancement of information technology and the widespread adoption of online platforms have greatly accelerated the digitization of paper documents, making digitized documents a widely used means of distributing textual information, chart data, and other structured or unstructured content across various domains. For example, scanned document images stored on servers, such as contracts, medical records, and certificates, not only facilitate long-term preservation but also enhance the efficiency of information sharing. However, with the prevalence of image editing and text manipulation software, the integrity of document images faces increasing challenges [1,2]. In particular, the use of tampered document images in illicit activities has emerged as a serious social concern. In fields such as finance, healthcare, and law, where information integrity is critical, tampered document images can cause substantial economic losses. For example, an attacker may alter contract terms to obtain unlawful benefits. As a result, verifying the content of document images is particularly important.
In real-life scenarios, document content is generally diverse and unstructured, so subtle changes within it are typically difficult to discern [3]. Although such tampering may appear minor, such as altering a single character, digit, or keyword, it can still result in serious consequences. As shown in Figure 1, an attacker tampered with the first line of a paragraph in a rental contract to illicitly change the monetary amount. This type of localized tampering is particularly difficult to detect through manual inspection because document structures are diverse and their contents are unknown. Consequently, the detection of subtle content changes in document images has garnered increasing attention from researchers in recent years [3,4].
In this work, we consider two common tampering attack scenarios involving document images. As illustrated in Figure 2, an attacker may either alter the content of a digital document before printing and scanning it to produce a tampered document image or tamper with the content of an already scanned document image using image editing software to generate a forged version for illicit purposes. In both scenarios, the genuine and tampered document images are generated through the print–scan (PS) process. A direct and effective approach to verify the content integrity of a document image is to compare it with its corresponding reference document image stored on a secure server, which preserves the original content prior to printing. In such a verification system, the corresponding reference image is first retrieved from a document database, followed by content change detection to determine whether the content has been altered.
In many real-world scenarios, direct access to the original document image may not be possible due to privacy or security constraints, and the recipient may only obtain a printed copy. For example, in lawsuits, evidence collection often involves printed copies of original documents, making verification against the corresponding digital reference a practical and necessary task. In such scenarios, a document may be printed multiple times, and the goal of this work is not to authenticate a specific physical copy but to verify the integrity of its content. As a result, although there is some overlap, the problem addressed in this work differs from that of prior studies that focus on tampering detection without reference images. For example, detecting attacks such as recapturing is useful for revealing potential tampering [5], but the content of a recaptured document may still be genuine. In contrast, this work focuses on content integrity verification, ensuring that the information in a document image remains consistent with a trusted reference source.
Existing methods for detecting content changes in document images, such as those illustrated in Figure 2, can be broadly classified into two categories: Optical Character Recognition (OCR)-based methods and image feature analysis-based methods. Methods in the first category employ OCR [6] to recognize and extract textual content from document images, which is then compared with the text stored in a database to determine whether a tampering attack has occurred [7]. However, the performance of these methods depends heavily on the OCR engine’s language support. In real-world scenarios, a document image may include multiple languages or non-textual elements, such as images and tables, which can easily introduce errors during character recognition. Additionally, the PS process introduces distortions such as noise, blurring, and shape deformation, further reducing the accuracy and robustness of OCR-based methods [8]. As a result, OCR-based methods are inherently constrained by the limitations of OCR technology, making them unsuitable for detecting tampered document images in complex, real-world scenarios.
Methods in the second category exploit image features to detect content changes in document images. In these methods, image features are first extracted from the input document images and then compared with those of the reference document images stored in a database [7]. Unlike OCR-based methods, methods in this category do not impose restrictions on document content, making them applicable to a wider range of document types. However, to detect subtle changes, the input document image and the corresponding reference document image must be precisely aligned for accurate feature comparison. The geometric distortions introduced during the PS process often result in varying degrees of misalignment between the input and the reference document images [9]. Such distortions lead to spatial mismatches in the extracted features, contributing to a high false detection rate. Additionally, noise introduced by the PS channel, as well as the unknown response functions of printers and scanners, creates differences in pixel distributions between the input and reference document images [10,11]. These factors further widen the discrepancies between the feature distributions of input and reference document images, thereby degrading the overall performance of content change detection.
To address the challenges faced by existing methods, we propose a document image verification scheme that includes two stages, namely, the document image retrieval stage and the content change detection stage. In the document image retrieval stage, the proposed scheme first extracts paragraph structure features from the reference document images and stores them in a database, where the extracted features serve as the index. Paragraph features are then extracted from the input document images and matched against those in the database to retrieve the corresponding reference document image. Once retrieved, correspondences between the paragraphs of the input and reference document images are established for subsequent content change detection. By relying on paragraph structure rather than textual content, this process avoids the dependence on document content or language, enabling fast retrieval and accurate alignment of paragraphs across diverse document types. In the second stage, deep features are extracted from pairs of matched paragraphs between the input and reference document images, and a contrastive learning framework is employed to address the distortions introduced by the PS channel. Furthermore, an additional verification step is incorporated to address feature mismatches within paragraph pairs, thereby ensuring more accurate feature alignment. By effectively addressing the aforementioned challenges, the proposed scheme demonstrates superior performance compared to benchmark methods, particularly by achieving a high detection rate for content changes while maintaining a low false detection rate.
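To make the role of contrastive learning in the second stage more concrete, the following is a minimal sketch of the standard pairwise, margin-based contrastive loss applied to a pair of paragraph feature vectors. It is not the tailored loss proposed in this work, which is only named here and not yet specified. The function names, the margin value, and the toy feature vectors are assumptions chosen purely for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, changed, margin=1.0):
    """Standard pairwise contrastive loss (illustrative, not the paper's loss).

    changed=False: the pair shows the same content, so the loss is the
                   squared distance, pulling the features together.
    changed=True:  the content differs, so the loss penalizes pairs that
                   are closer than `margin`, pushing the features apart.
    """
    d = euclidean(u, v)
    if changed:
        return max(0.0, margin - d) ** 2
    return d ** 2


# Hypothetical deep features of a reference paragraph and two scanned inputs.
reference = [0.20, 0.90, 0.10]
genuine_scan = [0.25, 0.85, 0.10]   # same content, mild print-scan distortion
tampered_scan = [0.90, 0.10, 0.40]  # altered content

print(contrastive_loss(reference, genuine_scan, changed=False))
print(contrastive_loss(reference, tampered_scan, changed=True))
```

Under such a loss, features of unchanged paragraphs are encouraged to stay close despite PS distortions, while features of changed paragraphs are pushed at least a margin apart, so that a distance threshold could in principle separate the two cases.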
The main contributions of this work can be summarized as follows:
We propose a document image retrieval method that leverages paragraph structure features from both input and reference document images. Compared with existing methods, it significantly improves retrieval efficiency. In addition, it enables precise paragraph alignment, which substantially facilitates content change detection in document images.
We propose a content change detection method based on contrastive learning. In the proposed method, a tailored loss function is designed to enable the model to distinguish between unchanged and changed content, and a second verification step is incorporated to address false detections. Together, these two mechanisms significantly improve the detection accuracy.
We construct two document image databases comprising genuine and tampered document images, and conduct extensive evaluations of the proposed document image verification scheme. The results demonstrate that the proposed scheme accurately detects content changes in practical scenarios for documents of general content and outperforms the benchmark methods.
The rest of the paper is organized as follows: Section 2 reviews related work on document image retrieval and content change detection. Section 3 introduces the proposed document image verification scheme. Section 4 evaluates the proposed scheme and compares its performance with existing methods. Finally, Section 5 concludes the paper.