Article

Document Image Shadow Removal Based on Illumination Correction Method

1 School of Yonyou Digital and Intelligence, Nantong Institute of Technology, Nantong 226000, China
2 School of Software, Northwestern Polytechnical University, Xi’an 710000, China
3 Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518000, China
4 Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518000, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(8), 468; https://doi.org/10.3390/a18080468
Submission received: 22 May 2025 / Revised: 19 July 2025 / Accepted: 25 July 2025 / Published: 26 July 2025
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)

Abstract

Due to diverse lighting conditions and photo environments, shadows are almost ubiquitous in images, especially document images captured with mobile devices. Shadows not only seriously affect the visual quality and readability of a document but also significantly hinder image processing. Although shadow removal research has achieved good results in natural scenes, specific studies on document images are lacking. To effectively remove shadows in document images, the dark illumination correction network is proposed, which mainly consists of two modules: shadow detection and illumination correction. First, a simplified shadow-corrected attention block is designed to combine spatial and channel attention, which is used to extract the features, detect the shadow mask, and correct the illumination. Then, the shadow detection block detects shadow intensity and outputs a soft shadow mask to determine the probability of each pixel belonging to shadow. Lastly, the illumination correction block corrects dark illumination with a soft shadow mask and outputs a shadow-free document image. Our experiments on five datasets show that the proposed method achieved state-of-the-art results, proving the effectiveness of illumination correction.

1. Introduction

With the widespread use of smartphones and tablets, mobile devices have become the preferred tools for scanning and saving digital documents. They offer great convenience, allowing users to capture document images anytime, anywhere, and swiftly convert them into digital formats for storage and transmission. However, shadows often appear when using a mobile device to photograph documents due to unfavorable lighting conditions or angle issues, affecting document clarity and readability [1,2]. In particular, when dealing with text-intensive files, the presence of shadows can disrupt information interpretation and subsequent processing, such as applications of optical character recognition (OCR), potentially leading to inaccurate or erroneous information extraction [3,4,5]. Some document images with shadows are shown in Figure 1. Therefore, shadow removal from document images is crucial, as this not only enhances the overall quality and appearance of documents but also contributes to more efficient data processing and management.
The current research on shadow removal has achieved some good results in natural scenes. However, these methods often lack flexibility and adaptability when dealing with shadows in document images, so they show obvious limitations in terms of robustness and generalization. Shadows in document images not only interfere with textual clarity and the overall visual appearance of the image but can also negatively affect subsequent processing steps such as OCR [6,7,8]. For example, shadows can lead to mistaken character recognition or missing text information, which affects document processing accuracy and efficiency. Therefore, effectively removing shadows in document images and providing clear and accurate images for subsequent processing have become key issues in the field of image processing.
At present, there is a lack of shadow removal methods designed specifically for document images [9,10,11]. Most existing methods directly learn each pixel value of the shadow-free result, which ignores the optical principles of shadow formation, so the resulting performance is not satisfactory. Shadow formation is an optical phenomenon whose physics involves the straight-line propagation of light [12]. Normally, light travels in a straight line. When light encounters an opaque object, part of the light is blocked because it cannot penetrate the object, leaving an area behind the object that is not directly illuminated, i.e., the shadow. Shadow shapes and sizes depend on several factors: the size and shape of the light source, the relative position of the light source to the object, and the distance between the object and the projection plane. Based on imaging physics, as shown in Equation (1), a shadow-free image consists of two parts: direct light, where the light source shines directly on the object, and ambient light, which reaches the object through diffuse reflection.
$$I^{\text{shadow-free}}_{(x,y)} = L^{d}_{(x,y)} R_{(x,y)} + L^{a}_{(x,y)} R_{(x,y)} \tag{1}$$
where the superscript shadow-free and the subscript $(x,y)$ denote the shadow-free image and the pixel coordinates, respectively; the scalars $I$, $L^{d}$, $L^{a}$, and $R$ denote the pixel value, direct illuminance, ambient illuminance, and reflectance, respectively.
Because the occluder blocks the direct light, the shadow area is illuminated only by ambient light. Moreover, the occluder may also block some of the ambient light, so the ambient term is multiplied by an attenuation factor, as shown in Equation (2):
$$I^{\text{shadow}}_{(x,y)} = a_{(x,y)} L^{a}_{(x,y)} R_{(x,y)} \tag{2}$$
where the superscript shadow denotes the shadow image and the scalar $a$ denotes the attenuation factor.
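To make the relationship concrete, the following minimal sketch plugs hypothetical values into Equations (1) and (2); all quantities are illustrative assumptions, not measured data.

```python
# Hypothetical per-pixel quantities for Equations (1) and (2); values are illustrative only.
L_direct = 180.0   # direct illuminance L^d
L_ambient = 60.0   # ambient illuminance L^a
R = 0.9            # reflectance of white paper
a = 0.7            # attenuation of ambient light by the occluder

# Equation (1): a shadow-free pixel receives both direct and ambient light.
I_shadow_free = L_direct * R + L_ambient * R   # 162.0 + 54.0 = 216.0

# Equation (2): a shadowed pixel receives only the attenuated ambient light.
I_shadow = a * L_ambient * R                   # 37.8

# The quantity a model built on this principle must recover is the missing
# illumination (216.0 - 37.8 = 178.2), rather than an arbitrary pixel value.
print(I_shadow_free, I_shadow, I_shadow_free - I_shadow)
```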
According to the physical principle of shadow formation, shadow removal follows a regular pattern, and it is much simpler for a network to learn how the illumination should be corrected than to learn the pixel values of the shadow-free image directly. We therefore propose the dark illumination correction network (DICNet) to improve shadow removal performance for document images; it approximates this optical principle and removes shadows by correcting illumination while maintaining high image quality.
DICNet acquires a soft shadow mask via the designed shadow detection module, which labels the shadow areas with the shadow intensity of each pixel in the image. Guided by this soft shadow mask, DICNet then learns the required illumination intensity correction via the dark illumination correction module. By correcting the illumination of the shadow areas, the network produces a shadow-free document image. Overall, through these two pivotal branches of shadow detection and illumination correction, DICNet ensures an accurate and high-quality removal effect, achieving state-of-the-art (SOTA) results on different datasets. Our primary contributions are as follows:
(1) Proposing our DICNet model. It can utilize the optical principle of shadow formation and achieve better performance on different datasets.
(2) Designing a novel dark illumination correction module. The module can learn the required illumination intensity correction.
(3) Designing a shadow detection module. It can acquire an accurate shadow intensity of each pixel in the document images.
(4) Constructing a simplified shadow-corrected attention block. It enables the model to pay attention to both the global channel and local spatial information to capture richer and more complete feature representation.

2. Related Works

2.1. Document Shadow Removal Dataset

The existing datasets for document shadow removal can be divided into two types: real and synthetic datasets. All the images in the former are captured in real scenarios, while the images in the latter are generated via some specific methods. In this work, we surveyed some generally used datasets, with their details shown in Table 1.

2.1.1. The Real Datasets

The Adobe dataset [13] is the first real dataset for document shadow removal and contains 81 images with different shadow intensities and shapes. It covers eleven distinct documents, each photographed under five to nine shading variations to simulate different lighting conditions. This setup allows the dataset to be effectively used to quantitatively and qualitatively evaluate the performance of shadow removal techniques.
The Jung dataset [14] is mainly composed of two types of images: those taken with a smartphone camera and those with a scanner. These digital document images are generated in different lighting conditions to simulate the various circumstances when taking pictures in the real world. Jung contains a series of real shadow-free images and corresponding shaded document images. It provides a practical test basis for evaluating document shadow removal algorithms, especially considering image quality differences between smartphones and scanners.
The Kligler dataset [15] consists of 381 pairs of corresponding shaded and unshaded document images, divided into four categories: handwritten documents, printed documents, posters, and fonts. Each category consists of 7–10 documents, and each document has 8–12 different variations in shadow intensity, shape, and position. Kligler also includes some images with small characters, small color variations, and light shadows that are very similar to the shadow-free images. It can provide some challenging test conditions, including different types of text, levels of degradation, stains and creases, loss of text, etc.
The OSR dataset [16] provides a diverse set of shadow images, divided into two parts: images created in a controlled lighting environment and those captured in a natural scene. The former contains 237 shadow images and their corresponding shadow-free images. These images are mainly taken from typical documents such as books, newspapers, and brochures. During the photo-taking process, the documents and camera were fixed on the table and holder, respectively, to maintain the photo angle. Shadow-free images are taken first; then, shadow images are taken using lights and occlusions (e.g., hands and pens). The document, table, and camera positions were fixed to ensure consistency and reliability. The other part contains 24 real images obtained from the Internet, which have different sizes and are used to model different lighting conditions and shadow intensities. OSR contains not only standardized images taken in a controlled environment but also real images taken in natural scenes, which ensures that the dataset has good diversity in shadow intensity, type, and distribution. The OSR dataset also gives the corresponding shadow mask for each shaded image and provides a valuable test basis for shadow removal research in document images, especially in strong shadow scenes.
The HS dataset [17] contains 100 images covering 10 documents, with 10 images per document. To simulate multiple different scenarios, half of the images were captured under sunlight and the other half under LED lights, and different occlusion shapes were used to create shadows. HS provides sample images taken under more complex and variable lighting conditions, especially images containing hard shadows and shadow transition regions, making the testing of the shadow removal algorithm more challenging and practical.

2.1.2. The Synthetic Datasets

The FSDSRD dataset [18] is the first synthetic dataset available for the shadow removal of document images. It comprises a quad of shadow images, non-shadow images, shadow masks, and foreground masks. FSDSRD creates documents by synthesizing text, graphics, and textures and then uses the graphics renderer to add synthetic shadows to create various shading conditions. Additionally, various colored textures were added to help train models learn document background colors and maintain color consistency between inputs and outputs. FSDSRD offers a wide variety of content, textures, and shadow types, addressing the issue of limited diversity in real document image datasets. However, FSDSRD is a fully synthetic dataset, lacking background diversity. Moreover, the synthesized shadows are black, which significantly differ from real shadows, making the models trained on FSDSRD unable to simulate the characteristics of real-world datasets.
The SDCSRD dataset [19] contains 17,624 images divided into training, validation, and test sets in a 12:3:1 ratio. Compared to FSDSRD, it contains more images with a colorful background and two different font styles: handwritten and artistic. This diversity can help improve the model’s robustness in handling different font styles and complex background conditions and promote the overall enhancement of the model’s text recognition ability in complex visual environments.

2.2. Document Shadow Removal Methods

Lin et al. [20] proposed the background estimation document shadow removal network (BEDSR-Net). As the first deep network designed for shadow removal in document images, BEDSR-Net contains two modules: a background estimation network (BENet) and a shadow removal network (SRNet). By learning the global background color and spatial distribution of documents, BENet provides accurate background information and an attention map for SRNet to effectively restore shadow-free images. The average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on both synthetic and real datasets are 37.55 and 0.9534, respectively. However, this method requires computing two complex neural network models and can only remove lighter shadows; it does not handle complex scenarios, such as those with deeper shadows, well.
Chen et al. [21] proposed a novel transformer-based model, named ShadocNet, specifically for shadow removal in document images. The ShadocNet mainly consists of three parts: firstly, the shadow detection module extracts a shadow mask on the document image; secondly, the global shadow remapping module uses the visual converter to remap the shadow area color; finally, the RefineNet improves the visual quality by refining the image pixel by pixel. The experimental results show that ShadocNet outperforms previous methods in terms of aesthetic quality. It effectively remaps shadow region colors and refines images, producing fewer artifacts and being closer to real shadow-free images. The PSNR/SSIM/RMSE (root mean square error) of ShadocNet on the Jung and Kligler datasets are 24.60/0.91/15.30 and 26.20/0.94/13.48, respectively. However, ShadocNet faces challenges in handling highly complex shadow patterns or insufficiently represented variable lighting conditions.
Zhang et al. [22] proposed an innovative shadow removal method for document images, which pays special attention to color-aware background guidance. Its core technology includes two main aspects: color-aware background extraction network (CBENet) and background-guided shadow removal network (BGShadowNet). The CBENet extracts background images that can accurately reflect variations in the document background color space. The BGShadowNet first generates preliminary shadow removal results using a background-constrained decoder and then performs further refinement through a background-based attention module and a detail improvement module to enhance the textural details. Its PSNR/SSIM/RMSE on the RDD and Kligler datasets are 37.585/0.983/2.219 and 29.176/0.948/5.377, respectively. The results show the advantage of preserving image details and textures. However, when the images are corrupted by heavy noise, outputs may contain some residual noise, resulting in uneven illumination with the surrounding environment.
These methods utilize sophisticated network architectures to meticulously capture and process shadows. Although they typically work well in effectively removing shadows, they merely leverage the powerful fitting capability of convolutional neural networks while neglecting the optical principles underlying shadow formation.

3. The Dark Illumination Correction Net

In this paper, we design an end-to-end network called Dark Illumination Correction Network (DICNet), which aims to achieve high-quality shadow removal results by correcting the illumination intensity of shadows in document images.

3.1. Overall DICNet Structure

The DICNet structure is shown in Figure 2 and mainly includes three blocks: the feature extraction block, the shadow detection block, and the illumination correction block. The feature extraction block first extracts visual features from the input images; then, the shadow detection block obtains the soft shadow mask of the input image to accurately capture shadow intensity information, which provides a key aid for the subsequent shadow correction. The illumination correction block further learns the illumination intensity requiring correction with the assistance of the soft shadow mask. It supplements the illumination intensity in the shadow area and ultimately outputs a document image without shadows. Overall, DICNet achieves shadow removal in document images based on optical principles and ensures an accurate and high-quality removal effect through two key steps: shadow detection and illumination correction.
The feature extraction block includes four convolution layers with the ReLU activation function and SSCABlock. It first extracts features from the input images. Then, the DICNet is divided into two branches: one for shadow mask detection and the other for dark illumination correction based on the soft shadow mask.
The shadow detection block consists of one convolution layer, one SSCABlock, and three convolution layers in sequence, with the ReLU activation function. It outputs a soft shadow mask $M$. At the same time, the outputs of the shadow detection and feature extraction blocks are fused as the input of the illumination correction block.
The illumination correction block consists of two convolution layers, one SSCABlock, two convolution layers, and one SSCABlock in sequence. It outputs an illumination-corrected image $B$.
The shadow-free image outputted by DICNet can be calculated as follows:
$$O = I + (\mathbf{1} - M) \otimes B \tag{3}$$
where $\otimes$ denotes the channel-weighted operation, i.e., $(\mathbf{1} - M)$ is multiplied element-wise with each channel of $B$; $O$ is the output shadow-free image; $I$ is the input shaded image; and $\mathbf{1}$ is the matrix in which all elements are 1.
The parallel shadow detection and illumination correction branches are tightly coupled through shared feature extraction and cross-branch guidance mechanisms. Specifically, the shadow detection branch provides soft shadow masks that dynamically guide the illumination correction branch to focus on shadow regions while preserving non-shadow content: a critical requirement for document images where text and details must remain intact.
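The structure described above can be summarized in a compact PyTorch-style sketch. Layer widths, kernel sizes, and the sigmoid used to bound the soft mask are assumptions made only for illustration, and the SSCABlocks of Figure 2 are omitted for brevity; only the overall data flow and the composition of Equation (3) follow the text.

```python
import torch
import torch.nn as nn

class DICNetSketch(nn.Module):
    """Minimal sketch of the DICNet forward pass (Figure 2 / Equation (3)).
    Channel widths, kernel sizes, and the mask activation are illustrative assumptions."""
    def __init__(self, ch=32):
        super().__init__()
        conv = lambda ci, co: nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU())
        # Feature extraction: convolution + ReLU layers (SSCABlocks omitted here).
        self.features = nn.Sequential(conv(3, ch), conv(ch, ch), conv(ch, ch), conv(ch, ch))
        # Shadow detection branch: outputs an assumed 1-channel soft shadow mask M in [0, 1].
        self.detect = nn.Sequential(conv(ch, ch), conv(ch, ch), conv(ch, ch),
                                    nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())
        # Illumination correction branch: outputs the 3-channel correction image B
        # from the fused detection output and extracted features.
        self.correct = nn.Sequential(conv(ch + 1, ch), conv(ch, ch), conv(ch, ch),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, image):
        feats = self.features(image)
        mask = self.detect(feats)                  # soft shadow mask M
        fused = torch.cat([feats, mask], dim=1)    # fuse detection output with features
        correction = self.correct(fused)           # illumination correction B
        # Equation (3): O = I + (1 - M) ⊗ B, broadcast over the channels of B.
        return image + (1.0 - mask) * correction, mask
```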

3.2. The Simplified Shadow-Corrected Attention Block

For removing shadows from document images, computational efficiency is very important for real-time or mobile device applications. We propose a simplified shadow-corrected attention block (SSCABlock) based on the nonlinear activation-free block [23,24] to reduce intra-block complexity, as shown in Figure 3.
In the original nonlinear activation free block, only channel attention is employed. As a result, it tends to focus exclusively on the global information of feature channels while neglecting local spatial information, which leads to the underutilization of pixel-level detailed information. Consequently, texts in shadowed regions may be compromised during the shadow removal process. To address this issue, we propose a simple spatial attention (SSA) block to capture features of critical regions. It enables the model to focus on global channel and local spatial information at the same time to capture richer and more complete features. By integrating spatial and channel information, DICNet may preserve the detailed information without sacrificing the overall quality.
In SSCABlock, the SimpleGate [23] first divides the input feature into two same-sized sub-features and inputs them into Simplified Channel Attention (SCA) and SSA, respectively. Then, the output features of SCA and SSA are combined along the channel dimension to finally obtain the feature vector with original size.
The SSA structure is shown in Figure 4. First, average pooling along the channel dimension transforms the input feature from $H \times W \times C$ to $H \times W \times 1$, where $H$, $W$, and $C$ denote the height, width, and number of channels, respectively. In other words, the input feature is reduced from 3D to 2D by average pooling, as shown in Equation (4):
$$\tilde{F} = \operatorname{avePool}(F) = \frac{1}{C}\sum_{i=1}^{C} F_i \tag{4}$$
where $F$ is the input feature of SSA with size $H \times W \times C$, $\tilde{F}$ is a matrix of size $H \times W$, and $F_i$ is the $i$-th feature channel of $F$.
Second, the softmax function, shown in Equation (5), is used to obtain an effective 2D probability distribution:
$$\operatorname{softmax}(\tilde{F})_{(x,y)} = \frac{e^{\tilde{F}_{(x,y)}}}{\sum_{x}\sum_{y} e^{\tilde{F}_{(x,y)}}} \tag{5}$$
where the subscript $(x,y)$ and $\tilde{F}_{(x,y)}$ denote the pixel location and the corresponding pixel value, respectively.
Lastly, the original input feature is weighted by the attention score at each position given by this 2D probability distribution and transformed back to 3D space. The calculation is described in Equation (6):
$$F^{*} = \operatorname{softmax}(\operatorname{avePool}(F)) \otimes F \tag{6}$$
where $F^{*}$ is the output feature of SSA with size $H \times W \times C$.
The simplified spatial attention thus involves only three operations: average pooling, softmax, and weighting, which are simple to compute. Moreover, through the softmax function it captures probabilistic information reflecting which pixels belong to the shadow, and this information contributes to establishing a more accurate soft shadow mask.
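A minimal sketch of SSA and of the SCA/SSA split inside the SSCABlock is given below. The SSA path follows Equations (4)-(6); the exact SCA layout (global average pooling followed by a 1×1 convolution, as in the nonlinear activation-free baseline [23,24]) and the even channel split are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSpatialAttention(nn.Module):
    """Sketch of SSA (Equations (4)-(6)): channel-average pooling, a spatial
    softmax, and per-position reweighting of the input feature."""
    def forward(self, x):                        # x: (N, C, H, W)
        pooled = x.mean(dim=1, keepdim=True)     # Eq. (4): average over channels -> (N, 1, H, W)
        n, _, h, w = pooled.shape
        attn = F.softmax(pooled.view(n, -1), dim=1).view(n, 1, h, w)  # Eq. (5): softmax over all H*W positions
        return attn * x                          # Eq. (6): weight every channel by the spatial attention map

class AttentionSplit(nn.Module):
    """Sketch of the SimpleGate-style split inside SSCABlock: half of the channels
    go through simplified channel attention (SCA), half through SSA, and the two
    halves are concatenated back to the original width."""
    def __init__(self, channels):
        super().__init__()
        self.sca_pool = nn.AdaptiveAvgPool2d(1)                    # global average pooling for SCA
        self.sca_fc = nn.Conv2d(channels // 2, channels // 2, 1)   # 1x1 conv producing channel weights
        self.ssa = SimpleSpatialAttention()

    def forward(self, x):
        a, b = x.chunk(2, dim=1)               # split into two equal-sized sub-features
        a = a * self.sca_fc(self.sca_pool(a))  # SCA: global channel reweighting
        b = self.ssa(b)                        # SSA: local spatial reweighting
        return torch.cat([a, b], dim=1)        # restore the original channel count
```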

3.3. The Loss Function

To detect the shadow mask and correct the dark illumination, the loss function of DICNet includes two components: the shadow detection loss $L_{sha}$ and the illumination correction loss $L_{ill}$. The shadow detection loss measures the discrepancy between the true shadow mask $\tilde{M}$ and the predicted shadow mask $M$. As shown in Equation (7), it includes two components, $L_{sha1}$ and $L_{sha2}$:
$$L_{sha} = \lambda_{sha} L_{sha1} + (1 - \lambda_{sha}) L_{sha2} \tag{7}$$
where $\lambda_{sha}$ is a trade-off parameter used to balance the influence of the two terms, $L_{sha1}$ is the L1 loss, and $L_{sha2}$ is the mean square error; they are given in Equations (8) and (9), respectively:
$$L_{sha1} = \lVert \tilde{M} - M \rVert_{1} \tag{8}$$
$$L_{sha2} = \frac{1}{N} \sum_{i=1}^{N} \bigl( \tilde{M}_i - M_i \bigr)^2 \tag{9}$$
where $N$ is the total number of pixels, and $\tilde{M}_i$ and $M_i$ are the $i$-th pixel values of $\tilde{M}$ and $M$, respectively.
The illumination correction loss is used to ensure accurate shadow removal, as shown in Equation (10), and also includes two components, $L_{ill1}$ and $L_{ill2}$:
$$L_{ill} = \lambda_{ill} L_{ill1} + (1 - \lambda_{ill}) L_{ill2} \tag{10}$$
where $\lambda_{ill}$ is the trade-off parameter. As shown in Equations (11) and (12), $L_{ill1}$ and $L_{ill2}$ are the L1 loss and the weighted mean square error, respectively, which measure the discrepancy between the real shadow-free image $\tilde{O}$ (i.e., the ground truth) and the predicted shadow-free image $O$:
$$L_{ill1} = \lVert \tilde{O} - O \rVert_{1} \tag{11}$$
$$L_{ill2} = \frac{1}{N} \sum_{i=1}^{N} M_i \bigl( \tilde{O}_i - O_i \bigr)^2 \tag{12}$$
where $\tilde{O}_i$ and $O_i$ are the $i$-th pixel values of $\tilde{O}$ and $O$, respectively.
The total loss of DICNet, $L_{total}$, is as follows:
$$L_{total} = \lambda L_{sha} + (1 - \lambda) L_{ill} \tag{13}$$
where $\lambda$ is the trade-off parameter used to balance the effect of $L_{sha}$ and $L_{ill}$. Although both the L1 loss and the mean square error are commonly used, this combination of losses can simultaneously measure the errors in shadow detection and illumination correction.
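Assuming the masks and images are stored as tensors of matching spatial size, Equations (7)-(13) can be written in a few lines; the trade-off values are those reported later in Section 4.2, the mean-reduced L1 terms and the use of the detected mask $M$ as the weight in Equation (12) are implementation assumptions.

```python
import torch.nn.functional as F

def dicnet_loss(pred_mask, true_mask, pred_img, true_img,
                lam_sha=0.5, lam_ill=0.1, lam=0.4):
    """Sketch of Equations (7)-(13); tensor shapes and broadcasting are assumptions."""
    # Shadow detection loss: L1 term (Eq. 8, mean-reduced here) plus MSE (Eq. 9), combined by Eq. (7).
    l_sha = lam_sha * F.l1_loss(pred_mask, true_mask) \
        + (1.0 - lam_sha) * F.mse_loss(pred_mask, true_mask)
    # Illumination correction loss: L1 term (Eq. 11) plus mask-weighted MSE (Eq. 12),
    # combined by Eq. (10); the detected mask M weights the squared error.
    weighted_mse = (pred_mask * (pred_img - true_img) ** 2).mean()
    l_ill = lam_ill * F.l1_loss(pred_img, true_img) + (1.0 - lam_ill) * weighted_mse
    # Total loss, Eq. (13).
    return lam * l_sha + (1.0 - lam) * l_ill
```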

4. Experiments

4.1. Evaluation Metrics

Some generally used evaluation metrics [22,25,26,27] are selected to quantitatively evaluate the effect of our shadow removal method.
First, the root mean square error (RMSE) in RGB color space is selected as a quantitative evaluation metric to evaluate shadow removal accuracy by comparison with the corresponding non-shadow image, which is defined as follows:
$$\mathrm{RMSE}(\tilde{O}, O) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl( \tilde{O}_i - O_i \bigr)^2} \tag{14}$$
The second evaluation metric is peak-signal-to-noise ratio (PSNR), which is defined as follows:
$$\mathrm{PSNR}(\tilde{O}, O) = 20 \log_{10} \frac{255}{\mathrm{RMSE}(\tilde{O}, O)} \tag{15}$$
The third evaluation metric is structural similarity (SSIM), which is defined as follows:
$$\mathrm{SSIM}(\tilde{O}, O) = \frac{(2 \mu_{\tilde{o}} \mu_{o} + c_1)(2 \sigma_{\tilde{o}o} + c_2)}{(\mu_{\tilde{o}}^2 + \mu_{o}^2 + c_1)(\sigma_{\tilde{o}}^2 + \sigma_{o}^2 + c_2)} \tag{16}$$
where $\mu_{\tilde{o}}$ and $\mu_{o}$ are the means of the pixel values, $\sigma_{\tilde{o}}^2$ and $\sigma_{o}^2$ are the variances, $\sigma_{\tilde{o}o}$ is the covariance, and $c_1$ and $c_2$ are small constants used to avoid a zero denominator.
These metrics take into account the color accuracy, signal-to-noise ratio, and structural similarity of the document images. Through them, we can have a more comprehensive understanding of the performance of shadow removal methods in different aspects.
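Assuming 8-bit images stored as NumPy arrays, the three metrics can be computed as follows. The SSIM here is evaluated globally over the whole image with the commonly used constants for c1 and c2, whereas reported SSIM values are normally averaged over local windows, so this is only a sketch of Equation (16).

```python
import numpy as np

def rmse(gt, pred):
    """Equation (14): root mean square error in RGB space."""
    return np.sqrt(np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2))

def psnr(gt, pred):
    """Equation (15): peak signal-to-noise ratio for 8-bit images."""
    return 20.0 * np.log10(255.0 / rmse(gt, pred))

def ssim_global(gt, pred, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Equation (16) evaluated once over the whole image (a simplification of windowed SSIM)."""
    gt = gt.astype(np.float64)
    pred = pred.astype(np.float64)
    mu_x, mu_y = gt.mean(), pred.mean()
    var_x, var_y = gt.var(), pred.var()
    cov = ((gt - mu_x) * (pred - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```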

4.2. Experimental Settings and Baseline Methods

Since the number of images in each existing real dataset is too small to support network training, as shown in Table 1, we use the synthetic datasets FSDSRD and SDCSRD as the training sets and then test the trained models on the real datasets Adobe, HS, Jung, Kligler, and OSR. FSDSRD contains 14,420 triples consisting of a ground truth, a shadow image, and a soft shadow mask. The images in FSDSRD are resized to 512 × 512. The images in Jung and Kligler are also resized to 512 × 512 for inference; after obtaining the predicted shadow-removed images, they are resized back to their original size to calculate the metrics. In this work, $\lambda_{sha} = 0.5$, $\lambda_{ill} = 0.1$, and $\lambda = 0.4$.
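A sketch of this resize-and-restore inference protocol is shown below; OpenCV and the interpolation mode are illustrative choices, and the model call is a placeholder rather than the released interface.

```python
import cv2

def remove_shadow(model, image):
    """Resize to the 512x512 inference resolution used during training, run the
    model, then restore the original resolution before computing metrics.
    The model call signature is hypothetical."""
    h, w = image.shape[:2]
    small = cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)
    output_small = model(small)   # placeholder inference call
    return cv2.resize(output_small, (w, h), interpolation=cv2.INTER_LINEAR)
```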
DICNet is compared with ten state-of-the-art methods, including seven heuristic-based methods [14,15,17,28,29,30,31] and three neural network-based methods [22,25,27], to illustrate its performance. For the heuristic-based methods, the results are obtained with the code provided by the authors, and for the neural network-based methods, the results are cited from original articles.

4.3. Visual Comparison

In terms of visual comparison, the qualitative results of the various methods are presented in Figure 5. The comparison methods often lead to background color changes when dealing with shadow areas over a colored background, whereas our method accurately restores the original background color of these regions. In addition, DICNet performs excellently in removing severe shadows, as shown in rows 4 and 5 of Figure 5. While other methods may accidentally delete some text in the document image, our method successfully eliminates all shadow regions while preserving the original text content.
The DICNet exhibits superior visual effects and quantitative performance compared to other methods in dealing with various scenarios, including printed documents, posters, and fonts. These results fully demonstrate the effectiveness and excellence of our proposed method in visual restoration, especially when removing shadows in document images and maintaining high quality and naturalness.

4.4. Quantitative Analysis

In this work, the SSIM, RMSE, and PSNR are used to quantitatively analyze the performance of the proposed DICNet. All the images in the testing set of each dataset are used. We recorded the SSIM/RMSE/PSNR values of each method on different datasets and counted the average and standard deviation.
The quantitative analysis results for SSIM are shown in Table 2. The closer the SSIM value is to 1, the higher the structural fidelity. DICNet achieved higher SSIM values on all datasets, especially on the Adobe (0.9978) and Kligler (0.9956) datasets. In addition, its standard deviation is the smallest, indicating that its performance is more stable than that of the other methods. The reason may be the SSCABlock's combined channel and spatial attention mechanism: it not only captures the global illumination distribution through channel attention but also focuses on the local texture of text and background through spatial attention (e.g., the small characters and complex typography in the Kligler dataset). In contrast, traditional methods (e.g., ISR, 3D-PC) may lose local structure because they focus only on global illumination.
Figure 5. The visual comparison on the SDCSRD dataset: (a) input image; (b) WF method; (c) BE method; (d) LGBC method; (e) JWF method; (f) BEATE method; (g) CBENet; (h) DICNet; and (i) the ground truth.
As shown in Table 3, DICNet achieves the lowest RMSE across all datasets, with an average RMSE of 8.1467, which is only about 75% of that of the suboptimal method BEATE (10.8854). Additionally, its standard deviation is the smallest. The key to achieving such results lies in the accuracy of the soft shadow mask: the mask output by the shadow detection branch can distinguish pixel-level differences between shadow and non-shadow regions (such as the hard shadow boundaries in the HS dataset), enabling the illumination correction block to supplement light only in the shadow regions and avoid over-correcting non-shadow regions, thereby reducing the overall pixel error. Lacking refined shadow masks, methods such as LGBC and CBENet may incorrectly correct non-shadow regions, leading to an increase in RMSE. For example, the RMSE of LGBC on the Jung dataset is 20.8406, roughly twice that of DICNet.
The PSNRs of all comparison methods are shown in Table 4. Higher PSNR values indicate less noise and better quality. The average PSNR of DICNet (30.3618) far exceeds that of BEATE (28.1850) and CBENet (23.9249), with particularly outstanding performance on the Adobe (33.0658) and Kligler (31.1705) datasets. This benefits from the hierarchical adjustment mechanism of the illumination correction block: combined with the multi-round feature optimization of the SSCABlock, the light supplementation for shadow areas is smoother (for example, when processing the light shadows in Kligler, it avoids the "overexposure" noise caused by excessive correction). In contrast, methods like BEATE may cause local brightness jumps due to a single correction strategy, reducing PSNR.
From Table 2 and Table 4, we can see that DICNet achieves the highest SSIM (0.9978) and PSNR (33.0658) on the Adobe dataset, far exceeding the suboptimal method BEATE (SSIM 0.9513, PSNR 32.2547). This may be because the shadow variations in the Adobe dataset are complex (each document has 5–9 shading variations), while the SSCABlock of DICNet can dynamically balance the global illumination distribution and local shadow details, avoiding the over-correction of BEATE in strong shadow regions. DICNet achieves the optimal performance in terms of PSNR (30.7542) and SSIM (0.9949) on HS dataset, which contains hard shadows and transition regions under sunlight and LED lighting. This is attributed to the soft shadow mask output by the shadow detection branch, which can accurately distinguish between the core area of hard shadows and the transition area. The illumination correction block supplements light in a graded manner based on this mask. In contrast, methods like MS-GAN, due to their rough masks, tend to produce brightness discontinuities in the transition area.
Overall, the proposed method shows significant performance benefits on different document shadow removal tasks; it performed better in terms of structural detail preservation, error minimization, and overall image quality improvement.

4.5. Ablation Experiment

To gain a deeper understanding of the role of each module in DICNet, a series of ablation experiments are conducted on the Adobe and HS datasets, and the results are shown in Table 5.
First, the SSCA is removed and plain convolutions are used instead. The results in Table 5 show that SSCA is very important for model performance: on the Adobe dataset, PSNR increases from 28.2369 to 33.0658 and RMSE decreases from 10.5231 to 5.8785 when SSCA is added, and on the HS dataset, SSIM increases from 0.9413 to 0.9959. This demonstrates the important role of SSCA in fusing global channel information (the illumination distribution) with local spatial information (text/background boundaries); in particular, the HS results verify its ability to capture the details of hard shadow transition areas.
Second, the shadow detection branch is removed from the network structure, and the network outputs shadow-free images without a shadow mask. Compared to the complete DICNet, PSNR and SSIM decrease and RMSE increases on both the Adobe and HS datasets. This indicates that the accurate soft shadow mask $M$ serves as a "navigation map" for illumination correction: without $M$, the correction would blindly enhance all low-brightness regions (including non-shadow dark text), reducing the contrast between text and background, whereas the presence of $M$ ensures that the correction acts only on real shadow regions, preserving the original brightness of the text.

5. Conclusions

To effectively remove shadows from document images, we proposed an end-to-end document image shadow removal network (DICNet) based on illumination correction. A simplified shadow-corrected attention block is designed to capture spatial and channel information. With it, the shadow detection block obtains a soft shadow mask that identifies the shadow intensity at each pixel in the document image. The illumination correction block then generates a shadow-free image by correcting the illumination of the shaded parts with the soft shadow mask. The experimental results on different datasets show that the proposed DICNet achieves significant results in shadow removal for document images. However, owing to the limitations of existing datasets in terms of shadow types, scene complexity, and the number of images, a broader, systematic randomized evaluation is still lacking. In the future, we will expand the datasets to encompass a wider variety of shadow types, scenes, and challenges, and conduct more systematic evaluations, including rigorous, randomized large-scale visual evaluation, to demonstrate the method's performance under diverse conditions.

Author Contributions

D.G. wrote the manuscript; W.L. performed the experiment and simulation; S.C. processed the experimental data and participated in the revision of the manuscript; J.Q. provided financial support; X.M. assisted with the experiments and analysis; B.W. administrated the project. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Plan Project of Nantong (No. JC2023023), the Social Livelihood Science and Technology Plan of Nantong City (No. MS2024016, MSZ2024122), the Project of Guangdong Provincial Key Laboratory of Intelligent Information Processing (No. 2023B1212060076), the National Natural Science Foundation of China Youth Fund (No. 62102318), the Phase III Project of the Brand Professional Programs Construction Project for Universities in Jiangsu Province, the Nantong Key Laboratory of Virtual Reality and Cloud Computing (No. CP2021001), the Electronic Information Master's project of Nantong Institute of Technology (No. 879002), the Software Engineering Key Discipline Construction project of Nantong Institute of Technology (No. 879005), and the PhD project of Nantong Institute of Technology (No. 2023XK(B)06).

Data Availability Statement

The code is available at: https://github.com/gdpinntit/Document-Image-Shadow-Removal-based-on-Illumination-Correction/tree/main (accessed on 21 May 2025). All datasets used in this study are open access.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, J.; Liang, L.; Ding, K.; Guo, F.; Jin, L. Appearance enhancement for camera-captured document images in the wild. IEEE Trans. Artif. Intell. 2023, 5, 2319–2330. [Google Scholar] [CrossRef]
  2. Imahayashi, S.; Mukaida, M.; Takeda, S.; Suetake, N. Shadow removal from document image based on background estimation employing selective median filter and black-top-hat transform. Opt. Rev. 2023, 30, 336–340. [Google Scholar] [CrossRef]
  3. Bogdan, K.O.; Megeto, G.A.; Leal, R.; Souza, G.; Valente, A.C.; Kirsten, L.N. Ddoce: Deep document enhancement with multi-scale feature aggregation and pixel-wise adjustments. In Proceedings of the International Conference on Document Analysis and Recognition, Lausanne, Switzerland, 5–10 September 2021; Springer: Cham, Switzerland, 2021; pp. 229–244. [Google Scholar]
  4. Georgiadis, K.; Yucel, M.K.; Skartados, E.; Dimaridou, V.; Drosou, A.; Saa-Garriga, A.; Manganelli, B. Lp-ioanet: Efficient high resolution document shadow removal. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodos, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  5. Amin, B.; Riaz, M.M.; Ghafoor, A. Automatic shadow detection and removal using image matting. Signal Process. 2020, 170, 107415. [Google Scholar] [CrossRef]
  6. Michalak, H.; Okarma, K. Fast binarization of unevenly illuminated document images based on background estimation for optical character recognition purposes. J. Univers. Comput. Sci. 2019, 25, 627–646. [Google Scholar]
  7. Das, S.; Sial, H.A.; Ma, K.; Baldrich, R.; Vanrell, M.; Samaras, D. Intrinsic decomposition of document images in-the-wild. arXiv 2020, arXiv:2011.14447. [Google Scholar]
  8. Ravindran, A.; Arudselvam, U.; Thayasivam, U. Shadow removal for documents with reflective textured surface. In Proceedings of the IEEE 2022 Moratuwa Engineering Research Conference, Moratuwa, Sri Lanka, 27–29 July 2022; pp. 1–6. [Google Scholar]
  9. Oliveira, D.M.; Lins, R.D. A new method for shading removal and binarization of documents acquired with portable digital cameras. In Proceedings of the Third International Workshop on Camera-Based Document Analysis and Recognition, Barcelona, Spain, 25 July 2009; pp. 3–10. [Google Scholar]
  10. Wang, B.; Feng, S.; Chen, C.P. Strong shadow removal of text document images based on background estimation and shading scale. In Proceedings of the IEEE 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems, Guangzhou, China, 13–15 November 2020; pp. 738–742. [Google Scholar]
  11. Gong, H.; Cosker, D. Interactive shadow removal and ground truth for variable scene categories. In Proceedings of the BMVC 2014—Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; pp. 1–11. [Google Scholar]
  12. Inoue, N.; Yamasaki, T. Learning from synthetic shadows for shadow detection and removal. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 4187–4197. [Google Scholar] [CrossRef]
  13. Bako, S.; Darabi, S.; Shechtman, E.; Wang, J.; Sunkavalli, K.; Sen, P. Removing shadows from images of documents. In Proceedings of the Asian Conference on Computer Vision, Taipei, China, 20–24 November 2016; pp. 173–183. [Google Scholar]
  14. Jung, S.; Hasan, M.A.; Kim, C. Water-filling: An efficient algorithm for digitized document shadow removal. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Cham, Switzerland, 2018; pp. 398–414. [Google Scholar]
  15. Kligler, N.; Katz, S.; Tal, A. Document enhancement using visibility detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2374–2382. [Google Scholar]
  16. Wang, B.; Chen, C.P. Local water-filling algorithm for shadow detection and removal of document images. Sensors 2020, 20, 6929. [Google Scholar] [CrossRef]
  17. Shah, V.; Gandhi, V. An iterative approach for shadow removal in document images. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada, 15–20 April 2018; pp. 1892–1896. [Google Scholar]
  18. Matsuo, Y.; Akimoto, N.; Aoki, Y. Document shadow removal with foreground detection learning from fully synthetic images. In Proceedings of the 2022 IEEE International Conference on Image Processing, Bordeaux, France, 16–19 October 2022; pp. 1656–1660. [Google Scholar]
  19. Liu, W.; Wang, B.; Wang, Z.; Chen, C.L. DocShaDiffusion: Diffusion Modal in Latent Space for Document Image Shadow Removal. arXiv 2025, arXiv:2507.01422. [Google Scholar]
  20. Lin, Y.H.; Chen, W.C.; Chuang, Y.Y. Bedsr-net: A deep shadow removal network from a single document image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12905–12914. [Google Scholar]
  21. Chen, X.; Cun, X.; Pun, C.M.; Wang, S. Shadocnet: Learning spatial-aware tokens in transformer for document shadow removal. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodos, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  22. Zhang, L.; He, Y.; Zhang, Q.; Liu, Z.; Zhang, X.; Xiao, C. Document image shadow removal guided by color-aware background. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1818–1827. [Google Scholar]
  23. Luo, Z.; Gustafsson, F.K.; Zhao, Z.; Sjölund, J.; Schön, T.B. Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Rhodes, Greece, 4–10 June 2023; pp. 1680–1691. [Google Scholar]
  24. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 17–33. [Google Scholar]
  25. Jin, Y.; Sharma, A.; Tan, R.T. Dc-shadownet: Single-image hard and soft shadow removal using unsupervised domain-classifier guided network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5027–5036. [Google Scholar]
  26. Guo, L.; Wang, C.; Yang, W.; Huang, S.; Wang, Y.; Pfister, H.; Wen, B. Shadowdiffusion: When degradation prior meets diffusion model for shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14049–14058. [Google Scholar]
  27. Hu, X.; Jiang, Y.; Fu, C.W.; Heng, P.A. Mask-shadowgan: Learning to remove shadows from unpaired data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2472–2481. [Google Scholar]
  28. Liu, W.; Wang, B.; Zheng, J.; Wang, W. Shadow removal of text document images using background estimation and adaptive text enhancement. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodos, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  29. Wang, Z.; Wang, B.; Zheng, J.; Chen, C.P. Joint water-filling algorithm with adaptive chroma adjustment for shadow removal from text document images. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Prague, Czech Republic, 9–12 October 2022; pp. 2882–2887. [Google Scholar]
  30. Wang, J.R.; Chuang, Y.Y. Shadow removal of text document images by estimating local and global background colors. In Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Virtual, 4–9 May 2020; pp. 1534–1538. [Google Scholar]
  31. Wang, B.; Chen, C.P. An effective background estimation method for shadows removal of document images. In Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, China, 22–25 September 2019; pp. 3611–3615. [Google Scholar]
Figure 1. Examples of document images with shadows.
Figure 2. DICNet structural diagram.
Figure 3. Simplified shadow-corrected attention block structural diagram.
Figure 4. Simplified spatial attention structural diagram.
Table 1. The details of document shadow removal datasets.
Dataset | Number of Images | Shadow Mask Available | Real/Synthetic
Adobe [13] | 81 | no | real
Jung [14] | 87 | no | real
Kligler [15] | 381 | no | real
OSR [16] | 237 | yes | real
HS [17] | 100 | no | real
FSDSRD [18] | 14,200 | yes | synthetic
SDCSRD [19] | 17,624 | yes | synthetic
Table 2. The quantitative analysis on SSIM.
Methods | Adobe | HS | Jung | Kligler | OSR | Average | Standard Deviation
ISR [17] | 0.6511 | 0.8707 | 0.8714 | 0.7544 | 0.8259 | 0.7947 | 0.0835
3D-PC [15] | 0.7931 | 0.8970 | 0.8527 | 0.7969 | 0.8491 | 0.8378 | 0.0388
WF [14] | 0.7748 | 0.9370 | 0.9173 | 0.8341 | 0.9141 | 0.8755 | 0.0614
BE [31] | 0.7182 | 0.8608 | 0.6576 | 0.7955 | 0.8234 | 0.7711 | 0.0736
LGBC [30] | 0.8581 | 0.7093 | 0.8135 | 0.8841 | 0.8256 | 0.8181 | 0.0598
JWF [29] | 0.9169 | 0.9072 | 0.8747 | 0.8023 | 0.9004 | 0.8803 | 0.0414
BEATE [28] | 0.9513 | 0.9469 | 0.9162 | 0.9268 | 0.9372 | 0.9357 | 0.0129
MS-GAN [27] | 0.8506 | 0.9225 | 0.8816 | 0.8160 | 0.9117 | 0.8765 | 0.0393
DCShadow-Net [25] | 0.9183 | 0.9351 | 0.8778 | 0.8267 | 0.8996 | 0.8915 | 0.0376
CBENet [22] | 0.9249 | 0.9566 | 0.9466 | 0.9216 | 0.9506 | 0.9401 | 0.0141
DICNet | 0.9978 | 0.9949 | 0.9894 | 0.9956 | 0.9942 | 0.9944 | 0.0028
Table 3. The quantitative analysis on RMSE.
Methods | Adobe | HS | Jung | Kligler | OSR | Average | Standard Deviation
ISR [17] | 132.8236 | 69.9663 | 51.4320 | 94.9936 | 74.7759 | 84.7983 | 27.4869
3D-PC [15] | 45.1189 | 22.9948 | 27.1410 | 22.1827 | 32.5690 | 30.0013 | 8.4088
WF [14] | 79.1152 | 20.6632 | 18.4178 | 38.0803 | 31.9471 | 37.6447 | 21.9568
BE [31] | 55.0093 | 41.5587 | 96.7136 | 40.5711 | 40.9424 | 54.9590 | 21.5708
LGBC [30] | 10.3736 | 27.5048 | 20.8406 | 9.7618 | 23.3185 | 18.3599 | 7.1004
JWF [29] | 23.7720 | 38.1403 | 53.7947 | 48.7566 | 28.1078 | 38.5143 | 11.5253
BEATE [28] | 6.5083 | 9.0040 | 15.2094 | 8.0355 | 15.6698 | 10.8854 | 3.8055
MS-GAN [27] | 15.6217 | 14.6334 | 31.1069 | 27.1857 | 23.9341 | 22.4964 | 6.4387
DCShadow-Net [25] | 10.4625 | 15.7446 | 33.1704 | 29.7510 | 23.3857 | 22.5028 | 8.4588
CBENet [22] | 24.7669 | 16.6032 | 22.0069 | 10.1808 | 19.6035 | 18.6323 | 5.0102
DICNet | 5.8785 | 7.4634 | 10.1487 | 7.2563 | 9.3031 | 8.0100 | 1.5268
Table 4. The quantitative analysis on PSNR.
Methods | Adobe | HS | Jung | Kligler | OSR | Average | Standard Deviation
ISR [17] | 5.7007 | 12.2114 | 14.1089 | 8.6179 | 10.7413 | 10.2760 | 2.9098
3D-PC [15] | 15.0667 | 20.9485 | 19.5096 | 21.3490 | 18.1030 | 18.9954 | 2.2737
WF [14] | 10.2377 | 22.3817 | 22.8450 | 16.6868 | 18.3221 | 18.0947 | 4.5762
BE [31] | 13.6516 | 16.6430 | 9.0183 | 16.3216 | 16.5030 | 14.4275 | 2.9211
LGBC [30] | 28.3047 | 19.8316 | 21.8917 | 28.6566 | 20.9347 | 23.9239 | 3.7789
JWF [29] | 21.3437 | 16.8070 | 13.6268 | 14.5352 | 19.7622 | 17.2150 | 2.9583
BEATE [28] | 32.2547 | 29.1951 | 24.6339 | 30.3406 | 24.5006 | 28.1850 | 3.1117
MS-GAN [27] | 24.7240 | 24.9514 | 18.5062 | 19.9905 | 20.9466 | 21.8237 | 2.5819
DCShadow-Net [25] | 27.9549 | 24.3020 | 18.2546 | 19.1632 | 23.3857 | 22.6121 | 3.5463
CBENet [22] | 22.5846 | 24.5800 | 21.4163 | 28.2218 | 22.8219 | 23.9249 | 2.3748
DICNet | 33.0658 | 30.7542 | 28.0092 | 31.1705 | 28.8094 | 30.3618 | 1.7924
Table 5. The ablation experiment results.
Methods | Adobe PSNR | Adobe SSIM | Adobe RMSE | HS PSNR | HS SSIM | HS RMSE
DICNet w/o SSCA | 28.2369 | 0.9821 | 10.5231 | 29.4675 | 0.9413 | 9.6438
DICNet w/o mask | 30.6627 | 0.9913 | 7.5681 | 30.6781 | 0.9474 | 7.4896
DICNet | 33.0658 | 0.9978 | 5.8785 | 30.7542 | 0.9959 | 7.4634
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gao, D.; Liu, W.; Chen, S.; Qiu, J.; Mei, X.; Wang, B. Document Image Shadow Removal Based on Illumination Correction Method. Algorithms 2025, 18, 468. https://doi.org/10.3390/a18080468
