1. Introduction
Multi-exposure image fusion (MEF) offers a cost-effective solution for high-dynamic-range (HDR) imaging: it accepts a sequence of images taken under varying exposure conditions as input and produces a high-quality image with enhanced dynamic range as output [1]. Utilizing two-dimensional image information for character recognition and defect detection on metallic workpieces is a common industrial application of machine vision. However, because metal surfaces are typically smooth and reflective, severe localized reflections often occur when identifying and inspecting them, as illustrated in Figure 1. In industrial practice, these reflections are addressed with different lighting equipment and polarizing filters, but such methods are costly, inefficient, and unable to meet real-time requirements. Therefore, this paper adopts a deep learning-based multi-exposure image fusion approach to address the low dynamic range and uneven local lighting in images of metal materials. Recently, many end-to-end deep learning-based MEF algorithms [2,3,4,5,6,7,8,9,10,11,12,13] have been introduced and have become mainstream in the MEF field. However, four primary issues persist in industrial scenarios: (1) Under extreme exposure levels, such as combining a highly overexposed and an underexposed image, the resulting fused image often performs poorly. Moreover, most existing algorithms limit the number of images that can be fused; when the number of source images is restricted and the exposure difference between them is large, their ability to produce a satisfactory fused result decreases. (2) Training data are scarce. One of the biggest challenges in applying deep learning to image fusion is the need for authentic fused images for supervised learning. Some approaches attempt to address this issue by manually constructing ground truths, which are often inaccurate and place a ceiling on what the network can learn. Moreover, in industrial scenarios, acquiring well-exposed images of smooth workpieces is particularly challenging. (3) Balancing efficiency and quality is difficult, which hinders practical deployment. MEFNet [14] can process images in real time because it operates at a low resolution, but it relies solely on MEF-SSIM [15] (MEF Structural Similarity Index) as its loss function, which may discard information from the source images. Qu et al. [16] proposed TransMEF, which uses a Transformer to enhance the quality of fused images but does not consider processing speed, making it difficult to apply in real industrial environments. (4) Existing Image Quality Assessment (IQA) methods rarely address the objective evaluation of uneven illumination. For industrial images, the goal is to minimize the effects of uneven lighting while preserving detail. This differs from natural images, where illumination is often treated as a major factor of image quality and is built into most IQA metrics. Consequently, most current evaluation metrics do not adequately measure whether the illumination in industrial images is uniform.
To address problems (1) and (2), this paper proposes an unsupervised, fully convolutional network named MEF-AT, capable of handling an arbitrary number of exposure images. For complex industrial scenes, multiple images with different exposures can be captured and fused according to the scene's complexity. This paper also adopts an unsupervised training approach to overcome the lack of authentic fused images. To resolve issue (3), the paper combines multi-dimensional attention mechanisms and a DGF module [17] with the fully convolutional network, supported by training storage units during training. The attention mechanism enhances detail retention by operating across the channel and spatial dimensions. The DGF module adapts to complex industrial scenarios through learnable parameters applied to the generated low-resolution weight maps; it uses a guide image to produce high-resolution weight maps, which are fused to generate high-quality industrial images. During training, the training storage units store the intermediate fused images, which serve as auxiliary supervisory signals to further enhance the dynamic range of the fused images. To tackle issue (4), this paper proposes a new evaluation metric named Illumination Component Gradient (ICG). ICG uses multi-scale Gaussian functions to extract the illumination component of an image and computes the average gradient of this component to objectively and accurately assess the degree of uneven illumination. The contributions of this paper are summarized as follows:
An unsupervised, fully convolutional network with a multidimensional attention mechanism is proposed. It can receive images of any spatial resolution and exposure number, generate high-quality fused images in real-time, and adapt to complex industrial scenarios.
A training storage unit is introduced to transform the training scheme in the unsupervised image fusion framework. Intermediate results during network training are used as auxiliary supervisory signals, effectively eliminating uneven illumination in industrial images and enhancing image dynamic range.
An image evaluation metric named ICG is proposed to measure the level of uneven illumination in industrial images, refining the evaluation metrics system for industrial images.
The remainder of the paper is organized as follows: Section 2 reviews current methods and applications of multi-exposure image fusion. Section 3 introduces the proposed method. In Section 4, experiments validate the effectiveness of the method. Finally, a discussion and the overall conclusions are given in Section 5 and Section 6.
2. Related Work
MEF tasks typically focus on finding appropriate methods to determine the weights of exposure sequence images. Mertens [1] derived fusion weights from the exposure, contrast, and saturation of the exposure sequences. Since then, many pixel-level multi-exposure image fusion techniques [18] have been developed, primarily aiming to enhance visual quality, but at the cost of increased computational complexity. Li [19] employed a guided filter to decompose source images into base and detail layers and used a weighted average technique to merge the layers, effectively leveraging spatial consistency information. The GGIF [20] technique introduced gradient-domain edge-preserving smoothing in place of Gaussian smoothing to suppress noise interference and halo effects. Compared to per-pixel MEF algorithms, patch-based methods produce smoother weight maps and require less post-processing, but they carry a heavier computational load [21,22]. Some approaches [2,23,24] prefer to frame the MEF task as an optimization problem. However, these methods, which almost always rely on manually designed features, have limited applicability to complex industrial scenarios due to their inflexibility and relatively high time consumption, making practical application in the industrial field challenging.
Deep learning-based MEF methods have significantly improved performance and visual quality in recent years. DeepFuse [2] was the first to integrate deep learning with MEF tasks, constructing a novel CNN (Convolutional Neural Network) architecture and employing MEF-SSIM as the training loss function for unsupervised learning. MEF-GAN [3] introduced Generative Adversarial Networks to MEF for the first time, incorporating a self-attention mechanism to correct artifacts in the fused images. AGAL [4] utilized dual GANs to impose global and local constraints, with the global discriminator allowing the fused image to learn the overall exposure distribution of the ground truth and the local discriminator focusing on detail preservation. PMGI [5] and SDNet [6] maintained the similarity between the fusion results and source images using gradients and intensities. MEF-CL [25] and HoLoCo [7] introduced contrastive learning to the MEF task, modeling the contrastive relationships between source and reference images and thus achieving better fusion performance without adding any model parameters. HALDeR [8] introduced a multi-scale attention mechanism, extending attention coverage from different angles to achieve good exposure correction and prevent color distortion. CF [9] combined MEF with super-resolution through a multi-scale coupled feedback module, allowing the two tasks to collaborate and interact; the specially designed coupled feedback network performed well on both tasks. Han [10] proposed a depth-perception enhanced network for MEF, known as DPE-MEF, which utilizes two modules to collect content details and to handle the color mapping and correction of the final result. FCMEF [26] introduced a Fourier Transform-based pixel intensity transfer strategy to synthesize many images with varying exposure levels for training the image fusion network, enabling the trained network to remain robust and effective when fusing images with extreme and diverse exposure levels.
In addition to deep learning methods tailored specifically for multi-exposure fusion, various general image fusion frameworks that cover MEF tasks have also been proposed. IFCNN [11] implemented a universal image fusion model capable of handling various fusion tasks by adopting different fusion rules. U2Fusion [12] utilizes DenseNet to extract features from source images, with information measures derived from these feature maps determining the extent of information retention; it also employs Elastic Weight Consolidation to address the forgetting problem associated with continual learning. SwinFusion [13] uses a Transformer-based deep feature reconstruction unit and a CNN-based image reconstruction unit to capture global and local information for reconstructing the fused image. SDDNet [27] combines dilated convolution with an improved atrous spatial pyramid pooling (ASPP) module, reducing computational complexity while effectively expanding the receptive field. STRNet [28] pairs a squeeze-and-excitation attention-based encoder with a multi-head attention-based decoder, keeping the network design concise to maintain fast processing speed.
However, due to the constraints of the network structures in the methods above, most are designed to accept a predefined number of exposure sequence images, which can be limiting when dealing with complex and variable industrial scenes. These limitations highlight the need for more adaptable solutions in MEF tasks to better accommodate the diverse requirements and unpredictable conditions typical of industrial environments.
3. Proposed Method
In order to ensure the real-time performance of the algorithm in industrial scenarios, this paper adopts a downsampling–upsampling image processing scheme. The network is presented with a low-resolution version of the input sequence and learns to generate weight maps, rather than directly generating fused images as in prior works [5,6,7,25]. This study employs a fully convolutional network to achieve flexibility: it accepts inputs of any size and produces outputs of the corresponding size (referred to as dense prediction). The network is shared among different exposure images, enabling it to handle an arbitrary number of exposures. The paper combines multi-dimensional attention mechanisms, training storage unit modules, and DGF guided filtering modules with the fully convolutional network. This combination helps the network extract and exploit critical information to generate high-quality fused images efficiently.
The network model is designed to meet the real-time, quality, and flexibility requirements of applying the algorithm in industrial settings. The proposed MEF-AT network consists of a bilinear downsampler, a multi-dimensional attention mechanism, a CAN network [29], a training storage unit module, and a DGF guided filtering module. Figure 2 illustrates the architecture of the model.
This paper initially downsamples the input sequence $\{I_k\}_{k=1}^{K}$ to obtain a low-resolution version $\{I_k^{l}\}_{k=1}^{K}$, which is then fed into a fully convolutional network equipped with a multi-dimensional attention mechanism to produce low-resolution weight maps $\{W_k^{l}\}_{k=1}^{K}$. Specifically, the low-resolution input sequence is first passed through a Channel Attention (CA) module [30], resulting in a feature map $X$. This feature map is then split along the batch-size dimension and fed into a dilated convolution module with Spatial Attention (SA) to obtain the low-resolution weight maps $\{W_k^{l}\}$. The low-resolution inputs $\{I_k^{l}\}$, the low-resolution weight maps $\{W_k^{l}\}$, and the full-resolution images $\{I_k\}$ are the inputs to the DGF upsampler, which produces high-resolution weight maps $\{W_k\}$. Finally, the fused image is obtained by computing the weighted sum of $\{W_k\}$ and $\{I_k\}$. During training, a training storage unit utilizes the intermediate fusion results to enhance the fusion quality, forming a more effective training scheme.
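To make the data flow above concrete, the following is a minimal PyTorch sketch of the pipeline (bilinear downsampling, low-resolution weight prediction, guided-filter upsampling, weighted summation). The module names `weight_net` and `gf_upsampler`, the softmax normalization, and all shapes are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def fuse_sequence(images, weight_net, gf_upsampler, scale=4):
    """Sketch of the MEF-AT data flow: downsample -> predict low-res
    weight maps -> guided-filter upsample -> weighted sum.

    images: tensor of shape (K, C, H, W), one entry per exposure.
    weight_net, gf_upsampler: assumed modules (see later subsections).
    """
    # Bilinear downsampling of the input sequence at rate `scale`.
    low_res = F.interpolate(images, scale_factor=1.0 / scale,
                            mode="bilinear", align_corners=False)

    # Fully convolutional network with channel/spatial attention predicts
    # one low-resolution weight map per exposure: (K, 1, h, w).
    low_res_weights = weight_net(low_res)

    # DGF upsampler uses the low-res inputs, low-res weights and the
    # full-resolution images as guidance to recover (K, 1, H, W) weights.
    weights = gf_upsampler(low_res, low_res_weights, images)

    # Normalize weights across the exposure dimension (one plausible
    # choice) and compute the weighted sum of the source images.
    weights = torch.softmax(weights, dim=0)
    fused = (weights * images).sum(dim=0, keepdim=True)  # (1, C, H, W)
    return fused
```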
3.1. Multi-Dimensional Attention Mechanism
The network is built on the fully convolutional network CAN and is enhanced with channel and spatial attention mechanisms to improve its ability to focus on and represent critical information in complex industrial images. Small but critical details inevitably arise during the manufacturing of workpieces, and these details are often lost during photography because of the reflective properties of metal objects. Focusing on such tiny but critical details in industrial images is therefore one of the primary goals of this network design. This paper employs a multi-dimensional attention mechanism that adjusts the network’s focus on different image regions from both global and local perspectives. By assigning a weight to each position, the mechanism enables the network to automatically focus on the more important regions of an industrial image, while lowering the weights of background regions reduces background interference and improves the robustness of the network in complex environments. The mechanism improves the rendering of local details while keeping the overall appearance harmonious and unified.
3.1.1. Channel Attention Mechanism
We first perform bilinear downsampling on the input sequence $\{I_k\}_{k=1}^{K}$ at a rate $s$ to obtain a lower-resolution version $\{I_k^{l}\}_{k=1}^{K}$. The downscaled sequence is fed into a CA module to generate a feature map $X$. Specifically, each input image $I_k^{l}$ undergoes a conv2d, and the results are concatenated to form a 4D tensor. Subsequently, an attention mechanism is applied along the channel dimension.
In the channel attention mechanism, the attention vector consists of one scalar per channel. The linear weights in the CA mechanism are denoted as $W_{1}$ and $W_{2}$; they are the weight matrices of two fully connected layers. The functions $\delta$ and $\sigma$ represent the ReLU and Sigmoid activation functions, respectively. The operator ⊙ denotes element-wise multiplication with automatic broadcasting. The structure of the channel attention mechanism is shown in Figure 3.
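For reference, a squeeze-and-excitation style channel attention block consistent with the description above (two fully connected layers, ReLU and Sigmoid, element-wise rescaling with broadcasting) can be sketched in PyTorch as follows; the reduction ratio and layer names are assumptions, not the exact configuration used here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling, two FC layers
    (W1 with ReLU, W2 with Sigmoid), then channel-wise rescaling."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1
            nn.ReLU(inplace=True),                       # delta
            nn.Linear(channels // reduction, channels),  # W2
            nn.Sigmoid(),                                # sigma
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        a = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * a  # element-wise multiplication with broadcasting
```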
Through the channel attention module, which fuses features along the channel dimension, the resulting feature map accumulates the rich contextual information needed to predict the subsequent weight maps. As illustrated in Figure 4, comparing subfigures (b,c) or (d,e) shows that enabling the CA module highlights the weight map of the QR code region (marked by red boxes) in original image 1, indicating focused attention on fine details. Conversely, in original image 3, the weight map of the QR code region exhibits lower weights, demonstrating better suppression of uneven illumination.
3.1.2. Dilated Convolutions Module with Spatial Attention
In this paper, a dilated convolution module with spatial attention promotes the effective fusion of local and global information in industrial images. The feature map $X$ is further split along the batch dimension into $\{X_k\}_{k=1}^{K}$, where each $X_k$ corresponds to one exposure. Each $X_k$ is then input into a dilated convolution module with spatial attention. This module consists of five dilated convolution branches, with dilation rates set to 2, 4, 6, 8, and 10, respectively. Each branch generates a feature map and incorporates a spatial attention mechanism [31]. In this context, a dilated convolution with dilation rate $r$ expands the receptive field of the convolutional layers without reducing the resolution of the feature map or significantly increasing the number of parameters. The operator ⊗ denotes the conv2d operation.
After obtaining the outputs of the five branches of the dilated convolution module, they are concatenated and passed through another convolutional layer to produce the final low-resolution weight map. Spatial attention helps optimize the spatial weighting of intra-frame pixels, effectively preserving details and suppressing artifacts, while dilated convolutions systematically aggregate multi-scale contextual information, exponentially expanding the receptive field without losing resolution or coverage. The structure of this process is depicted in Figure 5.
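The following PyTorch sketch illustrates the structure described above: five dilated convolution branches with rates 2, 4, 6, 8, and 10, each followed by spatial attention, then concatenation and a final convolution that outputs the weight map. The channel counts and the CBAM-style attention formulation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: channel-wise avg/max maps -> conv -> sigmoid mask."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * mask


class DilatedSABlock(nn.Module):
    """Five dilated-conv branches (rates 2, 4, 6, 8, 10) with spatial attention,
    concatenated and projected to a single-channel weight map."""

    def __init__(self, in_ch: int = 24, branch_ch: int = 24):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r),
                nn.LeakyReLU(0.2, inplace=True),
                SpatialAttention(),
            )
            for r in (2, 4, 6, 8, 10)
        ])
        self.fuse = nn.Conv2d(5 * branch_ch, 1, 1)  # final weight map

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```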
After completing the steps described, multi-scale spatial attention from the different dilation rates allows the generated feature maps to capture more refined spatial structural information for predicting the subsequent weight maps. In Figure 4, comparing panels (b,d) or (c,e) shows that the image weights become smoother. Furthermore, this paper conducts an ablation study to validate the effectiveness of the two attention modules, channel attention and spatial attention, as shown in Table 1. The study shows that activating either module improves the performance metrics. This result demonstrates that each attention mechanism contributes significantly to the network’s ability to finely tune its focus on crucial image details and enhance the overall quality of image processing, leading to smoother and more accurate weight distributions in the output images.
3.2. Unsupervised Learning in Industrial Settings
Due to variations in industrial shooting conditions and the reflectivity of metal plates, controlling the exposure levels of the captured image sequences is challenging. When one or several source images exhibit significant exposure differences, fusion performance deteriorates rapidly. At the same time, because of the smooth and reflective properties of metal workpieces, obtaining well-exposed images of metal plates can be very challenging and time-consuming. Therefore, this paper adopts a fully convolutional network as the backbone, capable of accepting image sequences of any number and resolution while learning high-quality detail maps in an unsupervised manner.
After obtaining the low-resolution weight maps $\{W_k^{l}\}$, this paper uses a DGF upsampling module to restore the weight maps to their original resolution. The module uses guidance information provided by the low-resolution inputs $\{I_k^{l}\}$, the low-resolution weight maps $\{W_k^{l}\}$, and the full-resolution images $\{I_k\}$ to generate higher-resolution weight maps with more detail. Traditional guided filters [32] are non-parametric modules that perform the same computation across different tasks. However, due to the significant differences between tasks, a single non-parametric guided filtering layer cannot perform well across various scenarios.
In contrast, the DGF upsampling module introduces dilated convolutions and pointwise convolution blocks to replace average filters and local linear models. By replacing non-parametric operations with convolutional layers and introducing learnable parameters into the guided filtering layer, the DGF module can better adapt to complex industrial tasks. This adaptation allows for more flexible and effective processing of images with varying exposure and reflective characteristics, ultimately enhancing the fusion quality under diverse industrial imaging conditions.
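The paper describes the DGF module as replacing the average filter and local linear model with dilated and pointwise convolutions; the sketch below is a simplified, hypothetical variant that keeps a classic box filter for the local statistics and only makes the guide transform learnable, assuming single-channel (grayscale) weight maps and guides. It is meant to illustrate the guided-filter upsampling idea rather than reproduce the module used in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def box_filter(x, r):
    """Mean filter of radius r via a grouped convolution with uniform weights."""
    k = 2 * r + 1
    weight = torch.ones(x.size(1), 1, k, k, device=x.device) / (k * k)
    return F.conv2d(x, weight, padding=r, groups=x.size(1))

class GuidedFilterUp(nn.Module):
    """Simplified guided-filter upsampler with a learnable pointwise guide transform."""

    def __init__(self, radius=1, eps=1e-4):
        super().__init__()
        self.radius, self.eps = radius, eps
        # Learnable pointwise transform of the guide (a stand-in for the DGF idea).
        self.guide_net = nn.Sequential(
            nn.Conv2d(1, 16, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, lr_guide, lr_target, hr_guide):
        # Shared learnable transform of the low- and high-resolution guides.
        g_lr = self.guide_net(lr_guide)
        g_hr = self.guide_net(hr_guide)

        # Local statistics of the guided-filter local linear model.
        mean_g = box_filter(g_lr, self.radius)
        mean_t = box_filter(lr_target, self.radius)
        cov_gt = box_filter(g_lr * lr_target, self.radius) - mean_g * mean_t
        var_g = box_filter(g_lr * g_lr, self.radius) - mean_g * mean_g

        a = cov_gt / (var_g + self.eps)  # local linear coefficients
        b = mean_t - a * mean_g
        a = F.interpolate(a, size=g_hr.shape[-2:], mode="bilinear", align_corners=False)
        b = F.interpolate(b, size=g_hr.shape[-2:], mode="bilinear", align_corners=False)
        return a * g_hr + b              # high-resolution weight map
```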
Given the high cost and time consumption of obtaining detailed authentic images in industrial settings, this paper employs the MEF-SSIM loss to train the network in an unsupervised manner, where $\Theta$ denotes the parameters of the network described in this paper.
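A minimal sketch of one unsupervised training step is shown below, assuming a differentiable `mef_ssim` function implementing the index from [15]; writing the loss as 1 − MEF-SSIM is one common way to turn the quality index into a minimizable objective and is an assumption here.

```python
import torch

def train_step(model, optimizer, images, mef_ssim):
    """One unsupervised training step: fuse the exposure sequence and
    maximize MEF-SSIM between the fused image and the source sequence.

    images: (K, C, H, W) exposure sequence for a single scene.
    mef_ssim: assumed differentiable MEF-SSIM function in [0, 1].
    """
    optimizer.zero_grad()
    fused = model(images)                 # (1, C, H, W) fused result
    loss = 1.0 - mef_ssim(fused, images)  # unsupervised MEF-SSIM loss
    loss.backward()
    optimizer.step()
    return fused.detach(), loss.item()
```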
3.3. Training Storage Unit
In traditional unsupervised image fusion methods, intermediate results obtained during training are simply discarded. However, these intermediate results contain crucial information pertinent to the fusion task, such as pixel intensity distribution, structural similarity, and gradients. They can provide abundant supervised signals to guide the training process effectively. To further enhance the performance of unsupervised methods, this paper introduces a novel training storage unit that advocates self-evolving training. By designing a storage unit that collects outputs from a previous period and implementing a memory loss function that maximally utilizes the valid information from prior outputs, our approach fully leverages intermediate fusion results to improve fusion quality, thereby establishing an evolutionary training scheme.
As depicted in Figure 6, the training storage unit proposed in this paper is divided into two parts: the stage image storage module and the memory loss unit module. The stage image storage module stores the fused images derived from the input sequence of the previous training epoch and integrates them into the image sequence that participates in training in the subsequent epoch. Using the output of the previous epoch as a selection source for the current epoch’s fusion ensures that the current epoch’s output is superior to, or at least equivalent to, that of the previous epoch.
The memory loss unit also utilizes the training results of intermediate fusion to further supervise the cooperative fusion of images. It is anticipated that clues about image fusion contained in previous outputs will be fully exploited during the training process of the current period.
The SSIM term [33] (Structural Similarity Index) represents the structural similarity between two images, with $O$ and $\bar{O}$ denoting the output images of the current and previous epochs during training, respectively. It is assumed that, for the first epoch, $\bar{O} = O$. $\|\cdot\|_{F}$ stands for the Frobenius norm and $\nabla$ represents the gradient operator. Therefore, this loss term can be expressed as:
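One plausible reconstruction of this memory loss, combining the structural similarity and gradient terms defined above, is given below; the exact form and the balancing coefficient $\gamma$ are assumptions rather than the formula used in this paper:

$$\mathcal{L}_{\mathrm{memory}} = \bigl(1 - \mathrm{SSIM}(O, \bar{O})\bigr) + \gamma\,\bigl\|\nabla O - \nabla \bar{O}\bigr\|_{F}$$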
3.4. ICG Evaluation Metric
For industrial images, uneven illumination affects subsequent tasks such as recognition and detection. Compared to natural images, industrial images require that illumination effects be removed as thoroughly as possible. However, existing evaluation metrics often treat illumination information as just one factor of image quality and therefore cannot fully describe whether the illumination of an image is uniform. Therefore, this paper proposes a no-reference uneven illumination evaluation metric for industrial images. The illumination component of an unevenly illuminated image is extracted using a multi-scale Gaussian function approach, and the average gradient of the illumination component is then computed to evaluate the degree of uneven illumination. The form of the Gaussian function is as follows:

$$G(x, y) = \lambda \exp\!\left(-\frac{x^{2} + y^{2}}{c^{2}}\right)$$
In this equation, $c$ represents the scale factor and $\lambda$ is the normalization constant that ensures the Gaussian function $G(x, y)$ satisfies the normalization condition $\iint G(x, y)\,\mathrm{d}x\,\mathrm{d}y = 1$. The estimated value of the illumination component can be obtained by convolving the Gaussian function with the original image. The result is as follows:

$$L(x, y) = I(x, y) * G(x, y)$$
Here, $I(x, y)$ denotes the input image, $*$ denotes convolution, and $L(x, y)$ represents the estimated illumination component. Based on Retinex theory, the choice of the scale factor $c$ in the Gaussian function determines the effective range of the convolution kernel. A larger value of $c$ expands the range of the Gaussian convolution kernel, improving tone preservation and better extracting the global characteristics of the illumination. Conversely, a smaller value of $c$ reduces the range of the kernel, enabling better compression of the dynamic range and making the local characteristics of the illumination more prominent.
This paper adopts a multi-scale Gaussian function approach to balance the extraction of global and local illumination characteristics. Gaussian functions of different scales are used to extract the illumination components of the scene individually, and the results are then combined by weighting to obtain the final estimate of the illumination component. The overall expression is as follows:

$$L(x, y) = \sum_{i=1}^{N} w_{i}\,\bigl[I(x, y) * G_{i}(x, y)\bigr] \qquad (14)$$
In Equation (14), $L(x, y)$ represents the illumination component value at point $(x, y)$, obtained by extracting and weighting the illumination components with multiple Gaussian functions of different scales. $w_{i}$ denotes the weight coefficient of the illumination component extracted by the $i$-th scale Gaussian function, with $i$ ranging from 1 to $N$, where $N$ is the number of scales used.
For images without illumination distortion, the average grayscale difference between adjacent blocks is minimal; the more severe the uneven illumination, the larger this difference. Therefore, the mean gradient of the illumination component is used to assess the degree of illumination unevenness. First, the illumination component image $L$ is divided into $m \times n$ small rectangular blocks. For each block $B_{i,j}$, the gradient $g_{i,j}$ is defined as the sum of the absolute differences between its mean grayscale value and those of its neighboring blocks. The average gradient is defined as:

$$\mathrm{ICG} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} g_{i,j}$$
A higher ICG value indicates more uneven illumination in the image. As illustrated in Figure 7, this paper presents the illumination component images of different types of metal plates obtained using various methods.
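For reference, a minimal NumPy/OpenCV sketch of the ICG computation described above is given below; the Gaussian scales, their weights, and the block size are illustrative assumptions rather than the settings used in this paper.

```python
import numpy as np
import cv2

def icg(image, scales=(15, 80, 250), weights=None, block=16):
    """Illumination Component Gradient (sketch).

    1) Estimate the illumination component as a weighted sum of
       Gaussian-blurred versions of the image (multi-scale surround;
       the sigma is used directly as the scale here for simplicity).
    2) Split the illumination map into blocks and sum, per block, the
       absolute differences of mean intensity with its 4-neighbors.
    3) Return the average block gradient; higher = more uneven lighting.
    """
    gray = image.astype(np.float64)
    if gray.ndim == 3:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float64)

    weights = weights or [1.0 / len(scales)] * len(scales)
    illum = sum(w * cv2.GaussianBlur(gray, (0, 0), sigmaX=c)
                for w, c in zip(weights, scales))

    # Per-block mean intensities.
    h, w = illum.shape
    m, n = h // block, w // block
    means = illum[:m * block, :n * block].reshape(m, block, n, block).mean(axis=(1, 3))

    # Block gradient: sum of absolute differences with 4-neighboring blocks.
    grad = np.zeros_like(means)
    grad[:-1, :] += np.abs(means[:-1, :] - means[1:, :])
    grad[1:, :]  += np.abs(means[1:, :] - means[:-1, :])
    grad[:, :-1] += np.abs(means[:, :-1] - means[:, 1:])
    grad[:, 1:]  += np.abs(means[:, 1:] - means[:, :-1])

    return grad.mean()
```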
4. Experiment Results and Analysis
Given the current lack of an MEF dataset for smooth workpieces, this paper addresses the gap by creating an MEF dataset covering three different types of metal. To ensure a diverse and representative industrial scene dataset, we used a Hangzhou Hikrobot MV-CS016-10GM camera (resolution of 1408 × 1024) to collect data from steel, iron, and aluminum plates in static scenes (image resolution of 720 × 540). For each sequence, the exposure value of the images was varied by adjusting the exposure time. The dataset comprises 80 scenes, each containing approximately 6 to 8 exposure images, totaling 600 images; 480 images were used for training and 120 for testing. We denote the number of images in an exposure sequence as K.
To evaluate the superiority of our method on natural image datasets as well, we conducted experiments using 360 exposure sequences from SICE part I. Following the training–testing split set by SICE [34], 302 samples were used for training and the remaining 58 for evaluation. For the industrial dataset, we employed ICG, SD, and running time as evaluation metrics. For the SICE dataset, MEF-SSIM, PSNR, SSIM, and MI were used as the primary evaluation metrics for comparison.
The model training was conducted on an Ubuntu 20.04 operating system within the following development environment: Python 3.7.16, PyTorch 1.7.7, and CUDA 11.7.99. The hardware configuration included an Intel Core i7-12700K CPU and an NVIDIA GeForce RTX 3090 (24GB) GPU.
This paper conducts both qualitative and quantitative experiments on natural and industrial datasets and compares our method with eight previous MEF approaches, including the traditional methods Mertens [1] and Li20 [23], as well as the deep learning methods DeepFuse [2], MEFNet [14], IFCNN [11], DPEMEF [10], TransMEF [16], and HoLoCo [7].
4.1. Industrial Dataset Experiment
The present study conducted qualitative and quantitative evaluations on the industrial image dataset using six exposure images per sequence. In industrial imagery, we mainly focus on how well uneven illumination is removed in the fused images. Therefore, this paper selected ICG and standard deviation (SD) as evaluation metrics for industrial images. The SD reflects the degree of dispersion of pixel values within an image: a larger standard deviation indicates a wider range of variation in pixel values. For tasks involving dynamic range enhancement and uneven illumination removal in industrial images, a lower standard deviation is preferable. Additionally, in industrial settings, real-time execution is a crucial factor in assessing algorithm performance; hence, we measured the runtime of the algorithm on a GPU.
As demonstrated in Table 2, the method proposed in this paper outperforms the other algorithms in enhancing image dynamic range and removing uneven illumination. Although it lags slightly behind MEFNet in the runtime test, the approach presented in this paper still meets the real-time requirements of industrial applications and exhibits commendable performance.
This paper conducted qualitative testing on industrial images, as depicted in Figure 8. Compared to the other algorithms, the proposed algorithm demonstrated excellent overall brightness uniformity and successfully eliminated the uneven illumination in the images. To facilitate observation of local image details, we have zoomed in on the red-boxed areas in the figure. To showcase the superiority of our approach more intuitively, we computed pixel distribution statistics for both local and global aspects of the fused images. In Figure 9, we provide a comparative statistical analysis of the pixel distribution of the locally zoomed-in image, which gives a more precise representation of the concentration intervals of image grayscale values and the pixel count at each grayscale level. Additionally, from a global perspective, we computed pixel statistics along the blue line region in Figure 8, as illustrated in Figure 10. This pixel statistics approach aids in assessing the overall variation of grayscale values in the image.
Based on Figure 9, it can be observed that the fusion results of the Mertens, MEFNet, and HoLoCo methods concentrated gray values in the overly bright range of 150–200. DeepFuse performed well in the local zoom but exhibited an overall lower grayscale value. The IFCNN method showed a distribution across various grayscale ranges but maintained a certain proportion of pixels in underexposed and overexposed regions. The pixel peaks of the DPEMEF and TransMEF methods fell between 50–100 and 100–150, respectively, and the overall brightness uniformity of their images was poor. Compared to the methods above, the proposed method concentrated grayscale values within the suitable brightness range of 100–150 for the metal plate data, with appropriate overall brightness. The regions with grayscale values below 50 in the industrial image correspond to the foreground area, namely the dot-like region of the image’s QR code. Comparative observation shows that our method produced relatively lower grayscale values in the foreground area, which helps to accentuate the image’s foreground region.
For the global pixel distribution, we aimed for a uniform variation in the grayscale values of the image background, without abrupt changes in brightness. From the grayscale value distribution along the linear region shown in Figure 10, it is evident that in the DeepFuse method, most pixels’ grayscale values were concentrated below 50, making it difficult to discern details in darker areas. The Mertens, Li20, and MEFNet methods showed a noticeable rise in grayscale values in highlight regions, whereas the IFCNN, DPEMEF, and TransMEF methods displayed drastic fluctuations. In contrast, the grayscale curve of our method showed no significant upward trend and exhibited minimal fluctuation, indicating a uniform distribution of grayscale values in the processed image and effective removal of uneven illumination. At the boundary between the foreground and background of the image (pixel positions 395 and 438), there was a sharp decline in grayscale values. The comparison shows that our method effectively preserves the contrast between the foreground and background while removing uneven illumination. In summary, the proposed method exhibits a smooth grayscale value curve, achieves high contrast between foreground and background, improves image dynamic range, and effectively eliminates uneven illumination.
We conducted a qualitative analysis on several common metal sheet images to further evaluate our method, as illustrated in Figure 11. Our method exhibited optimal performance across images of different types of metal sheets. Furthermore, in Figure 7, we extracted the illumination components of selected images (first three rows) from Figure 11 using a multi-scale Gaussian function for further analysis. The analysis shows that our method produced overall low gradient values, with illumination distributed uniformly throughout the entire image. This further substantiates our method’s capability to remove uneven illumination and enhance image dynamic range.
4.2. Experiments with Natural Datasets
This study chose a subset of exposure sequences from the SICE dataset as the test set, with the number of exposure images (K) set to 2. We compared the fusion results of our approach with several other fusion methods, utilizing various types of image evaluation metrics to comprehensively assess the quality of the fused images. Among these metrics, MEF-SSIM and SSIM are derived from image structural information, PSNR relies on pixel statistics, and mutual information (MI) is grounded in information theory.
As shown in Table 3, the average values of these metrics for our method were 0.9630, 0.9355, 20.99, and 4.5725, respectively. The results indicate that our method outperforms the others in preserving image information and providing higher image quality. Additionally, we conducted qualitative comparisons of the different methods on three typical image sequences from the SICE dataset, as depicted in Figure 12.
As depicted in the local detail images in Figure 12, the fusion results of Mertens, TransMEF, and HoLoCo exhibited a natural visual effect but lacked detail and texture, with poor enhancement in darker areas. DeepFuse’s fused images suffered from low detail retention and showed deficiencies in effectively merging a wide range of exposures. The MEFNet, IFCNN, and HoLoCo fusion results exhibited color distortions and relatively sparse detail information. In contrast, the proposed method demonstrated the best performance in enhancing brightness in darker areas and suppressing brightness in overexposed regions, exhibiting rich details, natural colors, and complete information retention.
4.3. Algorithmic Ablation Experiments
To further validate the effectiveness of each module, this paper conducted ablation experiments on the attention module, training storage unit, and DGF module using the industrial image dataset. Table 1 shows that when the attention module and the training storage unit module are employed simultaneously, all metrics and visual effects achieve optimal performance. This result is primarily attributed to the fact that both modules enhance the information exchange among different exposure images. Additionally, adding the DGF upsampling module further improved the MEF-SSIM metric by introducing learnable parameters into the guided filter, demonstrating the effectiveness of the DGF module.
4.4. Running Time for Different Number of Exposure Sequences
Our method can flexibly handle different numbers of exposure sequence images to accommodate the complex and diverse nature of industrial scenes. We conducted experiments on several common values of K, assessing the runtime on different platforms (CPU and GPU) and comparing it with six deep learning-based MEF methods, as shown in Table 4. The resolution of the evaluated images was 2K, and “-” indicates exclusion from the test due to DeepFuse’s strict input settings.
As shown in Table 4, our approach achieved runtime performance similar to MEFNet in CPU environments for different values of K. In GPU environments, our method’s runtime was slightly higher than MEFNet’s, yet it still met the real-time requirements of industrial applications. In contrast, the other methods exhibited substantial runtime overhead. These experiments show that our method’s runtime across various common K values meets the real-time demands of industrial scenarios.