A More Effective Zero-DCE Variant: Zero-DCE Tiny

The purpose of Low Illumination Image Enhancement (LLIE) is to improve the perception or interpretability of images taken in low-illumination environments. This work builds on Zero-Reference Deep Curve Estimation (Zero-DCE) and proposes a more effective image enhancement model, Zero-DCE Tiny. First, the new model introduces the Cross Stage Partial Network (CSPNet) into the original U-net structure: the base feature maps are split into two parts and then recombined through cross-stage connections, achieving a richer gradient combination with less computation. Second, we replace all the depthwise separable convolutions except the last layer with Ghost modules, which makes the network lighter. Finally, we introduce a channel consistency loss into the non-reference loss, which further strengthens the constraint on the pixel distributions of the enhanced image and the original image. Experiments show that, compared with Zero-DCE++, the proposed network is more lightweight and surpasses Zero-DCE++ on several important image enhancement evaluation indexes.


Introduction
Due to equipment and environmental factors such as insufficient lighting and limited exposure time, images are often captured in suboptimal conditions and affected by backlight, uneven lighting, and low-light interference. The aesthetic quality of these images suffers, which is also unsatisfactory for higher-level tasks such as cell classification [1] and semantic segmentation during robot-arm grasping [2]. Therefore, the enhancement of low-light images is a research field worth exploring.
Traditional low-light enhancement methods include histogram equalization [3] and Retinex-based models [4]. Histogram equalization changes the gray values of pixels so that the histogram of the transformed image is more uniform and its gray levels are clearer than in the original image, thereby enhancing the image. Retinex-based methods assume that the image perceived by the human eye depends on the incident light and the reflectance of the object's surface. Usually, the incident light component is obtained by filtering the original image signal, and the reflectance component is then solved through the mathematical relationship among the three variables to enhance the image. Although these traditional algorithms can enhance images, they struggle to suppress the noise introduced during enhancement, which limits the usability of the results.
With the development of deep learning, learning-based methods have been applied to image enhancement, including supervised learning (SL), reinforcement learning (RL), unsupervised learning (UL), zero-shot learning (ZSL), and semi-supervised learning (SSL). Unsupervised and zero-shot learning can learn directly from unlabeled samples, allowing the model to learn more generalized feature representations from data. The model in this work inherits the Zero-DCE series of work [5,6] and differs from it in the following ways:

• The CSPNet structure is introduced into the original U-net structure, which reduces the amount of computation and achieves a richer gradient combination. At the same time, except for the last layer, Ghost modules are used to replace the depthwise separable convolutions, which further reduces the size of the image enhancement model.

• The channel consistency loss is introduced into the non-reference loss: KL divergence is used to enforce consistency between the original image and the enhanced image in the differences between channels.
Section 2 introduces the overall architecture of Zero-DCE Tiny and the non-reference loss functions used. Section 3 introduces the parameter settings of the ablation experiments and compares the relevant experimental results. Finally, this work compares the new method with the Zero-DCE and Zero-DCE++ methods in sensory and quantitative terms and tests the effect of each method in a downstream application.

Related Works
In this section, we focus on work related to zero-shot learning in the field of image enhancement and summarize some commonly used model lightweighting methods.

Zero-Shot Learning for Image Enhancement
Zhang et al. [14] proposed a zero-shot learning method that uses an Exposure Correction Network (ExCNet) for backlit image restoration; it first uses a deep network to estimate the S-curve. Zhu et al. [15] proposed a three-branch CNN, called RRDNet, to restore underexposed images by decomposing the input image into illumination, reflectance, and noise; several loss functions are specially designed to drive the zero-shot learning. Zhao et al. [16] performed Retinex decomposition through a neural network and then used the resulting Retinex-based RetinexDIP model to enhance low-illumination images. Inspired by deep image prior (DIP) [17], RetinexDIP takes randomly sampled white noise as input, generates reflectance and illumination components through Retinex decomposition, and then uses the obtained illumination components for image enhancement; the training process uses several losses as constraints, such as a reflectance loss. Liu et al. [18] proposed a principled framework that searches for a lightweight prior architecture for low-light images in real scenes by injecting knowledge of low-light images. Zero-DCE [5] regards light enhancement as a curve estimation task: it takes low-light images as input and generates high-order curves as output, which are used to adjust the dynamic range of the input at the pixel level to obtain an enhanced image. In addition, a fast and lightweight version called Zero-DCE++ [6] was proposed; because the mapping from image to curve only requires a lightweight network, it achieves fast estimation.

Model Lightweight Method
Howard et al. [19] proposed the MobileNet network, in which depthwise separable convolution replaced ordinary convolution for the first time. A depthwise separable convolution consists of a depthwise convolution and a pointwise convolution: the depthwise convolution convolves the input features channel by channel to obtain the spatial information of the features, and the pointwise convolution uses 1 × 1 kernels to obtain the information between different channels; this combination achieves the lightweight effect. In ShuffleNet [20], the feature map obtained by group convolution is uniformly shuffled along the channel dimension, and a further group convolution then replaces the pointwise convolution; this solves the lack of information exchange between different groups during training while maintaining the feature extraction ability of the network and reducing its weight. Han et al. [21] proposed GhostNet to address the large amount of redundant information contained in traditional convolution features. First, a conventional convolution with fewer kernels is applied to the input to obtain output features with fewer channels; after a linear transformation of these features, the ghost feature maps are obtained, and the final feature map is produced by concatenating them with the output features. Chien-Yao Wang et al. [22] proposed CSPNet to address the incompatibility between depthwise separable convolution and some industrial IC designs; this network not only realizes richer gradient combinations but also reduces the computation of the model.
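As a concrete illustration of the savings, the weight counts of an ordinary convolution and its depthwise separable replacement can be compared directly (a minimal sketch; bias terms are ignored, and the 32-channel, 3 × 3 configuration mirrors the DCE-Net layers discussed later):

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# A 32-in, 32-out, 3 x 3 layer:
standard = conv_params(32, 32, 3)                  # 9216 weights
separable = depthwise_separable_params(32, 32, 3)  # 288 + 1024 = 1312 weights
# The separable version needs roughly 7x fewer weights.
```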

Overall Architecture
This work inherits the image enhancement method of the Zero-DCE++ paper [6]: a convolutional neural network learns the mapping curves from a weakly lit image to a well-lit image, and the learned curves are then applied iteratively to the pixels of the original image, adjusting the image over a large dynamic range. The enhancement curve parameter map A_n(x) obtained through network learning is pixel-wise: a corresponding enhancement curve is applied to each pixel of the original image. The designed image enhancement is expressed in Equation (1):

LE_n(x) = LE_{n−1}(x) + A_n(x) LE_{n−1}(x)(1 − LE_{n−1}(x))    (1)

where LE_0(x) = I(x), I represents the input image, and n is the number of iterations. In this work, n is set to 8, which achieves relatively the best image enhancement results. LE_n(x) is the enhanced version of the previous result LE_{n−1}(x), and A_n(x) is a curve parameter map with the same size as the given image. The process of image enhancement using Zero-DCE Tiny is shown in Figure 1.
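The iterative curve adjustment of Equation (1) can be sketched as follows (a minimal NumPy sketch; in the real model, A_n(x) is predicted by the network, whereas here a constant map stands in for illustration):

```python
import numpy as np

def enhance(I, A_maps):
    """Apply the Zero-DCE quadratic curve iteratively.

    I: input image, values in [0, 1], shape (H, W, 3).
    A_maps: per-iteration curve-parameter maps, each the same shape as I,
            with values in [-1, 1].
    """
    LE = I
    for A in A_maps:
        # LE_n = LE_{n-1} + A_n * LE_{n-1} * (1 - LE_{n-1})
        LE = LE + A * LE * (1.0 - LE)
    return LE

# Toy usage: a dark 2x2 image, 8 iterations with one shared curve map,
# as in Zero-DCE++ where a single map is reused across iterations.
I = np.full((2, 2, 3), 0.1)
A = np.full((2, 2, 3), 0.5)   # hypothetical learned parameters
out = enhance(I, [A] * 8)     # brighter than I, still within [0, 1]
```

Note that A = 0 leaves the image unchanged, and pixel values stay bounded in [0, 1], which is the reason for the quadratic curve's design.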

DCE-Net Tiny
The original DCE-Net [5] used a simple CNN composed of seven convolutional layers in a U-net structure. Each of the first six convolutional layers consists of 32 kernels of size 3 × 3 with stride 1, followed by the ReLU activation function. The last convolutional layer consists of kernels of size 3 × 3 with stride 1, followed by the Tanh activation function, and generates 24 curve parameter maps for eight iterations, where each iteration uses three curve parameter maps for the three channels (i.e., the RGB channels). The downsampling and batch normalization layers, which destroy the relationship between adjacent pixels, are discarded. Later, in Zero-DCE++ [6], ordinary convolution was replaced by depthwise separable convolution to reduce computation: the depthwise convolution kernel is 3 × 3 with stride 1, and the pointwise convolution kernel is 1 × 1 with stride 1. At the same time, the output layer generates only 3 curve parameter maps, which are then reused in the different iteration stages; this reduces the risk of oversaturation.
The U-net structure is chosen because it effectively integrates multi-scale features, which are important for satisfactory low-illumination enhancement. However, the skip connections used in U-net may introduce redundant feature information into the final results. Therefore, we need a network that effectively combines shallow and deep features to be lightweight yet effective. Inspired by CSPNet [22] and GhostNet [21], we designed the model shown in Figure 2 to replace the previous structure. In the new structure, the base feature maps are split into two parts along the channel dimension: the former is directly connected to the output layer, and the latter passes through the DCE-Net [5]. With the exception of the last layer, which still uses depthwise separable convolution, the other layers are replaced by Ghost modules. As shown in Figure 3, the Ghost module first uses a 1 × 1 convolution to condense the input feature map, achieving cross-channel feature extraction. It then applies a 3 × 3 kernel channel by channel to the condensed features to obtain additional feature maps, and finally stacks the 1 × 1 convolution result with the channel-wise convolution result to obtain the final feature map. The feature maps obtained in both steps are processed by the ReLU activation function. In Zero-DCE Tiny, the last convolutional layer is still followed by the Tanh activation function, and the input is iterated 8 times through the curve parameter map to generate the final enhanced image. The new network structure uses Ghost modules to replace the depthwise separable convolution operations, which greatly improves the utilization efficiency of the feature maps and reduces computation. At the same time, introducing the CSPNet structure realizes richer feature fusion and strengthens the learning ability of the network. Moreover, the final amount of computation is further reduced by the split of the base feature map.
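The Ghost module described above can be sketched functionally with NumPy (the shapes and the even primary/ghost channel split are assumptions for illustration; the real layer is a trained convolutional module):

```python
import numpy as np

def ghost_module(x, w_primary, w_cheap):
    """Sketch of a Ghost module on an (H, W, C_in) feature map.

    w_primary: (C_in, C_half) weights of the 1 x 1 "primary" convolution.
    w_cheap:   (3, 3, C_half) per-channel 3 x 3 kernels for the cheap
               channel-wise operation that generates the "ghost" features.
    """
    # 1 x 1 convolution = per-pixel matrix multiply across channels
    primary = np.einsum('hwc,cd->hwd', x, w_primary)
    primary = np.maximum(primary, 0.0)            # ReLU

    # Cheap operation: depthwise 3 x 3 convolution with zero padding
    H, W, C = primary.shape
    padded = np.pad(primary, ((1, 1), (1, 1), (0, 0)))
    ghost = np.zeros_like(primary)
    for i in range(3):
        for j in range(3):
            ghost += padded[i:i + H, j:j + W, :] * w_cheap[i, j, :]
    ghost = np.maximum(ghost, 0.0)                # ReLU

    # Final feature map: concatenation of primary and ghost features
    return np.concatenate([primary, ghost], axis=-1)

# Toy usage: 8 input channels expand to 32 output channels
# (16 primary + 16 ghost), using only a 1 x 1 conv and a cheap op.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
w_primary = rng.normal(size=(8, 16)) * 0.1
w_cheap = rng.normal(size=(3, 3, 16)) * 0.1
y = ghost_module(x, w_primary, w_cheap)   # shape (4, 4, 32)
```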

Non-Reference Loss Functions
This work inherits the non-reference loss functions used in the Zero-DCE++ paper [6]. The spatial consistency loss encourages spatial coherence of the enhanced image by preserving the differences between adjacent regions of the input image in its enhanced version:

L_spa = (1/K) Σ_{i=1}^{K} Σ_{j∈Ω(i)} ( |Y_i − Y_j| − |I_i − I_j| )²

where K is the number of local regions and Ω(i) represents the collection of adjacent regions centered at region i. As shown in Figure 4, Y and I are the average pixel values of the local region in the enhanced image and the original image, respectively; the local region size is set to 4 × 4. The exposure control loss is used to control the exposure level. L_exp can be expressed as:

L_exp = (1/M) Σ_{k=1}^{M} |Y_k − E|

where M is the number of non-overlapping local regions of size 16 × 16, Y_k is the average pixel value of a local region in the enhanced image, and E indicates a good exposure level. The color constancy loss is mainly used to reduce color deviation in enhanced images, which can be expressed as:

L_col = Σ_{(p,q)∈ε} (J^p − J^q)²,  ε = {(R,G), (R,B), (G,B)}

where J^p denotes the average pixel value of channel p in the enhanced image, and (p, q) represents a pair of channels.
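The exposure control loss reduces to simple block averaging over non-overlapping patches; a NumPy sketch (E = 0.6 is an assumed target exposure, in line with the Zero-DCE papers):

```python
import numpy as np

def exposure_control_loss(enhanced, E=0.6, patch=16):
    """Sketch of the exposure control loss: mean absolute distance
    between the average intensity of non-overlapping patch x patch
    regions and a target exposure level E (E = 0.6 is an assumption)."""
    gray = enhanced.mean(axis=-1)          # average over the RGB channels
    H, W = gray.shape
    H, W = H - H % patch, W - W % patch    # drop ragged borders
    blocks = gray[:H, :W].reshape(H // patch, patch, W // patch, patch)
    Y = blocks.mean(axis=(1, 3))           # per-region average pixel value
    return float(np.abs(Y - E).mean())

# A uniformly well-exposed image incurs zero loss; a dark one is penalized.
bright = np.full((32, 32, 3), 0.6)
dark = np.full((32, 32, 3), 0.1)
# exposure_control_loss(bright) -> 0.0; exposure_control_loss(dark) -> 0.5
```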
The illumination smoothness loss keeps adjacent curve-parameter values monotonic, thereby avoiding overexposure and underexposure; it can be expressed as:

L_tvA = (1/N) Σ_{n=1}^{N} Σ_{c∈{R,G,B}} ( |∇_x A_n^c| + |∇_y A_n^c| )²

where N is the number of iterations, and ∇_x and ∇_y represent the horizontal and vertical gradient operations, respectively. Inspired by the spatial consistency loss, this work proposes a new non-reference loss: the channel consistency loss. This loss uses KL divergence to enforce consistency between the original image and the enhanced image in the pixel differences between channels, suppressing noise and invalid features to improve the enhancement effect. It can be expressed as:

L_kl = KL((R − G) ∥ (R′ − G′)) + KL((R − B) ∥ (R′ − B′)) + KL((G − B) ∥ (G′ − B′))

where R, G, and B represent the color channels of the original image, R′, G′, and B′ represent the three color channels of the enhanced image, and KL divergence measures the difference between two distributions: the smaller the difference, the smaller the KL divergence, and it equals 0 when the two distributions coincide.
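A hedged sketch of the channel consistency loss follows: it compares the distributions of inter-channel differences (R−G, R−B, G−B) of the original and enhanced images via KL divergence. The histogram-based distribution estimate here is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL divergence between two discrete distributions (eps-smoothed)."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

def channel_consistency_loss(orig, enhanced, bins=32):
    """Sum of KL divergences between histograms of inter-channel pixel
    differences of the original and the enhanced image (assumed form)."""
    pairs = [(0, 1), (0, 2), (1, 2)]  # (R,G), (R,B), (G,B)
    loss = 0.0
    for p, q in pairs:
        d_orig = (orig[..., p] - orig[..., q]).ravel()
        d_enh = (enhanced[..., p] - enhanced[..., q]).ravel()
        h_orig, _ = np.histogram(d_orig, bins=bins, range=(-1, 1))
        h_enh, _ = np.histogram(d_enh, bins=bins, range=(-1, 1))
        loss += kl_div(h_orig / h_orig.sum(), h_enh / h_enh.sum())
    return loss
```

When the enhanced image preserves the original inter-channel structure, the loss is close to zero, matching the stated property of the KL divergence.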
The total loss can be expressed as:

L_total = W_spa L_spa + W_exp L_exp + W_col L_col + W_tvA L_tvA + W_kl L_kl

where W_spa, W_exp, W_col, W_tvA, and W_kl are the weights of the corresponding losses.

Results
To be consistent with previous work [5,6], we used 360 multi-exposure sequences from Part 1 of the SICE dataset [23] for training. We randomly divided the 3022 images with different exposure levels in the Part 1 subset [23] into two parts (2422 images for training and 600 for validation), and the images were resized to 512 × 512 × 3. We implemented our framework in PyTorch on an RTX 3060 GPU with a batch size of 8. We initialized the convolutional neural network with a Gaussian function with mean 0 and standard deviation 0.02, and optimized it with the Adam optimizer using default parameters and a constant learning rate. The weights W_spa, W_exp, W_col, W_tvA, and W_kl were set to 1, 10, 5, 1600, and 5, respectively, to balance the loss ratio. The network was trained for 100 epochs in total.
We used several public datasets for testing, including LIME [24] (10 images) and DICM [25] (64 images). In addition, we collected a total of 2300 low-light/normal-light images from the Part 2 subset of the SICE dataset as a test dataset, with all images resized to 1200 × 900 × 3. We added the channel consistency loss to the original version of the non-reference loss function and performed ablation experiments. Figure 6 compares the sensory results on a test image: after adding the channel consistency loss, the enhanced image is more natural and the overall contrast distribution of the image is more balanced. As shown in Figure 6, the house is less affected by the halo, and the details are clearer.

Ablation Study of Backbone Network
For the new backbone network, we introduce the CSPNet structure and set the number of feature maps in the base layer to 32. We divide the base feature maps into two parts: the former is directly connected to the output layer, and the latter passes through the DCE-Net [5]. At the same time, we replace all depthwise separable convolutions except the last layer with the Ghost module.

Table 1 shows the parameters of the original network and three network variants. We divide the network structures into three types: "Only CSPNet structure", "Only Ghost module", and "Zero-DCE Tiny". We mainly compare five quantities, namely the number of network parameters (Total params), the amount of memory required for node inference (Total memory), the number of floating-point operations (Total Flops), the number of multiply-add operations required for network inference (Total Madd), and the sum of memory reads and writes (Total MemR+W). It can be seen from Table 1 that Zero-DCE Tiny achieves a lighter footprint on multiple indicators. However, because the Ghost module uses a large number of group convolutions, resulting in more memory occupancy, its "Total Madd" and "Total MemR+W" metrics are slightly higher than those of "Only CSPNet structure". The experiments show, however, that "Only CSPNet structure" leads to a poor image enhancement effect, so we ultimately accept a certain memory cost in exchange for better image enhancement performance.
We provide inputs of different sizes to Zero-DCE Tiny. Table 2 summarizes the relationship between enhancement performance and input size, and Figure 7 shows some results obtained by modifying the size of the network input image. As shown in Figure 7 and Table 2, downsampling the input has no significant impact on the enhancement performance but significantly saves computing costs. As shown in Table 2, 6 × ↓ obtained the highest average PSNR value, but because 12 × ↓ is better in model efficiency, we use it as the default configuration for the new network.

Benchmark Evaluations
In this section, we compare the new method with classical benchmark models in qualitative and quantitative experiments. Finally, we test the gain the new image enhancement method brings to object detection in the dark.

Visual and Perceptual Comparisons
We selected some classical benchmark methods to compare with ours in visual and perceptual terms. The new method uses Zero-DCE Tiny as the backbone network and adds the channel consistency loss to the non-reference loss for training and testing. Figure 8 shows the enhanced results of some test images obtained by the different methods under the same conditions. We tested three CNN-based methods (RetinexNet [9], LightenNet [26], MBLLEN [8]) and one GAN-based method (EnlightenGAN [27]), replicating their results using open-source code.
Figure 8 shows the results of our tests on the SICE dataset. For outdoor scenes, LightenNet, MBLLEN, and EnlightenGAN find it difficult to achieve clear enhancement results in difficult backlit areas such as the face. For RetinexNet, there are many overexposed regions in the image, including the face, with a poor overall sensory effect. For indoor scenes, MBLLEN performs well visually but is too smooth, which may filter out the detailed features of the original image. For RetinexNet, the noise information in the image is amplified, resulting in a poor enhancement effect. For EnlightenGAN, the enhanced image shows a certain color deviation. Among the Zero-DCE series methods, the effects of Zero-DCE and Zero-DCE Tiny are very close; compared with Zero-DCE++, the enhancement of the face region is better. In Figure 8, subfigure (a) shows the original input, and subfigures (b-h) respectively show the enhanced results of LightenNet [26], MBLLEN [8], RetinexNet [9], EnlightenGAN [27], Zero-DCE [5], Zero-DCE++ [6], and Zero-DCE Tiny.
In the experiments, we found that among the Zero-DCE series methods, the image enhancement effect of Zero-DCE Tiny is softer, as shown in Figure 9. For areas with strong sunlight, the roof and the cross in the image enhanced by Zero-DCE Tiny are clearer. At the sensory level, this shows that the new method helps suppress excessive local exposure.

Quantitative Comparisons
Table 3 shows a quantitative comparison of several image enhancement methods. We compared three image enhancement indicators on the Part 2 test dataset [23]: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and mean absolute error (MAE). The SSIM value represents the similarity between the results and the ground truth in terms of structural characteristics, and a high PSNR value (together with a low MAE value) indicates that the results are closer to the actual situation. It can be seen from Table 3 that, by introducing the new backbone network and the channel consistency loss, the PSNR and SSIM indexes are improved compared with Zero-DCE++ (with a low MAE value), and the SSIM index even exceeds that of Zero-DCE. This shows that the channel consistency loss helps improve the structural consistency between the original image and the enhanced image. At the same time, Tables 1 and 4 show that, compared with Zero-DCE++, our network is more lightweight and its inference speed is friendlier to practical applications. Due to the reduced number of parameters, training the model takes only 35 min on a single RTX 3060 graphics card, which also makes retraining convenient for developers. In general, the new model is a more efficient image enhancement model that achieves lightness while maintaining a good enhancement effect.
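The PSNR and MAE metrics used in Table 3 can be computed as follows (a standard formulation for images scaled to [0, 1]; SSIM is omitted here because it requires a windowed computation):

```python
import numpy as np

def psnr(ref, out, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref - out) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def mae(ref, out):
    """Mean absolute error; lower means closer to the reference."""
    return float(np.mean(np.abs(ref - out)))

# Toy usage: an output that is uniformly 0.1 away from the reference.
ref = np.full((4, 4, 3), 0.3)
out = np.full((4, 4, 3), 0.4)
# psnr(ref, out) is 20 dB (MSE = 0.01); mae(ref, out) is 0.1.
```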

Object Detection in the Dark
To test the gain of the improved image enhancement algorithm in a downstream application, we selected the object detection task in low-light environments. We mainly tested on the ExDark dataset [28], which was built specifically for low-light image recognition tasks and consists of 7363 low-light images annotated with 12 object classes. We use only its test set, take Zero-DCE Tiny as a preprocessing step, and then apply a ResNet-50 classifier pretrained on ImageNet. On the low-light test set, using Zero-DCE Tiny as preprocessing improved the classification accuracy from 22.02% (top-1) and 39.46% (top-5) to 27.86% (top-1) and 44.86% (top-5). This provides indirect evidence that image enhancement with Zero-DCE Tiny not only produces pleasant visual effects but also provides richer image details for downstream applications, which helps improve their performance.
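The top-1/top-5 accuracies reported above follow the usual definition; a small sketch (the scores and labels below are made up for illustration):

```python
def topk_accuracy(score_lists, labels, k):
    """Fraction of samples whose true label index is among the k
    highest-scoring classes."""
    hits = 0
    for scores, label in zip(score_lists, labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Two toy samples over three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 2]
top1 = topk_accuracy(scores, labels, k=1)  # 0.5: only the first sample hits
top3 = topk_accuracy(scores, labels, k=3)  # 1.0: every label is in the top 3
```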

Discussion
The new model proposed in this paper, Zero-DCE Tiny, is a further lightweight member of the Zero-DCE series. Comprehensive results on multiple test datasets show that the new model can handle low-light images in various scenarios well. Furthermore, compared with Zero-DCE++, the efficiency of the model is further improved. Shorter inference time and lower training cost make the new model friendlier to practical applications; this will promote the use of deep learning image enhancement models in real life, such as in night vision instruments. More importantly, the upstream benefits of image enhancement will carry over to downstream applications, so that image processing algorithms such as object detection and semantic segmentation can better cope with images from complex environments.

Conclusions
We propose a new backbone network, Zero-DCE Tiny, to replace Zero-DCE++ for low-illumination image enhancement. It can be trained end-to-end with zero reference images. Compared with the original method, the backbone network used in this paper not only enhances feature fusion but also reduces computation and memory consumption. This paper also tests the new non-reference loss, verifying the effectiveness of the channel consistency loss in improving image contrast balance. The results show that the new image enhancement method achieves a better balance between enhancement quality and model efficiency.

where K is the number of local regions and Ω(i) represents the collection of adjacent regions centered at region i. As shown in Figure 4, Y and I are the average pixel values of the local region in the enhanced image and the original image, respectively. The local region size is set to 4 × 4.

Figure 4. Mapping of spatial consistency loss. Subfigures (a,b) respectively show the setting of local regions in the original image and the enhanced image.

Ablation Study of Each Loss
We performed ablation experiments on each loss function; the results are shown in Figure 5. As shown in Figure 5c, removing the spatial consistency loss L_spa reduces the image contrast, for example in the cloud region of the image. As shown in Figure 5d, removing the exposure control loss L_exp makes the enhancement ineffective. As shown in Figure 5e, discarding the color constancy loss L_col causes serious color casts. Finally, as shown in Figure 5f, removing the illumination smoothness loss L_tvA leads to obvious artifacts.


Figure 5. Ablation study of each loss. Subfigure (a) shows the original input, subfigure (b) shows the enhanced image produced by the Zero-DCE Tiny method, and subfigures (c-f) respectively show the enhancement results after removing the spatial consistency loss, exposure control loss, color constancy loss, and illumination smoothness loss.

Figure 6. Sensory comparison for the channel consistency (KL) loss ablation experiment. Subfigure (a) shows the original input, subfigure (b) shows the enhanced image produced by the Zero-DCE Tiny method, and subfigure (c) shows the enhancement result after removing the channel consistency loss.

Figure 7. Ablation study of input image size. Subfigure (a) shows the original input, subfigure (b) shows the enhanced result when the image resolution is unchanged, and subfigures (c-e) show the enhancement results after downsampling the input image.


Figure 9. Visual comparisons among the results generated by the Zero-DCE series of methods. Subfigure (a) shows the original input, and subfigures (b-d) respectively show the enhanced results of Zero-DCE [5], Zero-DCE++ [6], and Zero-DCE Tiny.


Table 1. Parameter comparison of the backbone networks; the parameters are computed for an image of size 256 × 256 × 3.

Table 2. Effect of different input image resolutions on image enhancement. The FLOPs (in G) are computed for an image of size 1200 × 900 × 3. "number × ↓" indicates the number of times the input image is downsampled. The test image is from the Part 2 dataset of SICE.


Table 3. Comparison of image enhancement indexes.


Table 4. Model running speed comparison.