Infrared Image Adaptive Enhancement Guided by Energy of Gradient Transformation and Multiscale Image Fusion

Abstract: The detail enhancement and dynamic range compression of infrared (IR) images is an important issue and a necessary practical application in the domain of IR image processing. This paper provides a novel approach to displaying high dynamic range (HDR) infrared images on common display equipment with appropriate contrast and clear detail information. The steps are chiefly as follows. First, in order to protect the weak global details in different regions of the image, we adjust the original normalized image into multiple brightness levels by adaptive Gamma transformation. Second, each brightness image is decomposed into a base layer and several detail layers by the multiscale guided filter. Details in each image are enhanced separately. Third, to obtain an image with the global details of the input image, the enhanced images at each brightness level are fused together. Last, we filter out the outliers and adjust the dynamic range before outputting the image. Compared with other conventional and cutting-edge methods, the experimental results demonstrate that the proposed approach is effective and robust in the dynamic range compression and detail enhancement of IR images.


Introduction
The infrared sensor captures the thermal radiation emitted by objects and is little affected by dark conditions. It is widely applied in detection, scene surveillance, reconnaissance, navigation, etc., due to its ability to operate 24 h a day. However, compared with visible images, IR images have many obvious shortcomings, including low contrast, weak details, and blurred resolution, which may cause much inconvenience when people observe them. Consequently, infrared sensors with high dynamic range (>8 bit) have been widely adopted in practical applications in recent years to capture more details. If HDR images are displayed directly on normal facilities (8 bit), some information in the original image cannot be represented. A procedure to achieve high-quality visualization of HDR infrared images must take the following problems into consideration. First and foremost, the dynamic range of the output should be mapped to be acceptable for the display device. Meanwhile, in order to take advantage of the HDR sensor and facilitate subsequent work, weak details should be enhanced. Last but not least, the output should be as visually pleasing as possible.
As noted above, the infrared sensor captures the thermal radiation emitted by objects and is little affected by dark conditions or dim weather; therefore, the fusion of infrared and visible-light images can provide richer and more detailed scene information. Similarly, each image of a multi-exposure sequence has its own unique details. If these details are well fused into one image, a high-quality image with abundant details can be produced.
This paper presents a novel approach based on adaptive transformation and image fusion to overcome the problems above and display HDR infrared images on LDR display equipment with appropriate contrast and clear, abundant detail information. Inspired by the idea of image fusion, we transform the original image into multiple brightness levels by Gamma transformation, followed by multiscale guided filter enhancement, to keep and enhance the details in the entire image. In order to simplify the selection of parameters, we adopt the energy of gradient (EOG) to guide the transformation, and entropy is utilized to guide the multiscale guided filter enhancement. The experimental results show that our method achieves acceptable results with fixed parameters. For typical HDR infrared images of various scenes, the effect of our method is robust.
The rest of this paper is organized as follows. Section 2 describes the fundamental theory and specific steps of our proposed method. In Section 3, our experimental comparison of the methods is described in detail. In Section 4, the conclusion of the paper is presented. Finally, the acknowledgment is made in Section 5.

Proposed Theory
The proposed framework is shown in Figure 1. First, in order to keep the weak global details in different areas, we adopt an adaptive Gamma transformation to adjust the original normalized image into multiple brightness levels. Second, the multiscale guided filter is utilized to decompose the image at each brightness level into a base layer and detail layers. Details in each image are enhanced separately. Third, to obtain an image with the global details of the original image, we fuse the enhanced images at each brightness level together. Last, we filter out the bad pixels and adjust the dynamic range before outputting the image.

EOG Guided Gray Distribution Adjustment
Generally, the dynamic range of an HDR IR image (14 bit, 16 bit, or more) far exceeds the dynamic range of a typical display. Linear mapping is widely used due to its simplicity, but it is not suitable for most IR images, whose gray levels are unevenly distributed. Different gamma correction parameters have different stretching effects on the image: a smaller gamma value brightens the entire image and increases the contrast in darker areas, while a larger gamma value darkens the entire image and increases the contrast in brighter areas. In order to keep the weak global details in different areas, we adjust the original normalized image into multiple brightness levels. However, manual selection of parameters by experience is generally required for each image, which is inconvenient in application.

Energy of Gradient
Energy of Gradient (EOG) is a well-established criterion for evaluating the clarity of an infrared image, owing to its simplicity and accuracy, so we choose it to evaluate the richness of image details. Let f(x, y) be the value of the pixel (x, y). The EOG of an image im can be calculated as

EOG(im) = Σ_x Σ_y {[f(x + 1, y) − f(x, y)]² + [f(x, y + 1) − f(x, y)]²}.
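As a minimal sketch of this definition (the NumPy implementation below is ours, not the authors'), the EOG of a grayscale image can be computed with first differences:

```python
import numpy as np

def eog(im: np.ndarray) -> float:
    """Energy of Gradient: sum of squared horizontal and vertical
    first differences of the pixel values f(x, y)."""
    im = im.astype(np.float64)
    dx = im[1:, :-1] - im[:-1, :-1]   # f(x+1, y) - f(x, y)
    dy = im[:-1, 1:] - im[:-1, :-1]   # f(x, y+1) - f(x, y)
    return float(np.sum(dx ** 2 + dy ** 2))
```

A flat image scores zero, and richer detail yields a larger value, which is what lets EOG rank candidate transformations.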

EOG Guided Gray Distribution Adjustment
Let S_i ∈ {S_bright, S_moderate, S_dark} be the candidate intervals of γ, and use the EOG function to evaluate and select the optimal value of γ for the image in each interval. Thus, the original image is adjusted to several images with rich details at multiple brightness levels.
Denote the original normalized image as I_input, and let S_1 (bright), S_2 (moderate), and S_3 (dark) be the three intervals. Within each brightness interval, the value of γ that produces the image with the maximum EOG is picked out adaptively:

γ_i = argmax_{γ ∈ S_i} EOG(I_input^γ), i = 1, 2, 3,

where EOG(image) is the energy of the gradient of the image. In this way, the details in different areas of the image can be kept separately.
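The interval-wise selection can be sketched as below. The interval endpoints are taken from the experimental settings of the paper, while the grid sampling of candidate γ values is an assumption, since the text does not state how candidates are enumerated:

```python
import numpy as np

def eog(im):
    dx = im[1:, :-1] - im[:-1, :-1]
    dy = im[:-1, 1:] - im[:-1, :-1]
    return float(np.sum(dx ** 2 + dy ** 2))

def adaptive_gamma_set(img, intervals=((0.1, 0.7), (0.7, 1.5), (1.5, 8.0)),
                       steps=20):
    """For each brightness interval S_i, pick the gamma that maximizes
    the EOG of the transformed image; return the transformed images.
    The linspace candidate grid is an assumption, not from the paper."""
    img = np.clip(img.astype(np.float64), 0.0, 1.0)  # normalized input
    outputs = []
    for lo, hi in intervals:
        candidates = np.linspace(lo, hi, steps)
        best = max(candidates, key=lambda g: eog(img ** g))
        outputs.append(img ** best)
    return outputs  # bright, moderate, dark versions
```

Because the input is normalized to [0, 1], every γ > 0 keeps the output in [0, 1], so the three results can be processed by the same downstream pipeline.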

Multiscale Guided Filter Decomposition
He et al. [12] presented the guided image filter (GF), which is not only edge-preserving but also computationally efficient. Consequently, it is widely applied in the domain of image processing. We adopt the guided filter to decompose the image, with the guide identical to the filtering input I. The critical assumption of the guided filter is a local linear model between the guide image I and the filter output Q: Q is a linear transformation of I in a window ω_k centered at the pixel k,

Q_i = a_k I_i + b_k, ∀i ∈ ω_k,

where

a_k = σ_k² / (σ_k² + ε), b_k = (1 − a_k) μ_k.

Here, |ω| is the number of pixels in ω_k, μ_k is the mean of I in ω_k, and σ_k² is the variance of I in ω_k.
When an area has rich details, σ_k² is relatively large, a_k approaches 1, and b_k tends to 0; the guided filter keeps the details in the local area. When an area is flat, σ_k² is relatively small, a_k approaches 0, and b_k tends to μ_k; the guided filter behaves as a weighted mean filter. ε is a regularization parameter that depends on the image information and determines whether an edge should be preserved.
Therefore, the guided filter behaves as an edge-preserving smoothing operator. For simplicity, we refer to it as Q = GF(I). Q can be regarded as the base layer of the input image I, which contains the low-frequency information of the input image I, reflects the intensity change of the image on a large scale, while (I − Q) can be regarded as the detail layer, which contains the high-frequency information of input image I, reflecting the details of the image on a small scale.
As introduced above, we can obtain a smoothed base layer and a detail layer by the guided filter. In order to obtain more complete details, we could utilize the guided filter iteratively to obtain the multiscale smoothed images. Meanwhile, the multiscale detail images can be generated. The specific procedure can be described as follows.
Here B_i is the ith base layer and D_i is the ith detail layer:

B_0 = I, B_i = GF(B_{i−1}), D_i = B_{i−1} − B_i, i = 1, 2, . . . , n.

Then, we can decompose the original image as follows:

I = B_n + Σ_{i=1}^{n} D_i.
Specifically, in our study and experiment, as shown in Figure 1, we decompose the image into three layers: one base layer and two detail layers. Therefore, the multiscale decomposition process can be described as

I = B_2 + D_1 + D_2.
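The decomposition above can be sketched with a self-guided filter; the box-filter implementation and the illustrative values of the window radius r and ε are assumptions (the paper only ties ε to the image variance):

```python
import numpy as np

def box(im, r):
    """Mean filter with window radius r via cumulative sums (edge-padded)."""
    k = 2 * r + 1
    pad = np.pad(im, r, mode='edge')
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def guided_filter_self(I, r, eps):
    """Guided filter with the guide identical to the input:
    a_k = var/(var + eps), b_k = (1 - a_k) * mean."""
    mu = box(I, r)
    var = box(I * I, r) - mu * mu
    a = var / (var + eps)
    b = (1.0 - a) * mu
    return box(a, r) * I + box(b, r)   # averaged coefficients, as in GF

def decompose(I, r=8, eps=1e-2, levels=2):
    """Iteratively smooth to get base B_i; detail D_i = B_{i-1} - B_i."""
    base, details = I, []
    for _ in range(levels):
        smoothed = guided_filter_self(base, r, eps)
        details.append(base - smoothed)
        base = smoothed
    return base, details   # I == base + sum(details) by construction
```

The telescoping construction guarantees exact reconstruction, so any residual difference after enhancement comes only from the chosen gains on the detail layers.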

Adaptive Multiscale Guided Filter Composition
Although each image can be decomposed into several layers, the details in the infrared image are typically weak. As shown in Figure 2, where the input image is one of the EOG-guided transformed images, the detail layers contain rich details captured by the HDR infrared sensor, but they are too weak to be observed.
The composition of the base layer and the detail layers can be described as

I_enhanced = B_n + Σ_{i=1}^{n} α_i D_i.

The layers are linearly accumulated, and the value of each coefficient α_i expresses the importance of the ith detail layer: the more information in the layer, the larger the α_i. In order to choose the value of α_i adaptively, we adopt entropy to evaluate the richness of information in each layer:

E = −Σ_i p_i log₂ p_i,

where p_i is the probability of gray level i in the image.
C is a fixed coefficient and z is a very small number added to prevent the denominator from being 0. Figure 2 shows the effect of this step: the weak details are enhanced, and the weak details in detail layers (b,c) become much clearer in (e,f). It should be explained that the figures of the detail layers are stretched 10 times for better visibility; the details are in fact much weaker. The effect of the proposed multiscale guided filter enhancement can also be clearly seen in Figure 3. There are two groups of images: panels (a,c) are results of the proposed method without the multiscale guided filter enhancement, in which the information is ambiguous, while panels (b,d) are the results of the full proposed method, which are much more visually comfortable.
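A hedged sketch of the entropy-guided composition follows; the text does not give the exact formula for α_i, so normalizing each layer's entropy by the summed detail entropies is our assumption, with C and z playing the roles described above:

```python
import numpy as np

def entropy(layer, bins=256):
    """Shannon entropy of the layer's gray-level histogram."""
    hist, _ = np.histogram(layer, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def compose(base, details, C=7.0, z=1e-4):
    """Entropy-guided linear recombination: detail layers carrying
    more information receive larger gains alpha_i (formula assumed)."""
    energies = [entropy(d) for d in details]
    total = sum(energies) + z   # z prevents division by zero
    out = base.copy()
    for d, e in zip(details, energies):
        out += (C * e / total) * d   # alpha_i = C * E_i / (sum_j E_j + z)
    return out
```

With C = 7 and z = 0.0001 (the paper's settings), a layer whose histogram is nearly flat (high entropy) is amplified far more than a near-constant one.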

Image Fusion
Inspired by research hotspots including the fusion of infrared and visible-light images and the fusion of multi-exposure images, which aim at fusing the details of different images of the same scene into one image with rich information, we fuse the images generated by the previous steps. We adopt a method [29] with clear mathematical principles and high computational efficiency.
Through the steps above, a set of enhanced images of different brightness levels can be generated from the original image. We regard those images as multi-exposure images. In order to maintain local details well, the fusion operates on image blocks. Let {i_k^n | 1 ≤ k ≤ K} be a set of N²-dimensional column vectors expanded from the blocks at the identical location of the K multi-brightness source images, where k means the block is from the kth image of the set and n corresponds to the position of the block in the entire image. The elements of the vector are the values of the pixels in the image block, and N is the side length of the block. In order to express, analyze, and process the features of a block, the vector i_k^n is decomposed into three components: signal strength p_k^n, signal structure s_k^n, and mean intensity μ_{i_k^n}:

i_k^n = ||i_k^n − μ_{i_k^n}|| · (i_k^n − μ_{i_k^n}) / ||i_k^n − μ_{i_k^n}|| + μ_{i_k^n} = p_k^n · s_k^n + μ_{i_k^n},

where μ_{i_k^n} is a vector in which all elements equal the mean value of i_k^n.
Obviously, the contrast of an image block is directly reflected by the signal strength component p_k^n = ||i_k^n − μ_{i_k^n}||. Generally speaking, the higher the contrast, the clearer the block or image, although excessive contrast may produce an unrealistic scene. Considering that the input blocks are undistorted, we assume that the block with the largest contrast corresponds to the optimal visibility. Therefore, we choose the highest signal strength of all source image blocks as the signal strength of the fused image block:

p̂^n = max_{1≤k≤K} p_k^n.

The structures of the set of image blocks form a series of unit-length vectors, each pointing in a direction of the vector space. The structure of the fused image block should represent the structures of the series of image blocks. Specifically, the relationship between the structure of the fused block and the input blocks is defined in a simple but effective way:

s̄^n = Σ_{k=1}^{K} (p_k^n)^ρ s_k^n / Σ_{k=1}^{K} (p_k^n)^ρ, ŝ^n = s̄^n / ||s̄^n||.

The mean intensity of each block is defined as

μ̂^n = Σ_{k=1}^{K} L(μ_k, μ_{i_k^n}) μ_{i_k^n} / Σ_{k=1}^{K} L(μ_k, μ_{i_k^n}),

where L(μ_k, μ_{i_k^n}) is a weighting function controlled by the mean value μ_k of the kth whole image and the mean value μ_{i_k^n} of the current block in the kth image. L(·) should be relatively large when the block i_k^n is in a well-exposed region, and vice versa. To specify it, we adopted a two-dimensional Gaussian function:

L(μ_k, μ_{i_k^n}) = exp(−(μ_k − 0.5)² / (2σ_g²) − (μ_{i_k^n} − 0.5)² / (2σ_l²)).

When the signal strength p̂^n, signal structure ŝ^n, and mean intensity μ̂^n are computed, the new vector î^n, which is the vector of the fused image block, can be defined and the block can be reconstructed:

î^n = p̂^n · ŝ^n + μ̂^n.

The blocks are extracted from the source sequence by a moving window with a fixed stride D. The pixels in the overlapping blocks are averaged to produce the final output of this step.
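The block decomposition and fusion rules above can be sketched as follows. Centering the Gaussian weighting at mid-gray 0.5 for normalized images is an assumption carried over from the cited fusion method [29]; the paper itself does not state the center:

```python
import numpy as np

def decompose_block(i_nk):
    """Split a vectorized block into mean intensity, signal
    strength (norm of the zero-mean part), and unit structure."""
    mu = float(np.mean(i_nk))
    centered = i_nk - mu
    p = float(np.linalg.norm(centered))
    s = centered / p if p > 0 else centered
    return p, s, mu

def fuse_blocks(blocks, img_means, rho=4.0, sg=0.2, sl=0.5):
    """Fuse K co-located blocks: maximum strength, strength-weighted
    structure, and Gaussian-weighted mean intensity (Gaussian center
    at 0.5 is assumed for images normalized to [0, 1])."""
    ps, ss, mus = zip(*(decompose_block(b) for b in blocks))
    p_hat = max(ps)
    s_bar = sum((p ** rho) * s for p, s in zip(ps, ss))
    n = np.linalg.norm(s_bar)
    s_hat = s_bar / n if n > 0 else s_bar
    w = [np.exp(-(gm - 0.5) ** 2 / (2 * sg ** 2)
                - (m - 0.5) ** 2 / (2 * sl ** 2))
         for gm, m in zip(img_means, mus)]
    mu_hat = sum(wi * m for wi, m in zip(w, mus)) / (sum(w) + 1e-12)
    return p_hat * s_hat + mu_hat
```

With a single input block the routine reproduces that block exactly, which is a quick sanity check that the decomposition is lossless.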

Outliers Filtering
Generally, there are still some outliers in the image, which are usually the brightest or darkest pixels. Specifically, the maximum or minimum value in the image may be an outlier, which affects the result of dynamic range adjustment. To cope with this problem, we adopt a simple and effective method.
To avoid manual selection of the parameters by experience, we assume that there are two outliers in each row, or in each patch with fixed size a. Take an image of size a × b as an example. First, sort every pixel value in the whole image in ascending order. Then, pick out the ath value as the effective minimum value f_min and the ath value from the end as the effective maximum value f_max. Finally, adjust the image according to these effective values.
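A sketch of this sort-based clipping, assuming a defaults to the number of image rows (one outlier at each extreme per row, as in the a × b example) and that the final adjustment is a rescale to [0, 1]:

```python
import numpy as np

def clip_outliers(img, a=None):
    """Sort all pixel values; treat the a smallest and a largest as
    outliers (a defaults to the number of rows), then rescale to [0, 1]
    using the surviving effective extrema f_min and f_max."""
    if a is None:
        a = img.shape[0]
    vals = np.sort(img, axis=None)            # ascending order
    f_min, f_max = vals[a - 1], vals[-a]      # a-th smallest / largest
    out = np.clip(img, f_min, f_max)
    return (out - f_min) / (f_max - f_min + 1e-12)
```

This keeps a handful of hot or dead pixels from consuming the whole output dynamic range during the final mapping.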

Experimental Settings
In order to measure the effectiveness and efficiency of the proposed method, multiple 16-bit infrared images selected from typical scenes in the databases FLIR Thermal Starter Dataset Version 1.3 [31] and LTIR Dataset Version 1.0 [32] were utilized for testing. The information of the images, including image size and dynamic range, is listed in Table 1. Meanwhile, four well-established methods (HE [1], CLAHE [5], MSR [23], and Reinhard [24]) and two novel approaches (AHPBC [6] and LEP [14]) were introduced for comparison. For those methods, we select the parameters as the authors advised or by experience. In Section 2.1.2 (Equation (4)), the three brightness intervals (bright S_1, moderate S_2, and dark S_3) are set as follows: S_1 = [0.1, 0.7], S_2 = (0.7, 1.5], and S_3 = (1.5, 8]. In Section 2.2.1 (Equation (6)), we set the value of ε to be related to the variance of the entire image, because ε determines whether an edge should be preserved. In Section 2.2.2 (Equation (11)), the values of α_k determine the enhancement of the details. Throughout our experiment, we obtain two detail layers, and in Equation (13), C = 7 and z = 0.0001.
In Section 2.3 (Equation (18)), ρ determines the contribution of each block to the fused block's structure. Obviously, the contribution increases along with the strength of the block. Theoretically, any ρ > 0 is feasible; we set ρ = 4 in our experiment. In Equation (21), σ_g and σ_l control the spread of the profile along μ_k and μ_{i_k^n}. We set σ_g = 0.2 and σ_l = 0.5; a smaller value of σ_g relative to σ_l is important for generating results with a good visual impression. Additionally, we set the size of the blocks and the moving window stride as N = 11 and D = 2, as the authors advised.
Throughout the paper, the parameters mentioned above are adopted for typical infrared images with different characteristics. The results demonstrate that the proposed method is capable of effectively enhancing IR images.

Visual Comparisons
To compare the effects of the methods intuitively, the enhanced results of the algorithms are given in Figures 4-10. We discuss the results in detail in the Discussion section.

Quantitative Comparison
Generally, good display performance means high clarity and even gray level distribution. In order to do the quantitative comparison, the Tenengrad [33], Entropy, Naturalness Image Quality Evaluator (NIQE) [34], and Perception-based Image Quality Evaluator (PIQE) [35] are introduced. They are widely used in evaluating the quality of an image.
The Tenengrad is written as

S(x, y) = √[(G_x ∗ I(x, y))² + (G_y ∗ I(x, y))²], Ten = (1/n) Σ_{x,y} S(x, y)², (26)

where I(x, y) denotes the gray value of the pixel (x, y), G_x and G_y are the horizontal and vertical Sobel operators, ∗ denotes convolution, and n is the number of pixels in the image. The Tenengrad is utilized to reflect the clarity of the whole image. Theoretically, the larger the Tenengrad value, the higher the contrast and the better the visibility of the details of the image. The calculated Tenengrad results are listed in Table 2.

The even distribution of the pixel values of the image is another goal of image enhancement, and the entropy of an image is a common approach to reflect the distribution of pixel values. Specifically and theoretically, the larger the entropy value, the more evenly the gray levels are distributed. The entropy of an 8 bit image is written as

E = −Σ_{i=0}^{255} p_i log₂ p_i,

where p_i is the probability of gray level i in the image. The calculated entropy results are listed in Table 3.

NIQE measures the distance between the NSS-based features calculated from the image and the features obtained from an image database used to train the model. The features are modeled as multidimensional Gaussian distributions. We calculate it by the Matlab function niqe(), which returns a non-negative scalar. Theoretically, the lower the NIQE value, the better the perceptual quality of the image. The results are listed in Table 4.
PIQE calculates the no-reference quality score for an image through block-wise distortion estimation. We calculate it by the Matlab function piqe(), which returns a non-negative scalar in the range [0, 100]. The PIQE score is inversely correlated with the perceptual quality of an image: a low PIQE value indicates high perceptual quality, and a high PIQE value indicates low perceptual quality. The results are listed in Table 5.
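For illustration, the Tenengrad measure described above can be computed with Sobel operators; the direct correlation loop below is a plain NumPy sketch of the definition, not the authors' implementation, and it averages over the valid interior pixels rather than all n pixels:

```python
import numpy as np

def tenengrad(img):
    """Tenengrad sharpness: mean squared Sobel gradient magnitude
    over the valid (border-excluded) interior of the image."""
    gx_k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gy_k = gx_k.T
    H, W = img.shape
    gx = np.zeros((H - 2, W - 2))
    gy = np.zeros((H - 2, W - 2))
    for i in range(3):            # direct 3x3 correlation
        for j in range(3):
            patch = img[i:i + H - 2, j:j + W - 2]
            gx += gx_k[i, j] * patch
            gy += gy_k[i, j] * patch
    return float(np.mean(gx ** 2 + gy ** 2))
```

Since the gradient operators are linear, scaling an image by a factor c scales the Tenengrad by c², which matches its role as a contrast-sensitive clarity score.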

Running Time Comparison
In order to compare efficiency, the above-listed algorithms were tested using MATLAB R2018b on a personal computer (Intel Core i5-8250U CPU: 1.60 GHz; memory: 8 GB). The sizes of the tested images are listed in Table 1. The calculation time results are listed in Table 6.

Discussion
Image group Figure 4 is an example of infrared images with rich scene information, including humans, bicycles, benches, the ground, and so on. HE and CLAHE enhance the contrast, but a large amount of local detail is lost. AHPBC and MSR can enhance the details to some extent, but the dynamic range of the resulting image is so small that the visibility is poor. The result of Reinhard is visually comfortable, but some texture information is still ambiguous. LEP can enhance the image well in general, but generates a halo. Compared with the other six approaches, our method achieves the best performance.
Image groups Figures 5 and 6 are examples of low contrast images containing many texture details. The dynamic range of the original IR image is so narrow that HE and CLAHE fail to enhance the details, while some regions in their results are over-enhanced and some noise is generated. AHPBC, MSR, and Reinhard can preserve the global contrast but are relatively weak in enhancing the local details. LEP can successfully enhance the edges of the humans in the image, but some tiny details like the texture of the road are still dim. Our method yields the best enhancement results, producing global detail enhancement without generating noise. Compared with the original linearly mapped image, the results of AHPBC and Reinhard are still blurred, even though there may be a great change in brightness. MSR can increase the contrast to some extent, but its effect on local detail enhancement is relatively weak. The noise in the result of LEP is obvious. The comparison of the results in Figures 7 and 8 indicates that the proposed method creates the most visually comfortable results, which reveal the details most fully.
Image groups Figures 9 and 10 are examples of images with blurred details. Due to the low contrast and weak details in the original images, HE and CLAHE not only fail to reproduce the details, but also generate noise. Objects such as trees, buildings, and pedestrians in AHPBC's results are blurred. The results of MSR and Reinhard are too dark to observe the information. By comparison, the results of LEP and the proposed method are visually pleasing; compared with the results of LEP, the noise in the proposed results is weaker.
The results of the Tenengrad for the test images are shown in Table 2. In theory, the higher the Tenengrad value, the clearer the entire image. In accordance with the visual comparisons, the proposed method and LEP achieve higher Tenengrad values.
As reported in Table 3, in the comparison of entropy, the proposed method and LEP give robust results, and our method obtains a slightly better value than LEP does. In practice, there are more details in the results of our proposed method.
As reported in Table 4, a lower NIQE value reflects better perceptual quality of the image. The overall differences among AHPBC, MSR, Reinhard, LEP, and our proposed method are not obvious.
As reported in Table 5, a lower PIQE value reflects better perceptual quality of the image. In general, the proposed method and LEP achieve better results, and the average result of our proposed method is the best.
As reported in Table 6, since our approach introduces multiscale analysis and image fusion, the computation time of the proposed algorithm is much longer than that of the conventional and well-known methods HE, CLAHE, MSR, and Reinhard. Our method runs relatively slower than LEP, but faster than AHPBC. How to accelerate our algorithm is one of the key points of our future work; hardware acceleration is one option. After optimization, our method is very likely to be able to process images in real-time applications.
All in all, the performance of the proposed algorithm is verified by experiments on images with various characteristics. The above analysis of the results shows that the proposed method has strength in detail enhancement of HDR infrared images. The dynamic range compression and detail enhancement results are visually comfortable, without excessively obvious noise.

Conclusions
In this paper, a novel high dynamic range infrared image enhancement method is introduced. This method is capable of compressing the dynamic range, adjusting the gray levels, and enhancing the details effectively. The proposed approach is mainly based on adaptive Gamma correction, the multiscale guided filter, and image fusion. First, in order to keep the weak global details in different areas, we adopt an EOG-guided Gamma transformation, which adaptively adjusts the original normalized image into multiple brightness levels. Second, the multiscale guided filter is utilized iteratively to decompose each brightness image into a base layer and several detail layers. Details in each image are enhanced separately and composed adaptively. Third, to obtain an image with the global details of the input image, the enhanced images at each brightness level are fused together. Last, we filter out the bad pixels and adjust the dynamic range before outputting the image. Tested on HDR IR images of different scenes with sundry details and backgrounds, the experimental results indicate that the proposed method can compress the dynamic range while increasing the contrast, enhance the details effectively, and generate a visually pleasing result. It should be pointed out that in the guided transformation step, the EOG function is chosen merely to guarantee the simplicity and correctness of the algorithm; that is to say, the function could be changed flexibly according to the case in future work. Meanwhile, the method of enhancing the decomposed layers could also be extended, which provides a new direction for research.
Author Contributions: F.C. proposed the original idea, performed the experiment and wrote the original manuscript; J.Z. contributed to the direction, content, and revised the manuscript and funding acquisition; project administration J.C.; T.X. revised the manuscript; G.L. contributed to the content; and X.P. contributed to the content, revised the manuscript and project administration. All authors have read and agreed to the published version of the manuscript.