A New Photographic Reproduction Method Based on Feature Fusion and Virtual Combined Histogram Equalization

A desirable photographic reproduction method should have the ability to compress high-dynamic-range images to low-dynamic-range displays that faithfully preserve all visual information. However, during the compression process, most reproduction methods face challenges in striking a balance between maintaining global contrast and retaining majority of local details in a real-world scene. To address this problem, this study proposes a new photographic reproduction method that can smoothly take global and local features into account. First, a highlight/shadow region detection scheme is used to obtain prior information to generate a weight map. Second, a mutually hybrid histogram analysis is performed to extract global/local features in parallel. Third, we propose a feature fusion scheme to construct the virtual combined histogram, which is achieved by adaptively fusing global/local features through the use of Gaussian mixtures according to the weight map. Finally, the virtual combined histogram is used to formulate the pixel-wise mapping function. As both global and local features are simultaneously considered, the output image has a natural and visually pleasing appearance. The experimental results demonstrated the effectiveness of the proposed method and the superiority over other seven state-of-the-art methods.


Introduction
In the real world, the luminance intensity of environmental scenes has a very wide range. From glimmer starlight to blazing sunlight, the luminance variation could span over ten orders of magnitude. The human visual system (HVS) has an outstanding ability to adapt and perceive about 5~6 orders of magnitude. Previously, most consumer cameras can only capture nearly 2~3 orders of luminance variation. Due to the limited dynamic range, the captured image severely suffers from detail loss, especially in highlight and shadow regions [1].
With advancements in optical sensing, high dynamic range (HDR) sensors that can capture the entire luminance range of a real-world scene have been developed [2]. For example, some latest high-end digital single-lens reflex cameras, or some devised sensors including multiple sensor elements with different exposure levels, are able to capture entire details of both dark and bright parts of the scene simultaneously. Although the cost of such HDR sensors is high, the captured HDR images can contain a larger bit depth of the image than 24-bit depth per pixel. Typically, HDR images are stored in floating point and require 32-bit depth per pixel.
Despite the increasing availability of HDR images, presenting HDR scenes on traditional low dynamic range (LDR) or standard dynamic range (SDR) displays remains problematic because an SDR display can only display 256 brightness levels. The high cost and the under-developing display technology remain major obstacles to the mass production of HDR display devices. To solve this problem, the photographic reproduction method becomes an essential technique and has been a prominent subject in the field of image sensing research or the image-based applications [3].

Related Work
A photographic reproduction method should not only provide contrast adjustment but also the preservation of the luminance, details, and even the vividness of the colors of the original image. According to their modeling characteristics, most photographic reproduction methods are usually divided into the following three categories. First, the global-based photographic reproduction methods perform the same mapping function for all image pixels based on the global features of the input HDR image. In other words, an input pixel value produces a specific output value, regardless of its position. Both linear mapping functions and different nonlinear mapping functions are used to mimic the HVS. Reinhard and Devlin [4] proposed a global-based reproduction method using electrophysiology and the photoreceptor model that reflects human perception. It is a fast algorithm; however, the detail preserving is not considered. Mantiuk et al. [5] presented a piece-wise linear reproduction method that minimizes visible distortions by considering the penalty using the HVS contrast perception model. To reproduce optimal scene-referred images on a range of display devices, their method can adjust the image content by considering the display characteristics and surrounding illumination. However, the local operation, such as sharpening, is not considered. In [6], Kim et al. proposed integrating a modified weighted least square filter with mapping, which can preserve detail and maintain the global contrast through the competitive learning neural network. Furthermore, the color shift issue is solved by utilizing the Helmholtz-Kohlrausch effect in the light correction stage. Gommelet et al. [7] designed a global-based reproduction method to address the optimal rate-distortion problem, which typically occurs in the reconstruction of an HDR signal. In [7], a novel distortion model was built that takes the image gradient into account. Khan et al. [8] proposed a reproduction method that uses the features of HVS and the threshold versus intensity curve to adjust the individual bin-widths of the input image histogram. A global-based reproduction function is built using the modified histogram; however, the local information of an image is not used during the reproduction.
The overall pros of the global-based photographic reproduction methods are the capability to preserve the global contrast of the original images, and in addition, the computational complexity is low. However, the global compression of the dynamic range is typically accompanied by the suppression of local contrast, which is the inevitable con of the global-based methods. Moreover, especially for the scenes with a large difference in brightness, the bright and dark regions are the most severely sacrificed by detail loss because, compared to the entire dynamic range, the local intensity variation in such regions are almost ignorable.
To overcome the shortcomings of global-based methods, local-based photographic reproduction methods are proposed. For the local-based reproduction methods, different reproduction functions are designed to each pixel based on the pixel position and its surrounding pixels. Different pixel positions can share the same intensity but may relate to different reproduced values. Ahn et al. [9] proposed a local-based reproduction method, which utilizes center/surround retinex theory to adapt the local contrast. This work demonstrates superior contrast enhancement in the scenes with low log-average luminance. Nevertheless, in [9], sometimes over-enhancement occurs in the details. Tan et al. [10] proposed a halo-free reproduction method, which applies using an L0 smoothing filter to mimic the adaptability of the HVS mechanism. Cyriac et al. [11] presented a two-stage reproduction method, where the first stage is a simple gamma curve for global compression, and the second stage is a local-based reproduction scheme using psychophysical models for the visual cortex. However, the local-based processing in the second stage tends to degrade the overall naturalness. Croci et al. [12] proposed a reproduction method to reproduce the HDR video, where a tone-curve-space alternative is used as a substitute for the temporal per-pixel coherency to increase the computational efficiency. Li et al. [13] presented a clustering-based content and color adaptive reproduction method, which divides the input image into patches. By analyzing the local content information (e.g., patch mean, color variation, and color structure), the patches are formed into clusters, and the tone mapping is performed via a more compact domain. However, the patch-based processing tends to ignore the image information as a whole.
Due to considering local features, the typical pro of the local-based photographic reproduction methods is to provide more local details in dark and bright regions compared with global-based reproduction methods. Therefore, the local-based methods are suitable to the scenes where a large brightness difference exists. However, they are vulnerable to artifacts such as halo effects and block effects, which cause an unnatural overall appearance. Moreover, the global contrast decreases because of the lack of global features.
As both global-and local-based photographic reproduction methods have some drawbacks, some researchers have proposed decomposition-based photographic reproduction methods, which use decomposition techniques to obtain large-scale image structures (i.e., base layers) and small-scale image textures (i.e., detail layers), and thus, global-based (and local-based) approaches can be used for specific layers accordingly. Gu et al. [14] designed a local edge-preserving filter that has a locally adaptive property. Based on the filter, a retinex-based approach is presented, where the image is decomposed into one base layer and three detail layers for the reproduction manipulation. Barai et al. [15] proposed an HVS-inspired reproduction method, where the saliency map information is fed into the guided filter for image decomposition. Then, global compression and detail enhancement are performed in the base layer and the detail layer, respectively. Mezeni and Saranovac [16] presented an enhanced reproduction method, which decomposes the image into base/detail layers. Then, the base layer is scaled partially in the linear domain and partially in the logarithmic domain, and a detail enhancement is performed in the dark areas of the detail layer. However, when generating the output image, it is hard to fuse individual layers suitably. Liang et al. [17] presented a hybrid layer decomposition model for photographic reproduction, where a sparsity term is used to model the piecewise smoothness of the base layer, and the other sparsity term with a structural prior is used to model the piecewise constant effect of the detail layer. Miao et al. [18] presented a macromicro-modeling reproduction method, in which multi-layer decomposition is utilized from the perspective of the micro model, and content-based global compression is utilized from the perspective of the macro model. The representative pro of the decomposition-based photographic reproduction methods is the flexibility to deal with different base and detail layers separately. However, the con of such a method is the difficulty of blending individual layers smoothly. That is, some blurs tend to occur in the final layer fusion process.
To exemplify the superiority of this work, Figure 1 shows a visual comparison among the global-based, local-based, decomposition-based methods, and our proposed method. In view of the abovementioned shortcomings of the global-based, local-based, and decomposition-based methods, this paper presents a new photographic reproduction method, which has the following three main advantages:

•
We propose using a hybrid histogram analysis scheme to extract mutually compatible global/local features in parallel, and a feature fusion scheme to construct the virtual combined histogram, which allows us to inherit the superiority of the global-based (and the local-based) methods smoothly. • Instead of performing late fusion (i.e., finally fusing all the processed layers as the decomposition-based methods do), the proposed virtual combined histogram equalization scheme can fuse global/local features in an earlier stage, which increases the naturalness of the output image. • Owning to the difference between the dark/bright regions and normal-luminance regions, we propose using the weight map to adaptively modify the weights locally in the feature fusion. • Instead of performing late fusion (i.e., finally fusing all the processed layers as th decomposition-based methods do), the proposed virtual combined histogram equalization scheme can fuse global/local features in an earlier stage, which increase the naturalness of the output image.

•
Owning to the difference between the dark/bright regions and normal-luminanc regions, we propose using the weight map to adaptively modify the weights locally in the feature fusion.

Figure 1.
A rough comparison among a global-based reproduction method [8] (bottom left), a local-based reproduction method [9] (top left), a decomposition-based reproduction method [18] (right), and our proposed method (middle). This example shows that our method inherits the advantages of both global-and local-based reproduction method, while avoiding the unnaturalness issue of decomposition-based reproduction method (due to processing different layers separately. Figure 2 shows the overall framework of this study. The proposed method i designed due to the strategy of improving the visibility of highlight and shadow areas while maintaining the global naturalness of the original image. In the pre-processing stage, a quick photographic reproduction method [19] is applied to the original HDR signal to obtain a pilot image ( ), which is a preliminary reproduced result with a simple global compression. Although the pilot image might suffer from detail loss locally it is good enough for us to distinguish the dark/bright regions from the normal-luminanc regions.  [8] (bottom left), a localbased reproduction method [9] (top left), a decomposition-based reproduction method [18] (right), and our proposed method (middle). This example shows that our method inherits the advantages of both global-and local-based reproduction method, while avoiding the unnaturalness issue of decomposition-based reproduction method (due to processing different layers separately. Figure 2 shows the overall framework of this study. The proposed method is designed due to the strategy of improving the visibility of highlight and shadow areas, while maintaining the global naturalness of the original image. In the pre-processing stage, a quick photographic reproduction method [19] is applied to the original HDR signal to obtain a pilot image (I Pilot ), which is a preliminary reproduced result with a simple global compression. Although the pilot image might suffer from detail loss locally, it is good enough for us to distinguish the dark/bright regions from the normal-luminance regions. Subsequently, we modify the work of [20] for highlight/shadow detection as follows. First, a specular-free image ( ) is defined as follows:

Pre-Processing for the Highlight/Shadow Detection
where the subscript ∈ , , indicates one of the RGB color channels, and the dark channel ( ) is defined as follows: ( , ) = min ( , ) .
(2)  Subsequently, we modify the work of [20] for highlight/shadow detection as follows. First, a specular-free image (I SF ) is defined as follows: where the subscript c ∈ R, G, B indicates one of the RGB color channels, and the dark channel (I Dark ) is defined as follows: As I SF is obtained by subtracting the minimum of RGB values from I Pilot , at least one of the three channels in I SF equals zero at each pixel position. Then, the modified specular-free (MSF) image is obtained by adding the average of the dark channel image to the specular-free image as follows: In [20], the difference between the MSF image and the pilot image can be used to detect the highlight regions in the image. With this feature, we find that if we multiply a correction parameter (θ) with the threshold and compare it with the pilot image, we can also detect shadow regions. Therefore, the proposed highlight/shadow detection scheme can be expressed as follows: where δ c = I Pilot c − I MSF c , θ = 0.8 is an empirical value (from our experiments, 0.75 ≤ θ ≤ 0.85 would produce accurate detection result), and the threshold value (thr) is obtained by applying the Otsu method in the pilot image. The Otsu method is an automatic way of creating binarization in image processing, and we find that it is suitable to determine the threshold in Equation (4). Figure 3 shows an example of the highlight/shadow detection results, which will be used as the estimate of the steering weight coefficients in the feature fusion stage (described later in Section 3.4).

Luminance Separation and Initial Logarithmically Normalization
As luminance information is mainly affected by the dynamic range, distinguishing luminance and chrominance from the original HDR signal is a common approach in photographic reproduction. In this study, luminance information was extracted by converting from RGB color space to CIE XYZ color space through the ITU-R BT.709 standard.
where indicates the input HDR signal and indicates the corresponding

Luminance Separation and Initial Logarithmically Normalization
As luminance information is mainly affected by the dynamic range, distinguishing luminance and chrominance from the original HDR signal is a common approach in photo- graphic reproduction. In this study, luminance information was extracted by converting from RGB color space to CIE XYZ color space through the ITU-R BT.709 standard.
where I H indicates the input HDR signal and L in indicates the corresponding luminance, which contains no chromatic information.
For different scenes, their dynamic range may vary quite greatly. To avoid the inconsistent dynamic range issue, the logarithmic function is a typical process to compress the luminance domain according to the following Weber-Fechner theory: where 10 −6 is added to avoid the singularity error occurring as the input pixel luminance equals zero. Furthermore, to match the property that perceived brightness is proportional to the logarithm of the actual luminance intensity, its logarithmically normalized value can be expressed as follows: where max L log (i, j) and min L log (i, j) represent the maximum and minimum values of L log (i, j), respectively. To adapt various lighting conditions, the normalized logarithmic luminance value (L log _n (i, j)), which always ranges between 0 and 1, are analyzed in the following steps.

Feature Extraction through Mutually Hybrid Histogram Analysis
The main challenge in photographic reproduction is to preserve both the global and local features of the original image, i.e., maintaining both the entire luminance balance and local detail information. In this study, the abovementioned features are neither the feature points used in computer vision, nor the feature vectors used in machine learning. The feature represents the general property of the entire image (i.e., global feature) or of individual local regions (i.e., local features) that are needed in the proposed photographic reproduction procedure.
Some proposed reproduction methods perform the global-based (and local-based) processes separately; in other words, they first apply a global luminance adaption and then perform local detail enhancement. However, we argue that this type of two-step strategy may not be the optimal solution because the goals of these two steps are inherently conflicting: one is to enhance the global features, and the other one is to enhance the local features.
As shown in Figure 4, we propose a parallel framework to simultaneously analyze the global histogram (constructed by the entire image) and local histogram (constructed by individual local image patches). The underlying concept of the proposed mutually hybrid histogram analysis is to extract the mutually compatible features from two statistical approaches. inherently conflicting: one is to enhance the global features, and the other one is to enhance the local features.
As shown in Figure 4, we propose a parallel framework to simultaneously analyze the global histogram (constructed by the entire image) and local histogram (constructed by individual local image patches). The underlying concept of the proposed mutually hybrid histogram analysis is to extract the mutually compatible features from two statistical approaches.

Figure 4.
Proposed mutually hybrid histogram analysis approach. We analyze the global and local histograms simultaneously using different statistical methods, i.e., Gaussian mixture model for the former and stratified sampling for the latter. Although different statistical methods are applied, we aim to extract the mutually compatible features to form a virtual combined histogram (introduced later in Section 3.4).

Global Region Analysis and Global Feature Extraction
In global region analysis, the logarithmically normalized luminance plane is first transformed into a global histogram of levels with equal bin width, where is empirically set as one thousand. When divided by the total number of pixels in the image, the global histogram ℎ ( ) can be viewed as a probability density function of pixels. A parametric statistical method called the Gaussian mixture model (GMM) can then be used to structure ℎ ( ) as a weighted summation of three Gaussian functions as follows: where , = 0,1, … , − 1 indicates the quantized reproduced levels of _ , and , = 1,2,3 is the weight of the -th Gaussian function. The reason for using three Gaussian functions to approximate ℎ ( ) is because in photographic reproduction, we normally concern the following three main parts: the highlight area, midtone area, and shadow area. From Equation (8), we refer to the global feature set as the following: The expectation-maximization (EM) algorithm [21] was adopted to solve the GMM estimation problem, which is used to find the maximum likelihood estimates of . Proposed mutually hybrid histogram analysis approach. We analyze the global and local histograms simultaneously using different statistical methods, i.e., Gaussian mixture model for the former and stratified sampling for the latter. Although different statistical methods are applied, we aim to extract the mutually compatible features to form a virtual combined histogram (introduced later in Section 3.4).

Global Region Analysis and Global Feature Extraction
In global region analysis, the logarithmically normalized luminance plane is first transformed into a global histogram of K levels with equal bin width, where K is empirically set as one thousand. When divided by the total number of pixels in the image, the global histogram h G (x k ) can be viewed as a probability density function of pixels. A parametric statistical method called the Gaussian mixture model (GMM) can then be used to structure h G (x k ) as a weighted summation of three Gaussian functions as follows: where {x k , k = 0, 1, . . . , K − 1} indicates the quantized reproduced levels of L log _n , and α G n , n = 1, 2, 3 is the weight of the n-th Gaussian function. The reason for using three Gaussian functions to approximate h G (x k ) is because in photographic reproduction, we normally concern the following three main parts: the highlight area, midtone area, and shadow area. From Equation (8), we refer to the global feature set as the following: The expectation-maximization (EM) algorithm [21] was adopted to solve the GMM estimation problem, which is used to find the maximum likelihood estimates of parameters in the statistical models involving unobserved latent variables. In this study, the likelihood function is defined as follows: To efficiently find the optimal θ G , the derivatives of the log-likelihood with respect to the initial α G n , µ G n , and σ G n are, respectively, set as zero (i.e., the expectation step), which yields a new parameter set of GMM (i.e., the maximization step). The EM algorithm iteratively switches between the expectation step and the maximization step until it converges (Please refer to [21] for the details of EM).

Local Region Analysis and Local Feature Extraction
In local region analysis, a sliding window scheme is adopted to visit each individual local region in raster scan order. Figure 5 illustrates the local region analysis, where a local region is of size M × M (M = 129 as the default) and is centered at the current processing position (i, j). Each local region is first divided into sixteen units with a size of 32 × 32 pixels, and 2 × 2 units constitute a partially overlapped subblock, e.g., the orange (or the green) square shown in Figure 5.
the likelihood function is defined as follows: To efficiently find the optimal , the derivatives of the log-likelihood with res to the initial , ,and are, respectively, set as zero (i.e., the expectation step), w yields a new parameter set of GMM (i.e., the maximization step). The EM algori iteratively switches between the expectation step and the maximization step unt converges (Please refer to [21] for the details of EM).

Local Region Analysis and Local Feature Extraction
In local region analysis, a sliding window scheme is adopted to visit each individ local region in raster scan order. Figure 5 illustrates the local region analysis, where a l region is of size × ( = 129 as the default) and is centered at the current proces position ( , ). Each local region is first divided into sixteen units with a size of 32 pixels, and 2 × 2 units constitute a partially overlapped subblock, e.g., the orange (or green) square shown in Figure 5.  (12) and (13), and its simplification is achieved by utilizing summed-area table defined in Equations (16) and (17). With consideration of the estimation accuracy and computation cost, each l region is subsampled into nine partially overlapping subblocks ( ) that correspon the two corner sets. First, the top-left (TL) corner set can be expressed by the followin , = 1,2, … ,9 .
With consideration of the estimation accuracy and computation cost, each local region is subsampled into nine partially overlapping subblocks (B sub n ) that correspond to the two corner sets. First, the top-left (TL) corner set can be expressed by the following: where C TL . Second, the bottom-right (BR) corner set can be expressed by the following: C BR n , n = 1, 2, . . . , 9 .
where C BR local histogram h L (x k ) as a set of nine Gaussian functions g x k , µ L n , σ L n and to find the local feature set as the following: Instead of using GMM, we adopt another statistical method called stratified sampling, in which the entire block is divided into homogeneous subblocks (defined as strata). The reason for partially overlapping is to avoid image artifacts such as the blocking effect and the halo effect. The distribution of each subblock is intentionally simulated as a Gaussian function, where the subblock mean and the subblock standard deviation are treated as the corresponding µ L n and σ L n in Equation (14), respectively. In addition, a spatial kernel (K) is used to weight the spatial correlation as follows: Inspired by [22], we adopted a summed-area table approach [23] to reduce the computation complexity of local region analysis as follows. First, the summed-area table (T SA ) was generated by calculating the sum of all the pixels above and to the left of the current position as the following: Similar to Equation (16), the square summed-area table (T 2 SA ) was generated by calculating the sum of all pixel squares as the following: Notably, both T SA and T 2 SA could be efficiently computed through a one-pass procedure over the image by the following: where p = 1 and 2.
Once the two summed-area tables were generated, the mean and standard deviation of each subblock could be quickly obtained by looking up T SA and T 2 SA because of the following closed-form solutions: where N is the number of pixels in the subblock, and S = T 2 . The four positions (i 0 , j 0 ), (i 0 , j 1 ), (i 1 , j 0 ), and (i 1 , j 1 ) indicate the top-left, the top-right, the bottom-left, and the bottom-left corners of the subblock, respectively.

Virtual Combined Histogram Construction Based on Feature Fusion
Histogram equalization (HE) is a well-known method that by analyzing the histogram, pixel intensities can be arranged for enhancing the global contrast while maintaining image details by pursuing maximum entropy. As shown in the bottom row of Figure 4, both the global histogram and the highlight/shadow local histogram can be approximated (or characterized) as Gaussian mixtures. In this study, we propose a virtual combined histogram construction scheme based on nominally fusing the local/global Gaussian mixtures as follows.
First, considering that there is minor detail loss in the normal-luminance regions and more detail loss in the under-luminance (or over-luminance) regions during the reproduction process, the highlight/shadow detection result of Equation (4) is adopted to generate a binary map, where the highlight/shadow pixels are recorded as "1", and the midtone pixels are recorded as "0". A weight map function (τ i,j ) is generated by convolving the binary map with a Gaussian low-pass filter (the Matlab inbuilt imgaussfilt function) to smooth the weighting difference. By doing so, we aim to make greater use of the local features (i.e., increase the weight map value in bright/dark regions) because the details of such regions are generally vulnerable to loss. The weight map function varies in different pixel positions because the weighting of local features should be region independent. Therefore, a virtual combined histogram is constructed by fusing global and local features through the following: where the subscript (i, j) indicate the pixel position, ω 1 and ω 2 represent the initial fusion weights (we set ω 1 = 0.4 and ω 2 = 0.6 empirically), and h G (x k ) and h L i,j (x k ) indicate the global and the local Gaussian mixtures, respectively. Moreover, we set an upper bound to constrain the maximum τ i,j value as 0.2. That is, the minimum weight to the global Gaussian mixtures in Equation (21) is guaranteed to be 0.2 to preserve the overall naturalness.

Luminance Modification and Color Recovery
Through the virtual combined histogram, a look-up table is generated in the traditional HE manner with linear interpolation. That is, the output luminance plane was modified by the following: where L out is the adjusted luminance and CDF i,j (x k ) is the Cumulative Distribution Function (CDF), which corresponds to the virtual combined histogram in Equation (21). Overall, the pixel-wise modification function was controlled by manipulating both the global and local features through the virtual combined histogram. As each combined histogram was a summation of weighted Gaussian functions, the property of the Gauss error function was used to simplify the calculation by using the following: where Φ(x k | µ, σ) is the Gaussian CDF with parameters (µ, σ). The beauty of the proposed virtual combined histogram scheme is that during the luminance modification process, only the global (and local) feature sets are used. Actually, the construction of an entire histogram is not needed. Moreover, the Gauss error function Er f (x) can be approximated from [24]  where tanh is the hyperbolic tangent function. Finally, the output reproduced LDR image was obtained from restoring the color information by the following: where HDR R,G,B represents the three channel values of the original HDR image, L in and L out represent the luminance before and after the proposed method, and s is the saturation factor (set as 0.6 in this study).

Experimental Results
In this section, we subjectively and objectively compare the effectiveness of the proposed method with those of the other photographic reproduction methods to confirm whether it affords more advantages than these methods. We selected seven classical and state-of-the-art methods for our experiments, including the following global-based reproduction method: Reinhard et al. [4] (published in 2005); the following three local-based reproduction methods: Ahn et al. [9] (published in 2013), Li et al. [13] (published in 2018), and Gao et al. [25] (published in 2020); and the following three decomposition-based reproduction methods: Gu et al. [14] (published in 2013), Liang et al. [17] (published in 2018), and Miao et al. [18] (published in 2019). For a comparison of the computational complexity, taking the image memorial (with size of 768 × 512) as an example, the processing time needed to generate a reproduced image is 0.252 s (in [4]), 0.533 s (in [9]), 5.301 s (in [13]), 0.627 s (in [25]), 0.788 s (in [14]), 2.189 s (in [17]), 0.733 s (in [18]), and 2.201 s (in the proposed method). All the methods are adjusted with the default parameters based on the suggestion of the original papers. In addition, the software and CPU are MATLAB R2016a and Intel Core i7, respectively.

Subjective Analysis
In subjective analysis, the performance of different methods can be judged through side-by-side visual comparison, such as according to the amount of regional detail information, the naturalness, etc. The simple baseline LDR images shown in Figure 6 indicate that a large luminance difference exists between the highlight and shadow areas of these test images; thus, many details are lost. where , , represents the three channel values of the original HDR image, and represent the luminance before and after the proposed method, and is the saturation factor (set as 0.6 in this study).

Experimental Results
In this section, we subjectively and objectively compare the effectiveness of the proposed method with those of the other photographic reproduction methods to confirm whether it affords more advantages than these methods. We selected seven classical and state-of-the-art methods for our experiments, including the following global-based reproduction method: Reinhard et al. [4] (published in 2005); the following three localbased reproduction methods: Ahn et al. [9] (published in 2013), Li et al. [13] (published in 2018), and Gao et al. [25] (published in 2020); and the following three decompositionbased reproduction methods: Gu et al. [14] (published in 2013), Liang et al. [17] (published in 2018), and Miao et al. [18] (published in 2019). For a comparison of the computational complexity, taking the image memorial (with size of 768 × 512) as an example, the processing time needed to generate a reproduced image is 0.252 s (in [4]), 0.533 s (in [9]), 5.301 s (in [13]), 0.627 s (in [25]), 0.788 s (in [14]), 2.189 s (in [17]), 0.733 s (in [18]), and 2.201 s (in the proposed method). All the methods are adjusted with the default parameters based on the suggestion of the original papers. In addition, the software and CPU are MATLAB R2016a and Intel Core i7, respectively.

Subjective Analysis
In subjective analysis, the performance of different methods can be judged through side-by-side visual comparison, such as according to the amount of regional detail information, the naturalness, etc. The simple baseline LDR images shown in Figure 6 indicate that a large luminance difference exists between the highlight and shadow areas of these test images; thus, many details are lost.  Figure 7 shows the reproduced results obtained using the Synagoguei test image. In Figure 7a, although the global brightness is balanced, the appearance of details is information tends to be overemphasized during decomposition and merging procedures. In Figure 7g, the details of the shadow areas (red and blue rectangles) are clear. However, the global contrast is unnatural: the highlight sky region is darkened, whereas the shadow areas are brightened, thus degrading the overall visual quality. In Figure 7h, our method shows advantages in preserving the details of the highlight and shadow areas because the proposed virtual combined histogram increases the pixel weights of local features for the highlight and shadow areas.  [9]. (c) Result of [13]. (d) Result of [25]. (e) Result of [14]. (f)Result of [17]. (g) Result of [18]. (h) Result of the proposed method. Figure 8 shows the reproduced results obtained using the Cadik_Desk02 test image. It is an indoor scene in which the lamp causes an extreme luminance difference in the captured image. In Figure 8a, the text on the book is barely perceptible because of the strong lighting. In Figure 8b,c, the detailed texture of the book is maintained; however, the global contrast in both figures is unbalanced, and the color tone is flat. In Figure 8d,  [9]. (c) Result of [13]. (d) Result of [25]. (e) Result of [14]. (f)Result of [17]. (g) Result of [18]. (h) Result of the proposed method. base layer are processed separately, thereby overamplifying the detail information. In Figure 8f, the details are not evident (blue rectangle), and the global contrast is insufficient. In Figure 8g, although details are visible, the overall image appears unreal owing to the imbalance between the macro-and the micro-models. In Figure 8h, our method exhibits excellent naturalness of the image. Furthermore, because of the improved visibility of the highlight and shadow areas, more visual content is retained, and the overall color naturalness is satisfactory.   [9].
(c) Result of [13]. (d) Result of [25]. (e) Result of [14]. (f) Result of [17]. (g) Result of [18]. (h) Result of the proposed method. Figure 9 shows the reproduced results obtained using the C33_Store test image. In Figure 9a, the detailed information of the red and blue rectangles is lost as a result of global-based processing. In Figure 9b, although the details on the right side are more visible than those in Figure 9a, the regional details of the red rectangle are lost as a result of insufficient brightness. In Figure 9c,e, detailed information is perceptible but the degree  [9]. (c) Result of [13]. (d) Result of [25]. (e) Result of [14]. (f) Result of [17]. (g) Result of [18]. (h) Result of the proposed method. rectangle) areas are visible; nevertheless, the image is somewhat unnatural due to the lack of global contrast. In Figure 9h, our method demonstrates favorable visual richness because both the global and local characteristics are simultaneously considered through the construction of a virtual combined histogram. Consequently, the details in the highlight and shadow areas are clearly presented and the contrast and color naturalness of the entire image are improved.  [9].
(c) Result of [13]. (d) Result of [25]. (e) Result of [14]. (f)Result of [17]. (g) Result of [18]. (h) Result of the proposed method. Figure 7 shows the reproduced results obtained using the Synagoguei test image. In Figure 7a, although the global brightness is balanced, the appearance of details is restricted by the global-based model. In Figure 7b, the regional scene performances in both the red and blue rectangles are poor and indistinct for the human eyes. In Figure 7c, the details of the shadow areas are preserved, whereas those of the bright area (such as the white dome) are almost imperceptible, and the tone of the entire image is monotonous and flat. In Figure 7d, the details of the red and blue rectangles are visible; however, the color of sky is oversaturated, resulting in a poor visual experience. In Figure 7e,f, although the details of the red and blue rectangles can be clearly seen, the naturalness is inevitably lost. As such methods are based on detail and base layer decomposition, image information tends to be overemphasized during decomposition and merging procedures. In Figure 7g, the details of the shadow areas (red and blue rectangles) are clear. However, the global contrast is unnatural: the highlight sky region is darkened, whereas the shadow areas are brightened, thus degrading the overall visual quality. In Figure 7h, our method shows advantages in preserving the details of the highlight and shadow areas because the proposed virtual combined histogram increases the pixel weights of local features for the highlight and shadow areas. Figure 8 shows the reproduced results obtained using the Cadik_Desk02 test image. It is an indoor scene in which the lamp causes an extreme luminance difference in the captured image. In Figure 8a, the text on the book is barely perceptible because of the strong lighting. In Figure 8b,c, the detailed texture of the book is maintained; however, the global contrast in both figures is unbalanced, and the color tone is flat. In Figure 8d, the details are slightly preserved; however, some color shading occurs. In Figure 8e, the details are well retained; however, the overall appearance is over-sharpened (e.g., lampshade in red rectangle). This is because in the method of [14], the detail layer and base layer are processed separately, thereby overamplifying the detail information. In Figure 8f, the details are not evident (blue rectangle), and the global contrast is insufficient. In Figure 8g, although details are visible, the overall image appears unreal owing to the imbalance between the macro-and the micromodels. In Figure 8h, our method exhibits excellent naturalness of the image. Furthermore, because of the improved visibility of the highlight and shadow areas, more visual content is retained, and the overall color naturalness is satisfactory. Figure 9 shows the reproduced results obtained using the C33_Store test image. In Figure 9a, the detailed information of the red and blue rectangles is lost as a result of global-based processing. In Figure 9b, although the details on the right side are more visible than those in Figure 9a, the regional details of the red rectangle are lost as a result of insufficient brightness. In Figure 9c,e, detailed information is perceptible but the degree of naturalness is low and the visual effects are not rich enough. In Figure 9e, enhanced smoothing is performed without consideration of the spatial correlation of the detail layer, leading to sharper and less-natural images. In Figure 9d, the detailed information of the red rectangle is slightly visible; however, the color is not vivid enough and lacks global contrast. In Figure 9f, although the overall appearance is natural, the visibility and sharpness in the red and blue rectangles areas are insufficient. In Figure 9g, the global contrast is good and the details of the highlight (i.e., blue rectangle) and shadow (i.e., red rectangle) areas are visible; nevertheless, the image is somewhat unnatural due to the lack of global contrast. In Figure 9h, our method demonstrates favorable visual richness because both the global and local characteristics are simultaneously considered through the construction of a virtual combined histogram. Consequently, the details in the highlight and shadow areas are clearly presented and the contrast and color naturalness of the entire image are improved.

Objective Analysis
In addition to the described subjective analysis, several objective quality indices were also applied to evaluate whether our method outperforms the other algorithms. The first quality index is called the tone-mapped image quality index (TMQI) [26]. The TMQI evaluates the quality of the reproduced images in terms of the following three aspects: structural similarity (TMQI-S), naturalness (TMQI-N), and overall quality (TMQI-Q) as follows. The TMQI-S value can be expressed by the following: where σ x , σ y , and σ xy are the local standard deviations and the cross-correlation between the corresponding HDR and LDR patches; and C 1 and C 2 are the positive stabilizing constants. As suggested in [26], the local window size is set as 11 × 11. The TMQI-N value can be expressed by the following: where ρ is a normalization factor, and P m and P d are the Gaussian and the Beta probability density functions, respectively. The TMQI-Q value can be expressed by the following: where a is a weighting used to adjust the relative importance of the two terms (a is set as 0.8011, as suggested in [26]); S and N indicate TMQI-S and TMQI-N values, respectively; and α and β indicate their sensitivities (α = 0.3046 and β = 0.7088, as suggested in [26]). As shown in Equation (26), the TMQI-S is calculated using the standard deviations and cross-correlation between the HDR images and the reproduced results. As shown in Equation (27), the TMQI-N is calculated using Gaussian and Beta probability density functions that model the histograms of the means and standard deviations in the statistics conducted on massive natural images. As shown in Equation (28), the TMQI-Q is obtained from the weighted indices of structural similarity (S value) and naturalness (N value) by using a power function to adjust these two indicators. For the TMQI-S, the TMQI-N, and the TMQI-Q, a larger index value indicates a better quality of the reproduced result. Table 1 lists the results of comparisons using Figures 7-9; apparently, the proposed method not only generates more visually pleasing reproduced results (as shown in Figures 7-9), but also outperforms the other seven algorithms in terms of the average TMQI-S, TMQI-N, and TMQI-Q. To further evaluate whether the proposed method is more effective than the other methods, we selected twenty-two test images from the datasets online [27][28][29][30]. Figure 10 shows some thumbnails of the test images, and Table 2 lists their names with the corresponding dynamic ranges. Moreover, four more objective quality metrics were added for conducting a thorough discussion. The first one was the feature similarity index for tone-mapped images (FSITM-TMQI) [31], an improved version of the TMQI that is based on a comparison of the phase-derived feature maps of the original HDR and the reproduced images. As in the case of the TMQI, a larger FSITM-TMQI value indicates a higher image quality. The second one was the dubbed blind/referenceless image spatial quality evaluator (BRISQUE) [32]. Unlike the TMQI and FSITM-TMQI, the BRISQUE is a no-reference quality assessment that evaluates the possible loss of naturalness in the spatial domain through scene statistics. The third one is the Blind TMQI (BTMQI) [33], another type of no-reference quality assessment that evaluates image quality by introducing features of statistical naturalness, structural preservation, and information entropy. For both the BRISQUE and BTMQI, lower values indicate less loss of overall naturalness, that is, better quality. The fourth one is the Integrated Local Natural Image Quality Evaluator (IL-NIQE) [34], which is a non-reference quality evaluation based on integrating multiple image statistics such as texture, color, and contrast. The IL-NIQE value reflects the global naturalness of the output image. The lower the IL-NIQE value is, the more natural it is. To further evaluate whether the proposed method is more effective than the other methods, we selected twenty-two test images from the datasets online [27][28][29][30]. Figure 10 shows some thumbnails of the test images, and Table 2 lists their names with the corresponding dynamic ranges. Moreover, four more objective quality metrics were added for conducting a thorough discussion. The first one was the feature similarity index for tone-mapped images (FSITM-TMQI) [31], an improved version of the TMQI that is based on a comparison of the phase-derived feature maps of the original HDR and the reproduced images. As in the case of the TMQI, a larger FSITM-TMQI value indicates a higher image quality. The second one was the dubbed blind/referenceless image spatial quality evaluator (BRISQUE) [32]. Unlike the TMQI and FSITM-TMQI, the BRISQUE is a no-reference quality assessment that evaluates the possible loss of naturalness in the spatial domain through scene statistics. The third one is the Blind TMQI (BTMQI) [33], another type of no-reference quality assessment that evaluates image quality by introducing features of statistical naturalness, structural preservation, and information entropy. For both the BRISQUE and BTMQI, lower values indicate less loss of overall naturalness, that is, better quality. The fourth one is the Integrated Local Natural Image Quality Evaluator (IL-NIQE) [34], which is a non-reference quality evaluation based on integrating multiple image statistics such as texture, color, and contrast. The IL-NIQE value reflects the global naturalness of the output image. The lower the IL-NIQE value is, the more natural it is. Figure 10. Thumbnails of partial test images with corresponding information provided in Table 2   The scatter plot in Figure 11 shows the detailed information of all twenty-two test images with each of the seven objective quality indices-TMQI-S, TMQI-N, TMQI-Q, FSITM-TMQI, BRISQUE, BTMQI, and IL-NIQE. This figure shows that the performance of the proposed method was among the top three for most evaluation indicators. Table 3 lists the average scores of the twenty-two test images obtained using different methods. With regard to the full-reference quality assessments (TMQI-S, TMQI-N, TMQI-Q, and FSITM-TMQI), our method obtained the best scores for these four assessments. The proposed method achieved the highest scores for the average TMQI-S, TMQI-N, and TMQI-Q, indicating that it achieved a strong balance between image structure and naturalness. In addition, our method also obtained the highest score for the average FSITM-TMQI, indicating that it generated more visually pleasing images based on the evaluation using phase-derived feature maps. The scatter plot in Figure 11 shows the detailed information of all twenty-two test images with each of the seven objective quality indices-TMQI-S, TMQI-N, TMQI-Q, FSITM-TMQI, BRISQUE, BTMQI, and IL-NIQE. This figure shows that the performance of the proposed method was among the top three for most evaluation indicators. Table 3 lists the average scores of the twenty-two test images obtained using different methods. With regard to the full-reference quality assessments (TMQI-S, TMQI-N, TMQI-Q, and FSITM-TMQI), our method obtained the best scores for these four assessments. The proposed method achieved the highest scores for the average TMQI-S, TMQI-N, and TMQI-Q, indicating that it achieved a strong balance between image structure and naturalness. In addition, our method also obtained the highest score for the average FSITM-TMQI, indicating that it generated more visually pleasing images based on the evaluation using phase-derived feature maps. With regard to the no-reference quality assessments (BRISQUE, BTMQI, and IL-NIQE), our methods all obtained the best scores of the average BRISQUE, BTMQI, and IL-NIQE. By considering both global and local features to generate a virtual combined histogram, this method maintains the naturalness of an image and produces an output reproduced image with high image quality. Compared with global-and local-based reproduction methods that consider only global features (or only local features), our method can simultaneously take advantage of global and local features. Compared with the decomposition-based methods, our method does not need to process the base and detail layers separately, thus avoiding unnaturalness when blending different image layers. Overall, in Table 3, our method achieved the highest score in all seven assessments, indicating its excellent performance with natural-looking and rich information.
For the subjective analysis and evaluation, we invited 20 participants (10 males and 10 females) to take a subjective visual quality test. The participants were asked to rate the visual subjectiveness of all the images without knowing the applied method on the output images of twenty-two scenes using the eight comparative algorithms. The score ranges from 1 to 10 points, where 1 point means "unsatisfied" and 10 points means "excellent". The mean and standard deviation of the mean opinion scores (MOS) of the subjective users are shown in Figure 12, where the proposed method is significantly better than the other methods.
In addition, the abovementioned FSITM-TMQI is actually obtained by averaging the scores of RGB channels, i.e., the FSITM-R, FSITM-G, and FSITM-B, respectively. The FSITM quality evaluation index is based on using the local phase similarity to construct a noise-independent feature map in the R, G, and B planes. In view of this, we further With regard to the no-reference quality assessments (BRISQUE, BTMQI, and IL-NIQE), our methods all obtained the best scores of the average BRISQUE, BTMQI, and IL-NIQE. By considering both global and local features to generate a virtual combined histogram, this method maintains the naturalness of an image and produces an output reproduced image with high image quality. Compared with global-and local-based reproduction methods that consider only global features (or only local features), our method can simultaneously take advantage of global and local features. Compared with the decomposition-based methods, our method does not need to process the base and detail layers separately, thus avoiding unnaturalness when blending different image layers. Overall, in Table 3, our method achieved the highest score in all seven assessments, indicating its excellent performance with natural-looking and rich information.
For the subjective analysis and evaluation, we invited 20 participants (10 males and 10 females) to take a subjective visual quality test. The participants were asked to rate the visual subjectiveness of all the images without knowing the applied method on the output images of twenty-two scenes using the eight comparative algorithms. The score ranges from 1 to 10 points, where 1 point means "unsatisfied" and 10 points means "excellent". The mean and standard deviation of the mean opinion scores (MOS) of the subjective users are shown in Figure 12, where the proposed method is significantly better than the other methods. In addition, the abovementioned FSITM-TMQI is actually obtained by averaging the scores of RGB channels, i.e., the FSITM-R, FSITM-G, and FSITM-B, respectively. The FSITM quality evaluation index is based on using the local phase similarity to construct a noiseindependent feature map in the R, G, and B planes. In view of this, we further compare the average FSITM-R, FSITM-G, and FSITM-B using the twenty-two test images. As shown in Figure 13, our improved method performs better than the other seven reproduction methods in all the RGB channels of the FSITM, indicating that our method is not only really close to the real-world scene but also has an attractive visually pleasing character and natural color appearance. Table 3. Overall comparison of average TMQI, FSITM-TMQI, BRISQUE, BTMQI, and IL-NIQE using the twenty-two t images.

Conclusions
Although HDR cameras are popularized in the digital photography indus current price of an HDR display is unaffordable to common people. Th

Conclusions
Although HDR cameras are popularized in the digital photography industry, the current price of an HDR display is unaffordable to common people. Therefore, Figure 13. Comparison of the average FSITM-R, FSITM-G, and FSITM-B using the twenty-two test images.

Conclusions
Although HDR cameras are popularized in the digital photography industry, the current price of an HDR display is unaffordable to common people. Therefore, photographic reproduction techniques have great commercial potential due to the limited availability of HDR displays. This paper presented a new reproduction method, which considers global/local features simultaneously to achieve both global contrast-maintenance and local detail-preservation. Instead of performing the global-based and local-based processes separately, we combined two statistical approaches to extract the mutually compatible features to form a virtual combined histogram. In the feature fusion stage, a weight map is used to modify the importance between the global and local features. Moreover, with the integration of Gauss error function and global/local feature sets, the construction of an entire histogram is not actually needed in the luminance modification stage. From the experimental results, the proposed method outperforms other state-of-the-art methods in terms of various visual comparisons (Figures 7-9) and objective evaluations (Tables 1 and 3, Figures 11 and 13). In the future, we plan to conduct the Wilcoxon test and the Friedman test to check whether the experimental results are statistically significant.