2.1. Rough Extraction of Specular Information
The reflected light can be decomposed into diffuse and specular components [3], with the diffuse component reflecting the texture information of the target and the specular component reflecting the specular information [17,18].
The obtained image $Img_t$ of size $M \times N$ can be reshaped into an array $V$ ($V = \{V_1, V_2, \ldots, V_{M \times N}\}$) of length $M \times N$. The discrimination threshold $T_d$ for the diffuse reflection component is then calculated from $V$, as follows,

$$T_d = \eta\,(\mu + \sigma),$$

where $\mu$ represents the average value, $\sigma$ represents the standard deviation, and $\eta$ represents the intensity threshold of the specular information, which can be adjusted according to the actual situation of the image to obtain the specular information of different areas. When $V_i$ is less than $T_d$, the corresponding pixel can be regarded as diffuse information. Taking the stripe-projection image as an example, the intensity threshold $\eta$, which determines the area of the extracted highlights, is set to 1.05, and the discrimination threshold $T_d$ is equal to 196.77. Thus, the geometric factor $\beta$ of the specular information can be calculated as follows,

$$\beta_i = \max\left(V_i - T_d,\; 0\right).$$
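As an illustration, this rough extraction step can be prototyped in a few lines of NumPy. This is a minimal sketch rather than the paper's implementation: the threshold form $T_d = \eta(\mu + \sigma)$ and the proximity measure $\beta_i = \max(V_i - T_d, 0)$ are assumptions consistent with the definitions above, and the function name is hypothetical.

```python
import numpy as np

def rough_specular_factor(img, eta=1.05):
    """Sketch of the rough extraction step.

    Assumes T_d = eta * (mu + sigma) and beta_i = max(V_i - T_d, 0);
    the paper's exact formulas may differ.
    """
    V = img.astype(np.float64).ravel()   # reshape M x N image into length-MN array
    T_d = eta * (V.mean() + V.std())     # discrimination threshold for diffuse pixels
    beta = np.maximum(V - T_d, 0.0)      # geometric factor: proximity to specular info
    return T_d, beta.reshape(img.shape)  # beta reorganized into the matrix B
```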
$\beta$ is a one-dimensional array of length $M \times N$ and can be seen as the proximity between pixels and specular information. A large proximity value implies that a pixel's value is close to that of the specular pixels; conversely, a low proximity value suggests that the pixel is highly unlikely to be specular information. The obtained $\beta$ can then be reorganized into a matrix $B$ representing the roughly extracted highlight pixels, which is normalized and then binarized to obtain the specular pixels; according to the area of the specular pixels, the largest connected area $C_{max}$, a logical matrix, is extracted. Extracting the largest area enhances robustness and accuracy by avoiding noise in small specular areas, reducing the computational load, ensuring representative specular features, preventing overprocessing for natural visuals, and providing a stable intensity recovery. Here, the dilation algorithm is employed to process $C_{max}$ to obtain $C_{dil}$, which comprises the pixels in the largest specular area and its surrounding diffuse reflection pixels. Then, the difference operation is performed and the absolute value is taken to obtain $C_{diff}$, which is the diffuse information near the maximum specular region, as follows,

$$C_{diff} = \left| C_{dil} - C_{max} \right|.$$
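The connected-component and dilation steps map directly onto scipy.ndimage. The sketch below assumes binarization at zero after normalization and uses a hypothetical dilate_iters parameter for the width of the dilated ring; neither choice is specified above.

```python
import numpy as np
from scipy import ndimage

def specular_neighborhood(B, dilate_iters=3):
    """Extract C_max (largest specular area) and C_diff (nearby diffuse ring)."""
    Bn = B / (B.max() + 1e-12)               # normalize
    mask = Bn > 0                            # binarize (threshold assumed)
    labels, num = ndimage.label(mask)        # connected specular components
    if num == 0:
        raise ValueError("no specular pixels found")
    sizes = np.bincount(labels.ravel())[1:]  # area of each component
    C_max = labels == (1 + sizes.argmax())   # largest connected area (logical)
    C_dil = ndimage.binary_dilation(C_max, iterations=dilate_iters)
    C_diff = C_dil ^ C_max                   # |C_dil - C_max|: nearby diffuse pixels
    return C_max, C_diff
```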
Then, $C_{diff}$ is transformed into a logical matrix $C_{near}$, where the non-zero elements in $C_{diff}$ are converted to a logical value of 1 (true) and the zeros are converted to a logical value of 0 (false). Next, $C_{max}$ and $C_{near}$ are used as indexes to extract the corresponding pixels $P_s$ and $P_d$ from $Img_t$, respectively, which represent the intensity of the specular region and its nearby diffuse region. Likewise, $C_{max}$ and $C_{near}$ are used as indexes to extract $S_s$ and $S_d$ from $B$, where $S_s$ is the specular information of the specular region and $S_d$ is the specular information of the nearby diffuse region. The variable $i$ denotes the indices of the elements in the arrays $P_s$, $P_d$, $S_s$, and $S_d$. Thus, based on the principle of smooth transition, we have,

$$\bar{P}_s - k\,\bar{S}_s = \bar{P}_d - k\,\bar{S}_d,$$
where $k$ is a scale factor used to adjust the intensity of the specular information, which makes the diffuse information smooth and natural in the image. $\bar{P}_s$, $\bar{P}_d$, $\bar{S}_s$, and $\bar{S}_d$ are the mean intensities of $P_s$, $P_d$, $S_s$, and $S_d$, respectively. Then, the values of the pixels $s_i$ of the specular information can be calculated from $k$ and $\beta$, which is expressed as,

$$s_i = k\,\beta_i.$$

For the stripe-projection image, according to Equations (2)–(4), $k$ is 1.44. After the separation, the specular information is reorganized into a matrix $Img_s$.
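Under the smooth-transition reading above (equating the specular-corrected mean intensities of the two regions), the scale factor and the separated specular matrix could be computed as follows. The variable names mirror the text; everything else is a hypothetical sketch.

```python
import numpy as np

def separate_specular(img_t, B, C_max, C_near):
    """Scale factor k and specular matrix Img_s (sketch)."""
    P_s = img_t[C_max].astype(float)   # intensities of the specular region
    P_d = img_t[C_near].astype(float)  # intensities of the nearby diffuse region
    S_s = B[C_max]                     # specular information, specular region
    S_d = B[C_near]                    # specular information, nearby diffuse region
    # smooth transition: mean(P_s) - k*mean(S_s) = mean(P_d) - k*mean(S_d)
    k = (P_s.mean() - P_d.mean()) / (S_s.mean() - S_d.mean())
    return k, k * B                    # s_i = k * beta_i, reorganized into Img_s
```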
Figure 1 shows a solved example of the stripe-projection image; different values of $\eta$ determine the extracted highlight areas and, further, the extent of the subsequent highlight reduction. It can be seen from Figure 1 that there is much concealed texture information in the roughly extracted specular information $Img_s$, and direct difference operations on the image would eliminate both the specular and the texture information, which is unsuitable. Since the gradient variation of the pixels in the specular region is much smaller than that of the texture information, there is a significant difference in the frequency domain [19]. In this paper, a frequency-domain method is presented to improve image highlight suppression.
2.2. First Fusion Strategy
Wavelet transform [20,21] is more suitable than Fourier transform for target images with complex features: orthogonal wavelet reconstruction has better stability, and symmetric wavelet bases eliminate phase distortion, so the decomposed images can be reconstructed more faithfully. Furthermore, wavelet bases with strong compact support have a higher attenuation speed and can therefore better detect fine features in images, while good smoothness improves the frequency resolution during image decomposition and reduces distortion during image reconstruction.
In this paper, the 2-D discrete wavelet transform (2-D DWT) is used to perform multi-level decomposition of the images along the row, column, and diagonal directions. In a 2-D DWT, an image is divided into four sub-images because the process applies wavelet filters in both the horizontal and vertical directions, as sketched below. (1) Horizontal filtering: the image is first filtered along the rows (horizontal direction) using low-pass and high-pass wavelet filters and then downsampled, creating two sets of coefficients: low-frequency and high-frequency. (2) Vertical filtering: these coefficients are then filtered and downsampled along the columns (vertical direction) using the low-pass and high-pass wavelet filters. The composite filtering produces four sub-bands: (1) low-low filtering, resulting from low-pass filtering in both directions and giving the approximation of the image (the low-pass-filtered image); (2) low-high filtering, resulting from low-pass filtering horizontally and high-pass filtering vertically and giving the horizontal edge details; (3) high-low filtering, resulting from high-pass filtering horizontally and low-pass filtering vertically and giving the vertical edge details; and (4) high-high filtering, resulting from high-pass filtering in both directions and giving the diagonal edge details.
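A single decomposition level, and the four sub-bands it produces, can be reproduced with PyWavelets. The Symlet order ('sym4') is an assumption; the text only names the Symlet family.

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)             # stand-in for a grayscale image
cA, (cH, cV, cD) = pywt.dwt2(img, 'sym4')  # one level of 2-D DWT
# cA: low-low approximation; cH/cV/cD: horizontal, vertical, and
# diagonal detail sub-bands, each roughly half the input size.
```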
For a multi-level 2-D DWT, the object of each decomposition is the low-frequency information from the previous level of the 2-D DWT, as Figure 2 shows. Thus, the combination of these horizontal and vertical filtering steps generates four sub-images, each highlighting different frequency components and spatial characteristics of the original image. In this paper, a Symlet wavelet is employed to suppress the highlights based on a single image, and two fusion strategies are employed to restore the texture in the highlight areas and to suppress the specular information, respectively. The first fusion combines the high-frequency information of $Img_s$ with that of $Img_t$ to enhance the texture details, which is crucial for improving the visibility of fine details obscured by highlights. The second fusion integrates the texture restoration image from the first fusion with the low-frequency information of $Img_s$, which ensures that the specular information is removed while the texture of the image is preserved. The two fusions are described separately because they serve different purposes within the highlight suppression framework and contribute uniquely to the final outcome.
Firstly, in order to extract the concealed texture information in $Img_s$ to enhance the texture features of the original image $Img_t$, a multi-level DWT is employed to decompose the low-frequency and high-frequency information of $Img_s$ and $Img_t$, respectively, where the object of each next-level decomposition is the low-frequency information of the previous level. The technical route is shown in Figure 3.
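The multi-level route of Figure 3 corresponds to pywt.wavedec2, which repeatedly re-decomposes the approximation band. The helper below is a sketch with an assumed level count and Symlet order.

```python
import pywt

def multilevel_dwt(img, levels=3, wavelet='sym4'):
    """Multi-level 2-D DWT: each level decomposes the previous
    approximation. Returns [A_n, (H_n, V_n, D_n), ..., (H_1, V_1, D_1)]."""
    return pywt.wavedec2(img, wavelet, level=levels)

# coeffs_s = multilevel_dwt(img_s)  # decomposition of the specular image Img_s
# coeffs_t = multilevel_dwt(img_t)  # decomposition of the original image Img_t
```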
In Figure 3, $A^{s}_{1,i}$ denotes the low-frequency information of $Img_s$ at different levels, while $H^{s}_{1,i}$, $V^{s}_{1,i}$, and $D^{s}_{1,i}$ denote the horizontal, vertical, and diagonal components of its high-frequency information at different levels. Similarly, $A^{t}_{1,i}$ denotes the low-frequency information of $Img_t$ at different levels, where "A" denotes "Approximation", i.e., the low-frequency information, and $H^{t}_{1,i}$, $V^{t}_{1,i}$, and $D^{t}_{1,i}$ denote the horizontal, vertical, and diagonal components of the high-frequency information at different levels, where "H" denotes the "Horizontal direction", "V" denotes the "Vertical direction", "D" denotes the "Diagonal direction", the subscript "1" denotes the first 2-D DWT and fusion strategy, and $i$ denotes the decomposition level.
After the multi-level DWT operations on $Img_s$ and $Img_t$, the low-frequency information ($A^{s}_{1,i}$, $A^{t}_{1,i}$) and the high-frequency information at different levels ($H^{s}_{1,i}$, $V^{s}_{1,i}$, $D^{s}_{1,i}$, $H^{t}_{1,i}$, $V^{t}_{1,i}$, $D^{t}_{1,i}$) are obtained. The decomposition results and reconstruction images are shown in Figure 4.
Figure 4 illustrates that a single wavelet decomposition of $Img_s$ does not completely separate the low- and high-frequency information. The high-frequency information $H^{s}_{1,1}$, $V^{s}_{1,1}$, $D^{s}_{1,1}$ is not visually obvious due to the weak texture features in $Img_s$. However, after deeper wavelet decomposition, the high-frequency information becomes clearer, with the stripes gradually becoming apparent, while the residual texture features in the low-frequency information $A^{s}_{1,i}$ are gradually eliminated. The texture features are fully eliminated at the deepest decomposition level, which accurately reflects the specular information. For $Img_t$, the low- and high-frequency information are effectively separated by the DWT, where the low-frequency information $A^{t}_{1,i}$ still contains the texture information; interestingly, from the 2-level decomposition onward, the high-frequency features in $A^{t}_{1,i}$ are removed effectively, and the subsequent decompositions further separate the fine features, which proves the difference between highlight information and texture information in the frequency domain. After extracting the concealed texture information from $Img_s$ by the multi-level DWT, the original images $Img_t$ are first processed to obtain the texture restoration images $Img_{f1}$.
The high-frequency information $H^{s}_{1,i}$, $V^{s}_{1,i}$, $D^{s}_{1,i}$ in $Img_s$ is extracted to enhance the high-frequency information $H^{t}_{1,i}$, $V^{t}_{1,i}$, $D^{t}_{1,i}$ in $Img_t$. Then, the information sets $\{A^{s}_{1,i}, H^{s}_{1,i}, V^{s}_{1,i}, D^{s}_{1,i}\}$ and $\{A^{t}_{1,i}, H^{t}_{1,i}, V^{t}_{1,i}, D^{t}_{1,i}\}$ are subjected to hierarchical gain and fusion. The fusion strategies are expressed as follows,

$$A^{f}_{1,i} = g^{t}_{A}\,A^{t}_{1,i} + g^{s}_{A}\,A^{s}_{1,i},\qquad H^{f}_{1,i} = g^{t}_{H}\,H^{t}_{1,i} + g^{s}_{H}\,H^{s}_{1,i},$$
$$V^{f}_{1,i} = g^{t}_{V}\,V^{t}_{1,i} + g^{s}_{V}\,V^{s}_{1,i},\qquad D^{f}_{1,i} = g^{t}_{D}\,D^{t}_{1,i} + g^{s}_{D}\,D^{s}_{1,i},$$

where $A^{f}_{1,i}$, $H^{f}_{1,i}$, $V^{f}_{1,i}$, and $D^{f}_{1,i}$, respectively, represent the low-frequency and high-frequency information at different levels of the fusion image $Img_{f1}$, and $g^{t}_{A}$, $g^{s}_{A}$, $g^{t}_{H}$, $g^{s}_{H}$, $g^{t}_{V}$, $g^{s}_{V}$, $g^{t}_{D}$, and $g^{s}_{D}$, respectively, represent the gain coefficients at different levels of the first wavelet decomposition information. Then, the low-frequency information of the fusion image can be obtained by the inverse discrete wavelet transform, as follows,
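If the fusion rule is the weighted sum assumed in the equations above, one level of the hierarchical gain-and-fusion step reduces to a per-sub-band linear combination, e.g.:

```python
def fuse_subbands(subs_t, subs_s, gains_t, gains_s):
    """Fuse corresponding sub-bands of Img_t and Img_s at one level (sketch).

    subs_t, subs_s: sequences of arrays (e.g. A, H, V, D at level i);
    gains_t, gains_s: the matching gain coefficients.
    Assumes X_f = g_t * X_t + g_s * X_s for each sub-band.
    """
    return [gt * xt + gs * xs
            for xt, xs, gt, gs in zip(subs_t, subs_s, gains_t, gains_s)]
```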
$$A^{f}_{1,i-1} = \mathrm{IDWT}\left(A^{f}_{1,i},\, H^{f}_{1,i},\, V^{f}_{1,i},\, D^{f}_{1,i}\right),$$

where $\mathrm{IDWT}(\cdot)$ represents the inverse discrete wavelet transform (IDWT) operation; through the IDWT, the information set $\{A^{f}_{1,i}, H^{f}_{1,i}, V^{f}_{1,i}, D^{f}_{1,i}\}$ is reconstructed into the low-frequency information $A^{f}_{1,i-1}$ one level up.
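Reconstruction one level up is a single pywt.idwt2 call over the fused information set (Symlet order again assumed):

```python
import pywt

def reconstruct_up(A_f, H_f, V_f, D_f, wavelet='sym4'):
    """IDWT of the fused level-i set, yielding the level-(i-1)
    low-frequency information."""
    return pywt.idwt2((A_f, (H_f, V_f, D_f)), wavelet)
```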
Since the high-frequency information $H^{s}_{1,i}$, $V^{s}_{1,i}$, $D^{s}_{1,i}$ reflects the residual texture information in $Img_s$, it needs to be retained; for $Img_t$, the high-frequency information $H^{t}_{1,i}$, $V^{t}_{1,i}$, $D^{t}_{1,i}$ reflects the real texture information. Here, an autocorrelation-function-based algorithm is used to calculate the gain coefficients of the high-frequency information, in which $m$ and $n$ are the pixel variables, $M_h \times N_h$ is the size of each high-frequency sub-band, $r$ is the reward factor, $\delta$ is an indicator of whether the calculation condition is met, $p$ and $q$ are the displacements in the $x$ and $y$ directions, respectively, which determine which two pixels in the image are compared, and $w$ is the offset variable, where $(2w+1)$ is the size of a rectangular window. That is, the computation is performed between each pixel within a $(2w+1) \times (2w+1)$ window and the pixel displaced by $p$ and $q$ in the $x$ and $y$ directions, respectively. $F$ represents the high-frequency information of $Img_t$ and $Img_s$.
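The exact gain formula is not reproduced here; the sketch below is one hypothetical reading of the description, in which a normalized window correlation at displacement $(p, q)$ is thresholded by an indicator and a reward factor $r$ converts the hit rate into a gain. Every parameter value is an assumption.

```python
import numpy as np

def autocorr_gain(F, p=1, q=1, w=1, r=0.5, thresh=0.9):
    """Autocorrelation-based gain for a high-frequency sub-band (sketch)."""
    Mh, Nh = F.shape
    hits = 0
    for m in range(w, Mh - w - p):
        for n in range(w, Nh - w - q):
            a = F[m - w:m + w + 1, n - w:n + w + 1].ravel()  # local window
            b = F[m - w + p:m + w + p + 1,
                  n - w + q:n + w + q + 1].ravel()           # displaced window
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0 and a @ b / denom > thresh:         # indicator delta
                hits += 1
    return 1.0 + r * hits / (Mh * Nh)                        # gain coefficient
```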
Taking the stripe-projection image as an example, in the fusion for constructing $Img_{f1}$, the gain coefficients of the high-frequency information extracted from $Img_s$ were calculated by the autocorrelation-based algorithm above, while the gains $g^{t}_{H}$, $g^{t}_{V}$, and $g^{t}_{D}$ applied to $Img_t$ were set equal to 1 according to the texture-preservation principle. After multi-level reconstruction by the IDWT, the first-level information set $\{A^{f}_{1,1}, H^{f}_{1,1}, V^{f}_{1,1}, D^{f}_{1,1}\}$ is obtained. Then, a high-pass filter with a $3 \times 3$ convolution kernel $\kappa$ is employed to extract the edge information of the residual texture in the first-level high-frequency information, as follows,
$$H^{e}_{1,1} = H^{f}_{1,1} + \kappa * H^{f}_{1,1},\qquad V^{e}_{1,1} = V^{f}_{1,1} + \kappa * V^{f}_{1,1},\qquad D^{e}_{1,1} = D^{f}_{1,1} + \kappa * D^{f}_{1,1},$$

where the kernel $\kappa$ is composed of a negative number $a$ and a positive number $b$; to ensure that the extracted edge information does not destroy the overall texture of the original image through excessive enhancement, a reference value is provided here, where $a$ is −1 and $b$ is 0.7. $H^{e}_{1,1}$, $V^{e}_{1,1}$, and $D^{e}_{1,1}$ denote the horizontal, vertical, and diagonal components of the first-level high-frequency information after edge enhancement, and the operator "$*$" denotes the convolution operation; that is, the filter results are added to the first-level high-frequency information set for enhancement. Then, the obtained $H^{e}_{1,1}$, $V^{e}_{1,1}$, and $D^{e}_{1,1}$ are used to replace the corresponding high-frequency information $H^{f}_{1,1}$, $V^{f}_{1,1}$, and $D^{f}_{1,1}$, and the IDWT is used to reconstruct them, as follows,

$$Img_{f1} = \mathrm{IDWT}\left(A^{f}_{1,1},\, H^{e}_{1,1},\, V^{e}_{1,1},\, D^{e}_{1,1}\right).$$
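A sketch of the edge-enhancement and final reconstruction step follows. The $3 \times 3$ kernel layout (negative weight $a$ off-center, a positive center scaled by $b$) is an assumption built from the reference values $a = -1$ and $b = 0.7$; the paper's exact kernel is not reproduced here.

```python
import numpy as np
import pywt
from scipy.signal import convolve2d

def enhance_and_reconstruct(A1, H1, V1, D1, a=-1.0, b=0.7, wavelet='sym4'):
    """Edge-enhance the first-level high-frequency information and
    reconstruct the texture restoration image Img_f1 (sketch)."""
    k = np.full((3, 3), a)          # negative off-center weights
    k[1, 1] = -8.0 * a * b          # positive center (layout assumed)

    def enhance(X):                 # filter result added back to the sub-band
        return X + convolve2d(X, k, mode='same', boundary='symm')

    H_e, V_e, D_e = enhance(H1), enhance(V1), enhance(D1)
    return pywt.idwt2((A1, (H_e, V_e, D_e)), wavelet)
```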
As Figure 4 shows, the texture information at all levels is enhanced by the wavelet decomposition compared with the original image $Img_t$. $Img_{f1}$ is a texture restoration image, and on this basis, the highlight suppression images are obtained by removing the low-frequency information of $Img_s$.