2.1. Rough Extraction of Specular Information
Reflected light can be decomposed into diffuse and specular components [
3], where the diffuse component carries the texture information of the target and the specular component carries the specular information [
17,
18].
The obtained image 
 of size 
 can be reshaped into an array 
 (
) of length 
. The discrimination threshold 
 for the diffuse reflection component is then calculated from 
, as follows,
        
        where 
 represents the average value, 
 represents the standard deviation, and 
 represents the intensity threshold of the specular information, which can be adjusted according to the actual situation of the image to obtain the specular information of different areas. When 
 is less than 
, the corresponding pixels are treated as diffuse information. Taking the stripe-projection image as an example, the intensity threshold 
, which determines the area of extracted highlights, is set to 1.05, and the discrimination threshold, 
, is equal to 196.77. Thus, the geometric factor 
 of the specular information can be calculated as follows,
        
 is a one-dimensional array of length 
, and can be regarded as the proximity between each pixel and the specular information. A large proximity implies that the pixel value is close to that of the specular pixels; conversely, a low proximity suggests that the pixel is very unlikely to be specular. Then, the obtained 
 can be reorganized into 
 to represent the roughly extracted highlight pixels, which are normalized and then binarized to obtain the specular pixels. Then, according to the area of the specular pixels, the largest connected region is extracted as 
, which is a logical matrix. Extracting only the largest region improves robustness and accuracy: it avoids noise from small specular areas, reduces the computational load, keeps the specular features representative, prevents overprocessing so that the result looks natural, and gives a stable intensity recovery. Here, a dilation algorithm is applied to 
 to obtain the 
, which comprises the pixels in the largest specular area and its surrounding diffuse-reflection pixels. Then, a difference operation is performed and the absolute value is taken to obtain 
, which is the diffuse information near the largest specular region, as follows,
        
Then, the 
 is transformed into a logical matrix 
, where the non-zero elements in 
 are converted to a logical value of 1 (true) and zeros are converted to a logical value of 0 (false). Then, 
 and 
 are used as indexes to extract the corresponding pixels 
 and 
 from 
, respectively, which represent the intensities of the specular region and its nearby diffuse region. Similarly, 
 and 
 are used as indexes to extract 
 and 
 from 
, where the 
 is the specular information of the specular region, and 
 is the specular information of the nearby diffuse region. Variable 
 denotes the indices of elements in arrays 
, 
, 
, and 
. Thus, based on the principle of smooth transition, we have,
        
        where 
 is a scale factor used to adjust the intensity of the specular information so that the diffuse information appears smooth and natural in the image. The 
, 
, 
, and 
 are the mean intensities of 
, 
, 
, and 
, respectively. Then, the value of pixels 
 of the specular information can be calculated by 
, which is expressed as,
        
For the stripe-projection image, according to Equations (2)–(4), 
 is 1.44. After the separation, the specular information is reorganized into a matrix 
. 
Figure 1 shows a worked example for the stripe-projection image. Different values of 
 determine the extracted highlight areas and, in turn, the extent of the subsequent highlight reduction. It can be seen from 
Figure 1 that there is much concealed texture information in the rough-extraction specular information 
, so applying a difference operation directly to the image would eliminate both specular and texture information, which is unsuitable. Since the gradient variation of pixels in the specular region is much smaller than that of the texture information, the two differ significantly in the frequency domain [
19]. In this paper, a frequency-domain method is presented to improve image highlight suppression.
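The mask-construction steps of this subsection (thresholding, largest connected region, dilation, and the absolute difference that isolates the nearby diffuse pixels) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The threshold form `mean + eta * std` is an assumption, because the defining equation is not reproduced here. The proximity step is skipped in favor of a direct binarization. The function names, the 4-connectivity, and the square structuring element are all hypothetical choices.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Boolean mask of the largest 4-connected True region (breadth-first search)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for i in range(h):
        for j in range(w):
            if not mask[i, j] or seen[i, j]:
                continue
            seen[i, j] = True
            queue, region = deque([(i, j)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = np.zeros((h, w), dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out


def dilate(mask, radius=1):
    """Binary dilation with a (2 * radius + 1)-sized square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def rough_specular_masks(img, eta=1.05, radius=1):
    """Return (threshold, largest specular region, surrounding diffuse ring)."""
    v = img.astype(float).ravel()            # image reshaped into a 1-D array
    threshold = v.mean() + eta * v.std()     # ASSUMED form of the threshold
    binary = img >= threshold                # direct binarization (proximity step skipped)
    largest = largest_component(binary)      # logical matrix of the largest region
    dilated = dilate(largest, radius)        # region plus its surrounding pixels
    ring = np.abs(dilated.astype(int) - largest.astype(int)).astype(bool)
    return threshold, largest, ring


# Synthetic test image: one 3x3 highlight and one isolated bright pixel.
img = np.full((8, 8), 50.0)
img[2:5, 2:5] = 250.0
img[7, 7] = 250.0

threshold, largest, ring = rough_specular_masks(img)
spec_mean = img[largest].mean()   # mean intensity inside the specular region
ring_mean = img[ring].mean()      # mean intensity of the nearby diffuse pixels
```

The two logical matrices are used directly as boolean indexes, which mirrors the index-based extraction of pixel intensities described above. The isolated bright pixel is discarded because it does not belong to the largest connected region.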
2.2. First Fusion Strategy
Wavelet transform [
20,
21] is more suitable than the Fourier transform for target images with complex features, because orthogonal wavelet reconstruction has better stability and symmetric wavelet bases can eliminate phase distortion, so the decomposed images can be reconstructed more faithfully. Furthermore, wavelet bases with strong compact support attenuate faster and can therefore better detect fine features in images, while good smoothness helps improve the frequency resolution during image decomposition and reduce distortion during image reconstruction.
In this paper, the 2-D discrete wavelet transform (2-D DWT) is used to perform a multi-level decomposition of the images along the row, column, and diagonal directions. A 2-D DWT divides an image into four sub-images because wavelet filters are applied in both the horizontal and vertical directions. First, in horizontal filtering, the image is filtered along the rows using low-pass and high-pass wavelet filters and then downsampled to create two sets of coefficients: low-frequency and high-frequency. Second, in vertical filtering, these coefficients are filtered and downsampled along the columns using the low-pass and high-pass wavelet filters. The composite filtering produces four sub-bands: (1) Low-Low (LL): low-pass filtering in both directions, giving the approximation of the image (the low-pass filtered image). (2) Low-High (LH): low-pass filtering horizontally and high-pass filtering vertically, giving horizontal edge details. (3) High-Low (HL): high-pass filtering horizontally and low-pass filtering vertically, giving vertical edge details. (4) High-High (HH): high-pass filtering in both directions, giving diagonal edge details.
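The row-then-column filtering described above can be illustrated with a single decomposition level. The sketch below uses the Haar wavelet rather than the Symlet wavelet employed in this paper, purely because its two-tap filters keep the example short and dependency-light. The function name `haar_dwt2` is hypothetical, and even image dimensions are assumed.

```python
import numpy as np


def haar_dwt2(img):
    """One level of a 2-D Haar DWT (Haar is a stand-in for the paper's Symlet)."""
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    # Horizontal filtering: combine neighbouring columns, which also downsamples by 2.
    lo = (x[:, 0::2] + x[:, 1::2]) / s   # low-pass along rows
    hi = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass along rows
    # Vertical filtering: combine neighbouring rows of each intermediate result.
    ll = (lo[0::2, :] + lo[1::2, :]) / s   # approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / s   # horizontal edge details
    hl = (hi[0::2, :] + hi[1::2, :]) / s   # vertical edge details
    hh = (hi[0::2, :] - hi[1::2, :]) / s   # diagonal edge details
    return ll, lh, hl, hh


# Horizontal stripes: rows alternate between 0 and 1, so intensity changes vertically.
stripes = np.tile(np.array([[0.0], [1.0]]), (4, 8))   # shape (8, 8)
ll, lh, hl, hh = haar_dwt2(stripes)
```

For this stripe pattern, all detail energy falls in the LH sub-band and the HL and HH sub-bands are zero, which matches the association of LH with horizontal edges.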
For a multi-level 2-D DWT, each level decomposes the low-frequency information produced by the previous-level 2-D DWT, as shown in 
Figure 2.
Thus, the combination of these horizontal and vertical filtering steps generates four sub-images, each highlighting different frequency components and spatial characteristics of the original image. In this paper, a Symlet wavelet is employed for single-image highlight suppression, and two fusion strategies are used to restore the texture in highlight areas and to suppress the specular information, respectively. The first fusion combines high-frequency information from imgs with imgt to enhance texture details, which is crucial for improving the visibility of fine details obscured by highlights. The second fusion integrates the texture restoration image from the first fusion with the low-frequency information of , which ensures that the specular information is removed while the texture of the image is preserved. The two fusions are described separately because they serve different purposes within the highlight suppression framework and contribute differently to the final outcome.
Firstly, in order to extract the concealed texture information in 
 to enhance the texture feature of original image 
, a multi-level DWT is employed to decompose the low-frequency and high-frequency information of 
 and 
, respectively; at each level, the object of decomposition is the low-frequency information of the previous level. The technical route is shown in 
Figure 3.
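The recursion on the low-frequency information can be sketched as a short loop that keeps the detail sub-bands of every level and passes only the approximation to the next level. As an assumption made for brevity, the Haar wavelet stands in for the Symlet wavelet used in this paper, and the names `haar_dwt2` and `wavedec2_haar` are hypothetical. Image dimensions are assumed to be divisible by 2 at every level.

```python
import numpy as np


def haar_dwt2(img):
    """One level of a 2-D Haar DWT (stand-in for the Symlet wavelet)."""
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh


def wavedec2_haar(img, levels):
    """Multi-level decomposition: only the approximation is decomposed again."""
    approx = np.asarray(img, dtype=float)
    details = []                      # details[k] holds level k + 1 (H, V, D) sub-bands
    for _ in range(levels):
        approx, lh, hl, hh = haar_dwt2(approx)
        details.append((lh, hl, hh))
    return approx, details


rng = np.random.default_rng(0)
image = rng.random((16, 16))          # synthetic stand-in for an input image
approx, details = wavedec2_haar(image, levels=3)
detail_shapes = [d[0].shape for d in details]
```

Each level halves the size of the sub-bands, so three levels of a 16 x 16 image leave a 2 x 2 approximation together with detail sub-bands of size 8 x 8, 4 x 4, and 2 x 2.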
In 
Figure 3, 
 denotes the low-frequency information of 
 at different levels, while 
, 
, and 
 denote the horizontal, vertical, and diagonal components of high-frequency information at different levels. Similarly, 
 denotes the low-frequency information of 
 at different levels, where “A” denotes “Approximation”, i.e., the low-frequency information; 
, 
, 
 denote the horizontal, vertical, and diagonal components of high-frequency information at different levels, where the “
H” denotes the “Horizontal direction”, “
V” denotes the “Vertical direction”, “
D” denotes the “Diagonal direction”, and the subscript “1” denotes the first 2-D DWT and fusion strategy.
After the multi-level DWT is applied to 
 and 
 respectively, the low-frequency information (
, 
) and the high-frequency information at different levels (
, 
, 
, 
, 
, 
) are obtained. The decomposition results and reconstruction images are shown in 
Figure 4.
Figure 4 illustrates that a single wavelet decomposition of 
ImgS does not completely separate low- and high-frequency information. The high-frequency information 
 is not visually obvious due to weak texture features in 
. However, after deeper wavelet decomposition, the high-frequency information becomes clearer, with stripes gradually becoming apparent, while residual texture features in low-frequency information 
 are gradually eliminated. The texture features are fully eliminated in 
, accurately reflecting the specular information. For 
, the low- and high-frequency information is effectively separated by the DWT: the low-frequency information 
 still contains the texture information but, interestingly, from the second decomposition level onward the high-frequency features in 
 are effectively removed, and the subsequent decompositions further separate the fine features, which demonstrates the difference between highlight information and texture information in the frequency domain. After extracting the concealed texture information from 
 by multi-level DWT, the original images 
 are first processed to obtain texture restoration images 
.
 The high-frequency information 
 in 
 is extracted to enhance the high-frequency information 
 in 
. Then, the information sets 
 and 
 are subjected to hierarchical gain and fusion. The fusion strategies are expressed as follows,
        
        where 
, 
, 
, and 
, respectively, represent the low-frequency and high-frequency information at different levels of fusion image 
. The 
, 
, 
, 
, 
, 
, 
, and 
, respectively, represent the gain coefficients at different levels of the first wavelet decomposition information. Then, the low-frequency information of fusion image can be obtained by inverse discrete wavelet transform, as follows,
        
        where 
 represents the inverse discrete wavelet transform (IDWT) operation; through the IDWT, the information set 
 is reconstructed into the low-frequency information 
 up one level. 
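The step of reconstructing an information set into the low-frequency information one level up can be illustrated with a one-level forward and inverse transform pair. The sketch again substitutes the Haar wavelet for the Symlet wavelet used in this paper, and the names `haar_dwt2` and `haar_idwt2` are hypothetical. It demonstrates only the reconstruction step itself; the gain-weighted fusion of the sub-bands is not modeled.

```python
import numpy as np


def haar_dwt2(img):
    """One level of a 2-D Haar DWT (stand-in for the Symlet wavelet)."""
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the array one level up from four sub-bands."""
    s = np.sqrt(2.0)
    # Undo the vertical filtering: interleave the rows of each intermediate result.
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = (ll + lh) / s, (ll - lh) / s
    hi[0::2, :], hi[1::2, :] = (hl + hh) / s, (hl - hh) / s
    # Undo the horizontal filtering: interleave the columns.
    x = np.empty((lo.shape[0], 2 * lo.shape[1]))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return x


rng = np.random.default_rng(1)
image = rng.random((8, 8))                 # synthetic stand-in for a sub-band
ll, lh, hl, hh = haar_dwt2(image)
restored = haar_idwt2(ll, lh, hl, hh)      # information set -> one level up
```

With unmodified sub-bands the reconstruction is exact up to floating-point error, so any change in the reconstructed image comes only from how the sub-bands are weighted or replaced before the inverse transform.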
Since the high-frequency information 
 reflects the residual texture information in 
, it needs to be retained. For 
, the high-frequency information 
 reflects the real texture information. Here, an autocorrelation-function-based algorithm is used to calculate the gain coefficients of the high-frequency information, as follows,
        
        where 
 and 
 are the variables, 
 are the size of each high-frequency information, 
 is the reward factor, 
 is an indicator of whether the calculation condition is met, 
 and 
 are the displacements in the 
 and 
 direction, respectively, which determine which two pixels in the image are used for comparison, and 
 is the variable of offset, where 
 is the size of a rectangular window. That is, the computation is carried out between each pixel within a 
 window and the pixel that is displaced by 
 and 
 in the 
 and 
 direction, respectively. 
 represents the high-frequency information of 
 and 
. Taking the stripe-projection image as an example, in the fusion for constructing 
, the gain coefficient 
 is 
, 
 is 
, 
 is 
, 
 is 
, and the 
 , 
, and 
 are set to 1 in accordance with the texture-preservation principle. After multi-level reconstruction by the IDWT, the first-level information set 
 is obtained. Then, a high-pass filter with a 
 convolution kernel is employed to extract the edge information of the residual texture in 
, as follows,
        
        where the 
 is a negative number that, together with the positive number 
 forms the convolution kernel. To ensure that the extracted edge information does not destroy the overall texture of the original image through excessive enhancement, reference values are provided here: 
 is −1, and 
 is 0.7. 
, 
, and 
 denote the horizontal, vertical, and diagonal high-frequency components at the first level after edge enhancement, and the operator “
” denotes the convolution operation. The filter results are added to the first-level high-frequency information set for enhancement. Then, the obtained 
, 
, and 
 are used to replace the corresponding high-frequency information 
, 
, and 
, and the IDWT is used for reconstruction, as follows,
        
From 
Figure 4, the texture information at all levels is enhanced compared with the original image 
 after wavelet decomposition. 
 a is a texture restoration image, and on this basis, the highlight suppression images 
 are obtained by removing the low-frequency information of 
.