Specular Reflection Detection and Inpainting in Transparent Object through MSPLFI

Abstract: Multispectral polarimetric light field imagery (MSPLFI) contains significant information about a [...] and 0.226, respectively) for all the sub-apertures of the 18 transparent objects in the MSPLFI dataset as compared with those obtained from the methods in the literature considered in this paper. Future work will exploit the integration of machine learning for better SRD accuracy and SRI quality.


Introduction
Specular reflection detection and inpainting (SRDI) has been actively pursued in the computer vision community over the last few decades. The presence of specular reflection creates potential difficulties for tasks such as detection, segmentation, and matching: although it captures significant information about an object's distribution, shape, texture, and roughness, it causes discontinuity in the omnipresent, object-determined diffuse part [1]. Once specular reflection is detected, it may be used to synthesize a scene [2] or to estimate lighting direction and surface roughness [3,4]. When incoming light strikes the surface of a transparent object, some of it is immediately reflected back into space (surface or specular reflection), while the rest penetrates the surface and is then reflected back into the air by the body (diffuse reflection) [5]. As a transparent object lacks its own texture, detecting and inpainting its specular reflections is always a difficult and challenging task [6]. Potential applications of specular reflection detection and inpainting in transparent objects through multispectral polarimetric light field imagery (MSPLFI) include 3D shape reconstruction, detection and segmentation, surface normal generation, and defect analysis.
By integrating advanced imaging tools and techniques, multispectral polarimetric imagery (MSPI) can extract an object's meaningful information, such as surface features, shape, and roughness, from optical sensing images [7]. Its potential applications include imaging systems that perform image denoising [8], image dehazing [9], and semantic segmentation [10]. Multispectral imaging is a mode commonly reported in the literature for enhancing color reproduction [11], illuminant estimation [12], vegetation phenology [13,14], shadow detection [15], and background segmentation [16,17]. Additionally, although a multispectral cue is capable of generating information by penetrating deeper into an object, it is sometimes infeasible for extracting the object's inherent features. Together with a polarimetric cue, in which specific photoreceptors are used for polarized light vision, MSPI is applied in applications such as specular and diffuse separation [18], material classification [19], shape estimation [20], target detection [21][22][23], anomaly detection [24], man-made object separation [25], and camouflaged object separation [26]. Recently, a light field (LF) cue has gained popularity in the graphics community for detecting and segmenting some complex tasks, such as transparent object recognition [27], classification [28], and segmentation [29] from a background, by analyzing the distortion features of a single shot captured by an LF camera. Each pixel in an LF image has six degrees of freedom, allowing it to capture hidden information that MSPI cues cannot. The aim of the proposed research is to use the multisensory cues of MSPLFI to effectively detect specular reflections in a transparent object and suppress them accordingly.
Firstly, it is necessary to separate specular reflection from diffuse reflection. Each pixel in MSPLFI can be defined as the sum of specular and diffuse reflections following the dichromatic reflection model [30] as

L(λ, ρ, L, θ_i, θ_r, g) = L_Spec(λ, ρ, L, θ_i, θ_r, g) + L_Diff(λ, ρ, L, θ_i, θ_r, g), (1)

where L_Spec(λ, ρ, L, θ_i, θ_r, g) is the specular reflection, L_Diff(λ, ρ, L, θ_i, θ_r, g) the diffuse reflection, λ the wavelength in the multispectral visible band (400 nm-700 nm), ρ the orientation of the polarimetric filter (rotating at 0°, 45°, 90°, 135°), L the LF direction in which the light rays are traveling in space, and θ_i, θ_r, g the geometric parameters indicating incidence, viewing, and phase angles, respectively. The individual components in Equation (1) can be further decomposed into two parts, composition and magnitude, as in Equation (2). Composition is a relative spectral power distribution (c_Spec (surface reflection) or c_Diff (body reflection)) that depends on only wavelength, polarization, and LF and is independent of geometry. Magnitude is a geometric scale factor (ω_Spec or ω_Diff) that depends on only geometry and is independent of wavelength, polarization, and LF:

L(λ, ρ, L, θ_i, θ_r, g) = ω_Spec(θ_i, θ_r, g) c_Spec(λ, ρ, L) + ω_Diff(θ_i, θ_r, g) c_Diff(λ, ρ, L). (2)

As the appearance of a transparent object is highly biased by its background's texture and color, it is a challenging task to detect, segment, and suppress the specular reflections on it. By predicting multispectral changes per sub-aperture image in the LF, the proposed research detects specular reflected pixels. In terms of inpainting, as a pixel in an LF image has six degrees of freedom and can appear within any of the surrounding four-connected pixels in a sub-aperture image, the pixel pattern with maximum acceptability is selected to suppress an SRD pixel.
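To make the composition/magnitude decomposition in Equation (2) concrete, it can be sketched numerically. The function name and the sample composition and magnitude values below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Per-pixel illustration of Equation (2): radiance is a geometry-only
# magnitude times a wavelength/polarization/LF-only composition, summed
# over the specular and diffuse parts. Values are made-up placeholders.
def dichromatic_radiance(w_spec, c_spec, w_diff, c_diff):
    """L = w_Spec * c_Spec + w_Diff * c_Diff, per spectral band."""
    return w_spec * np.asarray(c_spec) + w_diff * np.asarray(c_diff)

# Example: 3-band composition vectors with scalar geometric magnitudes.
c_spec = [0.9, 0.9, 0.9]   # near-white interface (surface) reflection
c_diff = [0.2, 0.5, 0.3]   # body colour of the object
L = dichromatic_radiance(w_spec=0.4, c_spec=c_spec, w_diff=1.0, c_diff=c_diff)
```

Note how only the scalar magnitudes carry the geometric dependence, so a specular highlight changes a pixel's intensity along the fixed direction c_Spec.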
Briefly, the proposed system firstly describes the significance of the joint utilization of multisensory cues, then captures an MSPLFI object dataset, proposes a two-fold algorithm for detecting and suppressing specular reflections, evaluates both detection accuracy and suppression quality in terms of statistical distinct metrics and, finally, compares performance with those of some other methods in the existing literature.
The main contribution of this research is two-fold. Firstly, an SRD algorithm is proposed that predicts changes in MSPLFI by calculating the mean (µ) and covariance (Σ) of each sub-aperture index of the LF and applying the Mahalanobis distance to predict specular reflections. The predicted changes in unpolarized and polarized images are then averaged, and a threshold is applied to obtain a final SRD pixel mask (SRD-PM). However, due to the absence of publicly available multisensory 6D datasets with which to evaluate the proposed research, we first built an image acquisition system to capture an MSPLFI object dataset. Secondly, an SRI algorithm is proposed which extends the final SRD-PM to the immediately neighboring pixels using the RGB channels of both polarized and unpolarized sub-apertures in the LF. For a pixel in the SRD-PM, all the four-connected neighboring pixel patterns per sub-aperture of the LF, excluding those already in the SRD-PM, are carefully selected, and a distance matrix is computed based on their intensities. Finally, the pixel pattern with the minimum distance is chosen for the task of inpainting. The performances of these approaches are evaluated and compared using a private MSPLFI object dataset to demonstrate the significance of this research.

This paper is organized as follows. In Section 2, the background to SRD and SRI is fully described. In Section 3, the details of the private MSPLFI dataset, including the image acquisition setup, multisensory cues, and pixels' degrees of freedom, are analyzed. In Section 4, the complete two-fold SRDI framework and corresponding algorithms are presented with proper mathematical and logical explanations. In Section 5, the performances of the proposed SRD and SRI algorithms are evaluated by distinct statistical metrics. Additionally, the detection accuracy and suppression quality of the proposed SRDI are visualized and compared with those of existing approaches.
Finally, concluding remarks and suggested future directions are provided in Section 6.

Related Works
SRD techniques usually assume that the intensities of specular pixels vary from those of diffuse ones in multiple spectra, i.e.,

P(x, y, c, λ, ρ | i) = 1 if d(S, I) > τ_G, and 0 otherwise,

where τ_G is a global threshold, P(x, y, c, λ, ρ | i) the final SRD-PM at pixel (x, y) of a fused spectrum (λ) at a polarimetric orientation (ρ) in sub-aperture index i of the LF (L), and d the distance between the predicted specular pixel (S) and that of the fused image in spectrum λ (I) at orientation ρ. In this section, a brief review of the literature related to SRDI techniques for multisensory cues of MSPLFI is provided.

Specular Reflection Detection (SRD)
Recent works on SRD are categorized in two major ways, single- and multiple-image-based, where the latter depends on specific conditions such as lighting direction and viewpoint. Based on a single textured color image, Tan [31] iteratively shifts the maximum chromaticity of each pixel between two neighboring ones. An iteration stops when the chromaticity difference satisfies a certain threshold value and generates a specular-free (SF) image. The final SF image preserves a similar geometrical distribution even though it contains only diffuse reflections. However, for a large image with more specularity, this technique may lead to erroneous diffuse reflections with excessive and inaccurate removal as well as higher computational complexity. Subtracting the minimum color channel value from each channel, Yoon [32] obtains an SF two-band image. Sato [33] integrates the dichromatic reflection model for separation by analyzing color signatures in many images captured by a moving light source. A series of linear basis functions is introduced by Lin [34], and the lighting direction is changed to decompose the reflection components.
The modified SF (MSF) technique introduced by Shen [35] ensures robustness to the influence of noise on chromaticity. It subtracts the minimum RGB value from an input image and works in an iterative manner by selecting a predefined offset value using the least-squares criterion. Nguyen [36] proposes an MSF method that integrates tensor voting to obtain the dominant color and distribution of diffuse reflections in a region. To improve the separation performance, Yamamoto [37] applies a high-emphasis filter on individual reflection components to separate them [35]. However, all these methods suffer from artifacts and inaccuracy if the brightness of the input image is high.
Recent literature on SRD reveals that the specular reflection of an object's area has a stronger polarization signature than its diffuse reflection. Placing a polarization filter in front of an imaging sensor, Nayar [18] proposes separating the specular reflection components from an object's surface with heavy textures. Considering the textures and the surface colors of neighboring pixels, many authors [31,38,39] could separate specular reflections through neighboring pixel patterns. Applying a bilateral filter with coefficients, Yang [39] proposes an extension of Tan's [31] method in which the diffuse chromaticity is maximized. Although it provides faster separation and better accuracy, it still suffers from some problems for separating specular reflections in a transparent object. Akashi [40] also employs the dichromatic reflection model to separate specular reflections in single images based on sparse non-negative matrix factorization (NMF) composed of only non-negative values regulated by parameters such as sparse regularization, pixel color, and convergence. Although this method demonstrates better separation accuracy than those of Tan [31] and Yang [39], inaccurate parameter settings may lead to artifacts in the separation of specular reflections.
An SUV color space for separating specular and diffuse reflections from the S and UV channels, respectively, of a single image or image sequence in an iterative manner is proposed by Mallick [38]. However, discontinuities in the surface color may lead to erroneous detection of specular reflections. In [41], Arnold applies image segmentation based on non-linear filtering and thresholding to separate specular and diffuse reflections in medical imaging. Saint [42] proposes increasing the gap between the two reflection components and then applying a non-linear filter to isolate spike components in an image histogram. In [43], Meslouhi integrates the dichromatic reflection model to detect specular reflections. In our research, we use multisensory cues to detect specular reflections by predicting changes among multiband data.

Specular Reflection Inpainting (SRI)
SRI refers to restoring an SRD pixel pattern with semantically and visually believable content by analyzing neighboring pixel patterns. Recent works in the literature on SRI depend mainly on patch-based similarity, with patch- or diffusion-based inpainting proposed to fill an SRD pixel pattern by spreading color intensities from its background into its holes [8,9,44,45]. Traditional inpainting approaches apply an interpolation technique to the surrounding pixels to restore an SRD pixel pattern [46,47]. Based on temporal information in an endoscopic video image sequence, Vogt [48] proposes an inpainting method. Cao [49] develops an inpainting technique that averages the pixels in a sliding rectangular window and later replaces an SRD pixel with this average. Although this method is simple and relatively fast to compute, it lacks robustness due to varying window sizes based on the SRD's connected pixels. In [50], Oh replaces the SRD pixels with the average intensity of a contour, which may, however, lead to strong gradients. In [41], Arnold proposes a two-level inpainting technique which replaces SRD pixels with the centroid color within a certain distance and applies a Gaussian kernel for smoothing using a binary weight mask. Although its inpainting quality is better than those of other methods, it may produce some artifacts and blur for large spectral areas when integrating a partial differential equation with gradient thresholding. In [51], Yang proposes a convex model for suppressing the reflection from a single input image. In [52], Criminisi describes an image inpainting method in which an affected region is filled with some exemplars. As these techniques may produce artifacts and fail to suppress large reflection areas, our proposed method reconstructs the specular reflected pixels by analyzing their four-connected neighbors in the sub-apertures of the 4D-LF.

Analysis of MSPLFI Transparent Object Dataset
Regarding SRD and SRI, the proposed research uses multisensory cues by capturing different objects in MSPLFI, each of which is defined as a 6D function L(u, v, s, t, λ, ρ), where (u, v) is the image plane referring to an image's spatial dimensions, (s, t) the viewpoint plane referring to the direction in which the light rays are traveling in space, λ the wavelength in the multispectral visible band (400 nm-700 nm), and ρ the orientation of the polarimetric filter (rotating at 0°, 45°, 90°, 135°).
In this section, acquisition of the MSPLFI object dataset and then its use for detecting and suppressing specular reflections in a transparent object are described.

Experimental Setup
As there is no dataset available for the evaluation of SRDI in a transparent object that integrates multiple cues of MSPLFI, we generate a problem-specific object dataset in a constrained environment; Figure 1 illustrates our setup for image acquisition, with a plenoptic camera, a Lytro Illum, used to capture all the LF images. We place different band filters in front of the camera to capture multispectral images and a linear polarization filter rotated manually to 0°, 45°, 90°, and 135° to obtain different polarimetric images, with two light sources used to obtain accurate spectral reflections. The lighting is similar for the different objects, and we retain the same background for them, which matches most of the objects in most of the area, with the purpose of creating a complex environment from which to segment a whole object. One of the light sources is located beside the camera lens at a 45° angle and the other above the object's location. The energy levels of the multiple spectra are not similar; however, the individual cues contain a usable amount of information when capturing MSPLFI.


MSPLFI Transparent Object Dataset
In Figure 2, the median specular reflections of the sub-aperture images of 18 transparent objects (O#1-O#18) captured through MSPLFI are presented with their corresponding labels. To evaluate the performance of the image inpainting technique, some balls are placed inside object O#1.

We consider five different shots for each spectrum of each object. Of them, one corresponds to the unpolarized version of the image captured without using a polarization filter and the other four to four different polarization filter orientations (0°, 45°, 90°, and 135°) using a linear polarizer. We consider multiple spectra in the visible range (400 nm-700 nm) to obtain images in the multispectral environment. Figure 3 shows the center sub-aperture images of object O#8 in multiple color bands of violet, blue, green, yellow, orange, red, pink, and RGB in polarized and unpolarized versions. As can be seen, due to the nature of polarization, on average, 50% of the photons get blocked while passing through a lossless polarizer at different orientations.
The LF images are 4D data obtained from different viewpoints, with each image presented as a sub-aperture plane (s, t) with its tangent direction (u, v). In our experiments, we consider 11 × 11 sub-aperture images, including their center viewpoints, with their spatial representations denoted by (u, v). Figure 4 shows the 4D-LF images of object O#8 in the violet color band, with the center viewpoint image at the cross-section of the S and the T lines denoted as the (6,6) position in the hyperplane (s, t, u, v).
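As a minimal sketch of this 4D indexing (the array shape and variable names are our assumptions; only the 11 × 11 viewpoint grid and the (6,6) center position come from the text):

```python
import numpy as np

# A light field of shape (s, t, u, v) holds 11 x 11 viewpoints, each a
# u x v image. The center viewpoint sits at (s, t) = (6, 6) in the paper's
# 1-based terms, i.e. index (5, 5) with 0-based Python indexing.
# Spatial resolution below is illustrative.
S, T, U, V = 11, 11, 64, 64
lf = np.zeros((S, T, U, V), dtype=np.float32)

center_view = lf[5, 5]            # the (6,6) sub-aperture image
assert center_view.shape == (U, V)

# Iterate over all sub-aperture images, e.g. for per-view processing.
views = [lf[s, t] for s in range(S) for t in range(T)]
```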

Degrees of Freedom
Figure 5 presents an example of object O#1's scene flow among its sub-aperture images and their relative directions. In Figure 5a, the arrow indicates that all the viewpoint images' motion flows to the center viewpoint image and, in Figure 5b, each pixel has six degrees of freedom in the LF images, with the region of interest (ROI) regarding the scene flow indicated by a yellow rectangle. In Figure 5c, the pixel displacements are shown with their corresponding intensity flow plots, which confirm that the intensity of the ROI varies in different viewpoints.

Proposed Two-fold SRDI Framework
In this section, the proposed two-fold SRDI framework based on the distinctive features of MSPLFI cues is discussed and presented in Figure 6. Firstly, a 6D dataset of different transparent objects is captured, and then Reed-Xiaoli (RX) detector [53] is applied to obtain the actual specular reflection of an object through predicting changes among multiband. Secondly, a pixel neighborhood-based inpainting method for suppressing this reflection is proposed.
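The RX-style change prediction can be sketched as follows, assuming the band images are stacked as an (n_bands, H, W) array so that each pixel contributes one multiband column vector; the function name and the use of a pseudo-inverse for numerical safety are our choices, not the paper's:

```python
import numpy as np

# RX-style anomaly score: Mahalanobis distance of each pixel's band vector
# from the global band mean, using the band covariance matrix.
def rx_specularity(bands):
    """bands: (n_bands, H, W) grayscale stack -> (H, W) distance map."""
    n, h, w = bands.shape
    X = bands.reshape(n, -1)                # each column: one pixel's band vector
    mu = X.mean(axis=1, keepdims=True)      # global band mean
    cov = np.cov(X)                         # (n, n) band covariance
    cov_inv = np.linalg.pinv(cov)           # pinv guards against a singular cov
    D = X - mu
    # d^2_j = D[:, j]^T  cov_inv  D[:, j] for every pixel j at once
    d2 = np.einsum('ij,ik,kj->j', D, cov_inv, D)
    return np.sqrt(np.maximum(d2, 0.0)).reshape(h, w)
```

Pixels whose multiband signature deviates strongly from the background statistics (e.g. specular highlights) receive large distances.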



Specular Reflection Detection (SRD)
The proposed system detects specular reflected pixels in transparent objects through predictions of multiband changes. Firstly, a raw lenslet (.LFR) image is decoded into a 4D (s, t, u, v) LF one, where (s, t) denotes the image's position in the hyperplane and (u, v) its spatial region. The MSPLF imagery was captured by the Lytro Illum camera, which can capture 15 × 15 sub-apertures per shot. However, because the main lens of the camera is circular, vignetting occurs at its edge; hence, only the inner 11 × 11 sub-apertures are retained. It could be argued that a few more sub-apertures at the top, bottom, left, and right could be as good as, if not better than, the corner sub-apertures kept in the 11 × 11 array, but excluding them keeps the retained views in a square array for simplicity. As our main purpose is to detect and suppress specularity in a transparent object, we maximize an object's area with a minimum surrounding background.

In order to compute the specular reflections in unpolarized images, we convert all the multiband unpolarized 4D LF images into their corresponding grayscale ones. For each sub-aperture index, we store the individual band images in a column vector, with their mean (µ) and covariance (Σ) calculated for the Mahalanobis distance

d(x) = ((x − µ)ᵀ Σ⁻¹ (x − µ))^(1/2),

where x is a pixel's multiband column vector. The 2D distance matrix represents the changes among the multiband images per sub-aperture index, which are also observed as specular reflection. We also predict the maximum specularity in unpolarized 4D images.

In order to draw specular reflections in polarized images, we firstly calculate the Stokes parameters (S_0-S_2) [54], which describe the linear polarization characteristics using a three-element vector (S), as shown in Equation (6), where S_0 represents the total intensity of light, S_1 the difference between the horizontal and vertical polarizations, and S_2 the difference between the linear +45° and -45° ones.
Here, I_0, I_45, I_90, and I_135 are the different input images for the system at polarized angles of 0°, 45°, 90°, and 135°, respectively.
The degree of linear polarization (DoLP) is a measure of the proportion of linearly polarized light relative to the light's total intensity, and the angle of linear polarization (AoLP) is the orientation of the major axis of the polarization ellipse, which represents the polarizing angle at which the intensity is strongest. They are derived from the Stokes vector according to Equations (7) and (8), respectively:

DoLP = (S_1² + S_2²)^(1/2) / S_0, (7)

AoLP = (1/2) tan⁻¹(S_2 / S_1). (8)

To calculate the linear polarized image, firstly, the polarimetric components are concatenated, as shown in Equation (9). Then, the concatenated image is generated in the hue, saturation, value (HSV) color space and converted to the RGB color space, as in Equation (10), where LP stands for linear polarization.
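These standard polarimetric relations can be written directly from the four polarizer orientations. The averaged form of S_0 below is one common convention (the paper's exact Equation (6) is not reproduced here), and the function names are ours:

```python
import numpy as np

# Stokes parameters from four linear-polarizer orientations, then DoLP and
# AoLP. Inputs are same-shaped intensity arrays.
def stokes(i0, i45, i90, i135):
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (averaged convention)
    s1 = i0 - i90                        # horizontal vs vertical
    s2 = i45 - i135                      # +45 vs -45 degrees
    return s0, s1, s2

def dolp(s0, s1, s2, eps=1e-12):
    """Proportion of linearly polarized light, in [0, 1] for physical input."""
    return np.sqrt(s1**2 + s2**2) / (s0 + eps)

def aolp(s1, s2):
    """Orientation of the polarization ellipse's major axis, in radians."""
    return 0.5 * np.arctan2(s2, s1)
```

For fully horizontally polarized light (I_0 = 1, I_90 = 0, I_45 = I_135 = 0.5), DoLP is 1 and AoLP is 0.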
For each sub-aperture index of DoLP and LP, we store the individual band images in a separate column vector. Then, a similar procedure (unpolarized specular detection) is followed to calculate the maximum specularity in the LP and the DoLP 4D imagery. The average of the three specularities (RX-NP, RX-LP, RX-DoLP) shows the overall predicted specularity in an object of MSPLFI, with a threshold (Otsu's method, in the range (0-1)) applied to obtain the SRD pixels in binary form. The complete process for detecting specular pixels in a transparent object is described in Algorithm 1.
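This final averaging-and-thresholding step can be sketched as follows. A self-contained NumPy version of Otsu's method is included so the sketch runs on its own (in practice a library routine such as skimage.filters.threshold_otsu would serve); all names are ours:

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Threshold maximizing between-class variance of the histogram."""
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                     # weight of the lower class
    w1 = w0[-1] - w0                         # weight of the upper class
    m0 = np.cumsum(hist * centers)
    mean0 = m0 / np.maximum(w0, 1e-12)
    mean1 = (m0[-1] - m0) / np.maximum(w1, 1e-12)
    between = w0 * w1 * (mean0 - mean1) ** 2
    return centers[np.argmax(between)]

def srd_mask(rx_np, rx_lp, rx_dolp):
    """Average the three specularity maps and binarize with Otsu's method."""
    avg = (rx_np + rx_lp + rx_dolp) / 3.0
    return avg > otsu_threshold(avg)
```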

Specular Reflection Inpainting (SRI)
In this research, the SRD pixels are suppressed through analyzing the distances among four connected neighboring pixels. Firstly, four different regions in an image are identified, as shown in Figure 7. Algorithm 1 predicts region A as an SRD pixel but, for better inpainting quality, both regions A and B are considered specular reflected pixels. It is to be noted that region B contains the pixel patterns (color channels) that are the immediate neighbors of region A. Then, all the connected regions are identified and labeled for the task of inpainting. The complete process for inpainting the detected specular pixels in transparent object is described in Algorithm 2.

Algorithm 2. SRI in Transparent Object
Input: MSPLFI Object Dataset, SRD-PM
Output: SRD Pixel Inpainting in RGB
1: Strengthen SRD-PM (output from Algorithm 1) by labeling all neighboring pixels as SRD ones
2: Compute connected components and label them
3: Calculate baseline image per sub-aperture index by taking minimum pixel intensities of both polarized and unpolarized images in RGB channels
4: for all common sub-aperture images do
5:  for all labels do
6:   for all pixel patterns (P(x, y, c | i)) in SRD-PM do
7:    if labels (SRD-PMs) exist then
8:     Compute distances (d(j, k | x, y)) among 4-connected neighbors not in SRD-PM in each channel, as in Equation (11), and store them in a 2D matrix (dM(nrow, ncol)), as in Equation (12)
9:     Winning pixel pattern is the index (IDX) of the pixel pattern corresponding to the column-wise minimum sum of dM(nrow, ncol), as in Equations (13) and (14), for inpainting of specular reflections
10:    end if
11:   end for
12:  end for
13: end for
14: Repeat steps 4 to 13 to calculate the maximum specular reflection in the suppressed image of the transparent object from the already suppressed sub-apertures

A baseline image per sub-aperture index is computed by taking the minimum pixel intensities in both polarized and unpolarized RGB channels. The aim is to suppress the specular reflected areas in the image, with the distance between two pixel patterns calculated by Equation (11), where P(x, y, c, j | i) and P(x, y, c, k | i) are the two four-connected neighbors of the pixel pattern (P(x, y, c | i)) in sub-aperture index i and d(j, k | x, y) the distance between the two pixel patterns corresponding to P(x, y, c | i) in sub-aperture index i. A 2D matrix [55] of the distances among the pixel patterns is calculated by Equation (12). The pattern corresponding to the lowest column-wise sum of the distances is selected as the winning one (P(x, y, c, IDX | i)) for the task of SRI in Equations (13) and (14).
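Our reading of this selection rule can be sketched per pixel as follows. Euclidean RGB distance stands in for Equation (11) (an assumption, as the paper's exact distance is not reproduced here), and the function and variable names are ours:

```python
import numpy as np

# For a specular pixel, gather its 4-connected neighbours that are NOT in
# the SRD mask, build the pairwise distance matrix of their RGB intensities,
# and inpaint with the neighbour whose column-wise distance sum is smallest
# (i.e. the pattern most in agreement with the others).
def inpaint_pixel(img, mask, y, x):
    """img: (H, W, 3) float array; mask: (H, W) bool SRD mask."""
    h, w, _ = img.shape
    cand = [img[ny, nx] for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]]
    if not cand:
        return img[y, x]                      # no usable neighbour; leave as-is
    C = np.stack(cand).astype(float)          # (n, 3) candidate patterns
    dM = np.linalg.norm(C[:, None] - C[None, :], axis=-1)  # (n, n) distances
    idx = int(np.argmin(dM.sum(axis=0)))      # winning pattern index (IDX)
    return C[idx]
```

An outlier neighbour (e.g. another highlight) accumulates large distances to the rest and is therefore never selected.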

Experimental Results
In this section, performance evaluations and comparisons of the proposed two-fold SRDI and other approaches using different metrics for specular pixel detection and inpainting are discussed. Additionally, analyses of their computational times are conducted.

Selection of Performance Evaluation Metric
Both SRD and SRI are evaluated by commonly used statistical evaluation metrics for quantifying detection accuracy and inpainting quality.

Selection of SRD Metric
The SRD method is evaluated at the pixel level of a binarized scene in which the pixels related to the specular and the diffuse reflections are white and black, respectively. Its performance can be divided into four pixel-wise classification results: true positive (Tp), a correctly detected diffuse pixel; false positive (Fp), a specular reflected pixel incorrectly detected as a diffuse reflected one; true negative (Tn), a correctly detected pixel with specularity; and false negative (Fn), a diffuse reflected pixel incorrectly detected as a specular reflected one. The binary classification metrics used are precision, recall (or sensitivity), F1-score, specificity, geometric mean (G-mean), and accuracy. Precision is the proportion of detected diffuse reflected pixels that are actually diffuse reflected ones, while recall is the proportion of actual diffuse reflected pixels that are detected. The F1-score (a boundary F1 measure) is the harmonic mean of the precision and recall values, which measures how closely the predicted boundary of an object matches its ground truth and is an overall indicator of the performance of binary segmentation. Specificity (the Tn fraction) is the proportion of actual negatives predicted as negatives, sensitivity (the Tp fraction) the proportion of actual positives predicted as positives, G-mean the square root of the product of specificity and sensitivity, and accuracy the proportion of true results obtained, either Tn or Tp. The mathematical evaluation measures of the aforementioned metrics are shown in Equations (15) to (20) [17,56].
Accuracy (AC) = (Tp + Tn) / (Tp + Fn + Tn + Fp),
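Equations (15)-(20) follow directly from the four pixel counts. A minimal NumPy sketch (the function name `srd_metrics` is hypothetical; the diffuse class is treated as positive, as in the definitions above):

```python
import numpy as np

def srd_metrics(pred, gt):
    """Pixel-wise SRD metrics (cf. Equations (15)-(20)).

    pred, gt: boolean arrays where True marks a diffuse pixel (the
    positive class) and False marks a specular pixel.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum( pred &  gt)   # diffuse correctly detected
    fp = np.sum( pred & ~gt)   # specular wrongly called diffuse
    tn = np.sum(~pred & ~gt)   # specular correctly detected
    fn = np.sum(~pred &  gt)   # diffuse wrongly called specular
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)                      # sensitivity
    f1          = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    gmean       = np.sqrt(recall * specificity)
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, specificity, gmean, accuracy
```

A production version would guard against zero denominators when an image contains only one class.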

Selection of Inpainting Quality Metric
Currently, the quality of a fused image can be quantitatively evaluated using the metrics [57] structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), mean squared error (IMMSE), and mean absolute deviation (MAD). The SSIM is an assessment index of the image quality based on computations of luminance, contrast, and structural components of the reference and the reconstructed images, with the overall index a multiplicative combination of these three components. The PSNR block computes the PSNR between the reference and the suppressed images in decibels (dB), with higher values of SSIM and PSNR indicating better quality of the reconstructed or the suppressed image. The IMMSE computes the average squared error between the reference and the reconstructed images, while MAD indicates the sum of the absolute differences between the pixel values of these images divided by the total number of pixels, which is used to measure the standard error of the reconstructed image. Lower values of IMMSE and MAD indicate better quality of the reconstructed image. Considering two images (x and y), the aforementioned mathematical evaluation metrics are shown in Equations (21) to (24), where the luminance component of the SSIM is l(x, y) = (2µxµy + C1)/(µx² + µy² + C1), and µx, µy, σx, σy, and σxy are the local means, standard deviations, and cross-covariance of images x and y.
PSNR(x, y) = 10 log10(MAX² / IMMSE(x, y)), where MAX denotes the range of the image (x or y) datatype.
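The four quality metrics can be sketched as follows. This is a simplified sketch: the reference SSIM uses local sliding windows, whereas `ssim_global` below computes a single-window (whole-image) variant; function names are hypothetical.

```python
import numpy as np

def immse(x, y):
    """Mean squared error between reference x and reconstruction y (cf. Eq. (23))."""
    return float(np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def mad(x, y):
    """Mean absolute deviation: sum of |x - y| divided by the pixel count (cf. Eq. (24))."""
    return float(np.mean(np.abs(np.asarray(x, float) - np.asarray(y, float))))

def psnr(x, y, max_val=255.0):
    """PSNR in dB (cf. Eq. (22)); max_val is the range of the image datatype."""
    mse = immse(x, y)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=1.0):
    """Whole-image SSIM: product of luminance, contrast/structure terms
    with the usual stabilizing constants C1, C2 (a global simplification
    of the windowed SSIM in Eq. (21))."""
    C1, C2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()            # cross-covariance
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Identical images give SSIM = 1 and PSNR = infinity, matching the "higher is better" reading in the text; IMMSE and MAD are zero in that case.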

Generation of Ground Truth
To evaluate the performance of the proposed two-fold SRDI, we generate two different ground truths for each object, as shown in Figure 8. The SRD and the SRI ones are created manually by an expert, with the maximum possible specular reflected area in the MSPLFI object dataset covered. Figure 8 shows the two-way SRD ground truth generation, where a pixel with an intensity above a threshold level (determined by Otsu's method, in the range 0-1) is considered a specular reflected pixel. The final column in Figure 13 presents the objects' SRD binary ground truths, with black and white pixels indicating their diffuse and specular reflected pixels, respectively. The final column in Figure 18 shows the objects' SRI ground truths. Because the MSPLFI object dataset contains real scenes, some pixels in an object may exhibit both specular and diffuse reflections but, to measure the performance in terms of quantity and enable further comparisons, each pixel is classified manually as either specular or diffuse reflected, and the ground truth is re-named as the quasi-ground truth.
Figure 9 shows the SRD rates in terms of the SRD metrics of precision, recall, F1-score, G-mean, and accuracy for nine sample objects both separately ( Figure 9) and together for all objects (O#1-O#18) ( Figure 10) using the proposed method. For each object, a total of 121 sub-aperture images are used to measure its specularity and box plots to statistically analyze our experiments. Figure 9 exhibits the SRD metric values obtained for nine sample objects separately. Remaining objects are presented in Appendix A ( Figure A1). Accuracy has a higher median value than the F1-score and the G-mean for all the objects, with O#9 and O#3 having superior median values of 0.804, 0.832, and 0.996, and 0.874, 0.882, and 0.991 for F1-score, G-mean, and accuracy, respectively, compared with those of the other objects.
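The Otsu thresholding step used for the SRD ground truth can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the bin count of 256 is not given in the text, and the function names are hypothetical (a library routine such as scikit-image's `threshold_otsu` would serve equally well).

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Otsu's threshold for an intensity image scaled to [0, 1].

    Picks the split that maximizes the between-class variance of the
    histogram; bins=256 is an assumption.
    """
    hist, edges = np.histogram(gray.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                 # probability of the lower class
    w1 = 1.0 - w0                     # probability of the upper class
    mu = np.cumsum(p * centers)       # cumulative mean
    with np.errstate(divide="ignore", invalid="ignore"):
        # Between-class variance, maximized at the optimal split.
        sigma_b = (mu[-1] * w0 - mu) ** 2 / (w0 * w1)
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return edges[int(np.argmax(sigma_b)) + 1]

def srd_ground_truth(gray):
    """Binary SRD mask: True (white) = specular, False (black) = diffuse."""
    return gray > otsu_threshold(gray)
```

On a strongly bimodal intensity image the returned threshold falls between the two modes, so pixels brighter than it are marked specular, matching the white/black convention above.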

Analysis of SRD Rate
Similarly, Figure 10 shows the combined SRD rates for (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images. Accuracy has better overall median and 75th percentile values for all the objects combined (0.981 and 0.992, respectively) than the F1-score (0.643 and 0.770, respectively) and the G-mean (0.656 and 0.752, respectively).
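The box-plot summaries quoted above reduce to medians and 75th percentiles over the per-image metric values. A one-liner sketch (function name hypothetical; the input would be, e.g., the 2196 per-image accuracy values):

```python
import numpy as np

def boxplot_stats(values):
    """Median and 75th percentile of a set of per-image metric values,
    as reported for the combined box plots."""
    v = np.asarray(values, dtype=float)
    return float(np.median(v)), float(np.percentile(v, 75))
```

NumPy's default linear interpolation between order statistics is assumed for the percentile, which is also what standard box-plot routines use.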


Comparison of SRD Rates of Proposed Method and Those in Literature
It is worth mentioning that the performances of the existing SRD methods considered are not exactly comparable, as each reports its accuracy for a specific image set using different contexts. Moreover, the accuracy values obtained from them and the colormapping techniques used for segmentation may vary.
In Table 1, the performances of SRD in terms of different evaluation metrics for the proposed and other methods are compared for the 18 individual objects. For visualization purposes, short forms of the authors' names are written in the first column, that is, Ak., Sn., Yn., Ym., Ar., St., and Ms. refer to Akashi, Shen, Yang, Yamamoto, Arnold, Saint, and Meslouhi, respectively. The SRD metric values in the object index columns correspond to the maiden specular image among the sub-aperture ones. The final column (overall mean (SA)) corresponds to the mean ± SD values of the (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images together. As can be seen, the overall mean values of the different SRD metrics are higher for the proposed method than for the studies discussed in this paper, as shown in the final column in Table 1. Additionally, considering all the sub-aperture images of the 18 distinct objects, the mean F1-score, G-mean, and accuracy values for the proposed method are 0.546 ± 0.13, 0.654 ± 0.11, and 0.974 ± 0.01, respectively. In Figure 11, the SRD metric values for the 18 individual objects (O#1-O#18) and their maximum specular reflections obtained from different methods are compared. As can be seen, the proposed method achieves superior median values for the F1-score, G-mean, and accuracy of 0.662, 0.816, and 0.971, respectively.
Figure 11. Evaluation results for SRD performances of different methods for maximum specular reflected images of 18 objects in terms of precision, recall, F1-score, G-mean and accuracy.
In Figure 12, the SRD metric values for (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images with their specular reflections obtained by different methods are presented. As can be seen, the proposed method has superior median values for F1-score, G-mean, and accuracy of 0.643, 0.676, and 0.981, respectively, to those of the others. Figure 12. Evaluation results for SRD performances of different methods for (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images with specular reflections in terms of precision, recall, F1-score, G-mean, and accuracy.


Visualization of SRD Rates of Different Methods
In Figure 13, the SRD accuracies obtained by different methods for the maximum specular reflected images of sample objects in the MSPLFI object dataset are presented. As can be seen, the proposed approach reports fewer SRD errors than the others. Remaining objects are presented in Appendix A ( Figure A2).  Figure 13. Comparison of SRD accuracies of different methods for sample objects in MSPLFI dataset.


Analysis of SRI Quality
The SRI qualities in terms of the normalized SRI metrics SSIM, PSNR, IMMSE, and MAD for the nine sample objects using the proposed method are presented separately in Figure 14 and then together for all objects (O#1-O#18) in Figure 15. For each object, a total of 121 sub-aperture + 1 maximum images are considered to measure its SRI and box plots used to statistically analyze our experiments. It is to be noted that a suppressed image with high SSIM and PSNR values and low IMMSE and MAD ones is close to the quasi-ground truth. Figure 14 shows that the SSIM has a higher median value than the PSNR but the IMMSE a lower one than the MAD for all the objects, while object O#1 has superior median values of 0.966, 0.820, 0.038, and 0.131 for SSIM, PSNR, IMMSE, and MAD, respectively, to those of the other objects. Remaining objects are presented in Appendix B (Figure A3). Similarly, Figure 15 shows the normalized SRI qualities of (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images together. The SSIM has better overall median and 75th percentile values for all the objects combined (0.966 and 0.980, respectively) than the PSNR (0.735 and 0.778, respectively), and the IMMSE better overall median and 75th percentile values (0.073 and 0.118, respectively) than the MAD (0.226 and 0.273, respectively).

Comparison of SRI Rates of Proposed Method and Those in Literature
It is worth mentioning that the performances of the existing SRI methods are not exactly comparable, as each reports its accuracy for a specific image set in a different context. Additionally, the quality obtained by the methods and the color-mapping techniques used for inpainting may vary.
In Table 2, the performances of SRI in the proposed and other methods for the 18 individual objects are compared using different evaluation metrics. For visualization purposes, short forms of the authors' names are written in the first column, that is, Ar., Yg., Cr., St., Ak., Sn., and Ym. refer to Arnold, Yang, Criminisi, Saint, Akashi, Shen, and Yamamoto, respectively. The SRI metric values in the object index columns correspond to the maiden image of the 121 sub-aperture specular reflection suppressed ones. The final column (overall mean (SA)) corresponds to the mean ± SD values of the (121 sub-aperture + 1 maximum) images × 18 objects = 2196 images together. As can be seen, the SRI metric values are significantly better for the proposed method than for the others considered, as shown in the final column in Table 2. For all the sub-aperture images of the 18 distinct objects, the mean SSIM, PSNR, IMMSE, and MAD values obtained from the proposed method are 0.956 ± 0.02, 24.51 ± 2.11, 257.6 ± 119, and 8.427 ± 2.51, respectively. SSIM: structural similarity index; PSNR: peak signal-to-noise ratio; IMMSE: mean squared error; MAD: mean absolute deviation.
In Figure 16, comparisons of the SRI metric values of individual methods in terms of SSIM, PSNR, IMMSE, and MAD for the 18 individual objects (O#1-O#18) with their maiden specular inpainting are presented. It can be seen that the proposed method has superior median values for SSIM and PSNR of 0.985 and 0.754, respectively, and the lowest median values for IMMSE and MAD of 0.063 and 0.217, respectively.

Figure 18 presents the SRI qualities obtained by different methods for the maiden specular reflected images of sample scenes in the MSPLFI object dataset. Remaining objects are presented in Appendix B (Figure A4). As can be seen, the proposed approach demonstrates better SRI quality than the others.

Figure 18. Comparison of SRI accuracies of different methods for sample objects in MSPLFI dataset.


Conclusions
In this paper, a two-fold SRDI framework is proposed. As transparent objects lack their own textures, combining multisensory imagery cues improves their levels of specular detection and inpainting. Based on the private MSPLFI object dataset, the proposed SRD and SRI algorithms demonstrate better detection accuracy and suppression quality, respectively, than other techniques. In SRD, predictions of multiband changes in the sub-apertures in both polarized and unpolarized images are calculated and combined to obtain the overall specularity in transparent objects. In SRI, firstly, a distance matrix based on four-connected neighboring pixel patterns is calculated, and then the most similar one is selected to replace the specular pixel. The proposed algorithms predict better detection accuracy and inpainting quality in terms of F1-score, G-mean, accuracy, SSIM, PSNR, IMMSE, and MAD than other techniques reported in this paper. The experimental results illustrate the validity and the efficiency of the proposed method based on diverse performance evaluation metrics. They also demonstrate that it significantly improves the SRD metrics (with mean F1-score, G-mean, and accuracy 0.643, 0.656, and 0.981, respectively) and SRI ones (with the mean SSIM, PSNR, IMMSE, and MAD 0.966, 0.735, 0.073, and 0.226, respectively) for 18 transparent objects, each with 121 sub-apertures, in MSPLFI compared with those in the existing literature referenced in this paper.
As an extension of this work, we will investigate a machine learning technique for feature extraction and learning and testing of SRD and SRI performances on the MSPLFI object dataset. As it is known that a transparent object contains the same texture as its background, developing an automatic algorithm for segmenting it from its background in multisensory imagery will also be explored.

Appendix A
Figure A1. Evaluation results for SRD performances of proposed method for 122 specular reflected images (121 sub-apertures + 1 maximum) of 9 sample objects separately using different SRD metrics.
Figure A2. Comparison of SRD accuracies of different methods for sample objects in MSPLFI dataset.


Appendix B
Visualizations of SRI Methods.
Figure A3. Evaluation results for SRI performances of proposed method for 122 specular reflection suppressed images (121 sub-aperture + 1 maximum ones) of 9 sample objects separately using different SRI metrics.
Remote Sens. 2021, 13, x FOR PEER REVIEW
Figure A4. Evaluation results for SRI performances of proposed method for 122 specular reflection suppressed images (121 sub-aperture + 1 maximum ones) of nine sample objects separately using different SRI metrics.