Article

Enhanced Defect Detection in Additive Manufacturing via Virtual Polarization Filtering and Deep Learning Optimization

1 College of Intelligent Science and Technology, National University of Defense Technology, Changsha 410073, China
2 National Key Laboratory of Equipment State Sensing and Smart Support, Changsha 410073, China
3 State Key Laboratory of Functional Crystals and Devices, Shanghai Institute of Ceramics, Chinese Academy of Sciences, Shanghai 201899, China
4 College of Physics, Zhejiang University of Technology, Hangzhou 310023, China
* Author to whom correspondence should be addressed.
Photonics 2025, 12(6), 599; https://doi.org/10.3390/photonics12060599
Submission received: 22 May 2025 / Revised: 4 June 2025 / Accepted: 9 June 2025 / Published: 11 June 2025
(This article belongs to the Special Issue Advances in Micro-Nano Optical Manufacturing)

Abstract

Additive manufacturing (AM) is widely used in industries such as aerospace, medical, and automotive. Within this domain, defect detection technology has emerged as a critical area of research in the quality inspection phase of AM. The main challenge is that, under extreme lighting conditions, strong reflected light obscures defect feature information, leading to a significant decrease in the defect detection rate. This paper introduces a novel methodology for intelligent defect detection in AM components with reflective surfaces, leveraging image enhancement based on virtual polarization filtering (IEVPF) and an improved YOLO V5-W model. The IEVPF algorithm is designed to enhance image quality through the virtual manipulation of light polarization, thereby improving defect visibility. The YOLO V5-W model, integrated with CBAM attention, DenseNet connections, and an EIoU loss function, demonstrates superior performance in defect identification across various lighting conditions. Experiments show a 40.3% reduction in loss, a 10.8% improvement in precision, a 10.3% improvement in recall, and a 13.7% improvement in mAP compared to the original YOLO V5 model. Our findings highlight the potential of combining virtual polarization filtering with advanced deep learning models for enhanced AM surface defect detection.

1. Introduction

AM technology, also known as “3D printing” technology, originated in the 1960s. This technology, known for its layer-by-layer fabrication process, offers a high level of design freedom and cost efficiency [1,2]. Consequently, it provides a strong competitive edge in various industries. Following decades of development, AM technology has become an integral component of a variety of industries, including aerospace, medical, and automotive [3,4,5]. During the AM process, high-energy lasers are employed to melt powder layers, giving rise to the formation of high-temperature melt pools. Upon cooling, these pools adopt the desired shape [6,7]. Different processing conditions directly affect the morphology of melt pools [7,8]. Improper processing parameter settings have been shown to destabilize melt pools, thereby inducing defects such as bumps, holes, and cracks [9,10,11]. These defects reduce the mechanical properties and forming density of the workpiece, degrading its physical performance and severely affecting its actual performance and service life [7,12]. Consequently, effectively detecting defective components and preventing their deployment is of paramount importance.
Traditional non-destructive defect detection methodologies include radiographic testing, ultrasonic testing, liquid penetrant testing, magnetic particle testing, and eddy current testing [13,14,15,16]. However, such methodologies are often characterized by high costs, operational complexity, and slow detection rates, or they exhibit specific limitations on the categories of materials that can be tested, which limits their application [13,14,15,16]. In view of the expanding and diverse market demands, these traditional methods may not consistently align with production requirements. In recent years, deep neural network technology has developed rapidly and achieved significant progress in the field of defect detection [17,18,19,20]. In 1998, LeCun et al. proposed LeNet, which effectively addressed the task of handwritten digit recognition, marking the true advent of Convolutional Neural Networks (CNNs) and propelling object detection algorithms into the deep learning era [21]. In 2013, Ross Girshick et al. proposed R-CNN, applying CNNs to feature extraction and leveraging their excellent feature extraction capability to improve detection rates [22]. In 2015, Kaiming He et al. proposed ResNet, breaking through the limitations of neural network depth and further enhancing detection precision [23]. In the same year, Joseph Redmon et al. proposed YOLO, a method that simultaneously predicts the positions and categories of targets in an image through a unified neural network, achieving fast and efficient object detection [24]. Also in 2015, Shaoqing Ren et al. proposed Faster R-CNN, an algorithm that improved the speed of the training and testing stages while also enhancing detection precision [25]. In 2016, Gao Huang et al. proposed DenseNet, establishing connections between different layers, thereby further alleviating the problem of gradient vanishing and improving the precision of target detection [26]. The advent of these neural network-based studies has precipitated a paradigm shift in the realm of defect detection technology. They have not only improved the precision and processing speed of models but also enabled detection systems to more effectively identify and classify various surface defects. These studies have not only enhanced the performance and reliability of surface defect detection but also broadened their practical application domains.
Despite the existence of numerous superior methods for surface defect detection, their performance is often constrained by the specifics of various application scenarios, thereby preventing them from realizing their full potential [27,28,29]. For instance, inspection models remain highly susceptible to specular reflection interference under complex illumination conditions, leading to defect features being overwhelmed by noise. Additionally, they exhibit inadequate detection accuracy for micro-defects (e.g., cracks smaller than 0.1 mm) and demonstrate a notable generalization bottleneck in scenarios involving multi-material hybrid surfaces. This necessitates the replacement of detection models to accommodate variations in lighting across different manufacturing settings, thereby increasing the complexity and cost of detection. To address these issues, this research proposes a surface defect detection method for workpieces that demonstrates excellent robustness under complex lighting conditions and holds broad application potential. This approach is conducive to reducing the cost of workpiece processing and advancing research in related fields.
The defect detection method proposed in this research encompasses two principal components. Firstly, we developed the IEVPF algorithm, a physical model-based polarized light suppression method. Taking three or more linearly polarized images as input data, the algorithm calculates the incident light intensity and polarization angle parameters for each pixel, dynamically superimposes virtual polarizers on each pixel, realizes an optimized suppression of polarized light, and outputs a filtered image. Owing to its underlying physical model, this technique produces natural and realistic images, offers a strong interpretability of the physical mechanism, a high computational efficiency, and convenient engineering deployment. Secondly, the enhanced images are then utilized to train an improved YOLO V5-W object detection algorithm, culminating in a unified and broadly applicable detection model. To validate the effectiveness of this approach, a series of experiments was designed. Specifically, both the YOLO V5 and YOLO V5-W models were trained on datasets from four polarization directions and an enhanced dataset, thus generating a total of ten trained models. Systematic comparative evaluations were conducted on each model’s loss, precision, recall, and mean Average Precision (mAP) values. Experiments show a 40.3% reduction in loss, a 10.8% improvement in precision, a 10.3% improvement in recall, and a 13.7% improvement in mAP compared to the original YOLO V5 model. The results of these experiments robustly substantiate that virtual polarization filtering technology and the improved YOLO V5-W model are both effective in enhancing the model’s defect detection capabilities.
The structure of this paper is as follows: Section 2 introduces the architecture of the Multi-Source Polarized Imaging System (MPIS), the procedural flow of the IEVPF algorithm and the implementation methods of the defect detection experiments. Section 3 validates the feasibility of virtual polarization filtering based on Maxwell’s electromagnetic field theory and proposes the principle of the virtual polarization filtering algorithm based on the Stokes vector. Section 4 refines the YOLO V5 model, integrating the CBAM attention mechanism, DenseNet dense connections, Carafe upsampling operator, and EIoU loss function, forming the YOLO V5-W model [26,30,31,32]. Section 5 details the virtual polarization filtering image enhancement and surface defect detection experiments, and analyzes the advantages of the proposed algorithmic model based on experimental data. Section 6 summarizes the entire paper and provides a prospective on future research directions.

2. Principle

2.1. Multi-Source Polarized Imaging System

Polarized imaging technology, compared to traditional optical detection methods, possesses distinctive advantages as it can capture the spectral, polarization, and spatial information of the target, and it is extensively applied in industrial imaging, remote sensing, biomedical diagnostics, and military applications [33,34,35,36]. Defect detection based on polarization technology is beneficial for extracting texture structure, surface material, and surface roughness information from the polarization information of the target, effectively enhancing the precision and reliability of detection [37]. Therefore, this research constructed an MPIS based on polarization technology to collect image data, as shown in Figure 1. The system mainly includes the workpiece under test, an LED light source, a linear polarizer, an electric rotating frame, a CMOS camera, and a computer. The linear polarizer used in the experiment is the THORLABS LPVISE100-A, with a diameter of 25.4 mm and a working wavelength range of 400–700 nm; the CMOS image sensor has a resolution of 2448 × 2048, a pixel size of 3.45 μm, and a lens focal length of 50 mm; the electric rotating frame is the Standa FPSTA-8MPR16-1, capable of 360° rotation and a 0.75 arcmin step resolution for controlling the polarizer rotation, equipped with an 8SMC4-USB controller. The computer used for image acquisition and analysis in the experiment is a ThinkPad S2 with an Intel i5-8250U CPU (1.8 GHz). During the experiment, the workpiece under test is illuminated by the LED light source, and the light reflected from the workpiece’s surface is collected by the CMOS image sensor after traversing the polarizer.

2.2. Image Enhancement Based on Virtual Polarization Filtering Algorithm

The IEVPF algorithm comprises six steps: data acquisition, image sampling, determining the input, solving $S_{NR}$, solving $A_{DU}^{N}$, and outputting the result, as shown in Figure 2.
Specifically, the detailed methods of each step are as follows:
Step 1: (Data Acquisition) MPIS captures images of a defective object, obtaining four polarized images from different directions of 0°, 90°, 45°, and 135° under the same ambient lighting conditions (see Figure 2A).
Step 2: (Image Sampling) The computer receives the input of the four polarized images and uses the OpenCV library for sampling to obtain the corresponding grayscale value matrix (see Figure 2B).
Step 3: (Determine Input) Select the pixel to be processed, and from the four grayscale value matrices, select the three smallest values to form the input of the image enhancement algorithm (see Figure 2C). The rationale for this design is as follows: Although the IEVPF algorithm designed in the subsequent section requires only three inputs, the polarization direction of natural light sources is unfixed, making it difficult to ensure that the images selected at three fixed polarization angles are all free from overexposure. In Step 3, selecting the three grayscale value matrices with the smallest values from the four as inputs can effectively avoid overexposure issues in the input images, thereby enhancing the algorithm’s robustness. Furthermore, the number of acquired images need not be limited to four; the greater the number of selectable images at different polarization angles, the stronger the algorithm’s robustness.
Step 4: (Solve $S_{NR}$) Based on the three grayscale values selected as input, determine the matrix $W$ and solve $S_{NR}$ (see Figure 2D).
Step 5: (Solve $A_{DU}^{N}$) According to the algorithm principle, calculate the new grayscale value of the pixel, $A_{DU}^{N}$ (see Figure 2E).
Step 6: (Output the result) If all pixels have been processed, then output the image; otherwise, return to Step 3 (see Figure 2F).
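As an illustration of the sampling and selection part of this flow (Steps 2 and 3), the per-pixel choice of the three smallest grayscale values can be sketched in a few lines of Python with OpenCV and NumPy; the function name and image file names below are hypothetical and do not correspond to a released implementation.

```python
import cv2
import numpy as np

def select_inputs(paths):
    """Sketch of Steps 2-3 of the IEVPF flow: read the four polarized images
    (0°, 45°, 90°, 135°) as grayscale value matrices and keep the three
    smallest grayscale values per pixel, so that overexposed samples are
    excluded from the subsequent per-pixel solve."""
    imgs = np.stack([cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float64)
                     for p in paths])                      # shape (4, H, W)
    order = np.argsort(imgs, axis=0)[:3]                   # indices of the 3 darkest samples
    adu3 = np.take_along_axis(imgs, order, axis=0)         # shape (3, H, W)
    return adu3, order                                     # values + which angles were kept

# Example usage (hypothetical file names):
# adu3, order = select_inputs(["p0.png", "p45.png", "p90.png", "p135.png"])
```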

2.3. Detection Process

Figure 3 demonstrates the implementation method of the YOLO V5-W proposed in this research, covering three main stages: dataset generation, model training, and performance evaluation.
Specifically, the detailed methods of each step are as follows:
Phase 1: (Dataset Generation) This study first identified three types of defects: cracks, bulges, and holes. Images were captured at four polarization directions (0°, 90°, 45°, and 135°) by adjusting the polarizer via a PC under a constant external lighting environment (see Figure 4). Enhanced images were then generated using the IEVPF algorithm based on these four images. The above procedure was repeated to obtain a large dataset, which was subsequently labeled for defects. The dataset includes 500 images for each defect type, totaling 1500 images. The dataset was divided into training, validation, and test sets at an 8:1:1 ratio. Crack defects in the dataset exhibit lengths ranging from 0.1 to 5 mm and widths from 0.05 to 0.3 mm, presenting irregular linear morphologies. Some cracks extend along material textures with branching phenomena. Bulge defects have heights between 0.2 and 3 mm and base diameters from 0.5 to 4 mm, taking shapes of hemispheres, cones, or irregular blocks, with subtle wrinkles on some surfaces. Hole defects have diameters from 0.5 to 4 mm, with cross-sections mostly circular or elliptical, and some edges showing burrs or chipping traces. All defects are randomly distributed on the sample material surface. The light intensity fluctuates between 200 and 1000 lux, and the ambient light incident angle varies from 0° to 60° to simulate defect imaging under extremely complex lighting conditions.
Phase 2: (Model Training) Firstly, mosaic data augmentation is applied to fuse the training images and enhance the robustness of the model [38]. Subsequently, to address the surface defect detection problem, the YOLO V5-W model is designed, integrating the CBAM attention mechanism, DenseNet dense connections, the Carafe upsampling operator, and the EIoU loss function; the technical details are discussed in Section 4. Finally, the model’s hyperparameters, including the learning rate, batch size, and number of iterations, are systematically optimized to enhance the overall performance of the model and ensure its detection precision and stability in various environments.
Phase 3: (Performance Evaluation) Firstly, an index system is established for evaluating model performance, including key metrics such as precision, recall, and mAP. Then, the performance of the 10 trained models is systematically evaluated and compared. Finally, the trained model is deployed in actual defect detection tasks for a conclusive evaluation, thereby verifying its effectiveness and stability in practical applications.

3. The IEVPF Methodology

3.1. Feasibility Proof of IEVPF

In the natural realm, light waves, as a type of electromagnetic wave, usually propagate in the form of planar sinusoidal waves. Accordingly, taking the direction of light wave propagation as the x-axis, the electric field vector $\mathbf{E}(x, t)$ of the light wave can be represented as follows:
$$\mathbf{E}(x, t) = \mathrm{Re}\left[ E\, e^{j(\omega t - kx + \psi)} \right] \mathbf{e}_p$$
where $\mathbf{k}$ is the wave vector, which is defined as follows:
$$\mathbf{k} = \frac{\omega}{c}\,\mathbf{e}_x$$
Here, $E$ is the electric field amplitude, $\omega$ is the angular frequency of the electric field, $t$ is the time of electromagnetic field propagation, $c$ is the speed of light, $\mathbf{e}_x$ is the unit vector in the positive x-axis direction, $\mathbf{e}_p$ is the unit vector of the direction of electric field vibration, that is, the polarization direction of the light, $x$ is the position along the light wave’s propagation direction, and $\psi$ is the electric field’s initial phase.
Under the assumption that a single pixel is solely influenced by a small external point light source and unaffected by light reflections from other regions [39], it can be reasoned that the imaging light wave corresponding to a single pixel exhibits a unified expression.
Therefore, when a single pixel point is taken as the research object, its imaging light wave electric field $\mathbf{E}$ can be represented in complex form:
$$\mathbf{E} = E\, e^{j(\omega t - kx + \psi)}\, \mathbf{e}_p$$
The external environment of the electric field is idealized as a vacuum, and according to electromagnetic field theory, the electric field and its induced magnetic field $\mathbf{B}$ satisfy the Ampere–Maxwell law:
$$\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$
where $\mu_0$ is the vacuum permeability, $\varepsilon_0$ is the vacuum permittivity, and $\mathbf{J}$ is the current density. In a vacuum environment $\mathbf{J} = 0$; taking the curl of both sides of Equation (4) gives the following:
$$\nabla^2 \mathbf{B} = \frac{\partial^2 B}{\partial x^2}\,\mathbf{e}_b = \frac{d^2 B}{d x^2}\,\mathbf{e}_b = \omega \mu_0 \varepsilon_0\, \mathbf{k} \times \mathbf{E} = \frac{\omega^2}{c^3}\, E\, \mathbf{e}_x \times \mathbf{e}_p$$
where $\mathbf{e}_b$ is the magnetic field vibration direction. Solving the differential equation gives the following:
$$\mathbf{B} = \frac{E}{c}\, \mathbf{e}_x \times \mathbf{e}_p$$
Thus, the light wave energy flux density vector $\mathbf{S}_n$ can be obtained as follows:
$$\mathbf{S}_n = \frac{1}{\mu_0}\, \mathrm{Re}\!\left(E\,\mathbf{e}_p\right) \times \mathrm{Re}\!\left(B\,\mathbf{e}_x \times \mathbf{e}_p\right) = \frac{E^2}{\mu_0 c}\cos^2(\omega t - kx + \psi)\, \mathbf{e}_x$$
Then, the light wave average power $P$ can be obtained as follows:
$$P = \frac{1}{T}\int_0^T \left|\mathbf{S}_n\right| dt = \frac{1}{2}\sqrt{\frac{\varepsilon_0}{\mu_0}}\, E^2$$
where $T$ is the light wave period.
Starting from the behavior of photons, the relationship between the grayscale value of a pixel and the average power of the light wave is analyzed. In the MPIS imaging process, when the image is not overexposed, the relationship between a pixel’s grayscale value $A_{DU}$ and the number of excited electrons $N_E$ can be represented as follows:
$$N_E = \alpha \left( A_{DU} - A_{DU0} \right)$$
where $\alpha$ is the camera’s gain, and $A_{DU0}$ is the camera’s bias. The camera’s quantum efficiency $f(\cdot)$ effectively describes the relationship between the number of incident photons $N_L$ and the number of excited electrons $N_E$ as follows:
$$N_E = f(\lambda)\, N_L$$
where $\lambda$ is the light wavelength. Combining Equations (9) and (10) and substituting into the relation $P \cdot \Delta t = N_L \frac{hc}{\lambda}$, we get the following:
$$A_{DU} = \frac{\lambda P \Delta t}{\alpha h c}\, f(\lambda)$$
From Equation (11), it can be seen that the grayscale value of the image is a function of three independent variables: the light wavelength, the average light power, and the exposure time. Furthermore, Equation (7) reveals that, once the light energy flux density is obtained, the incident light polarization information $\mathbf{e}_p$ has been eliminated. This implies that the incident light polarization state cannot be deduced even with knowledge of the wavelength, exposure time, and grayscale value. Additionally, regardless of the incident light’s polarization state, the grayscale value of the image remains invariant provided these quantities are unchanged. This conclusion provides the theoretical basis for the subsequent IEVPF algorithm, which is capable of altering the virtual incident light’s polarization state without compromising the integrity of the image’s original information.

3.2. Principle of IEVPF Algorithm

To facilitate the description of the light polarization state, the Stokes vector $S = (s_0, s_1, s_2, s_3)^T$ is used to represent the light [37]. The $s_0$ component represents the light intensity, the $s_1$ and $s_2$ components represent the linear polarization components, and the $s_3$ component represents the circular polarization component. When light passes through an optical system, its polarization state changes, and the Mueller matrix of the system describes this change [40]. Assuming the system’s Mueller matrix is $M_i$, then we have the following:
$$S_i = M_i S = \begin{pmatrix} m_{i11} & m_{i12} & m_{i13} & m_{i14} \\ m_{i21} & m_{i22} & m_{i23} & m_{i24} \\ m_{i31} & m_{i32} & m_{i33} & m_{i34} \\ m_{i41} & m_{i42} & m_{i43} & m_{i44} \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}$$
where $S_i$ represents the Stokes vector after the light passes through system $i$. Since the circular polarization component $s_3$ of natural light is usually 0, Equation (12) can be reduced to [41]:
$$S_i = M_i S = \begin{pmatrix} m_{i11} & m_{i12} & m_{i13} \\ m_{i21} & m_{i22} & m_{i23} \\ m_{i31} & m_{i32} & m_{i33} \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ s_2 \end{pmatrix}$$
Considering a single pixel as the research object, we obtain the following:
$$s_{i0} = \frac{\alpha h c\, A_{DU\,i}}{\lambda_i f(\lambda_i)} = m_{i11} s_0 + m_{i12} s_1 + m_{i13} s_2$$
where $s_{i0} = P_i \cdot \Delta t_i$ is the power received, over the exposure time, by the lens area corresponding to the pixel point. The values received from the same region by three camera systems with different polarization directions are substituted into Equation (14) and written in matrix form as follows:
$$I = \begin{pmatrix} s_{10} \\ s_{20} \\ s_{30} \end{pmatrix} = \begin{pmatrix} m_{111} s_0 + m_{112} s_1 + m_{113} s_2 \\ m_{211} s_0 + m_{212} s_1 + m_{213} s_2 \\ m_{311} s_0 + m_{312} s_1 + m_{313} s_2 \end{pmatrix} = \begin{pmatrix} m_{111} & m_{112} & m_{113} \\ m_{211} & m_{212} & m_{213} \\ m_{311} & m_{312} & m_{313} \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ s_2 \end{pmatrix} = W S$$
Assuming that the external environmental changes are very small, it can be considered that $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$. Thus, based on Equation (11), we use the grayscale values of the pixels to characterize the matrix $I$:
$$I = \begin{pmatrix} s_{10} \\ s_{20} \\ s_{30} \end{pmatrix} = \frac{\alpha h c}{\lambda f(\lambda)} \begin{pmatrix} A_{DU1} \\ A_{DU2} \\ A_{DU3} \end{pmatrix}$$
When the matrix $W$ is invertible, the Stokes vector of the incident light can be solved from the image’s grayscale values, that is:
$$S(\lambda) = W^{-1} I = \frac{\alpha h c}{\lambda f(\lambda)}\, W^{-1} \begin{pmatrix} A_{DU1} \\ A_{DU2} \\ A_{DU3} \end{pmatrix}$$
Furthermore, assuming that the exposure times of these three systems are the same, we have $\Delta t_1 = \Delta t_2 = \Delta t_3 = \Delta t$. Hence, the incident light’s energy flux density Stokes vector is defined as follows:
$$S_p(\lambda, \Delta t) = \frac{S(\lambda)}{\Delta t}$$
Based on the first three components of the Stokes vector, the polarization angle $A_{OP}$ of the incident light can be obtained, represented as follows:
$$A_{OP} = \frac{1}{2} \arctan \frac{s_2}{s_1}$$
For a single pixel point, $A_{OP}$ is a quantity independent of $\lambda$ and $\Delta t$. By superimposing a virtual polarizer with an angle of $A_{OP} + \frac{\pi}{2}$ on this area, polarized light can be effectively suppressed. The reduced Mueller matrix of this polarizer can be expressed as follows:
$$M(A_{OP}) = \frac{1}{2} \begin{pmatrix} 1 & -\cos 2A_{OP} & -\sin 2A_{OP} \\ -\cos 2A_{OP} & \cos^2 2A_{OP} & \frac{\sin 4A_{OP}}{2} \\ -\sin 2A_{OP} & \frac{\sin 4A_{OP}}{2} & \sin^2 2A_{OP} \end{pmatrix}$$
The light’s energy flux density after passing through the polarizer is the following:
$$s_{p0}(A_{OP}) = \frac{1}{2} \begin{pmatrix} 1 & -\cos 2A_{OP} & -\sin 2A_{OP} \end{pmatrix} S_p(\lambda, \Delta t)$$
Since the light’s wavelength does not change after passing through the polarizer, after substituting into Equation (11), the pixel’s grayscale value $A_{DU}^{N}$ after virtual polarization filtering is the following:
$$A_{DU}^{N} = \frac{\lambda\, s_{p0}(A_{OP})\, \Delta t_N}{\alpha h c}\, f(\lambda) = \frac{1}{2} \frac{\Delta t_N}{\Delta t} \begin{pmatrix} 1 & -\cos 2A_{OP} & -\sin 2A_{OP} \end{pmatrix} W^{-1} \begin{pmatrix} A_{DU1} \\ A_{DU2} \\ A_{DU3} \end{pmatrix}$$
where $\Delta t_N$ is the ideal exposure time set on the computer side; changing its value changes the image’s exposure. Hence, $A_{DU}^{N}$ is a quantity independent of $\lambda$, so the algorithm can conduct virtual polarization filtering according to the input images and the set exposure time ratio to obtain the desired image, without requiring the wavelength, exposure time, or other information about the incident light.
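As a minimal sketch of this per-pixel computation, the following Python function carries out the Stokes solve, the polarization angle of Equation (19), and the virtual polarizer filtering. Two assumptions are made for illustration only and are not stated in the paper: each row of $W$ is taken as the first row of the reduced Mueller matrix of an ideal linear polarizer at the corresponding acquisition angle, and the exposure-time ratio $\Delta t_N / \Delta t$ is passed in as a user-set constant.

```python
import numpy as np

def filter_pixel(adu3, angles_deg, t_ratio=1.0):
    """Sketch of the per-pixel IEVPF solve for one pixel.

    adu3       : the three non-overexposed grayscale values of this pixel
    angles_deg : the polarizer angles at which those values were captured
    t_ratio    : assumed ideal-to-actual exposure-time ratio (Δt_N / Δt)
    """
    th = np.deg2rad(np.asarray(angles_deg, dtype=np.float64))
    # Assumption: each row of W is the first row of the reduced Mueller matrix
    # of an ideal linear polarizer at angle theta_i: 0.5 * [1, cos 2θ, sin 2θ].
    W = 0.5 * np.stack([np.ones_like(th), np.cos(2 * th), np.sin(2 * th)], axis=1)
    s = np.linalg.solve(W, np.asarray(adu3, dtype=np.float64))  # reduced Stokes vector (grayscale units)
    aop = 0.5 * np.arctan2(s[2], s[1])                          # polarization angle, Eq. (19)
    # Virtual polarizer oriented at aop + pi/2 to suppress the polarized part.
    row = 0.5 * np.array([1.0, np.cos(2 * aop + np.pi), np.sin(2 * aop + np.pi)])
    return t_ratio * row @ s                                    # new grayscale value

# Example: a pixel sampled at 0°, 45°, 90° with grayscale values 120, 80, 60.
# print(filter_pixel([120, 80, 60], [0, 45, 90]))
```

Because the camera constants $\alpha$, $h$, $c$, $\lambda$, and $f(\lambda)$ cancel between the forward and inverse relations, the sketch works directly in grayscale units, mirroring the conclusion above that no knowledge of the wavelength or absolute exposure time is needed.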

3.3. Optimization and Processing for Overexposure in the IEVPF Algorithm

3.3.1. IEVPF Algorithm Optimization

Based on the above principle, IEVPF performs polarization filtering on every point of the image, which may suppress crucial information in the image, thereby reducing the precision of subsequent target detection. Hence, we regard non-polarized light as information and polarized light as noise. When the signal-to-noise ratio $S_{NR}$ is greater than a threshold $K$, the information at that point can be considered richer than the noise, and there is thus no need to apply a virtual polarizer for filtering. Hence, the following processing is applied:
$$A_{DU}^{N} = \begin{cases} \dfrac{\lambda\, s_{p0}(A_{OP})\, \Delta t_N}{\alpha h c}\, f(\lambda), & S_{NR} \le K \\[2mm] \dfrac{\lambda\, s_{p0}\, \Delta t_N}{\alpha h c}\, f(\lambda), & S_{NR} > K \end{cases}$$
$$S_{NR} = \frac{s_0 - \sqrt{s_1^2 + s_2^2}}{\sqrt{s_1^2 + s_2^2}}$$

3.3.2. Processing for Overexposure

In the above algorithm, both the calculation of the Stokes vector and the updating of the pixel grayscale values depend on Equation (11). However, this equation is only valid when the input image is not overexposed. When the input image is overexposed, the algorithm may produce incorrect information, thereby reducing the imaging quality and affecting the precision of subsequent target detection. Since overexposure is inevitable under complex lighting conditions, the following method is adopted to address this issue.
Since the circular polarization component is typically zero in nature, light waves can be regarded as a combination of unpolarized light and linearly polarized light. The corresponding reduced Stokes vector relationship can be expressed as follows:
$$S = S_u + S_p = \begin{pmatrix} s_{u0} \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} s_{p0} \\ s_{p1} \\ s_{p2} \end{pmatrix}$$
where $S_u$ is the reduced Stokes vector of the non-polarized light, and $S_p$ is the reduced Stokes vector of the polarized light. The polarization angle of the linearly polarized part is denoted as $\delta_1$, which can be obtained from Equation (19). Let the angle of the polarizer be $\delta_2$. Then, the light intensity $s_0$ passing through this polarizer is given by the following:
$$s_0 = \frac{1}{2} s_{u0} + s_{p0} \cos^2(\delta_1 - \delta_2)$$
If the input image exhibits overexposure, two scenarios can be distinguished:
(1)
If $s_{u0} \gg s_{p0}$, then $s_0 \approx \frac{1}{2} s_{u0}$. Thus, all input images must be overexposed, the algorithm cannot recover any information from them, and the output grayscale value is set to 255.
(2)
If $s_{u0}$ is not much greater than $s_{p0}$, then non-overexposed images may exist among the inputs. If they exist, the grayscale value $A_{DU}^{N}$ calculated by the original method must be greater than the product of a non-overexposed grayscale value $A_{DU\,i}$ and the exposure time ratio $\frac{\Delta t_N}{\Delta t}$, and therefore it does not play a filtering role. Consequently, we require $A_{DU}^{N} \le A_{DU\,i}\,\frac{\Delta t_N}{\Delta t}$ to handle overexposure.
For simplicity, the processing threshold is set to the same threshold $K$ as in Equation (23), and the condition “the input image exhibits overexposure” is integrated into the update method, yielding the update rule of Equation (27):
$$A_{DU}^{N} \leftarrow \min\left( A_{DU}^{N},\ A_{DU1}\frac{\Delta t_N}{\Delta t},\ A_{DU2}\frac{\Delta t_N}{\Delta t},\ A_{DU3}\frac{\Delta t_N}{\Delta t} \right), \quad \text{where } S_{NR} \le K$$
This processing handles overexposed inputs effectively.
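A sketch of how the SNR gate of Equations (23) and (24) and the clamping rule of Equation (27) might be combined per pixel is given below; the threshold $K$, the exposure-time ratio, and the small constant guarding against division by zero are illustrative assumptions, and the inputs follow the conventions of the previous sketch (Stokes vector in grayscale units).

```python
import numpy as np

def update_gray(adu_filtered, adu3, s, t_ratio=1.0, K=1.0):
    """Sketch of the optimized per-pixel update.

    adu_filtered : grayscale value produced by the virtual polarizer (Eq. (22))
    adu3         : the three input grayscale values of this pixel
    s            : reduced Stokes vector of the pixel in grayscale units
    """
    pol = np.hypot(s[1], s[2])                       # linearly polarized intensity
    snr = (s[0] - pol) / max(pol, 1e-12)             # Eq. (24): unpolarized / polarized
    if snr > K:                                      # mostly unpolarized: keep total intensity
        return t_ratio * s[0]
    # Polarization-dominated pixel: apply the filtered value, clamped by the
    # scaled input grayscales to guard against overexposed samples (Eq. (27)).
    return min(adu_filtered, *(t_ratio * np.asarray(adu3, dtype=np.float64)))
```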

4. The Improvement of the Automated Deep Learning-Based Defect Detection Algorithm

4.1. YOLO V5 Model

The YOLO (You Only Look Once) framework is a suite of high-performance, real-time object detection algorithms, renowned for its single-stage detector architecture [24,42,43,44,45,46,47]. This architecture facilitates efficient target detection without compromising detection precision and real-time inference capabilities. YOLO conceptualizes object detection as a regression problem, directly forecasting the bounding box coordinates and class probabilities within a single neural network pass, thereby streamlining the object detection process. This innovative approach has consolidated YOLO’s status as a preferred choice among numerous researchers and practitioners in the field. The evolution of the YOLO algorithm has reached its tenth iteration, with YOLOv5 representing a significant advance over previous versions, which has led to substantial enhancements in detection precision [48,49]. The YOLOv5 model has demonstrated consistent superiority in performance across a range of application contexts, thereby highlighting its efficacy and reliability in detection tasks.
Motivated by these advancements, this research adopts YOLOv5 as the foundational detection model and proposes further enhancements to bolster the precision and robustness of defect detection in images augmented by the IEVPF algorithm. The objective of these improvements is to leverage the computational efficiency of YOLOv5, in conjunction with the image enhancement capabilities of the IEVPF algorithm, to optimize the overall performance of the defect detection system.

4.2. YOLO V5-W

To enhance the practical performance of YOLOv5 in surface defect detection tasks, this research has made four improvements to YOLOv5, forming an intelligent surface defect recognition model called YOLOv5-W. The specific improvements are outlined in four parts as follows:
Part 1: Replace some of the C3 (CSP Bottleneck with three convolutions) modules with DC3 (DenseNet-C3) modules to enhance the model’s ability to extract and fuse deep-layer features, taking advantage of the dense connection characteristics of DenseNet [26], effectively improving the efficiency of information flow and gradient propagation;
Part 2: Introduce the Convolutional Block Attention Module (CBAM) attention mechanism to enhance the expressiveness of the feature map and strengthen the model’s ability to detect small targets [30];
Part 3: Replace the upsampling operator with the Carafe upsampling operator to improve the detail and precision of the upsampling results, and improve the quality of detail recovery [31];
Part 4: Use the EIoU loss function instead of the CIoU loss function, improving the model’s convergence speed and optimization capability through the improved bounding box regression mechanism, while enhancing the detection performance of small targets [32]. The network structure of the improved model is depicted in Figure 5.
To accommodate the input dimensions of 640 × 640 × 3, this research duplicates the pixel values from the single-channel image across all three color channels. The backbone network employs the CSPDarkNet53 architecture, which integrates CBS and C3 modules for the extraction and fusion of shallow features. These are then utilized by DC3 modules for the extraction of deep-layer features, which are subsequently integrated with the shallow-layer features. The CBAM and SPPF modules further enhance feature representation capabilities, optimize computational efficiency, and bolster the model’s robustness. In the neck network, a Path Aggregation Network (PANet) structure is implemented, deepening the network architecture through top-down and bottom-up feature fusion mechanisms, integrating multi-scale features, and enhancing the semantic representation of defects across three distinct scales of feature maps. To address the subtle defect features in the images, this research introduces DC3 and CBAM modules into the shallow layers of the neck network. The implementation of these modules results in the extraction of more refined semantic features, an enhanced feature expressiveness, and an effective suppression of background noise. Ultimately, the head network extracts feature maps at three scales, 80 × 80 × 128, 40 × 40 × 256, and 20 × 20 × 512, from the neck network and generates the final output vector comprising category probabilities, confidence scores, and bounding boxes during the inference process. This approach significantly enhances the extraction of deep features without substantially increasing the model’s complexity and computational expense. Consequently, this has led to an effective improvement in the performance of surface defect detection.
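As a small illustration, the single-channel enhanced image can be prepared for the 640 × 640 × 3 input by resizing and duplicating the grayscale values across the three color channels; the file name and interpolation mode below are placeholders, not taken from the paper.

```python
import cv2
import numpy as np

# Sketch: expand a grayscale enhanced image to the three-channel input tensor
# expected by the YOLO V5-W backbone (file name and interpolation are assumptions).
gray = cv2.imread("enhanced.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.resize(gray, (640, 640), interpolation=cv2.INTER_LINEAR)
rgb = np.repeat(gray[:, :, None], 3, axis=2)          # shape (640, 640, 3)
```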

4.2.1. Algorithm Optimization

The C3 module in the original YOLO V5 model is based on the ResNet architecture, and its feature information propagation expression is [23]:
$$x_l = H_l(x_{l-1}) + x_{l-1}$$
where $H_l(\cdot)$ is the nonlinear transformation function, and $x_l$ is the output of the l-th layer. The construction of the C3 module, as described, is depicted in Figure 6.
The C3 module bolsters the network’s feature extraction and fusion capabilities via residual connections, thereby augmenting the model’s accuracy and performance in object detection tasks. However, the inadequate depth of the C3 module and insufficient direct connections from intermediate layers to the output layer can lead to incomplete extraction of feature semantic information, as well as a potential loss of critical information from intermediate layers. Inspired by DenseNet, the DC3 module is proposed to address this challenge [26]. The feature information propagation mechanism of DenseNet establishes direct connections between the feature maps of all preceding layers, enabling each layer to directly access the features of all prior layers. This design enhances the continuity of feature transmission and significantly improves the network’s representational capacity. Specifically, the feature propagation formulation in DenseNet is as follows:
$$x_l = H_l\big(\big[x_0, x_1, \ldots, x_{l-1}\big]\big)$$
The DC3 module constructed in this manner is illustrated in Figure 7.
The DC3 module adopts the dense connection architecture of DenseNet, establishing direct connections between the feature maps of each layer and all preceding layers. This design enables efficient feature reuse and effective gradient propagation. This design alleviates the vanishing gradient problem and enhances feature utilization efficiency, thereby further boosting the performance in object detection tasks.
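To make the dense-connection idea concrete, a minimal PyTorch block in this spirit is sketched below; the growth rate, depth, and activation are illustrative choices and not the exact DC3 configuration used in YOLO V5-W.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal sketch of dense connectivity: each layer receives the
    concatenation of all preceding feature maps, as in DenseNet."""
    def __init__(self, channels: int, growth: int = 32, layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(growth),
                nn.SiLU(inplace=True),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            # H_l acts on the concatenation [x0, x1, ..., x_{l-1}].
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

# Example: a 64-channel feature map of size 80 x 80.
# y = DenseBlock(64)(torch.randn(1, 64, 80, 80))   # -> (1, 64 + 3*32, 80, 80)
```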

4.2.2. Introduction of Attention Mechanism

In surface defect images, defect regions typically occupy a minimal number of pixels, classifying them as small-object detection cases. Such cases are highly susceptible to interference from the background and other irrelevant factors. To address the absence of an attention mechanism in the original network, this study introduces the CBAM attention mechanism at the Backbone and Neck stages, thereby enhancing the network’s capability to focus on critical targets [30]. Through the adaptive modulation of feature map weights via channel and spatial attention mechanisms, CBAM effectively enhances the network’s capability to detect small targets and fine-grained features [50]. The structure of the mechanism is presented in Figure 8. The CBAM includes two independent submodules, CAM (Channel Attention Module) and SAM (Spatial Attention Module), which are attention mechanisms acting on channels and spaces, respectively.
(1)
The expression for the channel attention mechanism CAM is as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F_{avg}^{c})) + W_1(W_0(F_{max}^{c}))\big)$$
where $\sigma$ represents the sigmoid activation function, and $W_0$ and $W_1$ represent the weights of the shared multi-layer perceptron. The specific algorithm flow of CAM is as follows: First, apply global max pooling and global average pooling to the input feature map $F \in \mathbb{R}^{H \times W \times C}$ to obtain two feature maps of size $1 \times 1 \times C$. These two feature maps are then fed into an MLP comprising two layers, generating two corresponding outputs. Subsequently, these two outputs are integrated via element-wise addition and subjected to a sigmoid activation function, thereby generating the channel attention feature $M_c$. This channel attention mechanism adaptively modulates the importance of each channel in the feature map, thereby enhancing the network’s sensitivity to critical features.
(2)
The expression for the spatial attention mechanism SAM is the following:
$$M_s(F) = \sigma\big(f^{7\times7}\big(\big[\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)\big]\big)\big) = \sigma\big(f^{7\times7}\big(\big[F_{avg}^{s}; F_{max}^{s}\big]\big)\big)$$
where $f^{7\times7}$ represents a convolution operation with a kernel size of 7 × 7. The detailed algorithmic procedure of SAM is outlined as follows: First, multiply the channel attention feature $M_c$ with the original feature map $F$ to generate the input feature map $F'$ for the spatial attention mechanism. Subsequently, global max pooling and global average pooling are applied to the feature map $F'$ to obtain two feature maps with dimensions $H \times W \times 1$. Next, concatenate these two feature maps and process them through a convolution operation with a kernel size of 7 × 7, generating a spatial attention feature map. Finally, the feature map is subjected to a sigmoid activation function, yielding the spatial attention feature $M_s$. Ultimately, multiply $M_s$ with the feature map $F'$ to generate the optimized feature output. This spatial attention mechanism enhances the network’s sensitivity to critical spatial regions by adaptively modulating the importance of spatial locations in the feature map.
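A compact PyTorch sketch of the CBAM block (channel attention followed by spatial attention) is given below; the reduction ratio and kernel size follow common CBAM settings and are assumptions of this illustration rather than the exact configuration used in YOLO V5-W.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sketch of a CBAM block: channel attention (CAM) then spatial attention (SAM)."""
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP: W1(W0(.))
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: sigmoid(conv7x7([AvgPool(F'); MaxPool(F')])).
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Example: y = CBAM(128)(torch.randn(1, 128, 40, 40))
```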

4.2.3. The Improvement of the Upsampling Operator

The original YOLOv5 model utilizes nearest-neighbor interpolation for up-sampling in the Neck component. The underlying principle of this method is to assign the grayscale value of the output pixel to that of the nearest neighboring pixel in the original image. Specifically, in the upsampling process, nearest-neighbor interpolation determines the pixel in the input image closest to the target location and uses its value to assign to the corresponding pixel in the output image. This method streamlines the interpolation process and preserves the original image characteristics, as depicted in Figure 9. However, this approach is prone to induce aliasing artifacts and a loss of fine details in the upsampled image, consequently compromising the accuracy of defect detection [51]. Thus, more sophisticated interpolation methods are frequently adopted to enhance feature recovery and improve image reconstruction accuracy.
Therefore, the Carafe upsampling operator is adopted in this model. The working principle of this operator is illustrated in Figure 10 [31].
The Carafe upsampling operator comprises two core modules: the upsampling kernel prediction module and the feature reorganization module.
(1)
The upsampling kernel prediction module initially compresses the channels of the feature map, converting the size of the original feature map from $H \times W \times C$ to $H \times W \times C_m$, where $C_m$ is the number of compressed channels, implemented by a $1 \times 1$ convolution operation. Next, a further convolution operation is performed on the compressed feature map to generate a feature map of size $H \times W \times \sigma^2 k_{up}^2$. Here, $\sigma$ represents the upsampling scale factor, and $k_{up}$ is the size of the upsampling kernel. Finally, this feature map is expanded into an upsampling kernel of shape $\sigma H \times \sigma W \times k_{up}^2$. To ensure the stability of the convolution operation, each upsampling kernel is normalized such that the sum of its weights equals 1, thereby preserving the overall energy and information quantity in the feature map.
(2)
Subsequently, the feature reorganization module employs the upsampling kernel to project the feature information from the low-resolution feature map onto the high-resolution domain. More specifically, this process employs the predicted upsampling kernel to compute pixel-wise weighted sums for each spatial location, thereby generating a high-fidelity high-resolution feature map. This approach significantly restores image details and improves image clarity through precise weighted operations, thus enhancing the spatial resolution and feature representation of the image.
The Carafe operator exhibits a large receptive field, thereby enabling an efficient aggregation of global-scale image information. This operator enables content-aware processing for specific instances, thereby enhancing the representational capability of feature maps through the dynamic generation of adaptive convolution kernels. Compared with traditional methods, the Carafe operator performs superiorly in terms of computational overhead, with lower computational costs, a smaller model size, and a higher computational efficiency. This characteristic allows Carafe to enhance image reconstruction and feature extraction performance while maintaining faster processing speeds.
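The kernel prediction and feature reorganization stages can be sketched in PyTorch as follows; the compressed channel count, kernel sizes, and tensor layout are illustrative assumptions of a CARAFE-style operator, not the exact implementation adopted in this model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CarafeUpsample(nn.Module):
    """Sketch of a CARAFE-style content-aware upsampler: predict a normalized
    reassembly kernel per output location, then take weighted sums of the
    corresponding k_up x k_up neighborhoods of the input feature map."""
    def __init__(self, channels: int, scale: int = 2, c_m: int = 64,
                 k_up: int = 5, k_enc: int = 3):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_m, 1)                    # channel compression (1x1 conv)
        self.encode = nn.Conv2d(c_m, scale * scale * k_up * k_up,
                                k_enc, padding=k_enc // 2)             # kernel prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # Predict s*s kernels (each k*k) per low-resolution location and normalize them.
        kernels = self.encode(self.compress(x)).view(b, s * s, k * k, h, w)
        kernels = F.softmax(kernels, dim=2)
        # Gather the k x k neighborhood of every low-resolution location.
        patches = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h, w)
        # Weighted reassembly: each sub-pixel kernel produces one output sample.
        out = torch.einsum('bckhw,bskhw->bcshw', patches, kernels)
        out = out.reshape(b, c * s * s, h, w)
        return F.pixel_shuffle(out, s)                                 # (b, c, s*h, s*w)

# Example: y = CarafeUpsample(256)(torch.randn(1, 256, 20, 20))  # -> (1, 256, 40, 40)
```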

4.2.4. The Improvement of the Loss Function

The precision, training efficiency, and robustness of an object detection model are significantly impacted by the selection of the loss function. Choosing an appropriate loss function can significantly enhance the overall performance of the model. The original YOLOv5 model employs the CIoU (Complete Intersection over Union) as the loss function [52]. CIoU introduces a ratio measurement term between the predicted box and the true box based on DIoU (Distance Intersection over Union), effectively accelerating the regression speed of the predicted box and improving training efficiency [53]. The mathematical expression is as follows:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \beta v$$
$$\beta = \frac{v}{1 - IoU + v}$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
where IoU is the ratio of the intersection area of the predicted box and the true box to their union area, $\rho(\cdot)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing rectangle. Meanwhile, $w^{gt}$, $h^{gt}$, and $b^{gt}$ are the width, height, and center point coordinate of the ground-truth box, while $w$, $h$, and $b$ are the width, height, and center point coordinate of the predicted box. The physical meanings of the parameters are presented in Figure 11.
Although the CIoU loss function has demonstrated effectiveness in bounding box regression tasks, it still suffers from several notable limitations [54]. For example, when the aspect ratios of the predicted and ground-truth bounding boxes are in a proportional relationship, the aspect ratio penalty term in the CIoU loss function may become ineffective. This indicates that the loss function fails to effectively guide the optimization process in such scenarios. In addition, based on the gradient formula for the width and height of the predicted bounding box, an increase in the value of one dimension necessitates a corresponding decrease in the other dimension. This attribute confines the ability of the box’s size to increase or decrease simultaneously during the regression process. Consequently, CIoU may not fully optimize the box’s shape and position in scenarios with inconsistent aspect ratios, thereby affecting the detection precision and the model’s generalization capability.
To address the issues, this model introduces the EIoU (Enhanced Intersection over Union) to replace CIoU [32]. The expression is as follows:
$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{w_c^2 + h_c^2} + \frac{\rho^2\left(w, w^{gt}\right)}{w_c^2} + \frac{\rho^2\left(h, h^{gt}\right)}{h_c^2}$$
where L IoU denotes the Intersection over Union (IoU) loss, L dis denotes the distance loss, and L asp denotes the aspect ratio loss. Meanwhile, w c and h c are the width and height of the smallest enclosing box that covers both the predicted and target bounding boxes, respectively. The EIoU loss function substantially enhances the accuracy and convergence rate of bounding box regression by directly minimizing the discrepancies in width and height between the predicted and ground-truth bounding boxes. This loss function not only focuses on the overlap region between the predicted and ground-truth bounding boxes but also considers the quantitative discrepancies in their width and height, thereby providing more precise gradient information during the optimization process. This approach enables the model to adjust bounding box dimensions more efficiently, reducing the geometric differences between predicted and ground-truth bounding boxes. Thereby, it enhances the accuracy of object detection and the training efficiency of the model. Consequently, the EIoU loss function demonstrates a superior performance and increased robustness in handling various target scales and shapes.
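A sketch of the EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format is given below; the box layout and the numerical epsilon are assumptions of this illustration, not the training code used in this work.

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Sketch of the EIoU loss for boxes given as (x1, y1, x2, y2)."""
    # Intersection over Union.
    ix1, iy1 = torch.max(pred[..., 0], target[..., 0]), torch.max(pred[..., 1], target[..., 1])
    ix2, iy2 = torch.min(pred[..., 2], target[..., 2]), torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Smallest enclosing box (w_c, h_c) and normalized center distance.
    wc = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    hc = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    dcx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dcy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2
    dist = (dcx ** 2 + dcy ** 2) / (wc ** 2 + hc ** 2 + eps)
    # Width and height discrepancy terms.
    asp = (w1 - w2) ** 2 / (wc ** 2 + eps) + (h1 - h2) ** 2 / (hc ** 2 + eps)
    return 1 - iou + dist + asp

# Example: eiou_loss(torch.tensor([[0., 0., 4., 4.]]), torch.tensor([[1., 1., 5., 5.]]))
```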

5. Experimental Results and Discussion

The experiments of this research consist of two parts: the IEVPF experiment and the surface defect detection experiment. Table 1 shows the equipment employed to complete these two experiments.

5.1. IEVPF Experiment

In the field of optics, the suppression of polarized light using polarizers represents a classic and widely adopted approach, which achieves the modulation of polarized light by leveraging the characteristic of selective transmission for light with specific polarization directions. In this context, the experiment designs a comparative verification procedure by specifically selecting four typical images with distinct polarization directions as control samples. A multi-dimensional comparative analysis is conducted between these samples and the glare suppression results obtained via the IEVPF method. This comparative design enables a comprehensive validation of the IEVPF method’s effectiveness from different polarization angles, highlighting its suppression advantages in complex polarized light environments.
Additionally, in subsequent defect detection research, a similar comparative verification strategy is employed. By contrasting the detection results of conventional methods with those obtained after introducing IEVPF pretreatment, this approach further corroborates the significant enhancement of detection accuracy and reliability by this method, as evidenced from practical application scenarios.

5.1.1. IEVPF Evaluation Criteria

In this part of the experiment, the grayscale histogram of the image is used to evaluate the results. The grayscale histogram can effectively visualize the frequency distribution of pixels at different grayscale levels, thereby facilitating the analysis of the image’s contrast, brightness, and dynamic range. In this experiment, based on the algorithm principle, most pixels in the image are subjected to virtual polarization filtering in the optimal polarization direction. The expected result is that, in the grayscale histogram of the ideal output image, most areas with low and medium grayscale values will exhibit a downward trend. This characteristic demonstrates that through virtual polarization filtering, the grayscale values in low-contrast regions are reduced, thereby achieving background suppression and enhancing the image’s visual clarity and feature discriminability.
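A minimal sketch of this evaluation with OpenCV is shown below; the file name and the grayscale interval boundaries are illustrative placeholders, chosen to match the interval partition used in the analysis that follows.

```python
import cv2

# Sketch: compute the grayscale histogram of an enhanced image and count
# pixels per grayscale interval (file name is a placeholder).
img = cv2.imread("enhanced.png", cv2.IMREAD_GRAYSCALE)
hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()

edges = [0, 51, 102, 153, 204, 256]                    # five grayscale intervals
counts = [int(hist[a:b].sum()) for a, b in zip(edges[:-1], edges[1:])]
print(dict(zip(["[0,50]", "[51,101]", "[102,152]", "[153,203]", "[204,255]"], counts)))
```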

5.1.2. IEVPF Experimental Results and Discussion

(1)
IEVPF’s results
The experimental results of image enhancement by IEVPF and their histograms are shown in Figure 12 and Figure 13.
(2)
Analysis of experimental results
Compare the distribution of grayscale values in each image interval as shown in Table 2.
After image enhancement processing, compared with the images at 0°, 45°, and 90° polarization directions, the number of pixels with grayscale values greater than 204 has decreased by two orders of magnitude, demonstrating a significant suppression effect on overexposure. Compared with the image at the 135° polarization direction, the number of pixels with grayscale values above 152 has minimal change, only decreasing by about 7%. However, in the low grayscale value area, the number of pixels with grayscale values in the range [0, 50] has increased by 33%, while the number of pixels with grayscale values in the range [51, 101] has decreased by 25%. This change indicates that the enhanced image retains relatively stable features in high-grayscale regions, while the overall grayscale level in low-grayscale regions is decreased, thereby effectively mitigating background noise. This processing approach contributes to improving the model’s sensitivity and accuracy in defect detection tasks, as background noise reduction can accentuate the features of target regions, making defect identification more distinct.

5.2. Surface Defect Detection Experiment

5.2.1. Evaluation Criteria

In this part of the experiment, the evaluation criteria employed are precision, recall, and the mAP value, which are used to verify the practical performance of the model proposed in this paper [48].
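For reference, these metrics follow their standard definitions:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{AP} = \int_0^1 P(R)\, dR, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
where TP, FP, and FN denote true positives, false positives, and false negatives, $P(R)$ is the precision–recall curve of one class, and $N$ is the number of defect classes (three in this work).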

5.2.2. Hyperparameter Settings

The hyperparameters configured for this experiment were set to 200 epochs, a batch size of 4, with the remaining settings adhering to the yolov5s configuration.

5.2.3. Results and Discussion

Loss
The following Figure 14 illustrates the training loss values of YOLO V5 and YOLO V5-W on five datasets during the training process, totaling 10 models.
In the figure, YOLO V5-X and YOLO V5-W-X represent the YOLO V5 model and the improved YOLO V5-W model trained on dataset X, respectively. The following observations can be made:
(1)
For the same model, the model trained on the IEVPF dataset has a lower Loss value than the model trained on the datasets of the four polarization directions.
(2)
For the same dataset, the Loss value of YOLO V5-W is lower than that of YOLO V5.
(3)
The YOLO V5-W model trained on the IEVPF dataset has the lowest Loss value.
These results (see Figure 14 and Table 3) indicate that YOLO V5-W exhibits a higher localization precision, classification precision, and network confidence compared to YOLO V5. Additionally, the results further demonstrate that applying virtual polarization filtering to images can significantly enhance the YOLO model’s performance in object detection tasks. These findings highlight the significance of virtual polarization filtering in data preprocessing and its potential for improving model performance.
(1)
On these five datasets, the Loss values are reduced by 51.1%, 34.5%, 50.6%, 32.0%, and 33.1%, with an average reduction of 40.3%. It is evident that across all datasets, the YOLOv5-W model demonstrates superior performance compared to the YOLOv5 model.
(2)
Compared to the YOLO V5 models trained on the four polarization direction datasets, the YOLO V5 model trained on the IEVPF dataset has Loss values reduced by 37.2%, 24.7%, 31.0%, and 12.6%, an average reduction of 26.4%. The YOLO V5-W model trained on the IEVPF dataset has Loss values reduced by 14.0%, 23.1%, 6.7%, and 14.0% compared to the models trained on the four polarization direction datasets, with an average reduction of 14.5%. These results not only demonstrate that the virtual polarization filtering-enhanced dataset is better suited for target detection tasks but also highlight the strong robustness of the YOLOv5-W model, enabling it to effectively adapt to dataset variations.
Precision
The precision results during the training process are presented in Figure 15. Observing the first five charts, it is evident that the YOLO V5-W model achieves a higher precision during the training process than the YOLO V5 model. The “origin” curve in Figure 15F represents the average precision of the models trained on the four different polarization direction datasets, that is, the average of the eight curves in the first four charts, while the “Enhance” curve represents the average precision of the models trained on the IEVPF dataset, that is, the average of the two curves in the fifth chart. These results indicate that employing the virtual polarization filtering-enhanced dataset and the improved YOLO V5-W model both enhance the model’s precision. Table 4 lists the average precision of each model after 50 iterations.
(1)
The training results on the five datasets illustrate that the precision of the YOLO V5-W model has increased by 11.8%, 14.6%, 7.3%, 9.3%, and 11.2%, with an average increase of 10.8%. This result indicates that the YOLOv5-W model exhibits a higher precision across all datasets when compared to the YOLOv5 model.
(2)
The YOLO V5 model trained on the IEVPF dataset has a precision increased by 3.0%, 7.1%, −2.4%, and 3.4%, compared to the models trained on the four polarization direction datasets, with an average increase of 2.8%. The YOLO V5-W model trained on the IEVPF dataset has a precision increased by 2.4%, 3.7%, 1.52%, and 5.33% compared to the models trained on the four polarization direction datasets, with an average increase of 3.2%. These results demonstrate that training models with the IEVPF dataset can enhance model detection accuracy, thereby validating the potential of virtual polarization filtering as an effective data augmentation technique for improving the performance of surface defect detection models.
Figure 15. The precision of each model during the training process. (A) The precision of YOLO V5 and YOLO V5-W trained on the 0° polarization direction dataset. (B) The precision of YOLO V5 and YOLO V5-W trained on the 45° polarization direction dataset. (C) The precision of YOLO V5 and YOLO V5-W trained on the 90° polarization direction dataset. (D) The precision of YOLO V5 and YOLO V5-W trained on the 135° polarization direction dataset. (E) The precision of YOLO V5 and YOLO V5-W trained on the IEVPF dataset. (F) The average precision of the models trained on the four polarization direction datasets and the IEVPF dataset, respectively.
Recall
The recall during the training process is presented in Figure 16. From the first five charts, it is evident that during the training process, the recall of the YOLO V5-W is higher than that of YOLO V5. Observing the sixth chart, using the IEVPF dataset to train models can slightly enhance the model’s recall. These results indicate that using the IEVPF dataset and the improved YOLO V5-W model both enhance the model’s recall. The average recall after 50 iterations of each model is presented in Table 5.
(1)
On these five datasets, the recall has increased by 10.8%, 16.0%, 7.9%, 6.7%, and 10.2%, with an average increase of 10.3%. Regardless of the dataset, YOLO V5-W presents a higher recall than YOLO V5.
(2)
The YOLO V5 model trained on the IEVPF dataset has a recall increased by 2.4%, 10.8%, −1.0%, and 2.0% compared to the models trained on the four polarization direction datasets, with an average increase of 3.6%. The YOLO V5-W model trained on the IEVPF dataset has a recall increased by 1.7%, 4.9%, 1.3%, and 5.4% compared to the models trained on the four polarization direction datasets, with an average increase of 3.3%. This indicates that training models with the IEVPF dataset can enhance model recall, thereby validating the potential of virtual polarization filtering as an effective data augmentation technique for improving the performance of surface defect detection models.
Figure 16. The recall of each model during the training process. (A) The recall of YOLO V5 and YOLO V5-W trained on the 0° polarization direction dataset. (B) The recall of YOLO V5 and YOLO V5-W trained on the 45° polarization direction dataset. (C) The recall of YOLO V5 and YOLO V5-W trained on the 90° polarization direction dataset. (D) The recall of YOLO V5 and YOLO V5-W trained on the 135° polarization direction dataset. (E) The recall of YOLO V5 and YOLO V5-W trained on the IEVPF dataset. (F) The average recall of the models trained on the four polarization direction datasets and the IEVPF dataset, respectively.
mAP
Figure 17 shows the mAP0.5–0.95 during the training process. The first five charts show that, throughout training, the mAP of the YOLO V5-W model is higher than that of the YOLO V5 model. The sixth chart shows that training on the IEVPF dataset further raises the mAP. These results indicate that both the IEVPF dataset and the improved YOLO V5-W model enhance the model's mAP. Table 6 lists the average mAP of each model after 50 iterations.
(1) Across the five datasets, YOLO V5-W improves mAP over YOLO V5 by 14.9%, 15.0%, 13.2%, 12.6%, and 12.7%, respectively, for an average gain of 13.7%. Regardless of the dataset, YOLO V5-W achieves a higher mAP than YOLO V5.
(2) The YOLO V5 model trained on the IEVPF dataset improves mAP by 6.4%, 11.6%, 2.4%, and 6.3% relative to the models trained on the four polarization direction datasets, an average increase of 6.7%. The YOLO V5-W model trained on the IEVPF dataset improves mAP by 4.2%, 9.3%, 1.7%, and 6.4% relative to the same four models, an average increase of 5.4%. This indicates that training with the IEVPF dataset modestly enhances mAP, again supporting virtual polarization filtering as an effective data augmentation technique for surface defect detection models. (The mAP0.5–0.95 metric used here is sketched after this list.)
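For reference, mAP0.5–0.95 follows the usual COCO-style convention adopted by YOLO V5: the average precision is computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 and the results are averaged. The sketch below illustrates only this aggregation step; the ap_at_iou() function is a hypothetical placeholder standing in for the full precision-recall computation and is not taken from the paper.

```python
import numpy as np

def ap_at_iou(iou_thr: float) -> float:
    # Placeholder AP curve: a detector's AP typically decays as the IoU threshold tightens.
    return max(0.0, 0.90 - 1.2 * (iou_thr - 0.50))

# Ten COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
iou_thresholds = np.linspace(0.50, 0.95, 10)
map_50_95 = float(np.mean([ap_at_iou(t) for t in iou_thresholds]))
print(f"mAP0.5-0.95 = {map_50_95:.3f}")
```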
Table 6. The average mAP value of each model after 50 iterations during the training process.

Model         | YOLO V5-0   | YOLO V5-45   | YOLO V5-90   | YOLO V5-135   | YOLO V5-Enh
mAP (×10⁻²)   | 53.57       | 48.36        | 57.59        | 53.68         | 59.94
Model         | YOLO V5-W-0 | YOLO V5-W-45 | YOLO V5-W-90 | YOLO V5-W-135 | YOLO V5-W-Enh
mAP (×10⁻²)   | 68.42       | 63.33        | 70.86        | 66.23         | 72.59
Test Performance
Figure 18 compares the performance of each model on a portion of the test set containing bump-type defects.
(1) For the same model, the variant trained on the IEVPF dataset exhibits higher detection precision than the variants trained on the datasets from the four polarization filtering directions.
(2) For the same dataset, YOLO V5-W achieves higher recognition precision than YOLO V5.
(3) When models trained on the IEVPF dataset are applied to images with inadequate polarization suppression (as illustrated for the 0° and 90° cases), some continuous undulations may be erroneously classified as cracks. This occurs because virtual polarization filtering enhancement produces a marked difference between a crack and a series of continuous bumps, whereas in images with inadequate polarization suppression this distinction is far less pronounced. It follows that images enhanced by virtual polarization filtering are more amenable to model training for target detection tasks.
(4) Overall, the YOLO V5-W-Enh model, i.e., the improved YOLO V5-W architecture trained on the IEVPF dataset, is the only model that neither misses nor falsely detects any defect in the virtual polarization filtering enhancement experiments, achieving the highest detection accuracy and confidence scores.
Figure 17. The mAP0.5–0.95 of each model during the training process. (A) The mAP0.5–0.95 of the YOLO V5 and YOLO V5-W trained on the 0° polarization direction dataset. (B) The mAP0.5–0.95 of the YOLO V5 and YOLO V5-W trained on the 45° polarization direction dataset. (C) The mAP0.5–0.95 of the YOLO V5 and YOLO V5-W trained on the 90° polarization direction dataset. (D) The mAP0.5–0.95 of the YOLO V5 and YOLO V5-W trained on the 135° polarization direction dataset. (E) The mAP0.5–0.95 of the YOLO V5 and YOLO V5-W trained on the IEVPF dataset. (F) The average mAP0.5–0.95 of the models trained on the four polarization direction datasets and the IEVPF dataset, respectively.
These four points illustrate that the IEVPF algorithm and the improved YOLO V5-W model proposed in this research can effectively enhance the precision of surface defect recognition.
Figure 18. The performance of each model on the test set.

6. Conclusions

This paper proposes a novel surface defect detection method for reflective AM components. We develop an MPIS to capture images at four polarization angles, use Stokes vector analysis to validate virtual polarization filtering, and propose the IEVPF algorithm for image enhancement. The improved YOLO V5-W model replaces the C3 modules with DC3 modules for deeper feature extraction, integrates CBAM attention for small-target detection, upgrades upsampling to Carafe, and adopts the EIoU loss for faster convergence. Experiments on five datasets show that YOLO V5-W reduces the loss by 40.3% and improves precision/recall/mAP by 10.8%/10.3%/13.7% over YOLO V5. Compared with the raw polarization datasets, models trained on IEVPF-enhanced data achieve a 14.5% loss reduction and 3–5% performance gains, demonstrating the method's superiority in complex defect detection. Although the IEVPF algorithm effectively suppresses background noise and enhances the expressiveness of image features, this two-stage processing also introduces additional time overhead, reducing the real-time detection performance of the model. In future work, we intend to integrate this enhancement into the neural network to construct a single-stage detection model and thereby increase the running speed of the algorithm.
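To make the virtual filtering step concrete, the sketch below shows the standard Stokes-vector relation by which an image behind a virtual linear polarizer at an arbitrary angle can be synthesized from the four MPIS channels. It is a minimal illustration of the underlying principle only, not the paper's IEVPF implementation, which additionally involves the SNR and ADU_N steps summarized in Figure 2; the per-angle selection rule shown here is likewise only an example.

```python
import numpy as np

def virtual_polarizer(i0, i45, i90, i135, theta_deg):
    """Synthesize the intensity image behind a virtual linear polarizer at theta_deg."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (averaged estimate)
    s1 = i0 - i90                        # linear Stokes component S1
    s2 = i45 - i135                      # linear Stokes component S2
    theta = np.deg2rad(theta_deg)
    return 0.5 * (s0 + s1 * np.cos(2 * theta) + s2 * np.sin(2 * theta))

# Example with random stand-in images; real inputs would be the four MPIS channels.
rng = np.random.default_rng(0)
channels = [rng.random((4, 4)) for _ in range(4)]
# One possible selection rule: keep the virtual angle whose image has the weakest highlight.
glare_min = min((virtual_polarizer(*channels, t) for t in range(0, 180, 5)),
                key=lambda img: img.max())
```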
Notably, our proposed method shows general applicability to metallic materials such as iron and titanium, which can be attributed to the inherent difference in polarization properties between diffuse and specular reflections.
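For completeness, the EIoU loss referenced above augments the IoU term with normalized penalties on the center distance and on the width and height differences, following ref. [32]. The sketch below is a minimal, framework-agnostic version for a single box pair; the function name and the (x1, y1, x2, y2) box format are our own choices, not the paper's implementation.

```python
def eiou_loss(box_pred, box_gt):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # IoU term.
    ix1, iy1 = max(box_pred[0], box_gt[0]), max(box_pred[1], box_gt[1])
    ix2, iy2 = min(box_pred[2], box_gt[2]), min(box_pred[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_g = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # Smallest enclosing box: its diagonal, width, and height normalize the penalties.
    cw = max(box_pred[2], box_gt[2]) - min(box_pred[0], box_gt[0])
    ch = max(box_pred[3], box_gt[3]) - min(box_pred[1], box_gt[1])
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # Center-distance, width, and height penalty terms.
    pcx, pcy = (box_pred[0] + box_pred[2]) / 2, (box_pred[1] + box_pred[3]) / 2
    gcx, gcy = (box_gt[0] + box_gt[2]) / 2, (box_gt[1] + box_gt[3]) / 2
    rho2_center = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    rho2_w = ((box_pred[2] - box_pred[0]) - (box_gt[2] - box_gt[0])) ** 2
    rho2_h = ((box_pred[3] - box_pred[1]) - (box_gt[3] - box_gt[1])) ** 2

    return 1 - iou + rho2_center / c2 + rho2_w / (cw ** 2 + 1e-9) + rho2_h / (ch ** 2 + 1e-9)

print(eiou_loss((10, 10, 50, 60), (12, 8, 55, 58)))
```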

Author Contributions

Methodology, X.S.; Validation, X.Z.; Investigation, H.C.; Resources, C.S.; Writing—original draft, X.S.; Writing—review & editing, S.L.; Visualization, S.Q.; Supervision, F.S.; Funding acquisition, X.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Hunan Province (2024JJ6460), the National Natural Science Foundation of China (52305594), and the China Postdoctoral Science Foundation Grant (2024M754299).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pragana, J.P.M.; Bragança, I.M.F.; Martins, P.A.F. Hybrid metal additive manufacturing: A state-of-the-art review. Adv. Ind. Manuf. Eng. 2021, 2, 100032. [Google Scholar] [CrossRef]
  2. Zhou, K.; Bai, X.Y.; Tan, P.F.; Yan, W.T.; Li, S.F. Preface: Modeling of additive manufacturing. Int. J. Mech. Sci. 2024, 265, 108909. [Google Scholar] [CrossRef]
  3. Mercado, F.; Rojas, A. Additive manufacturing methods: Techniques, materials, and closed-loop control applications. Int. J. Adv. Manuf. Technol. 2020, 109, 17–31. [Google Scholar] [CrossRef]
  4. Niu, P.D.; Li, R.D.; Zhu, S.Y.; Wang, M.B.; Chen, C.; Yuan, T.C. Hot cracking, crystal orientation and compressive strength of an equimolar CoCrFeMnNi high-entropy alloy printed by selective laser melting. Opt. Laser Technol. 2020, 127, 106147. [Google Scholar] [CrossRef]
  5. Peng, X.; Kong, L.B.; Chen, Y.; Wang, J.H.; Xu, M. A preliminary study of in-situ defects measurement for additive manufacturing based on multi-spectrum. In Proceedings of the 9th International Symposium on Advanced Optical Manufacturing and Testing Technologies: Subdiffraction-Limited Plasmonic Lithography and Innovative Manufacturing Technology, Chengdu, China, 26–29 June 2018; SPIE: Bellingham, WA, USA; p. 1084217. [Google Scholar]
  6. Vafadar, A.; Guzzomi, F.; Rassau, A.; Hayward, K. Advances in metal additive manufacturing: A review of common processes, industrial applications, and current challenges. Appl. Sci. 2021, 11, 1213. [Google Scholar] [CrossRef]
  7. Frazier, W.E. Metal additive manufacturing: A review. J. Mater. Eng. Perform. 2014, 23, 1917–1928. [Google Scholar] [CrossRef]
  8. Wong, K.V.; Hernandez, A. A review of additive manufacturing. Int. Sch. Res. Not. 2012, 10, 208760. [Google Scholar] [CrossRef]
  9. Kruth, J.P.; Froyen, L.; Van Vaerenbergh, J.; Mercelis, P.; Rombouts, M.; Lauwers, B. Selective laser melting of iron-based powder. J. Mater. Process. Technol. 2004, 149, 616–622. [Google Scholar] [CrossRef]
  10. Paulson, N.H.; Gould, B.; Wolff, S.J.; Stan, M.; Greco, A.C. Correlations between thermal history and keyhole porosity in laser powder bed fusion. Addit. Manuf. 2020, 34, 101213. [Google Scholar] [CrossRef]
  11. Gu, D.D.; Hagedorn, Y.; Meiners, W.; Meng, G.B.; Batista, R.J.S.; Wissenbach, K.; Poprawe, R. Densification behavior, microstructure evolution, and wear performance of selective laser melting processed commercially pure titanium. Acta Mater. 2012, 60, 3849–3860. [Google Scholar] [CrossRef]
  12. Fu, Y.Z.; Downey, A.R.J.; Yuan, L.; Zhang, T.Y.; Pratt, A.; Balogun, Y. Machine learning algorithms for defect detection in metal laser-based additive manufacturing: A review. J. Manuf. Process. 2022, 75, 693–710. [Google Scholar] [CrossRef]
  13. Peng, X.; Kong, L.B.; Han, W.; Wang, S. Multi-Sensor Image Fusion Method for Defect Detection in Powder Bed Fusion. Sensors 2022, 22, 8023. [Google Scholar] [CrossRef]
  14. Papa, I.; Lopresto, V.; Langella, A. Ultrasonic inspection of composites materials: Application to detect impact damage. Int. J. Lightweight Mater. Manuf. 2021, 4, 37–42. [Google Scholar] [CrossRef]
  15. Tian, G.; Sophian, A.; Taylor, D.; Rudlin, J. Electromagnetic and eddy current NDT: A review. Insight Non-Destr. Test. Cond. Monit. 2001, 43, 302–306. [Google Scholar]
  16. Wu, Q.; Dong, K.; Qin, X.P.; Hu, Z.Q.; Xiong, X.C. Magnetic particle inspection: Status, advances, and challenges—Demands for automatic non-destructive testing. NDT E Int. 2024, 143, 103030. [Google Scholar] [CrossRef]
  17. Yang, W.N.; Chen, M.Y.; Wu, H.; Lin, Z.Y.; Kong, D.Q.; Xie, S.L.; Takamasu, K. Deep learning-based weak micro-defect detection on an optical lens surface with micro vision. Opt. Express 2023, 31, 5593–5608. [Google Scholar] [CrossRef]
  18. Xu, L.S.; Dong, S.H.; Wei, H.T.; Ren, Q.Y.; Huang, J.W.; Liu, J.Y. Defect signal intelligent recognition of weld radiographs based on YOLO V5-IMPROVEMENT. J. Manuf. Process. 2023, 99, 373–381. [Google Scholar] [CrossRef]
  19. Liu, G.; Dwivedi, P.; Trupke, T.; Hameiri, Z. Deep Learning Model to Denoise Luminescence Images of Silicon Solar Cells. Adv. Sci. 2023, 10, e2300206. [Google Scholar] [CrossRef] [PubMed]
  20. Ma, D.Y.; Jiang, P.; Shu, L.S.; Geng, S.N. Multi-sensing signals diagnosis and CNN-based detection of porosity defect during Al alloys laser welding. J. Manuf. Syst. 2022, 62, 334–346. [Google Scholar] [CrossRef]
  21. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  22. Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J.; Berkeley, U. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  23. He, K.M.; Zhang, X.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  24. Redmon, J.; Divvala, S.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  25. Ren, S.Q.; He, K.M.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef]
  26. Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  27. Bhatt, P.; Malhan, R.; Rajendran, P.; Shah, B.C.; Thakar, S.; Yoon, Y.J.; Gupta, S.K. Image-Based Surface Defect Detection Using Deep Learning: A Review. ASME J. Comput. Inf. Sci. Eng. 2021, 21, 040801. [Google Scholar] [CrossRef]
  28. Liu, G.L. Surface Defect Detection Methods Based on Deep Learning: A Brief Review. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; pp. 200–203. [Google Scholar]
  29. Peng, G.F.; Song, T.; Cao, S.X.; Zhou, B.; Jiang, Q. A two-stage defect detection method for unevenly illuminated self-adhesive printed materials. Sci. Rep. 2024, 14, 20547. [Google Scholar] [CrossRef]
  30. Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
  31. Wang, J.Q.; Chen, K.; Xu, R.; Liu, Z.W.; Loy, C.C.; Lin, D.H. CARAFE: Content-Aware ReAssembly of FEatures. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016. [Google Scholar]
  32. Zhang, Y.F.; Ren, W.Q.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv 2021, arXiv:2101.08158. [Google Scholar] [CrossRef]
  33. Tong, L.; Huang, X.Y.; Wang, P.; Ye, L.; Peng, M.; An, L.; Sun, Q.; Zhang, Y.; Yang, G.; Li, Z.; et al. Stable mid-infrared polarization imaging based on quasi-2D tellurium at room temperature. Nat. Commun. 2020, 11, 2308. [Google Scholar] [CrossRef] [PubMed]
  34. Wolrige, S.H.; Howe, D.; Majidiyan, H. Intelligent Computerized Video Analysis for Automated Data Extraction in Wave Structure Interaction; A Wave Basin Case Study. J. Mar. Sci. Eng. 2025, 13, 617. [Google Scholar] [CrossRef]
  35. Majidiyan, H.; Enshaei, H.; Howe, D.; Wang, Y. An Integrated Framework for Real-Time Sea-State Estimation of Stationary Marine Units Using Wave Buoy Analogy. J. Mar. Sci. Eng. 2024, 12, 2312. [Google Scholar] [CrossRef]
  36. Powell, S.B.; Garnett, R.; Marshall, J.; Rizk, C.; Gruev, V. Bioinspired polarization vision enables underwater geolocalization. Sci. Adv. 2018, 4, eaao6841. [Google Scholar] [CrossRef]
  37. Zhu, Z.M.; Xiang, P.; Zhang, F.M. Polarization-based method of highlight removal of high-reflectivity surface. Optik 2020, 221, 165345. [Google Scholar] [CrossRef]
  38. Zhang, H.; Zhang, S.J.; Zou, R.B. Select-Mosaic: Data Augmentation Method for Dense Small Object Scenes. arXiv 2024, arXiv:2406.05412. [Google Scholar]
  39. Mikš, A.; Pokorný, P. Explicit calculation of Point Spread Function of optical system. Optik 2021, 239, 166885. [Google Scholar] [CrossRef]
  40. Ku, S.; Mahato, K.K.; Mazumder, N. Polarization-resolved Stokes-Mueller imaging: A review of technology and applications. Lasers Med. Sci. 2019, 34, 1283–1293. [Google Scholar] [CrossRef] [PubMed]
  41. Xiong, W.; Hsu, C.; Bromberg, Y.; Antonio-Lopez, J.; Correa, R.A.; Cao, H. Complete polarization control in multimode fibers with polarization and mode coupling. Light Sci. Appl. 2017, 7, 54. [Google Scholar] [CrossRef]
  42. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  43. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.Y.; Berg, A. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2015; pp. 21–37. [Google Scholar]
  44. Zhang, S.F.; Wen, L.Y.; Bian, X.; Lei, Z.; Li, S. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4203–4212. [Google Scholar]
  45. Wang, D.F.; Zhang, B.; Cao, Y.; Lu, M.Y. SFssD: Shallow feature fusion single shot multibox detector. In Proceedings of the International Conference in Communications, Signal Processing, and Systems, Urumqi, China, 20–22 July 2019; Springer: Singapore, 2019. [Google Scholar]
  46. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  47. Hussain, M. YOLOv1 to v8: Unveiling Each Variant–A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833. [Google Scholar] [CrossRef]
  48. Li, X. A real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
  49. Qi, J.T.; Liu, X.N.; Liu, K.; Xu, F.; Guo, H.; Tian, X.L.; Li, M.; Bao, Z.Y.; Li, Y. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
  50. Zhang, D.; Han, J.; Cheng, G.; Yang, M. Weakly supervised object localization and detection: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5866–5885. [Google Scholar] [CrossRef]
  51. Parsania, P.; Virparia, P. A Review: Image Interpolation Techniques for Image Scaling. Int. J. Innov. Res. Comput. Commun. Eng. 2015, 2, 7409–7414. [Google Scholar] [CrossRef]
  52. Wang, X.; Song, J. ICIoU: Improved Loss Based on Complete Intersection Over Union for Bounding Box Regression. IEEE Access 2021, 9, 105686–105695. [Google Scholar] [CrossRef]
  53. Zheng, Z.H.; Wang, P.; Liu, W.; Li, J.Z.; Ye, R.G.; Ren, D.W. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019, arXiv:1911.08287. [Google Scholar] [CrossRef]
  54. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2022, 132, 103812. [Google Scholar] [CrossRef]
Figure 1. A schematic diagram of the MPIS structure.
Figure 2. Flowchart of the IEVPF algorithm. (A) Data Acquisition. (B) Image Sampling. (C) Determine Input. (D) Solve SNR. (E) Solve ADU_N. (F) Output the result.
Figure 3. Detection principle diagram.
Figure 4. A flowchart of the dataset construction for a certain defect.
Figure 5. A schematic diagram of the YOLO V5-W structure.
Figure 6. A schematic diagram of the C3 module structure.
Figure 7. A schematic diagram of the DC3 module structure.
Figure 8. A schematic diagram of the CBAM attention mechanism structure.
Figure 9. A schematic diagram of the principle of nearest-neighbor interpolation.
Figure 10. A schematic diagram of the structure of the Carafe upsampling operator.
Figure 11. A schematic diagram of CIoU parameters.
Figure 12. Input and output pictures and their histograms. (A) A picture of the 0° polarization direction and its histogram. (B) A picture of the 45° polarization direction and its histogram. (C) A picture of the 90° polarization direction and its histogram. (D) A picture of the 135° polarization direction and its histogram. (E) An image enhanced via the IEVPF algorithm and its histogram.
Figure 13. Grayscale histograms of input and output.
Figure 14. Loss values of each model during the training process.
Table 1. Experimental platform parameters.

Equipment        | Module
Operating system | Windows 11
CPU              | 11th Gen Intel(R) Core(TM) i7-11800H
RAM              | 16.0 GB
GPU              | NVIDIA GeForce GTX 3060
CUDA             | 11.7
PyTorch          | 2.2.2
Python           | 3.8.19

Table 2. The grayscale value distribution of input and output.

Image Type | [0, 50]  | [51, 101] | [102, 152] | [153, 203] | [204, 255]
0°         | 36,851   | 169,060   | 248,843    | 250,369    | 548,253
45°        | 16,445   | 93,285    | 163,133    | 202,253    | 778,260
90°        | 117,346  | 452,768   | 395,023    | 181,519    | 106,720
135°       | 506,940  | 611,276   | 112,140    | 15,802     | 7218
Enhanced   | 674,910  | 459,317   | 97,738     | 14,617     | 6794

Table 3. The average loss value after 50 iterations of each model.

Model          | YOLO V5-0   | YOLO V5-45   | YOLO V5-90   | YOLO V5-135   | YOLO V5-Enh
Loss (×10⁻²)   | 15.34       | 12.80        | 13.98        | 11.03         | 9.64
Model          | YOLO V5-W-0 | YOLO V5-W-45 | YOLO V5-W-90 | YOLO V5-W-135 | YOLO V5-W-Enh
Loss (×10⁻²)   | 7.50        | 8.39         | 6.91         | 7.50          | 6.45

Table 4. The average precision of each model after 50 iterations during the training process.

Model              | YOLO V5-0   | YOLO V5-45   | YOLO V5-90   | YOLO V5-135   | YOLO V5-Enh
Precision (×10⁻²)  | 82.50       | 78.40        | 87.85        | 82.07         | 85.49
Model              | YOLO V5-W-0 | YOLO V5-W-45 | YOLO V5-W-90 | YOLO V5-W-135 | YOLO V5-W-Enh
Precision (×10⁻²)  | 94.28       | 92.99        | 95.14        | 91.33         | 96.66

Table 5. The average recall of each model after 50 iterations during the training process.

Model           | YOLO V5-0   | YOLO V5-45   | YOLO V5-90   | YOLO V5-135   | YOLO V5-Enh
Recall (×10⁻²)  | 86.41       | 78.01        | 89.79        | 86.82         | 88.80
Model           | YOLO V5-W-0 | YOLO V5-W-45 | YOLO V5-W-90 | YOLO V5-W-135 | YOLO V5-W-Enh
Recall (×10⁻²)  | 97.25       | 94.03        | 97.72        | 93.54         | 98.97
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
