BPN-YOLO: A Novel Method for Wood Defect Detection Based on YOLOv7

Abstract: The detection of wood defects is a crucial step in wood processing and manufacturing, as it determines the quality and reliability of wood products. To achieve accurate wood defect detection, a novel method named BPN-YOLO is proposed. The ordinary convolution in the ELAN module of the YOLOv7 backbone network is replaced with Pconv partial convolution, resulting in the P-ELAN module. This modification improves wood defect detection performance while reducing unnecessary redundant computations and memory accesses. Additionally, the Biformer attention mechanism is introduced to achieve more flexible computation allocation and content awareness. The IOU loss function is replaced with the NWD loss function, addressing the sensitivity of the IOU loss function to small positional fluctuations of defects. The BPN-YOLO model has been rigorously evaluated on an optimized wood defect dataset through ablation and comparison experiments. The experimental results show that the mean average precision (mAP) of BPN-YOLO is 7.4% higher than that of the original algorithm, so it better meets the need for accurate detection of wood surface defects.


Introduction
Wood is a valuable natural resource used in furniture making, building construction, appliance manufacturing, and other applications. Because it is easy to obtain and process, wood has long been a major material in furniture and construction. However, during the growth and processing of trees, physiological, pathological, man-made, and other factors give rise to a variety of defects such as live knots, dead knots, and cracks [1]. These defects reduce the quality and usability of finished products, degrade the mechanical properties of the wood, weaken its overall strength and hardness, and can even render the wood worthless. Therefore, wood defect detection has become an indispensable part of the manufacturing process.
Wood defects are mainly divided into two categories: internal defects and external defects [2]. The main methods for detecting internal defects are based on vibration, ultrasound, and X-rays. Vibration-based detection generally excites the wood with discrete or continuous vibration and records the vibration frequency to determine whether defects are present. Because the vibration frequency provides only limited information, this method cannot always identify defects accurately [3]. Ultrasound-based detection [4,5] relies on ultrasound imaging. Its higher excitation frequency lets it identify wood defects more accurately than the vibration-based method. X-ray-based detection exploits differences in how strongly X-rays are absorbed and attenuated as they pass through the inspected wood to determine whether defects are present [6,7]. Although this method is more accurate than the previous two, its high cost and the radiation hazard to operators limit its application.
The main methods for detecting external defects in wood are traditional manual identification [8], 3D laser scanning, and image recognition-based techniques [9,10]. Most of the industry still relies on human visual inspection to identify external defects. This approach is vulnerable to inspector fatigue and strongly influenced by subjective factors, which lead to errors and missed defects. It is also inefficient. Three-dimensional laser scanning identifies defects by projecting laser light onto the wood. The reflected light is picked up by an inspection instrument, and the resulting data are processed and analyzed by a computer. This method exploits the surface irregularities of certain defects, such as protruding or recessed defects [11]. However, it is less accurate for defects with small surface variations, such as dead knots or discoloration, and the equipment is expensive. Image recognition methods capture high-quality images with high-resolution, high-speed cameras. To obtain images suitable for further processing, several factors must be considered: the surface condition (rough or planed) and reflective properties of the wood, the lighting conditions, and the industrial environment. Image recognition is not affected by the subjectivity or operating habits of an inspector, and it offers better detection accuracy and strong stability. Therefore, image recognition has become an effective solution for wood defect detection [12].
As image recognition technology advances, and particularly as deep learning has been integrated into image recognition, an increasing number of fields are adopting deep learning-based network models for defect detection. These models fall into two types: single-stage and two-stage. Two-stage detection models include R-CNN [13], Fast R-CNN [14], Faster R-CNN [15], and Mask R-CNN [16]. They first generate candidate boxes (region proposals) and then filter and classify them. Compared with single-stage models, they usually achieve higher detection accuracy, but they require heavy computation and long training times, and their inference is slow. Typical single-stage models include SSD [17], RetinaNet [18], and the YOLO series [19–23]. They predict and classify targets simultaneously without first generating candidate boxes. In general, they are less demanding on hardware, easier to configure, and faster to train and run. However, their overall precision falls short of that of two-stage detectors.
Several researchers have applied deep learning to wood defect detection. Fan et al. [24] used the ResNet V2 structure to build a Faster R-CNN framework for wood defect detection, achieving correct detection rates of 95% for wood knots and 98% for holes. However, although this two-stage model attains high accuracy, it sacrifices detection speed and fails to satisfy real-time demands in practice. Meng et al. [25] proposed the SGN-YOLO model based on the YOLOv5 network. It improves the accuracy of wood defect detection by enhancing the network's learning ability and context awareness. Its mAP (defined in Section 3.2) reaches 86.4%. This is 3.1% higher than the original YOLOv5 model and 7.1% and 13.6% higher than the SSD and Faster R-CNN models, respectively. However, a 3.1% gain over the original model leaves room for further improvement in mAP. Wang et al. [26] optimized the YOLOv3 network with GridMask, a Ghost structure, and an improved confidence loss function, which improves the robustness and strengthens the generalization ability of the network. Their experimental results show an increase in mAP from a baseline of 80.8% to 86.5%, but the accuracy on crack and wormhole defects is not yet satisfactory. Gao et al. [27] proposed a bilinear fine-grained convolutional neural network (BLNN) that integrates multiscale features. Using PyTorch 1.8.0, they trained a sub-network composed of two convolutional neural networks. They then fused the features extracted by the trained sub-networks through bilinear connections to obtain fine-grained features for identifying defective wood images. In their experiments, the mAP reached 99.2%. However, their dataset included only one type of wood defect, knots, so the method's effectiveness on other defect types is unknown. The work of these researchers has advanced wood defect detection and offered ideas for improving its accuracy and speed. However, enterprises have high requirements for detection accuracy in practical applications, and the accuracy of wood defect recognition still needs further improvement to meet them.
In view of the above analysis, deep learning-based network models are a cost-effective choice among the various wood defect detection methods. In this paper, the You Only Look Once version 7 (YOLOv7) network model is taken as the research object for detecting and recognizing small targets such as wood defects. The Biformer attention mechanism, Pconv partial convolution, and the Normalized Wasserstein Distance (NWD) loss function are introduced into it. The resulting wood defect detection model is named BPN-YOLO (Biformer-Pconv-NWD-based YOLOv7). It is proposed to enhance the perception and detection ability of YOLOv7 for various types of wood defects, improve detection accuracy, and meet practical application requirements. The primary contributions are as follows:
1. A wood defect detection model based on YOLOv7, named BPN-YOLO, is proposed, which improves the model's performance in wood defect detection.
2. The Biformer attention mechanism is integrated into BPN-YOLO. This gives the model flexible computation allocation and content awareness, and its dynamic, query-aware sparsity enhances wood defect detection accuracy.
3. The 3 × 3 ordinary convolution in the efficient layer aggregation network (ELAN) module of the backbone network is replaced with Pconv partial convolution. This reduces unnecessary redundant computations and memory accesses and improves detection performance.
4. The NWD loss function is adopted. This mitigates the sensitivity of the original loss function to positional deviations of small defects and improves the accuracy of wood defect detection.
The structure of this paper is organized as follows: Section 2 describes the proposed BPN-YOLO model in detail. Section 3 presents comparison and ablation experiments that verify its effectiveness. Section 4 discusses the strengths and weaknesses of the proposed model. Section 5 provides conclusions and future research directions.

Materials Dataset Acquisition
For our experiments, we used a large dataset of wood surface defects from VSB Technical University in Ostrava, an institution known for its expertise in automated visual quality control [28]. The dataset contains 20,275 images, of which 18,283 contain one or more defects and 1992 contain no defects. It covers ten types of defects, including Live Knot, Crack, and Quartzity. The image size is 2800 × 1024. The various types of wood defects are shown in Figure 1. We excluded three defect types with very few samples (Quartzity, Blue stain, and Overgrown). We then selected 3600 images from the original dataset to create a new dataset for this experiment. The selected images were partitioned into training and test sets at a ratio of 9:1, and 10% of the training set was further held out for validation. The specific dataset construction information is shown in Table 1.
YOLOv7 [23] is a single-stage object detection model that frames object detection as a regression task. It stands out for its fast detection speed and high accuracy compared with other existing algorithms. Its overall network structure is similar to that of YOLOv5: it also employs a backbone based on feature pyramid networks (FPNs) and introduces Intersection over Union (IOU) screening and data enhancement techniques. Moreover, YOLOv7 employs the ELAN module and the maximum pooling (MPconv) module, which strengthen the model's target feature extraction and feature fusion [29]. The network comprises three primary components: the input layer, the backbone network, and the head module [30]. First, the input module preprocesses, scales, and normalizes the input image, and the processed image is sent to the backbone for feature extraction. The backbone uses a deep CNN to capture image features, and the head module analyzes these features at three scales: large, medium, and small. Finally, defects are detected by generating candidate boxes, classifying them, and predicting their positions.
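To make the input and head geometry concrete, the following sketch computes how a 2800 × 1024 board image from this dataset would be letterboxed into YOLOv7's 640 × 640 input. It also counts the grid cells produced by the three detection scales. The sketch makes three assumptions: aspect-ratio-preserving scaling with symmetric padding, YOLOv7's standard strides of 8, 16, and 32, and three anchors per cell. These describe the common default configuration, not settings reported in this paper. The function names are hypothetical.

```python
def letterbox_geometry(src_w, src_h, target=640):
    """Scale factor, resized size and (left, right, top, bottom) padding for a letterbox resize."""
    # Scale so that the longer side fits the target while preserving aspect ratio.
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    # Split the padding between both sides; any odd pixel goes to the right/bottom.
    left, top = pad_w // 2, pad_h // 2
    return scale, (new_w, new_h), (left, pad_w - left, top, pad_h - top)


def head_grids(target=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Grid size of each detection scale and the total number of anchor predictions."""
    grids = [(target // s, target // s) for s in strides]
    n_pred = sum(gw * gh for gw, gh in grids) * anchors_per_cell
    return grids, n_pred


scale, size, pads = letterbox_geometry(2800, 1024)
grids, n_pred = head_grids()
print(f"scale={scale:.4f}, resized={size}, padding(l, r, t, b)={pads}")
print(f"grids={grids}, predictions={n_pred}")
```

Under these assumptions, the board is scaled by about 0.229 to 640 × 234 pixels and padded by 203 pixels above and below. The three heads then see 80 × 80, 40 × 40, and 20 × 20 grids, which gives 25,200 anchor predictions per image.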
The preprocessing stage uses image splicing (Mosaic) and image mixing (MixUp) data augmentation, and the input module resizes the input image to 640 × 640 by scaling, cropping, or padding. The backbone consists of multiple convolutional layers. After preprocessing, the image first passes through four CBS layers (convolution, batch normalization, and SiLU activation) and then through the ELAN and MPconv modules. The ELAN and MPconv modules enable the model to learn and converge more efficiently. They also expand the receptive field of the current feature layer, improving the generalization and robustness of the network [31]. In the head, the Spatial Pyramid Pooling and Cross Stage Partial Connections (SPPCSPC) module enlarges the receptive field, which allows the network to adapt to images of different resolutions. A structure called ELAN2 is also used to learn the features fused at different scales, further enhancing feature extraction. In the prediction phase, re-parameterized convolution (RepConv) is introduced. Its multi-branch residual structure, used during training, can be merged into a single ordinary convolution at inference time. This reduces the network's complexity without sacrificing prediction performance. Figure 2 illustrates the architecture of YOLOv7.
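The re-parameterization idea can be checked numerically. The sketch below assumes a RepVGG-style block with a 3 × 3 branch, a 1 × 1 branch, and an identity branch on a single channel. Batch normalization is omitted; in practice it is first folded into each branch's kernel and bias. The identity branch also exists only when input and output channels match. Because the 1 × 1 and identity branches touch only the centre tap of a 3 × 3 window, they can be added into the centre of the 3 × 3 kernel, and one convolution then reproduces the three-branch output. The function names are illustrative and are not YOLOv7's code.

```python
def conv2d_same(x, k):
    """Single-channel 'same' cross-correlation with zero padding (odd kernel size)."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += x[ii][jj] * k[di + r][dj + r]
    return out


def multi_branch(x, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3, y1 = conv2d_same(x, k3), conv2d_same(x, k1)
    return [[a + b + c for a, b, c in zip(r3, r1, rx)] for r3, r1, rx in zip(y3, y1, x)]


def fuse(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 centre tap."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1[0][0] + 1.0  # identity equals a 1x1 kernel of weight 1
    return fused


x = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0], [2.0, 0.0, 1.0, 1.0]]
k3 = [[0.1, -0.2, 0.0], [0.3, 0.5, -0.1], [0.0, 0.2, 0.1]]
k1 = [[0.7]]
train_out = multi_branch(x, k3, k1)
infer_out = conv2d_same(x, fuse(k3, k1))
max_err = max(abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb))
print(f"max difference between branches and fused conv: {max_err:.2e}")
```

The fused kernel costs one 3 × 3 convolution at inference, while training still benefits from the extra branches.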

BPN-YOLO
Although the YOLOv7 network has certain advantages over other networks in inference speed and detection accuracy, it still has shortcomings when applied directly to wood defect detection. First, most wood defects are small targets, and traditional network models have limited capability to extract and learn features from such small targets. Second, the original network consists of 104 layers, including numerous convolutional, pooling, and fully connected layers. This leads to a large number of parameters and increases the network's complexity. As a result, wood defect detection demands more resources and inference is slower. Finally, the original network uses a loss function based on the IOU metric. Suppose the predicted box and the ground-truth box do not intersect, or different box pairs have the same IOU. In these cases the IOU cannot reflect the distance between the two boxes, and the same loss value is computed. The loss function then fails to give the network a useful direction for learning.
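A short numerical check illustrates the last point. The sketch below uses boxes in (x1, y1, x2, y2) pixel coordinates, with coordinates chosen arbitrarily for illustration. It computes the IOU of a ground-truth box with two non-overlapping predictions, one 2 pixels away and one 190 pixels away. A loss of the form 1 - IOU is assumed here for simplicity.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


gt = (100, 100, 110, 110)
near = (112, 100, 122, 110)  # 2 px to the right of the ground truth
far = (300, 100, 310, 110)   # 190 px to the right of the ground truth

loss_near = 1.0 - iou(gt, near)
loss_far = 1.0 - iou(gt, far)
print(loss_near, loss_far)  # equal: the loss cannot tell which box is closer
```

Both predictions have an IOU of 0, so the loss is 1.0 in both cases. It carries no information about which prediction should move less.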
Based on the above analysis, we optimize the YOLOv7 network model with the goal of improving the precision of wood defect detection. The proposed BPN-YOLO network structure is illustrated in Figure 3. As depicted in Figure 3, in the backbone, the ELAN module is optimized by replacing the 3 × 3 conventional Conv with Pconv, yielding a new P-ELAN module (see Section 2.2.2, P-ELAN Module, for details). The P-ELAN module reduces the parameters and floating-point operations (FLOPs) of the network. In the head, the Biformer dynamic sparse attention mechanism is integrated into the SPPCSPC module. This mechanism adaptively adjusts attention weights according to the input wood image features, allocating different levels of attention to different positions or features. Finally, to make the network better suited to detecting small wood defects, we replace the traditional IOU metric with the NWD metric. NWD is scale-invariant and smooths the effect of positional deviations between the predicted and ground-truth boxes. This addresses the IOU metric's sensitivity to positional deviations and its limitations in detecting small wood defects.

P-ELAN Module
The ELAN module plays a crucial role in the model. It is composed of multiple regular convolutional layers and performs many convolution operations at runtime, which produces a large computational load and lengthens inference time. Therefore, to reduce computational cost and improve performance, we introduce partial convolution (Pconv) [32]. Pconv reduces computation and FLOPs by exploiting the redundancy among feature-map channels [33]. Regular convolution is applied to only some of the input channels to extract features, while the remaining channels are left untouched. This reduces computational redundancy and memory access and improves inference speed and detection accuracy. Based on these considerations, we introduce Pconv into the ELAN module by replacing the 3 × 3 Conv in CBS with Pconv, yielding the proposed P-ELAN module. This module reduces the number of parameters without changing the overall architecture, and it extracts features more efficiently while increasing inference speed. Figure 4 depicts the configuration of the P-ELAN module.
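The sketch below illustrates the partial-convolution idea on toy data. As described for PConv in FasterNet (Chen et al., 2023), a 3 × 3 convolution is applied to the first c_p of C channels, and the remaining channels are passed through unchanged. The sketch also evaluates the cost estimates given there:

- FLOPs: h·w·k²·c_p² for PConv, versus h·w·k²·c² for a regular convolution.
- Memory access: about h·w·2c_p + k²·c_p² for PConv, versus h·w·2c + k²·c² for a regular convolution.

Pure-Python lists stand in for tensors. The channel ratio c_p/C = 1/4 is FasterNet's default and is an illustrative assumption here, not necessarily the setting used in P-ELAN.

```python
def conv3x3_same(ch, k):
    """Single-channel 3x3 cross-correlation with zero padding ('same' output size)."""
    h, w = len(ch), len(ch[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += ch[rr][cc] * k[dr + 1][dc + 1]
    return out


def pconv(x, weights, cp):
    """Partial convolution: convolve the first cp channels, pass the rest through.

    x is a list of C channels (each an H x W list); weights[o][i] is the 3x3
    kernel from input channel i to output channel o, for o, i < cp.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for o in range(cp):
        acc = [[0.0] * w for _ in range(h)]
        for i in range(cp):
            y = conv3x3_same(x[i], weights[o][i])
            for r in range(h):
                for c in range(w):
                    acc[r][c] += y[r][c]
        out.append(acc)
    # Untouched channels are copied as-is (identity), so they cost no FLOPs.
    out.extend([row[:] for row in ch] for ch in x[cp:])
    return out


def flops(h, w, k, c):
    """Multiply-accumulates of a k x k conv with c input and c output channels."""
    return h * w * k * k * c * c


def mem_access(h, w, k, c):
    """Approximate memory access: input + output feature maps plus the kernel."""
    return h * w * 2 * c + k * k * c * c


# Toy input with C = 4 channels of size 2x2; only channel 0 is convolved (cp = 1).
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 9], [9, 9]], [[0, 1], [1, 0]]]
double = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]  # kernel that doubles its input
y = pconv(x, [[double]], cp=1)

# Cost ratios for C = 64, cp = 16 (ratio 1/4) on an 80 x 80 feature map.
ratio_flops = flops(80, 80, 3, 16) / flops(80, 80, 3, 64)
ratio_mem = mem_access(80, 80, 3, 16) / mem_access(80, 80, 3, 64)
print(ratio_flops, round(ratio_mem, 3))
```

With C = 64 and c_p = 16 on an 80 × 80 map, FLOPs drop to exactly 1/16 of a regular 3 × 3 convolution, and memory access drops to about 24%.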

Biformer Attention Mechanism
In wood defect detection, the target area occupies only a small portion of the wood surface and is characterized by few pixels and weak feature representations. This makes it difficult for the model to extract effective defect features. YOLOv7 is a deep neural network built on a CNN. It divides the image into grid cells and detects defects by predicting a bounding box and a defect category for each cell. Convolution is essentially a local operation and therefore cannot capture the relationship between local and global features [34]. To solve this problem, an attention mechanism can be incorporated into the feature extraction stage. Attention mechanisms are widely adopted: they assign different weights to different parts of the input in a neural network. This improves feature extraction in complex scenarios and thus model accuracy [35]. It also lets the model concentrate on pertinent information while disregarding irrelevant information, leading to better performance.
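To show how attention assigns different weights to different inputs, the sketch below implements generic scaled dot-product attention, softmax(q·k/√d)·V, in plain Python. It is the standard Transformer formulation and is included for illustration only. It is not the exact module used in BPN-YOLO, and the toy vectors are arbitrary. Every query is scored against every key, which is the source of the quadratic cost of full attention.

```python
import math


def softmax(v):
    """Numerically stable softmax of a list of scores."""
    m = max(v)
    e = [math.exp(s - m) for s in v]
    total = sum(e)
    return [s / total for s in e]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns outputs and the attention weights."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this query against every key, then normalise into weights.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the weight-averaged value vector.
        outputs.append([sum(wj * v[t] for wj, v in zip(w, values)) for t in range(len(values[0]))])
    return outputs, weights


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = attention([[4.0, 0.0]], keys, values)
print([round(w, 3) for w in weights[0]], [round(o, 2) for o in out[0]])
```

The query [4, 0] gives equal, large weights to the two keys that share its first component and a small weight to the orthogonal key.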
Traditional attention mechanisms are good at capturing long-range contextual dependencies, but they typically require more memory and computation, because they must compute the correlation between every input position and every other position. This leads to large memory consumption, high computational cost, and poor accuracy on small targets. To alleviate these problems, researchers have proposed sparse attention, in which each query attends only to a subset of key-value pairs [36]. Various related works have followed, including deformable, dilated, and local attention [34]. However, most of these methods introduce manually designed, content-agnostic sparsity into the attention mechanism. They share a common limitation: they rely on handcrafted static patterns and use the same subset of key-value pairs for all queries, so the sparsity cannot adapt to individual queries. To address these issues, Zhu et al. proposed the Biformer attention mechanism [37], a dynamic, query-aware sparse attention mechanism. We introduce the Biformer attention mechanism into the model's backbone network. It first filters out the least relevant key-value pairs at the coarse region level to discard extraneous information. It then computes token-to-token attention within the remaining regions. The introduced Biformer attention mechanism reduces computation and storage consumption while enhancing the model's perception of defect information.
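The two-level filtering can be sketched as follows. This is a simplified, hypothetical rendering of Biformer's bi-level routing attention:

1. Tokens are pre-grouped into regions.
2. Region descriptors are obtained by average pooling.
3. Each region keeps only its top-k most related regions.
4. Token-to-token attention is computed over the tokens gathered from those regions.

For brevity, the sketch assumes identity query, key, and value projections and omits the local context enhancement term of the original Biformer. The toy regions and the `topk` value are arbitrary.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(s - m) for s in v]
    total = sum(e)
    return [s / total for s in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bi_level_routing_attention(regions, topk):
    """regions: list of regions, each a list of d-dim token vectors (Q = K = V = tokens)."""
    d = len(regions[0][0])
    # 1) Coarse level: one descriptor per region via average pooling.
    reps = [mean(r) for r in regions]
    # 2) Region-to-region affinity; keep only the top-k regions for each query region.
    routes = []
    for qr in reps:
        scores = [dot(qr, kr) for kr in reps]
        routes.append(sorted(range(len(reps)), key=lambda j: scores[j], reverse=True)[:topk])
    # 3) Fine level: token-to-token attention over tokens of the routed regions only.
    outputs = []
    for i, region in enumerate(regions):
        gathered = [tok for j in routes[i] for tok in regions[j]]
        region_out = []
        for q in region:
            w = softmax([dot(q, k) / math.sqrt(d) for k in gathered])
            region_out.append([sum(wk * k[t] for wk, k in zip(w, gathered)) for t in range(d)])
        outputs.append(region_out)
    return outputs, routes


regions = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.1], [0.9, 0.0]]]
outputs, routes = bi_level_routing_attention(regions, topk=2)
print(routes)
```

In this toy example, each region routes to two of the three regions, so every query attends to four tokens instead of all six. On a real feature map the saving grows with the number of regions.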
Figure 5 illustrates the architecture of the Biformer attention mechanism. The Biformer attention mechanism enables the network to emphasize global features while still accounting for their relationship with local features. This allows the model to distinguish more effectively between the background and the defects, thereby improving the accuracy of wood surface defect detection.

NWD Loss Function
Recognizing small targets has always been a challenge in object detection, and the wood defects in this paper fall into the category of small targets. Take Crack, Dead_knots, and Live_Knot as examples: their small and elongated shapes may cause crucial features to be lost during feature extraction. The defects may even be missed altogether, reducing detection accuracy. YOLOv7 uses the IOU metric to measure the overlap between predicted and ground-truth boxes, including those of small targets. However, the IOU metric is highly sensitive to positional offsets of small targets: a small positional fluctuation can cause a large change in IOU [38]. Consequently, a slight bounding box shift may push the IOU below the set threshold, so that positive samples are misclassified as negative samples. This makes positive and negative samples harder to distinguish and makes the network more difficult to converge.
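The scale dependence of this sensitivity is easy to quantify. The sketch below shifts a prediction 3 pixels horizontally from its ground truth, once for a 6 × 6 box and once for a 60 × 60 box (sizes chosen for illustration). It compares the results with a commonly used 0.5 assignment threshold, which is assumed here rather than taken from this paper.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IOU between a square box and the same box shifted horizontally by `shift` pixels."""
    gt = (0.0, 0.0, size, size)
    pred = (shift, 0.0, size + shift, size)
    return iou(gt, pred)


THRESHOLD = 0.5  # assumed positive-sample threshold
small = shifted_iou(6, 3)
large = shifted_iou(60, 3)
print(f"small box: IOU={small:.3f}, positive={small >= THRESHOLD}")
print(f"large box: IOU={large:.3f}, positive={large >= THRESHOLD}")
```

The same 3-pixel error drops the small box's IOU to 1/3, turning it into a negative sample, while the large box keeps an IOU of about 0.90.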
To address the above issues, we introduced the utilization of a novel metric known as NWD as a substitute for the IOU metric [39].NWD is a similarity analysis method in which the degree of overlap of the prediction frames does not have much effect [40].Specifically, we represent the predicted bounding box and the true value bounding box as two-dimensional Gaussian distributions.Subsequently, we employ NWD to calculate the

NWD Loss Function
Recognizing small targets has always been a challenge in object detection, and the wood defect detection task in this paper falls into the category of small target detection. For example, when detecting Crack, Dead_knots, Live_Knot, and similar defects, their small and elongated shapes may cause crucial features to be lost during feature extraction, or even cause defects to be overlooked altogether, which reduces detection accuracy. In YOLOv7, the IOU metric is used to measure the overlap between predicted and ground-truth boxes, including those of small targets. However, the IOU metric is highly sensitive to positional offsets of small targets: a slight positional fluctuation can cause a large change in IOU [38]. As a result, a small shift of a bounding box may push its IOU below the set threshold, so that positive samples are misclassified as negative samples. Positive and negative samples then become difficult to distinguish, and the network becomes harder to converge.
To address these issues, we adopt the Normalized Wasserstein Distance (NWD) metric as a substitute for the IOU metric [39]. NWD is a similarity measure that depends much less on the degree of overlap between boxes [40]. Specifically, we model the predicted bounding box and the ground-truth bounding box as two-dimensional Gaussian distributions and then use NWD to calculate the similarity between these two distributions. The NWD metric has several advantages over the IOU metric, especially in small target detection. The first is scale invariance: after the NWD metric is introduced, wood defects are less likely to be categorized as negative samples, and the network converges faster. The second is smoothness with respect to positional fluctuations: unlike the IOU metric, the NWD metric is insensitive to such fluctuations, so even when the predicted box and the ground-truth box are offset from each other, a valid similarity value can still be obtained. In addition, the NWD metric can assess the similarity of bounding boxes that do not overlap, or where one box contains the other. This allows the model to measure the distance between the predicted box and the ground-truth box without vanishing gradients, even when the two boxes do not intersect because of large positional fluctuations.
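The difference between the two metrics can be checked numerically. The following plain-Python sketch models each box (cx, cy, w, h) as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4); for such Gaussians the squared 2-Wasserstein distance reduces to the squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(-W₂/C). The normalizing constant C is dataset-dependent; the value 12.8 used here is only an assumed placeholder, and the function names are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """Normalized Wasserstein distance between boxes modelled as 2-D Gaussians."""
    # Squared 2-Wasserstein distance between N([cx, cy], diag(w^2/4, h^2/4)) Gaussians:
    # ||(cx_a, cy_a, w_a/2, h_a/2) - (cx_b, cy_b, w_b/2, h_b/2)||^2
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)  # c: assumed dataset-dependent constant


def nwd_loss(pred: Box, target: Box, c: float = 12.8) -> float:
    """Loss form: 1 - NWD, which is 0 for identical boxes."""
    return 1.0 - nwd(pred, target, c)


small, large = (10, 10, 6, 6), (100, 100, 48, 48)
for box in (small, large):
    moved = (box[0] + 3, box[1], box[2], box[3])  # 3-pixel horizontal shift
    print(f"w={box[2]}: IoU={iou(box, moved):.3f}  NWD={nwd(box, moved):.3f}")
# w=6: IoU=0.333  NWD=0.791
# w=48: IoU=0.882  NWD=0.791
```

The same 3-pixel shift lowers the IoU of the 6 × 6 box to about 0.33 but that of the 48 × 48 box only to about 0.88, whereas NWD changes by the same amount for both; shifting the small box by 8 pixels makes the IoU exactly 0 while NWD stays positive.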

Experiment and Results
In this part, we used a dataset of 3600 wood defect images and designed a series of ablation and comparison experiments evaluated by AP and mAP. In the ablation experiments, each of the improvements described above was added to the original model to verify its impact on performance. In addition, BPN-YOLO was compared with conventional models such as YOLOv5, YOLOv7, YOLOv8, YOLOv9, RT-DETR, and Faster R-CNN to fully demonstrate its performance in wood defect detection.

Experimental Details
The training and testing procedures in this study were conducted with the PyTorch 1.7.1 deep learning framework, with computation acceleration provided by CUDA 11.0. The software environment consisted of the Windows 11 operating system, with Python 3.8 as the primary development language. The hardware comprised an Intel® Core™ i9-13900HX CPU at 2.20 GHz from Intel Corporation (Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 Laptop GPU from NVIDIA Corporation (Santa Clara, CA, USA). To ensure a fair comparison, the same set of hyperparameters was used for all experiments across the different architectures: a batch size of 4 and 200 training epochs, with a training time of about 9.788 h.

Performance Evaluation
To assess the performance of the BPN-YOLO algorithm in wood defect detection, we selected two evaluation metrics, average precision (AP) and mean average precision (mAP), which we use to validate the proposed approach. Precision (P) is the proportion of correctly detected wood defects among all detections, while Recall (R) is the proportion of correctly detected wood defects among all actual defects, both detected and missed. They are calculated as shown in Equations (1) and (2):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

where TP denotes the number of samples correctly classified as wood defects, FP the number of samples in which non-defects are mistakenly identified as wood defects, and FN the number of samples in which actual wood defects are misclassified as background. AP summarizes the relationship between P and R; its value equals the area under the Precision-Recall (PR) curve. It is computed separately for each category and summarizes the detector's precision over the full range of recall.
The formula for AP is given by

$$AP = \int_{0}^{1} P(R)\, dR \tag{3}$$

where P represents precision and R represents recall. mAP is the average AP over all categories of wood defects:

$$mAP = \frac{1}{C} \sum_{i=1}^{C} AP_{i} \tag{4}$$

In Equation (4), i indexes the defect category, C is the number of defect categories, which is 7 in this experiment, and AP_i is the average precision for category i, as defined above.
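As a concrete illustration of Equations (1) to (4), the following plain-Python sketch computes precision, recall, AP as the area under an interpolated PR curve, and mAP as the mean over classes. It assumes detections have already been matched to ground-truth boxes (for example by an IOU threshold) and uses all-point interpolation, which is one common convention; the exact interpolation used by the YOLOv7 evaluation code may differ. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Equations (1) and (2): P = TP / (TP + FP), R = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """Equation (3): area under one class's PR curve (all-point interpolation).

    scores -- confidence of each detection of this class
    is_tp  -- whether each detection was matched to a ground-truth defect
    num_gt -- number of ground-truth defects of this class (TP + FN)
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    # Sum the rectangles where recall increases
    return sum(((recalls[j] - recalls[j - 1]) * precisions[j]
                for j in range(1, len(recalls))), 0.0)


def mean_average_precision(ap_per_class: List[float]) -> float:
    """Equation (4): mAP = (1 / C) * sum of AP_i over the C classes."""
    return sum(ap_per_class) / len(ap_per_class)


print(precision_recall(tp=3, fp=1, fn=1))                                     # (0.75, 0.75)
print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4))  # 0.625
```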

Ablation Experiments
In this section, we examine how effectively each of our three proposed improvements increases wood defect detection accuracy. All ablation experiments were run with the same parameters and in the same environment to keep the comparison fair. The experimental data are shown in Table 2. We conducted four sets of experiments, adding the improvements at their respective positions in the network, and compared the results with the original YOLOv7 algorithm in terms of mAP and AP. YOLOv7 + BF denotes the addition of the Biformer attention mechanism after the SPPCSPC module of the original network; YOLOv7 + PC denotes the replacement of all 3 × 3 regular convolutions in the ELAN module of the backbone network with partial convolutions; YOLOv7 + NWD denotes the replacement of the IOU metric with the NWD metric in the original loss function; and BPN-YOLO denotes our proposed improved model. The ablation results show that, compared with the YOLOv7 model, introducing the Biformer attention mechanism, Pconv convolution, or the NWD loss function individually improves the mAP by 2.8%, 2.8%, and 4.7%, respectively. When all three improvements are introduced together, the mAP reaches its highest value of 81.8%, 7.4% higher than that of the original network. This demonstrates the improved wood defect detection performance of the proposed BPN-YOLO network model.
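The ablation numbers can be related with a few lines of arithmetic. This sketch assumes, as the final value of 81.8% together with the 7.4% gain implies, that the reported gains are absolute differences in mAP (percentage points). Under that assumption the baseline YOLOv7 mAP is 74.4%, and the single-module mAPs follow directly; the values below are derived from the reported gains, not read from Table 2.

```python
final_map = 81.8         # BPN-YOLO mAP (%), as reported
gain_over_yolov7 = 7.4   # reported improvement over YOLOv7 (assumed absolute, in points)
baseline_map = final_map - gain_over_yolov7  # implied YOLOv7 mAP

single_module_gain = {"YOLOv7 + BF": 2.8, "YOLOv7 + PC": 2.8, "YOLOv7 + NWD": 4.7}
implied_map = {name: round(baseline_map + g, 1) for name, g in single_module_gain.items()}

# The same absolute gain expressed relative to the baseline
relative_gain = 100 * gain_over_yolov7 / baseline_map

print(f"YOLOv7 baseline mAP: {baseline_map:.1f}%")
for name, value in implied_map.items():
    print(f"{name}: {value:.1f}%")
print(f"BPN-YOLO: +{gain_over_yolov7:.1f} points absolute, +{relative_gain:.1f}% relative")
```

Under this reading, the 7.4-point gain corresponds to a relative improvement of roughly 9.9%, and the three individual gains (2.8 + 2.8 + 4.7 = 10.3 points) are not simply additive when the modules are combined.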
To validate the effectiveness of the Biformer attention mechanism in improving wood defect detection accuracy, further experiments were carried out. Five classical attention mechanisms, ECA [41], CBAM [42], EMA [43], SK [44], and SSA [45], were each added to the original YOLOv7 network. YOLOv7 denotes the original network, YOLOv7 + ECA denotes the network with the ECA attention mechanism added, and so on. The results for the different attention mechanisms are shown in Table 3, from which we draw the following conclusions.
(1) Adding the ECA attention mechanism improves the mAP by 2.3% over the original model. ECA evaluates the importance of each feature channel and assigns each channel a different weight, which lets the network focus on the channels that are critical to the task. However, ECA reduces computational complexity by using global average pooling, and this operation may lose some positional information.
(2) CBAM improves the mAP by 0.7%. Because CBAM combines channel attention with spatial attention, it enables the model to capture important feature information in different dimensions simultaneously. This fusion helps the model understand the intrinsic structure of the input data more comprehensively, which improves detection accuracy. However, CBAM extracts attention information over space and channels through multiple convolution and pooling operations on the input, which increases computational complexity and reduces real-time responsiveness.
(3) EMA reduces the mAP by 3.1%. This may be because EMA attends to different feature regions at multiple scales. In some cases, this multiscale attention may divert attention from the critical regions of wood defects, thus reducing detection accuracy.
(4) SK reduces the mAP by 1.5%. This may be because SK tunes the receptive field by dynamically selecting the convolutional kernel size, which may not match the existing receptive field design of the YOLOv7 network. Wood defect features may require a receptive field of a specific size to be detected optimally, and the dynamic tuning of SK attention may fail to provide it.
(5) SSA reduces the mAP by 0.9%. This may be because SSA assigns attention weights at different locations in the sequence, which may disperse the model's attentional resources. In wood defect detection, the model may need to concentrate on specific defect features, and SSA may make it difficult for the model to focus on this critical information, thus reducing detection accuracy.
(6) Biformer provides the largest mAP improvement among all compared modules, 2.8% over the original model. This is because Biformer considers both global and local information of the input. It captures long-range dependencies and local patterns by attending at two levels: coarse region-level routing first selects the most relevant regions across the feature map, and fine-grained token-to-token attention is then computed within those regions. This design helps the model handle the complex, multi-scale features in the wood defect detection task.
Based on the above analysis, adding the ECA and CBAM attention mechanisms to the original YOLOv7 network increased the mAP by 2.3% and 0.7%, respectively, whereas adding the EMA, SK, and SSA attention mechanisms decreased it by 3.1%, 1.5%, and 0.9%, respectively. Although both ECA and CBAM improved accuracy, the Biformer attention mechanism achieved an mAP 0.5% and 2.1% higher than ECA and CBAM, respectively. These results show that Biformer is the best-performing module among those tested, so we introduced the Biformer attention mechanism into the network.
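For reference, the channel-weighting behaviour attributed to ECA in item (1) can be sketched in PyTorch as follows. This minimal version is written from the ECA-Net design (global average pooling, a 1-D convolution across channels with an adaptively chosen odd kernel size, and a sigmoid gate); the hyperparameters `gamma=2` and `b=1` are the commonly used defaults and are assumptions here, not settings reported in this study.

```python
import math

import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient channel attention: GAP -> 1-D conv across channels -> sigmoid gate."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size |(log2(C) + b) / gamma|, bumped to the next odd number if even
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 == 1 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        y = self.pool(x)                                      # (B, C, 1, 1): spatial positions averaged away
        y = self.conv(y.squeeze(-1).transpose(-1, -2))        # (B, 1, C): each channel mixes with k neighbours
        w = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # (B, C, 1, 1): per-channel weights in (0, 1)
        return x * w                                          # reweight channels


eca = ECA(64)
x = torch.randn(2, 64, 20, 20)
print(eca.conv.kernel_size, eca(x).shape)
```

Because the pooling step collapses each channel to a single value, the gate cannot depend on where a defect lies in the image, which is the loss of positional information mentioned in item (1).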

Comparisons with Other Methods and Experiments
In this section, comparison experiments with several state-of-the-art target detection models were carried out to further verify the performance of the BPN-YOLO model in wood defect detection.
The PR curve, short for precision-recall curve, visualizes the AP value. The larger the area under the PR curve, the better the model performs; an area of 1 means that the model detects all targets perfectly. Figure 6 shows the AP values of the seven types of wood defects used in this experiment, where panels (a), (b), (c), (d), (e), and (f) correspond to YOLOv5, YOLOv7, YOLOv8, YOLOv9, RT-DETR, and BPN-YOLO, respectively. The AP values of BPN-YOLO are higher than those of the other models for almost all seven defect types, especially for Knot_Missing defects, where its AP is 32.2%, 10.6%, 14%, 8.4%, and 13.5% higher than those of YOLOv5, YOLOv7, YOLOv8, YOLOv9, and RT-DETR, respectively.
The detection results of YOLOv5, YOLOv7, YOLOv8, YOLOv9, and BPN-YOLO on the seven wood defect types are presented in Figure 7 and Table 4. As Table 4 shows, BPN-YOLO processes every defect type faster than the original YOLOv7 and is also faster than YOLOv5 overall, demonstrating high processing speed in the wood defect detection task. In Figure 7, the number on each detection box is the confidence score, which ranges from 0 to 1 and reflects the model's certainty about the detection: values closer to 1 indicate high confidence that the defect exists, whereas values closer to 0 indicate low confidence or uncertainty. Figure 7 shows that BPN-YOLO generates more accurate prediction boxes, with noticeably higher confidence than the other models on defects such as Live_Knot, Marrow, and Resin, and with no false or missed detections. Overall, BPN-YOLO maintains a high inference speed while improving detection accuracy.

Furthermore, the gradient-weighted class activation mapping (Grad-CAM) technique was employed to visualize the experimental results more fully. Grad-CAM helps visualize the decision-making mechanisms of deep convolutional neural networks: its heat maps show which regions the model attends to, and how strongly, for each defect category. If the model correctly identifies a defect in the wood, the Grad-CAM image shows a highlighted response in the defective region, rendered as different shades of color on the image; the redder and darker a region, the more attention the model pays to it. The Grad-CAM visualizations of the experimental outcomes are depicted in Figure 8. As Figure 8 shows, compared with the other models, BPN-YOLO produces redder and darker colors in the regions where the wood defects are located and concentrates on the target regions more accurately.
To further validate the efficacy of the BPN-YOLO model for wood defect detection, six mainstream target detection models, namely YOLOv5, YOLOv7, YOLOv8, YOLOv9, RT-DETR, and Faster R-CNN, were chosen for comparison. The results of the comparative experiments are summarized in Table 5. As Table 5 shows, BPN-YOLO achieves higher AP and mAP values than the other models.
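The Grad-CAM procedure described above can be sketched in PyTorch as follows. The sketch weights each channel of a chosen layer's activations by the spatially averaged gradient of a target score, keeps the positive part of the weighted sum, and upsamples it to the image size. The toy classifier, the choice of layer, and `score_fn` are hypothetical stand-ins: for a detector such as BPN-YOLO, the score would be taken from a particular predicted defect box, and the color-map overlay used in Figure 8 is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(model: nn.Module, layer: nn.Module, image: torch.Tensor, score_fn) -> torch.Tensor:
    """Return an (H, W) Grad-CAM heat map in [0, 1] for one image of shape (1, 3, H, W)."""
    store = {}
    handle = layer.register_forward_hook(lambda _m, _inp, out: store.update(act=out))
    try:
        output = model(image)
    finally:
        handle.remove()
    act = store["act"]                                      # (1, K, h, w) feature maps A^k
    score = score_fn(output)                                # scalar target score y
    grads, = torch.autograd.grad(score, act)                # dy/dA^k
    weights = grads.mean(dim=(2, 3), keepdim=True)          # alpha_k: spatially averaged gradients
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))  # keep only positive evidence
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)[0, 0]
    cam = cam - cam.min()
    return (cam / cam.max().clamp_min(1e-8)).detach()       # normalize to [0, 1]


torch.manual_seed(0)
toy = nn.Sequential(  # hypothetical stand-in for a real detection network
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 7),  # 7 defect classes
)
img = torch.randn(1, 3, 32, 32)
heat = grad_cam(toy, toy[2], img, score_fn=lambda out: out[0, 3])  # class index 3 chosen arbitrarily
print(heat.shape)
```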

Discussion
Wood is an important construction and manufacturing material, and its quality directly affects the safety and quality of construction and manufacturing projects.However, traditional wood defect detection methods have many shortcomings, such as relying on manual visual inspection, low efficiency, and susceptibility to subjective factors.In contrast, the use of deep learning technology enables automated and efficient wood defect detection, improves detection accuracy and speed, and reduces labor and time costs.
In this study, wood defect images collected by VSB Technical University of Ostrava were used as the dataset. Building on the YOLOv7 network, we proposed BPN-YOLO, which combines the Biformer attention mechanism, Pconv partial convolution, and the NWD loss function. BPN-YOLO performed well in wood defect detection, reaching an mAP of 81.8%, 7.4% higher than that of the original model. In particular, its detection accuracy for cracks reached 79.7%, exceeding the 77.21% achieved by the improved YOLOv3 method in [26]. The method in [27], a bilinear fine-grained convolutional neural network with BLNN multiscale feature fusion, reaches a detection accuracy of 99.2%, but it detects only one type of defect, knots. In contrast, we conducted experiments on seven types of defects and obtained an mAP of 81.8%, which gives our method broader applicability than that in [27]. In addition, the mAP of BPN-YOLO (81.8%) is higher than that of the improved YOLOv8 network in [46] (76.5%), which uses the SE attention mechanism and a GVC neck structure.
Although our proposed network achieves good results, it has certain limitations: its detection speed and network size still need to be optimized. Although BPN-YOLO incorporates Pconv partial convolution, which reduces the number of network parameters, the Biformer attention mechanism adds complexity to the model. As a result, the final network has more parameters and a lower detection speed.
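To make the parameter saving from partial convolution concrete, the following PyTorch sketch follows the FasterNet-style PConv design, in which a regular 3 × 3 convolution is applied to only a fraction of the input channels and the remaining channels are passed through unchanged. The partial ratio of 1/4 (`n_div=4`) and the class and helper names are assumptions for illustration; the ratio used in P-ELAN is not stated here.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """PConv-style layer: 3x3 conv on the first dim // n_div channels, identity on the rest."""

    def __init__(self, dim: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_untouched = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)  # untouched channels pass through as-is


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


regular = nn.Conv2d(128, 128, 3, padding=1, bias=False)
partial = PartialConv(128)
print(n_params(regular), n_params(partial))  # 147456 9216
```

With one quarter of the channels convolved, the weight count falls to (1/4)² = 1/16 of a regular 3 × 3 convolution of the same width, which illustrates the parameter reduction mentioned above, even though the Biformer block adds parameters back.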
In conclusion, the BPN-YOLO model proposed in this study has great potential for wood defect recognition. The technology has wide practical value and can play an important role in wood processing and quality inspection, thereby improving the quality and market competitiveness of wood products.

Conclusions
In this paper, building on the existing YOLOv7 network model, we propose the BPN-YOLO model to address the poor accuracy of wood defect detection. To enhance the model's perception of wood defect image content and improve the accuracy of small target detection, the Biformer attention mechanism is employed, improving the mAP by 2.8%. To reduce unnecessary computation and improve both speed and accuracy, the P-ELAN module is proposed, in which partial convolutions replace the regular convolutions of the ELAN module in the backbone network; this also improves the mAP by 2.8%, and the inference time is shorter than that of the original YOLOv7 model. To balance the allocation of positive and negative samples and reduce the sensitivity of the IOU loss function to fluctuations in defect location, the NWD loss function is adopted, improving the mAP by 4.7%. Comparison experiments with mainstream target detection models show that the proposed BPN-YOLO model achieves strong results on several evaluation metrics. In particular, its mAP reaches 81.8%, which is 11.8%, 7.4%, 5.3%, 6.8%, 13.5%, and 21.2% higher than those of the YOLOv5, YOLOv7, YOLOv8, YOLOv9, RT-DETR, and Faster R-CNN models, respectively. The BPN-YOLO model identifies and localizes wood defects more quickly and accurately. This research proposes an innovative model for wood defect detection and localization and provides an effective solution in this field. In future work, we aim to further improve detection speed and continue to optimize the model structure while keeping the model's accuracy stable.

Figure 5
Architecture of the Biformer attention mechanism.

Table 1.
Defect distribution in the dataset.

Table 2.
Results of ablation experiment.

Table 3.
The improvement of different attention modules.

Table 4.
Comparison of experimental inference time.

Table 5.
Comparison of various detection models.
