WDNET-YOLO: Enhanced Deep Learning for Structural Timber Defect Detection to Improve Building Safety and Reliability

Lin, Xiaoxia; Gong, Weihao; Sun, Lin; Yang, Xiaodong; Leng, Chunwei; Li, Yan; Niu, Zhenyu; Meng, Yingzhou; Xiao, Xinyue; Zhang, Junyan

doi:10.3390/buildings15132281


by Xiaoxia Lin ¹, Weihao Gong ¹, Lin Sun ¹,*, Xiaodong Yang ¹, Chunwei Leng ², Yan Li ², Zhenyu Niu ², Yingzhou Meng ¹, Xinyue Xiao ¹ and Junyan Zhang ¹

¹ College of Intelligent Equipment, Shandong University of Science and Technology, Taian 271001, China

² Hanqing Data Consulting Co., Ltd., Zibo 255000, China

* Author to whom correspondence should be addressed.

Buildings 2025, 15(13), 2281; https://doi.org/10.3390/buildings15132281

Submission received: 5 June 2025 / Revised: 24 June 2025 / Accepted: 26 June 2025 / Published: 28 June 2025

(This article belongs to the Section Building Structures)


Abstract

Structural timber is an important building material, but surface defects such as cracks and knots seriously affect its load-bearing capacity, dimensional stability, and long-term durability, posing a significant risk to structural safety. Conventional inspection methods are unable to address the issues of multi-scale defect characterization, inter-class confusion, and morphological diversity, thus limiting reliable construction quality assurance. To overcome these challenges, this study proposes WDNET-YOLO: an enhanced deep learning model based on YOLOv8n for high-precision defect detection in structural wood. First, the RepVGG reparameterized backbone utilizes multi-branch training to capture critical defect features (e.g., distributed cracks and dense clusters of knots) across scales. Second, the ECA attention mechanism dynamically suppresses complex wood grain interference and enhances the discriminative feature representation between high-risk defect classes (e.g., cracks vs. knots). Finally, CARAFE up-sampling with adaptive contextual reorganization improves the sensitivity to morphologically variable defects (e.g., fine cracks and resin irregularities). The analysis results show that the mAP50 and mAP50-95 of WDNET-YOLO are improved by 3.7% and 3.5%, respectively, compared to YOLOv8n, while the parameters are increased by only 4.4%. The model provides a powerful solution for automated structural timber inspection, which directly improves building safety and reliability by preventing failures caused by defects, optimizing material utilization, and supporting compliance with building quality standards.

Keywords: structural timber defect detection; timber quality control; YOLOv8; building materials

1. Introduction

Structural timber serves as a fundamental building material in modern construction, extensively utilized in load-bearing components such as beams, columns, and trusses due to its favorable strength-to-weight ratio, sustainability, and aesthetic properties [1]. However, surface defects—including cracks, dead knots, live knots, resin accumulation, decay, and insect damage—significantly deteriorate its mechanical performance, dimensional stability, and long-term structural reliability [2]. These imperfections create localized stress concentrations, reduce load-carrying capacity, accelerate moisture penetration, and ultimately compromise building integrity, posing substantial safety risks throughout a structure’s service life [3]. Consequently, developing efficient and accurate defect detection technologies is imperative for ensuring construction safety, extending building lifespan, complying with international timber grading standards [4,5], and minimizing material waste in architectural applications.

Conventional inspection methodologies primarily depend on manual visual assessment or rule-based image processing techniques [6,7,8]. Manual inspection suffers from critical limitations: low efficiency (requiring trained specialists), high labor costs, susceptibility to subjective judgment, and inconsistent defect identification—particularly for subtle or morphologically complex flaws [9]. While automated image processing improves throughput, its reliance on handcrafted feature extractors (e.g., edge detectors, texture filters) proves inadequate for distinguishing defects from intricate wood grain patterns or adapting to the high variability of natural imperfections (e.g., irregular crack propagation, knot clusters) [10]. These deficiencies hinder reliable quality assurance in construction-grade timber, where undetected defects may propagate into catastrophic structural failures during extreme loading events [11].

The emergence of deep learning has revolutionized defect detection through its capacity for adaptive feature extraction and high-precision classification [12,13]. Target detection algorithms fall mainly into two families: single-stage and two-stage [14]. Two-stage algorithms (e.g., the R-CNN series) first generate candidate regions and then classify their features, achieving high accuracy at the cost of heavy computation and slow inference. A typical representative, Faster R-CNN [15], improves detection speed by using a region proposal network (RPN), but its complex structure still struggles to meet real-time demands. Mask R-CNN adds a segmentation branch on top of detection [16], which suits scenarios requiring accurate segmentation such as medical images, but its higher computational complexity further restricts real-time applications. Single-stage algorithms complete detection directly in a single network, which gives them the advantages of speed and real-time performance [17]. SSD uses multi-scale feature maps to detect targets of different sizes [18], which suits real-time scenarios such as image search [19]. The YOLO series is known for its balance of speed and accuracy: YOLOv1 dramatically improved speed through a regression-based detection framework [20,21], and subsequent versions have continuously strengthened performance through structural optimization [22]. However, these models still suffer from missed detections of small targets and insufficient adaptability to complex scenes [23].

In addition to the mainstream target detection algorithms mentioned above, there are also improved models and methods for specific problems. For example, RetinaNet addresses the imbalance between positive and negative samples in target detection by introducing the Focal Loss function, which significantly improves the detection performance of small objects [24]. Focal Loss mitigates this imbalance by reducing the weight of easy-to-classify samples and focusing more attention on difficult-to-classify samples. EfficientDet, on the other hand, achieves state-of-the-art performance on multiple benchmark datasets through an efficient backbone network and multi-scale feature fusion [25], but it requires a large amount of computational resources. Transformer-based target detection models (e.g., DETR) can better handle long-range dependencies by introducing an attention mechanism, which is suitable for target detection in complex scenes [26]. However, the training process of DETR is more complex and its inference speed is slow, which limits its wide use in real-time applications [27]. Beyond object detection frameworks, semantic segmentation models such as DeepLab offer pixel-level defect localization capabilities. By leveraging atrous convolution and spatial pyramid pooling, DeepLab achieves fine-grained segmentation of irregular defects (e.g., crack propagation paths) while maintaining resolution fidelity [28]. Meanwhile, EfficientNet optimizes computational efficiency through compound scaling of network depth, width, and resolution, enabling high-accuracy defect classification with minimal resource overhead [29]. These approaches provide complementary solutions for scenarios requiring detailed morphological analysis or edge-device deployment.

In the practical application of structural timber defect detection, the selection of a target detection model must balance detection accuracy, real-time performance, and computational resources. For example, FastYolo significantly improves inference speed while maintaining high accuracy by optimizing the model architecture and introducing a lightweight design strategy [30], which makes it suitable for resource-constrained environments. Improved models based on YOLOv8, such as LCS-YOLOv8 [31], further improve detection accuracy and efficiency by introducing a lightweight feature extraction module and a multi-scale detection head, which makes them suitable for high-precision target detection tasks. Additionally, some studies have attempted to combine traditional image processing techniques with deep learning to enhance the robustness and adaptability of detection [32]. For example, Chen et al. [33] employed HOG features combined with an SVM classifier in wood defect detection, achieving acceptable recall rates under controlled imaging conditions (such as uniform lighting or CT scans). However, this method faces significant limitations in complex-textured scenes: textural interference in knotty regions leads to a substantial increase in false positive rates, while sub-millimeter defects suffer significantly higher false negative rates than conventional defects because of insufficient feature extraction. This indicates that while deep learning techniques have made significant progress in structural wood defect detection, the path of integrating traditional and deep learning approaches still needs to address such challenges [34,35].

This research addresses the critical need for efficient structural wood defect detection in practical production environments. We focus on eight safety-critical surface defects: quartz, live knot, marrow, resin, dead knot, knot with crack, missing knot, and crack. These imperfections present fundamental challenges characterized by multi-scale coexistence complicating hierarchical feature extraction, inter-class similarity hindering accurate defect discrimination, and high morphological diversity requiring precise geometric reconstruction. However, traditional detection technologies often fall short when faced with these challenges.

To enable robust detection, we curated a specialized dataset from the VSB Technical University benchmark, comprising 4000 high-resolution RGB images (2800 × 1024 pixels) with pixel-level annotations. The original pixel values range from 0 to 255 across the three color channels, targeting the aforementioned eight defect classes critical for structural integrity assessment.

Therefore, developing efficient and accurate detection technologies that are suitable for practical deployment and can effectively address the shortcomings of traditional detection technologies in the face of these core challenges is of great practical significance. To this end, we propose WDNET-YOLO, an enhanced framework based on YOLOv8n, which aims to systematically overcome these limitations through three key innovations:

Enhanced RepVGG Backbone: Captures hierarchical defect features through multi-branch structural reparameterization, addressing multi-scale detection challenges [36].
Efficient Channel Attention: Enhances discriminative feature representation via dynamic channel recalibration, overcoming inter-class similarity limitations [37].
CARAFE Up-sampling: Preserves morphological details with instance-adaptive kernels, accommodating high geometric variability [38].

2. Materials and Methods

2.1. YOLOv8

YOLOv8 achieves the synergistic optimization of accuracy and speed in real-time target detection [39]. As shown in Figure 1, its architecture consists of a multi-scale feature extraction backbone (Backbone), a feature enhancement network (Neck), and a decoupled detection head (Head). The Backbone network is based on a modified CSPDarknet architecture, using C2f modules instead of the traditional C3 modules [40]. Through cross-stage feature interactions with a dual-path design (3 × 3 depthwise-separable convolution and 1 × 1 cross-channel convolution), C2f enhances the sensitivity to small defects (e.g., insect holes) while reducing the number of parameters by 25%. The feature enhancement network uses a streamlined version of the PAN-FPN architecture [41] to fuse multi-resolution features through bidirectional cross-scale connections. Compared to YOLOv5, YOLOv8 removes the redundant up-sampling layer and introduces the C2f module to retain more detailed information. The detection head is decoupled to separate the classification and regression tasks. The classification branch predicts the category probability through the fully connected layer, and the regression branch directly outputs the target location offset. The anchor-free strategy is adopted to avoid the matching bias between preset anchor boxes and the morphological diversity of wood surface defects, which improves the detection flexibility.

2.2. WDNET-YOLO

The YOLOv8 algorithm demonstrates robust performance in detecting standard-sized objects [42], yet faces significant challenges in structural timber defect inspection scenarios. Construction-grade timber datasets typically contain numerous defects characterized by minute dimensions, high morphological diversity, and substantial visual similarity between critical defect classes such as tension-reducing cracks and compression-weakening knots. Furthermore, multi-scale overlapping defects exacerbate detection complexity, potentially compromising structural safety assessments. To address these construction safety-critical challenges, this study proposes WDNET-YOLO—an enhanced model that achieves superior detection performance through synergistic optimization of the backbone network, channel attention mechanisms, and dynamic context-aware feature fusion.

The architecture of the proposed WDNET-YOLO framework is illustrated in Figure 2. The backbone network is reconfigured using RepVGG modules to implement a multi-branch training strategy that captures hierarchical defect features essential for structural integrity evaluation. During training, the 3 × 3 convolutional branch models global morphological characteristics of safety-critical defects, such as crack propagation patterns in load-bearing beams. Simultaneously, the 1 × 1 convolutional branch enhances cross-channel interactions to distinguish dense knot clusters from background textures—critical for assessing shear resistance in timber joints. The identity mapping branch preserves spatial fidelity while suppressing interference from natural wood grain noise. Batch normalization nonlinearly fuses these multi-branch features, significantly improving sensitivity to multi-scale defects that impact structural performance.

An Efficient Channel Attention (ECA) module is integrated at the terminus of the backbone network to mitigate false detection risks in safety-critical applications. This mechanism extracts channel-wise statistical features through global average pooling, then dynamically recalibrates channel weights using adaptive 1D convolution kernels. By selectively amplifying discriminative features between high-risk defect classes (e.g., cracks compromising tensile strength vs. knots reducing load capacity) while suppressing complex texture interference, the ECA module significantly enhances classification reliability for structural timber components.

Within the feature pyramid network, CARAFE up-sampling combined with C2f modules optimizes morphological sensitivity for construction-grade timber inspection. CARAFE generates content-aware kernels that adaptively reorganize contextual information, preserving critical geometric details such as microcrack initiation points and resin distribution anomalies that indicate material degradation. Concurrently, the C2f module facilitates efficient cross-stage feature interaction, maintaining defect morphology integrity during multi-scale fusion. This dual mechanism substantially enhances detection capabilities for geometrically complex defects affecting long-term structural durability.

The RepVGG backbone provides highly discriminative foundational features for the ECA module’s channel weighting operations, while its global defect modeling capability complements CARAFE’s local detail preservation. ECA suppresses background noise through dynamic channel recalibration, and CARAFE optimizes small-defect retention via context-aware restructuring. Collectively, these components establish a coarse-to-fine defect detection pipeline specifically engineered for automated quality assurance in timber construction, where undetected defects may lead to catastrophic structural failures.

2.3. RepVGG

In structural timber defect detection, traditional single-path backbone networks exhibit limited capability in concurrently capturing global defect morphology (e.g., continuous crack propagation in load-bearing beams) and local detail features (e.g., fine boundaries of knots within timber joints). This limitation stems from constrained receptive fields and insufficient feature interactions, resulting in elevated miss rates for safety-critical small-scale defects such as clustered knots and scattered cracks that compromise structural integrity. To address these challenges, our study reconfigures the YOLOv8n backbone using RepVGG modules, implementing a multi-branch collaborative training strategy to enhance multi-scale feature representation for construction-grade timber assessment.

During training, parallel branch structures are deployed. The 3 × 3 convolutional branch models global morphological characteristics of safety-critical defects. The 1 × 1 convolutional branch enhances cross-channel interactions to distinguish dense knot clusters from background textures, which is critical for shear resistance evaluation. Crucially, the 1 × 1 branch, with its finer receptive field, is particularly adept at capturing localized features from minute defects (e.g., sub-pixel cracks) that may fall below the effective receptive field of the 3 × 3 branch, especially in clustered defect scenarios. An identity branch preserves original feature localization fidelity, providing a baseline representation. Following feature fusion via batch normalization, nonlinear superposition achieves dynamic equilibrium between these complementary feature streams: global context, fine-grained local details, and baseline fidelity. This integration significantly improves detection sensitivity for multi-scale defects, including co-located ones, that affect timber service life.

Importantly, the subsequent batch normalization (BN) layer during training plays a key role in harmonizing the contributions of these branches. While the identity mapping inherently carries background texture information (“noise”), the BN layer normalizes the activations across the fused multi-branch features. This normalization, coupled with learnable scale and shift parameters within BN, allows the network to dynamically attenuate the influence of purely noisy identity contributions and amplify the discriminative signals (including those from the 1 × 1 branch) during the nonlinear fusion process. Furthermore, the later ECA module (Section 2.4) provides an additional mechanism for suppressing irrelevant channel-wise noise, including potential residual noise propagated from the identity branch.

During inference, the multi-branch structure is transformed into a single 3 × 3 convolutional layer through mathematical reparameterization. The weights and biases from each training branch are fused as

$$W_{fuse} = W^{(3)} + \mathrm{pad}(W^{(1)}) + I \cdot W^{(id)}, \tag{1}$$

$$b_{fuse} = b^{(3)} + b^{(1)} + b^{(id)}, \tag{2}$$

where the $\mathrm{pad}(\cdot)$ operation extends the 1 × 1 convolution kernel to 3 × 3 size by zero padding and $I$ is the unit matrix.

As demonstrated in Figure 3, this design preserves multi-scale feature extraction capabilities through structural reparameterization. The consolidated network accurately models both global fracture morphology and local defect details, essential for predicting residual load capacity in structural timber components.
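The fusion in Equations (1) and (2) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes batch normalization has already been folded into each branch, models the identity branch as a per-channel scale `s_id` plus bias `b_id`, and uses random weights. The names `conv2d_same` and `fuse_branches` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Naive stride-1 convolution with zero padding that keeps H x W.

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            patch = xp[:, di:di + h, dj:dj + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], patch)
    return out + b[:, None, None]


def fuse_branches(w3, b3, w1, b1, s_id, b_id):
    """Eq. (1)-(2): merge 3x3, 1x1 and identity branches into one 3x3 conv."""
    channels = w3.shape[0]
    # pad(W^(1)): place the 1x1 kernel at the centre of a zero 3x3 kernel.
    w1_padded = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))
    # Identity branch as a 3x3 kernel: per-channel scale on the centre tap.
    w_id = np.zeros_like(w3)
    w_id[np.arange(channels), np.arange(channels), 1, 1] = s_id
    return w3 + w1_padded + w_id, b3 + b1 + b_id


rng = np.random.default_rng(0)
C = 4  # illustrative channel count (assumption)
x = rng.normal(size=(C, 6, 6))
w3, b3 = rng.normal(size=(C, C, 3, 3)), rng.normal(size=C)
w1, b1 = rng.normal(size=(C, C, 1, 1)), rng.normal(size=C)
s_id, b_id = rng.normal(size=C), rng.normal(size=C)

# Training-time view: three parallel branches summed.
multi_branch = (conv2d_same(x, w3, b3) + conv2d_same(x, w1, b1)
                + s_id[:, None, None] * x + b_id[:, None, None])

# Inference-time view: a single fused 3x3 convolution.
w_fuse, b_fuse = fuse_branches(w3, b3, w1, b1, s_id, b_id)
single_branch = conv2d_same(x, w_fuse, b_fuse)
```

Because convolution is linear, the two views agree up to floating-point error, which is why the extra branches add no inference cost.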

2.4. ECA

In structural timber defect detection, complex and variable wood texture backgrounds and high inter-class feature similarity (e.g., between tension-reducing cracks and compression-weakening knots, or between different knot types) frequently cause model confusion. This leads to false positives (misidentifying grain as defects) and missed detections (failing to identify genuine flaws obscured by texture), severely compromising the reliability of building safety assessments.

While traditional attention mechanisms like CBAM improve feature differentiation through joint channel-spatial modeling, their multi-branch structures introduce computational overhead, and channel attention modules relying on fully connected layers risk information loss through dimension compression. To address these limitations in construction material inspection, we implement the Efficient Channel Attention (ECA) mechanism, a lightweight dynamic channel weighting strategy that enhances defect feature discrimination while minimizing computational complexity for practical timber grading applications.

As depicted in Figure 4, the ECA module operates through the following optimized workflow:

First, global average pooling is applied to the input feature map $X \in \mathbb{R}^{C \times H \times W}$ to generate a channel-level statistical feature vector compressed along the spatial dimensions:

$$z_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{c, i, j}, \tag{3}$$

where $z_{c}$ characterizes the global context information of the $c$th channel and provides the basis for subsequent channel weight assignments.

To optimize the efficiency of local cross-channel interactions, the module dynamically adjusts the size $k$ of the one-dimensional convolutional kernel according to the number of channels $C$, using a nonlinear function:

$$k = \psi(C) = \left\lfloor \frac{\log_{2}(C)}{\gamma} + \frac{\beta}{\gamma} \right\rfloor_{odd}, \tag{4}$$

where $\gamma$ and $\beta$ are hyperparameters and the subscript $odd$ indicates that the result is taken as an odd number. In this way, the range of cross-channel interaction adapts to the channel dimension. This design avoids the limitation of fixed kernel dimensions and reduces computational complexity through dynamic kernel adjustment.
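As a hedged illustration of Equation (4), the sketch below computes the kernel size for a few channel counts. The defaults `gamma=2` and `beta=1` are assumptions taken from the commonly used ECA settings, since the values used in this study are not stated here. The rounding rule (truncate, then move even values up to the next odd number) is also an assumption about how the odd constraint is enforced.

```python
import math


def eca_kernel_size(channels, gamma=2, beta=1):
    """Adaptive 1D kernel size k = psi(C) from Eq. (4).

    gamma and beta are assumed defaults, not values reported in the text.
    """
    t = int(abs(math.log2(channels) / gamma + beta / gamma))
    # Force an odd size so the kernel has a centre element.
    return t if t % 2 == 1 else t + 1


for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

Under these assumed settings, the kernel size stays at 5 or below for the channel counts shown, which is consistent with the small kernels discussed for the ECA module.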

Subsequently, a one-dimensional convolution operation is performed on the feature vector $z_{c}$ to model the dependencies between neighboring channels:

$$a_{c} = \mathrm{Conv1D}(z_{c}, k). \tag{5}$$

Nonlinear correlations between channels are captured through local interactions, replacing the parameter redundancy of traditional fully connected layers. The convolution output is normalized to the channel attention weights by a Sigmoid function:

$$\alpha_{c} = \sigma(a_{c}) = \frac{1}{1 + e^{-a_{c}}}. \tag{6}$$

Ultimately, the original feature maps are multiplied channel-by-channel with the weight vectors to achieve dynamic calibration of the feature maps:

$$y_{c, i, j} = \alpha_{c} \cdot X_{c, i, j}. \tag{7}$$

This process significantly enhances discrimination between structurally consequential defect classes, such as cracks and knots, by amplifying critical channel responses while suppressing wood grain noise, thereby improving reliability in safety-critical timber inspections. This dynamic channel recalibration provides a secondary line of defense against noise interference, including potential residual background texture noise propagated from earlier layers (e.g., the identity branch of RepVGG), further ensuring that discriminative features dominate the representation.
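The workflow in Equations (3) and (5) to (7) can be sketched in a few lines of NumPy. This is an illustrative forward pass only, under stated assumptions: the 1D kernel weights are passed in directly rather than learned, the channel vector is zero-padded at both ends, and `eca_forward` is a hypothetical name introduced for this example.

```python
import numpy as np


def eca_forward(x, kernel):
    """x: (C, H, W) feature map; kernel: 1D weights of odd length k."""
    kernel = np.asarray(kernel, dtype=float)
    channels = x.shape[0]
    k = len(kernel)
    z = x.mean(axis=(1, 2))                 # Eq. (3): global average pooling
    z_padded = np.pad(z, k // 2)            # zero padding (assumption)
    a = np.array([z_padded[c:c + k] @ kernel
                  for c in range(channels)])  # Eq. (5): local 1D convolution
    alpha = 1.0 / (1.0 + np.exp(-a))        # Eq. (6): sigmoid weights
    return x * alpha[:, None, None], alpha  # Eq. (7): channel-wise rescaling


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 5, 5))       # illustrative shape (assumption)
recalibrated, weights = eca_forward(features, [0.2, 0.6, 0.2])
```

Each channel weight depends only on its `k` nearest channels, which is what keeps the parameter count at `k` rather than the `C × C` of a fully connected layer.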

The ECA module introduces minimal computational overhead due to its lightweight design. The 1D adaptive convolution operates on a compressed channel vector $z \in \mathbb{R}^{C \times 1 \times 1}$, so its cost is only O(C⋅k) operations, where k ≤ 5 (typically k = 3). For a 640 × 640 input, this corresponds to <0.01 GFLOPs, which is below the precision threshold of standard deep learning profilers. This explains why the measured GFLOPs do not change noticeably until CARAFE is added.
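The order of magnitude can be verified with simple arithmetic. The sketch below counts one multiply-add per kernel tap per channel for the 1D convolution, which is an assumed counting convention (some profilers count a multiply-add as two operations), and the function names are hypothetical.

```python
def eca_conv_ops(channels, k):
    """Multiply-adds of a 1D convolution over a length-C channel vector."""
    return channels * k


def to_gflops(ops):
    """Convert a raw operation count to units of 10^9 operations."""
    return ops / 1e9


ops = eca_conv_ops(512, 3)   # C = 512, k = 3
print(ops, to_gflops(ops))
```

Under this count, the convolution needs 1536 multiply-adds for C = 512 and k = 3, far below the 0.01 GFLOPs threshold mentioned above.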

2.5. CARAFE

In timber construction quality assurance, accurate detection of surface defects directly determines structural reliability, load-bearing capacity, and building longevity. Morphologically variable defects—such as irregularly propagating cracks (creating unpredictable stress concentrations), complex-textured knots at joint interfaces (reducing effective area), and resin pockets/inhomogeneities (indicating potential zones of weakness or moisture ingress)—pose significant challenges. Traditional up-sampling methods with fixed kernel sizes and limited receptive fields struggle to capture the global context and fine details of these safety-critical imperfections, potentially leading to missed detections or inaccurate sizing. CARAFE overcomes these limitations through a content-aware feature reorganization mechanism that dynamically generates adaptive up-sampling kernels based on input feature maps. This approach enables deep correlation modeling of contextual defect information through localized dot-product operations, significantly expanding receptive fields while enhancing defect semantic representation via nonlinear feature interactions—critical for pinpointing subtle defects in structural timber.

As illustrated in Figure 5, the CARAFE module comprises two core components essential for construction-grade timber inspection. The Up-sampling Kernel Prediction Module first reduces computational load through channel compression: a 1 × 1 convolution decreases input channels from $C$ to $C_m$. A $k_{encoder} \times k_{encoder}$ convolutional layer then predicts up-sampling kernels with $C_m$ input channels and $\sigma^2 k_{up}^2$ output channels, where $k_{up} \times k_{up}$ defines the kernel dimensions (typically 3 × 3) and $\sigma$ denotes the upscaling factor. These adaptive kernels encode defect-specific contextual patterns vital for structural assessment. The Feature Reorganization Module generates the output feature map (dimensions $\sigma H \times \sigma W \times C$) by computing dot products between $k_{up} \times k_{up}$ regions centered on input feature map locations and their corresponding predicted kernels. By establishing mappings from output coordinates to input regions, this process extracts fine-grained features that correlate with structural performance, such as microcrack propagation tendencies and resin distribution anomalies affecting moisture resistance, which makes it particularly important for timber defect detection. This allows comprehensive utilization of semantic information, substantially improving detection accuracy for safety-critical defects.
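The two modules described above can be sketched in NumPy as follows. This is a simplified illustration rather than the implementation used in the study: the predictor weights are random, the predicted kernels are normalized with a softmax as in the original CARAFE design, edge padding is used during reassembly (an assumption made so that the border behaves like the interior), and `conv2d_same` and `carafe` are hypothetical helper names.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1 convolution with zero padding. x: (C_in, H, W)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj],
                             xp[:, di:di + h, dj:dj + wd])
    return out


def carafe(x, w_compress, w_encoder, sigma=2, k_up=3):
    """Content-aware up-sampling of x: (C, H, W) -> (C, sigma*H, sigma*W)."""
    c, h, w = x.shape
    # 1) Kernel prediction: compress C -> C_m, then predict sigma^2 * k_up^2 maps.
    compressed = conv2d_same(x, w_compress)
    logits = conv2d_same(compressed, w_encoder)
    # Rearrange to one k_up^2 kernel per output pixel (pixel-shuffle layout).
    logits = logits.reshape(k_up * k_up, sigma, sigma, h, w)
    logits = logits.transpose(0, 3, 1, 4, 2).reshape(k_up * k_up, sigma * h, sigma * w)
    # Softmax so each predicted kernel sums to 1.
    logits = logits - logits.max(axis=0, keepdims=True)
    kernels = np.exp(logits)
    kernels /= kernels.sum(axis=0, keepdims=True)
    # 2) Reassembly: weighted sum over the k_up x k_up source neighbourhood.
    pad = k_up // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((c, sigma * h, sigma * w))
    for di in range(k_up):
        for dj in range(k_up):
            neighbour = xp[:, di:di + h, dj:dj + w]
            # Each output pixel reads from its source location (i // sigma, j // sigma).
            neighbour_up = neighbour.repeat(sigma, axis=1).repeat(sigma, axis=2)
            out += kernels[di * k_up + dj] * neighbour_up
    return out


rng = np.random.default_rng(0)
C, C_m, sigma, k_up, k_enc = 8, 4, 2, 3, 3   # illustrative sizes (assumptions)
w_compress = rng.normal(size=(C_m, C, 1, 1))
w_encoder = rng.normal(size=(sigma * sigma * k_up * k_up, C_m, k_enc, k_enc))
x = rng.normal(size=(C, 6, 6))
y = carafe(x, w_compress, w_encoder, sigma=sigma, k_up=k_up)
```

Unlike nearest-neighbour or bilinear up-sampling, the weights applied at each output pixel depend on the local feature content, which is the property the text relies on for irregular defect shapes.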

Through content-aware reorganization, CARAFE significantly enhances sensitivity to morphologically variable defects that critically impact structural durability and load capacity. By preserving fine geometric details (e.g., microcrack initiation points) and capturing long-range contextual correlations (e.g., the full path of a crack), CARAFE provides more precise defect localization and characterization. This robust solution for automated quality control is vital for preventing catastrophic structural failures under service loads caused by undetected or mischaracterized imperfections.

3. Results and Discussion

3.1. Dataset

The source benchmark from VSB Technical University originally comprises 20,275 high-resolution images (2800 × 1024 pixels) documenting ten categories of wood surface imperfections. It was rigorously filtered through a three-step protocol: (1) Safety-critical class retention: Only images containing the eight defects most threatening to structural integrity (quartz, live knot, marrow, resin, dead knot, knot with crack, missing knot, and crack) were retained; (2) Quality control: Images with motion blur, inconsistent lighting, or ambiguous annotations were excluded; and (3) Distribution preservation: The final curated dataset of 4000 images maintains the original defect occurrence ratios observed in the full dataset (Table 1). This ensures representative sampling while eliminating noise. Data partitioning followed an 8:1:1 ratio for training, validation, and testing.
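An 8:1:1 partition of the 4000 curated images can be sketched as below. The random shuffle, the seed, and the function name `split_indices` are assumptions for illustration; the text does not state how images were assigned to each subset.

```python
import random


def split_indices(n, ratios=(8, 1, 1), seed=0):
    """Shuffle n sample indices and split them by the given ratios."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # fixed seed for reproducibility
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(4000)
print(len(train_idx), len(val_idx), len(test_idx))
```

For 4000 images this gives 3200 training, 400 validation, and 400 test samples.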

All images underwent standardized preprocessing to ensure robustness and convergence efficiency. RGB pixel values were normalized to the [0, 1] range through division by 255, a critical step for stabilizing gradient descent during optimization. Defect categories were systematically converted to integer indices via ordinal encoding (0: Quartz, 1: Live knot, 2: Marrow, 3: Resin, 4: Dead knot, 5: Knot with crack, 6: Knot missing, 7: Crack), with bounding box annotations formatted as normalized coordinates (center_x, center_y, width, height) relative to image dimensions. During training, we employed YOLOv8's augmentation pipeline including mosaic composition, random geometric transformations, and photometric adjustments to enhance model generalization across variable imaging conditions typical in structural timber inspection environments.
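The label encoding and coordinate normalization described above can be illustrated with a short sketch. It assumes the raw annotation is a corner-format box (x_min, y_min, x_max, y_max) in pixels, which is not stated in the text, and `to_yolo_label` and `normalize_pixel` are hypothetical helper names.

```python
# Ordinal class encoding as listed in the text.
CLASS_INDEX = {
    "Quartz": 0, "Live knot": 1, "Marrow": 2, "Resin": 3,
    "Dead knot": 4, "Knot with crack": 5, "Knot missing": 6, "Crack": 7,
}
IMG_W, IMG_H = 2800, 1024  # image size in pixels


def to_yolo_label(class_name, x_min, y_min, x_max, y_max,
                  img_w=IMG_W, img_h=IMG_H):
    """Convert a pixel corner box to (class, center_x, center_y, width, height)."""
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return CLASS_INDEX[class_name], center_x, center_y, width, height


def normalize_pixel(rgb):
    """Scale 8-bit RGB values to the [0, 1] range."""
    return tuple(v / 255 for v in rgb)


print(to_yolo_label("Crack", 700, 256, 1400, 768))
print(normalize_pixel((0, 128, 255)))
```

All four box values end up in [0, 1], so the same label remains valid if the image is resized before training.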

3.2. Analysis Environment Configuration and Network Parameters

This analysis was carried out in a standardized training environment, and the hardware configuration, software environment, and model training parameters are shown in Table 2 and Table 3. All analyses used the same parameter settings to ensure comparable results.

The hyperparameters were selected based on established practices for YOLO-series models and preliminary grid-search experiments. We adopted SGD with Nesterov momentum due to its proven generalization capability in object detection tasks. Initial learning rate (0.01), momentum (0.937), and weight decay (5 × 10⁻⁴) were calibrated through a reduced grid search on 10% of the training data, evaluating mAP50 at epoch 100. We tested learning rates {0.1, 0.01, 0.001}, momentum {0.9, 0.937, 0.98}, and weight decay {1 × 10⁻³, 5 × 10⁻⁴, 1 × 10⁻⁴}, selecting values that maximized validation recall. The epoch count (220) was determined via early stopping with a 30-epoch patience threshold on mAP50-95. Comparative tests show that AdamW (lr = 1 × 10⁻³, weight decay = 0.05) has a faster initial convergence speed, but the final mAP50 is 1.2% lower.
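The size of the reduced grid search follows directly from the three candidate sets. The sketch below only enumerates the combinations; the dictionary keys are hypothetical names chosen for this example, and no training is run.

```python
from itertools import product

# Candidate values as listed in the text.
LEARNING_RATES = (0.1, 0.01, 0.001)
MOMENTA = (0.9, 0.937, 0.98)
WEIGHT_DECAYS = (1e-3, 5e-4, 1e-4)

# Full Cartesian grid of candidate configurations.
grid = [
    {"lr": lr, "momentum": m, "weight_decay": wd}
    for lr, m, wd in product(LEARNING_RATES, MOMENTA, WEIGHT_DECAYS)
]

# Configuration reported as selected.
selected = {"lr": 0.01, "momentum": 0.937, "weight_decay": 5e-4}

print(len(grid), selected in grid)
```

With three values per hyperparameter, the grid contains 27 configurations, which helps explain why the search was run on a 10% subset of the training data.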

3.3. Evaluation Indicators

This study constructs a quantitative evaluation system from two dimensions: detection accuracy and model complexity. In terms of detection accuracy, precision, recall, F1 score, and mean average precision (mAP) are used as the core indexes; in terms of computational efficiency, the number of model parameters is used as the core index. The formulas are as follows:

$$Precision = \frac{TP}{TP + FP} \times 100\%, \tag{8}$$

$$Recall = \frac{TP}{TP + FN} \times 100\%, \tag{9}$$

$$AP = \frac{\sum_{1}^{K} P \times R}{K}, \tag{10}$$

$$mAP = \frac{\sum_{1}^{K} AP}{K} \times 100\%, \tag{11}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%, \tag{12}$$

where TP denotes the number of correctly detected defective samples, FP is the number of misdetected non-defective samples, and FN corresponds to the number of missed real defective samples. P measures the accuracy of the model in correctly identifying positive class samples, R reflects the completeness of the model in covering real defects, and the F1 score combines the two as a core performance indicator through harmonic averaging. AP is the average precision of each defect class, and mAP is the mean of the AP values over all defect categories. The mAP50 and mAP50-95 metrics reported for the YOLOv8n and WDNET-YOLO models are based on the average of five independent training experiments.
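A small worked example may help relate the counts to the percentages. The sketch below implements Equations (8), (9), (11), and (12) directly; the counts and per-class AP values used are invented for illustration and are not results from this study.

```python
def precision(tp, fp):
    """Eq. (8): share of detections that are real defects, in percent."""
    return tp / (tp + fp) * 100


def recall(tp, fn):
    """Eq. (9): share of real defects that were detected, in percent."""
    return tp / (tp + fn) * 100


def f1_score(tp, fp, fn):
    """Eq. (12): harmonic mean of precision and recall, in percent."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r) * 100


def mean_average_precision(ap_per_class):
    """Eq. (11): mean of per-class AP values (given as fractions), in percent."""
    return sum(ap_per_class) / len(ap_per_class) * 100


# Illustrative counts only (assumption): 8 hits, 2 false alarms, 2 misses.
print(precision(8, 2), recall(8, 2), f1_score(8, 2, 2))
print(mean_average_precision([0.5, 0.7]))
```

With equal precision and recall, the F1 score equals both, which is a quick sanity check on the harmonic mean.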

3.4. Ablation Analyses

To verify the combined effect of the improved modules on structural timber defect detection, the RepVGG, ECA, and CARAFE modules were added step by step to the YOLOv8n baseline model in an ablation analysis. The contribution of each module is analyzed in terms of detection accuracy (mAP, F1) and computational efficiency (GFLOPs), and the results are shown in Table 4.

All GFLOPs measurements used consistent 640 × 640 inputs. ECA’s near-zero GFLOPs arise because (1) its global average pooling, which collapses each H × W feature map to a single value, adds no multiply-add operations, and (2) the subsequent 1D convolution on the C-dimensional vector requires only C × k operations (e.g., 0.0002 GFLOPs for C = 512, k = 3), which rounds to 0.0 GFLOPs in standard reporting. CARAFE’s kernel prediction and reorganization operations, in contrast, increase GFLOPs measurably.
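The C × k estimate can be checked with a short calculation. This is a minimal sketch that counts only the 1D convolution of a single ECA layer and treats one multiply-add as one operation; both are assumptions made here, and counting conventions differ between profiling tools.

```python
def eca_conv1d_ops(channels, kernel_size):
    """Multiply-adds of a 1D convolution over a C-dimensional channel vector."""
    return channels * kernel_size


ops = eca_conv1d_ops(channels=512, kernel_size=3)
gflops = ops / 1e9  # convert raw operation count to giga-operations

print(ops)              # 1536
print(f"{gflops:.1f}")  # 0.0 at the one-decimal precision used in Table 4
```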

The multi-run evaluation (Table 5) confirms that the WDNET-YOLO improvements are statistically reliable: mAP50 increases by 3.7 percentage points and mAP50-95 by 3.5 percentage points. These gains exceed the run-to-run standard deviation by factors of 8.8 for mAP50 and 6.6 for mAP50-95. Paired t-tests were highly significant (t(4) = 16.5, p < 0.0001 for mAP50; t(4) = 13.2, p < 0.0001 for mAP50-95), demonstrating that the observed improvements surpass training noise and represent genuine performance enhancements.
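The mean gains and the gain-to-deviation factors can be recomputed from the per-run values in Table 5. The sketch below assumes that the factors are the mean gain divided by the standard deviation reported for WDNET-YOLO; the source does not state the denominator, but this choice reproduces the stated 8.8 and 6.6.

```python
from statistics import mean

# Per-run values copied from Table 5.
YOLOV8N_MAP50 = [0.649, 0.658, 0.654, 0.653, 0.661]
WDNET_MAP50 = [0.697, 0.686, 0.695, 0.693, 0.689]
YOLOV8N_MAP50_95 = [0.363, 0.366, 0.371, 0.361, 0.374]
WDNET_MAP50_95 = [0.404, 0.396, 0.410, 0.398, 0.401]

# Standard deviations reported for WDNET-YOLO in Table 5.
WDNET_SD_MAP50 = 0.0042
WDNET_SD_MAP50_95 = 0.0053


def gain_over_sd(baseline, improved, sd):
    """Return the mean gain and the gain expressed in units of sd."""
    gain = mean(improved) - mean(baseline)
    return gain, gain / sd


gain50, ratio50 = gain_over_sd(YOLOV8N_MAP50, WDNET_MAP50, WDNET_SD_MAP50)
gain5095, ratio5095 = gain_over_sd(YOLOV8N_MAP50_95, WDNET_MAP50_95, WDNET_SD_MAP50_95)

print(round(gain50, 3), round(ratio50, 1))      # 0.037 8.8
print(round(gain5095, 3), round(ratio5095, 1))  # 0.035 6.6
```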

To make the ablation results easier to compare, the mAP50 of each model variant is plotted in Figure 6.

The ablation results show that the RepVGG module, without increasing the number of parameters or the computation, raised mAP50 to 67.2% and mAP50-95 to 37.5%. Adding ECA channel attention further increased mAP50 and mAP50-95 to 68.6% and 38.5%, respectively, with F1 reaching 68.8%. Finally, introducing CARAFE dynamic up-sampling increased mAP50 and mAP50-95 to 69.2% and 40.2%, respectively, and raised precision by 4.6 percentage points to 77.1%. With all three modules combined, WDNET-YOLO achieves significant improvements over the YOLOv8n baseline: mAP50 increases by 3.7 percentage points (from 65.5% to 69.2%), mAP50-95 increases by 3.5 percentage points (from 36.7% to 40.2%), and the F1 score increases by 1.5 percentage points (from 68.1% to 69.6%). More importantly, this improvement was achieved with only a 4.4% increase in the number of parameters (from 3.0 million to 3.1 million). These results validate the effectiveness of the proposed improvement scheme and demonstrate a good balance between detection accuracy and model efficiency.

3.5. Comparative Analyses

The performance of the proposed WDNET-YOLO model is compared with that of other mainstream object detection algorithms, and the results are shown in Table 6.

The comparison shows that WDNET-YOLO performs well across the metrics, especially mAP50 and mAP50-95, which reach 0.692 and 0.402, respectively, and are clearly higher than those of comparison models such as YOLOv5n, YOLOv10n, and Faster R-CNN. In addition, WDNET-YOLO remains efficient, achieving higher accuracy with a moderate number of parameters and moderate computational complexity. As shown in the detection results in Figure 7 and Figure 8, WDNET-YOLO detects defects considerably better than the YOLOv8n baseline without a significant increase in computational effort, demonstrating its potential for structural timber defect detection tasks.

When the input resolution is reduced from 2800 × 1024 to 640 × 640, microcracks with a width ≤ 5 pixels in the original image are compressed to the sub-pixel scale. The morphological information of such defects may be partially lost. To assess the impact of resolution reduction on sub-pixel-level defects, 136 microcracks with a width ≤ 5 pixels were selected from the dataset. The performance of the three scenarios was compared under unified experimental conditions.
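The effect of this downsampling on crack width can be estimated with simple arithmetic. The sketch below assumes aspect-preserving (letterbox) resizing, which the source does not specify; under that assumption a single scale factor of 640/2800 applies to both axes.

```python
def letterbox_scale(src_w, src_h, dst_w, dst_h):
    """Scale factor for an aspect-preserving resize into a dst_w x dst_h frame."""
    return min(dst_w / src_w, dst_h / src_h)


def scaled_width(width_px, scale):
    """Width of a feature, in pixels, after resizing by the given scale."""
    return width_px * scale


scale = letterbox_scale(2800, 1024, 640, 640)

for width_px in range(1, 6):
    # Print the original width and its size after downsampling.
    print(width_px, round(scaled_width(width_px, scale), 2))
```

Under this assumption, cracks up to 4 pixels wide shrink to less than one pixel, and a 5-pixel crack shrinks to roughly one pixel, which is why part of the morphological information is at risk.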

As shown in Table 7, the sensitivity loss caused by the reduced resolution is small: WDNET-YOLO’s recall at 640 × 640 resolution (0.557) is 4.7 percentage points lower than at the original resolution (0.604). This small decrease indicates that WDNET-YOLO mitigates the loss of sub-pixel morphological information caused by the heavy downsampling and remains sensitive to microcracks narrower than 5 pixels in the original image. Under the same 640 × 640 input, WDNET-YOLO’s recall (0.557) is 3.9 percentage points higher than that of YOLOv8n (0.518), showing that its enhanced structure provides a substantial benefit for microcrack detection. Future research should explore further improvements in detection capability through a collaborative high- and low-resolution framework.

3.6. Discussion

The results presented in the ablation and comparative analyses demonstrate the effectiveness of the proposed WDNET-YOLO model. This advanced deep learning framework integrates three synergistic innovations into the YOLOv8n architecture specifically to address critical safety challenges in timber construction through high-precision defect detection.

The RepVGG-reparameterized backbone significantly enhances the model’s ability to capture multi-scale defect features critical for structural assessment. This capability is essential for identifying imperfections in beams and columns that range from minute insect holes (potential initiation points for decay) to large crack propagation patterns (which directly threaten load-bearing capacity). The multi-branch design, particularly the synergy between the 1 × 1 branch (capturing fine details such as microcracks below the 3 × 3 receptive field) and the 3 × 3 branch (modeling global context), proved crucial in handling clustered defects. The batch-normalization-based fusion mechanism balanced the branch contributions, preventing noise from the identity branch from overwhelming the output while preserving localization fidelity (Table 4: RepVGG improves mAP50 by 1.7%).

Complementing this, the ECA attention mechanism dynamically suppresses interference caused by complex and variable wood grain textures, the main source of false positives in traditional methods. At the same time, it selectively amplifies the features that distinguish high-risk defect categories (e.g., tension-weakened cracks and compression-weakened knots, live knots and dead knots). Misclassifying these categories could lead to catastrophic errors when estimating the load-bearing capacity of wooden components.

CARAFE up-sampling with adaptive context reorganization further enhances morphological sensitivity by preserving the minute geometric details of safety-critical micro-defects that exhibit high variability. For instance, accurately capturing the width, branching, and termination points of fine cracks is essential for assessing their potential to propagate under load and cause catastrophic fracture. This directly addresses a key durability concern in modern timber construction.

WDNET-YOLO achieves 3.7% higher mAP50 and 3.5% higher mAP50-95 compared to YOLOv8n, while maintaining computational efficiency with only a 4.4% parameter increase. These improvements translate directly to enhanced building safety. Superior detection sensitivity for defects affecting structural durability reduces the risk of undetected flaws compromising performance. Reduced misclassification during material grading prevents improper load capacity estimation. Reliable defect quantification provides robust support for compliance with international timber construction standards. Rigorous ablation and comparative analyses consistently demonstrate WDNET-YOLO’s superiority over mainstream detectors, including YOLOv5n, SSD, and Faster R-CNN, particularly in complex inspection scenarios involving dense knots, fine cracks, and varied resin distributions characteristic of real-world construction-grade timber.

While WDNET-YOLO demonstrates significant advantages, three limitations warrant consideration: validation was performed exclusively on timber specimens under controlled imaging conditions, leaving performance on other construction materials unverified; the current RGB-based approach cannot detect internal defects; and real-time deployment on resource-constrained edge devices requires further optimization despite the model’s parameter efficiency. To address these constraints and enhance practical adoption, future efforts will focus on three interconnected pathways: developing multi-modal fusion techniques combining RGB with ultrasonic sensing for comprehensive defect characterization; optimizing edge deployment through TensorRT quantization for NVIDIA Jetson devices and Raspberry Pi-based portable inspection kits; and validating the framework across steel-concrete composites and engineered bamboo systems.

4. Conclusions

This study proposes WDNET-YOLO, an enhanced deep learning model for structural wood defect detection. It addresses the shortcomings of traditional detection techniques in multi-scale feature detection, high defect similarity, and diverse defect morphology through three synergistic innovations integrated into YOLOv8n. The RepVGG backbone significantly improves the multi-scale feature capture that is essential for identifying crack propagation in load-bearing members. The ECA attention mechanism suppresses wood texture interference to reliably discriminate high-risk defects (e.g., structural-integrity-compromising cracks vs. knots). CARAFE up-sampling preserves the morphological details of safety-critical micro-defects, such as resin inhomogeneity, that accelerate material degradation. Experimental validation confirms that WDNET-YOLO achieves a 3.7 percentage point improvement in mAP50 (from 65.5% to 69.2%) and a 3.5 percentage point improvement in mAP50-95 (from 36.7% to 40.2%) over the YOLOv8n baseline, with only a 4.4% increase in parameters (from 3.0 M to 3.1 M). This is a favorable trade-off: detection accuracy improves with only a slight increase in model complexity, which directly enhances building safety through superior sensitivity to durability-affecting defects, reduced misclassification in material grading, and reliable defect quantification for compliance with timber construction standards. WDNET-YOLO outperforms mainstream detectors in complex inspection scenarios and provides a deployable solution for automated construction quality control that optimizes material utilization and extends structural service life. Future work will focus on embedded on-site deployment and multi-modal sensing integration to further advance structural engineering practice.

Author Contributions

Conceptualization: W.G., X.L., X.Y.; methodology: Y.M. and X.X.; software: J.Z.; formal analysis: W.G. and L.S.; investigation: C.L. and Y.L.; funding acquisition: Z.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shandong Province Science and Technology Small and Medium-sized Enterprises Innovation Ability Improvement Project (Grant no. 2024TSGC0285) and the Taian Science and Technology Innovation Development Project (Grant no. 2023GX027). Lin Sun was supported by the Shandong Province Science and Technology Small and Medium-Sized Enterprises Innovation Ability Improvement Project (grant no. 2024TSGC0285), and Xiaoxia Lin was supported by the Taian Science and Technology Innovation Development Project (grant no. 2023GX027).

Data Availability Statement

The data that support the findings of this study are available from the author Weihao Gong, upon reasonable request.

Acknowledgments

All acknowledged individuals have provided consent for this acknowledgement. The authors would like to express their gratitude to the members of the research group for their support.

Conflicts of Interest

Authors Chunwei Leng, Yan Li, and Zhenyu Niu were employed by the company Hanqing Data Consulting Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Asdrubali, F.; Ferracuti, B.; Lombardi, L.; Guattari, C.; Evangelisti, L.; Grazieschi, G. A review of structural, thermo-physical, acoustical, and environmental properties of wooden materials for building applications. Build. Environ. 2017, 114, 307–332.
2. Sandak, J.; Goli, G.; Cetera, P.; Sandak, A.; Cavalli, A.; Todaro, L. Machinability of minor wooden species before and after modification with thermo-vacuum technology. Materials 2017, 10, 121.
3. Riggio, M.; Sandak, J.; Franke, S. Application of imaging techniques for detection of defects, damage and decay in timber structures on-site. Constr. Build. Mater. 2015, 101 Pt 2, 1241–1252.
4. Naylor, A.; Hackney, P. A review of wood machining literature with a special focus on sawing. BioResources 2013, 8, 3122–3135.
5. Chen, L.C.; Pardeshi, M.S.; Lo, W.T.; Sheu, R.-K.; Pai, K.-C.; Chen, C.-Y.; Tsai, P.-Y.; Tsai, Y.-T. Edge-glued wooden panel defect detection using deep learning. Wood Sci. Technol. 2022, 56, 477–507.
6. Fan, C.; Zhuang, Z.; Liu, Y.; Yang, Y.; Zhou, H.; Wang, X. Bilateral Defect Cutting Strategy for Sawn Timber Based on Artificial Intelligence Defect Detection Model. Sensors 2024, 24, 6697.
7. Hu, C.; Tanaka, C.; Ohtani, T. Locating and identifying sound knots and dead knots on sugi by the rule-based color vision system. J. Wood Sci. 2004, 50, 115–122.
8. Espinosa, L.; Brancheriau, L.; Cortes, Y.; Prieto, F.; Lasaygues, P. Ultrasound computed tomography on standing trees: Accounting for wood anisotropy permits a more accurate detection of defects. Ann. For. Sci. 2020, 77, 68.
9. Zhu, W.; Su, B.; Zhang, X.; Li, L.; Fang, S. ALdamage-seg: A Lightweight Model for Instance Segmentation of Aluminum Profiles. Buildings 2024, 14, 2036.
10. Zhang, Y.; Liu, S.; Tu, W.; Yu, H.; Li, C. Using computer vision and compressed sensing for wood plate surface detection. Opt. Eng. 2015, 54, 103102.
11. Lazaridis, P.C.; Kavvadias, I.E.; Demertzis, K.; Iliadis, L.; Vasiliadis, L.K. Structural Damage Prediction of a Reinforced Concrete Frame under Single and Multiple Seismic Events Using Machine Learning Algorithms. Appl. Sci. 2022, 12, 3845.
12. Harley, J.B.; Sparkman, D. Machine learning and NDE: Past, present, and future. AIP Conf. Proc. 2019, 2102, 090001.
13. Hussain, M.; Khanam, R. In-Depth Review of YOLOv1 to YOLOv10 Variants for Enhanced Photovoltaic Defect Detection. Solar 2024, 4, 351–386.
14. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
16. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
17. Bi, H.; Wen, V.; Xu, Z. Comparing One-Stage and Two-Stage Learning Strategy in Object Detection. Appl. Comput. Eng. 2023, 5, 171–177.
18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905.
19. Zhang, J.; Dong, J.; Zhao, Y.; Wang, Y.; Cheng, Y. Stability of Non-Uniform Soils Slope with Tension Cracks Under Unsaturated Flow Conditions. Buildings 2025, 15, 2123.
20. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
21. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
22. Fu, H.; Song, G.; Wang, Y. Improved YOLOv4 Marine Target Detection Combined with CBAM. Symmetry 2021, 13, 623.
23. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
24. Arkin, E.; Yadikar, N.; Xu, X.; Aysa, A.; Ubul, K. A survey: Object detection methods from CNN to transformer. Multimed. Tools Appl. 2023, 82, 21353–21383.
25. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020.
26. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12346.
27. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
28. Song, Z.; Zou, S.; Zhou, W.; Huang, Y.; Shao, L.; Yuan, J.; Gou, X.; Jin, W.; Wang, Z.; Chen, X.; et al. Clinically applicable histopathological diagnosis system for gastric cancer detection using deep learning. Nat. Commun. 2020, 11, 4294.
29. Kabir, H.; Wu, J.; Dahal, S.; Joo, T.; Garg, N. Automated estimation of cementitious sorptivity via computer vision. Nat. Commun. 2024, 15, 9935.
30. Shafiee, M.J.; Chywl, B.; Li, F.; Wong, A. Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video. J. Comput. Vis. Imaging Syst. 2017, 3, 1.
31. Ning, S.; Tan, F.; Chen, X.; Li, X.; Shi, H.; Qiu, J. Lightweight Corn Leaf Detection and Counting Using Improved YOLOv8. Sensors 2024, 24, 5279.
32. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
33. Chen, L.; Ge, Z.; Luo, R.; Liu, C.; Liu, X.; Zhou, Y. Identification of CT Image Defects in Wood Based on Convolution Neural Network. Sci. Silvae Sin. 2018, 54, 127–133.
34. Wang, Y.; Zhang, W.; Gao, R.; Jin, Z.; Wang, X. Recent advances in the application of deep learning methods to forestry. Wood Sci. Technol. 2021, 55, 1171–1202.
35. Shi, J.; Li, Z.; Zhu, T.; Wang, D.; Ni, C. Defect Detection of Industry Wood Veneer Based on NAS and Multi-Channel Mask R-CNN. Sensors 2020, 20, 4398.
36. Ding, X.; Zhang, X.; Ma, N.; Han, G.; Ding, J.; Sun, J. RepVGG: Making VGG-style ConvNets Great Again. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13728–13737.
37. Wang, Q.; Wu, B.; Zhu, P.; Li, W.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
38. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-Aware ReAssembly of FEatures. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016.
39. Xu, X.; Li, Q.; Li, S.; Kang, F.; Wan, G.; Wu, T.; Wang, S. Crack Width Recognition of Tunnel Tube Sheet Based on YOLOv8 Algorithm and 3D Imaging. Buildings 2024, 14, 531.
40. Malaikrisanachalee, S.; Wongwai, N.; Kowcharoen, E. ESPCN-YOLO: A High-Accuracy Framework for Personal Protective Equipment Detection Under Low-Light and Small Object Conditions. Buildings 2025, 15, 1609.
41. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
42. Ultralytics. YOLOv8—Ultralytics YOLOv8 Documentation. 2023. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 22 June 2025).
43. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019, arXiv:1912.01703.
44. Python Software Foundation. Python Documentation, version 3.8; Python Software Foundation: Wilmington, DE, USA, 2019. Available online: https://docs.python.org/3/ (accessed on 22 June 2025).

Figure 1. Structure of YOLOv8n.

Figure 2. Structure of WDNET-YOLO.

Figure 3. Structural reparameterization process.

Figure 4. The overall framework of ECA.

Figure 5. The overall framework of CARAFE.

Figure 6. Comparison of ablation experiment results.

Figure 7. Comparison of detection effects—YOLOv8n.

Figure 8. Comparison of detection effects—WDNET-YOLO.

Table 1. Proportional Distribution of Defect Classes in the Dataset.

Defect Type	Images (%)	Instances (%)
Live knot	51.05	37.36
Dead knot	44.90	27.78
Crack	17.78	11.72
Resin	13.28	7.11
Knot with crack	10.28	6.13
Knot missing	7.30	3.65
Marrow	6.73	3.40
Quartz	5.20	2.87

Table 2. Analysis of hardware and software configuration.

Parameters	Setting
CPU	Intel Xeon Platinum 8255C
GPU	NVIDIA RTX 2080Ti
Operating system	Windows 11 Server
Deep Learning Framework	PyTorch 1.8.1 [43]
Acceleration Module	CUDA 11.1
Programming language	Python 3.8 [44]

Table 3. Model training parameter settings.

Hyperparameters	Value
Input image size	640 × 640
Optimizer	SGD
Initial learning rate	0.01
Momentum	0.937
Batch size	8
Epochs	220
Weight decay coefficient	5 × 10⁻⁴

Table 4. Results of ablation analyses.

Model	RepVGG	ECA	CARAFE	Precision	Recall	mAP50	mAP50-95	GFLOPs	F1 Score
YOLOv8n	×	×	×	0.722	0.645	0.655	0.367	8.1	0.681
YOLOv8n	√	×	×	0.726	0.649	0.672	0.375	8.1	0.686
YOLOv8n	√	√	×	0.725	0.655	0.686	0.385	8.1	0.688
YOLOv8n	√	√	√	0.771	0.634	0.692	0.402	8.6	0.696

Table 5. Multi-run evaluation of detection performance (mAP).

Run	Model	mAP50	mAP50-95
1	YOLOv8n	0.649	0.363
2	YOLOv8n	0.658	0.366
3	YOLOv8n	0.654	0.371
4	YOLOv8n	0.653	0.361
5	YOLOv8n	0.661	0.374
Mean	YOLOv8n	0.655	0.367
SD	YOLOv8n	0.0045	0.0052
1	WDNET-YOLO	0.697	0.404
2	WDNET-YOLO	0.686	0.396
3	WDNET-YOLO	0.695	0.410
4	WDNET-YOLO	0.693	0.398
5	WDNET-YOLO	0.689	0.401
Mean	WDNET-YOLO	0.692	0.402
SD	WDNET-YOLO	0.0042	0.0053

Table 6. Results of comparative analyses.

Model	Precision	Recall	mAP50	mAP50-95	Parameters (M)
YOLOv5n	0.650	0.615	0.644	0.366	1.7
YOLOv8n	0.722	0.645	0.655	0.367	3.0
YOLOv10n	0.686	0.634	0.649	0.365	2.7
YOLOv11	0.692	0.649	0.678	0.366	2.6
RT-DETR	0.608	0.584	0.568	0.316	32.0
SSD	0.598	0.534	0.559	0.294	146.6
Faster R-CNN	0.584	0.545	0.539	0.285	138.3
WDNET-YOLO	0.771	0.634	0.692	0.402	3.1

Table 7. Microcrack experiment comparison.

Resolution	Model	Recall	Precision
2800 × 1024	WDNET-YOLO	0.604	0.697
640 × 640	WDNET-YOLO	0.557	0.655
640 × 640	YOLOv8n	0.518	0.612

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
