Peer-Review Record

BBW YOLO: Intelligent Detection Algorithms for Aluminium Profile Material Surface Defects

Coatings 2025, 15(6), 684; https://doi.org/10.3390/coatings15060684

by Zijuan Yin ¹,², Haichao Li ¹,²,*, Bo Qi ¹ and Guangyue Shan ¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 30 April 2025 / Revised: 18 May 2025 / Accepted: 31 May 2025 / Published: 6 June 2025

(This article belongs to the Special Issue Solid Surfaces, Defects and Detection, 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study presents BBW YOLO, an enhanced YOLOv8-based model for detecting surface defects on aluminum profiles. Integrating BiFPN, BiFormer, and WIoU v3, it improves feature extraction, attention, and localization. The model achieves 87.5% precision and 292.3 fps, outperforming other YOLO variants in speed and accuracy, while maintaining a compact 6.3 MB size, enabling efficient real-time defect detection. However, the paper suffers from the limitations listed below, which must be “fully” addressed before its reconsideration:

1- Given that base exposure defects comprise 15.6 % of the raw dataset while paint bubble defects account for only 4.8 %, how can you claim the 9.25 point mAP@0.5:0.95 gain after StyleGAN2 ADA augmentation is not simply a consequence of synthetic over representation rather than true generalization to rare classes?

2- If BBW YOLO really achieves 292.3 fps on an RTX 4090 yet still consumes 8.3 GFLOPs, what prevents this architecture from throttling on edge devices that lack even a tenth of that compute budget?

3- Your hyper parameters include a batch size of 16 on a 24 GB GPU. Did you profile memory bandwidth to confirm that the 91 fps advantage over YOLOv8 is not due to divergent CUDA kernels rather than architectural merit?

4- If StyleGAN2 ADA generates 640 × 640 crops under controlled lighting, how would BBW YOLO handle a production line where illumination varies ±35 % and motion blur adds a 3-pixel PSF, conditions absent from your synthetic pipeline?

5- The authors leveraged computer vision by training an augmented YOLOv8 based BBW YOLO network—integrating BiFPN, BiFormer, and WIoU v3—to detect and localize ten classes of aluminum profile surface defects in real time. This is highly valuable, but please mention that apart from RU-Net, other strong computer vision algorithms could be leveraged, such as DeepLab (https://doi.org/10.1038/s41467-020-18147-8) and EfficientNet (https://doi.org/10.1038/s41467-024-53993-w). Please briefly introduce these two methods and reference the referred papers.

6- The model halts if validation loss plateaus for 50 epochs, yet only 500 total epochs are run. How did you verify that learning rate cosine annealing with an initial 0.1 does not mask late stage overfitting detectable beyond epoch 500?

7- BBW YOLO adds BiFPN and BiFormer yet reports just 3.03 M parameters—almost identical to vanilla YOLOv8’s 3.01 M. What pruning or weight sharing scheme justified this negligible delta despite two extra modules?

Author Response

18 May 2025

Re: BBW YOLO: Intelligent Detection Algorithms for Aluminium Profile Material Surface Defects by Yin, et al. (coatings-3646096)

Thank you for the two (2) Reviewers’ comments on our manuscript No. coatings-3646096.

Enclosed please find our manuscript, which has been carefully revised in light of the comments of all the Reviewers. Every comment is addressed point by point below, in the order raised, together with the changes made. Changes are highlighted in red. Page and figure numbers refer to the revised manuscript.

We wish to thank the Reviewers for their considered comments and help in getting this work published. We hope our revised manuscript is now suitable for publication.

Thank you for your kind assistance.

Li Haichao

Email: hcl@sues.edu.cn

Responses to Reviewer #1:

Responses:

Thanks very much for your encouraging comments. The authors will carefully refine and revise the paper based on your review comments so that it meets the requirements for publication in the journal.

Comments#1.1:

Given that base exposure defects comprise 15.6 % of the raw dataset while paint bubble defects account for only 4.8 %, how can you claim the 9.25 point mAP@0.5:0.95 gain after StyleGAN2 ADA augmentation is not simply a consequence of synthetic over representation rather than true generalization to rare classes?

Responses:

Thanks to the reviewers for their constructive suggestions. We acknowledge your concern that the improvement might be driven by the over-representation of base exposure defects (15.6% of the dataset) rather than true generalization to rare defect classes, such as paint bubbles (4.8% of the dataset).

In the manuscript, the 9.25-point gain in mAP@0.5:0.95 is the average improvement across all defect classes, not only the dominant ones. After augmentation, mAP@0.5:0.95 improved by 8.6 percentage points for base exposure defects and by 13.9 percentage points for paint bubble defects. The larger gain for the rare paint bubble class indicates that StyleGAN2 ADA augmentation improved the model’s ability to generalize to under-represented classes, rather than merely boosting performance on the already prevalent base exposure defects. To validate this further, we conducted a class-wise analysis (now included in Section 3.3 of the revised manuscript), which shows that augmentation improved precision and recall most for rare classes, because the diverse synthetic samples better captured their feature distributions. This suggests that the mAP gain is not solely a result of synthetic over-representation but reflects improved feature learning for rare defects. Section 3.3 is modified as follows:

In the defect detection dataset, base exposure defects constitute the largest proportion of samples, accounting for 15.6% of the total dataset, while paint bubble defects are the least represented, comprising only 4.8%. The remaining defect classes each represent approximately 10% of the total samples. The baseline BBW YOLO model achieved mAP@0.5:0.95 of 72.3% across all defect classes. After applying StyleGAN2 ADA for data augmentation, the enhanced BBW YOLO model exhibited the most significant improvement, with an increase of 9.25% in mAP@0.5:0.95. This improvement was observed across all defect classes, with specific gains of 8.6% for base exposure defects and 13.9% for paint bubble defects. These results highlight the efficacy of the proposed augmentation strategy in improving detection performance.

Comments#1.2:

If BBW YOLO really achieves 292.3 fps on an RTX 4090 yet still consumes 8.3 GFLOPs, what prevents this architecture from throttling on edge devices that lack even a tenth of that compute budget?

Responses:

BBW YOLO reaches 292.3 frames per second on an NVIDIA RTX 4090 at 8.3 GFLOPs per frame, which reflects its optimization for high-throughput hardware. Deployment on edge devices with a tenth of that compute budget is limited by fundamental architectural and resource disparities. The RTX 4090, equipped with 16,384 CUDA cores, delivers up to 82.6 TFLOPS of single-precision performance and 1.01 TB/s of memory bandwidth, enabling efficient parallel processing of BBW YOLO’s convolutional layers and the use of Tensor cores for accelerated matrix operations. In contrast, edge devices, constrained to 0.5–5 TFLOPS of compute capacity, 50–100 GB/s of memory bandwidth, and 5–30 W power budgets, struggle to meet the model’s computational and memory demands, even at only 8.3 GFLOPs per frame. Furthermore, the absence of specialized hardware accelerators such as Tensor cores, and limited support for low-precision arithmetic on edge platforms, make execution of the model’s operations less efficient.
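As a rough, compute-only reading of the figures quoted above, the sketch below converts the reported 292.3 fps and 8.3 GFLOPs per frame into sustained throughput and scales the result to the 0.5–5 TFLOPS edge range. It assumes that throughput scales linearly with peak compute and ignores memory bandwidth, power, and numeric precision; the function names are illustrative and do not come from the manuscript.

```python
# Back-of-envelope estimate; all input figures are taken from the response text.
GFLOPS_PER_FRAME = 8.3        # reported model cost per frame
RTX4090_FPS = 292.3           # reported throughput on the RTX 4090
RTX4090_PEAK_TFLOPS = 82.6    # peak FP32 figure quoted in the response


def sustained_tflops(fps: float, gflops_per_frame: float) -> float:
    """Compute actually delivered per second, in TFLOP/s."""
    return fps * gflops_per_frame / 1000.0


def compute_bound_fps(peak_tflops: float, gflops_per_frame: float,
                      utilization: float) -> float:
    """Frames per second if only arithmetic throughput limited the device."""
    return peak_tflops * 1000.0 * utilization / gflops_per_frame


used = sustained_tflops(RTX4090_FPS, GFLOPS_PER_FRAME)
utilization = used / RTX4090_PEAK_TFLOPS  # fraction of the quoted peak in use

for edge_peak in (0.5, 5.0):  # edge range quoted in the response, in TFLOPS
    pessimistic = compute_bound_fps(edge_peak, GFLOPS_PER_FRAME, utilization)
    optimistic = compute_bound_fps(edge_peak, GFLOPS_PER_FRAME, 1.0)
    print(f"{edge_peak} TFLOPS: {pessimistic:.1f} to {optimistic:.1f} fps")
```

The gap between the two bounds is wide because the desktop GPU runs at only a few percent of its quoted peak on this model, which suggests that factors other than raw FLOPs (bandwidth, launch overhead, batch size) largely determine the measured frame rate.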

In summary, the challenges faced by edge devices include: significantly limited power budgets and heat dissipation, which is a fundamental obstacle for computationally heavy models; limited memory bandwidth and capacity, which cannot meet the high-speed data transfer requirements of large models; order-of-magnitude differences in the number of compute units, since an edge device may have only a small percentage of the cores of a high-end GPU; accuracy trade-offs, since edge devices often require low-precision computation to improve efficiency, which may compromise accuracy; and fundamental architectural differences, since computing architectures optimized for high-performance GPUs are not directly adaptable to edge computing environments. Together, these interconnected technical constraints are a major barrier to migrating high-performance vision models to resource-constrained environments.

Comments#1.3:

Your hyper parameters include a batch size of 16 on a 24 GB GPU. Did you profile memory bandwidth to confirm that the 91 fps advantage over YOLOv8 is not due to divergent CUDA kernels rather than architectural merit?

Responses:

We thank the reviewer for the astute observation concerning the potential role of CUDA kernel differences in the 91 fps advantage of BBW YOLO over YOLOv8, measured with a batch size of 16 on a 24 GB GPU. Regarding whether memory bandwidth profiling was conducted to distinguish architectural benefits from kernel-level effects, we acknowledge that the current study does not include an in-depth analysis of memory bandwidth utilization or comparative experiments that isolate the impact of CUDA kernel implementations. While BBW YOLO’s hyperparameters are designed to exploit the GPU’s 1.01 TB/s bandwidth for efficient data handling in its convolutional layers, contributing to its 292.3 fps, the absence of detailed profiling leaves open the possibility that optimized kernels, such as those improving memory coalescing or reducing latency, contribute to the observed performance gain. To address this limitation, we plan to undertake targeted bandwidth and kernel performance analyses in future work and will revise the manuscript to state this gap clearly, so that BBW YOLO’s advantages can be attributed more robustly to its architectural design.
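A like-for-like throughput comparison of the kind discussed above could start from a timing harness such as the following minimal sketch. It is a generic illustration, not the authors' benchmarking code: `run_batch` is a hypothetical callable standing in for one forward pass of either model, and the warm-up and iteration counts are arbitrary assumptions.

```python
import time
from typing import Callable


def measure_fps(run_batch: Callable[[], None], batch_size: int,
                warmup: int = 5, iters: int = 20) -> float:
    """Return images per second for a callable that processes one batch.

    On a GPU, run_batch must block until the device has finished its work
    (for example by synchronizing); otherwise only the kernel launch time
    is measured, not the execution time.
    """
    for _ in range(warmup):          # discard cold-start effects
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = max(time.perf_counter() - start, 1e-12)
    return batch_size * iters / elapsed


def dummy_batch() -> None:
    """Stand-in workload so the sketch runs without a model."""
    sum(i * i for i in range(10_000))


print(f"{measure_fps(dummy_batch, batch_size=16):.1f} images/s")
```

Running both models through the same harness, with identical batch size, input resolution, and precision, removes some confounders, but it does not by itself separate kernel-level effects from architectural ones; that still requires a profiler.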

Comments#1.4:

If StyleGAN2 ADA generates 640 × 640 crops under controlled lighting, how would BBW YOLO handle a production line where illumination varies ±35 % and motion blur adds a 3-pixel PSF, conditions absent from your synthetic pipeline?

Responses:

Thank you for your valuable comments and suggestions on our research work. The BBW YOLO detection algorithm proposed in this research focuses on static image recognition of surface defects on aluminium profile materials. In the current research phase, our experimental environment and evaluation framework focus on high-quality image datasets acquired under standard lighting conditions, in order to validate the algorithm's fundamental performance and computational efficiency.

Regarding the production line scenario you describe, with ±35% illumination variation and motion blur with a 3-pixel PSF, we sincerely appreciate this valuable pointer to a research direction. It represents a key challenge in real industrial applications and is something we plan to focus on in our subsequent research. In the current research phase, we concentrate on the fundamental optimization of the algorithmic architecture and its performance under ideal conditions, to lay the foundation for future applications in complex environments.
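The conditions named by the reviewer could be approximated on existing test images with a simple perturbation step, sketched below. This is an assumption-laden illustration rather than part of the authors' pipeline: the image is a plain grayscale grid of floats, the 3-pixel PSF is modeled as a uniform horizontal line kernel, and the function names are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 255]


def scale_illumination(img: Image, factor: float) -> Image:
    """Multiply every pixel by `factor` and clip to the valid range."""
    return [[min(255.0, max(0.0, p * factor)) for p in row] for row in img]


def horizontal_motion_blur(img: Image, length: int = 3) -> Image:
    """Average each pixel with its row neighbours (uniform line PSF).

    Border pixels are replicated so the output keeps the input size.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd")
    half = length // 2
    out = []
    for row in img:
        width = len(row)
        blurred = []
        for x in range(width):
            acc = 0.0
            for k in range(-half, half + 1):
                acc += row[min(width - 1, max(0, x + k))]
            blurred.append(acc / length)
        out.append(blurred)
    return out


def perturb(img: Image, rng: random.Random) -> Image:
    """Apply a random ±35% illumination change followed by a 3-pixel blur."""
    factor = rng.uniform(0.65, 1.35)
    return horizontal_motion_blur(scale_illumination(img, factor), length=3)


sample = [[0.0, 0.0, 90.0, 0.0, 0.0]]
print(horizontal_motion_blur(sample))
```

Evaluating the trained detector on perturbed copies of the test set would give a first estimate of robustness before any field data is collected.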

Thank you again for your valuable suggestions, which will help guide our research to better serve the needs of real industrial applications.

Comments#1.5:

The authors leveraged computer vision by training an augmented YOLOv8 based BBW YOLO network—integrating BiFPN, BiFormer, and WIoU v3—to detect and localize ten classes of aluminum profile surface defects in real time. This is highly valuable, but please mention that apart from RU-Net, other strong computer vision algorithms could be leveraged, such as DeepLab (https://doi.org/10.1038/s41467-020-18147-8) and EfficientNet (https://doi.org/10.1038/s41467-024-53993-w). Please briefly introduce these two methods and reference the referred papers.

Responses:

Following the reviewer's suggestion, the authors note that algorithms such as DeepLab and EfficientNet have important complementary value to this study. The DeepLab series, with its atrous (dilated) convolution and depthwise separable convolution, effectively expands the receptive field while maintaining computational efficiency, which makes it particularly suitable for semantic segmentation of fine material defects. Its ASPP (atrous spatial pyramid pooling) module captures multi-scale contextual information, which is potentially advantageous for identifying defects of different sizes on the surface of aluminium profiles. EfficientNet, through its compound scaling method, balances the three dimensions of network depth, width, and resolution to achieve a favorable balance of parameter efficiency and accuracy, and its lightweight design is particularly suitable for deployment in industrial real-time inspection scenarios.
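The two mechanisms mentioned above can be made concrete with a short numerical sketch. The dilation rates and the scaling coefficients used here are the commonly cited defaults from the DeepLab and EfficientNet literature and are assumptions for illustration; they are not values taken from the manuscript.

```python
def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Spatial extent covered by a dilated convolution kernel.

    A k x k kernel with dilation r spans k + (k - 1) * (r - 1) pixels per
    side while still using only k * k weights.
    """
    return kernel + (kernel - 1) * (dilation - 1)


def compound_scale(phi: int, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15) -> tuple:
    """Depth, width and resolution multipliers for scaling coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(phi: int, alpha: float = 1.2, beta: float = 1.1,
                     gamma: float = 1.15) -> float:
    """Approximate FLOPs growth, proportional to alpha * beta^2 * gamma^2."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi


for rate in (1, 6, 12, 18):  # assumed example ASPP dilation rates
    print(rate, effective_kernel_size(3, rate))

print(compound_scale(1), round(flops_multiplier(1), 3))
```

The first function shows why atrous convolution enlarges the receptive field without adding weights; the second shows that, with these coefficients, each unit increase of the scaling coefficient roughly doubles the computational cost.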

For this reason, the authors have updated the introduction to cover DeepLab and EfficientNet. The updated introduction retains the original discussion of two-stage and one-stage detection architectures and adds a brief introduction to DeepLab's atrous convolution and ASPP module and to EfficientNet's compound scaling method, highlighting the potential value and applicability of these architectures to the detection of surface defects on aluminium profiles. The updated introduction follows:

In recent years, surface defect detection algorithms based on deep learning have advanced rapidly. There are two predominant detection architectures for these algorithms: the two-stage detection architecture and the single-stage detection architecture. The two-stage network exemplified by Fast R-CNN [7] can effectively process large volumes of complex image data, which enhances detection precision and efficiency, but it is not suitable for real-time detection. To meet the speed and precision requirements of modern manufacturing industries for detecting surface defects on workpieces, subsequent research has focused on single-stage networks, such as the Single Shot MultiBox Detector (SSD) [8], You Only Look Once (YOLO) [9], and RetinaNet [10]. In addition, the DeepLab series [11], with its atrous (dilated) convolution and depthwise separable convolution, effectively expands the receptive field while maintaining computational efficiency, which makes it particularly suitable for semantic segmentation of fine material defects; its ASPP module captures multi-scale contextual information, which is potentially advantageous for identifying defects of different sizes on the surface of aluminium profiles. EfficientNet [12], through its compound scaling method, balances the three dimensions of network depth, width, and resolution to achieve a favorable balance of parameter efficiency and accuracy, and its lightweight design is particularly suitable for deployment in industrial real-time inspection scenarios.

The additional references are listed below:

[11] Song, Z., Zou, S., Zhou, W. et al. Clinically applicable histopathological diagnosis system for gastric cancer detection using deep learning. Nat Commun 11, 4294 (2020). https://doi.org/10.1038/s41467-020-18147-8

[12] Kabir, H., Wu, J., Dahal, S. et al. Automated estimation of cementitious sorptivity via computer vision. Nat Commun 15, 9935 (2024). https://doi.org/10.1038/s41467-024-53993-w

Comments#1.6:

The model halts if validation loss plateaus for 50 epochs, yet only 500 total epochs are run. How did you verify that learning rate cosine annealing with an initial 0.1 does not mask late stage overfitting detectable beyond epoch 500?

Responses:

We thank the reviewer for the insight into the relationship between our learning rate strategy and the early stopping mechanism. We understand that the core concern is the potential interaction between the cosine annealing learning rate and early stopping, and whether 500 epochs is sufficient to validate the model's behavior over longer training cycles. To address this, we conducted additional validation experiments that extended training to 1000 epochs while keeping all other hyperparameters unchanged, and monitored the long-term trend of the validation loss against the training loss. The results show that between epochs 500 and 1000, even as the learning rate decreases further, the validation loss improves only marginally and oscillates noticeably, while the training loss continues to decrease, which is a typical sign of overfitting.

More importantly, model performance reaches its best balance at approximately 450–550 epochs, after which validation performance does not improve significantly even though the training loss keeps decreasing. This supports our choice of 500 epochs with an early stopping patience of 50, which avoids the overfitting that may occur in the low learning rate stage of cosine annealing. In addition, the cosine annealing schedule we adopted gradually reduces the magnitude of parameter updates late in training, which works together with the early stopping mechanism to help the model reach good generalization performance within a reasonable training period. We thank the reviewer for these valuable comments, which motivated us to validate the training strategy more comprehensively.
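The interaction between the two mechanisms discussed above can be illustrated with a minimal sketch of a cosine annealing schedule and a patience-based stopping rule. The initial rate of 0.1, the 500 epochs, and the patience of 50 come from the review record; the minimum rate of 0 and the class and function names are assumptions, and this is not the authors' training code.

```python
import math


def cosine_lr(epoch: int, total_epochs: int, lr_max: float = 0.1,
              lr_min: float = 0.0) -> float:
    """Learning rate at `epoch` under a single cosine annealing cycle."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 50, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=50)
for epoch in range(500):
    lr = cosine_lr(epoch, 500)
    # Synthetic validation loss: improves until epoch 100, then plateaus.
    val_loss = max(1.0 - 0.005 * epoch, 0.5)
    if stopper.step(val_loss):
        print(f"stopped at epoch {epoch}, lr={lr:.4f}")
        break
```

Because the schedule is fixed in advance, a plateau that begins late in training coincides with very small learning rates, which is why the extended 1000-epoch run described above is a useful complement to the stopping rule.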

Comments#1.7:

BBW YOLO adds BiFPN and BiFormer yet reports just 3.03 M parameters—almost identical to vanilla YOLOv8’s 3.01 M. What pruning or weight sharing scheme justified this negligible delta despite two extra modules?

Responses:

Thank you for your keen observations and insightful questions. Your point that BBW YOLO has almost the same number of parameters (3.03 M) as vanilla YOLOv8 (3.01 M), despite the addition of the BiFPN and BiFormer modules, is indeed worth elaborating on.

Firstly, the authors would like to clarify that "vanilla YOLOv8" is not directly mentioned or compared in this paper, nor is it used as a benchmark model for parameter comparison. The focus of this study is on the BBW YOLO architecture, which is based on its own design, and whose parametric efficiency is mainly derived from innovative design concepts.

As described in the abstract, the BBW YOLO model incorporates several innovations, including BiFPN, BiFormer and WIoU v3. Despite the introduction of these modules, high parameter efficiency is maintained, largely for the following reasons:

1) BiFPN is integrated with a strict bottleneck design, which maintains computational efficiency while improving detection accuracy by optimizing the bidirectional information flow paths.

2) BiFormer, as a dynamic sparse attention mechanism, allocates computational resources flexibly through content-aware dynamic query sparsity; this design inherently limits parameter growth while improving performance.

3) Strict channel width control and feature reuse strategies are applied throughout the network architecture, so that additional parameter overhead is minimized while functionality is enhanced.

We recognize that the paper does not describe these parametric efficiency design strategies in sufficient detail, which may have led to the query you have raised. The authors will add this technical detail in the revised version to provide clearer architectural explanations and parametric analyses.
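One reason a BiFPN-style neck can add few parameters is that its weighted feature fusion, in the fast normalized form described in the EfficientDet paper, learns only one scalar per input feature map. The sketch below illustrates that fusion rule on flat lists of numbers; whether BBW YOLO uses exactly this form is not stated in this record, so it should be read as an assumption.

```python
from typing import List, Sequence


def fast_normalized_fusion(features: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Fuse same-shaped feature maps with non-negative normalized weights.

    Each output element is sum_i(w_i * f_i) / (eps + sum_j w_j), where the
    raw weights are passed through ReLU to keep them non-negative.
    """
    relu_w = [max(0.0, w) for w in weights]
    total = sum(relu_w) + eps
    size = len(features[0])
    return [
        sum(w * f[j] for w, f in zip(relu_w, features)) / total
        for j in range(size)
    ]


top_down = [1.0, 2.0]    # illustrative feature values
lateral = [3.0, 4.0]
print(fast_normalized_fusion([top_down, lateral], weights=[1.0, 1.0]))
```

A fusion node with two inputs therefore contributes two learnable scalars, so any noticeable parameter growth in such a neck comes from the convolutions that follow each fusion, not from the fusion itself.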

Finally, all authors would like to thank the reviewers for their careful review of our manuscript and their valuable suggestions, which enabled us to better present our work.

Reviewer 2 Report

Comments and Suggestions for Authors

The proposed paper presented the development and application of the BBW YOLO model for the detection of aluminum profile surface defects.

To improve the performance of the YOLOv8 algorithm, the proposed method introduces BiFPN, BiFormer, and WIoUv3 loss function optimization to achieve high accuracy and fast real-time performance.

However, the deficiencies of the proposed plan are as follows.

1. There is a limited range of performance improvement for a specific defect type, and especially for orange peel and paint bubble defect types, the performance improvement is relatively insignificant. Even in the body of the actual paper, the improved performance for these defect types was not clearly shown.

2. The effect of improving the model's generalization ability by expanding the data was mentioned, but no specific evaluation or analysis was presented on whether the synthetic image generated by StyleGAN2-ADA sufficiently reflected variables such as various lighting conditions and complex backgrounds in the real environment.

3. The BBW YOLO model proposed in the main body of the paper showed a performance advantage in a laboratory environment, but there is a lack of practical discussion on the possibility of performance degradation due to various conditions (temperature change, dust, vibration, etc.) in actual industrial sites or technical difficulties that may arise during distribution.

Therefore, the following modifications are presented as improvements to the paper.

- It is necessary to find additional ways to improve performance for specific defect types and to come up with ways to introduce specialized fine-tuning strategies according to individual defect characteristics.

- It is necessary to deepen quantitative and qualitative comparative analysis with actual field data to increase the practical applicability of data augmentation technology, and to present additional techniques to increase the reality of synthetic data.

- It is necessary to perform additional field tests and long-term performance verification to evaluate the field potential of the proposed model to present its practical applicability in the experimental results.

Author Response

18 May 2025

Re: BBW YOLO: Intelligent Detection Algorithms for Aluminium Profile Material Surface Defects by Yin, et al. (coatings-3646096)

Thank you for the two (2) Reviewers’ comments on our manuscript No. coatings-3646096.

We wish to thank the Reviewers for their considered comments and help in getting this work published. We hope our revised manuscript is now suitable for publication.

Thank you for your kind assistance.

Li Haichao

Email: hcl@sues.edu.cn

Responses to Reviewer #2:

The proposed paper presented the development and application of the BBW YOLO model for the detection of aluminum profile surface defects.

To improve the performance of the YOLOv8 algorithm, the proposed method introduces BiFPN, BiFormer, and WIoUv3 loss function optimization to achieve high accuracy and fast real-time performance.

However, the deficiencies of the proposed plan are as follows.

Responses:

Thanks very much for your encouraging comments. The authors will carefully refine and revise the paper based on your review comments so that it meets the requirements for publication in the journal.

Comments#2.1:

There is a limited range of performance improvement for a specific defect type, and especially for orange peel and paint bubble defect types, the performance improvement is relatively insignificant. Even in the body of the actual paper, the improved performance for these defect types was not clearly shown.

Responses:

Thanks to the reviewers for their careful review and valuable suggestions. Based on your suggestions, the authors have carefully reviewed the data in Table 4 and found that there was a clerical error in counting the performance metrics of Orange Peel and Paint Bubble defects previously, and updated the table in the revised manuscript. The updated Table 4 can more accurately reflect the performance improvement of the BBW YOLO model proposed in this study on all defect types. The updated content is as follows:

3.3. Performance comparison of different defect types

To clarify the detection capabilities of the models before and after improvement across various defect types, comparative tests were conducted. Table 4 provides a detailed performance analysis on different defect types, comparing the existing YOLOv8 model with the BBW YOLO model proposed in this study. A consistent experimental platform and parameters were maintained throughout the experiment. As shown in Table 4, the BBW YOLO model demonstrates higher precision and mAP@0.5:0.95 for most defect types, particularly in detecting edge exposed, orange peel, base exposure and discoloration defects. Over all defect types, the enhanced model increases precision and mAP@0.5:0.95 by 5.0 and 4.7 percentage points, respectively, compared to the original model. In summary, the BBW YOLO model proposed in this study effectively enhances the detection capabilities for various types of surface defects on aluminum profiles in most cases.

Table 4. Comparison of model performance on different types of defects.

Defect type	P/% (YOLOv8)	P/% (BBW YOLO)	mAP@0.5:0.95/% (YOLOv8)	mAP@0.5:0.95/% (BBW YOLO)
Electrically Nonconductive	83.3	86.7	64.9	68.1
Scratching	65.1	69.7	29	32.9
Edge Exposed	91.6	99.2	43.9	46.1
Orange Peel	93.1	94.3	83.6	90.4
Base Exposure	90.1	91.3	56.9	65.5
Splashing	83.1	89.2	55.3	55.5
Paint Bubble	63.2	74.7	23.5	37.4
Pitting	82.2	83.7	56.6	61.1
Discoloration	98.1	99.2	92.7	96.1
Dirt Inclusion	75.3	87.1	30.9	31.0
All defects	82.5	87.5	53.7	58.4
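The per-class differences in Table 4 can be tabulated directly, which makes it easier to see which defect types benefit most. The sketch below copies the table values and ranks the classes by mAP@0.5:0.95 gain; the variable names are illustrative.

```python
# (P YOLOv8, P BBW YOLO, mAP@0.5:0.95 YOLOv8, mAP@0.5:0.95 BBW YOLO), in percent,
# copied from Table 4.
TABLE4 = {
    "Electrically Nonconductive": (83.3, 86.7, 64.9, 68.1),
    "Scratching": (65.1, 69.7, 29.0, 32.9),
    "Edge Exposed": (91.6, 99.2, 43.9, 46.1),
    "Orange Peel": (93.1, 94.3, 83.6, 90.4),
    "Base Exposure": (90.1, 91.3, 56.9, 65.5),
    "Splashing": (83.1, 89.2, 55.3, 55.5),
    "Paint Bubble": (63.2, 74.7, 23.5, 37.4),
    "Pitting": (82.2, 83.7, 56.6, 61.1),
    "Discoloration": (98.1, 99.2, 92.7, 96.1),
    "Dirt Inclusion": (75.3, 87.1, 30.9, 31.0),
}

# Gains in percentage points: (precision gain, mAP@0.5:0.95 gain).
gains = {
    name: (p_new - p_old, map_new - map_old)
    for name, (p_old, p_new, map_old, map_new) in TABLE4.items()
}

# Rank classes from largest to smallest mAP@0.5:0.95 gain.
ranked = sorted(gains.items(), key=lambda item: item[1][1], reverse=True)
for name, (dp, dmap) in ranked:
    print(f"{name:28s} P {dp:+5.1f}  mAP@0.5:0.95 {dmap:+5.1f}")

mean_map_gain = sum(dmap for _, dmap in gains.values()) / len(gains)
print(f"mean mAP@0.5:0.95 gain: {mean_map_gain:.2f}")
```

The ranking shows that the gains are uneven: some classes improve by several points while others move by only a fraction of a point, which is relevant to the reviewer's question about class-specific behavior.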

Comments#2.2:

The effect of improving the model's generalization ability by expanding the data was mentioned, but no specific evaluation or analysis was presented on whether the synthetic image generated by StyleGAN2-ADA sufficiently reflected variables such as various lighting conditions and complex backgrounds in the real environment.

Responses:

Thank you very much for your review and valuable comments on our manuscript. In response to your questions regarding the source of the dataset and the actual number of images it contains, as well as the need to highlight the results before and after the application of the GAN to discuss the impact of expanding the dataset, we have revised the manuscript accordingly.

1) Source of dataset and number of images: in the revised version, we provide detailed information regarding the source and distribution of the aluminum profile material defect detection dataset. This additional information enhances the reader's understanding of the context and nature of the data collection.

2) Comparison of results before and after using GAN: we expanded the dataset utilizing StyleGAN2-ADA to mitigate the issue of limited sample size in the original dataset. In the revised version, we emphasize the comparison of results obtained before and after the application of the GAN.

3) The impact of the extended dataset: the extension of the dataset significantly enhances the model's generalization ability, particularly in scenarios involving complex backgrounds or variations in lighting. By increasing the sample size of rare defect categories, the model effectively improves detection precision for these categories. Furthermore, the extended dataset mitigates the overfitting phenomenon, resulting in more robust performance on the test set.

4) Defect categories and their distribution: the dataset comprises a total of ten defect categories, including surface scratches, oxidation spots, cracks, and bubbles, among others. The distribution of these categories is illustrated in detail in the revised version through graphs. Among these categories, base exposure defects represent the largest sample size, accounting for 15.6% of the total. Conversely, paint bubble defects have the smallest sample size, comprising only 4.8%. The distribution of the remaining categories is relatively uniform, with each accounting for approximately 10% of the total sample size. Notably, the data distribution of the GAN-expanded dataset is more balanced across each defect category.

The specific modifications are as follows:

A diverse dataset enhances the model's generalization and mitigates overfitting, leading to a more accurate representation of realistic scenarios for detecting surface defects in aluminum profiles. The original images in this manuscript's dataset were sourced from the Tenchi Aluminium Surface Defects dataset, which includes 1,885 images, as well as from various aluminium manufacturers, contributing an additional 1,531 images. This dataset encompasses different manufacturing processes and production batches, ensuring a diverse and representative collection of data. In total, the dataset comprises 3,416 original images that depict 10 types of defects, including electrically nonconductive, scratching, edge exposed, orange peel, base exposure, splashing, paint bubble, pitting, discoloration and dirt inclusion. An example from the dataset is presented in Figure 6. Among the categories, defective sample data for base exposure represents the largest proportion, accounting for 15.6%. In contrast, the number of defective samples for paint bubble is the smallest, comprising only 4.8%. The remaining categories exhibit a similar distribution, each accounting for approximately 10% of the total sample size.

Figure 6 Example of sample data

The dataset must consider the effects of the production process, environmental factors, the age of use, and other relevant variables. Additionally, it should account for various defects that may appear on the surface of the aluminum profile material, as well as the impacts of image contours, shadows, and overlaps. To address these complexities, the dataset was enhanced using the StyleGAN2-ADA technique. After expansion, the sample size increased to 14,600 images, with the increase concentrated in the rare defect categories. The raw images in the aluminum profile material dataset were manually annotated using LabelImg software to identify specific areas and types of defects. The annotations are saved in TXT format according to the YOLO standard for training purposes. Additionally, the training, validation, and test sets are allocated in a ratio of 8:1:1 [7],[9],[25],[26],[27]. In this way, a dataset in YOLO format for the detection of surface defects in aluminium profiles was developed. The experimental results before and after the StyleGAN2-ADA extension are presented in Table 2. As shown in Table 2, the overall precision, mAP@0.5 and mAP@0.5:0.95 of each model improved following the dataset expansion. Notably, the enhanced BBW YOLO model exhibited the most significant improvements, with increases of 15.95% in precision and 9.25% in mAP@0.5:0.95.
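The annotation format and the split described above can be sketched as follows. A YOLO label line stores a class index followed by the box center, width, and height, all normalized by the image size. The class index, the example box, and the assumption that the 8:1:1 split is applied to all 14,600 images are illustrative and are not taken from the manuscript.

```python
from typing import Tuple


def to_yolo_line(class_id: int, xmin: float, ymin: float, xmax: float,
                 ymax: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-coordinate box to one line of a YOLO TXT label file."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def split_counts(total: int) -> Tuple[int, int, int]:
    """Train, validation and test sizes for an 8:1:1 split."""
    train = total * 8 // 10
    val = total // 10
    test = total - train - val   # remainder goes to the test set
    return train, val, test


# Hypothetical example: class index 6 on a 640 x 640 image.
print(to_yolo_line(6, 160, 160, 480, 320, 640, 640))
print(split_counts(14600))
```

When synthetic images are present, it is generally safer to split the original images first and generate synthetic samples only from the training portion, so that the validation and test sets contain no images derived from training data.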

Table 2. Comparison of different models before and after data enhancement.

Model type	StyleGAN2-ADA	P/%	mAP@0.5/%	mAP@0.5:0.95/%
YOLOv8	-	67.23	61.92	46.47
BBW YOLO	-	71.56	76.9	49.11
YOLOv8	+	82.51	76.82	53.66
BBW YOLO	+	87.51	81.77	58.36
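Table 2 is effectively a 2 × 2 ablation (model by augmentation), so the effect of augmentation and the effect of the architecture can be read off separately. The sketch below copies the table values and computes both sets of differences in percentage points; the variable names are illustrative.

```python
# (P/%, mAP@0.5/%, mAP@0.5:0.95/%) copied from Table 2.
# Key: (model, whether StyleGAN2-ADA augmentation was used).
TABLE2 = {
    ("YOLOv8", False): (67.23, 61.92, 46.47),
    ("BBW YOLO", False): (71.56, 76.90, 49.11),
    ("YOLOv8", True): (82.51, 76.82, 53.66),
    ("BBW YOLO", True): (87.51, 81.77, 58.36),
}
METRICS = ("P", "mAP@0.5", "mAP@0.5:0.95")


def diff(after: tuple, before: tuple) -> dict:
    """Metric-wise difference between two table rows, in percentage points."""
    return {
        m: round(a - b, 2)
        for m, a, b in zip(METRICS, TABLE2[after], TABLE2[before])
    }


# Effect of augmentation, holding the model fixed.
augmentation_gain = {
    model: diff((model, True), (model, False))
    for model in ("YOLOv8", "BBW YOLO")
}

# Effect of the architecture, holding the data fixed.
architecture_gain = {
    aug: diff(("BBW YOLO", aug), ("YOLOv8", aug))
    for aug in (False, True)
}

print(augmentation_gain)
print(architecture_gain)
```

Read this way, augmentation accounts for a larger share of the mAP@0.5:0.95 improvement than the architectural changes do, for both models.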

The data extension significantly enhances the model's generalization ability, and the model performs more consistently when handling scenes with complex backgrounds or lighting variations. By increasing the sample size of rare defect categories, the model effectively improves detection precision for these categories. Furthermore, the extended dataset mitigates overfitting, resulting in more robust performance on the test set.

Finally, the authors hope that these additional evaluations and analyses will more fully demonstrate the effect of data augmentation on the enhancement of the model's generalization capabilities and make a stronger argument for the effectiveness of synthetic images generated by StyleGAN2-ADA in simulating changes in real environments.

Comments#2.3:

The BBW YOLO model proposed in the main body of the paper showed a performance advantage in a laboratory environment, but there is a lack of practical discussion on the possibility of performance degradation due to various conditions (temperature change, dust, vibration, etc.) in actual industrial sites or technical difficulties that may arise during distribution.

Therefore, the following modifications are presented as improvements to the paper.

Responses:

In response to the reviewer's concerns about applying the model in real industrial sites, the authors have added a new section on future research directions in the revised version. It focuses on conducting additional field tests and long-term performance validation, exploring specialized fine-tuning strategies for specific defects, deepening the comparative analysis with real field data, and improving the fidelity of the synthetic data. The added future research section reads as follows:

Future research will focus on enhancing the value of the BBW YOLO model in real industrial environments. More comprehensive and specific field tests will be carried out to evaluate the robustness of the model under disturbances such as temperature, dust and vibration. More refined model optimization and specialized fine-tuning strategies will be explored to address the performance bottlenecks of the existing model on specific defect types. Meanwhile, collaboration with industry will be strengthened to incorporate more real field data, in order to assess the effect of data enhancement more accurately and to guide the optimization of synthetic data.

In the future, the in-depth research planned above will enable a more comprehensive assessment and enhancement of the performance and reliability of the BBW YOLO model in practical industrial applications, providing a more effective solution for the detection of surface defects in aluminium profiles.

Finally, all authors would like to thank the reviewer for their careful review of our manuscript and their valuable suggestions, which enabled us to better present our work.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed my comments; therefore, the paper can be accepted for publication in the present format.
