Textile Defect Detection Using Artificial Intelligence and Computer Vision—A Preliminary Deep Learning Approach
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript presents an edge AI-based real-time fabric defect detection system using a lightweight YOLOv11n model deployed on the NVIDIA Jetson Orin Nano platform. The study focuses on detecting three types of defects (hole, crease, color bleeding) in solid-colored cotton fabric using RGB-D input from Intel RealSense D435i cameras. The authors describe data augmentation strategies, training methodology, real-world deployment in an industrial setting, and evaluation using standard metrics.
However, while the conceptual framework is strong and ambitious, several methodological, theoretical, and practical aspects require more clarity and validation.
-While the system is well-engineered and applicable, it lacks algorithmic novelty. The work primarily focuses on deploying an existing YOLOv11n model rather than proposing a novel architecture or detection strategy. The authors should elaborate on any custom modifications made to YOLOv11n or post-processing pipelines tailored for the textile domain. Highlighting even minor domain-specific adaptations would strengthen the methodological contribution.
-The study is limited to plain, solid-colored cotton fabrics. This restricts the model’s generalizability and raises concerns about how it would perform on patterned, textured, or multi-colored textiles common in industry. The authors are encouraged to explicitly discuss the limitations of their current dataset and outline a roadmap or future work toward supporting broader textile categories.
-No comparative evaluation with other models (e.g., SSD, Faster R-CNN, YOLOv5/v8) is provided to justify the selection of YOLOv11n. Include at least one baseline model for comparison, or cite internal benchmarks that motivated the choice. This will contextualize the effectiveness of YOLOv11n in this setting.
-While the authors apply several augmentation techniques (rotation, brightness adjustment, noise), their individual contributions to performance are not quantified. Present a small ablation study or table showing the performance impact of different augmentations to demonstrate their benefit and justify inclusion.
-The color bleeding class exhibits relatively low precision (0.62), suggesting a higher false positive rate. Discuss alternative strategies to enhance detection of chromatic defects, such as color space transformation (e.g., Lab or HSV), histogram-based color equalization, or sensor fusion (e.g., multispectral or hyperspectral imaging).
-The authors mention that specialized pre-processing was deliberately excluded for model generality. However, this decision may have negatively impacted color defect detection. Consider testing adaptive preprocessing selectively for color defects, or at least simulate its potential benefits in future work.
-Detection examples (Figures 8–10) are helpful but would benefit from clearer confidence annotations and bounding box color legends. Overlay numerical confidence scores directly on images and use consistent color-coding for defect types to enhance readability.
Author Response
Comments 1: "While the system is well-engineered and applicable, it lacks algorithmic novelty. The work primarily focuses on deploying an existing YOLOv11n model rather than proposing a novel architecture or detection strategy. The authors should elaborate on any custom modifications made to YOLOv11n or post-processing pipelines tailored for the textile domain. Highlighting even minor domain-specific adaptations would strengthen the methodological contribution."
Response 1: The authors would like to thank the reviewer for the thoughtful and constructive feedback.
We agree with the observation regarding the limited algorithmic novelty and acknowledge the importance of clarifying any custom adaptations of the YOLOv11n model. While our work does not introduce a novel detection architecture, the primary contribution lies in the practical deployment of a lightweight, cost-effective system that is versatile across different fabric types and defect classes, and adaptable to various loom configurations. We believe this practical innovation provides significant value for real-world textile applications where accessibility and affordability are critical factors. This versatility is now explained more fully in the paper and highlighted in the last paragraph of the abstract.
Comments 2: "The study is limited to plain, solid-colored cotton fabrics. This restricts the model’s generalizability and raises concerns about how it would perform on patterned, textured, or multi-colored textiles common in industry. The authors are encouraged to explicitly discuss the limitations of their current dataset and outline a roadmap or future work toward supporting broader textile categories."
Response 2: We appreciate the reviewer’s observation regarding the limited scope of our dataset. As correctly pointed out, the study focuses exclusively on plain, solid-colored cotton fabrics. This choice was intentional and aligned with the primary goal of the research and the textile company: to investigate the performance of the defect detection system on uniform textiles produced using a specific class of looms that are exclusively designed to weave plain fabrics.
This type of loom is widely used in various industrial settings where standardized production and quality control are critical, and thus represents a significant and practical subset of textile manufacturing. By narrowing the scope to this textile category, we ensured consistency in data acquisition and model training, which was essential for the controlled evaluation of our proposed method.
That said, we fully acknowledge that this focus limits the generalizability of our model to more complex textile categories such as patterned, textured, or multi-colored fabrics. We have revised the manuscript to include an explicit discussion of this limitation (Sections 5 and 6, last paragraph) and have outlined directions for future work, which include expanding the dataset to incorporate a wider variety of textiles and evaluating model robustness across different loom technologies and fabric types.
Comments 3: "No comparative evaluation with other models (e.g., SSD, Faster R-CNN, YOLOv5/v8) is provided to justify the selection of YOLOv11n. Include at least one baseline model for comparison, or cite internal benchmarks that motivated the choice. This will contextualize the effectiveness of YOLOv11n in this setting."
Response 3: We thank the reviewer for this valuable and insightful comment.
The choice of YOLOv11n was primarily driven by the constraints of our application, which demands real-time inference, low computational load, and deployment on edge devices with limited processing capabilities. YOLOv11n, as a lightweight model from the YOLO family, provides an effective balance between speed and accuracy under these conditions.
In our updated state-of-the-art section (Section 2.5), we now present a clearer justification for this choice, including a comparative discussion of relevant models such as SSD, Faster R-CNN, and other YOLO variants (e.g., YOLOv5, YOLOv8).
While larger models like YOLOv5m and Faster R-CNN can offer slightly improved detection accuracy, our internal benchmarks demonstrated that these gains come at a significant cost in inference time and resource consumption — rendering them unsuitable for our intended low-cost, real-time deployment.
Accordingly, we have added a final comparative subsection that contextualizes the selection of YOLOv11n within our deployment scenario, highlighting its competitive performance given our system constraints.
Comments 4: "While the authors apply several augmentation techniques (rotation, brightness adjustment, noise), their individual contributions to performance are not quantified. Present a small ablation study or table showing the performance impact of different augmentations to demonstrate their benefit and justify inclusion."
Response 4: We thank the reviewer for the helpful suggestion. In the revision, we include a compact ablation isolating the effect of each augmentation (rotation, brightness adjustment, Gaussian noise) on performance, reporting overall mAP@0.5 for each configuration. Results show that rotation yields the largest improvement, brightness adjustment provides a comparable gain, and additive noise offers a smaller but still measurable improvement. The combined policy outperforms any single augmentation. Details are provided in Table 3 and discussed in Section 3.1 (subsection ii).
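For illustration only, the three augmentation families in the ablation can be sketched with NumPy. This is a simplified stand-in (90-degree rotation, additive brightness shift, Gaussian noise), not our exact training pipeline, and the function names are our own:

```python
import numpy as np

def rotate90(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Rotate the image by k * 90 degrees (a lossless stand-in for rotation)."""
    return np.rot90(img, k)

def adjust_brightness(img: np.ndarray, delta: float) -> np.ndarray:
    """Shift pixel intensities by delta, clipping to the valid [0, 255] range."""
    return np.clip(img.astype(np.float32) + delta, 0, 255).astype(np.uint8)

def add_gaussian_noise(img: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Brightness and noise leave bounding-box labels unchanged; the geometric (rotation) case also requires the corresponding box transform.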
Comments 5: "The color bleeding class exhibits relatively low precision (0.62), suggesting a higher false positive rate. Discuss alternative strategies to enhance detection of chromatic defects, such as color space transformation (e.g., Lab or HSV), histogram-based color equalization, or sensor fusion (e.g., multispectral or hyperspectral imaging)."
Response 5: We thank the reviewer for this insightful comment. Indeed, the “color bleeding” class showed lower precision (0.62), which indicates a higher false positive rate. We have now added a discussion on potential strategies to enhance detection of chromatic defects. Specifically, we highlight the use of alternative color spaces (e.g., Lab or HSV) to better separate luminance from chromatic information and sensor fusion approaches (e.g., multispectral or hyperspectral imaging) that can provide richer spectral features for defect detection. This discussion has been included in the revised manuscript in Section 5, lines 857-866.
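To illustrate why a hue-oriented color space can help with chromatic defects, the sketch below compares two hypothetical fabric pixels with the same brightness but shifted hue; HSV concentrates the difference in the hue channel while value stays constant. Pixel values are invented for illustration:

```python
import colorsys

# Hypothetical pixels: the base fabric color and a region with slight dye bleed.
base  = (180, 60, 60)    # solid-red fabric pixel (R, G, B)
bleed = (180, 100, 60)   # bled pixel: same brightness, hue shifted toward orange

def rgb_to_hsv(rgb):
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)  # (hue, saturation, value), each in [0, 1]

h1, s1, v1 = rgb_to_hsv(base)
h2, s2, v2 = rgb_to_hsv(bleed)

# The chromatic shift shows up entirely in hue, while value (brightness)
# is unchanged -- the luminance/chroma separation that HSV or Lab provides.
print(f"hue shift: {abs(h2 - h1):.3f}, value shift: {abs(v2 - v1):.3f}")
```

A Lab conversion would require a color-science library such as scikit-image; the standard-library HSV conversion suffices for the illustration.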
Comments 6: "The authors mention that specialized pre-processing was deliberately excluded for model generality. However, this decision may have negatively impacted color defect detection. Consider testing adaptive preprocessing selectively for color defects, or at least simulate its potential benefits in future work."
Response 6: We thank the reviewer for this insightful observation. As noted, the decision to exclude specialized pre-processing was intentional, aimed at preserving model generality and simplifying deployment across varied textile environments.
However, we acknowledge that this design choice may have reduced the model’s sensitivity to certain defect types, particularly those involving subtle color variations. In light of this, we plan to explore a hybrid approach in future work, integrating adaptive preprocessing techniques or leveraging multiple imaging modalities. Such strategies would allow selective enhancement—activating preprocessing dynamically based on defect characteristics—while preserving the system's overall generalizability.
This perspective has been incorporated into the revised manuscript (see Sections 5 and 6, Discussion and Conclusion), outlining our future research directions. We appreciate the reviewer’s suggestion, which aligns closely with our ongoing development efforts.
Comments 7: "Detection examples (Figures 8–10) are helpful but would benefit from clearer confidence annotations and bounding box color legends. Overlay numerical confidence scores directly on images and use consistent color-coding for defect types to enhance readability."
Response 7: We appreciate the reviewer’s suggestion to improve the clarity and readability of the detection examples presented in Figures 8–10.
In response, we have revised the figures to include:
- Numerical confidence scores overlaid directly on each predicted bounding box;
- A consistent color-coding scheme for different defect types, now explained in an accompanying legend.
These enhancements improve visual interpretability and make it easier to assess model confidence and classification accuracy immediately. The updated figures can be found in the revised manuscript (Figures 8–10), and the legend has been included in the respective captions for clarity.
We thank the reviewer for this valuable suggestion.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors propose a system prototype designed to detect fabric defects in real time during machine operation. The system employs an RGB camera to capture images of fast-moving fabric, which are then transmitted to an edge server for defect segmentation. YOLOv11n is utilized as the deep-learning backbone to identify three types of defects. Experimental results demonstrate that the system can operate in real time and achieves a mean Average Precision of 0.82 at IoU threshold 0.50 (mAP@50).
The manuscript presents several issues, particularly regarding its claims of contribution and the coverage of related work. Numerous recent studies (post-2024) have already addressed real-time fabric defect detection in manufacturing environments, including scenarios with challenging illumination conditions. To clearly articulate the contributions of this study, the authors should first identify the current state-of-the-art in this domain and then discuss the limitations of existing solutions. This will help clarify how the proposed system provides an improvement over previous works.
Notably, many of the studies cited in the related work section are outdated. As current literature shows, recent approaches have demonstrated strong performance in defect segmentation. Although many do not address real-time constraints or variability in manufacturing conditions, there are still a few recent works that specifically focus on real-time defect detection under realistic operational settings, yet these are not discussed in the manuscript. Even if the authors choose not to conduct a direct comparison in terms of precision or processing speed, it is still essential to discuss the differences between the proposed system and those existing solutions, along with the potential impact of these differences.
Additionally, the authors should provide more insight into the design rationale of the proposed system. Specifically, they should explain why two RGB cameras were selected, rather than a different number or type of cameras (e.g., monochrome, RGB-depth, etc.). The novelty of the augmentation technique is also questionable, as it appears to rely on standard practices commonly employed in deep learning applications.
Minor issues:
The authors state: "Results demonstrate the system’s ability to maintain a mean Average Precision over 82%." This phrasing should be corrected to specify mAP@50 for clarity and precision.
While mAP@50 is used as the performance metric, the manuscript defines only mAP without mentioning the specific IoU threshold used. Since IoU plays a critical role in evaluating performance, the authors should indicate how IoU is computed—whether it is based on pixel-level segmentation or bounding-box comparison.
Author Response
Comments 1: "The manuscript presents several issues, particularly regarding its claims of contribution and the coverage of related work. Numerous recent studies (post-2024) have already addressed real-time fabric defect detection in manufacturing environments, including scenarios with challenging illumination conditions. To clearly articulate the contributions of this study, the authors should first identify the current state-of-the-art in this domain and then discuss the limitations of existing solutions. This will help clarify how the proposed system provides an improvement over previous works."
Response 1: We thank the reviewer for pointing out the need for a more thorough review of the recent literature and a clearer articulation of our study’s contributions relative to the current state of the art.
In response, we have substantially revised the Related Work section to incorporate recent studies published after 2024, including works that address real-time fabric defect detection under challenging manufacturing conditions such as variable illumination and motion blur. These additions help establish a more accurate and updated context for our work.
Furthermore, we now explicitly highlight the limitations of existing approaches, particularly regarding their computational cost, deployment complexity, or reduced adaptability across loom types and fabric categories. This discussion clarifies how our proposed solution contributes by offering:
- A lightweight detection pipeline suitable for deployment on edge devices;
- Robust performance on uniform plain-woven textiles, which are prevalent in certain industrial setups;
- A modular framework designed for future extension to broader textile categories.
We also plan to include comparative experiments or references to benchmarks where available, to strengthen the justification for our model choices and demonstrate performance gains.
These updates can be found in the revised manuscript in Section 2.4. We sincerely thank the reviewer for this valuable guidance, which helped us enhance both the depth and positioning of our work.
Comments 2: "Notably, many of the studies cited in the related work section are outdated. As current literature shows, recent approaches have demonstrated strong performance in defect segmentation. Although many do not address real-time constraints or variability in manufacturing conditions, there are still a few recent works that specifically focus on real-time defect detection under realistic operational settings, yet these are not discussed in the manuscript. Even if the authors choose not to conduct a direct comparison in terms of precision or processing speed, it is still essential to discuss the differences between the proposed system and those existing solutions, along with the potential impact of these differences."
Response 2: We appreciate the reviewer’s observation and fully agree that our initial Related Work section did not sufficiently reflect the most recent advances in real-time defect detection under realistic manufacturing conditions.
In response, we have carefully updated the Related Work section to include several relevant studies published after 2024 that address defect segmentation, real-time detection, and deployment under variable operational conditions. While many of these approaches achieve strong segmentation accuracy, they often rely on high-end hardware or involve complex pipelines that may not be suitable for constrained environments such as legacy looms (traditional weaving machines) or edge devices (low-power embedded systems) — a key focus of our study.
We have added a focused discussion comparing the design choices, computational trade-offs, and deployment scenarios of these recent methods with our own. Even in the absence of direct numerical comparison (due to dataset or implementation differences), this discussion helps to clarify the unique value of our approach, particularly its balance between accuracy, efficiency, and adaptability.
These updates are now reflected in Section 2 of the revised manuscript, including citations to recent relevant literature and a new table summarizing key distinctions between selected works and our own system.
We thank the reviewer once again for this constructive feedback, which led to a clearer positioning of our contributions within the current research landscape.
Comments 3: "Additionally, the authors should provide more insight into the design rationale of the proposed system. Specifically, they should explain why two RGB cameras were selected, rather than a different number or type of cameras (e.g., monochrome, RGB-depth, etc.). The novelty of the augmentation technique is also questionable, as it appears to rely on standard practices commonly employed in deep learning applications."
Response 3: We thank the reviewer for the insightful comment regarding the system design rationale.
The choice of using two RGB cameras was based on practical constraints: the fabric width is 3 meters, and a single camera could not cover the entire area with sufficient resolution. Using two cameras ensured full coverage and defect visibility across the fabric width, while also reducing blind spots caused by texture orientation, lighting variation, or mechanical alignment of the loom.
We have clarified these points in the revised manuscript (Section 3.1).
Regarding data augmentation, we agree that the techniques used are standard. Our focus was not on novelty, but on adapting them to real-world textile variability. This is now explained more clearly in Section 3.1, along with an ablation table (Table 3) showing their impact on performance.
Comments 4: "The authors state: "Results demonstrate the system’s ability to maintain a mean Average Precision over 82%." This phrasing should be corrected to specify mAP@50 for clarity and precision."
Response 4: We appreciate the reviewer’s suggestion. We have revised the phrasing to explicitly state the IoU threshold used. The corrected text now specifies mAP@0.5 instead of the general term mAP.
Comments 5: "While mAP@50 is used as the performance metric, the manuscript defines only mAP without mentioning the specific IoU threshold used. Since IoU plays a critical role in evaluating performance, the authors should indicate how IoU is computed—whether it is based on pixel-level segmentation or bounding-box comparison."
Response 5: We thank the reviewer for pointing this out. In the revised manuscript, we have clarified the evaluation metric. Specifically, we report mAP@0.5, which is the mean Average Precision computed at an Intersection over Union (IoU) threshold of 0.5. The IoU is calculated using bounding-box overlap between predicted and ground-truth boxes, following the standard object detection evaluation protocol. This clarification has now been included in Section 3.5, in the Mean Average Precision formula description.
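As a concrete reference for the bounding-box overlap described above, the IoU underlying mAP@0.5 can be sketched in a few lines. This is a generic textbook implementation, not our evaluation code:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Under mAP@0.5, a prediction counts as a true positive only when its IoU
# with a same-class ground-truth box reaches the 0.5 threshold.
pred, gt = (10, 10, 50, 50), (20, 20, 60, 60)
print(box_iou(pred, gt))  # intersection 30*30=900, union 2300 -> ~0.391
```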
Reviewer 3 Report
Comments and Suggestions for Authors
The authors present a real-time AI-based fabric defect detection system implemented on an edge computing platform. The paper is well-structured and provides useful insights for readers working in the area of automated inspection and industrial computer vision. The following suggestions are offered:
- It would be helpful to briefly explain why YOLOv11n was chosen, particularly over other lightweight variants. Moreover, it is suggested to add a few details about the model.
- Since the dataset is limited to solid coloured cotton fabric, it may be worth mentioning this as a limitation in the conclusion. It could affect the generalizability of the results to other fabric types.
- In a few places, main sections are referred to as “Chapters,” which is more typical in thesis writing. For a journal paper, term “Sections” is more appropriate.
- While the literature review is comprehensive, the paper does not clearly highlight which gap in existing work it addresses. If this work fills a specific limitation in prior studies, it is recommended to state that clearly.
- The paper mentions that over 5,000 images were collected, which is a strength. However, it would be better if authors can clarify how many images per defect type were labelled and how the dataset was curated.
- Discussion and conclusion are combined in Section 5. In order to present insights from the results more clearly and to define future work, it is suggested to separate them.
- It would be helpful to include a brief comparison with related methods
- The reported model precision is 75%, which may be considered modest for industrial settings. It is suggested to explain whether this is acceptable in the target use-case, or future work suggestions may be included for its improvement.
- A comparison with traditional methods (e.g., using SVM or KNN with hand-crafted features such as HOG or LBP) would justify using deep learning in this application.
Author Response
Comments 1: "It would be helpful to briefly explain why YOLOv11n was chosen, particularly over other lightweight variants. Moreover, it is suggested to add a few details about the model."
Response 1: We thank the reviewer for the positive feedback and valuable suggestions.
YOLOv11n was selected for its balance between accuracy and efficiency, making it particularly well-suited for real-time inference on edge devices. Among lightweight variants, it provides competitive performance with a small model size and low computational cost, while still maintaining high detection accuracy on our dataset.
We have added a brief explanation and model description in Section 2.4 of the revised manuscript.
Comments 2: "Since the dataset is limited to solid coloured cotton fabric, it may be worth mentioning this as a limitation in the conclusion. It could affect the generalizability of the results to other fabric types."
Response 2: We thank the reviewer for this important observation. We agree that the restriction to solid-colored cotton fabrics is a limitation that may affect the generalizability of the results to patterned, textured, or multi-colored fabrics.
This limitation has now been explicitly mentioned in the Discussion and Conclusion sections (Sections 5 and 6), along with a note on future work aimed at extending the system to broader textile categories through dataset expansion and sensor adaptation.
Comments 3: "In a few places, main sections are referred to as “Chapters,” which is more typical in thesis writing. For a journal paper, term “Sections” is more appropriate."
Response 3: We thank the reviewer for pointing this out. All references to “Chapters” have been corrected to “Sections” to follow the standard convention for journal articles.
Comments 4: "While the literature review is comprehensive, the paper does not clearly highlight which gap in existing work it addresses. If this work fills a specific limitation in prior studies, it is recommended to state that clearly."
Response 4: We thank the reviewer for this helpful suggestion. While the original literature review provided context, we agree that the specific gap addressed by our work was not stated explicitly.
We have now revised the end of the State of the Art section (Section 2) to clearly articulate the gap: namely, the lack of lightweight, real-time defect detection systems tailored for uniform fabrics on conventional looms, using edge-friendly architectures deployable in constrained industrial environments.
This clarification helps position our contribution more precisely within the existing body of work.
Comments 5: "The paper mentions that over 5,000 images were collected, which is a strength. However, it would be better if authors can clarify how many images per defect type were labelled and how the dataset was curated."
Response 5: We thank the reviewer for highlighting this point. We have updated the manuscript (Section 3.1: System Architecture) to include additional details on the dataset curation process, namely the number of instances per defect type.
Additional information about dataset construction is available in Section 3.3, lines 668–674.
This clarification provides better transparency regarding dataset composition and helps contextualize the training and evaluation process.
Comments 6: "Discussion and conclusion are combined in Section 5. In order to present insights from the results more clearly and to define future work, it is suggested to separate them."
Response 6: We thank the reviewer for the suggestion. Although the combined structure followed the formatting guidelines of the journal's provided template, we have now separated the Discussion and Conclusion sections to improve clarity. The revised Discussion focuses on the analysis and interpretation of results, while the Conclusion highlights key findings and outlines directions for future work.
Comments 7: "It would be helpful to include a brief comparison with related methods"
Response 7: We thank the reviewer for this valuable and insightful comment.
In our updated state-of-the-art section (Section 2.5), we now present a comparison with related methods, including a discussion of relevant models such as SSD, Faster R-CNN, and other YOLO variants (e.g., YOLOv5, YOLOv8).
Accordingly, we have added a final comparative subsection that contextualizes the selection of YOLOv11n within our deployment scenario, highlighting its competitive performance given our system constraints.
Comments 8: "The reported model precision is 75%, which may be considered modest for industrial settings. It is suggested to explain whether this is acceptable in the target use-case, or future work suggestions may be included for its improvement."
Response 8: We thank the reviewer for this observation. We agree that a precision of 75% may be modest in some industrial contexts. However, in our target use-case — continuous inspection of plain fabrics on conventional looms — this level of precision is considered acceptable as a first filter in a multi-stage quality control pipeline, where flagged defects are further reviewed.
We have added a clarification in the manuscript (Section 5), and also outlined future work aimed at improving accuracy, including expanded training data, multi-sensor fusion, and adaptive thresholding strategies.
Comments 9: "A comparison with traditional methods (e.g., using SVM or KNN with hand-crafted features such as HOG or LBP) would justify using deep learning in this application."
Response 9: We thank the reviewer for this insightful suggestion. While a comparison with traditional methods (e.g., SVM or KNN with handcrafted features like HOG or LBP) would further highlight the advantages of deep learning, our focus in this study was on real-time deployment and scalability, where deep learning models offer superior performance and automation.
We have now acknowledged this point in the revised manuscript (Section 2.4) and included a brief discussion comparing the expected trade-offs between classical and deep learning approaches based on findings in related literature.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I appreciate the authors’ thorough and constructive revisions to the manuscript. All of my concerns have been carefully addressed in the updated version. The clarifications on methodological contributions, dataset limitations, and comparative justifications provide the necessary context and significantly strengthen the paper. The inclusion of an ablation study for augmentation effects, improved discussion of chromatic defect detection strategies, and acknowledgment of preprocessing trade-offs further enhance the technical depth.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have improved the manuscript and have addressed most of the suggestions.