1. Introduction
Down is an animal protein fiber growing on the abdomen of geese and ducks, appearing as reed-flower-shaped downy tufts. Under a magnifying glass, each down filament shows a fish-scale structure, densely covered with numerous triangular pores that store a large amount of stationary air. These pores contract and expand as temperature changes. Due to the low thermal conductivity of air, down exhibits excellent thermal insulation performance. Down with longer filaments, larger clusters and higher cluster content is of better quality.
Down is a high-end thermal insulation material, and its quality directly determines the thermal insulation property, bulkiness and commercial grade of related products. According to the Chinese national standard GB/T 17685-2016 Feathers and Down [1], the core grading indicator of down quality is down content, defined as the percentage of the total mass of down clusters and down filaments in the total mass of feathers and down. The industry-standard terms “95 down”, “90 down” and “85 down” refer to down raw materials with down content ≥95%, ≥90% and ≥85%, respectively. For a more refined down quality classification, mature down clusters act as the core effective component, whereas immature down clusters, down filaments, feathers and yellow-tailed down are key impurities that downgrade the product quality. Yellow-tailed down, in particular, causes the whole batch of down to appear yellow when mixed in. Against the background of continuously growing global demand for down products, efficient and accurate quality inspection has become a core requirement for quality improvement and industrial upgrading in the down sector.
At present, the down industry still relies entirely on manual inspection to comply with national standard specifications. In actual production, manufacturers randomly extract fixed-mass samples from each batch of down. Through our field investigation at down processing factories in Taiqian County, Puyang City, Henan Province, we have gained an in-depth understanding of the on-site inspection workflow. In two independent quality inspection laboratories, professional inspectors are arranged at individual workstations. With tweezers on transparent glass workbenches, they manually separate and pick out down filaments, immature down clusters, feathers, yellow-tailed down and other components one by one for manual counting and statistical analysis. Generally, only small 5 g or 10 g samples are used for testing, and the inspection results of these small samples are adopted to evaluate the quality grade of the entire production batch.
As a long-standing standard detection method, purely manual inspection has prominent inherent shortcomings. Manual sorting and counting are extremely time-consuming and labor-intensive, resulting in low detection efficiency for batch products. Moreover, inspection results are highly susceptible to the inspector’s working experience, eyesight and subjective judgment, which leads to unstable detection accuracy and large manual errors and makes unified, standardized evaluation difficult to achieve. In the context of large-scale raw material procurement and continuous industrial production, these drawbacks have gradually become key bottlenecks restricting the high-quality development of the down industry.
In recent years, deep learning and computer vision technologies have been widely applied in the field of industrial automatic quality inspection. In particular, the YOLO series of object detection models have been deployed in textiles, agricultural products, medical detection and other scenarios owing to their advantages of strong real-time performance, high accuracy, lightweight structure and easy deployment [2,3,4,5,6,7,8].
However, existing studies on down detection still have obvious deficiencies that make the relevant technologies difficult to apply in industrial production. The root cause is the lack of professional datasets. Because no high-quality dataset exists for the national standard-based down quality inspection scenario, two problems arise. First, precise identification of individual down components cannot be supported, and existing models are difficult to adapt to the multi-object detection scenario in a single image. Second, the detection methods cannot be effectively integrated with the calculation logic of national standard down content, so the results can hardly be used directly for down grade determination.
Specifically, although the FCA-YOLO model proposed by Liu et al. [9] can distinguish goose down from duck down, it is limited by the incomplete category coverage of its dataset and does not cover the full set of component categories, namely mature down clusters, immature down clusters, down filaments, feathers and yellow-tailed down. The dataset constructed by Zhang et al. [10] contains just over 4000 images with only two annotated categories (down and feathers), which is far from meeting the requirements of industrial-level deep learning models for large-scale and finely annotated data. Although public datasets such as FeatherV1 [11] are relatively large, they are designed for the classification and identification of bird species, and their annotated categories do not match the requirements of industrial down quality inspection. The official implementation of GB/T 45519-2025 [12] indicates that AI-based detection methods have been recognized by national standards in the textile industry. Nevertheless, the field of down detection still lacks a dedicated dataset and a scheme for aligning detection results with the national standard, which further highlights the gaps in existing research.
Therefore, constructing a dedicated dataset for national standard-based down quality inspection and developing an intelligent detection method conforming to the calculation logic of down content on this basis possess important research value and practical significance.
As mainstream iterative versions of the YOLO series [13,14,15], YOLOv8, YOLOv11 and YOLOv26 have been continuously optimized in small-object recognition, multi-class classification and model efficiency. In the field of industrial quality inspection, YOLOv11 has been successfully applied to various scenarios including defect detection in energy cable production lines [16], object detection in dense industrial scenes [17], rail fastener defect detection [18] and metal surface defect detection [19], which verifies its reliability in real-time detection and industrial deployment. Meanwhile, research on the improvement of YOLOv8 in small-object detection [20,21,22] also provides methodological references for the accurate identification of small and fine targets in down components.
In summary, YOLO series models are well suited to the accurate detection of down components. Targeting industrial pain points such as the low efficiency and high measurement errors of manual counting, the technical scheme proposed in this study helps optimize repetitive manual counting procedures and reduces time costs and subjective human bias. More importantly, this paper proposes a research idea for the automatic classification and grading of down quality, offering a new direction for the intelligent upgrading of down quality inspection.
The remainder of this paper is organized as follows:
Section 2 introduces the construction method of the down dataset and elaborates the network structures and key improvements of the compared models;
Section 3 presents the specific settings, results and comparative analysis of model training and simulation experiments;
Section 4 proposes an automatic down quality classification method based on detection results;
Section 5 summarizes the overall work of this paper and outlines future research directions.
3. Experiments and Result Analysis
This section details the experimental environment and configurations, including the hardware platform, software environment, and unified training hyperparameter settings. On this basis, an evaluation metric system is established, covering detection accuracy metrics and model complexity metrics, which provides a quantitative basis for measuring the detection performance and engineering practicability of each model. Subsequently, based on the experimental data, the performance of six models (YOLOv8n, YOLOv8m, YOLOv11n, YOLOv11m, YOLOv26n, and YOLOv26m) on the down feather detection dataset is systematically analyzed, and comparisons are made along three dimensions: overall performance, accuracy-efficiency trade-off, and scenario-specific applicability. Finally, because the YOLOv26 series (v26n and v26m) underperforms YOLOv11n on this dataset, the possible causes are analyzed from the perspectives of model design philosophy, key architectural differences, and the adaptability of training strategies to private datasets. The analysis suggests that architectures pursuing extreme end-to-end efficiency may suffer accuracy degradation on specific industrial datasets, which provides guidance for subsequent model selection and optimization.
3.1. Experimental Environment and Configuration
All models were trained and evaluated under a unified hardware and software environment. The hardware configuration is as follows: an AMD Ryzen 7 6800HS Creator Edition octa-core processor, 16 GB RAM, a SAMSUNG MZVL2512HCJQ-00BL2 512 GB solid-state drive, and an NVIDIA GeForce RTX 3050 Laptop GPU (4 GB dedicated video memory, with shared video memory enabled to prevent out-of-memory errors caused by the YOLOv26m model). The software environment consists of Windows 10 Professional 22H2, Python 3.11.15, and PyTorch 2.5.0 (CUDA 11.8).
Training hyperparameters were set uniformly based on the default configuration of the Ultralytics YOLO framework. The details are as follows: The task type is object detection, the input image size is 640 × 640, the batch size is 8, and the number of training epochs is 30. The optimizer adopts SGD (automatically selected by ‘optimizer: auto’), with an initial learning rate of 0.01, a final learning rate factor of 0.01, a momentum parameter of 0.937, a weight decay coefficient of 0.0005, and warm-up epochs of 3. Data augmentation strategies include random hue/saturation/brightness adjustment (hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4), random translation (translate = 0.1), random scaling (scale = 0.5), horizontal flipping (fliplr = 0.5), and mosaic augmentation (mosaic = 1.0). The RandAugment automatic augmentation strategy (auto_augment = randaugment) is adopted, with a random erasure probability of 0.4. Mixed precision (amp = True) is enabled during training to accelerate training, and the random seed (seed = 0) and deterministic algorithm (deterministic = True) are fixed to ensure experimental reproducibility. All models use exactly the same dataset split and training configuration to ensure the fairness of the comparison.
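The configuration above can be collected in one place so that every model is trained with identical arguments. The following is a minimal sketch, assuming the Ultralytics Python API; the dataset file name `down.yaml` and the weight-file names are hypothetical placeholders, not taken from the paper. Note that with `optimizer="auto"` Ultralytics chooses the optimizer and may override `lr0` and `momentum` depending on the total number of training iterations, so the training log remains the authoritative record of what was actually used.

```python
# Hyperparameters quoted in the text; keys follow Ultralytics train-argument names.
TRAIN_ARGS = {
    "imgsz": 640,
    "batch": 8,
    "epochs": 30,
    "optimizer": "auto",
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3,
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "auto_augment": "randaugment",
    "erasing": 0.4,
    "amp": True,
    "seed": 0,
    "deterministic": True,
}

# Assumed weight-file names; replace with the checkpoints actually used.
MODEL_WEIGHTS = [
    "yolov8n.pt", "yolov8m.pt",
    "yolo11n.pt", "yolo11m.pt",
    "yolo26n.pt", "yolo26m.pt",
]


def train_all(data_yaml="down.yaml"):
    """Train every model with the same arguments (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    for weights in MODEL_WEIGHTS:
        YOLO(weights).train(data=data_yaml, **TRAIN_ARGS)
```

Keeping the arguments in a single dictionary makes it harder for one model to be trained with a silently different setting.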
The self-built down dataset constructed in this study consists of 632 RGB images, which are split into a training set and a validation set at a fixed ratio of 8:2 for model training and quantitative performance evaluation. To eliminate potential data leakage and ensure the credibility and stability of the comparative results, the random split was performed at the level of sampling batches rather than individual images. All images collected from the same production batch were assigned to the same subset and never divided between the training set and the validation set, which prevents information crossover between the subsets.
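A batch-level split of this kind can be sketched as follows. This is an illustrative implementation, not the authors' script: the function name `split_by_batch`, the mapping from image name to batch identifier, and the greedy filling of the validation set are all assumptions.

```python
import random
from collections import defaultdict


def split_by_batch(image_batches, val_fraction=0.2, seed=0):
    """Split images into (train, val) so that no production batch spans both sets.

    image_batches: dict mapping image name -> batch identifier (hypothetical input).
    """
    groups = defaultdict(list)
    for name, batch_id in image_batches.items():
        groups[batch_id].append(name)

    batch_ids = sorted(groups)
    random.Random(seed).shuffle(batch_ids)  # reproducible random order of batches

    target = val_fraction * len(image_batches)
    train, val = [], []
    for batch_id in batch_ids:
        # Whole batches go to validation until the target size is reached.
        if len(val) < target:
            val.extend(groups[batch_id])
        else:
            train.extend(groups[batch_id])
    return train, val
```

Because whole batches are assigned at once, the resulting ratio matches 8:2 exactly only when batch sizes divide evenly; otherwise it is approximate.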
3.2. Evaluation Metrics
To comprehensively measure the performance of each YOLO model in the down feather detection task, this paper selects multiple widely used evaluation metrics in the field of object detection to conduct a comprehensive evaluation from the dimensions of detection accuracy, localization precision and model complexity. Specifically, Precision and Recall are adopted to assess the detection accuracy of the model and its ability to cover positive samples. Precision represents the proportion of genuine down feather samples among those predicted as down feather targets by the model, reflecting the false detection rate of the model. Recall represents the proportion of real down feather targets correctly detected by the model, reflecting the missed detection rate of the model. In addition, mean Average Precision (mAP) is used as the core evaluation metric. Among them, mAP@0.5 denotes the mean average precision when the Intersection over Union (IoU) threshold is 0.5, which measures the detection performance of the model under loose localization requirements. mAP@0.5:0.95 is the average of mAP values calculated at multiple IoU thresholds from 0.5 to 0.95 with a step of 0.05, which can more strictly evaluate the localization precision of the model. All metrics are calculated on the validation set. The comprehensive comparison of the above metrics can objectively reflect the accuracy and robustness of different models in the down feather detection task.
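For reference, the quantities behind these metrics can be written out in a few lines. The sketch below uses the standard definitions of IoU, Precision and Recall; the function names are illustrative, and the matching of predictions to ground truth that produces the TP/FP/FN counts is not shown.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# The ten IoU thresholds averaged by mAP@0.5:0.95 (0.50, 0.55, ..., 0.95).
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth box of the same class reaches that threshold, which is why mAP@0.5:0.95 is the stricter of the two mAP metrics.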
3.3. Experimental Results
To systematically evaluate the performance of different YOLO versions on the down feather detection task, this study selected three YOLO series: YOLOv8, YOLOv11, and YOLOv26. Each series includes two model scales, namely nano (n) and medium (m), resulting in a total of six models. All models were trained under completely identical hardware–software environments and hyperparameter settings, and evaluated with a unified validation set. The evaluation metrics cover detection accuracy (Precision, Recall, mAP@50, mAP@50-95, where the last two denote mAP@0.5 and mAP@0.5:0.95) and model complexity (number of parameters, GFLOPs). A comparative analysis is presented below from three aspects: overall performance, accuracy–efficiency trade-off, and scenario applicability.
The final performance of the six models on the validation set is shown in Table 2. Among them, the parameters and GFLOPs data are obtained from the official Ultralytics documentation and public benchmarks, and the accuracy metrics are all extracted from the validation results of the 30th epoch (the final epoch).
In terms of detection accuracy, YOLOv11n achieves the best comprehensive performance, with mAP@50 of 0.99416, Precision of 0.99544 and Recall of 0.99722, all of which are higher than those of other models. YOLOv8n follows closely, with mAP@50 of 0.99308, showing a very small gap from YOLOv11n. In terms of localization accuracy, the mAP@50-95 values of YOLOv11m and YOLOv8m are 0.64597 and 0.64556, respectively, which are slightly higher than those of the nano versions in the same series (approximately 0.634). This indicates that the medium-scale models perform somewhat more precise bounding box regression for down feather targets, but such a minor relative improvement (about 1.8%) usually does not constitute a decisive advantage in practical applications. The YOLOv26 series exhibits relatively weak performance. The mAP@50 of YOLOv26n is only 0.98556, which is approximately 0.0075 lower than that of YOLOv8n, and YOLOv26m also fails to outperform YOLOv8m or YOLOv11m.
In terms of model complexity, the nano-scale models have only 2.4 to 3.0 M parameters and 5.4 to 8.1 GFLOPs, while the medium-scale models have approximately 20 to 26 M parameters and 67 to 79 GFLOPs, a difference of about one order of magnitude in computational cost. YOLOv11n achieves the highest accuracy with one of the smallest parameter counts (2.58 M) and relatively low computational cost (6.3 GFLOPs), which indicates that its architecture is well matched to this task.
Figure 4 shows the variation curves of mAP@50 on the validation set for each model during the 30 training epochs. It can be seen that the mAP@50 of all models rises rapidly within the first 15 epochs and then enters a stable phase. YOLOv11n and YOLOv8n have the fastest convergence speed and approach their final values around the 7th epoch; by contrast, YOLOv26n converges relatively slowly, and its final value is significantly lower than that of the former two. In terms of curve smoothness, YOLOv11n presents the smallest fluctuation, indicating a stable training process. YOLOv26m drops obviously at the 3rd epoch and then climbs up again, which may be related to the adaptability of the new training strategy on the private dataset.
Figure 5 presents the training curves of the stricter metric mAP@50-95. Since this metric has higher requirements for bounding box localization accuracy, the values of all models are lower than those of mAP@50. Notably, the final values of YOLOv11m and YOLOv8m are slightly higher than those of the nano versions (0.64597 and 0.64556 vs. 0.63464 and 0.63442, respectively), but the improvement is less than 2%. YOLOv11n still reaches 0.63464, with a negligible gap from the medium-scale models. Given that its computational cost is only one-tenth of the latter, this trade-off is acceptable.
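The relative gains quoted above can be checked with simple arithmetic. The short sketch below uses only the final mAP@50-95 values reported in the text; the helper name `relative_gain` is illustrative.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Final mAP@50-95 values as reported in the text.
FINAL_MAP50_95 = {
    "YOLOv11m": 0.64597,
    "YOLOv11n": 0.63464,
    "YOLOv8m": 0.64556,
    "YOLOv8n": 0.63442,
}

gain_v11 = relative_gain(FINAL_MAP50_95["YOLOv11m"], FINAL_MAP50_95["YOLOv11n"])
gain_v8 = relative_gain(FINAL_MAP50_95["YOLOv8m"], FINAL_MAP50_95["YOLOv8n"])
print(f"YOLOv11 m vs n: {gain_v11:.2f}%  |  YOLOv8 m vs n: {gain_v8:.2f}%")
```

Both gains come out just under 2%, consistent with the statement that the medium-scale models buy only a small localization improvement for roughly ten times the computation.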
Figure 6 shows the scatter distribution of precision and recall for each model. An ideal model should be located at the top-right corner of the figure (high precision and high recall). YOLOv11n is clearly positioned at the top-rightmost location (Precision = 0.99544, Recall = 0.99722) and lies above the Precision = Recall reference line, indicating that it maintains an extremely high recall rate with a very low false detection rate. YOLOv11m and YOLOv8n shift slightly toward the bottom-left, while YOLOv26n and YOLOv26m deviate significantly from the optimal region, with recall rates below 0.98. This figure intuitively verifies the superiority of YOLOv11n in the down feather detection task.
To explain the performance differences from the perspective of feature extraction, Figure 7 visualizes the first 32 channel feature maps output by the last layer (SPPF) of the Backbone for YOLOv8n, YOLOv11n, and YOLOv26n. The comparison shows that the feature maps of YOLOv11n exhibit richer activation regions, with multiple channels responding clearly to the edges, textures, and overall contours of down feather targets. YOLOv8n shows moderate activation. In contrast, the feature maps of YOLOv26n are relatively sparse, with some channels showing almost no effective activation. This difference suggests that YOLOv11n learns more discriminative features of down feathers, which is consistent with its higher recall and precision. A plausible explanation is that YOLOv26n, by simplifying the feature extraction structure in pursuit of end-to-end inference efficiency, sacrifices accuracy on fine-grained targets such as down feathers.
Figure 8 shows the detection results of YOLOv11n on the validation set images. It can be seen that the model successfully locates and classifies multiple down feather targets in the images, including mature down clusters, immature down clusters, feathers, yellow-tailed down and a small amount of down filaments, with no missed or false detections in the examples shown. These qualitative results are consistent with the quantitative indicators and suggest that YOLOv11n can perform down feather detection stably in practical industrial scenarios.
4. Down Quality Grading Method Based on YOLOv11n
The previous section evaluated the detection performance of the YOLOv11n model on the five down components (mature down, immature down, down filaments, feathers, and yellow-tailed down). However, the ultimate goal of industrial quality inspection is not merely to identify component categories, but to convert detection results into down quality grades that conform to industrial standards (such as 85 down, 90 down, and 95 down). Given the significant differences in density and individual mass among the five types of targets, the down content cannot be estimated reliably from the detection count or bounding box area alone. To this end, this section proposes mapping the instance counts output by YOLO to the estimated down content through offline calibration of the mass of each category, thereby realizing automatic grading of down quality. This method has extremely low computational cost, is suitable for real-time deployment on production lines, and is designed to be compatible with national standards.
This study is a first-stage, foundational piece of work whose main aim is to construct a complete and feasible methodological framework for intelligent detection and grading of down. Down manufacturers differ significantly in raw material sources, production batches, processing technologies and on-site environments, so the calibrated average weight of the various down components cannot be applied universally. The weight calibration parameters for each category therefore need to be measured, calibrated and adjusted independently by each factory according to its own production conditions and raw material characteristics. Given the current experimental conditions and research progress, large-scale industrial promotion, full-scenario adaptation and practical application of this grading scheme still require long-term optimization and verification in subsequent work. Accordingly, this section offers only a theoretical reference and a feasible idea for the intelligent transformation and standardized development of the down quality inspection industry.
4.1. Problem Definition
After a sample is detected in real time by YOLOv11n, the instance counts of the five types of targets are obtained as follows:
$N_1$: number of mature down (complete down clusters); $N_2$: number of immature down; $N_3$: number of down filaments; $N_4$: number of feathers; $N_5$: number of yellow tails.
The total number of instances is shown in Equation (1):
$$N = \sum_{i=1}^{5} N_i = N_1 + N_2 + N_3 + N_4 + N_5 \tag{1}$$
Through offline calibration, the average weight of each type of target is set as follows:
$w_1$: average weight of a single mature down; $w_2$: average weight of a single immature down; $w_3$: average weight of a single down filament; $w_4$: average weight of a single feather; $w_5$: average weight of a single yellow tail.
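The per-category instance counts and their total in Equation (1) can be obtained directly from the class indices of the detected boxes. The sketch below is illustrative: the class order in `CLASS_NAMES` is an assumption and must match the order used when the dataset was annotated.

```python
from collections import Counter

# Assumed class order; it must match the dataset's label definition.
CLASS_NAMES = ["mature_down", "immature_down", "down_filament", "feather", "yellow_tail"]


def count_instances(class_ids):
    """Return a dict category -> instance count from a list of predicted class indices."""
    counts = Counter(class_ids)
    return {name: counts.get(index, 0) for index, name in enumerate(CLASS_NAMES)}


# With Ultralytics, the class indices of one image can be read as
#   class_ids = results[0].boxes.cls.int().tolist()
example_counts = count_instances([0, 0, 1, 3, 4, 0])
total_instances = sum(example_counts.values())  # Equation (1)
```

Categories that do not appear in an image are reported with a count of zero, so later mass calculations can always iterate over all five categories.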
4.2. Offline Quality Calibration Scheme for Real-World Scenarios
For the down raw materials of the current production batch, 3–5 groups of standard samples of 5 g or 10 g are randomly selected to ensure sample representativeness. On a glass workbench, inspectors use tweezers to manually sort the samples into five categories: mature down clusters, immature down clusters, down filaments, feathers, and yellow-tail down, with no mixing allowed. The quantity of each category is counted separately, and the total mass of each category is weighed using an electronic balance with an accuracy of 0.001 g. The average individual weight of each category is calculated by dividing the total mass by the corresponding quantity, and is defined as the calibration parameter $w_i$ of category $i$. By inputting the calibration parameters of the current batch into the detection system, automatic grading of all samples in the same batch can be realized. The calibration procedure needs to be repeated only when the raw material batch changes. A single calibration is applicable to the entire production batch, which balances detection accuracy and factory production efficiency.
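The calibration step reduces to dividing a weighed total mass by a manual count for each category. The following minimal sketch illustrates it; pooling the 3–5 sample groups before dividing is an assumption, since the text does not state how the groups are combined, and the function name and input layout are hypothetical.

```python
def calibrate_average_weights(groups):
    """Compute the average individual weight (grams) per category.

    groups: list of dicts, one per manually sorted sample group, each mapping
            category -> (count, total_mass_in_grams).
    Counts and masses are pooled over all groups before dividing (assumption).
    """
    total_count, total_mass = {}, {}
    for group in groups:
        for category, (count, mass_g) in group.items():
            total_count[category] = total_count.get(category, 0) + count
            total_mass[category] = total_mass.get(category, 0.0) + mass_g
    # Categories with zero counted items are skipped to avoid division by zero.
    return {c: total_mass[c] / total_count[c] for c in total_count if total_count[c] > 0}


# Purely illustrative numbers, not measured values.
demo_groups = [
    {"feather": (10, 0.050), "mature_down": (100, 0.200)},
    {"feather": (30, 0.150), "mature_down": (300, 0.600)},
]
demo_weights = calibrate_average_weights(demo_groups)
```

Pooling gives more weight to groups with more counted items; averaging the per-group means instead would be an equally defensible choice and should be fixed in the calibration protocol.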
4.3. Down Quality Classification and Grading Standards
Through offline calibration experiments, the average mass of each target category is obtained as $w_i$ ($i = 1, \dots, 5$). For a sample to be inspected, the quantity of each category detected by the YOLOv11n model is $N_i$, and the estimated down content $\hat{C}$ is shown in Equation (2). To achieve higher-standard down detection and recognition, immature down, down filaments, feathers and yellow tails are all regarded as impurities, so only mature down ($i = 1$) contributes to the numerator.
$$\hat{C} = \frac{N_1 w_1}{\sum_{i=1}^{5} N_i w_i} \times 100\% \tag{2}$$
where $i$ traverses five categories: mature down ($i = 1$), immature down, down filaments, feathers, and yellow tails.
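The estimate described here, the mass of mature down divided by the total estimated mass of all five categories, can be sketched as follows. The function and category names are illustrative, and the numbers in the example are hypothetical rather than calibrated values.

```python
def estimate_down_content(counts, avg_weights, down_category="mature_down"):
    """Estimated down content in percent.

    counts:      dict category -> detected instance count
    avg_weights: dict category -> calibrated average individual weight (grams)
    Only `down_category` counts as down; every other category is an impurity.
    """
    total_mass = sum(counts[c] * avg_weights[c] for c in counts)
    if total_mass <= 0:
        raise ValueError("no detected mass; cannot estimate down content")
    down_mass = counts.get(down_category, 0) * avg_weights[down_category]
    return 100.0 * down_mass / total_mass


# Hypothetical example: equal individual weights make content equal to the count share.
demo_content = estimate_down_content(
    {"mature_down": 90, "feather": 10},
    {"mature_down": 0.002, "feather": 0.002},
)
```

In practice the calibrated weights differ by category, which is exactly why the count share alone is not a valid estimate of the mass share.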
In accordance with the national standard GB/T 17685-2016 and industrial practices, down content grades are spaced in steps of 5%. In this study, the estimated down content is mapped into three grades, as shown in Table 3:
This study intends to construct the following down quality grading process: A fixed camera is deployed on the production line to collect real-time video streams of spread samples. The YOLO model detects five types of targets (mature down, immature down, down filaments, feathers, and yellow tails) frame by frame and counts the instance number of each category. Subsequently, based on the average mass data of each category obtained through offline calibration, the estimated ratio of down mass to total mass, namely the estimated down content $\hat{C}$, is calculated.
Finally, the grade (85% down, 90% down, or 95% down) is automatically determined according to the preset threshold intervals (e.g., 85–90%, 90–95%, ≥95%). The core advantages of this method are as follows: once the batch has been calibrated, inspection requires no manual intervention and avoids subjective errors; the computation relies only on counting and offline preset average weights, making it lightweight and efficient; the estimation results have clear physical meaning (directly corresponding to mass proportion) and are compatible with industrial standards. This theoretical framework provides a clear algorithmic path and a basis for later validation of automated down grading on actual production lines.
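The threshold mapping can be written as a small lookup. The sketch below uses the example intervals quoted above (85–90%, 90–95%, ≥95%); treating each lower bound as inclusive and labeling anything under 85% as "below 85 down" are assumptions, since the text does not specify them.

```python
def grade_from_content(content_percent):
    """Map an estimated down content (percent) to a grade label."""
    if content_percent >= 95.0:
        return "95 down"
    if content_percent >= 90.0:
        return "90 down"
    if content_percent >= 85.0:
        return "85 down"
    return "below 85 down"  # assumed label for samples outside the three grades


for value in (96.2, 91.0, 85.0, 80.5):
    print(value, "->", grade_from_content(value))
```

Because an estimate close to a threshold can flip the grade, a production system would likely also report the numeric estimate alongside the label.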
5. Summary
This study constructs, to the best of our knowledge for the first time, a down image dataset covering five categories of down components: mature down, immature down, down filaments, feathers, and yellow tails. It systematically compares the performance of six models for down component detection, namely YOLOv8(n,m), YOLOv11(n,m), and YOLOv26(n,m). Experiments reveal that newer models do not necessarily yield better performance: on this dataset YOLOv26 is overall inferior to YOLOv11 and YOLOv8, which provides a useful reference for model selection in this field. On this basis, this paper proposes an automated down quality grading scheme: by calibrating the average mass of each category offline, the instance counts from real-time YOLO detection are converted into an estimated down content, which is then classified into 85% down, 90% down, and 95% down grades in accordance with national standards.
This study carries out methodological innovation and verification for intelligent detection and quality grading of down. Although an automated grading framework and idea applicable to factory scenarios have been constructed, there are still the following limitations:
Lack of empirical verification for quality grading: This paper only proposes the methodological framework and calculation formulas for down grading, without conducting large-scale comparative experiments based on the national standard GB/T 17685-2016 and manual detection results from certified inspectors. Quantitative verification indicators such as grading confusion matrix, grade accuracy, judgment consistency rate and error analysis are absent, which weakens the persuasiveness of the grading conclusions.
Imperfect quality calibration system: This research only clarifies the technical idea of offline quality calibration. Due to great differences in raw material sources, production batches, processing techniques and on-site environments among different down manufacturers, the calibrated average weight of various down components cannot be universally applied. Specific calibration values, calibration sample sizes and standardized calibration procedures are not provided. Meanwhile, the mass difference among samples and the uncertainty propagation of grading results are not analyzed, so the unified quality standard cannot adapt to the raw material characteristics of different manufacturers.
Unmeasured real-time deployment performance: This study focuses on verifying model detection accuracy and training behavior. Hardware deployment tests were not carried out, and real-time indicators such as inference latency and FPS were not measured, so the real-time performance of the method in industrial application is not yet supported by evidence.
This study provides innovative ideas for the intelligent quality inspection of down. The above deficiencies will be gradually improved in future research through batch calibration experiments, national standard comparative tests, expansion of external validation datasets, and actual hardware deployment tests.