Article
Peer-Review Record

A Detection Line Counting Method Based on Multi-Target Detection and Tracking for Precision Rearing and High-Quality Breeding of Young Silkworm (Bombyx mori)

Agriculture 2025, 15(14), 1524; https://doi.org/10.3390/agriculture15141524
by Zhenghao Li 1,2, Hao Chang 1,2, Mingrui Shang 1,2, Zhanhua Song 1,2,3, Fuyang Tian 1,2,3, Fade Li 1,2,4, Guizheng Zhang 5, Tingju Sun 6, Yinfa Yan 1,2,7,* and Mochen Liu 1,2,7,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 9 May 2025 / Revised: 7 July 2025 / Accepted: 8 July 2025 / Published: 15 July 2025
(This article belongs to the Section Farm Animal Production)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study introduces a promising YSC method for silkworm counting; however, several key issues need correction. The exclusion of first-instar silkworms weakens the method’s robustness evaluation, and the claim of universality is overstated given the limited variety testing and omission of complex spot patterns. The discussion lacks detailed error analysis (false positives, ID switches) and does not clearly distinguish between detection accuracy and counting performance. Additionally, no comparative evaluation with baseline methods under identical conditions is provided. Addressing these points is essential to strengthen the validity and applicability of the proposed approach.

Author Response

Comments 1: The study introduces a promising YSC method for silkworm counting; however, several key issues need correction. The exclusion of first-instar silkworms weakens the method’s robustness evaluation, and the claim of universality is overstated given the limited variety testing and omission of complex spot patterns.

Response 1: Dear reviewer, your professional review has provided significant assistance in enhancing the rigor and accuracy of the paper's presentation. In response to the issues you have pointed out, specific explanations are provided as follows:

(1) We fully acknowledge the reviewer's concern regarding validation across all instar stages for the young silkworm counting model. However, the biological and management differences between 1st instar silkworms and 2nd-3rd instar larvae justify their exclusion in this study:

① During the rearing process of the 1st instar young silkworms, the setae dispersion rate is mainly counted, without considering factors such as uniformity, health status, and feed intake. Moreover, the amount of feed fed to the 1st instar silkworms is a fixed value, which has nothing to do with their quantity;

② Before setae dispersion, their bodies are black and densely covered with setae. After setae dispersion, their heads and thoraxes remain black, but their bodies shed the setae and turn white. Due to the great difference between the 1st instar silkworms before and after setae dispersion, the judgment of the setae dispersion rate is relatively simple. In addition, the 1st instar young silkworms have weak movement ability, so only regional statistics are needed.

In conclusion, the absence of the 1st instar silkworms does not affect the evaluation of the model's robustness in the target instars (2nd-3rd instars). Supplementary explanations have been added in Section 4 (in Line 520-530, Page 17-18).

…Given that the dataset collected and the subjects tested in this paper are exclusively second-instar young silkworms, first-instar and third-instar young silkworms have not been involved. Among them, the biological characteristics and feeding management of first-instar young silkworms are quite different from those of 2nd-3rd instar young silkworms. The primary focus of observation is the setae dispersion rate, without considering factors such as uniformity, health status, and feed intake. Moreover, the amount of feed fed to first-instar silkworms is a fixed value, which is irrelevant to their quantity, so the significance of counting is relatively small. In addition, their characteristics differ greatly before and after setae dispersion, and their movement ability is weak, making them suitable for traditional static detection schemes. Therefore, they have not been considered for inclusion…

(2) The core reason why complex spotted silkworm varieties are not included in the current study is that the actual breeding scale and market share of such varieties are extremely low. Specifically, complex spotted varieties are special types in silkworm genetic breeding; their breeding areas are concentrated in the experimental bases of a few scientific research institutions and have not been applied to large-scale and commercial artificial feed co-rearing of young silkworms. Since the core goal of this study is to solve the problem of automatic counting of young silkworms under the mode of artificial feed co-rearing of young silkworms, these varieties are not included in the dataset.

We have strictly restricted the claims of universality in Section 4 (in Line 541-544, Page 18) and Section 5 (in Line 575-578, Page 19):

…In addition, to test the universality of this method, this paper also conducted a comparative experiment on the counting effect of this method for the light-spotted and common-spotted varieties, namely Jinxiu No. 2 and Huakang No. 3, which are widely applicable to commercial artificial feed rearing and have a large-scale distribution…

…Finally, the generalization of this method was verified, which proves that this method also has a good counting accuracy for other light-spotted and common-spotted young silkworm varieties of commercial artificial feed rearing with a large amount of seed distribution…

Comments 2: The discussion lacks detailed error analysis (false positives, ID switches) and does not clearly distinguish between detection accuracy and counting performance.

Response 2: Dear reviewer, thank you for your meticulous review. We have included the FP and FN indicators in Table 4 (in Line 463, Page 16) in Section 3.2, and the revised Table 4 is as follows:

Table 4. Evaluation results of different target tracking algorithms.

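| Algorithms | MOTA/% (↑) | IDF1/% (↑) | HOTA/% (↑) | FP (↓) | FN (↓) | IDS (↓) | FPS (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SORT | 81.6 | 83.0 | 67.2 | 554 | 77 | 55 | 51 |
| DeepSORT | 84.7 | 85.5 | 71.9 | 301 | 59 | 39 | 33 |
| ByteTrack | 85.6 | 87.9 | 74.3 | 224 | 37 | 33 | 38 |
| Our method | 88.3 | 90.2 | 78.1 | 113 | 21 | 18 | 29 |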

 

    We have also supplemented the analysis of the tracking performance of different tracking methods in Section 3.2 (in Line 440-449, Page 15), as follows:

…The evaluation results are shown in Table 4. First, tests were conducted on SORT, DeepSORT, and ByteTrack. The ByteTrack method achieved 85.6%, 87.9%, and 74.3% in terms of MOTA, IDF1, and HOTA, respectively, and 33, 224, and 37 in terms of IDS, FP, and FN, respectively. Compared with the DeepSORT and SORT methods, it increased MOTA by 0.9 and 4.0 percentage points, IDF1 by 2.4 and 4.9 percentage points, and HOTA by 2.4 and 7.1 percentage points, respectively; it reduced IDS by 6 and 22, FP by 77 and 330, and FN by 22 and 40. This indicates that ByteTrack, a multi-target tracking method not based on appearance matching, is more suitable for scenarios with targets of high similarity and frequent occlusions. The method in this paper achieved 88.3% in MOTA, 90.2% in IDF1, 78.1% in HOTA, and only 18, 113, and 21 in IDS, FP, and FN, respectively, improving on ByteTrack in all of these indicators…
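
As a quick arithmetic check, the gaps between the trackers can be recomputed from the Table 4 values. The sketch below only copies the reported numbers; the dictionary layout and the helper name `gap` are illustrative assumptions, not part of the manuscript.

```python
# Table 4 values as reported above: (MOTA %, IDF1 %, HOTA %, FP, FN, IDS).
TABLE4 = {
    "SORT":       (81.6, 83.0, 67.2, 554, 77, 55),
    "DeepSORT":   (84.7, 85.5, 71.9, 301, 59, 39),
    "ByteTrack":  (85.6, 87.9, 74.3, 224, 37, 33),
    "Our method": (88.3, 90.2, 78.1, 113, 21, 18),
}
METRICS = ("MOTA", "IDF1", "HOTA", "FP", "FN", "IDS")


def gap(tracker, reference):
    """Per-metric difference (tracker minus reference), rounded to one decimal."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(METRICS, TABLE4[tracker], TABLE4[reference])
    }


# Positive gaps are gains for the accuracy metrics; negative gaps are
# reductions for the error counts (FP, FN, IDS).
for reference in ("SORT", "DeepSORT"):
    print("ByteTrack vs", reference, gap("ByteTrack", reference))
print("Our method vs ByteTrack", gap("Our method", "ByteTrack"))
```

Run this way, the ByteTrack gains of 4.0, 4.9, and 7.1 points belong to the comparison with SORT, and the gains of 0.9, 2.4, and 2.4 points to the comparison with DeepSORT.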

    Regarding detection accuracy, we adopted P, R, F1, and AP values, and compared the improved YOLOX method with mainstream methods such as YOLOX, YOLOv5, and YOLOv4. It can be seen that the optimized YOLOX method in this paper achieved the optimal values in all evaluation indicators, which proves that the method in this paper has good detection accuracy in the scenario of young silkworm counting. For counting performance, we ultimately evaluate it based on the level of counting accuracy. The number of detected young silkworms and the maximum ID value in Table 6 further compare our improved detection-tracking method with the baseline method, verifying the effect of this method in the scenario of young silkworm counting. In addition, we have revised the text by relocating the counting-related evaluation metrics to Section 2.4 (in Line 343-346, Page 12) to make the logic clearer.

…We used counting accuracy to evaluate the final performance of young silkworm counting, and its formula is shown in Eq. (12).

    In the equation, Numc represents the detected number of young silkworms, Numt represents the true number of young silkworms, and Ac represents the counting accuracy…
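
Eq. (12) itself is not reproduced in this record. A minimal sketch, assuming the common relative-error form Ac = (1 - |Numc - Numt| / Numt) × 100%, which is consistent with the variable definitions above but is an assumption rather than the manuscript's confirmed formula:

```python
def counting_accuracy(num_detected: int, num_true: int) -> float:
    """Counting accuracy in percent.

    ASSUMED form: (1 - |Numc - Numt| / Numt) * 100. The exact Eq. (12)
    of the manuscript is not shown in this record.
    """
    if num_true <= 0:
        raise ValueError("num_true must be a positive count")
    return (1.0 - abs(num_detected - num_true) / num_true) * 100.0


# Hypothetical counts, for illustration only (not results from the paper).
print(counting_accuracy(num_detected=95, num_true=100))
print(counting_accuracy(num_detected=105, num_true=100))
```

Under this assumed form, over-counting and under-counting by the same amount are penalised equally.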

Comments 3: Additionally, no comparative evaluation with baseline methods under identical conditions is provided.

Response 3: Dear reviewer, thank you very much for pointing out these issues. We agree that addressing these points is essential to strengthen the validity and applicability of the proposed approach.

The dataset construction, model improvement, training, and testing for this research paper were conducted from November 2023 to January 2024, while the manuscript writing took place from February 2024 to September 2024. At that time, YOLOv9 and subsequent versions had not yet been released. However, your suggestion is very reasonable.

Regarding the baseline issue, to compare with the improved YOLOX-S in this paper, we used the unimproved YOLOX-S, YOLOv5s, and YOLOv4 for comparison. In addition, we supplemented YOLOv8s, YOLOv10s, and YOLO11s for silkworm detection, with corresponding supplements made in Table 3 (in Line 408, Page 14) and Figure 10 (in Line 381-393, Page 13-14) of Section 3.1 in the manuscript.

Table 3. Evaluation results of different target detection algorithms.

| Algorithms | F1/% (↑) | Recall/% (↑) | Precision/% (↑) | AP50/% (↑) | AP50:95/% (↑) |
| --- | --- | --- | --- | --- | --- |
| YOLO v4 | 74.2 | 71.8 | 76.7 | 72.5 | 55.3 |
| YOLO v5 | 85.2 | 84.9 | 85.5 | 85.1 | 66.5 |
| YOLOX-S | 87.2 | 86.4 | 88.1 | 87.4 | 69.6 |
| Our method | 89.6 | 87.9 | 91.3 | 89.1 | 72.7 |
| YOLOv8s | 85.8 | 85.4 | 86.2 | 85.7 | 68.1 |
| YOLOv10s | 87.9 | 86.5 | 89.4 | 88.2 | 70.2 |
| YOLOv11s | 88.1 | 87.1 | 89.2 | 89.0 | 71.5 |

 

Figure 10. Visual comparison of young silkworm detection results by different detectors. In the figure, yellow circles indicate false detections, and green circles indicate missed detections. (a) The original image, where occlusion exists between ID 5 and 6, 11 and 12, 15 and 14, 16, as well as 23 and 24. There is also occlusion between ID 26 and the feed. ID 3, 7, and 21 are located at the edges of the image, with only part (the majority) of each silkworm body visible; (b) The detection results of YOLOv4, with four false positives and three false negatives; (c) The detection results of YOLOv5s, with two false positives and three false negatives; (d) The detection results of YOLOX-S, with two false positives and one false negative; (e) The detection results of our method, with no false negatives and one false positive, in which the height of the bounding box at target 12 is notably larger than the detected object; (f) The detection results of YOLOv8s, with two false positives and two false negatives; (g) The detection results of YOLOv10s, with two false positives; (h) The detection results of YOLO11s, also with two false positives…

      We have added the following content in Section 3.1 (in Line 355-357, Page 12):

…We validated the detection performance of our method for young silkworms by comparing it with YOLOv4, YOLOv5s, YOLOX-S, YOLOv8s, YOLOv10s, and YOLO11s models…

        We have supplemented and revised the content in Section 3.1 (in Line 396-407, Page 14) as follows:

…We compared YOLOX-S with YOLOv4, YOLOv5, and YOLOv8. The YOLOX-S algorithm achieved 87.2%, 86.4%, 88.1%, 87.4%, and 69.6% for F1-score, recall, precision, AP50, and AP50:95, respectively, with all indicators superior to those of YOLOv4, YOLOv5, and YOLOv8. This result supports building upon the YOLOX-S algorithm (selected as the baseline in this study). Next, we evaluated the improved YOLOX-S. Compared with YOLOX-S, our method increased F1 score by 2.4 percentage points, recall by 1.5, precision by 3.2, and AP50 and AP50:95 by 1.7 and 3.1, respectively. YOLOv10 and YOLOv11 both outperform YOLOX-S. Notably, YOLOv10, as a recent YOLO variant, improves on YOLOv8 across all metrics, with its precision second only to that of our method. Among the compared models, YOLOv11 is second only to our proposed approach in F1 score, recall, AP50, and AP50:95…
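
The F1 column of Table 3 can be cross-checked against the precision and recall columns, since F1 is defined in the nomenclature as their harmonic mean. A minimal sketch; the values are copied from Table 3, while the function and variable names are illustrative:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 3.
TABLE3_PR = {
    "YOLOv4":     (76.7, 71.8, 74.2),
    "YOLOv5":     (85.5, 84.9, 85.2),
    "YOLOX-S":    (88.1, 86.4, 87.2),
    "Our method": (91.3, 87.9, 89.6),
    "YOLOv8s":    (86.2, 85.4, 85.8),
    "YOLOv10s":   (89.4, 86.5, 87.9),
    "YOLOv11s":   (89.2, 87.1, 88.1),
}

for name, (p, r, reported) in TABLE3_PR.items():
    print(f"{name:10s} computed F1 = {f1_score(p, r):.1f}  reported = {reported}")
```

For every row, the recomputed value rounds to the reported F1, so the three columns are mutually consistent.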

        We have added the following content to Section 3.1 (in Lines 364-373, Page 12):

…In contrast, YOLOv4 and YOLOv5 each produced five or more detection errors in total (false negatives plus false positives). YOLOv8 performed slightly better than YOLOv5, with two false positives and two false negatives, while YOLOX-S had two false positives and one false negative. The false negatives and false positives primarily occurred in scenarios such as overlapping of multiple young silkworms, feed occlusion, and image edges. This indicates that morphological changes in young silkworms resulting from occlusion and other factors are the primary causes of detection errors. YOLOv10s, YOLOv11s, and our method achieved zero false negatives; however, YOLOv10s and YOLOv11s each had two false positives, whereas our method exhibited only one instance of bounding box size mismatch, demonstrating superior detection performance…

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Abbreviations such as EMA, MSFF, CSP, YOLOX-S, and MOTA are used without definition at first mention. Ensure all are clearly introduced to aid readers unfamiliar with the domain. While the background on silkworm rearing is thorough, explicitly state the technological gap and novelty of your approach earlier in the introduction. Related work is scattered within the introduction. Consider separating it into a dedicated section summarizing past silkworm counting methods and recent deep learning techniques.

Some good work could be discussed in introduction like a) MvMRL: a multi-view molecular representation learning method for molecular property prediction b) Rumen-protected lysine supplementation improved amino acid balance, nitrogen utilization and altered hindgut microbiota of dairy cows 

Figures showing workflows (e.g., Figs. 3, 6, 8) should clearly differentiate new contributions from reused components (e.g., ByteTrack baseline). Use colored boxes or labels. Figures (e.g., Figs. 3–10) should have higher resolution and more descriptive captions. Ensure each figure is interpretable standalone. Equations (e.g., Eq. 1 and 2–11) introduce variables without clear inline explanation. Add a “Notations” subsection or ensure every variable is defined after use. Use consistent fonts, spacing, and numbering. Eq. (1) lacks clarity in notation (ρ(c_l, c_h)) and should be rewritten for readability.

The use of YOLOX-S as a baseline is valid, but you should justify the choice over more recent models like YOLOv7 or YOLOv8. Some good work could be discussed in literature a) Highlight Removal from A Single Grayscale Image Using Attentive GAN b) A novel embedded cross framework for high-resolution salient object detection. c) Lightweight marine biodetection model based on improved YOLOv10 

Parameter settings for training (e.g., learning rate, decay, batch size) should be briefly justified rather than just listed. Report variance, standard deviation, or confidence intervals in key metrics such as AP, MOTA, and counting accuracy to assess robustness. The edge-related counting error is acknowledged but not quantified. Include analysis of detection accuracy near image boundaries. While several algorithms are compared, include recent lightweight or Transformer-based object detectors or trackers as additional baselines.

The discussion mentions missing first-instar silkworm data. Expand this section to include limitations such as reliance on hardware (camera movement, lighting, etc.). The tracking/counting framework may apply to other insect or animal farming contexts. Briefly mention this in the conclusion for broader impact. 

Comments on the Quality of English Language

Several minor grammatical and typographical errors exist (e.g., missing spaces, punctuation, capitalization inconsistencies). A thorough proofreading is needed.

Author Response

Comments 1: Abbreviations such as EMA, MSFF, CSP, YOLOX-S, and MOTA are used without definition at first mention. Ensure all are clearly introduced to aid readers unfamiliar with the domain.

Response 1: Dear reviewer, your professional review has provided significant assistance in enhancing the rigor and accuracy of the paper's presentation. In response to the issues you have pointed out, specific explanations are provided as follows:

We have appended Supplementary Material A, containing the complete list of abbreviations and acronyms employed throughout the manuscript.

Table A. Nomenclature.

| Abbreviation / symbol | Definition |
| --- | --- |
| A(c) | The association accuracy |
| Ac | The counting accuracy |
| AF-RCNN | Anchor-Free Region-based Convolutional Neural Network |
| ah | The width of the high-score detection boxes |
| al | The width of the low-score detection boxes |
| AP50 | The AP value when the IoU threshold is 50 |
| AP50:95 | The average AP value under different IoU thresholds ranging from 50 to 95 at intervals of 5 |
| AssA | Evaluate association accuracy |
| at | The width of the smallest detection rectangle encompassing the high and low score detection boxes |
| bh | The height of the high-score detection boxes |
| bl | The height of the low-score detection boxes |
| bt | The height of the smallest detection rectangle encompassing the high and low score detection boxes |
| C | A point that belongs to the set of True Positives (TP) |
| CAN | Context-Aware Network |
| CS-YOLOX-Byte | Counting of Silkworm-YOLOX Fusion ByteTrack |
| CSP | Cross Stage Partial |
| DeepSORT | Simple online and realtime tracking with a deep association metric |
| DetA | Assess detection accuracy |
| EMA | Efficient Multi-scale Attention |
| F1 | Harmonic mean of precision and recall |
| FN | False Negative |
| FNA(c) | The association prediction accuracy for unpredicted trajectories |
| FNt | The number of missed detections in the t-th frame |
| FP | False Positive |
| FPA(c) | The false association precision of trajectory predictions |
| FPS | Frames Per Second |
| FPt | The number of false detections in the t-th frame |
| frame | The total count of frames |
| GTt | The total number of tracked objects in the t-th frame |
| HOTA | High Order Tracking Accuracy |
| IDF1 | Identification Average Rate |
| IDFN | False negative IDs |
| IDFP | False positive IDs |
| IDs | Identity switches |
| IDSWt | The number of ID switches for tracked objects within the t-th frame |
| IDTP | True positive IDs |
| IoU | Intersection over Union |
| IPFC | In-field Pest in Food Crop |
| mAP | mean Average Precision |
| MOTA | Multiple Object Tracking Accuracy |
| MSFF | Multi-Scale Feature Fusion |
| Numc | The detected number of young silkworms |
| Numt | The true number of young silkworms |
| P | Precision |
| PAFPN | Path Aggregation Feature Pyramid Network |
| R | Recall |
| R-CNN | Regions with CNN features |
| RCS-OSA | Reparametrized Convolution based on channel Shuffle - One - Shot Aggregation |
| SORT | Simple Online and Realtime Tracking |
| timeelapsed | The duration of continuous operation |
| TN | True Negative |
| TP | True Positive |
| TPA(c) | The accuracy of correctly associated detections |
| α | A hyperparameter representing the influence of the distance between the centre points |
| β | A hyperparameter representing the influence of the difference in width and height |
| ρ(cl, ch) | The Euclidean distance between the centre points of low and high score box |
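
Several entries of Table A (FPt, FNt, IDSWt, GTt, IDTP, IDFP, IDFN) are the ingredients of MOTA and IDF1. A minimal sketch, assuming the standard CLEAR-MOT and identity-F1 definitions rather than the manuscript's own equations, which are not reproduced in this record; the function names and the sample counts are hypothetical:

```python
from typing import Sequence


def mota(fn: Sequence[int], fp: Sequence[int],
         idsw: Sequence[int], gt: Sequence[int]) -> float:
    """MOTA = 1 - sum_t(FN_t + FP_t + IDSW_t) / sum_t(GT_t), per-frame inputs."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("no ground-truth objects in the sequence")
    return 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / total_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


# Hypothetical two-frame sequence, for illustration only.
print(mota(fn=[1, 0], fp=[0, 1], idsw=[0, 0], gt=[10, 10]))
print(idf1(idtp=90, idfp=10, idfn=10))
```

Because MOTA sums the three error types before normalising, a tracker can trade false positives against identity switches without changing its MOTA, which is one reason IDF1 and HOTA are reported alongside it.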

Comments 2: While the background on silkworm rearing is thorough, explicitly state the technological gap and novelty of your approach earlier in the introduction. Related work is scattered within the introduction. Consider separating it into a dedicated section summarizing past silkworm counting methods and recent deep learning techniques. Some good work could be discussed in introduction like a) MvMRL: a multi-view molecular representation learning method for molecular property prediction b) Rumen-protected lysine supplementation improved amino acid balance, nitrogen utilization and altered hindgut microbiota of dairy cows.

Response 2: Dear reviewer, thank you for your professional evaluation. In response to the issues you have pointed out, we provide the following specific explanations:

We have supplemented the novelty and technical gaps regarding the young silkworm counting method in Section 1 (in Line 71-76, Page 2; Line 113-117, Page 3):

…Accurate counting of young silkworms provides a basis for determining artificial feed quantities, enabling precise feeding and laying a foundation for quantifying key indicators such as uniformity, incidence rate, and feed intake. Although artificial feed co-rearing technology has significantly enhanced sericulture efficiency, existing counting methods remain heavily reliant on manual labor, and current algorithms exhibit significant technical limitations in accuracy and real-time performance…

…Compared with traditional static detection methods, the objective of this study is to propose a young silkworm counting framework based on dynamic video tracking and counting. Through the collaborative optimization of target detection and multi-target tracking, it breaks through the technical bottleneck of young silkworm counting in dense scenarios and provides a new solution for intelligent silkworm rearing management…

In addition, we have also integrated the relevant technologies in section 1 (in Line 77-112, Page 2-3) and supplemented the literature you suggested:

…Currently, young silkworm counting faces three core challenges: 1. The variable morphology, dense accumulation, and occlusion by artificial feed result in low recognition rates for traditional machine vision methods; 2. Static image detection technology is difficult to apply directly to high-density rearing scenarios; 3. Existing agricultural counting studies predominantly focus on macro targets (e.g., livestock), while dynamic tracking of tiny insects remains under-explored.

Although deep learning-based computer vision has advanced insect detection, most methods are tailored for static images. For example: Suo et al. used the Otsu threshold method to segment aphid images, followed by edge detection for feature extraction, achieving accurate counting on yellow sticky boards in greenhouse and outdoor environments [14]; Li et al. customized anchor box areas and added bilinear interpolation during sampling in Faster R-CNN [15], obtaining an F1 score of 0.944 and mAP of 0.952 on yellow sticky insect boards [16]. Wang et al. proposed DeepPest, where a Context-Aware Network (CAN) was pre-trained on the IPFC dataset to capture contextual information, and multi-projection convolution blocks with attention mechanisms were integrated into ResNet-50 [17], yielding an mAP of 73.9% [18]. Jiao et al. combined anchor-free RCNN (AF-RCNN) with Faster R-CNN on the Pest24 dataset, achieving an mAP of 56.4% and Recall of 85.1% [19]. Ma et al. proposed a silkworm counting method by introducing the RCS-OSA residual module into YOLOv8n, replacing the detection head with a dynamic prediction head, and optimizing the loss function [20]. While effective for fewer than 80 silkworms, it struggles to scale to entire rearing trays in co-rearing scenarios. However, these methods are designed for static image scenarios and lack dynamic tracking capabilities, making them unsuitable for video-based counting.

In addition, cross-domain research also provides methodological references for this paper. For example, the MvMRL method proposed by Zhang et al. provides key ideas for integrating multi-source features to cope with dense, dynamic, and morphologically variable scenarios in the young silkworm counting task through its multi-view fusion logic [21]; the systematic analysis and multi-dimensional verification ideas proposed by Wei et al. can provide methodological references for the parameter optimization, scenario adaptation, and performance evaluation of the method in this paper, helping to improve the reliability and applicability of the method in actual breeding scenarios [22]; Xu et al. proposed an Attentive GAN method, in which the target region is accurately located by adding an attention mechanism, significantly improving the detection effect [23]; Wang et al. proposed an embedded cross framework ECF-DT, which achieves accurate detection of small targets through high-resolution feature processing, multi-scale information fusion, and complex background suppression [24]…

We have added the cited references in the Reference section (in Line 646-654, Page 21):

21. Zhang, R., Lin, Y., Wu, Y., Deng, L., Zhang, H., Liao, M., Peng, Y. MvMRL: a multi-view molecular representation learning method for molecular property prediction. Briefings in Bioinformatics 2024, 25(4). https://doi.org/10.1093/bib/bbae298.

22. Wei, X., Wu, H., Wang, Z., Zhu, J., Wang, W., Wang, J., Wang, Y., Wang, C. Rumen-protected lysine supplementation improved amino acid balance, nitrogen utilization and altered hindgut microbiota of dairy cows. Animal Nutrition 2023, 15, 320-331. https://doi.org/10.1016/j.aninu.2023.08.001.

23. Xu, H., Li, Q., Chen, J. Highlight Removal from A Single Grayscale Image Using Attentive GAN. Applied Artificial Intelligence 2022, 36(1). https://doi.org/10.1080/08839514.2021.1988441.

24. Wang, B., Yang, M., Cao, P., Liu, Y. A novel embedded cross framework for high-resolution salient object detection. Applied Intelligence 2025, 55, 277. https://doi.org/10.1007/s10489-024-06073-x.

Comments 3: Figures showing workflows (e.g., Figs. 3, 6, 8) should clearly differentiate new contributions from reused components (e.g., ByteTrack baseline). Use colored boxes or labels. Figures (e.g., Figs. 3–10) should have higher resolution and more descriptive captions. Ensure each figure is interpretable standalone.

Response 3: Dear reviewer, your professional review has provided significant assistance in enhancing the rigor of the paper's presentation. In response to the issues you have pointed out, specific explanations are provided as follows:

We have revised Figure 3 (in Line 179-180, Page 6) and Figure 8 (in Line 279-280, Page 9), with the newly contributed regions enclosed by red dashed boxes.

Figure 3. System process diagram.

Figure 8. Improved ByteTrack workflow.

We have renamed Figure 10 in Section 3.1 (in Line 381, Page 13) to "Visual Comparison of Young Silkworm Detection Results by Different Detectors", Figure 11 in Section 3.2 (in Line 438, Page 15) to "Visual Comparison of Silkworm Tracking Results by Different Trackers", and Table 6 in Section 3.4 (in Line 498-499, Page 17) to "Verification Results of Young Silkworm Counting Methods Based on Different Detection-Tracking Combination Approaches".

Comments 4: Equations (e.g., Eq. 1 and 2–11) introduce variables without clear inline explanation. Add a “Notations” subsection or ensure every variable is defined after use. Use consistent fonts, spacing, and numbering. Eq. (1) lacks clarity in notation (ρ(cl, ch)) and should be rewritten for readability.

Response 4: Dear reviewer, your professional review has provided significant assistance in enhancing the rigor and accuracy of the paper's presentation. In response to the issues you have pointed out, specific explanations are provided as follows:

We have rechecked the formulas to ensure that each variable has a clear definition. In addition, we have adjusted the expression of the Euclidean distance in Eq. (1) to make it clearer. The revised Eq. (1) in Section 2.2.3 is as follows:

Furthermore, we have improved the explanation of the low-score bounding box screening threshold (T0.5) in Section 2.2.3 (in Line 255-257, Page 8).

…The threshold (T0.5) for low-scoring box selection is calculated using the formula provided, and a box is considered superior, i.e., it meets the specified condition, if T0.5 > 0.5…
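
Eq. (1) is not reproduced in this record, so the way it combines its terms is left open here. The geometric quantities that Table A defines for it (ρ(cl, ch), al, bl, ah, bh, at, bt) can nevertheless be computed from a pair of boxes. A minimal sketch, assuming axis-aligned boxes given as corner coordinates; the `Box` type and the function name are hypothetical:

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box as (x1, y1, x2, y2) corners; an assumed convention."""
    x1: float
    y1: float
    x2: float
    y2: float


def box_pair_geometry(low: Box, high: Box) -> dict:
    """Quantities named in Table A for a low-score / high-score box pair."""
    cl = ((low.x1 + low.x2) / 2, (low.y1 + low.y2) / 2)      # centre of low-score box
    ch = ((high.x1 + high.x2) / 2, (high.y1 + high.y2) / 2)  # centre of high-score box
    return {
        "rho": math.hypot(cl[0] - ch[0], cl[1] - ch[1]),     # Euclidean centre distance
        "al": low.x2 - low.x1, "bl": low.y2 - low.y1,        # low-score width, height
        "ah": high.x2 - high.x1, "bh": high.y2 - high.y1,    # high-score width, height
        # Smallest rectangle enclosing both boxes.
        "at": max(low.x2, high.x2) - min(low.x1, high.x1),
        "bt": max(low.y2, high.y2) - min(low.y1, high.y1),
    }


# Hypothetical pair of overlapping boxes, for illustration only.
print(box_pair_geometry(Box(0, 0, 4, 2), Box(3, 0, 7, 2)))
```

The hyperparameters α and β from Table A would weight the centre-distance and width/height terms, respectively; their values and the exact expression for T0.5 are not given in this record.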

Comments 5: The use of YOLOX-S as a baseline is valid, but you should justify the choice over more recent models like YOLOv7 or YOLOv8. Some good work could be discussed in literature a) Highlight Removal from A Single Grayscale Image Using Attentive GAN b) A novel embedded cross framework for high-resolution salient object detection. c) Lightweight marine biodetection model based on improved YOLOv10.

Response 5: Dear reviewer, thank you for your careful review. Regarding the issues you mentioned, we hereby reply as follows:

The dataset construction, model improvement, training, and testing for this research paper were conducted from November 2023 to January 2024, while the manuscript writing took place from February 2024 to September 2024. At that time, YOLOv9 and subsequent versions had not yet been released. However, your suggestion is very reasonable. We have included YOLOv8s, YOLOv10s, and YOLO11s for silkworm detection, with corresponding supplements made in Table 3 (in Line 408, Page 14) and Figure 10 (in Line 381-393, Page 13-14) of Section 3.1 in the manuscript.

Table 3. Evaluation results of different target detection algorithms.

| Algorithms | F1/% (↑) | Recall/% (↑) | Precision/% (↑) | AP50/% (↑) | AP50:95/% (↑) |
| --- | --- | --- | --- | --- | --- |
| YOLO v4 | 74.2 | 71.8 | 76.7 | 72.5 | 55.3 |
| YOLO v5 | 85.2 | 84.9 | 85.5 | 85.1 | 66.5 |
| YOLOX-S | 87.2 | 86.4 | 88.1 | 87.4 | 69.6 |
| Our method | 89.6 | 87.9 | 91.3 | 89.1 | 72.7 |
| YOLOv8s | 85.8 | 85.4 | 86.2 | 85.7 | 68.1 |
| YOLOv10s | 87.9 | 86.5 | 89.4 | 88.2 | 70.2 |
| YOLOv11s | 88.1 | 87.1 | 89.2 | 89.0 | 71.5 |

Figure 10. Visual comparison of young silkworm detection results by different detectors. In the figure, yellow circles indicate false detections, and green circles indicate missed detections. (a) The original image, where occlusion exists between ID 5 and 6, 11 and 12, 15 and 14, 16, as well as 23 and 24. There is also occlusion between ID 26 and the feed. ID 3, 7, and 21 are located at the edges of the image, with only part (the majority) of each silkworm body visible; (b) The detection results of YOLOv4, with four false positives and three false negatives; (c) The detection results of YOLOv5s, with two false positives and three false negatives; (d) The detection results of YOLOX-S, with two false positives and one false negative; (e) The detection results of our method, with no false negatives and one false positive, in which the height of the bounding box at target 12 is notably larger than the detected object; (f) The detection results of YOLOv8s, with two false positives and two false negatives; (g) The detection results of YOLOv10s, with two false positives; (h) The detection results of YOLO11s, also with two false positives…

We have added the following content in Section 3.1 (in Line 355-357, Page 12):

…We validated the detection performance of our method for young silkworms by comparing it with YOLOv4, YOLOv5s, YOLOX-S, YOLOv8s, YOLOv10s, and YOLO11s models…

We have supplemented and revised the content in Section 3.1 (in Line 396-407, Page 14) as follows:

…We compared YOLOX-S with YOLOv4, YOLOv5, and YOLOv8. The YOLOX-S algorithm achieved 87.2%, 86.4%, 88.1%, 87.4%, and 69.6% for F1-score, recall, precision, AP50, and AP50:95, respectively, with all indicators superior to those of YOLOv4, YOLOv5, and YOLOv8. This result supports building upon the YOLOX-S algorithm (selected as the baseline in this study). Next, we evaluated the improved YOLOX-S. Compared with YOLOX-S, our method increased F1 score by 2.4 percentage points, recall by 1.5, precision by 3.2, and AP50 and AP50:95 by 1.7 and 3.1, respectively. YOLOv10 and YOLOv11 both outperform YOLOX-S. Notably, YOLOv10, as a recent YOLO variant, improves on YOLOv8 across all metrics, with its precision second only to that of our method. Among the compared models, YOLOv11 is second only to our proposed approach in F1 score, recall, AP50, and AP50:95…
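
The ranking and improvement statements in this passage can be checked against Table 3 with a few lines. A minimal sketch; the numbers are copied from the table, while the helper names `ranking` and `gain` are illustrative:

```python
METRICS = ("F1", "Recall", "Precision", "AP50", "AP50:95")

# Rows copied from Table 3, all values in percent.
TABLE3 = {
    "YOLOv4":     (74.2, 71.8, 76.7, 72.5, 55.3),
    "YOLOv5":     (85.2, 84.9, 85.5, 85.1, 66.5),
    "YOLOX-S":    (87.2, 86.4, 88.1, 87.4, 69.6),
    "Our method": (89.6, 87.9, 91.3, 89.1, 72.7),
    "YOLOv8s":    (85.8, 85.4, 86.2, 85.7, 68.1),
    "YOLOv10s":   (87.9, 86.5, 89.4, 88.2, 70.2),
    "YOLOv11s":   (88.1, 87.1, 89.2, 89.0, 71.5),
}


def ranking(metric):
    """Model names sorted from best to worst on one metric."""
    idx = METRICS.index(metric)
    return sorted(TABLE3, key=lambda name: TABLE3[name][idx], reverse=True)


def gain(model, baseline):
    """Per-metric gain of model over baseline, in percentage points."""
    return {
        m: round(a - b, 1)
        for m, a, b in zip(METRICS, TABLE3[model], TABLE3[baseline])
    }


for metric in METRICS:
    print(metric, "top two:", ranking(metric)[:2])
print("Gain over YOLOX-S:", gain("Our method", "YOLOX-S"))
```

On these values, the proposed method leads every column, YOLOv10s is the runner-up on precision, and YOLOv11s is the runner-up on the other four metrics.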

We have added the following content to Section 3.1 (in Lines 364-373, Page 12):

…In contrast, YOLOv4 and YOLOv5 had five or more false negatives and false positives. YOLOv8 performed slightly better than YOLOv5, with two false positives and two false negatives, while YOLOX-S had two false positives and one false negative. The false negatives and false positives primarily occurred in scenarios such as overlapping of multiple young silkworms, feed occlusion, and image edges. This indicates that morphological changes in young silkworms resulting from occlusion and other factors are the primary causes of detection errors. YOLOv10s, YOLOv11s, and our method achieved zero false negatives; however, YOLOv10s and YOLOv11s each had two false positives, whereas our method exhibited only one instance of bounding box size mismatch, demonstrating superior detection performance…

In addition, we have included the references you mentioned in Section 1 (in Line 107-112, Page 3), as shown in Response 2.

Comments 6: Parameter settings for training (e.g., learning rate, decay, batch size) should be briefly justified rather than just listed.

Response 6: Dear reviewer, thank you for your careful review. We have supplemented the descriptions of the training parameter settings in Section 3.1 (Lines 357-361, Page 12) and Section 3.2 (Lines 417-423, Page 14):

…For the scenarios of small targets and high occlusion of young silkworms, we trained for a total of 300 epochs to fully learn the target features, with a batch size of 16 to ensure a balance between gradient stability and data augmentation processing space. Meanwhile, to accelerate early convergence and suppress overfitting in complex scenarios, the initial learning rate was set to 0.01, the momentum to 0.9, and the weight decay to 0.0005…

…The tracker was trained using the SGD method. To ensure stable convergence in the initial stage of training, the learning rate was set to 0.01. To limit the model complexity and prevent overfitting, we set the training epochs to 100 and the training weight decay to 0.0005. Meanwhile, we set NMS to 0.7, and the confidence and IoU thresholds to 0.65 and 0.5, respectively, so as to adapt to the occlusion scenarios of high-density silkworm groups, maintain the balance of PR values, and effectively filter out most false detections while retaining true positive results…
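To make the role of these thresholds concrete, the following is a minimal sketch of confidence filtering followed by greedy non-maximum suppression, using the quoted values 0.65 and 0.7 as defaults. It is not the authors' implementation: the function names, box format, and example boxes are assumptions, and the 0.5 IoU threshold is left out because the response does not say where it is applied.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thresh: float = 0.65,  # confidence threshold quoted in the response
    nms_thresh: float = 0.7,    # NMS threshold quoted in the response
) -> List[int]:
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too strongly with a kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


# Hypothetical boxes: a near-duplicate, an overlapping neighbour, a weak detection.
boxes = [(0, 0, 10, 10), (0.5, 0, 10.5, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7, 0.5]
print(filter_detections(boxes, scores))
```

A relatively high NMS threshold such as 0.7 lets partially overlapping neighbours survive suppression, which matches the stated aim of handling dense, occluded silkworm groups; only near-duplicate boxes are removed.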

Comments 7: Report variance, standard deviation, or confidence intervals in key metrics such as AP, MOTA, and counting accuracy to assess robustness.

Response 7: Dear reviewer, regarding the questions you raised, we ran the detection and tracking algorithms 50 times. The resulting fluctuation of AP was less than 0.05%, and the fluctuation of MOTA was less than 0.1%, indicating that the algorithm has good stability. The fluctuations may be attributed to floating-point arithmetic and the randomness of the detection model.

There was indeed a certain fluctuation in counting. We also performed counting 50 times on the same video segment. The specific fluctuations mainly occurred in areas with dense occlusions. The maximum ID value fluctuation was controlled within 3%, and the detection quantity fluctuation was controlled within 1%, which also proves that our method has good stability.
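For reference, the statistics the reviewer asks for (standard deviation and confidence intervals over repeated runs) can be computed with a few lines. This is a sketch under stated assumptions: the helper name is hypothetical, the sample counts are invented placeholders rather than data from the study, and the interval uses a normal approximation.

```python
import statistics
from typing import Sequence, Tuple


def summarize(values: Sequence[float]) -> Tuple[float, float, Tuple[float, float]]:
    """Return the mean, sample standard deviation, and an approximate 95% CI."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Normal approximation; reasonable for around 50 runs, rough for small samples.
    half_width = 1.96 * std / len(values) ** 0.5
    return mean, std, (mean - half_width, mean + half_width)


# Hypothetical counts from repeated runs on one video segment (NOT study data).
counts = [412, 409, 415, 411, 410, 413]
mean, std, ci = summarize(counts)
print(f"mean = {mean:.2f}, std = {std:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Reporting the mean with its standard deviation or interval for AP, MOTA, and counting accuracy would express the observed fluctuation in the form the comment requests.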

Comments 8: The edge-related counting error is acknowledged but not quantified. Include analysis of detection accuracy near image boundaries.

Response 8: Dear reviewer, regarding the issue of counting errors at the edge of the frame that you pointed out, we hereby provide a specific response as follows:

Counting errors related to edges are mainly caused by repeated counting due to significant differences in target morphology when targets enter and exit. Specifically, they are associated with factors such as the different morphological orientations of targets entering the frame and occlusion conditions, which are difficult to directly quantify. Precisely for this reason, we considered implementing zoning to directly eliminate unstable factors and improve counting accuracy.
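One simple way to realise such zoning is to ignore detections whose boxes touch a border margin, so that partially visible targets entering or leaving the frame are not counted. The sketch below is an assumption about how this could look, not the scheme used in the manuscript: the function name, the margin, and the frame size are all hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def in_central_zone(box: Box, frame_w: int, frame_h: int, margin: int) -> bool:
    """True if the box lies entirely inside the frame minus a border margin."""
    x1, y1, x2, y2 = box
    return (
        x1 >= margin
        and y1 >= margin
        and x2 <= frame_w - margin
        and y2 <= frame_h - margin
    )


# Hypothetical 1920x1080 frame with a 50-pixel exclusion margin.
edge_box = (10, 200, 60, 240)       # partly inside the left margin
centre_box = (500, 500, 540, 530)   # fully inside the central zone
print(in_central_zone(edge_box, 1920, 1080, margin=50))
print(in_central_zone(centre_box, 1920, 1080, margin=50))
```

Counting detections separately inside and outside such a zone would also give a direct way to quantify the edge-related error the reviewer asks about.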

Comments 9: While several algorithms are compared, include recent lightweight or Transformer-based object detectors or trackers as additional baselines.

Response 9: Dear reviewer, we compared several recent mainstream lightweight detectors, and the results are shown in Response 5. For trackers, we added BOT-SORT to the comparison, with the following results:

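Table B. Evaluation results of different target tracking algorithms.

| Algorithms | MOTA/% (↑) | IDF1/% (↑) | HOTA/% (↑) | FP (↓) | FN (↓) | IDS (↓) | FPS (↑) |
|------------|------------|------------|------------|--------|--------|---------|---------|
| SORT       | 81.6       | 83.0       | 67.2       | 554    | 77     | 55      | 51      |
| DeepSORT   | 84.7       | 85.5       | 71.9       | 301    | 59     | 39      | 33      |
| ByteTrack  | 85.6       | 87.9       | 74.3       | 224    | 37     | 33      | 38      |
| Our method | 88.3       | 90.2       | 78.1       | 113    | 21     | 18      | 29      |
| BOT-SORT   | 88.6       | 89.9       | 77.8       | 111    | 24     | 18      | 14      |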

Table B shows that BOT-SORT is superior to ByteTrack in tracking accuracy and close to our modified method, but its frame rate (14 FPS) is the lowest of the compared trackers, so it cannot meet the real-time requirements of this paper.
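The trade-off between accuracy and speed in Table B can be made explicit with a small selection sketch. The metric values are copied from Table B; the function name and the 25 FPS requirement are assumptions for illustration, since the response does not state the real-time threshold.

```python
from typing import Dict, Optional

# MOTA, IDF1, HOTA (in %) and FPS copied from Table B.
TRACKERS: Dict[str, Dict[str, float]] = {
    "SORT":       {"MOTA": 81.6, "IDF1": 83.0, "HOTA": 67.2, "FPS": 51},
    "DeepSORT":   {"MOTA": 84.7, "IDF1": 85.5, "HOTA": 71.9, "FPS": 33},
    "ByteTrack":  {"MOTA": 85.6, "IDF1": 87.9, "HOTA": 74.3, "FPS": 38},
    "Our method": {"MOTA": 88.3, "IDF1": 90.2, "HOTA": 78.1, "FPS": 29},
    "BOT-SORT":   {"MOTA": 88.6, "IDF1": 89.9, "HOTA": 77.8, "FPS": 14},
}


def best_real_time(trackers: Dict[str, Dict[str, float]], min_fps: float) -> Optional[str]:
    """Return the tracker with the highest MOTA among those meeting min_fps."""
    eligible = {name: m for name, m in trackers.items() if m["FPS"] >= min_fps}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["MOTA"])


# 25 FPS is a hypothetical real-time requirement, not a value from the paper.
print(best_real_time(TRACKERS, min_fps=25))
# With no speed constraint, the selection is driven by MOTA alone.
print(best_real_time(TRACKERS, min_fps=0))
```

With no speed constraint BOT-SORT has the highest MOTA, whereas under the assumed 25 FPS requirement it is excluded and the proposed method is selected, which is the argument the response makes.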

Comments 10: The discussion mentions missing first-instar silkworm data. Expand this section to include limitations such as reliance on hardware (camera movement, lighting, etc.). The tracking/counting framework may apply to other insect or animal farming contexts. Briefly mention this in the conclusion for broader impact.

Response 10: Dear reviewer, thank you for pointing out the issues, which are crucial for improving our paper.

We deeply understand the reviewers' concern about full-instar validation of the young silkworm counting model. However, it should be noted that first-instar silkworms differ significantly from 2nd-3rd instar silkworms in both morphological characteristics and rearing practices, with no practical counting demand in production. The rationality of excluding this stage is supported by biological and management logic:

(1) During the rearing process of the 1st instar young silkworms, the setae dispersion rate is mainly counted, without considering factors such as uniformity, health status, and feed intake. Moreover, the amount of feed fed to the 1st instar silkworms is a fixed value, which has nothing to do with their quantity;

(2) Before setae dispersion, their bodies are black and densely covered with setae. After setae dispersion, their heads and thoraxes remain black, but their bodies shed the setae and turn white. Due to the great difference between the 1st instar silkworms before and after setae dispersion, the judgment of the setae dispersion rate is relatively simple. In addition, the 1st instar young silkworms have weak movement ability, so only regional statistics are needed.

We have supplemented the content regarding camera movement speed and lighting conditions. A camera speed higher than 25 mm/s reduces image clarity, producing partially blurred frames and poor counting performance. If the camera moves too slowly, shooting the entire silkworm rearing tray takes too long, and some young silkworms may move an excessive distance in the meantime, which also degrades counting. Therefore, it is appropriate to keep the speed within the range of 10 mm/s to 20 mm/s.

In terms of lighting, the closed silkworm room used in our study is consistent with the actual co-rearing scenario, which is also taken from the young silkworm co-rearing workshop of Shandong Guangtong Silkworm Rearing Co., Ltd. The temperature, humidity, lamp brightness, and installation position in the silkworm room strictly follow the practices of young silkworm co-rearing of Shandong Guangtong Silkworm Rearing Co., Ltd. The illuminance in the silkworm room, measured according to the position of the silkworm rearing tray in the room, ranges from 81.7 lx to 223.4 lx. We have conducted experiments, and the counting effect is stable within this brightness range.

Experimental validation confirms these parameters effectively balance imaging quality and operational efficiency in real-world co-rearing scenarios.

In addition, we have made a supplement in Section 5 (in Line 578-582, Page 19-20):

…This work can provide effective support for accurate detection and counting of young silkworms for precision rearing and high-quality breeding, and the novel detection-tracking-based counting framework proposed herein may also offer assistance for counting other small animals and densely planted plants.…

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Title: A detection line counting method based on multi-target detection and tracking for precision rearing and high-quality breeding of young silkworm (Bombyx mori)

 

Keywords: young silkworm counting; artificial feed rearing; overlapping detection box matching; precision young silkworm rearing; multi-object detection

The article must contain a maximum of 5 keywords. These should represent the scope of the work. Since the words in the title are keywords, the words chosen as keywords should not appear in the title.

The last two paragraphs of the introduction contain no references. These fit better into the discussion than where they are. “The methods mentioned above all rely on static images for detection and counting. However, due to the complex feeding environment of co-rearing young silkworms, the large number of silkworms on the silkworm tray, and the fact that the size of the silkworms themselves is relatively small compared to the tray, it is extremely difficult to directly count the silkworms in the entire tray image. Furthermore, the distribution of silkworms on the tray is closely related to the feed and is not uniform, which makes counting strategies such as five-point sampling and equidistant sampling unrepresentative of the overall population. If multiple images are stitched together, errors and duplicate counting may occur. In comparison, the dynamic video tracking and counting methods are more suitable. A silkworm counting framework named YSC (Young Silkworm Counting) method based on dynamic video tracking and counting is proposed in this study. It first uses a target detector to detect and identify the silkworms, then tracks the detected silkworm targets using an improved multi-target tracker, and finally counts them based on a "detection line" method. The advancements in this study can provide a practical and effective approach for silkworm counting.”  The paper does not have an objective at the end of the introduction.

Figure 1 could be enlarged and have a better resolution. The computer screen cannot identify anything. The numbers, despite being identified, do not show anything. Improve the resolution and increase the size. “Figure 1. Experimental scene for co-rearing of young silkworms. Image acquisition device on the left, Image of co-rearing of young silkworms is shown in the red box. 1. Silkworm tray 2. Camera 3. Slider 4. Electric linear actuator.5. Sliding rail 6. Supporting frame”

 

The conclusion is extensive and has results. These should be in the materials and methods; results or in the discussion. Not in the conclusion. Authors should provide a robust and direct conclusion, involving answers and opportunities that the article promotes. “A method named YSC for counting young silkworms in a co-rearing environment, which includes improvements in detection, tracking, and counting methods, is proposed in this study. In terms of detection, the MSFF module is introduced to address the issue of morphological changes in young silkworms. Additionally, in response to the complex background and various disturbances in the rearing environment of young silkworms, an EMA-CSP module has been designed. The improved detection algorithm achieves a recall of 87.9%, a precision of 91.3%, and an mAP50:95 of 72.7%, representing increases of 1.5%, 3.2% and 3.1%% respectively compared to the pre-improvement version. In terms of tracking, ByteTrack is utilized to follow the detected silkworm bodies. To overcome the problem of ID switching caused by overlapping and occlusion of adjacent silkworms, the matching method for overlapping detection boxes in ByteTrack is optimized. The improved tracking algorithm achieves a MOTA of 88.3%, an IDF1 of 90.2%, and a HOTA of 78.1%, representing increases of 2.7%, 2.3%, and 3.8% respectively compared to the pre-improvement version. A "detection line" method is designed to address the issue of repeated counting. Testing with three random test videos has demonstrated that this method achieves counting accuracies of 91.4%, 91.1%, and 90.5%. Finally, the generalization of this method is verified, proving that it also exhibits good counting accuracy for other young silkworm varieties with larger seed quantities. This work can provide effective support for accurate detection and counting of young silkworms for precision rearing and high-quality breeding.“

Author Response

Comments 1: Title: A detection line counting method based on multi-target detection and tracking for precision rearing and high-quality breeding of young silkworm (Bombyx mori) Keywords: young silkworm counting; artificial feed rearing; overlapping detection box matching; precision young silkworm rearing; multi-object detection. The article must contain a maximum of 5 keywords. These should represent the scope of the work. Since the words in the title are keywords, the words chosen as keywords should not appear in the title.

Response 1: Dear Reviewer, we sincerely appreciate your constructive feedback on the manuscript, which will be instrumental in refining our work. We have revised the keywords (in Line 40-41, Page 1) as follows:

Keywords: Young silkworm counting; Artificial feed co-rearing; Overlapping detection box matching; Improved ByteTrack; Video analysis

Comments 2: The last two paragraphs of the introduction contain no references. These fit better into the discussion than where they are. “The methods mentioned above all rely on static images for detection and counting. However, due to the complex feeding environment of co-rearing young silkworms, the large number of silkworms on the silkworm tray, and the fact that the size of the silkworms themselves is relatively small compared to the tray, it is extremely difficult to directly count the silkworms in the entire tray image. Furthermore, the distribution of silkworms on the tray is closely related to the feed and is not uniform, which makes counting strategies such as five-point sampling and equidistant sampling unrepresentative of the overall population. If multiple images are stitched together, errors and duplicate counting may occur. In comparison, the dynamic video tracking and counting methods are more suitable. A silkworm counting framework named YSC (Young Silkworm Counting) method based on dynamic video tracking and counting is proposed in this study. It first uses a target detector to detect and identify the silkworms, then tracks the detected silkworm targets using an improved multi-target tracker, and finally counts them based on a "detection line" method. The advancements in this study can provide a practical and effective approach for silkworm counting.” The paper does not have an objective at the end of the introduction.

Response 2: Dear Reviewer, thank you for your valuable comments. We have restructured the Introduction and added a clear research objective at the end of Section 1 (Lines 113-117, Page 3):

…Compared with traditional static detection methods, the objective of this study is to propose a young silkworm counting framework based on dynamic video tracking and counting. Through the collaborative optimization of target detection and multi-target tracking, it breaks through the technical bottleneck of young silkworm counting in dense scenarios and provides a new solution for intelligent silkworm rearing management…

Comments 3: Figure 1 could be enlarged and have a better resolution. The computer screen cannot identify anything. The numbers, despite being identified, do not show anything. Improve the resolution and increase the size. “Figure 1. Experimental scene for co-rearing of young silkworms. Image acquisition device on the left, Image of co-rearing of young silkworms is shown in the red box. 1. Silkworm tray 2. Camera 3. Slider 4. Electric linear actuator.5. Sliding rail 6. Supporting frame”.

Response 3: Dear Reviewer, thank you very much for pointing out the valuable issue. We have revised Figure 1 by increasing its resolution and size to make it clearer.

Comments 4: The conclusion is extensive and has results. These should be in the materials and methods; results or in the discussion. Not in the conclusion. Authors should provide a robust and direct conclusion, involving answers and opportunities that the article promotes. “A method named YSC for counting young silkworms in a co-rearing environment, which includes improvements in detection, tracking, and counting methods, is proposed in this study. In terms of detection, the MSFF module is introduced to address the issue of morphological changes in young silkworms. Additionally, in response to the complex background and various disturbances in the rearing environment of young silkworms, an EMA-CSP module has been designed. The improved detection algorithm achieves a recall of 87.9%, a precision of 91.3%, and an mAP50:95 of 72.7%, representing increases of 1.5%, 3.2% and 3.1%% respectively compared to the pre-improvement version. In terms of tracking, ByteTrack is utilized to follow the detected silkworm bodies. To overcome the problem of ID switching caused by overlapping and occlusion of adjacent silkworms, the matching method for overlapping detection boxes in ByteTrack is optimized. The improved tracking algorithm achieves a MOTA of 88.3%, an IDF1 of 90.2%, and a HOTA of 78.1%, representing increases of 2.7%, 2.3%, and 3.8% respectively compared to the pre-improvement version. A "detection line" method is designed to address the issue of repeated counting. Testing with three random test videos has demonstrated that this method achieves counting accuracies of 91.4%, 91.1%, and 90.5%. Finally, the generalization of this method is verified, proving that it also exhibits good counting accuracy for other young silkworm varieties with larger seed quantities. This work can provide effective support for accurate detection and counting of young silkworms for precision rearing and high-quality breeding.”

Response 4: Dear reviewer, thank you very much for your comments on the conclusions of this paper, which have been of great help to us in improving the paper.

We have revised Section 5 (Lines 565-582, Pages 19-20) by removing unnecessary result data. The revised Conclusion is as follows:

…A method named YSC for counting young silkworms in a co-rearing environment, which includes improvements in detection, tracking, and counting methods, is proposed in this study. In terms of detection, the MSFF module is introduced to address the issue of morphological changes in young silkworms. Additionally, in response to the complex background and various disturbances in the rearing environment of young silkworms, an EMA-CSP module has been designed. In terms of tracking, ByteTrack is utilized to follow the detected silkworm bodies. To overcome the problem of ID switching caused by overlapping and occlusion of adjacent silkworms, the matching method for overlapping detection boxes in ByteTrack is optimized. A "detection line" method is designed to address the issue of repeated counting. Tests on random test videos show that this method achieves a counting accuracy of over 90% in all cases. Finally, the generalization of this method was verified, which proves that this method also has a good counting accuracy for other light-spotted and common-spotted young silkworm varieties of commercial artificial feed rearing with a large amount of seed distribution. This work can provide effective support for accurate detection and counting of young silkworms for precision rearing and high-quality breeding, and the novel detection-tracking based counting framework proposed herein may also offer assistance for counting other small animals and densely planted plants…
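The idea of counting each tracked target once as it passes a detection line can be sketched as follows. This is a generic illustration, not the rule defined in the manuscript: the function name, the use of the box centre's y coordinate, and the example tracks are all assumptions.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def count_line_crossings(
    frames: Iterable[Sequence[Tuple[int, float]]],
    line_y: float,
) -> int:
    """Count distinct track IDs whose centre crosses the line y = line_y.

    Each frame is a sequence of (track_id, centre_y) pairs. An ID is counted
    at most once, so a target that drifts back and forth is not counted twice.
    """
    last_y: Dict[int, float] = {}
    counted: Set[int] = set()
    for detections in frames:
        for track_id, cy in detections:
            prev = last_y.get(track_id)
            if prev is not None and track_id not in counted:
                # Crossing: the centre was on one side and is now on the other.
                if (prev < line_y) != (cy < line_y):
                    counted.add(track_id)
            last_y[track_id] = cy
    return len(counted)


# Hypothetical tracks: ID 1 crosses twice, ID 2 crosses once, ID 3 never crosses.
frames = [
    [(1, 90.0), (2, 120.0)],
    [(1, 98.0), (2, 110.0)],
    [(1, 104.0), (2, 101.0)],
    [(1, 99.0), (2, 95.0), (3, 50.0)],
    [(1, 103.0), (2, 90.0), (3, 60.0)],
]
print(count_line_crossings(frames, line_y=100.0))
```

Because the count is keyed on track IDs, its accuracy depends directly on the tracker avoiding ID switches, which is why the tracking improvements and the counting step are presented together.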

Author Response File: Author Response.pdf
