Article

A Detection Line Counting Method Based on Multi-Target Detection and Tracking for Precision Rearing and High-Quality Breeding of Young Silkworm (Bombyx mori)

1 College of Mechanical and Electrical Engineering, Shandong Agricultural University, Tai’an 271018, China
2 Engineering Laboratory of Agricultural Equipment Intelligence, Shandong Agricultural University, Tai’an 271018, China
3 Shandong Engineering Research Center of Agricultural Equipment Intelligentization, Shandong Agricultural University, Tai’an 271018, China
4 Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture, Shandong Agricultural University, Tai’an 271018, China
5 Guangxi Institute of Sericulture Science, Nanning 530007, China
6 Shandong Guangtong Silkworm Rearing Co., Ltd., Weifang 262550, China
7 Shandong Higher Education Institution Future Industry Engineering Research Center of Intelligent Agricultural Robots, Shandong Agricultural University, Tai’an 271018, China
* Authors to whom correspondence should be addressed.
Agriculture 2025, 15(14), 1524; https://doi.org/10.3390/agriculture15141524
Submission received: 9 May 2025 / Revised: 7 July 2025 / Accepted: 8 July 2025 / Published: 15 July 2025
(This article belongs to the Section Farm Animal Production)

Abstract

The co-rearing model for young silkworms (Bombyx mori) reared on artificial feed is currently being promoted widely within China’s sericulture industry. In this model, accurately counting the number of young silkworms is a crucial foundation for precision rearing and high-quality breeding. Manual counting remains the prevalent method for enumerating young silkworms, yet it is highly subjective. In this study, a dataset of young silkworm bodies was constructed and the Young Silkworm Counting (YSC) method is proposed. The method combines an improved detector, which incorporates an optimized multi-scale feature fusion module and the Efficient Multi-Scale Attention Fusion Cross Stage Partial (EMA-CSP) mechanism, with an optimized tracker (based on ByteTrack with improved detection-box matching) and a ‘detection line’ counting approach. Experimental results demonstrate that the recall, precision, and average precision (AP50:95) of the improved detection algorithm are 87.9%, 91.3%, and 72.7%, respectively. Additionally, the enhanced ByteTrack method attains a multiple-object tracking accuracy (MOTA) of 88.3%, an IDF1 of 90.2%, and a higher-order tracking accuracy (HOTA) of 78.1%. Experimental validation demonstrates a counting accuracy exceeding 90%. This study thus achieves precise counting of young silkworms in complex environments through an improved detection-tracking method combined with a detection line approach.

1. Introduction

Sericulture, one of the oldest industries in China, occupies a pivotal position in the implementation of the rural revitalization strategy. Since the 1970s, China has consistently ranked first globally in silk production, processing, and trade, accounting for 75% of the world’s total cocoon and silk output [1]. In addition, the silkworm pupa not only has high nutritional value but also possesses medicinal properties, including hypoglycemic, hypolipidemic, liver-protective, and cancer-preventive effects, as well as potential in treating diabetes and sexual dysfunction. It serves as the base material for products such as whole silkworm powder, silkworm moth composite powder, five-spice silkworm moths, silkworm moth wine, silk peptide, and other foods, healthcare products, and medicines, which has greatly broadened the uses of the silkworm organism. However, stagnation in the quality of silkworm (Bombyx mori) rearing and the continuous degradation of domestic silkworm varieties have increasingly highlighted the drawbacks of the traditional sericulture mode. Meanwhile, the co-rearing mode of young silkworms using artificial feed is considered a promising approach for the transformation and sustainable development of sericulture, owing to its labor-saving and efficient characteristics [2,3,4,5].
Since 2019, the technology of artificial feed co-rearing for first to third instar silkworms has been applied on a large scale in multiple regions of China, including Jiangsu, Guangxi, and Sichuan [6,7]. By 2024, the cocoon production utilizing this technology had reached over 20,000 tons [8,9,10]. With artificial feed, silkworms only need to be fed once every three days during each instar stage, significantly simplifying the sericulture process and reducing labor intensity. Even when performed manually, this method offers an efficiency two to three times higher than that of mulberry leaf feeding. Furthermore, the quantity of feed administered during the co-rearing process of young silkworms exerts an influence on the quality of silkworm rearing. Excessive feeding has been shown to result in feed wastage and the spoilage of uneaten feed, thereby elevating the risk of silkworm diseases and increasing the subsequent workload for handling [11]. Conversely, insufficient feeding hinders the growth and development of silkworms, exerting a detrimental effect on rearing and breeding. Accurate counting of young silkworms provides a basis for determining artificial feed quantities, enabling precise feeding and laying a foundation for quantifying key indicators such as uniformity, incidence rate, and feed intake. Although artificial feed co-rearing technology has significantly enhanced sericulture efficiency, existing counting methods remain heavily reliant on manual labor [12,13], and current algorithms exhibit significant technical limitations in accuracy and real-time performance.
Currently, young silkworm counting faces three core challenges: first, variable morphology, dense accumulation, and occlusion by the artificial feed result in low recognition rates for traditional machine vision methods; second, static-image detection techniques are difficult to apply directly to high-density rearing scenarios; third, existing agricultural counting studies predominantly focus on macro targets (e.g., livestock), while dynamic tracking of tiny insects remains underexplored.
Although deep learning-based computer vision has led to advances in insect detection, most methods are tailored for static images. For example, Suo et al. used the Otsu threshold method to segment aphid images, followed by edge detection for feature extraction, achieving accurate counting on yellow sticky boards in greenhouse and outdoor environments [14]. Li et al. customized anchor box areas and added bilinear interpolation during sampling in Faster R-CNN [15], obtaining an F1 score of 0.944 and mAP of 0.952 on yellow sticky insect boards [16]. Wang et al. proposed DeepPest, where a context-aware network (CAN) was pre-trained on the IPFC dataset to capture contextual information, and multi-projection convolution blocks with attention mechanisms were integrated into ResNet-50 [17], yielding an mAP of 73.9% [18]. Jiao et al. combined anchor-free RCNN (AF-RCNN) with Faster R-CNN on the Pest24 dataset, achieving an mAP of 56.4% and recall of 85.1% [19]. Ma et al. proposed a silkworm counting method by introducing the RCS-OSA residual module into YOLOv8n, replacing the detection head with a dynamic prediction head, and optimizing the loss function [20]. While effective for fewer than 80 silkworms, it struggles to scale to entire rearing trays in co-rearing scenarios. However, these methods are designed for static image scenarios and lack dynamic tracking capabilities, making them unsuitable for video-based counting.
In addition, cross-domain research also provides methodological references for this paper. For example, the MvMRL method proposed by Zhang et al. provides key ideas for integrating multi-source features to cope with dense, dynamic, and morphologically variable scenarios in the young silkworm counting task through its multi-view fusion logic [21]; the systematic analysis and multi-dimensional verification ideas proposed by Wei et al. can provide methodological references for the parameter optimization, scenario adaptation, and performance evaluation of the method in this paper, helping to improve the reliability and applicability of the method in actual breeding scenarios [22]. Xu et al. proposed an Attentive GAN method, in which the target region is accurately located by adding an attention mechanism, significantly improving the detection effect [23]. Wang et al. proposed an embedded cross framework ECF-DT, which achieves accurate detection of small targets through high-resolution feature processing, multi-scale information fusion, and complex background suppression [24].
Compared with traditional static detection methods, the objective of this study is to propose a young silkworm counting framework based on dynamic video tracking and counting. Through the collaborative optimization of target detection and multi-target tracking, it breaks through the technical bottleneck of young silkworm counting in dense scenarios and provides a new solution for intelligent silkworm rearing management.

2. Materials and Methods

2.1. Materials

2.1.1. Image Acquisition

The images of the bodies of young silkworms utilized in this study were collected between June and October 2023 at Shandong Guangtong Silkworm Rearing Co., Ltd. The silkworm varieties were Liangguang No. 2 (a variety with light spots) and Jingsong × Haoyue (a variety with common spots). These silkworms were reared on artificial feed and were at the second instar stage. They were reared on silkworm nets positioned in the middle of silkworm trays, with the dimensions of the nets being 700 mm × 500 mm × 120 mm. Figure 1 illustrates the image acquisition setup employed for capturing images of the young silkworms. During the filming process, an electric linear actuator (Yixing, China, featuring a stroke length of 700 mm, a pushing speed of 20 mm/s, and a torque of 500 N) moved the camera (SY012HD, Mingchuangda, China, with a focal length of 2.7 mm and a field of view of 85°) at a speed of 20 mm/s. The camera lens was positioned 220 mm above the silkworm net. Videos were recorded at hourly intervals, resulting in a total of 40 video segments, each with a duration of 35 s. The videos were captured at a resolution of 1920 × 1080 pixels and a frame rate of 20 frames per second.

2.1.2. Dataset of Young Silkworm Bodies

The bodies of young silkworms within the collected videos were labeled utilizing the DarkLabel 2.4 software (https://github.com/darkpgmr/DarkLabel, accessed on 12 March 2023), and a dataset was created. All images within the dataset feature a uniform silkworm variety, totaling 2400 images, comprising 1200 images of the “Liangguang No. 2” variety and 1200 images of the “Jingsong × Haoyue” hybrid. The dataset was randomly partitioned into training, testing, and validation sets in a 6:2:2 ratio. The specific grouping information is detailed in Table 1.
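For concreteness, the 6:2:2 random partition can be sketched as below; the file names and the fixed seed are illustrative assumptions, not details from the original pipeline:

```python
import random

def split_dataset(image_paths, seed=42):
    """Randomly partition images into training/testing/validation sets
    in a 6:2:2 ratio, as described for the 2400-image dataset."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.6)
    n_test = int(n * 0.2)
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]
    return train, test, val

# With 2400 images the split is 1440 / 480 / 480.
images = [f"img_{i:04d}.jpg" for i in range(2400)]
train, test, val = split_dataset(images)
```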

2.1.3. Challenges

In the context of co-rearing young silkworms, achieving precise counting of silkworms faces the following key challenges:
  • Morphological Variability. As illustrated in Figure 2, the morphology of silkworms in the dataset varies due to multiple factors, primarily stemming from mutual occlusion among individual silkworms and occlusion by feed. These morphological differences contribute to the diversification of silkworm features. To address this issue, the PAFPN structure [25] of the original detector is proposed to be substituted with the multi-scale feature fusion (MSFF) structure, and the EMA mechanism [26] is introduced within the CSP module, aiming to enhance the detection accuracy in this study.
  • ID Switching. The interleaved arrangement of silkworm individuals and occlusion by feed can result in a decrease in the score of tracking detection boxes, making it difficult to match them with tracking trajectories. This, in turn, triggers ID switching issues, compromising the accuracy of subsequent counting. To tackle this problem, an improved version of the ByteTrack [27] method is adopted in this study to reduce the probability of mismatching between trajectories and detection boxes. Additionally, by adjusting the matching strategy, the frequency of ID switching is further minimized.
  • Duplicate Counting. Factors such as edge distortion and occlusion can lead to duplicate counting issues in the process of silkworm target tracking. To resolve this issue, a novel statistical approach specifically tailored for silkworm targets is introduced in this study. This method leverages constructed “detection lines” and target ID values within the statistical area to effectively mitigate the problem of duplicate counting.

2.2. Method for Detecting Young Silkworms

2.2.1. Overall Overview

The YSC method in this study comprises three modules: a detector, a tracker, and a counter, with the system workflow illustrated in Figure 3. The detector locates silkworm targets in the video, the tracker follows the detected targets across frames, and the counter tallies the number of tracked targets.

2.2.2. Detector

YOLOX [28] has made certain improvements based on the YOLOv4 [29] and YOLOv5 algorithms [30]. The improvements encompass the replacement of the detection head with a decoupled detection head, the implementation of an anchor-free algorithm, and the integration of SimOTA for the purpose of positive and negative sample matching. Due to detection requirements, the YOLOX-S network, characterized by fewer model parameters, better adaptability, and high detection accuracy, was ultimately selected as the baseline for improvements in this paper.
(1)
EMA-CSP Detection Module
During the co-rearing process, the dense distribution of young silkworms can easily lead to missed or false detections. To address this issue, the EMA (Efficient Multi-Scale Attention) mechanism was introduced into the CSP module, enabling the neural network to focus more on the bodies of young silkworms and reducing attention to other irrelevant background information in this study.
The CSP1-X and CSP2-X modules are crucial for enhancing feature learning and generalization capabilities, as well as facilitating feature fusion. On this basis, the EMA mechanism, which captures cross-dimensional interactions and establishes interdependencies between dimensions, is incorporated, enhancing the representation ability of feature maps and improving the accuracy of target localization and recognition in this study. The integrated EMA-CSP module is illustrated in Figure 4.
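To give a rough sense of how such an attention gate reweights a feature map, the NumPy sketch below implements a heavily simplified EMA-style gate: channels are split into groups, directional average pooling along H and W yields descriptors, and their sigmoid-gated product rescales the input. The learned 1×1 convolutions and exact structure of the published EMA module are omitted, so this is an illustrative simplification, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ema_like_attention(x, groups=4):
    """Simplified EMA-style gate on a feature map x of shape (C, H, W):
    grouped channels, directional pooling along H and W, and a
    sigmoid-gated product that reweights the input."""
    c, h, w = x.shape
    g = x.reshape(groups, c // groups, h, w)
    pool_h = g.mean(axis=3, keepdims=True)    # (groups, C/groups, H, 1)
    pool_w = g.mean(axis=2, keepdims=True)    # (groups, C/groups, 1, W)
    gate = sigmoid(pool_h) * sigmoid(pool_w)  # broadcasts to g's shape
    return (g * gate).reshape(c, h, w)

feat = np.random.rand(16, 8, 8).astype(np.float32)
out = ema_like_attention(feat)
```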
(2)
Multi-scale feature fusion method
To address the challenge of feature extraction caused by mutual occlusion among young silkworms and occlusion due to feed, a multi-scale feature fusion method (MSFF) is adopted in this study, effectively fusing low-level and high-level features while fully leveraging the advantages of high-level features, thereby replacing the original PAFPN component.
The multi-scale feature fusion (MSFF) method presented in this study comprises a feature filter module and a feature fusion module, with its overall structure depicted in Figure 5. In the feature filter module, the most representative features are extracted through max pooling and average pooling while minimizing loss. These features are then merged, and a sigmoid activation function is employed to determine the weight values for each channel. The obtained weight values are multiplied by the feature maps of the corresponding scale, yielding filtered feature maps. Finally, the number of channels in feature maps across different scales is unified. In this study, {S5, S4, S3, S2} are employed to denote the feature map levels extracted from the input images, while {P5, P4, P3} are used to represent the generated feature map levels.
Subsequently, the high-level and low-level features from the filtered feature maps are fed into the feature fusion module. {N3, N4, N5} are used to represent the newly generated feature maps corresponding to {P3, P4, P5}, respectively, where N5 remains identical to P5 without any processing. Initially, the high-level features of P5 are expanded, and bilinear interpolation is applied for up-sampling or down-sampling to ensure consistency in dimensions between high-level and low-level features. Following this, the feature filter module converts the high-level features into attention weights, which are utilized to filter out redundant information in the low-level features of P3 and P4 feature maps. Ultimately, the filtered low-level features are integrated with the high-level features, yielding a more comprehensive and effective representation of semantic information. This process further optimizes feature extraction.
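The feature-filter step described above (max pooling and average pooling, merging, and a sigmoid yielding per-channel weights) can be sketched as follows. Merging by addition and omitting any learned layers are simplifying assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_filter(x):
    """Sketch of the feature filter: global max pooling and average
    pooling give one descriptor per channel, the two are merged, and a
    sigmoid turns the result into per-channel weights that rescale the
    feature map x of shape (C, H, W)."""
    max_desc = x.max(axis=(1, 2))           # (C,)
    avg_desc = x.mean(axis=(1, 2))          # (C,)
    weights = sigmoid(max_desc + avg_desc)  # (C,) in (0, 1)
    return x * weights[:, None, None]

feat = np.random.rand(64, 32, 32).astype(np.float32)
filtered = feature_filter(feat)
```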

2.2.3. Tracker

Due to the dense distribution of silkworms during co-rearing, intersections and occlusions frequently occur among them. To address this, the multi-object tracking method ByteTrack is improved for tracking silkworm targets. When two silkworms with high-scoring detection boxes intersect, the more active silkworm on top (referred to as A) tends to retain its high-scoring detection box, while the less active silkworm underneath (referred to as B), due to occlusion, loses its original high-scoring detection box and instead yields a low-scoring detection box. The specific workflow of ByteTrack when dealing with intersecting silkworms is shown in Figure 6. When processing intersecting silkworms, three main problems arise: redundant computation, mismatching, and trajectory loss.
These issues lead to ID switching and increased processing time. To address the above problems, an overlapping detection box matching module is designed in this study, with the specific implementation method illustrated in Figure 6.
(1)
To address redundant computations, the original ByteTrack strategy is modified to process only the low-scoring boxes within the minimum bounding rectangle of overlapping high-confidence boxes. This optimized strategy reduces the number of detection boxes to be processed, thereby shortening the processing time.
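Under the assumption that "within the minimum bounding rectangle" is tested via box centers, this pruning step might look like the following illustrative sketch (not the authors' code):

```python
def overlaps(a, b):
    """Axis-aligned overlap test for boxes (x1, y1, x2, y2)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def low_boxes_to_match(high_boxes, low_boxes):
    """For each pair of overlapping high-confidence boxes, keep only the
    low-score boxes whose centers fall inside the pair's minimum
    bounding rectangle; only these are passed on to matching."""
    keep = []
    for i in range(len(high_boxes)):
        for j in range(i + 1, len(high_boxes)):
            a, b = high_boxes[i], high_boxes[j]
            if not overlaps(a, b):
                continue
            # minimum bounding rectangle of the overlapping pair
            rx1, ry1 = min(a[0], b[0]), min(a[1], b[1])
            rx2, ry2 = max(a[2], b[2]), max(a[3], b[3])
            for lb in low_boxes:
                cx, cy = (lb[0] + lb[2]) / 2, (lb[1] + lb[3]) / 2
                if rx1 <= cx <= rx2 and ry1 <= cy <= ry2 and lb not in keep:
                    keep.append(lb)
    return keep
```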
(2)
To tackle the problem of mismatching, a pre-screening step is incorporated into the optimized method before trajectory matching. This step selects more suitable low-scoring detection boxes, which are then matched with the tracking trajectories that failed to match with high-scoring boxes in the first attempt. The schematic diagram of the screening process is illustrated in Figure 7. The selection score (T0.5) for a low-scoring box is calculated using Equation (1), and a box is considered superior if T0.5 > 0.5.
T_{0.5} = IoU - \alpha\,\frac{\rho^{2}(c_{l}, c_{h})}{a_{t}^{2} + b_{t}^{2}} - \beta\left[\frac{(a_{h} - a_{l})^{2}}{a_{t}^{2}} + \frac{(b_{h} - b_{l})^{2}}{b_{t}^{2}}\right] \quad (1)
where IoU stands for intersection over union; α is a hyperparameter representing the influence of the distance between the center points, set to 0.4; β is a hyperparameter representing the influence of the difference in width and height, set to 0.45; ρ(cl, ch) is the Euclidean distance between the center points of low-scoring and high-scoring boxes; at and bt represent the width and height of the smallest detection rectangle encompassing the high-scoring and low-scoring detection boxes; ah and al represent the width of the high-scoring and low-scoring detection boxes; bh and bl represent the height of the high-scoring and low-scoring detection boxes.
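The score can be computed directly from the box coordinates. The sketch below assumes boxes in (x1, y1, x2, y2) format and a DIoU-style reading of the formula (IoU minus a center-distance penalty and a width/height penalty, normalized by the enclosing rectangle):

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def t_score(low, high, alpha=0.4, beta=0.45):
    """Selection score for a low-scoring box against a high-scoring box;
    the smallest rectangle enclosing both supplies a_t and b_t."""
    ex1, ey1 = min(low[0], high[0]), min(low[1], high[1])
    ex2, ey2 = max(low[2], high[2]), max(low[3], high[3])
    a_t, b_t = ex2 - ex1, ey2 - ey1
    cl = ((low[0] + low[2]) / 2, (low[1] + low[3]) / 2)
    ch = ((high[0] + high[2]) / 2, (high[1] + high[3]) / 2)
    rho2 = (cl[0] - ch[0]) ** 2 + (cl[1] - ch[1]) ** 2
    a_l, b_l = low[2] - low[0], low[3] - low[1]
    a_h, b_h = high[2] - high[0], high[3] - high[1]
    return (iou(low, high)
            - alpha * rho2 / (a_t ** 2 + b_t ** 2)
            - beta * ((a_h - a_l) ** 2 / a_t ** 2 + (b_h - b_l) ** 2 / b_t ** 2))
```

Two identical boxes score 1.0, and the score decreases as the boxes drift apart or differ in size, matching the intended ranking behavior.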
(3)
To address the issue of trajectory loss due to prolonged unmatched periods, the retention time for trajectories that fail to match detection boxes was extended from 30 frames to 60 frames. Additionally, the criterion for creating a new trajectory was revised to consider a match between a superior low-scoring box and a trajectory with a score below 80%, and to discard the original trajectory and create a new one if no match is established within 60 frames.
The optimized ByteTrack workflow is illustrated in Figure 8. Compared to the original method, the proposed approach can effectively alleviate the issue of ID switching.

2.2.4. Counter

During the detection process, the camera moves at a constant speed from left to right to capture images of the silkworm trays. During this period, silkworms located at the edge of the frame undergo significant morphological changes, increasing the likelihood of misdetections. Additionally, some silkworms crawl in the trays, which can lead to repeated counting. Furthermore, due to the high number and density of silkworms in the trays during tracking, changes in their positions and occlusions result in an increase in ID switching, making counting solely based on IDs highly erroneous. To address these issues, a specialized counting method is introduced in this study, primarily relying on the ID of the detection box and its position relative to the detection line within the frame for counting. The counting process is illustrated in Figure 9.
A vertical “detection line” is drawn within the detection area, and its position remains constant as the video moves from right to left. When a detection box located to the right of the detection line contacts the line, its ID is verified. If the ID has not been recorded, it is recorded, and the count is incremented by one. For detection boxes initially crossing the detection line, when they exit the line, their IDs are similarly verified. If the ID has not been recorded, it is recorded, and the count is incremented by one. All other events do not alter the count.
The detection line counting method effectively reduces repeated counting by restricting the counting conditions and preventing misdetections caused by excessive target deformation at the edge of the frame.
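The detection-line rule can be sketched as a small stateful counter. The crossing test below (count an ID the first time its box touches or has passed the line) is a simplification of the two cases described above, and the box format (x1, y1, x2, y2) is an assumption:

```python
class DetectionLineCounter:
    """Counts tracked targets against a fixed vertical line at x = line_x.
    An ID is counted once, the first time its box touches or lies past
    the line; IDs already counted are remembered in a set."""

    def __init__(self, line_x):
        self.line_x = line_x
        self.counted = set()
        self.count = 0

    def update(self, track_id, box):
        x1, _, x2, _ = box
        touches = x1 <= self.line_x <= x2   # box straddles the line
        passed = x2 < self.line_x           # box has crossed completely
        if (touches or passed) and track_id not in self.counted:
            self.counted.add(track_id)
            self.count += 1
        return self.count
```

Because counting is keyed on the recorded ID set rather than on raw detections, a silkworm that lingers near the line or crawls back across it is not counted twice.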

2.3. Experimental Platform

The experimental platform used for model training and testing is the HP Z820 workstation, whose configuration is shown in Table 2.

2.4. Evaluation Metrics

In this study, five evaluation metrics are utilized to assess the performance of the young silkworm detection model, including precision (P), recall (R), F1 score, AP50, and AP50:95. The definitions of these indices are presented in Equations (2) to (6).
R = \frac{TP}{TP + FN} \quad (2)
P = \frac{TP}{TP + FP} \quad (3)
F1 = \frac{2 \times R \times P}{R + P} \quad (4)
AP = \int_{0}^{1} P(R)\,dR \quad (5)
AP_{50:95} = \frac{AP_{50} + AP_{55} + \cdots + AP_{90} + AP_{95}}{10} \quad (6)
where TP stands for true positive, FP for false positive, TN for true negative, and FN for false negative. AP50 denotes the AP value at an IoU threshold of 0.5, while AP50:95 denotes the mean AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
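For concreteness, Equations (2)-(4) and (6) reduce to a few lines of Python; Equation (5)'s integral over the precision-recall curve is what practical AP implementations approximate numerically:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from Equations (2)-(4),
    given counts of true positives, false positives and false negatives."""
    r = tp / (tp + fn)
    p = tp / (tp + fp)
    f1 = 2 * r * p / (r + p)
    return p, r, f1

def ap50_95(ap_per_threshold):
    """Equation (6): mean AP over the ten IoU thresholds 0.50-0.95."""
    assert len(ap_per_threshold) == 10
    return sum(ap_per_threshold) / 10
```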
In the context of tracking tasks for young silkworm bodies, three primary evaluation metrics are selected in this study: multiple object tracking accuracy (MOTA), identification average rate (IDF1), and high-order tracking accuracy (HOTA).
MOTA = 1 - \frac{\sum_{t}(FN_{t} + FP_{t} + IDSW_{t})}{\sum_{t} GT_{t}} \quad (7)
The formula for calculating MOTA is presented in Equation (7), where FNt represents the number of missed detections in the t-th frame, FPt denotes the number of false detections in the t-th frame, IDSWt indicates the number of ID switches for tracked objects within the t-th frame, and GTt signifies the total number of tracked objects in the t-th frame.
IDF1 = \frac{2\,IDTP}{2\,IDTP + IDFP + IDFN} \quad (8)
The formula for calculating IDF1 is presented in Equation (8), where IDTP represents true positive IDs, IDFP denotes false-positive IDs, and IDFN signifies false-negative IDs.
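Equations (7) and (8) can likewise be computed directly from per-frame counts; a minimal sketch:

```python
def mota(fn_t, fp_t, idsw_t, gt_t):
    """Equation (7): MOTA from per-frame lists of missed detections,
    false detections, ID switches and ground-truth object counts."""
    return 1 - (sum(fn_t) + sum(fp_t) + sum(idsw_t)) / sum(gt_t)

def idf1(idtp, idfp, idfn):
    """Equation (8): identification F1 from ID-level TP/FP/FN counts."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```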
HOTA = \sqrt{DetA \cdot AssA} = \sqrt{\frac{\sum_{c \in TP} A(c)}{TP + FN + FP}} \quad (9)
A(c) = \frac{TPA(c)}{TPA(c) + FPA(c) + FNA(c)} \quad (10)
The formula for calculating HOTA is provided in Equation (9), where DetA assesses detection accuracy and AssA evaluates association accuracy. Here, c denotes a detection belonging to the set of true positives (TP), and A(c) is the association accuracy defined by Equation (10), in which TPA(c), FPA(c), and FNA(c) denote the true positive, false positive, and false negative associations for c, respectively.
Additionally, the performance of the tracking methodology is evaluated through two further metrics in this study: identity switches (IDS) and frames per second (FPS). The formula for computing FPS is presented in Equation (11), where frame is the total number of frames processed and time_elapsed is the duration of continuous operation.
FPS = \frac{frame}{time_{elapsed}} \quad (11)
We used counting accuracy to evaluate the final performance of young silkworm counting, and its formula is shown in Equation (12).
A_{c} = 1 - \frac{\left|Num_{t} - Num_{c}\right|}{Num_{t}} \quad (12)
In the equation, Numc represents the detected number of young silkworms, Numt represents the true number of young silkworms, and Ac represents the counting accuracy.
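Read with an absolute difference so that over- and under-counts are penalized equally (an assumption about the intended form), Equation (12) is simply:

```python
def counting_accuracy(num_true, num_detected):
    """Equation (12): counting accuracy, 1 - |true - detected| / true,
    penalizing over- and under-counting symmetrically."""
    return 1 - abs(num_true - num_detected) / num_true
```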

3. Results

This section summarizes the detailed parameter configurations for both the detector and the tracker, and presents the experimental results obtained from the dataset. These results are intended to showcase the performance of the improvements proposed in this study when applied to the detector and tracker. Specifically, the experiments conducted include young silkworm detection tests, tracking experiments, ablation studies, and validation tests for counting accuracy.

3.1. Detection Results

We validated the detection performance of our method for young silkworms by comparing it with the YOLOv4, YOLOv5s, YOLOX-S, YOLOv8s, YOLOv10s, and YOLOv11s models. Given the small targets and heavy occlusion of young silkworms, we trained for a total of 300 epochs to fully learn the target features, with a batch size of 16 to balance gradient stability against the memory required for data augmentation. Meanwhile, to accelerate early convergence and suppress overfitting in complex scenarios, the initial learning rate was set to 0.01, the momentum to 0.9, and the weight decay to 0.0005.
The visualization results of the detection are shown in Figure 10, which depicts a total of 26 young silkworms with multiple instances of occlusion among the young silkworms themselves and between young silkworms and feed. In contrast, YOLOv4 and YOLOv5 had five or more false negatives and false positives. YOLOv8 performed slightly better than YOLOv5, with two false positives and two false negatives, while YOLOX-S had two false positives and one false negative. The false negatives and false positives primarily occurred in scenarios such as overlapping of multiple young silkworms, feed occlusion, and image edges. This indicates that morphological changes in young silkworms resulting from occlusion and other factors are the primary causes of detection errors. YOLOv10s, YOLOv11s, and our method achieved zero false negatives; however, YOLOv10s and YOLOv11s each had two false positives, whereas our method exhibited only one instance of bounding box size mismatch, demonstrating superior detection performance.
The aforementioned results demonstrate the effectiveness of the improvements proposed in this study. The incorporation of the EMA mechanism enhances the precision of object detection while reducing the focus on background elements that are prone to causing interference, such as feeding grids, feed, and silkworm body textures. Additionally, the optimization strategy for the MSFF module generates more accurate semantic features through the fusion of features at different granularities, enabling effective feature extraction of young silkworms under occlusion conditions.
The evaluation results, as shown in Table 3, indicate that our method achieves the best performance across all metrics, including F1 score, recall, precision, AP50, and AP50:95. We first compared YOLOX-S with YOLOv4, YOLOv5, and YOLOv8. The YOLOX-S algorithm achieved 87.2%, 86.4%, 88.1%, 86.4%, and 69.6% for F1 score, recall, precision, AP50, and AP50:95, respectively, with all indicators superior to those of YOLOv4, YOLOv5, and YOLOv8. This result validates the choice of YOLOX-S as the baseline in this study. We then evaluated the improved YOLOX-S. Compared with YOLOX-S, our method showed increases of 2.4% in F1 score, 1.5% in recall, 3.2% in precision, and 1.7% and 3.1% in AP50 and AP50:95, respectively. YOLOv10 and YOLOv11 also surpass YOLOX-S. Notably, YOLOv10, as a recent YOLO variant, demonstrates significant improvements over YOLOv8 across all metrics, with its precision second only to our method. YOLOv11 likewise ranks second only to our proposed approach among the tested models in F1 score, recall, AP50, and AP50:95.
The aforementioned results validate the effectiveness of the improvements proposed in this study. The implementation of the EMA-CSP module has reduced the attention on irrelevant background elements such as silkworm nets, feed, and silkworm body textures. Furthermore, the optimization strategy for the multi-scale feature fusion (MSFF) module generates more precise semantic features through the integration of high-level and low-level features, enabling effective feature extraction of various morphologies of young silkworms under occlusion conditions.

3.2. Tracking Results

The tracker was trained using the SGD method. To ensure stable convergence in the initial stage of training, the learning rate was set to 0.01. To limit model complexity and prevent overfitting, the number of training epochs was set to 100 and the weight decay to 0.0005. Meanwhile, the NMS threshold was set to 0.7, and the confidence and IoU thresholds to 0.65 and 0.5, respectively, to adapt to the occlusion scenarios of high-density silkworm groups, maintain a balance between precision and recall, and filter out most false detections while retaining true positives.
The multi-object tracking algorithms SORT [31], DeepSORT [32], ByteTrack, and our method were compared.
The visualization results of young silkworm tracking using the DeepSORT, ByteTrack, and improved ByteTrack methods are depicted in Figure 11, where panels (a), (b), and (c) correspond to frames 8, 23, and 56 of the video sequence, respectively. With the DeepSORT method, between frames 8 and 23, ID 14 switched to ID 17 and ID 18 switched to ID 20; by frame 56, the maximum ID value reached 25. The ByteTrack method exhibited no ID switches between frames 8 and 23, but ID 18 switched to ID 22 between frames 23 and 56, giving a maximum ID value of 22. Both methods suffered numerous mismatches due to overlapping young silkworm bodies, resulting in ID switches during tracking. In contrast, our method demonstrated robust tracking stability without any ID switches, achieving a maximum ID of 19, consistent with the ground truth (GT).
The evaluation results are shown in Table 4. First, tests were conducted on SORT, DeepSORT, and ByteTrack. The ByteTrack method achieved 85.6%, 87.9%, and 74.3% in terms of MOTA, IDF1, and HOTA, respectively, and 33, 224, and 37 in terms of IDS, FP, and FN, respectively. Compared with the SORT and DeepSORT methods, it increased by 0.9% and 4% in MOTA, 2.4% and 4.9% in IDF1, 2.4% and 7.1% in HOTA, reduced by 6 and 22 in IDS, and also showed a significant decrease in FP and FN. This proves that ByteTrack, a multi-target tracking method not based on appearance matching, is more suitable for scenarios with targets of high similarity and frequent occlusions. The method in this paper achieved 88.3% in MOTA, 90.2% in IDF1, 78.1% in HOTA, and only 18, 113, and 21 in IDS, FP, and FN, respectively, all showing certain improvements compared with ByteTrack. These findings collectively demonstrate that the optimized bounding box matching method presented in this study can effectively enhance tracking accuracy and data association precision, maintain stable tracking performance in complex scenarios, and reduce issues such as missed detections, false detections, and ID switches during the tracking process.
In terms of processing speed, although the method proposed in this study modifies the matching strategy and reduces the number of low-score detection boxes requiring matching, it adds a screening step and extends the retention time for unmatched trajectories. This increases the number of matching iterations and lowers the frame rate relative to the pre-improvement method. However, at 29 FPS the method still exceeds the 20 FPS of the video recorded by the image acquisition device, so its processing speed is entirely adequate for practical requirements.
The results presented above show that ByteTrack, optimized by refining the overlapping detection box matching mechanism according to the activity characteristics of young silkworms, outperforms the original ByteTrack: intertwined young silkworm trajectories are mismatched less often, fewer trajectories are lost, and ID switches are consequently minimized.
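The refined matching augments plain IoU with a centre-distance term and a width/height term, which is what lets it separate low-score boxes that plain IoU scores identically (Figure 7). The sketch below is a plausible DIoU-style form assembled from the Nomenclature symbols (α, β, ρ(cl, ch), and the enclosing rectangle at × bt); the weights and the exact combination are assumptions, not the paper's equation:

```python
def match_score(high, low, alpha=0.5, beta=0.5):
    """Score a low-score box against a lost high-score box: IoU penalized
    by normalized centre distance and by width/height mismatch.
    Boxes are (x, y, w, h) with (x, y) the top-left corner; alpha and
    beta are hypothetical weights, not the paper's values."""
    hx, hy, hw, hh = high
    lx, ly, lw, lh = low
    # Intersection over union
    ix = max(0.0, min(hx + hw, lx + lw) - max(hx, lx))
    iy = max(0.0, min(hy + hh, ly + lh) - max(hy, ly))
    inter = ix * iy
    iou = inter / (hw * hh + lw * lh - inter)
    # Smallest rectangle enclosing both boxes (a_t x b_t)
    a_t = max(hx + hw, lx + lw) - min(hx, lx)
    b_t = max(hy + hh, ly + lh) - min(hy, ly)
    # Squared Euclidean distance between box centres, rho^2(c_l, c_h)
    rho2 = (hx + hw / 2 - lx - lw / 2) ** 2 + (hy + hh / 2 - ly - lh / 2) ** 2
    # Width/height mismatch, normalized by the enclosing rectangle
    shape = (abs(hw - lw) + abs(hh - lh)) / (a_t + b_t)
    return iou - alpha * rho2 / (a_t ** 2 + b_t ** 2) - beta * shape
```

Two low-score boxes with the same IoU against a lost track then rank differently: the one whose centre has drifted further, or whose shape deviates more, falls below the 0.5 threshold and is eliminated first.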

3.3. Ablation Tests

To analyze the proposed improvements more comprehensively, an ablation experiment is designed to isolate the contributions of the multi-scale feature fusion (MSFF) method, the EMA mechanism, and the overlapping detection box matching mechanism. Two test videos are used to assess the influence of each improvement on the final tracking results.
The experimental results in Table 5 indicate that all three improvements raise the MOTA value, with the MSFF method and the overlapping detection box matching mechanism giving the more pronounced gains. On the IDF1 metric, the overlapping detection box matching mechanism yields the largest improvement, 1.3%; on the HOTA metric it again achieves the highest increase, 2.4%. Its substantial gains across the evaluation metrics are primarily attributable to the frequent occlusion among young silkworms. The impact of the different modules on inference speed is also analyzed: the multi-scale fusion mechanism and the detection box matching mechanism have the greatest influence, and implementing them together reduces frames per second (FPS) by 21% relative to the pre-improvement baseline.

3.4. Counting Effectiveness for Young Silkworms

To validate the effectiveness of the young silkworm counting method, four approaches (YOLOv4-Byte, YOLOv5-Byte, YOLOX-Byte, and YSC) were employed to count the number of young silkworms in three randomly selected test videos (A, B, C). The counted numbers were then compared with manual counts, and the counting accuracy was calculated using Equation (12).
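The accuracies in Table 6 are consistent with the usual relative-error form of counting accuracy, sketched here with the Nomenclature symbols Numc and Numt (the paper's exact Equation (12) may differ in form):

```python
def counting_accuracy(num_c, num_t):
    """Counting accuracy Ac: one minus the relative error between the
    detected count (Num_c) and the true count (Num_t), as a percentage.
    A relative-error form consistent with Table 6, not necessarily the
    paper's exact Equation (12)."""
    return (1.0 - abs(num_c - num_t) / num_t) * 100.0

# Test video A under Method IV (YSC) in Table 6: 2974 true, 3230 counted.
print(f"Ac = {counting_accuracy(3230, 2974):.1f}%")  # → Ac = 91.4%
```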
As indicated in Table 6, test videos A, B, and C contain 2974, 3088, and 2682 young silkworms, respectively. The refined method effectively reduces the probability of missed and false detections: with Method IV (YSC), the detected number of young silkworms is closest to the actual count. It likewise mitigates ID switching: the maximum ID values obtained with Method IV are 3468, 3482, and 2921, lower than those of Method III (4543, 4495, and 4075) and far lower than those of Methods I and II. In terms of counting accuracy, the method presented in this study exceeds 90% in all instances, markedly surpassing the other three methods.
The overall results demonstrate that the adoption of the YSC method significantly enhances the accuracy of young silkworm counting. In the context of detection, it diminishes the occurrences of false positives and missed detections arising from morphological variations and complex backgrounds. In terms of tracking, it decreases the frequency of ID switching events. Furthermore, the integration of the “detection line” counting methodology with the improved detector and tracker proposed in this study exhibits superior performance in counting young silkworms in practical applications, minimizing the impact of ID switching.
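The "detection line" idea of Figure 9 reduces counting to a membership test: a track ID is added to the tally the first time its bounding box reaches the line, and never again, so a later ID switch on an already-counted silkworm cannot inflate the total. A minimal sketch follows; the class name, coordinate convention, and crossing test are illustrative assumptions, not the paper's implementation:

```python
class DetectionLineCounter:
    """Count tracked silkworms crossing a vertical detection line,
    tallying each track ID at most once (a sketch of the 'detection
    line' method, not the paper's implementation)."""

    def __init__(self, line_x):
        self.line_x = line_x
        self.counted_ids = set()  # the "cross IDs" of Figure 9

    def update(self, tracks):
        """tracks: iterable of (track_id, x1, y1, x2, y2) detection boxes
        for one frame.  A track is counted once its box spans the line."""
        for track_id, x1, _y1, x2, _y2 in tracks:
            if track_id not in self.counted_ids and x1 <= self.line_x <= x2:
                self.counted_ids.add(track_id)
        return len(self.counted_ids)

counter = DetectionLineCounter(line_x=100)
counter.update([(14, 90, 0, 110, 12)])   # ID 14 crosses: count is 1
counter.update([(14, 95, 0, 115, 12)])   # same ID again: still 1
total = counter.update([(4, 98, 0, 108, 9), (5, 80, 3, 101, 14)])
print(total)  # → 3
```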

4. Discussion

The dataset collected and the subjects tested in this paper are exclusively second-instar young silkworms; first-instar and third-instar silkworms were not included. For first-instar silkworms, biological characteristics and feeding management differ considerably from those of second- and third-instar silkworms: observation focuses primarily on the setae dispersion rate rather than on uniformity, health status, or feed intake, and the amount of feed supplied is fixed regardless of quantity, so counting is of limited significance. Moreover, their appearance changes greatly before and after setae dispersion and their mobility is weak, making them suitable for traditional static detection schemes; they were therefore excluded. Third-instar silkworms, in contrast, have a body length roughly double that of second-instar silkworms (approximately 2 cm versus 1 cm) and are reared at a lower density. When counting third-instar silkworms, the results in Figure 12 and Table 7 show that, owing to their larger body size and more pronounced features, the counting outcomes are even more satisfactory. This demonstrates that the YSC method can meet the requirements of practical applications.
In addition, to test the universality of the method, a comparative counting experiment was conducted on a light-spotted and a common-spotted variety, Jinxiu No. 2 and Huakang No. 3, which are widely used in commercial artificial-feed rearing and distributed at scale. The selected test videos were all 40 s long; counting was performed between the 10th and 29th seconds, every two seconds (i.e., every 40 frames). The detected numbers were compared with the actual numbers, and the comparative results are illustrated in Figure 13.
The errors relative to the actual numbers were all within 5%. Compared with the previously mentioned density-map counting method of Bereciartua-Pérez et al., the deep learning detectors of Saradopoulos et al., and the Faster R-CNN-based methods of Li et al., all of which apply only to single static images, the YSC method not only achieves high detection accuracy but also greatly expands the detectable range and quantity, demonstrating good universality [33,34]. Furthermore, the YSC method substantially reduces counting errors compared with statistical sampling strategies such as five-point sampling and equidistant sampling.
Some limitations of this methodology should be noted. Experiments have not yet been conducted on silkworm varieties with markedly distinct spot-color traits, such as Tea Spot, Black Pickaxe Spot, and Tiger Spot. Future work will expand and refine the dataset to cover a broader range of silkworm varieties.

5. Conclusions

This study proposes YSC, a method for counting young silkworms in a co-rearing environment that improves detection, tracking, and counting. For detection, the MSFF module is introduced to address morphological changes in young silkworms, and the EMA-CSP module is designed to cope with the complex background and various disturbances of the rearing environment. For tracking, ByteTrack follows the detected silkworm bodies, and its matching method for overlapping detection boxes is optimized to overcome ID switching caused by overlap and occlusion between adjacent silkworms. A "detection line" method is designed to prevent repeated counting. Tests on random test videos show that the method achieves a counting accuracy above 90% in all cases. Finally, the generalization of the method was verified: it also counts accurately for other light-spotted and common-spotted young silkworm varieties that are widely distributed in commercial artificial-feed rearing. This work provides effective support for the accurate detection and counting of young silkworms for precision rearing and high-quality breeding, and the proposed detection-tracking-based counting framework may also assist in counting other small animals and densely planted crops.

Author Contributions

Conceptualization, Z.L., Y.Y. and M.L.; methodology, Z.L. and M.L.; software, Z.L.; validation, Z.L.; formal analysis, Z.L. and M.L.; investigation, H.C., M.S. and Z.S.; resources, F.T., F.L., T.S. and G.Z.; data curation, Z.L., M.L. and H.C.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L., M.L. and Y.Y.; supervision, Z.S., F.T., F.L. and Y.Y.; funding acquisition, Z.S., M.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong Province Key Research and Development Plan Project (No. 2022TZXD0042); the Special National Key Research and Development Plan (No. 2023YFD1600900); the Shandong Province Modern Agricultural Industry Technology System, China (No. SDAIT-18-06); the China Agriculture Research System of MOF and MARA (CARS-18); and the National Natural Science Foundation of China (No. 32001419).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

Author Tingju Sun was employed by the company Shandong Guangtong Silkworm Rearing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

A(c): The association accuracy
Ac: The counting accuracy
AF-RCNN: Anchor-Free Region-based Convolutional Neural Network
ah: The width of the high-score detection boxes
al: The width of the low-score detection boxes
AP50: The AP value when the IoU threshold is 50%
AP50:95: The average AP value under IoU thresholds from 50% to 95% at intervals of 5%
AssA: Evaluates association accuracy
at: The width of the smallest rectangle enclosing the high- and low-score detection boxes
bh: The height of the high-score detection boxes
bl: The height of the low-score detection boxes
bt: The height of the smallest rectangle enclosing the high- and low-score detection boxes
C: A point that belongs to the set of True Positives (TP)
CAN: Context-Aware Network
CS-YOLOX-Byte: Counting of Silkworm-YOLOX Fusion ByteTrack
CSP: Cross Stage Partial
DeepSORT: Simple Online and Realtime Tracking with a deep association metric
DetA: Assesses detection accuracy
EMA: Efficient Multi-scale Attention
F1: Harmonic mean of precision and recall
FN: False Negative
FNA(c): The association prediction accuracy for unpredicted trajectories
FNt: The number of missed detections in the t-th frame
FP: False Positive
FPA(c): The false association precision of trajectory predictions
FPS: Frames Per Second
FPt: The number of false detections in the t-th frame
frame: The total count of frames
GTt: The total number of tracked objects in the t-th frame
HOTA: Higher Order Tracking Accuracy
IDF1: Identification F1 score
IDFN: False negative IDs
IDFP: False positive IDs
IDS: Identity switches
IDSWt: The number of ID switches for tracked objects within the t-th frame
IDTP: True positive IDs
IoU: Intersection over Union
IPFC: In-field Pest in Food Crop
mAP: mean Average Precision
MOTA: Multiple Object Tracking Accuracy
MSFF: Multi-Scale Feature Fusion
Numc: The detected number of young silkworms
Numt: The true number of young silkworms
P: Precision
PAFPN: Path Aggregation Feature Pyramid Network
R: Recall
R-CNN: Regions with CNN features
RCS-OSA: Reparametrized Convolution based on channel Shuffle-One-Shot Aggregation
SORT: Simple Online and Realtime Tracking
timeelapsed: The duration of continuous operation
TN: True Negative
TP: True Positive
TPA(c): The accuracy of correctly associated detections
α: A hyperparameter weighting the influence of the distance between the centre points
β: A hyperparameter weighting the influence of the difference in width and height
ρ(cl, ch): The Euclidean distance between the centre points of the low- and high-score boxes

References

  1. Li, J.; Liang, Q.; Gu, G. The Impact of the New Crown Pneumonia Epidemic on China’s Silk Industry and Development Countermeasures. Chin. J. Anim. Sci. 2023, 59, 336–341. [Google Scholar] [CrossRef]
  2. Feng, H.; Li, J. The development history and characteristics of sericulture industry in 60 years of new China. China Seric. 2014, 35, 1–10. [Google Scholar] [CrossRef]
  3. Gu, G.; Li, J. Development of the Silk Industry and Industrialized Silkworm Rearing Using Artificial Diet for All Life Stages. Newsl. Sericult. Sci. 2019, 39, 46–48. [Google Scholar]
  4. Li, J.; Gu, G. Development Trends and Policy Recommendations of China’s Sericulture Industry in 2022. Chin. J. Anim. Sci. 2022, 58, 270–274. [Google Scholar] [CrossRef]
  5. Wang, Y.; Yang, H.; Chen, Y.; Chen, S.; Jiang, Y.; Cui, C.; Liu, M.; Tang, H.; Fan, Y.; Yang, Q.; et al. Reflections on the co-rearing of commercialized young silkworms on artificial feed in Yunnan province. China Seric. 2023, 44, 38–43. [Google Scholar] [CrossRef]
  6. Cheng, Y.; Yuan, G.; Zhu, J.; Xiong, H.; Cao, N.; He, Y.; Xiao, H.; Yang, Z. Status and Prospects of Artificial Feed Rearing Technology for Silkworms in Sichuan Province. Sichuan Canye 2023, 49, 49–52. [Google Scholar] [CrossRef]
  7. Li, Y.; Zhang, G. The large-scale artificial feed rearing of young silkworms in Guangxi has been successfully trialed. Guangxi Seric. 2023, 56, 41. [Google Scholar]
  8. Li, J.; Gu, G.; Cui, W. Progress and Prospects of Silkworm Rearing Using Artificial Feed. China Seric. 2021, 42, 46–52. [Google Scholar] [CrossRef]
  9. Jin, Q.; Feng, J.; Wu, H.; Zhan, P.; Feng, H. Artificial Feed Rearing for Young Silkworms + Leaf Strip Rearing for Larger Silkworms. China Seric. 2019, 40, 65–69. [Google Scholar] [CrossRef]
  10. Wang, L.; Hu, S. Promotion of the Babei Model for Factory-based Silkworm Rearing Using Artificial Feed for All Life Stages. Bull. Seric. 2020, 51, 37–38+54. [Google Scholar]
  11. Cheng, Y.; Cao, N.; Xiao, H.; He, Y. Suggestions on Prevention and Emergency Response Measures for Mold Formation in Artificial Feed for Bombyx mori. Sichuan Canye 2024, 52, 37–38. [Google Scholar] [CrossRef]
  12. Yan, Z.; Guan, S. Key technical measures for domestic silkworm rearing. Jiangsu Seric. 2017, 2017, 16–17. [Google Scholar]
  13. Lan, B.; Qin, M.; Qin, X. Technical measures to improve the quality of co-rearing of young silkworms in seed cocoon production. Guangdong Canye 2014, 48, 3–5+8. [Google Scholar] [CrossRef]
  14. Suo, X.; Liu, Z.; Lei, S.; Wang, J.; Zhao, Y. Aphid identification and counting based on smartphone and machine vision. J. Sens. 2017, 2017, 3964376. [Google Scholar] [CrossRef]
  15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  16. Li, W.; Wang, D.; Li, M.; Gao, Y.; Wu, J.; Yang, X. Field detection of tiny pests from sticky trap images using deep learning in agricultural greenhouse. Comput. Electron. Agric. 2021, 183, 106048. [Google Scholar] [CrossRef]
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  18. Wang, F.; Wang, R.; Xie, C.; Yang, P.; Liu, L. Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition. Comput. Electron. Agric. 2020, 169, 105222. [Google Scholar] [CrossRef]
  19. Jiao, L.; Dong, S.; Zhang, S.; Xie, C.; Wang, H. AF-RCNN: An anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 2020, 174, 105522. [Google Scholar] [CrossRef]
  20. Ma, X.; Wang, M.; Kuang, H.; Tang, L.; Liu, X. Detecting and counting silkworms using improved YOLOv8n. Trans. Chin. Soc. Agric. Eng. 2024, 40, 143–151. [Google Scholar] [CrossRef]
  21. Zhang, R.; Lin, Y.; Wu, Y.; Deng, L.; Zhang, H.; Liao, M.; Peng, Y. MvMRL: A multi-view molecular representation learning method for molecular property prediction. Brief. Bioinform. 2024, 25, bbae298. [Google Scholar] [CrossRef]
  22. Wei, X.; Wu, H.; Wang, Z.; Zhu, J.; Wang, W.; Wang, J.; Wang, Y.; Wang, C. Rumen-protected lysine supplementation improved amino acid balance, nitrogen utilization and altered hindgut microbiota of dairy cows. Anim. Nutr. 2023, 15, 320–331. [Google Scholar] [CrossRef] [PubMed]
  23. Xu, H.; Li, Q.; Chen, J. Highlight Removal from A Single Grayscale Image Using Attentive GAN. Appl. Artif. Intell. 2022, 36, 1988441. [Google Scholar] [CrossRef]
  24. Wang, B.; Yang, M.; Cao, P.; Liu, Y. A novel embedded cross framework for high-resolution salient object detection. Appl. Intell. 2025, 55, 277. [Google Scholar] [CrossRef]
  25. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar] [CrossRef]
  26. Ouyang, D.; He, S.; Zhan, J.; Guo, H.; Huang, Z.; Luo, M.; Zhang, G. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Proceedings of the Computer Vision–ECCV 2022, 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 1–21. [Google Scholar] [CrossRef]
  28. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
  29. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  30. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; NanoCode012; Kwon, Y.; Xie, T.; Fang, J.; Imyhxy; Michael, K.; et al. ultralytics/yolov5: V6.1–TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference; Zenodo: Geneva, Switzerland, 2022. [Google Scholar] [CrossRef]
  31. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar] [CrossRef]
  32. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar] [CrossRef]
  33. Bereciartua-Pérez, A.; Gómez, L.; Picón, A.; Navarra-Mestre, R.; Klukas, C.; Eggers, T. Insect counting through deep learning-based density maps estimation. Comput. Electron. Agric. 2022, 197, 106933. [Google Scholar] [CrossRef]
  34. Saradopoulos, L.; Potamitis, L.; Konstantaras, A.; Eliopoulos, P.; Ntalampiras, S.; Rigakis, I. Image-Based Insect Counting Embedded in E-Traps That Learn without Manual Image Annotation and Self-Dispose Captured Insects. Information 2023, 14, 267. [Google Scholar] [CrossRef]
Figure 1. Experimental scene for co-rearing of young silkworms. The image acquisition device is on the left; an image of co-rearing young silkworms is shown in the red box. 1. Silkworm tray; 2. camera; 3. slider; 4. electric linear actuator; 5. sliding rail; 6. supporting frame.
Figure 2. Morphological variability due to obstruction in the young silkworm rearing scene.
Figure 3. System process diagram. The parts marked with red dashed lines are the new contributions of this paper.
Figure 4. EMA-CSP modules. (a) CSP1-X with EMA mechanism; (b) CSP2-X with EMA mechanism.
Figure 5. Schematic diagram of MSFF structure.
Figure 6. ByteTrack workflow.
Figure 7. Schematic diagram of low-score box filtering. (a) The red, yellow, blue, and green rectangles represent detection bounding boxes. The red box is a high-score bounding box, while the yellow, blue, and green boxes are low-score bounding boxes that appear after the disappearance of the high-score box; (b) the IoU (intersection over union) and T0.5 values of these three boxes. They exhibit an identical IoU value of 0.56, which tends to induce misclassification. When comparing the T0.5 values, the yellow box, with a value below 0.5, will be prioritized for elimination.
Figure 8. Improved ByteTrack workflow. The parts marked with red dashed lines are the new contributions of this paper.
Figure 9. The counting process of the counter. In the figure, “num” signifies the total number of targets in the current frame, “silkworm” denotes the current count of silkworms being tallied, and “cross IDs” indicates the ID values of the silkworms currently being counted. (a) Counting commences at this point, with an initial count of 0. (b) The left boundary line of the detection frame for silkworm ID 14 crosses the detection line, resulting in a count of 1. (c) Both the left boundary line of the detection frame for silkworm ID 4 and the right boundary line of the detection frame for silkworm ID 5 cross the detection line, leading to a count of 3. (d) The left boundary lines of the detection frames for silkworms ID 9, 13, 16, 18, and 10 all cross the detection line, yielding a total count of 8.
Figure 10. Visual comparison of young silkworm detection results by different detectors. In the figure, yellow circles indicate false detections, and green circles indicate missed detections. (a) The original image, where occlusion exists between ID 5 and 6, 11 and 12, 15 and 14, 16, as well as 23 and 24. There is also occlusion between ID 26 and the feed. ID 3, 7, and 21 are located at the edges of the image, with only a majority of their silkworm bodies visible; (b) the detection outcomes of YOLOv4, which includes four false positives and three false negatives; (c) the detection results of YOLOv5s, showcasing two false positives and three false negatives; (d) the detection outcomes of YOLOX-S, with two false positives and one false negative observed; (e) the detection results of our method, which exhibits no false negatives; however, there is one instance of a false positive where the height of the bounding box at target 12 is notably larger than the dimensions of the detected object; (f) the detection results of YOLOv8s showed two false positives and two false negatives; (g) the detection results of YOLOv10s, which exhibits two false positives; (h) the detection results of YOLO11s also showed two false positives.
Figure 11. Visual comparison of silkworm tracking results by different trackers. (a) Frame 8; (b) Frame 23; (c) Frame 56.
Figure 12. Counting effectiveness illustration for third-instar silkworms.
Figure 13. Comparison chart of counting results for various silkworm varieties.
Table 1. Grouping information of the image dataset.
Dataset Classification | Variety of Young Silkworms | Number of Images | Number of Samples
Training set | Liangguang No. 2 | 720 | 20,048
Training set | Jingsong × Haoyue | 720 | 18,795
Validation set | Liangguang No. 2 | 240 | 5243
Validation set | Jingsong × Haoyue | 240 | 5104
Test set | Liangguang No. 2 | 240 | 5163
Test set | Jingsong × Haoyue | 240 | 6323
Total | | 2400 | 100,201
Table 2. Main configurations of hardware platform.
Configuration | Parameter
CPU | Intel Xeon Gold 5218R
Memory | 128 G
GPU | GeForce RTX 3090
Accelerated environment | CUDA 11.3, cuDNN 8.0.5
Operating system | Windows 10.0
Development environment | Python 3.8, Pytorch 1.12.0
Table 3. Evaluation results of different target detection algorithms.
Algorithms | F1/% (↑) | Recall/% (↑) | Precision/% (↑) | AP50/% (↑) | AP50:95/% (↑)
YOLOv4 | 74.2 | 71.8 | 76.7 | 72.5 | 55.3
YOLOv5s | 85.2 | 84.9 | 85.5 | 85.1 | 66.5
YOLOX-S | 87.2 | 86.4 | 88.1 | 87.4 | 69.6
Our method | 89.6 | 87.9 | 91.3 | 89.1 | 72.7
YOLOv8s | 85.8 | 85.4 | 86.2 | 85.7 | 68.1
YOLOv10s | 87.9 | 86.5 | 89.4 | 88.2 | 70.2
YOLO11s | 88.1 | 87.1 | 89.2 | 89.0 | 71.5
Note: ‘↑’ indicates that the higher the value, the better the detection effect.
Table 4. Evaluation results of different target tracking algorithms.
Algorithms | MOTA/% (↑) | IDF1/% (↑) | HOTA/% (↑) | FP (↓) | FN (↓) | IDS (↓) | FPS (↑)
SORT | 81.6 | 83.0 | 67.2 | 554 | 77 | 55 | 51
DeepSORT | 83.7 | 85.5 | 69.9 | 301 | 59 | 43 | 33
ByteTrack | 85.6 | 87.9 | 74.3 | 224 | 37 | 33 | 38
Our method | 88.3 | 90.2 | 78.1 | 113 | 21 | 18 | 29
Note: ‘↑’ indicates that the higher the value, the better the detection effect, while ‘↓’ indicates that the lower the value, the better the detection effect.
Table 5. Ablation tests of the three proposed improvements.
No. | ByteTrack | MSFF | EMA | Overlapping Detection Box Matching | MOTA/% (↑) | IDF1/% (↑) | HOTA/% (↑) | FPS (↑)
1 | √ | | | | 84.9 | 87.3 | 72.8 | 38
2 | √ | √ | | | 86.4 | 88.5 | 74.6 | 35
3 | √ | | √ | | 85.1 | 84.8 | 74.1 | 37
4 | √ | | | √ | 86.6 | 88.6 | 75.2 | 33
5 | √ | √ | √ | | 87.4 | 87.9 | 74.9 | 34
6 | √ | | √ | √ | 87.3 | 88.8 | 75.9 | 32
7 | √ | √ | | √ | 87.2 | 89.5 | 76.4 | 30
8 | √ | √ | √ | √ | 88.3 | 90.4 | 76.7 | 29
Note: ‘↑’ indicates that the higher the value, the better the detection effect.
Table 6. Verification results of young silkworm counting methods based on different detection-tracking combination approaches.
Methods | Test Videos | Numt | Numc | Max ID | MOTA | Ac/%
I (YOLOv4-Byte) | A | 2974 | 4892 | 5483 | 0.63 | 35.5
I (YOLOv4-Byte) | B | 3088 | 5138 | 5733 | 0.70 | 33.6
I (YOLOv4-Byte) | C | 2682 | 4312 | 4948 | 0.64 | 39.2
II (YOLOv5-Byte) | A | 2974 | 3997 | 5196 | 0.67 | 65.6
II (YOLOv5-Byte) | B | 3088 | 4093 | 5337 | 0.72 | 67.5
II (YOLOv5-Byte) | C | 2682 | 3666 | 4933 | 0.69 | 63.3
III (YOLOX-Byte) | A | 2974 | 3682 | 4543 | 0.82 | 76.2
III (YOLOX-Byte) | B | 3088 | 3661 | 4495 | 0.75 | 81.4
III (YOLOX-Byte) | C | 2682 | 3234 | 4075 | 0.77 | 79.4
IV (YSC, our method) | A | 2974 | 3230 | 3468 | 0.85 | 91.4
IV (YSC, our method) | B | 3088 | 3364 | 3482 | 0.80 | 91.1
IV (YSC, our method) | C | 2682 | 2936 | 2921 | 0.82 | 90.5
Table 7. Counting results for third-instar silkworms.
Test Video | Numt | Numc | Max ID | MOTA | Ac/%
A | 3123 | 3358 | 3552 | 0.84 | 92.5
B | 2866 | 3063 | 3244 | 0.86 | 93.1
C | 3049 | 3273 | 3431 | 0.84 | 92.7
Share and Cite

Li, Z.; Chang, H.; Shang, M.; Song, Z.; Tian, F.; Li, F.; Zhang, G.; Sun, T.; Yan, Y.; Liu, M. A Detection Line Counting Method Based on Multi-Target Detection and Tracking for Precision Rearing and High-Quality Breeding of Young Silkworm (Bombyx mori). Agriculture 2025, 15, 1524. https://doi.org/10.3390/agriculture15141524
