1. Introduction
The Chinese mitten crab (Eriocheir sinensis), also known as the river crab, possesses significant nutritional and economic value [1,2]. In China, river crabs are mainly used for two purposes: fresh food and deep processing. As of 2022, the total freshwater aquaculture production of river crabs reached 815,000 tons, of which the fresh food market accounted for 98.89% and the deep-processing market only 1.11% [3]. Compared with the fresh food market, the deep-processing market for river crabs is significantly underdeveloped, largely because of the crab's complex structure. Current processing methods are mostly partial and cumbersome, involving multiple steps such as removing the dorsal shell, crab legs, and crab yolk, and separating the shells from the meat of various parts of the crab. Moreover, most of these processes rely on manual shelling instruments (e.g., hammers, shovels, needles, forks, and scissors) to separate the different crab parts [4,5], which results in high labor intensity, low processing efficiency, and a risk of contaminating the crab meat. To improve processing efficiency, most current research focuses on devices designed for specific parts of the crab, including tools for extruding shells and separating meat from crab feet, cutting devices for the crab body, and mechanisms for adsorbing and removing the dorsal carapace [6,7]. Although these devices improve specific processing steps, they address only certain aspects of the process, making it difficult to achieve precise processing positions. As a result, processing outcomes are less than ideal and the level of automation is low, which significantly hampers the development of the deep-processing industry for river crabs [8]. Recent advances in robotics and image recognition technology have made it possible to simulate the manual processing of river crabs. A major challenge in this process is the quick and accurate identification of the key points for each step, such as the joints where the crab legs and pincers connect to the crab body, the tail of the river crab, and the yolk pipeline linked to the dorsal carapace. Among these key points, detecting the crab's tail is the first step in the deep processing of river crabs, as it is required for removing the dorsal armor. It is therefore essential to clarify the key position for dorsal armor removal and to explore automated, intelligent methods for processing river crabs into parts. This approach will improve processing efficiency and support the sustainable development of river crab processing equipment.
In recent years, the rapid development of deep learning in image recognition has led to the widespread application of machine vision in the deep processing of aquatic products, improving processing accuracy and promoting automation [9]. In 2018, Wang et al. pioneered the integration of image analysis into automated crab processing by proposing a convolutional neural network-based computer vision algorithm, achieving a remarkable accuracy of 97.67% in detecting dorsal fin nodes of green crabs [10]. Subsequently, Wang et al. introduced a method to classify the quality of river crabs by combining image processing techniques, genetic algorithms, and BP neural networks, achieving an accuracy rate of 92.7% [11]. While these methods represented significant strides in the intelligent processing of aquatic products, challenges such as large network models, slower detection speeds, and insufficient real-time performance persisted. In recent years, the YOLO (You Only Look Once) algorithm [12] and advances in model lightweighting have substantially improved the efficiency of intelligent processing. Ye et al. enhanced the YOLOv5 model by integrating the CA attention mechanism and Bottleneck Transformer, addressing the labor intensity and inefficiency of traditional crayfish sorting and providing a rapid automated solution [13]. Chen replaced the CSP-Darknet53 backbone of YOLOv4 with GhostNet and added the SE attention mechanism and CIoU, achieving automatic sorting of both shelled and peeled shrimp with a lightweight model reaching a mAP of 98.5% [14]. Chen et al. adopted the PP-LCNet architecture in YOLOv5 to address the high labor intensity and inefficiency of traditional manual crayfish sorting; using PP-LCNet as the backbone, they further classified the quality of South American white shrimp by incorporating a DepthSepConv module and replacing the SiLU activation function, achieving a mAP of 98.5% with 4.8 M parameters and 9.0 GFLOPs [15]. Although these methods have successfully reduced computational load, the resulting models may still be too heavy for some applications, and lightweighting often compromises detection capability. There is therefore an urgent need for new strategies to enhance the detection capability of lightweight YOLO networks, improving both accuracy and real-time performance in aquatic processing to meet the demands of current lightweight edge devices.
This paper explores ways to improve the intelligence of equipment used for processing river crabs and to achieve precise part processing. The main focus is the crab tail, the key area for removing the dorsal armor, and efficient methods for identifying and localizing it using machine vision. To this end, we propose a lightweight deep learning model called YOLOv7-SPSD, based on YOLOv7-tiny. It achieves initial lightweighting by introducing the Slimneck module [16], PConv [17], and the SimAM attention mechanism [18]. The model then applies the DepGraph pruning algorithm [19] to remove redundant parameters, further reducing model size to meet the requirements of edge devices. This approach opens up possibilities for the application of lightweight networks in river crab processing equipment.
3. Results and Discussion
3.1. Experimental Environment
Experiments were conducted using the PyTorch 2.0 deep learning framework on a hardware platform with a 13th Gen Intel(R) Core(TM) i5-13500HX @ 2.50 GHz CPU (16 GB RAM; Intel Corporation, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 Laptop GPU (8 GB VRAM; NVIDIA Corporation, Santa Clara, CA, USA) running Windows 11. Training was configured for 300 epochs with a batch size of 16; the input image size was 640 × 640; the initial learning rate was set to 0.1; a stochastic gradient descent (SGD) optimizer was used, and no pretrained weights were loaded.
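For reference, the training settings above can be collected into a single configuration object, sketched here as a plain Python dictionary. The key names are illustrative only and do not come from this study's training scripts:

```python
# Hypothetical training configuration mirroring the settings reported above.
train_config = {
    "epochs": 300,           # total training cycles
    "batch_size": 16,
    "img_size": (640, 640),  # input resolution (width, height)
    "lr0": 0.1,              # initial learning rate
    "optimizer": "SGD",      # stochastic gradient descent
    "pretrained": False,     # no pretrained weights loaded
}

def describe(cfg: dict) -> str:
    """Render a one-line summary of a training configuration."""
    return (f"{cfg['epochs']} epochs, batch {cfg['batch_size']}, "
            f"{cfg['img_size'][0]}x{cfg['img_size'][1]}, lr0={cfg['lr0']}, "
            f"{cfg['optimizer']}")

print(describe(train_config))  # → 300 epochs, batch 16, 640x640, lr0=0.1, SGD
```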
3.2. Indicators for Evaluation
In this paper, P (precision), R (recall), F1 (F1-score), AP (average precision), mAP (mean average precision), the number of model parameters, GFLOPs (giga floating-point operations), and model size were utilized as evaluation metrics, with the IoU (intersection over union) threshold set at 0.65. Precision is defined as the ratio of correctly identified river crab parts to the total number of detected river crab parts. Recall is the ratio of correctly detected river crab parts to the total number of parts in the dataset. F1 is the harmonic mean of precision and recall; generally, a higher F1 indicates greater model stability. mAP evaluates overall performance across various confidence thresholds. Additionally, the number of parameters, computational requirements, and model size assess the complexity of the model.
P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2 × P × R / (P + R), where TP represents a true positive, FP a false positive, TN a true negative, and FN a false negative.
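As a concrete check of these definitions, the three metrics can be computed directly from the confusion counts. This is a minimal sketch; the counts below are illustrative, not results from this study:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute P, R, and F1 from true-positive, false-positive,
    and false-negative counts of detected crab parts."""
    p = tp / (tp + fp) if tp + fp else 0.0      # correct detections / all detections
    r = tp / (tp + fn) if tp + fn else 0.0      # correct detections / all ground truths
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f1

# Illustrative counts only.
p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=5)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.95 0.95 0.95
```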
3.3. Ablation Experiment
To verify the effectiveness of Slimneck, PConv, SimAM, and DepGraph pruning, ablation experiments were carried out on the YOLOv7-tiny model; the results are shown in Table 1.
As shown in Table 1, replacing the neck of the original YOLOv7-tiny model with Slimneck reduced GFLOPs by 0.8, model size by 0.3 MB, and parameters by 3.5%. While mAP0.5 remained unchanged, P, R, and F1 decreased slightly. Subsequently, integrating PConv into the complex ELAN module in the backbone retained the advantages of ELAN while significantly reducing the computational overhead of feature extraction. After adopting the Slimneck structure and replacing the ELAN module with the ELAN-P module, mAP0.5 decreased by 0.1% compared to the original model, but GFLOPs were reduced by 3.9, model size by 2.8 MB, and parameters by 25.1%. These changes partially compensated for the decreases in P, R, and F1 and further enhanced the feature extraction ability for the crab tail. Given the limited feature fusion capability of Slimneck, the SimAM attention mechanism was added to the SPPCSPC and each Concat module within the neck to strengthen feature fusion. After integrating SimAM, GFLOPs, size, and parameters did not increase, while mAP0.5 and F1 matched the baseline model and recall increased by 0.2%. Therefore, by concurrently employing Slimneck, ELAN-P, and SimAM, the model maintains accuracy and precision while achieving weight reduction.
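Because SimAM adds no learnable parameters, its energy-based weighting is cheap to apply at every fusion point. The following NumPy version follows the published SimAM formulation; it is an illustrative re-implementation, not code extracted from this study's model:

```python
import numpy as np

def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM attention over a feature map x of shape (C, H, W).

    Each activation is gated by a sigmoid of its inverse energy, which is
    larger for neurons that stand out from their channel's spatial mean.
    """
    c, h, w = x.shape
    n = h * w - 1
    # Squared deviation from the per-channel spatial mean.
    d = (x - x.mean(axis=(1, 2), keepdims=True)) ** 2
    # Per-channel variance estimate over the remaining positions.
    v = d.sum(axis=(1, 2), keepdims=True) / n
    # Inverse energy: distinctive activations receive higher weights.
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))  # sigmoid gate

feat = np.random.randn(8, 16, 16).astype(np.float32)
out = simam(feat)
print(out.shape)  # same shape as the input: (8, 16, 16)
```

Since the gate is a sigmoid, the output keeps the input's shape and only rescales activations, which is why the ablation shows no increase in GFLOPs, size, or parameters.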
Finally, the model integrating the three modules was pruned using the DepGraph method to eliminate redundant parameters and further reduce its weight. After pruning, mAP0.5 increased by 0.1%, recall by 0.2%, and F1 by 0.1%, while GFLOPs decreased by 74.6%, size by 8.1 MB, and parameters by 71.6%. The ablation results are illustrated in Figure 9, showing notable reductions in GFLOPs, size, and parameters compared with the baseline model. The three module improvements, together with the DepGraph pruning algorithm, preserve the detection performance of the YOLOv7-SPSD model while decreasing its parameter count, computational load, and overall size.
3.4. Pruning Experiment
In this study, we applied the DepGraph algorithm to prune the enhanced model. We established a pruning multiplier parameter and iteratively adjusted it to find the optimal value, allowing the model to be compressed to its maximum potential under consistent experimental conditions. The results of the pruning and compression experiments are presented in Table 2.
Analysis of the data reveals that mAP0.5 remains 99.6% at compression multipliers of 1.5, 2, 2.2, 2.5, and 2.7. At a multiplier of 2.8, however, mAP0.5 drops to 99.5%, a decline of 0.1%. To preserve crab tail detection accuracy, we therefore set the upper limit of the pruning multiplier to 2.7. Furthermore, Figure 10 illustrates radar plots of the data in the table; the largest plot area corresponds to a multiplier of 2.7, indicating that this is the most effective pruning multiplier for the DepGraph algorithm.
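The selection rule described above, taking the largest multiplier that still attains the best observed mAP, can be expressed compactly. The mAP values repeat those reported for Table 2; the helper function is illustrative, not part of the study's pipeline:

```python
def best_multiplier(results: dict) -> float:
    """Pick the largest pruning multiplier that still attains the
    highest mAP observed across all tested multipliers."""
    top = max(results.values())
    return max(m for m, acc in results.items() if acc == top)

# mAP0.5 (%) per compression multiplier, as reported in Table 2.
map_by_multiplier = {1.5: 99.6, 2.0: 99.6, 2.2: 99.6, 2.5: 99.6, 2.7: 99.6, 2.8: 99.5}
print(best_multiplier(map_by_multiplier))  # → 2.7
```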
To evaluate the performance of the DepGraph algorithm, this study compares the maximally compressed model produced by DepGraph with several other popular pruning algorithms, including L1, Lamp, Slim, and Taylor. All experiments used the enhanced lightweight base model, pruned at various ratios under the same conditions. The results of these experiments are presented in Table 3.
Analysis of the table reveals that the DepGraph pruning algorithm offers significant advantages in detection accuracy, parameter count, computational demand, model size, and compression ratio. DepGraph maintains 99.6% mAP0.5 even when compressed by a factor of 2.7, a 0.1% increase over the model's performance before pruning. The post-pruning parameter count, computational requirements, and model size are also more favorable than those of the other pruning algorithms, which, while achieving some reduction, typically suffer a significant decline in accuracy as the compression ratio increases. The radar plots in Figure 11 show that the area representing the model pruned with DepGraph is the largest, indicating superior performance across all metrics compared with the other algorithms.
3.5. Model Comparison Experiment
To evaluate the effectiveness of the YOLOv7-SPSD model in detecting crab tails, this study conducted a comprehensive comparison against mainstream target detection networks, including the YOLOR, YOLOv5, YOLOv7, and YOLOv8 series. All networks were trained on the same dataset without any official default weights. Additionally, to assess performance on edge devices, the CPU was used to measure each network's FPS; the input size for all models was set to 640 × 640, and all 600 experiment images were evaluated. The results for each target detection model are presented in Table 4.
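CPU-side FPS can be measured by timing the full inference pass over the evaluation images, as in this generic sketch. Here `run_inference` stands in for any model's forward pass and is an assumption of this example, not this study's code:

```python
import time

def measure_fps(run_inference, images) -> float:
    """Average frames per second of `run_inference` over `images`,
    timed with a wall clock around the whole loop."""
    start = time.perf_counter()
    for img in images:
        run_inference(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed

# Toy stand-in: a no-op "model" over 600 dummy frames.
dummy_images = [None] * 600
fps = measure_fps(lambda img: None, dummy_images)
print(f"{fps:.1f} FPS")
```

Averaging over the whole image set, rather than a single frame, smooths out per-frame timing jitter on a CPU.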
From the analysis presented in the table, it is evident that YOLOR-CSP and YOLOv7 achieve higher mAP0.5 and F1-scores. However, these models have large parameter counts and high computational intensity, resulting in low FPS on CPUs, which complicates their deployment on edge computing devices with limited GPU resources. YOLOv5s and YOLOv8n detect crab tails effectively, but their overall performance is inferior to that of YOLOv7-tiny. Notably, YOLOv5-Lite, despite its fewer parameters, lower computational demands, and smaller model size, exhibits significantly lower precision, recall, mAP0.5, and F1-score than the other networks. While YOLOv5n achieves a higher FPS, its mAP0.5 and F1-score are 0.3% and 1.6% lower, respectively, than those of YOLOv7-SPSD, and its parameter count, computational demands, and model size all exceed those of YOLOv7-SPSD.
The enhanced and pruned YOLOv7-SPSD model demonstrates significant efficiency gains over YOLOv7-tiny, with parameters reduced by 71.6%, GFLOPs by 74.6%, and size by 69.2%, alongside slight improvements in performance: precision, mAP0.5, and F1-score each increased by 0.1%, and recall by 0.2%. The YOLOv7-SPSD model thus significantly reduces computational load while maintaining high performance. The detection results in Figure 12 indicate that, although the YOLOv5 series and YOLOv8 are slightly less effective than the YOLOv7 series, YOLOv7-SPSD achieves performance comparable to, and in some complex environments better than, YOLOv7 and YOLOR-CSP. Its detection accuracy also surpasses that of the other smaller models, expanding its potential for broader applications.
4. Conclusions
This study focuses on the precise part-based processing of river crabs, specifically the crab tail, which is crucial for removing the dorsal armor. We propose the YOLOv7-SPSD lightweight deep learning model to efficiently identify and locate the key processing areas of river crabs. YOLOv7-SPSD enhances the YOLOv7-tiny framework with the lightweight Slimneck module, PConv, and the SimAM attention mechanism, and further applies the DepGraph pruning method, effectively combining modular enhancements with advanced pruning techniques.
Our main findings are as follows: the YOLOv7-SPSD model, enhanced through modularization and pruning, achieves clear improvements. Specifically, mAP0.5 increased by 0.1%, R by 0.2%, and F1-score by 0.1%, while FPS improved by 2.4; GFLOPs were reduced by 74.6%, size by 8.1 MB, and parameters by 71.6%. Compared with current mainstream target detection algorithms, YOLOv7-SPSD outperforms most lightweight networks in FPS, parameters, GFLOPs, and size, while its detection efficacy matches or even surpasses that of larger networks such as YOLOv7 and YOLOR-CSP. Pruning experiments further demonstrate that, when compressed by a factor of 2.7 with the DepGraph algorithm, the model gains 0.1% in performance over its unpruned counterpart, with lower parameters, GFLOPs, and size than those achieved by other pruning algorithms such as L1, Lamp, Slim, and Taylor.
In conclusion, the YOLOv7-SPSD lightweight deep learning model presented in this paper is both lightweight and effective for detection tasks. It introduces an innovative concept for research on lightweight models and allows robots to incorporate machine vision for accurate, part-based processing of river crabs. This approach provides a new method for enhancing the intelligence of equipment used in the deep processing of river crabs.