1. Introduction
The Chinese mitten crab (Eriocheir sinensis), also known as the hairy crab, is one of China's traditional aquatic delicacies. Crabs cultured in Yangcheng Lake and other well-known lakes are prized for their excellent quality. In recent years, however, some unscrupulous traders have used “bathing crabs” to pass off inferior products as famous lake-raised ones, a practice that not only harms the rights and interests of consumers but also seriously infringes on aquatic product brands. Accurate and efficient identification of the geographic origin of the Chinese mitten crab has therefore become a growing concern for researchers and is of great significance for the assessment and management of aquatic resources.
To reliably determine the geographical source of the Chinese mitten crab, researchers have proposed many methods, which fall mainly into two categories: traditional identification methods and deep learning-based methods.
The traditional methods mainly include morphological index identification and biochemical indicator identification. The morphological identification method [1] examines how the water quality and food composition of the production area affect external morphological indexes of Chinese mitten crabs, such as body length, body color, tooth spacing, and pereiopod length, as well as internal organ morphology, and thereby determines how different production environments shape the crabs' morphological and physiological indexes. However, this method often suffers from low detection efficiency and inaccurate results, and it demands considerable inspector experience, which hinders its wider adoption. In addition, biochemical identification based on biological indexes, the proportion of edible parts, and nutrient composition; element fingerprint identification of trace elements [2]; and electronic nose (tongue) identification of nitrogenous compounds, non-nitrogenous compounds, and volatile substances are all common methods for identifying the geographic origin of the Chinese mitten crab. For example, the Sr isotopic composition of crab shells can serve as a good geographic indicator because the 87Sr/86Sr ratios of the crabs are similar to those of the water in which they grow [3]. Yin et al. [4] used laser ablation combined with a multi-collector inductively coupled plasma mass spectrometer (ICP-MS) to accurately determine the geographic origin of Chinese mitten crabs by analyzing the strontium isotope composition of their shells. Through multi-mineral fingerprinting of Chinese mitten crabs in Yangcheng Lake across an annual cultivation cycle, Xue et al. [5,6] found that the fingerprint characteristics of mineral elements take a long time to stabilize. Using an Agilent 7500ce ICP-MS to compare multi-mineral elements in authentic crabs and in crabs cultured for one month under controlled “bathing” conditions, they found significant differences in the profiles of 11 elements between the two groups, both before and after the bathing process. This finding provides a basis for verifying the origin of Chinese mitten crabs from Yangcheng Lake.
Traditional methods are time-consuming, subject to significant human subjectivity, highly dependent on specialized expertise, and unsuitable for large-scale, rapid traceability applications. With the development of machine learning and image processing technology, more options have become available for aquatic product monitoring and classification, and identification methods continue to multiply [7,8,9,10,11]. Among them, convolutional neural networks offer great advantages in image feature extraction and classification. Whereas traditional classification methods can only distinguish low-dimensional features, neural networks can extract and recognize high-dimensional features, and they have shown good performance and broad application prospects in aquatic product classification and recognition tasks [12,13,14,15,16,17]. Yuan et al. [18] proposed a method based on an improved YOLOv8 for detecting and identifying fish in the electronic monitoring data of commercial fishing vessels; the backbone network used the GCBlock structure to model long-range dependencies and thereby increase feature extraction ability, while the GSConv convolution was used on the neck side to reduce the model's computational load. The results showed that the model can quickly and accurately detect and identify fish in the electronic monitoring data of commercial fishing vessels at low cost. Chunhui Zhang et al. [19] proposed an improved BIAS-YOLO network based on YOLOv5 for automatically detecting different kinds of fish. Building on the original YOLOv5 network, they improved the Spatial Pyramid Pooling (SPP) module and introduced the idea of spatial bias so that information from the whole image could be fully utilized. Experiments showed that the BIAS-YOLO algorithm improves mAP@0.5 by 3.1% over the YOLOv5 network. Starting from the characteristics of fish acoustic scattering, Du et al. [20] extracted singular value features of wavelet packet coefficients, temporal centroid features, and discrete cosine transform coefficient features from fish acoustic scattering signals, made decisions on the extracted features with an SVM, and obtained the final fish classification results through collaborative fusion of multi-directional acoustic scattering data. Zhang et al. [21] proposed a method for detecting fish in static photos using the YOLOv4 architecture with an attention mechanism and multi-scale feature combinations; this is a deep learning-based object detection method specifically designed for underwater environments. Other researchers combine machine learning and computer vision techniques, for example using multi-class support vector machines for classification, probabilistic background models for real-time fish detection in images with complex backgrounds, or models of biological features for fish classification. Ju et al. [22], for instance, utilized an improved AlexNet model featuring object-based soft attention and transfer learning.
Convolutional neural networks are an effective approach to target detection and classification of aquatic products: they can automatically extract image features, handle target objects of different sizes and shapes, and be trained through backpropagation to progressively improve accuracy. However, convolutional neural network-based detection and recognition of commercial hairy crabs faces two problems. First, detection and recognition accuracy is low in real-world scenarios, which hinders fine-grained classification of hairy crabs. Second, models with high detection and recognition accuracy incur higher costs and consume more computing resources. There is relatively little research on how to efficiently detect and recognize commercial hairy crabs at low cost in real-world scenarios.
To address the aforementioned limitations in the hairy crab classification task, this paper proposes a two-stage identification methodology that synergistically combines an enhanced YOLOv10n object detector with an improved IGINC classification model reconstructed from GoogleNet.
The main contributions of this article are summarized as follows:
- (1)
A large image dataset labeled with the geographical origin of the Chinese mitten crab has been established. This dataset contains hairy crabs from different origins captured in multiple forms: under different lighting conditions, from different angles (dorsal and ventral), and in both bundled and unbundled states.
- (2)
A two-stage model is proposed for the traceability of Chinese mitten crabs. The first stage employs an improved YOLOv10n [23] detector that integrates ODConv modules in the backbone, a SlimNeck structure in the neck, and the LSCD detection head. These enhancements improve the detection accuracy of crabs under diverse conditions while significantly reducing computational complexity.
- (3)
In the second stage, a customized IGINC classification model based on GoogleNet is used to classify the detected crabs. By incorporating AC.B and an SE attention mechanism, the model achieves robust and precise identification of the breeding origin, even in visually complex scenes, thereby enhancing traceability and consumer protection.
The remainder of this article is organized as follows: Section 2 discusses the proposed method in detail; Section 3 presents the ablation and model comparison experiments designed for the proposed model, verifying the accuracy and stability of the whole detection model; Section 4 analyzes and discusses the experimental results.
2. Materials and Methods
To better trace the geographical origin of Chinese mitten crabs, we analyzed the available images before designing the algorithm model and identified the following three characteristics:
- (a)
Because research on the biological appearance characteristics of crabs is limited, the species of a crab cannot be identified by directly marking characteristic points in the images.
- (b)
The data comes from live crabs, which were photographed in two main forms, unbundled and bundled, and the visual characteristics differ greatly between the two. In the unbundled state, human hands are needed to restrain the crab to obtain stable images, so the pictures contain invalid information such as hands, and parts of the crab's legs are easily missing. In the bundled state, the rope covers much of the crab's appearance, and a nameplate with the crab's information also appears in the picture.
- (c)
Wild environments present complex backgrounds (e.g., vegetation, sediment), dynamic lighting, and natural occlusions, leading to partial limb loss in crab images.
Based on the above three characteristics, this paper proposes a two-stage detection and classification framework for geographical origin identification of Chinese mitten crabs.
2.1. Overall Architecture
Figure 1 illustrates the overall two-stage architecture of the proposed origin traceability system for Chinese mitten crabs. The system consists of two main modules. (1) Object Detection Stage: The input image is first passed into the object detection module, MC-RTLNet, which is built upon an improved YOLOv10n architecture. It integrates ODConv, SlimNeck, and LSCD Head. This module is designed to accurately detect the presence and locations of Chinese mitten crabs in complex aquaculture environments. The output of this stage is a set of cropped image regions, each containing a single crab. (2) Classification Stage: The cropped crab images are then fed into the classification module. This module employs a customized IGINC classifier, which is based on the Inception architecture of GoogleNet and further incorporates AC.B and SE modules. These enhancements improve the model’s ability to extract discriminative features relevant to geographic origin identification.
The two modules are trained independently, offering two major advantages: first, each stage can utilize task-specific datasets tailored to either detection or classification; second, the decoupled structure allows for targeted optimization based on the distinct complexity and feature characteristics of the two tasks. This modular framework effectively combines the robustness of object detection with the fine-grained capability of origin classification, while also providing greater flexibility and adaptability for real-world deployment.
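To make the data flow of the two-stage design concrete, the following is a minimal PyTorch-style sketch of the pipeline. The `Detector` and `Classifier` callables, the confidence threshold, and the function name `trace_origin` are illustrative placeholders standing in for the trained MC-RTLNet and IGINC models, not a released API.

```python
from typing import Callable, List, Tuple
from PIL import Image
import torch

# Hypothetical stand-ins for the two trained models described in this
# section: stage 1 is the improved YOLOv10n detector (MC-RTLNet), stage 2
# is the GoogleNet-based IGINC origin classifier.
Detector = Callable[[Image.Image], List[Tuple[int, int, int, int, float]]]
Classifier = Callable[[Image.Image], torch.Tensor]

def trace_origin(image: Image.Image, detector: Detector,
                 classifier: Classifier, conf_thresh: float = 0.5):
    """Stage 1: detect crabs; Stage 2: classify each cropped crab's origin."""
    results = []
    for (x1, y1, x2, y2, conf) in detector(image):
        if conf < conf_thresh:
            continue
        crop = image.crop((x1, y1, x2, y2))   # one crab per cropped region
        logits = classifier(crop)             # scores over origin classes
        results.append(((x1, y1, x2, y2), int(logits.argmax())))
    return results
```

Because the stages are decoupled, either callable can be swapped or retrained without touching the other, which is the flexibility advantage noted above.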
2.2. Detection Model MC-RTLNet
2.2.1. YOLOv10n Model Introduction
YOLOv10 is a recent advance in the field of target detection and represents an important step forward in balancing efficiency and accuracy within the YOLO family of algorithms. Through a consistent dual assignment strategy, YOLOv10 eliminates the reliance on non-maximum suppression (NMS) during training, thus alleviating the prediction redundancy of traditional YOLO [24] models in multi-target detection. This strategy jointly optimizes one-to-one and one-to-many detection heads in the training phase and uses only the one-to-one head at inference, which significantly improves inference efficiency. The model adopts a holistic efficiency-accuracy-driven design, including a lightweight classification head, spatial-channel decoupled downsampling (SCDown), and rank-guided block design based on intrinsic rank analysis. The SCDown module implements highly efficient feature downsampling, and the PSA module enhances global information extraction through a partial self-attention mechanism while reducing computational overhead. YOLOv10 performs well on the COCO dataset: it maintains high detection accuracy in complex scenes while remaining lightweight enough for embedded deployment.
YOLOv10 consists of four parts: Input, Backbone, Neck, and Head. The input stage improves the model's generalization ability through data augmentation (e.g., HSV augmentation, image translation, scaling, flipping, and mosaic augmentation). The backbone adds the new SCDown, C2fUIB, and PSA modules to optimize feature extraction. The neck adopts the classic FPN and PANet structures to achieve multi-scale feature fusion. The head combines one-to-one and one-to-many detection heads for lightweight target detection. YOLOv10 is available in six versions to meet the needs of different scenarios; among them, YOLOv10n, the lightweight version, is well suited to embedded deployment and small target detection.
Based on the above advantages, this paper adopts a target detection model based on YOLOv10n and improves it for the actual needs of crab traceability. The improved YOLOv10n network architecture is shown in Figure 2.
2.2.2. ODConv Module in the Backbone Network
In order to enhance the ability of YOLOv10n to extract features from crab images under complex backgrounds, this paper introduces the Omni-dimensional Dynamic Convolution module. Convolution is a core operation in convolutional neural networks (CNNs), widely used for extracting local image features such as edges, textures, and shapes. It works by sliding a small matrix, the convolution kernel, over the input and performing element-wise multiplication and summation within local regions. Unlike traditional convolution with fixed weights, ODConv uses a multi-dimensional attention mechanism to achieve dynamic weighting along the spatial, input channel, output channel, and kernel levels, thereby adaptively adjusting the convolution operation.
The input feature map $x \in \mathbb{R}^{C \times H \times W}$ is first processed through Global Average Pooling (GAP) to produce a channel-wise descriptor of length $C$. This descriptor is then passed through a Fully Connected (FC) layer and ReLU activation for dimension reduction. Subsequently, it is forwarded into four parallel attention branches to generate dynamic weights: $\alpha_s$ for spatial positions, $\alpha_c$ for input channels, $\alpha_f$ for output channels, and $\alpha_w$ for kernel-level attention. These weights are applied to the convolution kernels via element-wise multiplication to enable dynamic adjustment along multiple dimensions, thereby enhancing the expressiveness and adaptability of the convolution operation. The structure of the ODConv module is shown in Figure 3.
The convolution output of ODConv is given by

$$Y = \left( \alpha_{w1} \odot \alpha_{f1} \odot \alpha_{c1} \odot \alpha_{s1} \odot W_1 + \cdots + \alpha_{wn} \odot \alpha_{fn} \odot \alpha_{cn} \odot \alpha_{sn} \odot W_n \right) * X \tag{1}$$

where $W_i$ is the $i$-th convolution kernel, $*$ denotes the convolution operation, $\odot$ denotes element-wise multiplication along the corresponding dimension, and $X$ and $Y$ represent the input and output feature maps, respectively. $\alpha_{si}$, $\alpha_{ci}$, $\alpha_{fi}$, and $\alpha_{wi}$ are the attention weights applied at the spatial, input channel, output channel, and kernel levels. The introduction of these four types of attention enables the model to adaptively adjust each convolution kernel, allowing it to capture target details more precisely while suppressing background interference.
After incorporating the ODConv module, the network is not only able to adaptively adjust the convolution kernel parameters based on the global image features but also to balance the complementary information among the spatial, channel, and kernel dimensions during dynamic weighting. This substantially enhances the network’s focus on target features. In crab images with complex backgrounds, ODConv effectively suppresses interference from cluttered background information, thereby enabling the model to extract the primary crab features (such as shell texture and morphology) with greater precision, while also improving robustness in recognizing local details (such as overlapping crab legs). Compared with traditional static convolution and conventional dynamic convolution, ODConv—even when employing a single-kernel design—achieves performance comparable to that of multi-kernel dynamic convolution, while significantly reducing the number of parameters and computational cost, thereby greatly enhancing overall computational efficiency.
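As an illustration of how the four attention branches interact, the following is a minimal PyTorch sketch of an ODConv-style layer following the published ODConv formulation of Equation (1). The reduction ratio, kernel count, and initialization are assumptions; this is not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2d(nn.Module):
    """Simplified ODConv: four attention branches (spatial, input-channel,
    output-channel, kernel) modulate a bank of n candidate kernels."""
    def __init__(self, c_in, c_out, k=3, n_kernels=4, reduction=16):
        super().__init__()
        self.k, self.n = k, n_kernels
        hidden = max(c_in // reduction, 4)
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        self.fc = nn.Sequential(nn.Linear(c_in, hidden), nn.ReLU(inplace=True))
        # One head per attention dimension, mirroring Eq. (1).
        self.att_s = nn.Linear(hidden, k * k)        # spatial
        self.att_c = nn.Linear(hidden, c_in)         # input channels
        self.att_f = nn.Linear(hidden, c_out)        # output channels
        self.att_w = nn.Linear(hidden, n_kernels)    # kernel selection

    def forward(self, x):
        b, c_in, h, w = x.shape
        ctx = self.fc(x.mean(dim=(2, 3)))            # GAP -> FC -> ReLU
        a_s = torch.sigmoid(self.att_s(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_c = torch.sigmoid(self.att_c(ctx)).view(b, 1, 1, c_in, 1, 1)
        a_f = torch.sigmoid(self.att_f(ctx)).view(b, 1, -1, 1, 1, 1)
        a_w = torch.softmax(self.att_w(ctx), dim=1).view(b, self.n, 1, 1, 1, 1)
        # Weight and aggregate the candidate kernels per sample (Eq. (1)).
        w_dyn = (a_w * a_f * a_c * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        # Grouped-conv trick: fold the batch into the channel dimension so
        # each sample is convolved with its own dynamic kernel.
        out = F.conv2d(x.reshape(1, b * c_in, h, w),
                       w_dyn.reshape(-1, c_in, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.view(b, -1, h, w)
```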
2.2.3. SlimNeck Module in the Neck Network
To enhance the model's ability to extract crucial crab features under complex backgrounds while reducing computational complexity, a lightweight SlimNeck module is introduced [25]. This module replaces the standard convolution and C2f components in the original neck with a hybrid convolution module (GSConv) and a cross-stage partial network module (VoVGSCSP), achieving efficient feature fusion. The original neck network fuses multi-scale features to restore spatial details lost during downsampling and to retain high-level semantic information, but this also increases model complexity. To address this, GSConv is employed, which combines Standard Convolution (SC), Depthwise Separable Convolution (DSC), and a Shuffle operation. The Shuffle operation transfers the rich information from SC into the various parts of the DSC output, effectively compensating for DSC's limited representation capability while retaining its efficiency.
Figure 4 illustrates the architectural design of the GSConv module, which is composed of two parallel branches: a standard convolution and a depthwise convolution (DWConv). The outputs of both branches are concatenated along the channel dimension to enhance feature diversity. Subsequently, a channel shuffle operation is applied to rearrange and mix the channel-wise information, promoting better interaction between different feature groups.
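The following PyTorch sketch illustrates the GSConv idea. It follows a common open-source variant of the SlimNeck design in which the depthwise branch operates on the standard-convolution output before concatenation and shuffle; the depthwise kernel size and channel split are assumptions and may differ from the authors' configuration.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """GSConv sketch: a standard conv produces half the output channels,
    a cheap depthwise conv enriches them, and a channel shuffle mixes
    the two groups so SC information flows into the DSC features."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.sc = nn.Sequential(                      # standard convolution
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dw = nn.Sequential(                      # depthwise convolution
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.sc(x)
        x2 = self.dw(x1)
        y = torch.cat((x1, x2), dim=1)                # concat along channels
        # Channel shuffle: interleave the SC and DW channel groups.
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```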
Figure 5 shows the architecture of the VoVGSCSP module. It begins with an input of c1 channels, which is first reduced to c1/2 through a Conv layer and then processed by a GS-Bottleneck module. In parallel, the original input passes through another Conv layer. The outputs of both branches are concatenated, followed by a final Conv layer that produces the output with c2 channels. This design effectively integrates lightweight feature extraction with residual information flow, improving both representation and computational efficiency.
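A corresponding sketch of VoVGSCSP, reusing the GSConv block (and imports) above. The GS-Bottleneck internals here are a plausible reading of Figure 5, not the authors' verified code.

```python
class GSBottleneck(nn.Module):
    """Assumed composition of the GS-Bottleneck: two stacked GSConv layers."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(GSConv(c, c, k=3), GSConv(c, c, k=3))

    def forward(self, x):
        return self.body(x)

class VoVGSCSP(nn.Module):
    """VoVGSCSP sketch: reduce to c1/2, run the GS-Bottleneck, concatenate
    with a parallel 1x1-conv branch, then project to c2 channels."""
    def __init__(self, c1, c2):
        super().__init__()
        c_half = c1 // 2
        self.reduce = nn.Conv2d(c1, c_half, 1, bias=False)
        self.gsb = GSBottleneck(c_half)
        self.skip = nn.Conv2d(c1, c_half, 1, bias=False)
        self.out = nn.Conv2d(c_half * 2, c2, 1, bias=False)

    def forward(self, x):
        main = self.gsb(self.reduce(x))
        return self.out(torch.cat((main, self.skip(x)), dim=1))
```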
By incorporating the SlimNeck module, the network not only achieves a significant reduction in the number of parameters and computational load but also enhances the efficiency of feature extraction and fusion.
2.2.4. Lightweight Detection Head
In the LSCD lightweight detection head, the three feature layers (P3, P4, and P5) output from the neck first enter separate detection head branches, which process targets at different scales. Each branch adjusts the channel count and performs a normalization operation, unifying the feature layers to an intermediate channel number by means of a 1 × 1 convolutional layer and a Group Normalization (GN) module. This step not only significantly reduces computational complexity but also provides a stable feature distribution through GN, improving training convergence and supplying a consistent, standardized input representation for the subsequent shared convolution module. The adjusted feature layers are then fed into a shared 3 × 3 convolution with GN that uniformly extracts multi-scale features. Through the shared convolutional weights and GN normalization, the model effectively reduces the number of parameters and the computational cost while maintaining accurate and effective feature extraction. This design further enhances the integration of multi-scale features and ensures the efficient performance of the lightweight model.
The features extracted by the shared convolution finally enter decoupled classification and regression branches. The classification branch uses a 1 × 1 convolutional layer to predict category probabilities, learning classification-specific features through independent convolutional weights. The regression branch predicts bounding box coordinate offsets through a 1 × 1 convolutional layer and introduces a scale layer that dynamically adjusts to detection targets of different sizes. The structure of the LSCD head is shown in Figure 6.
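A minimal sketch of this shared-head pattern, assuming three input scales and a learnable per-level scale factor; the channel widths, group count, and activation are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class LSCDHead(nn.Module):
    """Shared-conv detection head sketch: per-scale 1x1 Conv+GN to a common
    width, one shared 3x3 Conv+GN, then decoupled 1x1 cls/reg branches with
    a learnable scale factor per feature level."""
    def __init__(self, in_chs=(64, 128, 256), mid=64, n_cls=2, groups=16):
        super().__init__()
        self.align = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, mid, 1, bias=False),
                          nn.GroupNorm(groups, mid), nn.SiLU())
            for c in in_chs)
        self.shared = nn.Sequential(nn.Conv2d(mid, mid, 3, 1, 1, bias=False),
                                    nn.GroupNorm(groups, mid), nn.SiLU())
        self.cls = nn.Conv2d(mid, n_cls, 1)                   # class scores
        self.reg = nn.Conv2d(mid, 4, 1)                       # box offsets
        self.scales = nn.Parameter(torch.ones(len(in_chs)))   # per-level scale

    def forward(self, feats):                                 # feats: [P3, P4, P5]
        outs = []
        for i, f in enumerate(feats):
            f = self.shared(self.align[i](f))
            outs.append((self.cls(f), self.reg(f) * self.scales[i]))
        return outs
```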
The modular design of the LSCD detection head combines Group Normalization with convolutional operations to reduce computational resource requirements while maintaining the ability to optimize independently for classification and localization tasks. This structure not only improves the effectiveness of feature extraction and multi-scale integration, but also achieves fast and accurate target detection in edge computing environments. For the crab traceability system, this design can efficiently detect crab targets of different sizes and provide stable and reliable basic data support for traceability.
2.3. Classification Model IGINC
We first construct a CNN with three parallel branches based on the Inception module of GoogleNet as the basic architecture of the IGINC model; within each branch, we follow the structure of the VGG16 model. The Inception module, an important innovation in deep learning proposed by Google researchers, improves network performance by widening the network: the module comprises multiple parallel convolution paths, each using a different kernel size, whose outputs are concatenated into a wider feature map. In GoogleNet, the Inception module uses convolution kernels of different sizes (such as 1 × 1, 3 × 3, and 5 × 5) and also introduces pooling layers and 1 × 1 convolution layers to reduce dimensionality and computational complexity. Its strength lies in improving both the representational capacity and the computational efficiency of the network: the varied kernel sizes capture more diverse feature information and thereby improve accuracy, while the careful structural design and the 1 × 1 convolutions reduce the computational load and parameter count, accelerating training and inference.
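For reference, the following is a minimal Inception-style block in PyTorch. It shows the standard multi-branch pattern described above (GoogLeNet-style), not the exact IGINC branch configuration; the channel arguments are free parameters.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Standard Inception pattern: parallel 1x1 / 3x3 / 5x5 / pool branches,
    with 1x1 convolutions used to reduce channel dimensionality, and the
    branch outputs concatenated along the channel axis."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1))

    def forward(self, x):
        # Widen the feature map by concatenating all branch outputs.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```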
Then, on top of this basic architecture of the proposed IGINC model, we redesigned the convolutional kernels of the CBL modules in each branch network: the AC.B module replaces the existing basic convolution to improve the network's feature extraction ability. In addition, the SE attention mechanism is introduced into the residual structure of the network to capture more detailed information, such as feature positions, thereby improving overall performance. Meanwhile, a Batch Normalization (BN) layer is added between convolution and activation to accelerate convergence and reduce overfitting. Finally, we concatenate the fully connected layers of the branches and obtain the final prediction through the Softmax function. The structure of the IGINC model is shown in Figure 7.
2.3.1. Asymmetric Convolution Blocks
Inspired by ACNet, the AC.B convolution block is used in the IGINC model to replace the basic convolutions in the original CBL. As shown in Figure 8, the AC.B consists of three parallel layers with kernel sizes of 3 × 3, 1 × 3, and 3 × 1. The 3 × 3 kernel is a regular convolution that extracts basic features from Chinese mitten crab images, while the other two kernels capture horizontal and vertical features as well as position and rotation cues. The improved network therefore has stronger feature extraction ability for Chinese mitten crabs.
According to the superposition (additivity) principle of convolution, the designed AC.B module can directly replace the convolutional kernels in an existing network, and, after feature extraction, its branch outputs can be combined according to Formula (2):

$$I * K_1 + I * K_2 = I * (K_1 \oplus K_2) \tag{2}$$

where $I$ represents the input, and $K_1$ and $K_2$ are two convolution kernels with compatible sizes. The purpose of combining $K_1$ and $K_2$ is to fuse directional or shape-specific features (e.g., horizontal, vertical, and spatial patterns). Here $\oplus$ denotes the fusion of the kernels into an equivalent kernel, which is valid because convolution is linear.
As in conventional convolutional neural networks, each layer in a branch is batch-normalized, and the outputs of the three branches are then fused into a single branch as the AC.B output. At this point, the same configuration as the original model can be used for network training without tuning any additional hyperparameters.
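A minimal sketch of such an asymmetric convolution block, following the ACNet training-time pattern. By Formula (2), the three BN-folded kernels could be fused into a single 3 × 3 kernel at inference time; that fusion step is omitted here for brevity.

```python
import torch.nn as nn

class ACBlock(nn.Module):
    """AC.B sketch: parallel 3x3, 1x3, and 3x1 convolutions, each followed
    by BatchNorm, summed into a single output (ACNet-style training form)."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.square = nn.Sequential(                  # regular 3x3 features
            nn.Conv2d(c_in, c_out, (3, 3), stride, (1, 1), bias=False),
            nn.BatchNorm2d(c_out))
        self.hor = nn.Sequential(                     # horizontal features
            nn.Conv2d(c_in, c_out, (1, 3), stride, (0, 1), bias=False),
            nn.BatchNorm2d(c_out))
        self.ver = nn.Sequential(                     # vertical features
            nn.Conv2d(c_in, c_out, (3, 1), stride, (1, 0), bias=False),
            nn.BatchNorm2d(c_out))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Branch outputs are summed; by Formula (2) this equals a single
        # convolution with the fused (skeleton-enhanced) 3x3 kernel.
        return self.act(self.square(x) + self.hor(x) + self.ver(x))
```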
2.3.2. Attention Mechanism
The SE (Squeeze-and-Excitation) attention mechanism is a technique used to improve the performance of deep learning models, especially convolutional neural networks. It helps the model adaptively adjust the importance of each channel by explicitly modeling the dependencies between input feature channels. The core idea of the SE attention mechanism comprises two main steps: Squeeze and Excitation.
For Squeeze, each channel of the input feature map is compressed into a channel descriptor by global average pooling, which reflects the spatial average response of that channel. For Excitation, two fully connected layers (the first reduces the number of channels and the second restores it) followed by a sigmoid activation learn a weight for each channel. These weights are then used to rescale the response of each channel in the input feature map.
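A minimal SE block following this two-step description; the reduction ratio of 16 is the value commonly used in the SE literature and is an assumption here, not the paper's setting.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation sketch: GAP -> FC (reduce) -> ReLU ->
    FC (restore) -> sigmoid, then channel-wise rescaling of the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # excite: rescale channels
```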
In the IGINC model proposed in this article, we experimented with inserting SE modules at different positions. In this way, the SE attention mechanism enables the model to adaptively focus on the feature channels most important for classifying Chinese mitten crabs, improving the model's representation ability and performance. The SE module adds relatively few parameters, achieving a good balance between performance and computational overhead.
4. Conclusions
To address the challenge of accurately tracing the geographical origin of Chinese mitten crabs during the breeding and distribution processes, this paper proposes a two-stage traceability framework that integrates an improved YOLOv10n object detection model with a lightweight IGINC classification model optimized from GoogleNet. In the detection stage, a YOLOv10n-based architecture enhanced with ODConv, SlimNeck, and LSCD modules is employed to efficiently locate crab targets under complex backgrounds and multi-scale conditions. In the classification stage, an IGINC model based on GoogleNet is designed, incorporating asymmetric convolution (AC.B) and channel attention (SE) modules to achieve robust origin classification under diverse environmental variations. Experimental results show that the proposed detector achieves an mAP50 of 99.5% and mAP50–95 of 88.5% while running at 309 FPS, with a 35.3% reduction in GFLOPs. The IGINC classifier achieves 93.7% accuracy using only 17.4% of VGG16’s parameters and 40% of AlexNet’s.
Unlike traditional origin traceability methods based on morphological traits or elemental fingerprinting, this approach leverages deep learning to achieve more efficient and automated identification, providing strong technical support for crab quality control and market supervision. While the proposed two-stage framework achieves high accuracy and efficiency, it still has some limitations. The current model is trained on a dataset captured under controlled conditions and may require adaptation when applied to fully underwater or highly dynamic environments. Additionally, the system is currently designed for post-harvest identification. In future work, we plan to extend the framework to support real-time underwater crab recognition and expand the dataset to cover more farming areas and seasonal variations.