1. Introduction
The Chinese mitten crab, also known as the hairy crab, is rich in essential nutrients such as fatty acids, proteins, vitamins, and trace elements. It is also a crucial raw material for food products such as crab roe sauce, crab powder, and marinated crabs [1]. The composition of fatty acids, flavor compounds, and mineral elements differs significantly among hairy crabs from different regions. ω-3 and ω-6 fatty acids can reduce the risk of cardiovascular diseases, while flavor compounds such as amino acids, nucleotides, and organic acids directly affect taste. Mineral elements such as calcium, phosphorus, potassium, and magnesium are also vital for human health [2,3]. These variations in taste and nutritional composition lead to substantial price differences in the market. The market is currently inundated with counterfeit crabs: crabs from other regions are often mislabeled as premium brands such as ‘Yangcheng Lake’ and ‘Gucheng Lake’ and sold at inflated prices. For example, although the annual production of Yangcheng Lake hairy crabs is approximately 1500 tons, the market supply exceeds 10,000 tons [4]. Such fraud significantly harms consumer rights and highlights the urgent need for geographical origin identification of hairy crabs. Because crabs from different regions are difficult to distinguish by sensory evaluation alone, identification largely depends on physicochemical methods, which lack the speed required for effective market regulation. Developing rapid detection techniques for hairy crab origin identification is therefore crucial for regulating the market and protecting consumers’ rights.
To address the issue of geographical origin identification of hairy crabs, researchers have explored various detection methods, including sensory evaluation, stable isotope analysis, mineral element fingerprinting, and chromatographic techniques. Sensory evaluation primarily relies on the aroma and morphological characteristics of the crabs. For example, Lu et al. found that the volatile compounds in the hepatopancreas of hairy crabs vary by region, enabling identification through sensory perception [5]. Similarly, Xu et al. demonstrated that the origin of hairy crabs can be determined by manually examining the carapace, frontal teeth, lateral teeth, and leg characteristics [4]. Regarding stable isotope analysis, Luo et al. used strontium (Sr) isotopic analysis to measure the 87Sr/86Sr ratio in the muscle and shell of hairy crabs, achieving origin identification with an accuracy of up to 99.4% [2]. For mineral element fingerprinting, Xue et al. conducted biochemical analyses of the content and proportions of specific mineral elements within the crabs and found that the mineral element fingerprint stabilizes after 3 to 4 months of cultivation, which supports reliable origin traceability [6]. Chromatographic methods are also effective for identifying the origin and internal composition of crabs. Yao et al. employed high-performance liquid chromatography (HPLC) to quantify seven amino acids (Cys, Gly, Arg, Val, Met, Pro, and Tyr) in the muscle of crabs from different regions, which allowed for origin and species identification [7]. Wang et al. used gas chromatography (GC) to analyze the content of fatty acids, including EPA and DHA, in crab muscle tissue, enabling traceability of crab origin and flavor components [8].
Although the aforementioned methods can accurately identify the geographical origin of the Chinese mitten crab, each has drawbacks. Sensory odor evaluation is highly subjective and heavily reliant on human expertise, and morphological analysis is cumbersome. Stable isotope analysis, mineral element fingerprinting, and chromatographic techniques are all operationally complex, slow, and destructive to the crabs. These limitations make it difficult to address the chaotic state of the Chinese mitten crab market, highlighting the urgent need for a rapid and non-destructive method for origin identification. Near-infrared (NIR) spectroscopy is rapid and non-destructive, requires no sample preparation, and has seen numerous successful applications in the quality assessment of crabs [9]. For instance, Wold et al. used NIR spectroscopy to measure the edible meat content in king crab legs, achieving a coefficient of determination of 0.85, which demonstrates that NIR can penetrate the crab shell and assess internal components. NIR spectroscopy has also been successfully applied to the geographical origin traceability of agricultural products and food items [10]. For example, Li et al. achieved non-destructive identification of duck egg origin using visible-NIR spectroscopy with an accuracy of 100.00% [11]. Additionally, Arndt et al. employed NIR to distinguish the geographical origin of almonds (Prunus dulcis Mill.) [12]. Similar applications have been reported for the traceability of yams, tea leaves, and other agricultural products [13,14]. Because Chinese mitten crabs from different regions differ in composition, the number and distribution of hydrogen-containing groups such as C-H, O-H, and N-H in their internal components can be expected to differ as well. The successful applications of NIR spectroscopy in the origin identification of agricultural products and food, as well as in crab quality assessment, suggest that NIR can capture these differences. NIR spectroscopy therefore offers a theoretically promising approach for the geographical origin identification of the Chinese mitten crab.
This study builds upon previous research on the geographical origin identification of Chinese mitten crabs and on the application of NIR spectroscopy to tracing the origin of agricultural products and food. It aims to develop a rapid NIR-based detection method for the geographical origin of Chinese mitten crabs by incorporating effective methodologies from prior work. The approach involves collecting spectral data from the abdomen and carapace of crabs from different origins, applying spectral preprocessing algorithms to clean the data, and selecting the spectral features most indicative of origin with feature wavelength selection algorithms. Finally, machine learning and deep learning models are used to construct a rapid detection model for crab origin identification. Compared with existing crab origin identification methods, the main contributions of this study are: (1) proposing a rapid detection method for the geographical origin of Chinese mitten crabs using NIR spectroscopy; (2) identifying that the feature wavelengths of Chinese mitten crabs from different regions are primarily concentrated between 10,000 cm−1 and 9500 cm−1, and between 9000 cm−1 and 8500 cm−1; and (3) demonstrating that NIR data collected from the abdomen (ventral side) of crabs provide more accurate origin identification than data collected from the carapace.
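The feature bands in contribution (2) are given in wavenumbers, whereas NIR instruments and the literature often quote wavelengths in nanometres. As a small illustration, the sketch below applies the standard relation λ (nm) = 10^7 / ν (cm−1) to the band edges quoted above. The function name `wavenumber_to_nm` is a hypothetical helper introduced here and is not part of the study.

```python
def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm1


# Band edges quoted in contribution (2), in cm^-1 (high edge, low edge).
bands_cm1 = [(10000, 9500), (9000, 8500)]

for high, low in bands_cm1:
    start_nm = wavenumber_to_nm(high)
    end_nm = wavenumber_to_nm(low)
    print(f"{high}-{low} cm^-1 -> {start_nm:.1f}-{end_nm:.1f} nm")
```

Under this conversion both bands fall in the short-wave end of the NIR range, between 1000 nm and roughly 1180 nm.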
4. Conclusions
This study collected NIR diffuse reflectance spectral data from the abdomen and carapace regions of 390 Chinese mitten crabs. For each crab, three spectral measurements were taken from the abdomen and three from the carapace, and their averages were used as the spectral data for that individual. The NIR spectral data were pretreated using the MSC algorithm. Feature selection was then performed using the CARS, BOSS, and UVE algorithms. Finally, SVM, BP, and CNN models were developed to detect the geographical origin of the crabs, providing technical support for the rapid detection of crab origin using NIR spectroscopy. The main findings are as follows:
(1) The characteristic wavelengths reflecting the geographical origin of crabs were primarily distributed between 10,000 cm−1 and 9500 cm−1 and between 9000 cm−1 and 8500 cm−1.
(2) For abdomen spectral data, the SVM model based on the feature wavelengths selected by the CARS algorithm demonstrated the best performance in discriminating crab origin, with an overall accuracy (Acc) of 90.00%. Specifically, for crabs from Weishan Lake, P, R, and F1 were 89.29%, 86.21%, and 87.72%, respectively; for crabs from Tuo Lake, these metrics were 86.96%, 95.24%, and 90.91%; and for crabs from Hongze Lake, they were 90.00%, 93.10%, and 91.53%.
(3) For carapace spectral data, the SVM model based on the feature wavelengths selected by the BOSS algorithm exhibited the highest discriminative performance, with an overall Acc of 87.50%. For Weishan Lake crabs, P, R, and F1 were 77.14%, 93.10%, and 84.38%, respectively; for Tuo Lake crabs, the values were 100%, 90.47%, and 95.00%; and for Hongze Lake crabs, the corresponding metrics were 92.31%, 80.00%, and 85.71%.
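The per-class P, R, and F1 values above are related by simple arithmetic. The sketch below shows that arithmetic under the assumption that P, R, and F1 denote the standard precision, recall, and F1 score. The confusion matrix is not given in this section, so the counts used here are hypothetical. They were picked only because they give percentages of the same form as the Weishan Lake carapace figures.

```python
def precision_recall_f1(tp: int, predicted: int, actual: int):
    """Return (P, R, F1) in percent for a single class.

    tp        -- samples of the class that were correctly assigned to it
    predicted -- all samples the model assigned to the class (TP + FP)
    actual    -- all samples that truly belong to the class (TP + FN)
    """
    precision = tp / predicted
    recall = tp / actual
    # Algebraically identical to 2 * P * R / (P + R).
    f1 = 2 * tp / (predicted + actual)
    return 100 * precision, 100 * recall, 100 * f1


# Hypothetical counts, NOT taken from the study's confusion matrix.
p, r, f1 = precision_recall_f1(tp=27, predicted=35, actual=29)
print(f"P = {p:.2f}%, R = {r:.2f}%, F1 = {f1:.2f}%")
```

Because F1 is the harmonic mean of P and R, it always lies between the two and sits closer to the smaller one, which is why a class with high recall but modest precision still gets a moderate F1.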
These results indicate that MSC pretreatment combined with feature selection and SVM modeling shows substantial potential for the identification of crab origin, and that it is particularly effective when applied to abdomen spectral data.
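For readers unfamiliar with the pretreatment step, the following is a minimal sketch of replicate averaging followed by multiplicative scatter correction (MSC). It assumes the common formulation in which each spectrum is regressed on the mean spectrum of the data set and then corrected with the fitted intercept and slope. The study's own implementation is not shown in this section, so the function names and the synthetic values are assumptions made purely for illustration.

```python
def mean_spectrum(spectra):
    """Element-wise mean of equally long spectra (lists of floats)."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x ~ intercept + slope * reference by
    ordinary least squares and then corrected to (x - intercept) / slope.
    """
    reference = mean_spectrum(spectra)
    ref_mean = sum(reference) / len(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = sum(x) / len(x)
        slope = sum((r - ref_mean) * (v - x_mean)
                    for r, v in zip(reference, x)) / ref_ss
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected


# Synthetic values, not data from the study: three replicate scans of one crab.
crab_a_scans = [[0.10, 0.30, 0.20, 0.50],
                [0.12, 0.32, 0.22, 0.52],
                [0.08, 0.28, 0.18, 0.48]]
crab_a = mean_spectrum(crab_a_scans)      # average of the three replicates
crab_b = [2.0 * v + 0.5 for v in crab_a]  # same shape, different scatter

corrected = msc([crab_a, crab_b])
print(corrected[0])
print(corrected[1])
```

In this toy case the two spectra differ only by an offset and a scale factor, so MSC maps them onto the same curve. With real spectra the remaining differences after correction are what the feature selection and classification steps work with.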