Figure 1.
Workflow for extracting individual dried goji berry images from original multi-berry images using image preprocessing, edge detection, contour filtering, mask-based segmentation, and independent cropping.
Figure 2.
Representative examples of data augmentation operations applied to dried goji berry images, including horizontal flipping, vertical flipping, rotation, scaling, and translation.
Figure 3.
Overall architecture of the proposed LSH-CoAtNet model, including convolutional stages, HWD ADown and ShuffleMBConv modules, Transformer stages, and the final classifier head for dried goji berry cultivar identification.
Figure 4.
Comparison between the original Softmax-based self-attention in the baseline CoAtNet and the ReLU-based LinearAttention module in the proposed model. (a) Original Softmax-based attention; (b) LinearAttention with ReLU feature mapping.
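The practical difference between panels (a) and (b) is the order of the matrix products. The sketch below is a minimal pure-Python illustration, not the LSH-CoAtNet implementation: it assumes the common kernelized form with a ReLU feature map and row-sum normalization, and the function names and the `eps` constant are hypothetical. Both functions return the same values; only the second avoids building the N × N score matrix.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def relu(m):
    """Element-wise ReLU, used here as the feature map phi (assumption)."""
    return [[max(0.0, x) for x in row] for row in m]


def relu_attention_quadratic(q, k, v, eps=1e-6):
    """Reference form: builds the full N x N score matrix."""
    scores = matmul(relu(q), transpose(relu(k)))          # N x N
    out = matmul(scores, v)                               # N x d_v
    return [[x / (sum(s) + eps) for x in o] for s, o in zip(scores, out)]


def relu_attention_linear(q, k, v, eps=1e-6):
    """Same result, reordered as phi(Q) (phi(K)^T V); no N x N matrix."""
    phi_q, phi_k = relu(q), relu(k)
    kv = matmul(transpose(phi_k), v)                      # d x d_v
    k_sum = [sum(col) for col in zip(*phi_k)]             # length d
    out = matmul(phi_q, kv)                               # N x d_v
    norm = [sum(a * b for a, b in zip(row, k_sum)) + eps for row in phi_q]
    return [[x / n for x in o] for o, n in zip(out, norm)]


# Toy input: 3 tokens, 2-dimensional queries, keys, and values.
q = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
k = [[1.0, 0.0], [-1.0, 2.0], [0.5, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(relu_attention_linear(q, k, v))
```

With N tokens and feature width d, the reordered form scales with N·d² rather than N²·d, which is the usual motivation for replacing Softmax attention at high token counts.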
Figure 5.
Stage-wise depth configuration of CoAtNet and the proposed LSH-CoAtNet. The number of repeated blocks in stages S0–S4 was reduced from [2, 2, 3, 5, 2] in the original CoAtNet to [2, 2, 2, 3, 1] in LSH-CoAtNet.
Figure 6.
Structure of the proposed ShuffleMBConv module, including channel split, lightweight branch transformation, MBConv-based residual feature extraction, channel concatenation, and channel shuffle.
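The final channel shuffle step can be illustrated on a plain list of channel labels. This is a minimal sketch assuming the standard ShuffleNet-style shuffle (view the channels as groups, transpose, flatten); the function name and the choice of two groups are illustrative and are not taken from the caption.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the list as (groups, per_group),
    transposing to (per_group, groups), and flattening.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Channels from the two concatenated branches (labels are hypothetical).
concatenated = ["a0", "a1", "a2", "b0", "b1", "b2"]
print(channel_shuffle(concatenated, groups=2))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, every adjacent pair of channels mixes information from both branches, so the next split does not simply reproduce the previous partition.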
Figure 7.
Structure of the proposed HWD ADown module, consisting of an HWD branch with one-level 2D discrete wavelet transform and a pooling branch with max pooling, followed by depthwise separable convolution and feature concatenation.
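To make the one-level 2D discrete wavelet transform in the HWD branch concrete, the sketch below applies an orthonormal Haar transform to a single-channel image stored as a list of rows. The Haar basis, the sub-band naming, and the function name are assumptions; the caption specifies only a one-level 2D DWT. Each output has half the height and width of the input, which is how the branch downsamples without discarding pixels.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform of an even-sized 2D list.

    Returns four half-resolution sub-bands:
    approx (low-pass), col_diff, row_diff, and diag.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, h, 2):
        rows = ([], [], [], [])
        for c in range(0, w, 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + d + e) / 2)          # local average (scaled)
            rows[1].append((a - b + d - e) / 2)          # left minus right columns
            rows[2].append((a + b - d - e) / 2)          # top minus bottom rows
            rows[3].append((a - b - d + e) / 2)          # diagonal difference
        approx.append(rows[0])
        col_diff.append(rows[1])
        row_diff.append(rows[2])
        diag.append(rows[3])
    return approx, col_diff, row_diff, diag


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
approx, col_diff, row_diff, diag = haar_dwt2(image)
print(approx)  # 2 x 2 low-pass sub-band
```

In a downsampling module, the four sub-bands would typically be stacked along the channel axis before the subsequent convolution, so spatial resolution is traded for channel depth rather than lost.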
Figure 8.
Overall classification performance comparison of different deep learning models on the test set in terms of accuracy, precision, recall, and F1-score.
Figure 9.
Normalized confusion matrices of different deep learning models on the test set. Panels (a–h) denote CoAtNet, ResNet50, ConvNeXt, EfficientNetV2, GhostNet, MobileNetV4, RepVGG, and ShuffleNetV2, respectively. Classes 1–5 represent Ningqi No. 7, Linqi No. 5, Ningqi No. 1, Keqi 6082, and Jingqi No. 1, respectively.
Figure 10.
Class-wise distributions of precision, recall, and F1-score for different deep learning models on the test set. (a) Precision; (b) recall; (c) F1-score. Each boxplot summarizes the variation in model performance across the five dried goji berry cultivars.
Figure 11.
Visual comparison of representative dried goji berry images before and after image quality enhancement. The upper row shows the original images, and the lower row shows the corresponding quality-enhanced images for samples 1–6.
Figure 12.
Classification performance comparison between the original dataset and the quality-enhanced dataset in terms of accuracy, precision, recall, and F1-score.
Figure 13.
Performance comparison between CoAtNet and LSH-CoAtNet. (a) Validation loss and accuracy curves; (b) normalized confusion matrix of LSH-CoAtNet on the test set; (c) class-wise precision, recall, and F1-score comparison; (d) comparison of classification performance and model complexity in terms of accuracy, precision, recall, F1-score, Params, and GFLOPs. Classes 1–5 represent Ningqi No. 7, Linqi No. 5, Ningqi No. 1, Keqi 6082, and Jingqi No. 1, respectively.
Table 1.
Distribution of images for the five dried goji berry cultivars, including the total number of images and the numbers in the training, validation, and test sets.
| Cultivar | Number of Images | Training | Validation | Test |
|---|---|---|---|---|
| Ningqi No. 7 | 5346 | 3742 | 1069 | 535 |
| Linqi No. 5 | 5178 | 3624 | 1035 | 519 |
| Ningqi No. 1 | 5340 | 3737 | 1068 | 535 |
| Keqi 6082 | 5211 | 3647 | 1042 | 522 |
| Jingqi No. 1 | 4824 | 3376 | 964 | 484 |
| Total | 25,899 | 18,126 | 5178 | 2595 |
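The per-cultivar counts in Table 1 can be checked for internal consistency with a few lines of Python. The sketch below copies the counts from the table; the helper name `split_ratios` is hypothetical. It confirms that each row sums to its total and shows that every cultivar is divided in roughly a 70/20/10 proportion, a ratio inferred here from the counts rather than stated in the caption.

```python
# (total, training, validation, test) per cultivar, copied from Table 1.
counts = {
    "Ningqi No. 7": (5346, 3742, 1069, 535),
    "Linqi No. 5": (5178, 3624, 1035, 519),
    "Ningqi No. 1": (5340, 3737, 1068, 535),
    "Keqi 6082": (5211, 3647, 1042, 522),
    "Jingqi No. 1": (4824, 3376, 964, 484),
}


def split_ratios(total, train, val, test):
    """Return the train/validation/test fractions of one row."""
    if train + val + test != total:
        raise ValueError("subset sizes do not add up to the total")
    return train / total, val / total, test / total


for name, row in counts.items():
    train_r, val_r, test_r = split_ratios(*row)
    print(f"{name}: train={train_r:.3f} val={val_r:.3f} test={test_r:.3f}")

# Column totals across the five cultivars.
totals = [sum(col) for col in zip(*counts.values())]
print(totals)
```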
Table 2.
Software tools and their corresponding versions and purposes used in this study.
| Software/Tool | Version | Purpose |
|---|---|---|
| Python | 3.8.18 | Programming language |
| PyTorch | 1.13.1 | Model construction, training, and testing |
| torchvision | 0.14.1 | Image transformation and model-related utilities |
| CUDA | 11.7 | GPU-accelerated computation |
| OpenCV | 4.8.1 | Image preprocessing and image quality enhancement |
| NumPy | 1.24.4 | Numerical calculation and array processing |
| pandas | 2.0.3 | Dataset organization and experimental result recording |
| scikit-learn | 1.3.2 | Calculation of accuracy, precision, recall, and F1-score |
| THOP | 0.1.1.post2209072238 | Calculation of parameter size and floating-point operations |
| Matplotlib | 3.7.5 | Visualization of training curves and experimental results |
| OriginPro | 2025b | Figure preparation and result visualization |
Table 3.
Class-wise precision, recall, and F1-score of different deep learning models on the test set. All values are expressed as percentages. Bold values indicate the best result for each class and metric. Classes 1–5 correspond to Ningqi No. 7, Linqi No. 5, Ningqi No. 1, Keqi 6082, and Jingqi No. 1, respectively.
| Model | Metric | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| CoAtNet | Precision (%) | 97.99 | 99.16 | 94.37 | 95.19 | 94.17 |
| | Recall (%) | 98.53 | 99.16 | 97.01 | 92.95 | 93.45 |
| | F1-score (%) | 98.26 | 99.15 | 95.67 | 94.05 | 93.81 |
| ResNet50 | Precision (%) | 93.90 | 99.14 | 86.62 | 86.72 | 84.49 |
| | Recall (%) | 96.14 | 97.05 | 93.01 | 84.99 | 79.77 |
| | F1-score (%) | 95.00 | 98.08 | 89.70 | 85.84 | 82.06 |
| ConvNeXt | Precision (%) | 80.00 | 91.93 | 70.36 | 68.47 | 64.06 |
| | Recall (%) | 80.88 | 91.35 | 73.45 | 71.07 | 58.38 |
| | F1-score (%) | 80.44 | 91.64 | 71.88 | 69.74 | 61.09 |
| EfficientNetV2 | Precision (%) | 99.07 | 98.73 | 89.74 | 89.64 | 92.95 |
| | Recall (%) | 97.61 | 98.31 | 96.01 | 93.85 | 83.82 |
| | F1-score (%) | 98.33 | 98.52 | 92.77 | 91.70 | 88.15 |
| GhostNet | Precision (%) | 74.16 | 86.15 | 65.17 | 62.15 | 63.25 |
| | Recall (%) | 81.25 | 83.97 | 63.87 | 72.15 | 48.75 |
| | F1-score (%) | 77.54 | 85.04 | 64.52 | 66.78 | 55.06 |
| MobileNetV4 | Precision (%) | 86.89 | 94.09 | 74.61 | 82.28 | 72.98 |
| | Recall (%) | 91.36 | 94.09 | 85.63 | 70.52 | 69.75 |
| | F1-score (%) | 89.07 | 94.09 | 79.74 | 75.95 | 71.33 |
| RepVGG | Precision (%) | 96.72 | 97.24 | 85.82 | 79.26 | 83.09 |
| | Recall (%) | 92.10 | 96.62 | 91.82 | 84.99 | 75.72 |
| | F1-score (%) | 94.35 | 96.93 | 88.72 | 82.02 | 79.23 |
| ShuffleNetV2 | Precision (%) | 76.58 | 88.74 | 64.41 | 67.31 | 65.17 |
| | Recall (%) | 82.35 | 84.81 | 72.26 | 69.26 | 52.99 |
| | F1-score (%) | 79.36 | 86.73 | 68.11 | 68.27 | 58.45 |
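The F1-score rows in Table 3 follow from the precision and recall rows as their harmonic mean. The short sketch below recomputes two cells from the table; the function name is illustrative. Because the table reports precision and recall already rounded to two decimals, a recomputed F1 can differ from the printed value in the last digit for some cells.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# CoAtNet, class 1 (Ningqi No. 7): table reports F1 = 98.26.
print(round(f1_score(97.99, 98.53), 2))
# ResNet50, class 5 (Jingqi No. 1): table reports F1 = 82.06.
print(round(f1_score(84.49, 79.77), 2))
```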
Table 4.
Model complexity comparison of different deep learning models in terms of parameters and GFLOPs.
| Model | Params (M) | GFLOPs |
|---|---|---|
| CoAtNet | 16.99 | 3.35 |
| ResNet50 | 25.56 | 4.13 |
| ConvNeXt | 28.59 | 4.46 |
| EfficientNetV2 | 20.31 | 25.54 |
| GhostNet | 5.18 | 0.16 |
| MobileNetV4 | 9.71 | 0.91 |
| RepVGG | 7.83 | 1.53 |
| ShuffleNetV2 | 2.28 | 0.15 |
| LSH-CoAtNet | 6.41 | 1.60 |
Table 5.
Ablation results of the proposed LSH-CoAtNet components in terms of classification performance and model complexity. LR, SMB, and HAD denote lightweight reconstruction, ShuffleMBConv, and HWD ADown, respectively. A check mark indicates that the corresponding component is included.
| Base | LR | SMB | HAD | Acc (%) | Prec (%) | Rec (%) | F1 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|
| √ | | | | 94.83 | 94.87 | 94.83 | 94.83 | 16.99 | 3.35 |
| √ | √ | | | 93.67 | 93.69 | 93.67 | 93.66 | 6.41 | 1.68 |
| √ | | √ | | 96.41 | 96.41 | 96.41 | 96.41 | 17.28 | 3.06 |
| √ | | | √ | 97.18 | 97.19 | 97.18 | 97.18 | 17.29 | 3.62 |
| √ | √ | √ | | 93.21 | 93.20 | 93.21 | 93.19 | 6.33 | 1.59 |
| √ | √ | | √ | 96.99 | 97.01 | 96.99 | 96.99 | 6.34 | 1.57 |
| √ | | √ | √ | 98.84 | 98.85 | 98.84 | 98.84 | 16.68 | 3.08 |
| √ | √ | √ | √ | 98.80 | 98.81 | 98.80 | 98.80 | 6.41 | 1.60 |
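The complexity savings implied by the first and last rows of Table 5 can be expressed as relative reductions. The sketch below uses only values from the table; the helper name is hypothetical.

```python
def relative_reduction(baseline, new):
    """Percentage by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline * 100


# Baseline CoAtNet (first row) versus full LSH-CoAtNet (last row).
params_cut = relative_reduction(16.99, 6.41)   # Params (M)
gflops_cut = relative_reduction(3.35, 1.60)    # GFLOPs
acc_gain = 98.80 - 94.83                       # accuracy, percentage points

print(f"Params: -{params_cut:.2f}%  GFLOPs: -{gflops_cut:.2f}%  Acc: +{acc_gain:.2f} pp")
```

On these figures, the full model uses about 62% fewer parameters and about 52% fewer GFLOPs than the baseline while gaining roughly 4 percentage points of accuracy.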
Table 6.
Paired prediction results and exact McNemar test between CoAtNet and LSH-CoAtNet on the same test set.
| Paired Prediction Result | Number of Samples |
|---|---|
| Both models correctly classified | 2550 |
| CoAtNet correct, LSH-CoAtNet incorrect | 3 |
| CoAtNet incorrect, LSH-CoAtNet correct | 37 |
| Both models incorrectly classified | 5 |
| Exact McNemar test p-value | 1.95 × 10⁻⁸ |
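The p-value in Table 6 can be reproduced from the two discordant counts alone. The sketch below assumes the usual two-sided exact form of the McNemar test, in which the smaller discordant count is compared against a Binomial(n, 0.5) distribution over the discordant pairs; the function name is illustrative.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: first model correct, second incorrect
    c: first model incorrect, second correct
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant counts from Table 6: 3 versus 37.
p_value = mcnemar_exact(3, 37)
print(f"{p_value:.3e}")
```

The concordant cells (2550 and 5) do not enter the statistic. Only the 40 samples on which the two models disagree carry information about which one is better.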
Table 7.
Five-fold cross-validation results of LSH-CoAtNet in terms of accuracy, precision, recall, and F1-score.
| Fold | Acc (%) | Prec (%) | Rec (%) | F1 (%) |
|---|---|---|---|---|
| 1 | 98.82 | 98.82 | 98.81 | 98.81 |
| 2 | 99.23 | 99.23 | 99.21 | 99.22 |
| 3 | 99.21 | 99.20 | 99.20 | 99.20 |
| 4 | 99.00 | 98.98 | 98.99 | 98.98 |
| 5 | 98.63 | 98.62 | 98.60 | 98.61 |
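Cross-validation results are commonly summarized as a mean and a standard deviation across folds. The sketch below computes both from the values in Table 7, using the sample standard deviation (n − 1 denominator) as an assumption; the table itself reports only the per-fold values.

```python
from statistics import mean, stdev

# Per-fold results copied from Table 7 (percent).
folds = {
    "Acc": [98.82, 99.23, 99.21, 99.00, 98.63],
    "Prec": [98.82, 99.23, 99.20, 98.98, 98.62],
    "Rec": [98.81, 99.21, 99.20, 98.99, 98.60],
    "F1": [98.81, 99.22, 99.20, 98.98, 98.61],
}

# Mean and sample standard deviation for each metric.
summary = {name: (mean(values), stdev(values)) for name, values in folds.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.2f} ± {sd:.2f}")
```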
Table 8.
Inference efficiency of CoAtNet and LSH-CoAtNet on the Jetson Orin Nano platform in terms of FPS and latency under batch sizes of 1 and 16.
| Model | FPS (Batch = 1) | Latency (Batch = 1, ms/Image) | FPS (Batch = 16) | Latency (Batch = 16, ms/Image) |
|---|---|---|---|---|
| CoAtNet | 35.10 | 28.49 | 61.25 | 16.33 |
| LSH-CoAtNet | 50.18 | 19.93 | 98.31 | 10.17 |
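The latency columns in Table 8 are consistent with the per-image reciprocal of the FPS columns. The sketch below recomputes them and derives the throughput ratio between the two models; the function names are illustrative, and the reciprocal relation is inferred from the reported values rather than stated in the caption.

```python
def latency_ms(fps):
    """Per-image latency in milliseconds for a given throughput in FPS."""
    return 1000.0 / fps


def speedup(fps_new, fps_base):
    """Throughput ratio of the new model relative to the baseline."""
    return fps_new / fps_base


# FPS values copied from Table 8, keyed by batch size.
fps = {
    "CoAtNet": {1: 35.10, 16: 61.25},
    "LSH-CoAtNet": {1: 50.18, 16: 98.31},
}

for batch in (1, 16):
    base, new = fps["CoAtNet"][batch], fps["LSH-CoAtNet"][batch]
    print(f"batch={batch}: {latency_ms(base):.2f} ms -> {latency_ms(new):.2f} ms, "
          f"speedup x{speedup(new, base):.2f}")
```

On these figures, LSH-CoAtNet is about 1.43 times faster at batch size 1 and about 1.61 times faster at batch size 16.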
Table 9.
Robustness performance of LSH-CoAtNet on the perturbed test set in terms of accuracy, macro-average metrics, and weighted-average metrics.
| Metric | Value (%) |
|---|---|
| Acc | 92.92 |
| Macro Prec | 93.13 |
| Macro Rec | 92.96 |
| Macro F1 | 92.89 |
| Weighted Prec | 93.30 |
| Weighted Rec | 92.92 |
| Weighted F1 | 92.96 |