Article

A Ginseng Appearance Quality Grading Method Based on an Improved ConvNeXt Model

1 School of Internet of Things Engineering, Wuxi University, Wuxi 214063, China
2 College of Information Technology, Jilin Agricultural University, Changchun 130118, China
3 College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China
* Author to whom correspondence should be addressed.
Agronomy 2023, 13(7), 1770; https://doi.org/10.3390/agronomy13071770
Submission received: 28 May 2023 / Revised: 25 June 2023 / Accepted: 26 June 2023 / Published: 29 June 2023
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

In order to solve the problem of the small degree of variability between the features of ginseng grading classes, and the resulting heavy reliance on professionals, this study established a ginseng dataset containing 5116 images of three classes in different contexts and proposed a ginseng-grading model based on an improved ConvNeXt framework. Firstly, a Channel Shuffle module was embedded in the backbone network after down-sampling to fully fuse the channel features and improve the model's grading accuracy. Secondly, an improved structural re-parameterization module was introduced into the convolutional block, strengthening the model's characterization ability, enriching the feature space of the convolutional block, and further improving the model's accuracy. Finally, the original activation function, GELU, was replaced with the PReLU activation function to increase the nonlinear variability of the neural network model and improve the model's accuracy and efficiency. The experimental results show that the method achieved accuracy improvements of 2.46% and 4.32% over the current advanced networks Vision Transformer and Swin Transformer, respectively. Furthermore, the accuracy, precision, recall, and specificity of ginseng classification reached 94.44%, 91.58%, 91.04%, and 95.82%, respectively, and the loss rate was reduced to 0.24. A comparison with expert appraisal results showed high consistency, verifying our model's accuracy and reliability in ginseng quality assessment and its ability to provide technical support for intelligent ginseng quality grading.

1. Introduction

The monitoring and quality control of ginseng (Panax ginseng C. A. Meyer), a product regarded as sitting at the "top of Chinese medicine", are particularly important. The appearance and quality of ginseng have mainly been monitored manually [1]. The main drawbacks of this method are that it wastes a large amount of manpower, relies heavily on professionals, and results in a slow grading speed and a low accuracy rate. Moreover, the grading standards are difficult to unify and there is a lack of modern management tools, making it difficult to guarantee the quality of ginseng products [2]. According to the ginseng and deer antler commodity standards [3], the identification points of ginseng [4], a survey of the classification of commodity specifications in the Chinese herbal medicine market [5], a study on the quality (grade) standards of ginseng and red ginseng [6], and the evidence provided by Li et al. [7], the shape of the root is the most important factor in defining the quality of ginseng. Although the morphology of ginseng roots is highly variable, even among plants from the same place of origin, their color and texture are affected to a lesser extent [8]. Therefore, ginseng grading mainly relies on the extraction of color and texture features, which are complex and difficult to distinguish. In summary, this paper introduces deep learning technology into the field of white ginseng herbal grading to improve accuracy in grading the quality of white ginseng.
In recent years, computer vision techniques have played an important role in plant phenotype identification. For example, Kumar et al. [9] proposed a method to identify Indian medicinal plants based on leaf color, area, and edge features, but this work was limited to the detection of mature leaves. Lee et al. [10] proposed a fast Fourier transform method for leaf images in which they performed a distance analysis of the leaf's contour and center of mass; an average recognition accuracy of 97.19% was achieved by inputting these features into the system. Kadir et al. [11] proposed a plant recognition method that extracted the texture, shape, and color features of leaves using morphological openings and classified the plants using probabilistic neural networks; the method was tested on the Flavia dataset, which contains 32 species of plant leaves, with an average accuracy of 93.75%. Carranza-Rojas et al. [12] used a GoogLeNet network to process plant specimens, fine-tuned the original model, and experimented on several datasets with good results. Sun et al. [13] proposed a convolutional neural network (CNN) for herb identification and retrieval and trained a VGGNet-based network model, which improved the accuracy of herb identification against cluttered backgrounds and made herb identification more suitable for practical applications; however, the average recognition accuracy was 71% and the average retrieval accuracy was 53%, so the accuracy of herb identification still needs to be improved. Liu et al. [14] proposed a deep-learning-based method to distinguish the origin of Astragalus using NIR spectroscopy and a CNN, which simplified data pre-processing while ensuring prediction accuracy and precision. The proposed method showed potential for the non-destructive and accurate assessment of ginseng quality using hyperspectral imaging. Li et al. [15] improved the assessment of ginseng appearance quality by replacing the traditional ReLU (rectified linear unit) activation function and using a self-built dataset with data augmentation; the predictive capability of their model improved by 3.02% compared with the original model, demonstrating that deep learning can effectively and valuably classify the appearance quality of herbal medicines.
To date, researchers worldwide have designed a variety of deep learning modules with high accuracy and good performance. In 2018, Zhang et al. [16] introduced ShuffleNet, a highly efficient CNN (convolutional neural network) architecture that uses channel shuffling to reduce computational complexity while preserving the flow of information between channel groups; it greatly reduced the computational effort required compared with other models while maintaining comparable accuracy. In 2021, Wightman et al. [17] proposed a structure-based re-parameterization method to reconstruct a solar speckle map, re-parameterizing each branch structure; this improved the accuracy of the model to a large extent and saved computation time. Wang et al. [18] explored the efficient training of VGG-style super-resolution networks using structural re-parameterization techniques and introduced a new re-parameterizable block that achieved superior performance and a better trade-off between performance and runtime compared with previous methods. Bello et al. [19] combined this improved module with their proposed convolutional backbone network; originally, their approach could only learn from adjacent spatial positions of the feature map locally, but it later became able to combine the global information provided in images, effectively improving the network's learning capability. In addition to increasing the depth of the CNN, attention modules have been combined with long short-term memory networks to achieve better image classification results, and it has been demonstrated that fusing high-level feature information between channels can improve a network's learning ability.
Preliminary experiments have shown that deep convolutional neural network (CNN) technology can be applied to the grading of Chinese herbal medicines, but problems such as slow processing speed, low grading accuracy, and an unstable loss rate remain the main difficulties in applying neural network technology to ginseng-grading production. Therefore, to improve the network's operation rate through information fusion and address these grading performance problems, and inspired by the high-performance modules reviewed above, an effective ConvNeXt [20] framework was designed in this study for the fine-grained classification task of ginseng appearance quality identification. Section 2 of this paper introduces the data pre-processing, Section 3 introduces the improved ConvNeXt network framework, Section 4 focuses on the experimental validation and analysis of the results, and Section 5 summarizes and discusses the findings reported in this paper.

2. Data Pre-Processing

2.1. Image Dataset Acquisition and Pre-Processing

The original ginseng dataset comprised the same batch of white ginseng (a product made from fresh garden ginseng that has been washed and dried or dried in the sun) collected in 2020 and 2021 at the Institute of Traditional Chinese Medicine, Jilin Agricultural University, with a total of 367 white ginseng stems. The ginseng was photographed using a small professional photography studio (Sutefoto, Guangdong, China) with a mobile phone camera (Apple, Cupertino, CA, USA) placed on top of the studio box, perpendicular to the ginseng, in different orientations and against different backgrounds (a whiteboard background, a black leather background, and a red-brown wood-grain background). The high-definition color images are shown in Figure 1. The grading was based on the ginseng-grading criteria in the "Ginseng Release of Jilin Province Daoji Herbs", as shown in Table 1, and was confirmed by expert teachers from the School of Chinese Herbal Medicine, Jilin Agricultural University; a total of 1668 images of white ginseng were taken from multiple angles. The white ginseng was classified into three grades according to the rules in Table 1: 549 special-grade (Principal) white ginseng plants, 830 first-grade white ginseng plants, and 289 second-grade white ginseng plants.

2.2. Dataset Construction

In this experiment, the ginseng dataset exhibited two major issues. Firstly, there was a problem of category imbalance, where the number of extra-grade and secondary ginseng samples was small compared to the large volume of primary ginseng data. Secondly, the professional environment for ginseng classification was contextually homogeneous, resulting in the narrow applicability of the model.
To address the issue of imbalanced datasets among ginseng classes, specific measures were taken before training the model. Firstly, the two classes with limited data, the special-grade ginseng and second-grade ginseng, were augmented by extending the dataset. Data enhancement techniques, such as Gaussian blurring and horizontal flipping, were applied to individual ginseng images. These strategies aimed to improve the model's recognition ability, robustness, and generalization. Additionally, a diverse range of backgrounds was used to construct the dataset for the first-grade ginseng class, allowing the model to better recognize objects in various scenes and effectively handle new test data. Furthermore, this experiment also used an online enhancement method based on the PIL (Python Imaging Library) module: before each training round, a given image was uniformly cropped to 256 × 256 pixels, then processed with the center-crop method, and finally normalized. This indirectly increases the amount of training data by increasing the diversity of the image samples used during training. Figure 1 displays the image information of the data enhancement performed on the dataset. These pre-processing operations aimed to enhance the diversity of the data samples, improve the model's robustness and generalization ability, and mitigate overfitting during network training.
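To make the online enhancement step concrete, the following is a minimal sketch of such a pre-processing pipeline using torchvision; the 224-pixel center-crop target and the ImageNet normalization statistics are assumptions for illustration, not values reported here.

```python
import torchvision.transforms as T

# Minimal sketch of the online enhancement pipeline described above.
# The 224-pixel center-crop target and the ImageNet normalization
# statistics are assumptions, not values reported by the authors.
online_transform = T.Compose([
    T.Resize((256, 256)),   # uniformly scale each image to 256 x 256
    T.CenterCrop(224),      # center-crop to the assumed network input size
    T.ToTensor(),           # PIL image -> float tensor in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# Offline augmentations such as those applied to the two under-represented
# classes (horizontal flipping, Gaussian blurring) fit the same framework:
offline_augment = T.Compose([
    T.RandomHorizontalFlip(p=1.0),
    T.GaussianBlur(kernel_size=5),
])
```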

2.3. Dataset Partitioning

This experiment used five-fold cross-validation for the models trained with each network to ensure the stability of the training and to exclude chance effects such as overfitting of the training results. We divided the training and validation set images in a ratio of 8:2, as shown in Table 2. A Python script randomly partitioned the data into five near-equal parts; in each fold, four parts were used for training and the remaining part for validation, so that every fold maintained a 4:1 training-to-validation ratio. The final experimental results in this paper are the averages of the five runs.
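As an illustration, the five-fold protocol can be sketched with scikit-learn as follows; `image_paths`, `labels`, and the `train_and_evaluate` helper are hypothetical placeholders, and the use of a stratified split (so that each fold preserves the grade ratios) is an assumption.

```python
from sklearn.model_selection import StratifiedKFold

# Five-fold cross-validation: in each fold, four parts train and one validates.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_accuracies = []
for fold, (train_idx, val_idx) in enumerate(skf.split(image_paths, labels)):
    train_set = [image_paths[i] for i in train_idx]  # 4/5 of the data
    val_set = [image_paths[i] for i in val_idx]      # 1/5 of the data
    acc = train_and_evaluate(train_set, val_set)     # hypothetical helper
    fold_accuracies.append(acc)

# The reported results are the average over the five runs.
final_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```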

3. Building the Network Model

Ginseng images pose several challenges due to their complexity, including variations in shape, size, and color and the influences of factors such as illumination, orientation, and background. These factors make it difficult to achieve the accurate classification and grading of ginseng images using traditional computer vision techniques. To address these challenges, the ConvNeXt network was selected as the backbone network for this study. The ConvNeXt network combines convolutional and cross-channel parameterization layers to extract features from images. The inclusion of cross-channel parameterization allows the network to capture more complex relationships between different channels, which is crucial for accurately capturing the unique features present.

3.1. Construction of Ginseng Appearance Quality Grading Model

ConvNeXt was used as the backbone network for ginseng appearance quality classification, and an improved structural re-parameterization module and the Channel Shuffle method were introduced into the network. The backbone network contains four stages with a depth ratio of 3:3:27:3. The numbers of channels C for the four stages are (128, 256, 512, 1024), and the corresponding feature map dimensions are (56 × 56, 28 × 28, 14 × 14, 7 × 7).
In this paper, a connected network structure based on the improved ConvNeXt is proposed, as shown in Figure 2. The network is divided into four stages A–D:
A-Stage: The input image, processed with the combined online and offline data enhancement method, yields a 224 × 224 input; a convolution operation then reduces the height and width and increases the channel size to obtain a feature map $F_1 \in \mathbb{R}^{56 \times 56 \times 128}$.
B-Stage: The feature map enters the modified ConvNeXt structure (a code sketch of this block is given after the stage descriptions). It is first processed by the structural re-parameterization module, which adds a relatively small (5 × 5) kernel to a large (13 × 13) kernel via a linear transformation, and then passes through the depthwise convolution (DW) [21], using LayerNorm (LN) normalization instead of batch normalization (BN). The channel size is then increased by a 1 × 1 convolution with the PReLU activation function and reduced again by a second 1 × 1 convolution; the output is fused with $F_1$ through the LayerScale [22] layer and the DropPath layer [23] to obtain $F_{c\text{-}out}$.
C-Stage: $F_{c\text{-}out}$ is normalized by LayerNorm (LN), and a 2 × 2 convolution reduces the feature map size while increasing the channel size. The Channel Shuffle module and ConvNeXt Block are then applied repeatedly across the four stages to extract feature information and yield the final output $F_{out} \in \mathbb{R}^{7 \times 7 \times 1024}$, with the Channel Shuffle and ConvNeXt Block repetitions following the 3:3:27:3 ratio.
D-Stage: $F_{out}$ is processed by the average pooling layer and then normalized by LN; the spatial information of the feature map is thereby aggregated and passed to the fully connected layer, which completes the classification.
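Read as code, the B-stage block corresponds roughly to the following PyTorch sketch. This is our reading of the description above, not the authors' released implementation: the channel expansion ratio of 4, the LayerScale initial value, and the identity stand-in for DropPath are assumptions, and the depthwise convolution is shown in its inference-time form, after the 5 × 5 branch has been folded into the 13 × 13 kernel (Section 3.2.2).

```python
import torch
import torch.nn as nn

class ImprovedConvNeXtBlock(nn.Module):
    """Sketch of the B-stage block described above; assumptions noted inline."""

    def __init__(self, dim: int, layer_scale_init: float = 1e-6):
        super().__init__()
        # Re-parameterized depthwise conv, inference-time form: the parallel
        # 5 x 5 branch is already folded into the 13 x 13 kernel.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=13, padding=6, groups=dim)
        self.norm = nn.LayerNorm(dim)            # LN in place of BN
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # 1 x 1 conv: expand channels (ratio 4 assumed)
        self.act = nn.PReLU()                    # PReLU replaces GELU
        self.pwconv2 = nn.Linear(4 * dim, dim)   # 1 x 1 conv: reduce channels
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))  # LayerScale [22]
        self.drop_path = nn.Identity()           # stand-in for DropPath [23]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # (N, C, H, W) -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = self.gamma * x
        x = x.permute(0, 3, 1, 2)                # back to (N, C, H, W)
        return shortcut + self.drop_path(x)

# Applied to the A-stage output, the block preserves the feature map shape:
f1 = torch.randn(1, 128, 56, 56)
print(ImprovedConvNeXtBlock(128)(f1).shape)      # torch.Size([1, 128, 56, 56])
```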

3.2. Improving the ConvNeXt Model

3.2.1. Introduction of the Channel Shuffle Module

To enhance the cross-channel information interaction between multilayer features on different scales, the ConvNeXt module incorporates a channel-blending operation. This operation improves the network’s ability to capture and integrate information from different channels, enabling better feature representation for ginseng images. Furthermore, to improve the computational efficiency, the ConvNeXt module utilizes the group convolution method for 1 × 1 convolutions. This approach enhances both the efficiency and accuracy of network recognition. It is particularly beneficial for the grading of fine-grained images of ginseng, where there might be limited phenotypic differentiation between different classes in the dataset. A schematic diagram of the channel-blending operation is shown in Figure 3. In the figure, group a indicates that the input feature maps are grouped by channel and subjected to the convolution operation, which reduces the expressiveness of the feature maps. In contrast, the channel-blending operation (shown in groups b and c) blends the features of each group with the features of the other groups. Using the channel-blending method ensures that the grouped convolution employed next takes its input from different groups and the information can flow between the different groups.
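The operation itself is just a reshape, transpose, and reshape; a minimal PyTorch sketch following the ShuffleNet [16] formulation (with the group count left as a free parameter, since the paper does not state it) is:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that the next grouped convolution
    receives inputs from every group (ShuffleNet [16])."""
    n, c, h, w = x.size()
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.view(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()        # swap the group and channel axes
    return x.view(n, c, h, w)                 # flatten back: channels interleaved

# Example: 8 channels in 2 groups are reordered to 0, 4, 1, 5, 2, 6, 3, 7.
x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())
```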

3.2.2. Improved Re-Parameterization Module

Structural re-parameterization [24,25,26,27,28] is a method that equivalently transforms the model structure by transforming its parameters to improve the representational power of the model, thus enabling efficient fine-grained ginseng grading without a loss of accuracy. Structural re-parameterization methods are widely used in neural network modules such as convolution (Conv), average pooling layers, and residual connections. A Conv converts the input feature map I into an output O according to the following formula:
$$O := o(I) = I \circledast F + b \in \mathbb{R}^{D \times H' \times W'}$$
where C, H, and W are the number of input channels, the height, and the width, respectively; $I \in \mathbb{R}^{C \times H \times W}$ is the input, and $F \in \mathbb{R}^{D \times C \times k \times k}$ and $b \in \mathbb{R}^{D}$ are the parameters of the convolution operator; the output dimensions $H'$ and $W'$ depend on several variables, including the kernel size, padding, and stride. The additivity of the convolution operator means that several parallel branches can be combined into a single convolution. Table 3 summarizes the detailed operation space: RepVGG [24] proposes an extended 3 × 3 Conv that includes a 1 × 1 Conv and a residual connection; DBB [24] proposes a diverse branch block to replace the original K × K Conv; and RepNAS [29] searches for DBB branches using a neural architecture search (NAS).
Figure 4 illustrates the enhanced re-parameterization module, which incorporates kernels of dimensions 5 × 5 and 13 × 13. By combining a very large kernel with a smaller one, the module enlarges the receptive field while still capturing specific patterns within a smaller range, leading to improved performance. In addition, the convolutional layers in the module employ batch normalization (BN) and utilize a weight-sharing strategy, treating the entire feature map as neurons. This approach further enhances the accuracy of the optimized module. The experimental results demonstrate that the accuracy of the optimized module is approximately 5% higher than that of the conventional ConvNeXt network.
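The fusion underlying Figure 4 follows directly from the additivity of convolution described above: zero-padding the small kernel to the large kernel's size and summing weights and biases yields a single equivalent convolution. The sketch below demonstrates this for a 5 × 5 branch folded into a 13 × 13 depthwise convolution; it assumes matching strides and that batch normalization has already been fused into the weights and biases (a separate, standard step).

```python
import torch
import torch.nn.functional as F

def merge_parallel_kernels(large_w, large_b, small_w, small_b):
    """Fold a parallel small-kernel branch into a large-kernel convolution."""
    pad = (large_w.shape[-1] - small_w.shape[-1]) // 2  # (13 - 5) // 2 = 4
    merged_w = large_w + F.pad(small_w, [pad] * 4)      # center the 5x5 in the 13x13
    merged_b = large_b + small_b                        # biases simply add
    return merged_w, merged_b

# Equivalence check on random depthwise weights (8 channels, groups = 8):
x = torch.randn(1, 8, 32, 32)
w13, b13 = torch.randn(8, 1, 13, 13), torch.randn(8)
w5, b5 = torch.randn(8, 1, 5, 5), torch.randn(8)
two_branches = (F.conv2d(x, w13, b13, padding=6, groups=8)
                + F.conv2d(x, w5, b5, padding=2, groups=8))
w, b = merge_parallel_kernels(w13, b13, w5, b5)
one_conv = F.conv2d(x, w, b, padding=6, groups=8)
print(torch.allclose(two_branches, one_conv, atol=1e-5))  # True
```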

3.2.3. Using the PReLU Activation Function

The original ConvNeXt network used GELU (Gaussian Error Linear Units) [30] as the activation function, but more activation functions have emerged over time. GELU is an activation function that incorporates regularization; it is a smoother variant of ReLU (Rectified Linear Unit) [31] and takes the form:
$$\mathrm{GELU}(x) = x \cdot \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$
where x is used as the neuron input. The larger x is, the more likely it is that the activation output x will be retained, and the smaller x is, the more likely it is that the activation result will be 0.
The GELU activation function introduces the idea of stochastic regularization to alleviate the vanishing gradient problem, but it hampers convergence in the hard saturation region. The ReLU function, defined as $\max(0, x)$, has the problem that when the input is less than zero the gradient is zero, so the neuron cannot learn via back-propagation. The Sigmoid function is calculated as $\frac{1}{1+e^{-x}}$, where $x$ is the input value, and its output range is (0, 1). However, the Sigmoid function suffers from gradient saturation: the gradient approaches zero at large or small input values, leading to vanishing or exploding gradients. Hence, this study utilizes the PReLU (Parametric Rectified Linear Unit) activation function, defined as $\max(ax, x)$ with $0 < a < 1$, where $a$ is a learnable parameter. The corresponding curves are shown in Figure 5. The PReLU activation function solves the hard saturation problem of GELU at $x < 0$ and transmits the ginseng feature information more efficiently, increasing the nonlinear variability of the ConvNeXt network model.
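As a concrete illustration, PReLU behaves as follows in PyTorch, where the negative slope a is learnable (initialized to 0.25 by default); the `replace_gelu_with_prelu` helper is a hypothetical convenience for performing the swap on an existing model, not part of the paper.

```python
import torch
import torch.nn as nn

# PReLU(x) = max(ax, x), with a learned (shared or per-channel) slope a.
prelu = nn.PReLU(num_parameters=1, init=0.25)
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(prelu(x))  # tensor([-0.5000, -0.1250,  0.0000,  1.5000], grad_fn=...)

def replace_gelu_with_prelu(model: nn.Module) -> None:
    """Recursively swap every GELU activation in a model for PReLU."""
    for name, child in model.named_children():
        if isinstance(child, nn.GELU):
            setattr(model, name, nn.PReLU())
        else:
            replace_gelu_with_prelu(child)
```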

4. Experimental Validation and Analysis of the Results

4.1. Test Configuration Environment

We implemented our approach in PyTorch. The processor of the experimental workstation was a Xeon 4210 (8-core, 2.45 GHz) (Intel, Santa Clara, CA, USA) with 64 GB of memory. The GPU was an NVIDIA GeForce GTX 1080 Ti (NVIDIA, Santa Clara, CA, USA) with 11 GB of VRAM. The software environment comprised Ubuntu 16.04, Python 3.7.0, PyTorch 1.10.1, and CUDA 10.2.

4.2. Experimental Training Process

In addition to the design of the network architecture, the training process also affects the final performance. The Vision Transformer [32] line of work brought not only a new set of modules and architectural design decisions but also different training methods (e.g., the AdamW optimizer); the relevant choices mainly concern the optimization strategy and the associated hyperparameter settings. Thus, the first step of our experiments was to train the base model, ConvNeXt_base, using the Vision Transformer training procedure. It has been shown [33] that the performance of a simple ResNet-50 model can be significantly improved using modern training techniques. The training parameters of the optimized ConvNeXt network model used in the present study are shown in Table 4; for optimization, we used the AdamW [34] optimizer.
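Expressed in code, the Table 4 configuration corresponds roughly to the following training skeleton; `model` and `train_loader` are assumed to be defined elsewhere, and the cross-entropy loss is an assumption, as the paper does not state the loss function.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(),
                              lr=6e-4,            # learning rate (Table 4)
                              weight_decay=0.08)  # weight decay (Table 4)
criterion = torch.nn.CrossEntropyLoss()           # assumed loss function

for epoch in range(150):                          # training epochs (Table 4)
    for images, labels in train_loader:           # batch size 32 (Table 4)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```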

4.3. Model Evaluation

In order to comprehensively measure the effectiveness of the proposed ConvNeXt network, the accuracy (Acc), recall (Rec), precision (Pre), and specificity (Spe) were used as evaluation metrics, which were calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
where TP is the number of samples correctly predicted as positive samples, i.e., the number of accurately identified ginseng specimens; TN is the number of samples correctly predicted as negative samples, i.e., the number of other accurately identified ginseng specimens; FP is the number of samples incorrectly predicted as positive samples, i.e., the number of incorrectly identified ginseng specimens; and FN is the number of samples incorrectly predicted as negative samples, i.e., the number of ginseng specimens identified as other species.
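Computed from the confusion counts of a given grade (treated one-vs-rest), the four metrics reduce to a few lines; the counts in the usage line are hypothetical.

```python
def grading_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Evaluation metrics defined above, from one class's confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "recall":      tp / (tp + fn),
        "precision":   tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for one grade of ginseng:
print(grading_metrics(tp=90, tn=250, fp=10, fn=12))
```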

4.4. Impact of the Activation Function on Model Performance

The activation function is crucial in neural networks and has a significant impact on model performance. In this experiment, the original GELU activation function was replaced in turn with three alternatives, namely Sigmoid [35], ReLU, and PReLU, for comparison. The results show that the GELU function takes longer to train, while Sigmoid and ReLU differ significantly from PReLU: ReLU leaves neurons with negative inputs unable to learn (the "dead neuron" problem), and Sigmoid cannot handle the feature differences in the training data.
To solve these problems and improve efficiency, PReLU was used instead of the GELU activation function in this paper. As shown in Table 5, compared with the original model, PReLU improves the accuracy by 2.77%, reduces the loss value by 0.033, and saves 3.08 s of training time per epoch. PReLU allows the deep neural network to be trained directly, alleviates the vanishing gradient problem, and improves training efficiency. At the same time, PReLU delivers more detailed information that maximizes inter-grade differences, such as texture, line, and color, quickly extracting the detailed information of hard-to-capture features, thereby obtaining better generalization and recognition and significantly improving the model's performance.

4.5. Impact of Data Enhancement on Model Performance

To enhance the recognition performance of the new network model, this study addresses the issue of homogeneous backgrounds in the ginseng dataset by employing several strategies. The dataset augmentation includes five methods aimed at improving the model's generalization and robustness: (1) random rotation and flipping; (2) random flipping combined with a sharpening effect; (3) horizontal flipping combined with Gaussian blur; (4) added black leather and reddish-brown wood-grain backgrounds; and (5) an online enhancement method using the PIL (Python Imaging Library) module, where the given image is uniformly cropped to 256 × 256 pixels before each training round and then processed using the center-cropping method.
Using the new network as the experimental model, the pre-expansion dataset (1680 images) and the post-expansion dataset (5116 images) were tested for comparison under otherwise identical conditions. The accuracy curves are shown in Figure 6, with test accuracies of 78.67% and 85.97%, respectively, an increase of 7.3 percentage points.

4.6. Impact of Different Module Combinations on the Experimental Results

The identification results of each different method combination based on the test set are shown in Table 6, where the “↑” symbol in the table indicates the improvement in accuracy compared to the original ConvNeXt model. Overall, for NUM 3, NUM 5, NUM 6, and NUM 8, the recognition accuracies of the models were 90.12%, 92.59%, 91.76%, and 94.44%, respectively. Compared with the other three types of models, the NUM 8 model was able to locate and recognize ginseng features more accurately, effectively improve the accuracy of the ConvNeXt model, better balance the grading results of the different grades of ginseng, and achieve the accurate identification of ginseng grades.
According to the details presented in Table 6, the incorporation of the Channel Shuffle module, the re-parameterization module, and the PReLU module effectively alleviates the lower accuracy and precision of the baseline ConvNeXt model. These modules improve the model's performance and make it more suitable for classifying ginseng grades; by integrating them, the model achieves enhanced accuracy and precision, enabling more reliable and accurate classification results for ginseng-grading tasks. Among the individual modules, the re-parameterization module has the strongest effect on the overall identification performance of the ConvNeXt model, with the accuracy, precision, and recall of the model reaching 90.12%, 85.09%, and 85.23%, respectively; therefore, introducing the re-parameterization module alone is the most effective single-module improvement to the recognition performance of the ConvNeXt model. When two modules are fused, the combination of the Channel Shuffle module and the re-parameterization module has the best effect, showing an improvement of 7.4% over the original model, while the combination of the PReLU and re-parameterization modules has the smallest effect on the accuracy rate, improving it by only 3.7%; this verifies that the Channel Shuffle module has a competitive advantage and strong robustness and generalization ability.
Figure 7 depicts the confusion matrices corresponding to Table 6, where Principal stands for extra-grade ginseng, First-class for first-class ginseng, and Second-class for second-class ginseng; the horizontal coordinate represents the true class, and the vertical coordinate represents the predicted class. According to the confusion matrix, the original model's accuracy in identifying the Principal and Second-class cohorts is just 75%.
The inclusion of the PReLU activation function module (NUM 4) improved the grading performance specifically for the second-class ginseng. However, there was no improvement for the extra-grade or first-class ginseng; in fact, the inclusion of this module led to an increased misclassification rate for the extra-grade ginseng. When both the PReLU activation function module and the re-parameterization module were introduced (NUM 7), the model's performance decreased compared with the configuration embedding the PReLU activation function module and the Channel Shuffle module (NUM 6): there were decreases of 0.03% and 0.08% in the accuracy of the Principal and Second-class ginseng classifications, respectively, and no improvement in the accuracy of the First-class ginseng classification. The incorporation of all three improved modules (NUM 8) resulted in a significant improvement in the grading performance for both the first-class and second-class ginseng, effectively reducing the misclassification rate. As a result, the recognition accuracy for all types of ginseng exceeded 90%, demonstrating the superior recognition effect and enhanced generalization performance of the model. The introduction of these modules proved highly beneficial, improving accuracy and ensuring the model's ability to accurately classify and distinguish different grades of ginseng.
We analyzed the accuracy and loss change curves of the improved ConvNeXt network. During the training process, the loss of the model decreased, and the accuracy increased faster in the first 75 rounds of training, and then the curve gradually stabilized and slowed down. At approximately 110 rounds of training, the accuracy and loss curves of the model leveled off, indicating that the improved ConvNeXt model was in a saturated state and reached the highest recognition accuracy. The trends of the loss rate and accuracy curves are consistent, indicating that the model generally converges well and does not overfit, which validates the effectiveness of the improved ConvNeXt model.

4.7. Comparison of Experimental Models with Mainstream Networks

Classical networks were selected for comparison tests on the ginseng dataset, each configured according to the prototype framework and parameter settings of its original paper. The good classification performance of the improved ConvNeXt network on the ginseng dataset is demonstrated by the four category performance metrics in Table 7.
Overall, the ConvNeXt network, before improvement, showed accuracy improvements of 2.47%, 1.86%, 0.62%, and 4.94% compared to ResNet-50 [35], ResNet-101 [36], DenseNet-121 [37], and InceptionV3 [38], respectively, with a recall of 78.06%, which indicates that the original ConvNeXt network results are more stable and that the model performs better than conventional networks.
Compared with the original ConvNeXt model, the improved ConvNeXt network has 9.25%, 13.86%, 12.98%, and 7.02% higher accuracy, precision, recall, and specificity, respectively, and several indices are better than those of the Vision Transformer and Swin Transformer networks. The accuracy reaches 94.44%, and these better results are achieved for three reasons: firstly, the Channel Shuffle operation added after the down-sampling of the backbone network fully fuses the channel features and improves the network's accuracy. Secondly, the addition of the re-parameterization module to the ConvNeXt Block maintains the high accuracy of the network. Finally, the PReLU activation function increases the nonlinear variability of the neural network model and enhances the network's operation rate. The model can provide a valuable reference for subsequent applications in ginseng appearance quality recognition.

4.8. Comparison of the Experimental Model and Expert Identification Results

The accuracy comparison between expert identification and the ConvNeXt model is shown in Table 8. In the ginseng-grading task, the ConvNeXt model achieved high accuracy on all grades of ginseng-grading tasks. Compared with the expert identification results, the model improved by 7.86 percentage points in the grading of extra-grade ginseng, 8.71 percentage points for first-grade ginseng, and 3.30 percentage points for second-grade ginseng. This proves that the ConvNeXt model has a significant advantage in ginseng-grading accuracy, as compared with the expert identification results, and can better capture the features of ginseng images and perform effective grading.
The experimental model’s high accuracy in the performance of ginseng-grading tasks highlights its practical importance. By utilizing the ConvNeXt model, the accurate classification of various grades of ginseng can be achieved, eliminating the subjectivity and human bias associated with reliance on expert identification alone. Moreover, the model’s efficiency enables swift ginseng grading identification, significantly improving the work efficiency. This proves the necessity and effectiveness of the model in solving the difficult problem of ginseng grading and provides an accurate and efficient solution for the task of ginseng grading.

5. Conclusions

It was found that although generic classification models achieve high accuracy on public datasets, different application scenarios have different characteristics and needs. In the specific scenario described in this paper, ginseng of only three grades, photographed in different contexts, needs to be classified, so a generic model cannot reach the same high performance level here. The confusion matrix of the original model shows that second-class ginseng has the lowest recognition accuracy, which also reflects the fact that, for highly similar ginseng datasets, the feature extraction capability of the model needs to be further improved to provide better recognition results.
In conclusion, this study addresses the specific challenge of classifying ginseng in different contexts and proposes a ConvNeXt model based on the fusion of Channel Shuffle with re-parameterization improvement. The model effectively solves the problem of grading ginseng of various grades, achieving accurate and efficient ginseng grading recognition. The analysis of data augmentation, Channel Shuffle module, structural re-parameterization module, activation function, and the improved ConvNeXt model yielded the following key findings:
  • Data augmentation improves the accuracy of the augmented dataset by 7.3%, validating the effectiveness of this technique.
  • Embedding the Channel Shuffle module after the down-sampled layer enhances the model’s accuracy through improved cross-channel information interaction.
  • The optimized structural re-parameterization module increases the model’s feature extraction capability, contributing to improved performance.
  • The utilization of the PReLU activation function instead of GELU improves the accuracy by 2.77%, reduces the loss value by 0.033, and demonstrates the model’s ability to solve the vanishing gradient problem and enhance the detailed feature information.
  • The improved ConvNeXt model achieves an accuracy of 94.44% on the test set, outperforming common classification models such as ResNet-50, ResNet-101, DenseNet-121, and InceptionV3 by significant margins, showcasing its generalization ability and robustness in accurately identifying ginseng ranks.
In addition, the improved ConvNeXt model demonstrated a high degree of agreement with the expert identification results in accurately classifying ginseng grades. This indicates its reliability and effectiveness in ginseng-grading tasks. However, to further enhance the model’s reliability and adaptability, future work should focus on expanding the dataset to encompass a wider range of ginseng species and appearance features. Additionally, the integration of additional relevant information, such as geographical origin, cultivation methods, and processing techniques, could enhance the model’s understanding and predictive ability. These efforts will contribute to improving the overall robustness of the model and provide valuable technical support for the intelligent grading of ginseng. By continually refining and expanding the dataset and incorporating diverse information, the model’s performance and applicability could be further improved, enabling more accurate and comprehensive ginseng-grading capabilities.

Author Contributions

Conceptualization, W.L. and D.L.; methodology, D.L. and M.Z.; software, D.L. and M.Z.; validation, D.L., M.Z. and W.L.; formal analysis, D.L. and M.Z.; investigation, M.Z. and W.L.; resources, D.L.; data curation, M.Z., W.L. and X.P.; writing—original draft preparation, L.Z., W.L., D.L., X.P. and M.Z.; writing—review and editing, L.Z., W.L., D.L., X.P. and M.Z.; visualization, L.Z., W.L., D.L., X.P. and M.Z.; supervision, D.L.; project administration, D.L.; funding acquisition, L.Z., W.L., D.L., X.P. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the NSFC (No. 61806024); Jilin Province Science and Technology Development Plan Key Research and Development Project, grant number (20210204050YY); Wuxi University Research Start-up Fund for Introduced Talents, grant number (No. 2023r004), (No. 2023r006); National Natural Science Foundation of China (No. 61801439); Jilin Provincial Education Department Scientific Research Project (JJKH20210747KJ); Jilin Provincial Environmental Protection Department Project (202107); Jilin Provincial Middle and Young Leaders Team and Innovative Talents Support Program (No. 20200301037RQ).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All relevant data are included in the manuscript. Raw images are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers for their criticisms and suggestions. We would also like to thank D.L., L.Z. and M.Z. for their research support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, F.; Bao, H. Herbal Textual Research and progress on pharmacological actions of Ginseng Radix et Rhizoma. Ginseng Res. 2017, 29, 43–46.
  2. Liu, W.; Li, W. Review on industrialization development status and the prospect of Panax ginseng processing. J. Jilin Agric. Univ. 2022, 6, 1–11.
  3. Dai, Q.; Li, L.; Xu, G.; Tang, D.; Ma, H. Study on the Correlation between Morphological Characteristics and Chemical Constituents of Garden Ginseng, Mountain Transplanted Ginseng, and Mountain-grown Ginseng Based on "Evaluating Quality from Morphological Characteristics". China Pharm. 2020, 31, 650–655.
  4. Li, J.; Wei, X.; Wan, G.; Yang, X.; Shi, J.L. Historical Evolution and Modern Research Progress of Quality Evaluation Based on Character Identification of Traditional Chinese Medicinal Materials. Chin. J. Exp. Tradit. Med. 2021, 27, 189–196.
  5. Liu, T.-R.; Jin, Y.; Meng, H.-B.; Zhao, Y.-Y.; Zhou, J.-H.; Yuan, Y.; Huang, L.-Q. Biological research of color and quality evaluation in "quality discrimination by the character" of Chinese medicine. China J. Chin. Mater. Med. 2020, 45, 4545–4554.
  6. Wang, H.; Tian, Y.; Liu, D.; Ma, X.; Zhan, Z.; Huang, L.; Du, H. History, Development and Application of the Traditional Chinese Medicine Quality Evaluation through Morphological Identification. J. Chin. Med. Mater. 2021, 44, 513–519.
  7. Li, M.; Zhang, X.; Liu, S.; Chen, X.; Huang, L.; Shi, T.; Yang, R.; Liu, S.; Zheng, F. Partly Interpretable Machine Learning Method of Ginseng Geographical Origins Recognition and Analysis by Hyperspectral Measurements. Spectrosc. Spectr. Anal. 2022, 42, 1217–1221.
  8. Zhao, L.; Shi, M.; Zhang, Q.; Qin, L. Research progress on quality characteristics and formation mechanism of Sun Yiqi Authentic medicinal materials. Chin. Herb. Med. 2022, 53, 6931–6947.
  9. Kumar, E.S.; Talasila, V. Leaf Features Based Approach for Automated Identification of Medicinal Plants. In Proceedings of the 2014 International Conference on Communication and Signal Processing, Melmaruvathur, India, 3–5 April 2014; pp. 210–214.
  10. Lee, K.-B.; Hong, K.-S. An Implementation of Leaf Recognition System using Leaf Vein and Shape. Int. J. Bio-Sci. Bio-Technol. 2013, 5, 57–66.
  11. Kadir, A.; Nugroho, L.E.; Susanto, A. Leaf Classification Using Shape, Color, and Texture Features. arXiv 2013, arXiv:1401.4447.
  12. Carranza-Rojas, J.; Goeau, H.; Bonnet, P.; Mata-Montero, E.; Joly, A. Going Deeper in the Automated Identification of Herbarium Specimens. BMC Evol. Biol. 2017, 17, 181.
  13. Sun, X.; Qian, H. Chinese Herbal Medicine Image Recognition and Retrieval by Convolutional Neural Network. PLoS ONE 2016, 11, e0156327.
  14. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
  15. Li, D.; Piao, X.; Lei, Y.; Li, W.; Zhang, L.; Ma, L. A Grading Method of Ginseng (Panax ginseng C. A. Meyer) Appearance Quality Based on an Improved ResNet50 Model. Agronomy 2022, 12, 2925.
  16. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
  17. Wightman, R.; Touvron, H.; Jégou, H. ResNet Strikes Back: An Improved Training Procedure in Timm. arXiv 2021, arXiv:2110.00476.
  18. Wang, X.; Dong, C.; Shan, Y. RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 2556–2564.
  19. Bello, I.; Fedus, W.; Du, X.; Cubuk, E.D.; Srinivas, A.; Lin, T.-Y.; Shlens, J.; Zoph, B. Revisiting ResNets: Improved Training and Scaling Strategies. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 22614–22627.
  20. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
  21. Han, Q.; Fan, Z.; Dai, Q.; Sun, L.; Cheng, M.M.; Liu, J.; Wang, J. On the connection between local attention and dynamic depth-wise convolution. arXiv 2021, arXiv:2106.04263.
  22. Crnjanski, J.; Krstić, M.; Totović, A.; Pleros, N.; Gvozdić, D. Adaptive Sigmoid-like and PReLU Activation Functions for All-Optical Perceptron. Opt. Lett. 2021, 46, 2003–2006.
  23. Touvron, H.; Cord, M.; Sablayrolles, A.; Synnaeve, G.; Jégou, H. Going Deeper with Image Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 32–42.
  24. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-Style ConvNets Great Again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742.
  25. Huang, G.; Sun, Y.; Liu, Z.; Sedra, D.; Weinberger, K.Q. Deep Networks with Stochastic Depth. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 646–661.
  26. Ding, X.; Chen, H.; Zhang, X.; Han, J.; Ding, G. RepMLPNet: Hierarchical Vision MLP with Re-Parameterized Locality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 578–587.
  27. Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920.
  28. He, Y.; Shi, H.; Tan, S.; Song, B.; Zhu, J. Multiblock Temporal Convolution Network-Based Temporal-Correlated Feature Learning for Fault Diagnosis of Multivariate Processes. J. Taiwan Inst. Chem. Eng. 2021, 122, 78–84.
  29. Rong, J.; Ou, L. RepNAS: Searching for Efficient Re-Parameterizing Blocks. arXiv 2021, arXiv:2109.03508.
  30. Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2020, arXiv:1606.08415.
  31. Agarap, A.F. Deep Learning Using Rectified Linear Units (ReLU). arXiv 2019, arXiv:1803.08375.
  32. Ranftl, R.; Bochkovskiy, A.; Koltun, V. Vision Transformers for Dense Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 12179–12188.
  33. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
  34. Llugsi, R.; Yacoubi, S.E.; Fontaine, A.; Lupera, P. Comparison between Adam, AdaMax and AdamW Optimizers to Implement a Weather Forecast Based on Neural Networks for the Andean City of Quito. In Proceedings of the 2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM), Cuenca, Ecuador, 12–15 October 2021; pp. 1–6.
  35. Wen, L.; Li, X.; Gao, L. A Transfer Convolutional Neural Network for Fault Diagnosis Based on ResNet-50. Neural Comput. Appl. 2020, 32, 6111–6124.
  36. Ghosal, P.; Nandanwar, L.; Kanchan, S.; Bhadra, A.; Chakraborty, J.; Nandi, D. Brain Tumor Classification Using ResNet-101 Based Squeeze and Excitation Deep Neural Network. In Proceedings of the 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 25–28 February 2019; pp. 1–6.
  37. Chhabra, M.; Kumar, R. A Smart Healthcare System Based on Classifier DenseNet 121 Model to Detect Multiple Diseases. In Mobile Radio Communications and 5G Networks; Marriwala, N., Tripathi, C.C., Jain, S., Kumar, D., Eds.; Springer Nature: Singapore, 2022; pp. 297–312.
  38. Liu, Z.; Yang, C.; Huang, J.; Liu, S.; Zhuo, Y.; Lu, X. Deep Learning Framework Based on Integration of S-Mask R-CNN and Inception-v3 for Ultrasound Image-Aided Diagnosis of Prostate Cancer. Future Gener. Comput. Syst. 2021, 114, 358–367.
Figure 1. Ginseng (Panax ginseng C. A. Meyer) dataset.
Figure 2. The general framework of the improved ConvNeXt network model.
Figure 3. Operation schematic of the Channel Shuffle.
Figure 4. Modified structural re-parameterization module.
Figure 5. ReLU, GELU, PReLU, and Sigmoid activation function curves.
Figure 6. Accuracy curves before and after data enhancement.
Figure 7. Visualization of the confusion matrices corresponding to the methods in Table 6.
Table 1. Ginseng classification standards in the Jilin Province authentic medicinal materials, ginseng release.

| Projects | Principal Ginseng | First-Class Ginseng | Second-Class Ginseng |
| --- | --- | --- | --- |
| Main root | Cylindrical-like | Cylindrical-like | Cylindrical-like |
| Branch root | 2~3 obvious branched roots, more uniform thickness | One to four branches, coarser and finer | One to four branches, coarser and finer |
| Rutabaga | Complete with reed head and ginseng fibrous roots | Reed head and ginseng fibrous roots more complete | Rutabaga and ginseng fibrous roots incomplete |
| Groove | Clear and obvious grooves | Groove distinct but not obvious | Without grooves |
| Diameter length | ≥3.5 | 3.0–3.49 | 2.5–2.99 |
| Surface | Yellowish-white or grayish-yellow, no water rust, no draw grooves | Yellowish-white or grayish-yellow, light water rust, or with pumping grooves | Yellowish-white or grayish-yellow, slightly more water rust, with pumping grooves |
| Cross-section | Yellowish-white, powdery, with resinous tract visible | Yellowish-white, powdery, with resinous tract visible | Yellowish-white, powdery, with resinous tract visible |
| Texture | Harder, powdery, non-hollow | Harder, powdery, non-hollow | Harder, powdery, non-hollow |
| Damage, scars | No significant injury | Minor injury | More serious injury |
| Insects, mildew, impurities | None | Mild | Present |
| Section | Neat, clear | Obvious | Not obvious |
| Springtails | Square or rectangular | Conical or cylindrical | Irregular shape |
| Weight | 500 g/root or more | 250–500 g/root | 100–250 g/root |
Table 2. Dataset partition.

| Level | Number of Original Training Sets | Number of Enhanced Training Sets | Number of Original Validation Sets | Number of Enhanced Validation Sets |
| --- | --- | --- | --- | --- |
| Principal | 412 | 1436 | 137 | 359 |
| First-class | 672 | 1334 | 257 | 334 |
| Second-class | 217 | 1322 | 72 | 331 |
Table 3. Operation spaces for RepVGG, DBB, RepNAS, and ConvNeXt.

| Method | #Branches | Branches |
| --- | --- | --- |
| RepVGG [24] | 3 | K × K, 1 × 1, residual connection |
| DBB [24] | 4 | K × K, 1 × 1–K × K, 1 × 1–AVG, 1 × 1 |
| RepNAS [29] | 7 | K × K, 1 × 1–K × K, 1 × 1–AVG, 1 × 1, 1 × K, K × 1, residual connection |
| ConvNeXt (ours) | 5 | K × K, 13 × 13, residual connection |
Table 4. New model training parameters.

| Parameter | Value or Name |
| --- | --- |
| Training epochs | 150 |
| Batch size | 32 |
| Learning rate | 0.0006 |
| Optimizer | AdamW |
| Weight decay | 0.08 |
| Input size/pixels | 256 × 256 |
Table 5. Effects of different activation functions on model performance in the test dataset.

| Activation Function | Acc/% | Loss | Time (s/epoch) |
| --- | --- | --- | --- |
| GELU | 91.67 | 0.303 | 27.02 |
| Sigmoid | 88.88 | 0.336 | 24.24 |
| ReLU | 90.74 | 0.390 | 24.63 |
| PReLU | 94.44 | 0.270 | 23.94 |
Table 6. Comparison of ConvNeXt experimental models with different module combinations.

| NUM | Method | Acc/% | Pre/% | Rec/% | Spe/% | δAcc/% |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ConvNeXt | 85.19 | 77.72 | 78.06 | 88.80 | - |
| 2 | ConvNeXt + Channel Shuffle | 89.51 | 84.21 | 84.80 | 91.98 | ↑3.70 |
| 3 | ConvNeXt + Rep | 90.12 | 85.09 | 85.23 | 92.55 | ↑4.93 |
| 4 | ConvNeXt + PReLU | 86.42 | 79.47 | 79.60 | 89.76 | ↑1.23 |
| 5 | ConvNeXt + Channel Shuffle + Rep | 92.59 | 88.77 | 89.31 | 94.33 | ↑7.40 |
| 6 | ConvNeXt + Channel Shuffle + PReLU | 91.36 | 87.04 | 87.28 | 93.42 | ↑6.48 |
| 7 | ConvNeXt + PReLU + Rep | 88.89 | 83.33 | 83.73 | 91.58 | ↑3.70 |
| 8 | ConvNeXt + Channel Shuffle + Rep + PReLU (our model) | 94.44 | 91.58 | 91.04 | 95.82 | ↑9.25 |
Table 7. Comparison of the experimental results of different convolutional neural network models.

| Model | Acc/% | Pre/% | Rec/% | Spe/% |
| --- | --- | --- | --- | --- |
| ResNet-50 | 82.72 | 74.08 | 74.12 | 86.94 |
| ResNet-101 | 83.33 | 74.96 | 75.58 | 87.31 |
| DenseNet-121 | 84.57 | 83.31 | 77.15 | 88.28 |
| InceptionV3 | 80.25 | 70.42 | 70.58 | 85.10 |
| ConvNeXt | 85.19 | 77.72 | 78.06 | 88.80 |
| Vision Transformer | 91.98 | 88.14 | 87.94 | 93.95 |
| Swin Transformer | 90.12 | 85.11 | 84.98 | 92.61 |
| Our Model | 94.44 | 91.58 | 91.04 | 95.82 |
Table 8. Comparison of expert assessment and ConvNeXt model accuracy.

| Level | Expert Identification Accuracy/% | ConvNeXt Model Accuracy/% |
| --- | --- | --- |
| Principal | 82.50 | 90.36 |
| First-class | 83.70 | 92.41 |
| Second-class | 88.90 | 92.20 |
