A Multi-Scale Feature Extraction Algorithm for Chinese Herbal Medicine Image Classification

Dai, Wenbin; Ma, Yuxin; Fan, Yan; Ma, Jun

doi:10.3390/app15084271

School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(8), 4271; https://doi.org/10.3390/app15084271

Submission received: 13 March 2025 / Revised: 8 April 2025 / Accepted: 10 April 2025 / Published: 12 April 2025

Abstract

Due to the low quality of existing Chinese herbal medicine datasets and the lack of recognition algorithms for herbal images, automatic classification of Chinese herbal medicine is ineffective. In this paper, we constructed a comprehensive dataset comprising 4485 images across 20 categories of Chinese herbal medicine. This dataset captures the morphological diversity of Chinese herbal medicine while reducing inter-class variations and closely mimics real-world complexity. Considering the subtle differences among the data, we proposed a multi-scale feature extraction architecture called MSPyraNet. This architecture is composed of multiple FACNBlock units, which are designed explicitly for herbal medicine characteristics. FACNBlock utilizes a multi-scale representation module, using convolutions and atrous convolutions of varying sizes to generate and fuse multi-scale feature maps. Experimental results show that MSPyraNet improves accuracy by at least 4.72 and 4.54 percentage points over existing SOTA models on the CHMD1 and CHMD2 datasets, respectively. Ablation studies validate the effectiveness of the multi-scale representation module. Furthermore, we found that MSPyraNet achieves a notable improvement in classifying Chinese herbal medicines that are morphologically similar but belong to different categories. In summary, this study provides a dataset and methodological reference for future research on Chinese herbal medicine classification.

Keywords: image recognition; deep learning; convolutional neural networks; Chinese herbal medicine

1. Introduction

With the rapid development of artificial intelligence, an increasing number of studies have focused on applying intelligent algorithms to the healthcare domain [1,2,3]. Traditional Chinese Medicine (TCM), as an essential component of the global healthcare system, plays a crucial role in medical practices worldwide. Chinese herbal medicines, the core elements of TCM, have demonstrated remarkable efficacy in regulating bodily functions, enhancing the immune system, promoting blood circulation, alleviating fatigue symptoms, and preventing chronic diseases. However, the classification of Chinese herbal medicines faces several challenges: the wide variety and morphological diversity of herbs, the existence of multiple subspecies within the same species, high similarity in morphological features among different herbs, and significant morphological variations due to growth stages, processing methods, and storage conditions. Traditional methods of herbal identification rely heavily on the expertise of professionals [4]; they are not only inefficient and costly but also prone to subjectivity and high error rates. Automated classification of Chinese herbal medicines can significantly improve efficiency and reduce the cost of identification, providing robust technical support for the modernization and internationalization of TCM.

Image datasets of Chinese herbal medicines are fundamental to automated classification research. A dataset with a wide variety of species, sufficient samples, diverse morphologies, multiple shooting angles, and complex backgrounds is crucial for building classification models with high generalization capabilities. Previous studies have made some progress in this area. For instance, Miao et al. constructed a dataset containing images of six Chinese herbal medicines [5]; Mookdarsanit et al. developed a dataset with images of 11 Thai herbs [6]; Zhang et al. established a dataset with images of 60 types of Chinese herbal slices [7]. Additionally, Azeez et al. created a dataset with images of five herbs [8]. In our previous work, we developed the CHMD1 dataset [9], which includes 16 types of Chinese herbal medicine with 3552 sample images, providing a valuable data foundation for subsequent research.

Convolutional Neural Networks (CNNs) have been widely applied in computer vision due to their ability to automatically extract image features, effectively overcoming the limitations of traditional manual feature extraction [10,11,12,13]. For example, Lee et al. demonstrated the effectiveness of CNNs in feature extraction through deconvolution network visualization techniques, providing a significant basis for further research [14]. In the field of herbal image classification, researchers have made notable progress using CNNs. Miao et al. achieved advanced classification performance on a six-class Chinese herb dataset using an improved CNN architecture [5], and Mookdarsanit et al. obtained similarly strong results [6]. Furthermore, Zhao et al. employed PointNet to classify seven types of Chinese herbal medicine, further validating the potential of deep learning methods [15]. Transfer learning has also been widely applied: Khalid et al. used the pre-trained ResNet-50 model [16,17], and Xing et al. and Azeez et al. employed ImageNet pre-trained CNNs for herb classification, all achieving excellent results [8,18]. These studies demonstrate the significant application value of CNNs in Chinese herbal image analysis.

Despite the progress in datasets and methods for Chinese herbal classification, two challenges remain. (1) Existing Chinese herbal image datasets generally suffer from small sample sizes, insufficient categories, monotonous backgrounds, and excessive inter-class differences. These issues severely limit the generalization capabilities of models, particularly in handling inter-class similarities and morphological diversity. Moreover, existing datasets often fail to adequately reflect the diversity of Chinese herbal medicine in real-world scenarios, leading to poor model performance in practical applications; (2) current research mainly involves simple improvements to CNNs or direct application of existing pre-trained models, which often struggle to effectively capture multi-scale features in Chinese herbal images, resulting in limited receptive fields and insufficient extraction of multi-level features.

To address these issues, we make contributions in two key areas. First, in terms of datasets, we expanded the CHMD1 dataset by adding four new types of Chinese herbal medicine and 933 sample images, constructing the CHMD2 dataset. This dataset retains the advantages of CHMD1 and significantly enhances inter-class similarity and morphological diversity, providing richer data support for model training. Second, in terms of network architecture, we propose an enhancement of the ConvNeXt v2 architecture [19]. Specifically, we replace the original 7 × 7 large-kernel depthwise separable convolution [20] with a multi-scale representation module, creating the FACNBlock (Fusion Atrous Convolutional Network Block), a unit tailored to Chinese herbal images and capable of extracting multi-scale features. The FACNBlock generates feature maps with diverse receptive fields and accounts for the fact that Chinese herbal medicines often appear in stacked forms, enabling a more comprehensive and accurate capture of the multi-level features of Chinese herbal medicine images. Additionally, the FACNBlock incorporates Global Response Normalization (GRN) [19], which increases the contrast between feature map channels, alleviates feature collapse, and lets different channels focus on extracting diverse feature information, thereby enhancing the model’s expressive power. Based on this block, we propose MSPyraNet (Multi-Scale Pyramid Network), which fuses feature maps of different receptive fields to capture the multi-level features of Chinese herbal images more comprehensively, further improving Chinese herbal image classification. A comparison between MSPyraNet and other convolutional architectures is shown in Figure 1. Experimental results show that our method performs well in Chinese herbal image classification tasks.

The main contributions of this study include the following:

(1): Dataset Expansion and Improvement: We constructed the CHMD2 dataset, adding four new types of Chinese herbal medicine. This significantly enhanced inter-class similarity and morphological diversity, providing richer data support for model training;
(2): Network Architecture Innovation: We proposed MSPyraNet, which significantly improves classification of Chinese herbal images through multi-scale feature extraction tailored to their characteristics;
(3): Experimental Results and Analysis: To validate the effectiveness of the proposed method, we conducted comparative experiments on the CHMD1 and CHMD2 datasets. The results show that MSPyraNet significantly outperforms baseline models.

2. Related Work

2.1. Herb Dataset Construction

Previous research has attempted to construct datasets of herbal or Chinese medicine-related images. Miao et al. built a dataset containing six types of Chinese herbal medicine with a total of 7853 images using a combination of photography and web crawling, providing an essential foundation for subsequent research [5]. Mookdarsanit et al. constructed a dataset of 11 Thai herbs with 2700 images through photography, but the images had a single background, which is not conducive to model generalization in real-life scenarios [6]. Zhang et al. collected 13,088 images of 60 types of Chinese herbal slices through photography and web crawling, with some data having background interference [7]. Although the data scale was significantly expanded, all data were Chinese herbal slices, not actual Chinese herbal medicine. Azeez et al. obtained a dataset of five herbs with at least 100 images per herb through photography and web crawling but with a single background [8]. Dai et al. constructed a dataset of single-category Chinese herb images through web crawling, where images may present multiple stacked samples, mutual occlusion, and sample morphological diversity, providing a richer and more diverse dataset for research [9].

2.2. Application of Convolutional Neural Networks in Herbal Classification

Convolutional Neural Networks (CNNs) have shown significant advantages in herbal feature extraction and classification. Lee et al. applied CNNs to feature learning of 44 plant images and demonstrated the superiority of CNN-learned features over traditional manual features [10,11,12,13] through deconvolution network (Deconvnet) visualization [14]. Miao et al. achieved breakthrough progress on a six-class Chinese herb dataset by improving CNN architecture and incorporating data augmentation techniques [5]. Mookdarsanit et al. successfully applied CNNs to the classification task of 11 Thai herbs, validating the universality of this method [6]. Zhao et al. innovatively introduced three-dimensional structural features, using PointNet to effectively classify a seven-class Chinese herb dataset [15]. Khalid et al. confirmed the superiority of the pre-trained ResNet-50 model in herbal classification tasks through comparative experiments [16]. Xing et al. and Azeez et al. validated the effectiveness of ImageNet pre-trained CNNs in herbal classification [8,18].

The ConvNeXt series architecture has demonstrated excellent performance in various tasks. ConvNeXt achieved advanced results on ImageNet using only pure convolutional structures [21]. Inspired by Swin Transformer [22] and ResNeXt [23], ConvNeXt adopted 7 × 7 large-kernel convolutions in its convolutional blocks, replaced the common batch normalization layers [24] with layer normalization layers [25], and substituted GELU for ReLU as the activation function. ConvNeXt v2 [19] developed this further by introducing a new Global Response Normalization (GRN) layer into the original ConvNeXt convolutional module to enhance feature competition between channels, capturing more discriminative channel features.

Although progress has been made in Chinese herbal classification, some issues remain. Existing datasets often have small sample sizes, limited categories, and monotonous backgrounds, which affect model generalization. They also fail to reflect real-world diversity fully. In terms of methods, most studies only make simple improvements to CNNs or apply pre-trained models without considering the unique characteristics of herbal images.

3. Dataset Construction

Chinese herbal datasets play a crucial role in improving model performance. However, the construction of Chinese herbal image datasets in current research still has shortcomings. In a previous study, we constructed the CHMD1 dataset, which includes 16 types of Chinese herbal medicine with 3552 images [9]. This dataset has significant advantages over similar datasets: it is specifically designed for Chinese herbal medicine, effectively addressing issues such as limited categories, insufficient sample sizes, single shooting angles, monotonous backgrounds, and uniform morphology within categories. However, the number of Chinese herb types is still insufficient, leading to excessive differences between different types of Chinese herbal medicine and a lack of inter-class similarity, thereby affecting the model’s generalization capabilities. Therefore, we constructed a new Chinese herbal dataset, CHMD2, based on CHMD1.

The CHMD2 dataset incorporates four new types of Chinese herbal medicine images: Raspberry, Poria Cocos, Rhizoma Phragmitis, and Cinnamon. These images were primarily obtained through web scraping, yielding an initial collection of 2496 raw images. After manual curation, 933 high-quality images were retained, including 249 Raspberry images, 199 Poria Cocos images, 245 Cinnamon images, and 240 Rhizoma Phragmitis images. As shown in Table 1, the total number of images in CHMD2 reaches 4485, with the number of types increasing to 20. The category with the most images has 380 images, while the category with the fewest has 110 images. The distribution of the number of images for each Chinese herb is shown in Figure 2. CHMD2 not only retains the advantages of CHMD1, such as complex and diverse morphological expressions, stacked or clustered presentation, single-category images, diverse backgrounds, and multi-angle shooting, but also increases inter-class similarity and further enhances morphological diversity.

The addition of the four new categories effectively enhances the inter-class similarity of the dataset, thereby improving the model’s generalization capabilities. As shown in Figure 3, in the original dataset, matrimony vine is the only red Chinese herb category. Because color is an important feature for model recognition, a single red category may limit how well the model learns to discriminate among red herbs. Therefore, we introduced raspberry images. Most raspberries are also red, which interferes with the model to some extent and thereby improves its generalization capabilities. Similarly, in the original dataset, alumen is the only white block-shaped Chinese herb. Processed poria cocos is also white and block-shaped, and its inclusion improves the model’s ability to distinguish between such similar forms. The introduction of rhizoma phragmitis enhances the model’s recognition ability because its texture and morphological features resemble those of radix stemonae, while the addition of cinnamon expands the model’s recognition range for strip-shaped species.

To further increase the morphological diversity of the dataset, the four new categories include multiple morphological features. As shown in Figure 4, raspberry presents different forms, colors, and textures, including fresh and processed states, red and black variants, and morphological changes due to different processing techniques. Poria cocos includes processed and unprocessed samples, with processed samples further divided into block and slice forms. Rhizoma phragmitis also has processed and unprocessed distinctions, with different parts showing morphological differences. Cinnamon includes slice and strip forms in different colors. These diverse morphological features provide more comprehensive learning samples for the model, helping to improve its robustness and accuracy in practical applications.

4. MSPyraNet Architecture Design

4.1. FACNBlock Design

The ConvNeXt series architecture has demonstrated outstanding performance in computer vision as a pure convolutional architecture. In ConvNeXt v2 [19], the key convolutional unit (called CNBlock) uses a 7 × 7 large-kernel depthwise separable convolution [20] to extract features. However, because this design has a single fixed receptive field, it struggles to capture multi-scale features and handles features at different levels poorly. In Chinese herbal datasets, differences between herbs may appear in texture, overall shape, and other levels, making it difficult to distinguish similar herbs using only a 7 × 7 convolution. Therefore, we improved the CNBlock and designed the FACNBlock, which replaces the original 7 × 7 large-kernel depthwise separable convolution [20] with a multi-scale representation module to address the limited receptive field and the difficulty of capturing multi-scale features caused by a single kernel size. As shown in Figure 5, the multi-scale representation module consists of three parallel branches, each maintaining the spatial dimensions of the feature map and capturing multi-scale features through different standard or atrous convolutions [26]. In existing research on multi-scale fusion pyramid structures [27], a 1 × 1 convolution was combined with multiple 3 × 3 convolutions with different dilation rates (6, 12, 18) to extract features. However, since Chinese herbal samples often appear in stacked forms, this design produces feature maps with excessively large receptive fields, making it difficult to capture the local features of individual samples. To address this issue, we optimized the convolution kernel sizes to better suit Chinese herbal image classification, ensuring accurate extraction of individual target features in stacked samples.
In this study, the first branch uses a 1 × 1 convolution to capture local detail information; the second branch uses a 3 × 3 convolution to expand the receptive field and capture medium-scale context information; the third branch uses a 3 × 3 atrous convolution (dilation rate of 2) to further expand the receptive field to 5 × 5 while reducing computational load compared with a dense 5 × 5 kernel. The output feature maps of the three branches are batch normalized [24] and ReLU activated, then concatenated with the original input and reduced back to the dimensions of the input through a 1 × 1 convolution (which acts on the channel dimension, since every branch already preserves the spatial size). This pyramid structure effectively enhances the model’s ability to extract multi-scale features. Finally, the output feature map is GELU activated, the GRN part of the CNBlock [19] is retained to enhance the contrast and selectivity between channels and prevent feature collapse, and the feature transformation is completed through a 1 × 1 convolution, followed by a residual connection [17].
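The receptive-field argument can be checked with the standard effective-kernel formula for atrous convolution, k_eff = k + (k − 1)(d − 1). The sketch below is illustrative only: the function and variable names are hypothetical, it considers a single layer in isolation, and the padding values assume stride 1, which the text implies by stating that each branch preserves the spatial dimensions.

```python
def effective_kernel(kernel_size: int, dilation: int = 1) -> int:
    """Effective side length of a dilated (atrous) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int = 1) -> int:
    """Padding that keeps H x W unchanged at stride 1 (odd effective kernels)."""
    return (effective_kernel(kernel_size, dilation) - 1) // 2


# (kernel size, dilation) of the three branches described in the text.
facn_branches = [(1, 1), (3, 1), (3, 2)]
# 3x3 branches with the dilation rates quoted for the earlier pyramid design [27].
wide_branches = [(3, 6), (3, 12), (3, 18)]

facn_fields = [effective_kernel(k, d) for k, d in facn_branches]
wide_fields = [effective_kernel(k, d) for k, d in wide_branches]
facn_padding = [same_padding(k, d) for k, d in facn_branches]

print("FACNBlock branch fields:", facn_fields)
print("Rates 6/12/18 fields:   ", wide_fields)
print("FACNBlock padding:      ", facn_padding)
```

Under these assumptions the three FACNBlock branches cover 1, 3, and 5 pixels per side, whereas the rates 6, 12, and 18 cover 13, 25, and 37 pixels per side, which illustrates why the latter may span several stacked samples at once.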

Specifically, the input tensor $x \in \mathbb{R}^{C_{in} \times H \times W}$ first undergoes a layer normalization [25] operation $\mathrm{LN}(\cdot)$, yielding the normalized result $x^{*}$. By normalizing along the channel dimension, the input distribution can be stabilized, ensuring that the data across these dimensions within each sample has a mean of 0 and a variance of 1. The calculation process is as follows:

$$x^{*} = \mathrm{LN}(x) \tag{1}$$

Subsequently, $x^{*}$ is fed into the multi-scale representation module. In the first branch, local detail information is captured through a 1 × 1 convolutional operation $\mathrm{Conv}_{1 \times 1}(\cdot)$, followed by batch normalization [24] $\mathrm{BN}(\cdot)$ and ReLU activation $\mathrm{ReLU}(\cdot)$, producing the output $Y_{1}$ of the first branch. Specifically, batch normalization [24] normalizes the intermediate features to have a mean of 0 and a variance of 1, which stabilizes training and accelerates convergence. The ReLU activation function then introduces non-linearity by setting all negative values to 0, allowing the model to learn complex and non-linear patterns. This process is formulated as:

$$Y_{1} = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{1 \times 1}(x^{*}))) \tag{2}$$

In the second branch, $x^{*}$ undergoes a 3 × 3 convolutional operation $\mathrm{Conv}_{3 \times 3}(\cdot)$ to capture medium-scale contextual information, followed by batch normalization [24] $\mathrm{BN}(\cdot)$ and ReLU activation $\mathrm{ReLU}(\cdot)$, resulting in the output $Y_{2}$ of the second branch. The computation is expressed as:

$$Y_{2} = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{3 \times 3}(x^{*}))) \tag{3}$$

In the third branch, $x^{*}$ is processed through a 3 × 3 atrous convolution with a rate of 2, $\mathrm{Conv}_{3 \times 3,\, rate = 2}(\cdot)$, to capture larger-scale contextual information, followed by batch normalization [24] $\mathrm{BN}(\cdot)$ and ReLU activation $\mathrm{ReLU}(\cdot)$, yielding the output $Y_{3}$ of the third branch. This operation is computed as:

$$Y_{3} = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{3 \times 3,\, rate = 2}(x^{*}))) \tag{4}$$

The output feature maps $Y_{1}$, $Y_{2}$, and $Y_{3}$ from the three branches are concatenated with the original input $x^{*}$ using the $\mathrm{Concat}(\cdot)$ operation. This concatenated result is then reduced in dimension through a 1 × 1 convolution $\mathrm{Conv}_{1 \times 1}(\cdot)$, producing the fused feature map $Y_{out}$. The computation is as follows:

$$Y_{out} = \mathrm{Conv}_{1 \times 1}(\mathrm{Concat}(Y_{1}, Y_{2}, Y_{3}, x^{*})) \tag{5}$$

Next, the output feature map $Y_{out}$ is activated using the GELU function $\mathrm{GELU}(\cdot)$, resulting in $Y_{activated}$. Global Response Normalization $\mathrm{GRN}(\cdot)$ is then applied to enhance inter-channel contrast and selectivity, yielding $Y_{GRN}$. The process is as follows:

$$Y_{activated} = \mathrm{GELU}(Y_{out}) \tag{6}$$

$$Y_{GRN} = \mathrm{GRN}(Y_{activated}) \tag{7}$$

Finally, a 1 × 1 convolution $\mathrm{Conv}_{1 \times 1}(\cdot)$ is employed to complete the ultimate feature transformation, producing $Y_{final}$:

$$Y_{final} = \mathrm{Conv}_{1 \times 1}(Y_{GRN}) \tag{8}$$
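Equation (7) applies GRN without restating its definition. The following is a minimal pure-Python sketch of the three GRN steps as formulated for ConvNeXt v2 [19] (global aggregation, divisive normalization, calibration). It is not the authors’ implementation: the nested-list tensor layout, the function name, and the `eps` value are assumptions made for illustration.

```python
import math


def global_response_norm(x, gamma, beta, eps=1e-6):
    """Apply GRN to x, a list of C channels, each an H x W nested list.

    gamma and beta are per-channel lists standing in for learnable parameters.
    """
    # 1. Global aggregation: L2 norm of each channel over all spatial positions.
    norms = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in x]
    # 2. Divisive normalization: each channel's norm relative to the mean norm.
    mean_norm = sum(norms) / len(norms)
    scale = [n / (mean_norm + eps) for n in norms]
    # 3. Calibration: rescale the input and add it back through a shortcut.
    return [
        [[gamma[c] * v * scale[c] + beta[c] + v for v in row] for row in ch]
        for c, ch in enumerate(x)
    ]


# Two channels of size 1 x 2: a strong response and a weak one.
features = [[[3.0, 4.0]], [[0.0, 1.0]]]
identity = global_response_norm(features, gamma=[0.0, 0.0], beta=[0.0, 0.0])
contrasted = global_response_norm(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(identity)
print(contrasted)
```

With `gamma` and `beta` at zero the layer reduces to the identity; with a positive `gamma`, the channel with the larger global response is amplified more than the weaker one, which is the inter-channel contrast that the text attributes to GRN.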

4.2. MSPyraNet Overall Architecture

The MSPyraNet in this study is based on the ConvNeXtv2-Base architecture [19] and comprises four main modules: the Stem layer, DownSample layer, Stage layer, and Average pooling layer. The Stem layer, serving as the initial module of the network, utilizes a 4 × 4 convolutional kernel (with a stride of 4) and layer normalization [25] to downsample the input data, preparing it for subsequent feature extraction. The Stage layer consists of multiple FACNBlocks, which progressively extract and transform features. The output of each Stage layer (except the last one) is processed by the DownSample layer, halving the feature map size and doubling the number of channels before proceeding to the next stage. The output of the final Stage layer undergoes average pooling to generate the ultimate result of MSPyraNet.
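The downsampling schedule described above can be traced with a few lines of Python. This is a sketch under stated assumptions: the 224 × 224 input matches the crop size used in Section 5.1, while the four stages and the base width of 128 channels are taken from the public ConvNeXt-Base configuration rather than from this paper. The function name is hypothetical.

```python
def stage_shapes(input_size=224, stem_stride=4, base_channels=128, num_stages=4):
    """Return (channels, height, width) of the feature map inside each stage."""
    shapes = []
    size = input_size // stem_stride  # the 4 x 4, stride-4 stem shrinks H and W by 4
    channels = base_channels
    for stage in range(num_stages):
        shapes.append((channels, size, size))
        if stage < num_stages - 1:
            # DownSample layer: halve the spatial size and double the channels.
            size //= 2
            channels *= 2
    return shapes


shapes = stage_shapes()
for index, shape in enumerate(shapes, start=1):
    print(f"stage {index}: {shape}")
```

Under these assumptions the final stage yields a 7 × 7 map, which the average pooling layer collapses to one vector per image.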

As this study pertains to a multi-class classification task, the cross-entropy loss function is selected as the model’s objective function to measure the discrepancy between the model’s predicted outputs and the true labels:

$$\mathrm{Loss} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{n} y_{k}^{(i)} \log\left(p_{k}^{(i)}\right) \tag{9}$$

where $\mathrm{Loss}$ represents the model’s loss, $n$ is the number of classes, $m$ is the number of samples, $y_{k}^{(i)}$ is 1 if the true class of sample $i$ is the $k$-th class and 0 otherwise, and $p_{k}^{(i)}$ is the probability predicted by the model that sample $i$ belongs to the $k$-th class.
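As a small worked instance of Equation (9), the sketch below evaluates the loss for two samples and three classes. The probabilities and labels are made-up illustration values, and the function name is hypothetical; the natural logarithm is assumed.

```python
import math


def cross_entropy(probs, labels):
    """Mean cross-entropy over m samples.

    probs[i][k] is the predicted probability of class k for sample i;
    labels[i] is the index of the true class of sample i.
    """
    m = len(probs)
    # Because y is one-hot, the inner sum over k keeps only the true-class term.
    return -sum(math.log(probs[i][labels[i]]) for i in range(m)) / m


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
loss = cross_entropy(probs, labels)
print(f"loss = {loss:.4f}")
```

The loss approaches 0 as the probability assigned to each true class approaches 1, and grows without bound as that probability approaches 0.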

To optimize model performance, the AdamW optimizer is employed to iteratively update the model parameters, effectively minimizing the loss value.

5. Experiments and Results Analysis

5.1. Evaluation Metrics and Experimental Setup

Following previous work, we chose accuracy, precision, recall, and F1 score as evaluation metrics. These metrics are widely used in classification tasks. Accuracy represents the proportion of correctly predicted samples, reflecting overall classification capability; Precision measures the proportion of actual positives among predicted positives, indicating prediction exactness; Recall calculates the proportion of actual positives correctly identified, demonstrating model coverage; F1-Score is the harmonic mean of Precision and Recall, providing a balanced evaluation for imbalanced datasets. The formulas are defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{13}$$

where $TP$ denotes True Positives, $TN$ denotes True Negatives, $FP$ denotes False Positives, and $FN$ denotes False Negatives.
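Equations (10) to (13) are written for a single positive class. For a multi-class task they are usually computed per class from the confusion matrix and then averaged. The sketch below assumes macro-averaging (an unweighted mean over classes) and takes accuracy as the fraction of correctly classified samples; the text does not state which averaging scheme was used, and the function name and confusion matrix are hypothetical.

```python
def classification_metrics(confusion):
    """confusion[t][p] = number of samples of true class t predicted as class p."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[t][k] for t in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


metrics = classification_metrics([[8, 2], [1, 9]])
print(metrics)
```

Averaging the per-class F1 scores, rather than taking the harmonic mean of the averaged precision and recall, would be consistent with reported F1 values that fall below both the precision and the recall, but this remains an inference rather than a documented detail.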

We divided the dataset into training, validation, and test sets in a ratio of 7:2:1. The training set contains 3140 images, the validation set contains 897 images, and the test set contains 448 images, ensuring a balanced sample distribution across each category.

To enhance the model’s generalization ability, we performed data augmentation on the dataset. First, each image was rotated by 90, 180, and 270 degrees, expanding the sample size to four times the original. Then, the training set images were uniformly resized to 256 × 256, followed by random horizontal or vertical flipping, and finally randomly cropped to 224 × 224. The validation and test set images were directly randomly cropped to 224 × 224.
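The four-fold expansion by rotation can be illustrated on a tiny grid standing in for an image. This is a sketch only: the helper names are hypothetical, a real pipeline would operate on image arrays, and the rotation direction (clockwise here) is an assumption that does not change the resulting set of four variants.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_variants(image):
    """Return the original grid plus its 90, 180, and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


image = [[1, 2],
         [3, 4]]
variants = rotation_variants(image)
for angle, variant in zip((0, 90, 180, 270), variants):
    print(angle, variant)
```

Each source image therefore contributes four samples, and a fourth rotation would return the original.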

To avoid certain features having an excessive impact on model training and to speed up the training process, we perform normalization on the input image tensor. Specifically, for each channel of the image, we subtract the mean and divide it by the standard deviation, ensuring that the mean of each channel is 0 and the variance is 1. The formula is as follows:

$$p_{n} = \frac{p_{0} - \mu}{\sigma} \tag{14}$$

where $p_{0}$ represents the pixel value in the original image, $p_{n}$ represents the normalized result, $\mu$ denotes the mean of the pixel values in the dataset, and $\sigma$ denotes the standard deviation of the pixel values in the dataset.
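Equation (14) can be sketched for a single channel as follows. The pixel values are made up, the helper names are hypothetical, and the use of the population standard deviation is an assumption; in practice the statistics would be computed once per channel over the whole dataset and then reused for every image.

```python
import statistics


def channel_stats(pixels):
    """Mean and population standard deviation of one channel's pixel values."""
    return statistics.fmean(pixels), statistics.pstdev(pixels)


def normalize(pixels, mean, std):
    """Apply p_n = (p_0 - mean) / std to every pixel value."""
    return [(p - mean) / std for p in pixels]


red_channel = [0.2, 0.4, 0.6, 0.8]  # flattened pixel values of one channel
mu, sigma = channel_stats(red_channel)
normalized = normalize(red_channel, mu, sigma)
print(normalized)
```

After this step the values used to estimate the statistics have zero mean and unit variance; an individual image normalized with dataset-level statistics will generally only be close to that.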

The main training parameter settings are as follows: the number of training epochs is 200, the batch size is 32, and the optimizer is AdamW.

5.2. Comparative Experiments with Common Models on CHMD1

To validate the effectiveness of MSPyraNet in the task of Chinese herbal medicine classification, we conducted comparative experiments on the CHMD1 dataset. The performance of MSPyraNet was evaluated against several common deep learning models [17,21,22,28,29,30] and CSA [9] in terms of accuracy, precision, recall, and F1 score. The classification layer of all models utilized a fully connected layer. Previous studies have shown that the CSA encoder outperforms other deep learning classification algorithms on the CHMD1 dataset, albeit with limited improvement. However, MSPyraNet demonstrated significantly superior performance on the CHMD1 dataset compared to other models. The specific experimental results are presented in Table 2.

Experimental results demonstrate that MSPyraNet achieves superior performance, with an accuracy of 86.22%, precision of 83.80%, recall of 83.81%, and an F1 score of 83.26%. Compared to the CSA encoder, MSPyraNet shows improvements of 4.72%, 3.92%, 4.13%, and 3.83% in accuracy, precision, recall, and F1 score, respectively. Among all the SOTA models, EfficientNetv2_s performs the best, with scores of 79.94% (accuracy), 78.07% (precision), 77.99% (recall), and 77.28% (F1 score). MSPyraNet outperforms EfficientNetv2_s by 6.28% (accuracy), 5.73% (precision), 5.82% (recall), and 5.98% (F1 score). Within the Resnet family, MSPyraNet surpasses the best-performing model, Resnet-152, by 13.49% (accuracy), 9.42% (precision), 14.36% (recall), and 12.39% (F1 score). Similarly, compared to the best model in the ConvNeXt family, ConvNeXtv2_b, MSPyraNet achieves improvements of 7.85% (accuracy), 6.63% (precision), 7.86% (recall), and 7.56% (F1 score). These results demonstrate that MSPyraNet excels in the task of Chinese herbal medicine classification, marking a significant advancement over previous research. Figure 6 illustrates the performance comparison of MSPyraNet and SOTA models on the CHMD1 dataset across four evaluation metrics.

5.3. Comparative Experiments with Common Models on CHMD2

To further validate the performance of MSPyraNet on a more complex dataset, in this study, experiments were conducted on the CHMD2 dataset, with comparisons made against the CSA encoder [9], ConvNeXt v2 [19] and some other common deep learning models [17,29,30]. The experiments aimed to demonstrate the effectiveness of the MSPyraNet in more complex Chinese herbal medicine classification tasks. The results, presented in Table 3, show that MSPyraNet maintains commendable performance on the more complex CHMD2 dataset.

On the CHMD2 dataset, MSPyraNet demonstrates outstanding performance, with an accuracy of 89.22%, precision of 88.85%, recall of 87.87%, and an F1 score of 88.16%. Compared to the best SOTA model, EfficientNetv2_m, MSPyraNet improves accuracy, precision, recall, and F1-score by 4.54%, 5.65%, 5.20%, and 5.65%, respectively. The improvements over the CSA model are even more significant, with gains of 11.98% in accuracy, 11.94% in precision, 13.62% in recall, and 13.29% in F1-score. MSPyraNet also outperforms Resnet_50 by 8.11%, 8.31%, 9.60%, and 9.30% on the four metrics. Compared to ConvNeXtv2_b, the performance gains are 12.52%, 12.28%, 13.50%, and 13.49%, respectively. These results highlight the robustness and effectiveness of MSPyraNet on this classification task. Figure 7 illustrates the performance comparison of MSPyraNet and SOTA models on the CHMD2 dataset across four evaluation metrics.

5.4. Ablation Experiment Analysis

Ablation experiments were conducted on both the CHMD1 and CHMD2 datasets to systematically evaluate the effectiveness of the multi-scale representation module. By comparing the performance differences between models with and without the multi-scale representation module, the contribution of this module was validated. Specifically, two experimental setups were designed: without the multi-scale representation module and with the multi-scale representation module. The results are presented in Table 4.

The experimental results reveal that the introduction of the multi-scale representation module significantly enhances the model’s performance on both datasets. Specifically, on the CHMD1 dataset, the model’s accuracy, precision, recall, and F1 score improved by 9.81%, 11.08%, 13.65%, and 12.82%, respectively. On the CHMD2 dataset, the corresponding metrics improved by 12.14%, 11.11%, 13.59%, and 13.08%. These results robustly confirm the importance of the multi-scale representation module in the task of Chinese herbal medicine image classification.

Notably, the MSPyraNet model exhibited superior performance on the more complex CHMD2 dataset. Comparing Table 2 and Table 3, MSPyraNet’s accuracy, precision, recall, and F1 score on CHMD2 are higher than on CHMD1 by 3.00, 5.05, 4.06, and 4.90 percentage points, respectively. This indicates that the model possesses stronger adaptability and robustness in handling complex Chinese herbal medicine classification tasks.

To validate the suitability of smaller convolutional sizes for feature extraction in scenarios where Chinese herbal medicines are stacked, we conducted experiments on CHMD2 using existing multi-scale fusion pyramid structures [27] with varying convolutional sizes. The results are presented in Table 5.

From Table 5, it is evident that adjusting the convolutional sizes in consideration of the characteristics of Chinese herbal medicines led to a significant improvement in model performance. Compared with the existing structure, MSPyraNet achieved improvements of 10.12, 9.77, 12.20, and 11.66 percentage points in accuracy, precision, recall, and F1 score, respectively. This demonstrates that the multi-scale representation module in MSPyraNet is particularly well suited for Chinese herbal medicine image classification tasks.
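To make "smaller convolutional sizes" concrete, the sketch below compares how far each branch of the two structures in Table 5 reaches along one image axis. It uses the standard relation for atrous convolution, in which a kernel of size k with rate r spans k + (k - 1)(r - 1) pixels. This relation is general background, not a formula taken from the paper, and the function and variable names are illustrative.

```python
def effective_kernel_size(kernel_size: int, rate: int = 1) -> int:
    """Pixels spanned along one axis by a kernel of `kernel_size` with dilation `rate`."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# (kernel_size, rate) for each branch of the two structures listed in Table 5.
EXISTING_PYRAMID = [(1, 1), (3, 6), (3, 12), (3, 18)]
MSPYRANET_MODULE = [(1, 1), (3, 1), (3, 2)]

existing_spans = [effective_kernel_size(k, r) for k, r in EXISTING_PYRAMID]
mspyranet_spans = [effective_kernel_size(k, r) for k, r in MSPYRANET_MODULE]

print("existing pyramid spans:", existing_spans)    # [1, 13, 25, 37]
print("MSPyraNet module spans:", mspyranet_spans)   # [1, 3, 5]
```

Under this reading, the largest branch of the existing structure samples positions up to 37 pixels apart, whereas the largest MSPyraNet branch stays within 5 pixels, which is consistent with the stated preference for small convolutions when herbs are stacked.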

To verify the importance of each convolutional unit in the multi-scale representation module, we conducted ablation experiments on CHMD2 by removing one convolutional unit at a time. The experimental results are shown in Table 6.

The experimental results indicate that removing any single convolutional unit from the multi-scale representation module leads to a decline in model performance. Among them, the removal of the 3 × 3 convolution has the most significant impact, with accuracy, precision, recall, and F1 score dropping to 82.99%, 82.96%, 80.86%, and 81.42%, respectively. These findings suggest that each convolutional unit plays a distinct and important role within the module.
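The following toy sketch illustrates what removing one convolutional unit means structurally. It is a one-dimensional stand-in with fixed all-ones weights, and it fuses the branches by element-wise summation. These choices are assumptions made for illustration only: the real module is two-dimensional and learned, and its fusion operator is not specified in this section. The names `dilated_conv1d`, `BRANCHES`, and `multi_scale` are hypothetical.

```python
def dilated_conv1d(signal, weights, rate=1):
    """Zero-padded 'same' 1-D convolution with an odd-length kernel and dilation `rate`."""
    half = (len(weights) - 1) // 2 * rate
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j * rate] for j, w in enumerate(weights))
        for i in range(len(signal))
    ]


# 1-D analogues of the three units in Table 6: (weights, dilation rate).
BRANCHES = {
    "conv1x1": ([1.0], 1),
    "conv3x3": ([1.0, 1.0, 1.0], 1),
    "conv3x3_rate2": ([1.0, 1.0, 1.0], 2),
}


def multi_scale(signal, drop=None):
    """Sum the outputs of all branches except the one named in `drop` (assumed fusion)."""
    outputs = [
        dilated_conv1d(signal, weights, rate)
        for name, (weights, rate) in BRANCHES.items()
        if name != drop
    ]
    return [sum(values) for values in zip(*outputs)]


impulse = [0, 0, 0, 1, 0, 0, 0]
full = multi_scale(impulse)
without_3x3 = multi_scale(impulse, drop="conv3x3")
print(full)
print(without_3x3)
```

With a single impulse as input, dropping the plain 3-tap branch removes the response at the immediate neighbours of the impulse, so the fused output loses its finest local context while the dilated branch still responds two positions away.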

5.5. Single Chinese Herb Prediction Accuracy Analysis

To showcase MSPyraNet’s capability in identifying individual Chinese herbal medicines, we analyzed the model’s accuracy for each Chinese herb in the CHMD2 dataset. For a more intuitive comparison, the accuracy of CSA [9] and ConvNeXtv2_b [19] for each herb in CHMD2 is also reported. The results are shown in Table 7. The confusion matrices of ConvNeXtv2_b [19], CSA [9], and MSPyraNet on the CHMD2 dataset can be found in Figure A1, Figure A2 and Figure A3 in Appendix A, respectively.

The experimental results indicate that MSPyraNet achieves the highest accuracy for 17 of the 20 Chinese herbal medicines in Table 7; the exceptions are mint, raspberry, and matrimony vine, for which at least one of CSA and ConvNeXtv2_b scores higher. Particularly notable improvements were observed for ganoderma lucidum, folium artemisiae argyi, dangshen, poria cocos, and flos sophorae.

To further demonstrate MSPyraNet’s enhanced performance in classifying Chinese herbal medicines with high inter-class similarity, we compared the probability of a given Chinese herb being predicted as its true label with the probabilities of it being predicted as other morphologically similar labels. The results are presented in Table 8.

The experimental results demonstrate that MSPyraNet exhibits superior performance in distinguishing between morphologically similar Chinese herbal medicines. For instance, flos sophorae, which is morphologically similar to honeysuckle, was frequently misclassified as honeysuckle by ConvNeXtv2_b and CSA, whereas MSPyraNet did not misclassify flos sophorae as honeysuckle. Similarly, honeysuckle was often misclassified as folium artemisiae argyi and flos sophorae by ConvNeXtv2_b and CSA, but MSPyraNet did not misclassify honeysuckle as folium artemisiae argyi and only misclassified 1.39% of honeysuckle as flos sophorae. The identification accuracy of dangshen was notably low for ConvNeXtv2_b and CSA, at 20.83% and 30.56%, respectively. MSPyraNet achieved an accuracy of 75.00% for dangshen, significantly reducing misclassifications as angelica sinensis, flos sophorae, and rhizoma phragmitis. The misclassification rate of white hyacinth bean as poria cocos was also reduced to 15.38%.
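Rates of the kind reported in Table 8 can be read off a confusion matrix by dividing each row by the number of test images of that true class. The sketch below shows the computation on invented counts; the test-set sizes are not given in this section, and the first row is merely chosen so that its rates coincide with the MSPyraNet honeysuckle entries of Table 8. The helper name `row_rates` is illustrative.

```python
def row_rates(confusion, labels):
    """Convert raw counts into per-true-class percentages (each row sums to about 100)."""
    rates = {}
    for true_label, row in zip(labels, confusion):
        total = sum(row)
        rates[true_label] = {
            predicted: round(100.0 * count / total, 2) if total else 0.0
            for predicted, count in zip(labels, row)
        }
    return rates


LABELS = ["honeysuckle", "flos_sophorae", "folium_artemisiae_argyi"]
# Hypothetical counts: rows are true labels, columns are predicted labels.
COUNTS = [
    [71, 1, 0],
    [0, 18, 1],
    [1, 0, 19],
]

rates = row_rates(COUNTS, LABELS)
for true_label, row in rates.items():
    print(true_label, row)
```

The diagonal entry of each row is the per-herb accuracy of the kind listed in Table 7, and the off-diagonal entries are the misclassification rates towards each other herb.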

5.6. Evaluation Under Different Dataset Splits

To verify the stability and non-randomness of the model’s performance, we conducted training on the CHMD2 dataset by randomly re-shuffling the images in the training, validation, and test sets. The model’s performance was then compared under three different data-splitting strategies. The experimental results are presented in Table 9.

As can be observed, the performance metrics remain consistently high across the three data partitioning strategies. Accuracy varies by less than one percentage point (88.96% to 89.82%), and the largest spread, in recall, is 1.65 percentage points (87.87% to 89.52%). This consistency indicates that the model is not overly dependent on a specific dataset split and supports the robustness and reliability of the proposed model.
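The spread across splits can be summarized with a mean, a sample standard deviation, and a range per metric. The sketch below does this for the values in Table 9; the summary statistics are computed here for illustration and are not reported in the paper, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Metric values in percent for Split Methods 1 to 3, copied from Table 9.
TABLE9 = {
    "accuracy": (89.22, 88.96, 89.82),
    "precision": (88.85, 88.81, 89.68),
    "recall": (87.87, 88.43, 89.52),
    "f1": (88.16, 88.69, 89.61),
}


def summarize(values):
    """Mean, sample standard deviation, and range, rounded to two decimals."""
    return {
        "mean": round(mean(values), 2),
        "std": round(stdev(values), 2),
        "range": round(max(values) - min(values), 2),
    }


summary = {metric: summarize(values) for metric, values in TABLE9.items()}
for metric, stats in summary.items():
    print(f"{metric:9s}", stats)
```

With only three splits the standard deviation is a rough indicator, so the range is shown alongside it.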

6. Conclusions

This study constructed a highly complex Chinese herbal medicine image dataset, which surpasses existing datasets in terms of morphological diversity and inter-class similarity and more closely resembles the complexity of real-world scenarios. In response to the characteristics of Chinese herbal medicine images, particularly their inter-class similarity and stacked arrangement, this study proposed a multi-scale representation module and designed a feature extraction unit, FACNBlock, from which MSPyraNet is built. The experimental results demonstrate that MSPyraNet performs well in Chinese herbal medicine image classification, with clear advantages in handling complex scenarios and in distinguishing between morphologically similar Chinese herbal medicine categories. Although the current dataset already exhibits complex and diverse characteristics, the number of samples is still relatively limited. In the future, we plan to further expand the dataset to provide a richer and more reliable resource for research in this field.

Author Contributions

Conceptualization, J.M. and W.D.; methodology, W.D.; software, W.D. and Y.M.; validation, W.D., Y.M. and Y.F.; formal analysis, W.D. and Y.M.; investigation, Y.F. and W.D.; resources, J.M.; data curation, W.D. and Y.F.; writing—original draft preparation, W.D. and Y.M.; writing—review and editing, W.D. and J.M.; visualization, W.D.; supervision, J.M.; project administration, J.M.; funding acquisition, J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Gansu Province, grant number 23JRRA1066; the Central Government Guided Local Science and Technology Development Fund Project, grant number 25ZYJA016; the Gansu Province Major Special Science and Technology Project, grant number 22ZD6GE016; and the Fundamental Research Funds for the Central Universities, grant number lzujbky-2024-eyt02.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Due to intellectual property considerations, the dataset and code required to reproduce the findings of this study can be provided by contacting the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CHMD1	Chinese Herbal Medicine Dataset 1
CHMD2	Chinese Herbal Medicine Dataset 2

Appendix A

To provide better data support, we present the confusion matrices of ConvNeXtv2_b [19], CSA [9], and MSPyraNet on CHMD2 in terms of accuracy, as shown in Figure A1, Figure A2 and Figure A3. We also provide a more detailed visualization of the training process of MSPyraNet on CHMD2 in Figure A4, including line charts showing the changes in loss, accuracy, precision, and recall on both the training and validation sets. These curves indicate that our model does not suffer from overfitting.

Figure A1. The confusion matrix of ConvNeXtv2_b on CHMD2 in terms of accuracy.

Figure A2. The confusion matrix of CSA on CHMD2 in terms of accuracy.

Figure A3. The confusion matrix of MSPyraNet on CHMD2 in terms of accuracy.

Figure A4. Per-epoch loss, accuracy, precision, and recall on training and validation sets on CHMD2.

References

1. Obasi, C.; Ikharo, B. Remote Health Parameter Monitoring Using Internet of Things: An Edge-Cloud Centric Integration for Real-time Reporting. Int. J. Data Inform. Intelligent Comput. 2025, 4, 45–53.
2. Chaturvedi, S. Clinical prediction on ML based internet of things for E-health care system. Int. J. Data Inform. Intelligent Comput. 2023, 2, 29–37.
3. Shukla, A.K.; Kumar, V.S. Cloud computing with artificial intelligence techniques for effective disease detection. Int. J. Data Inform. Intelligent Comput. 2023, 2, 32–41.
4. Li, D.; Zhao, Z.; Yin, Y.; Zhao, C. Research on the Classification of Sun-Dried Wild Ginseng Based on an Improved ResNeXt50 Model. Appl. Sci. 2024, 14, 10613.
5. Miao, J.; Huang, Y.; Wang, Z.; Wu, Z.; Lv, J. Image recognition of traditional Chinese medicine based on deep learning. Front. Bioeng. Biotechnol. 2023, 11, 1199803.
6. Mookdarsanit, L.; Mookdarsanit, P. Thai herb identification with medicinal properties using convolutional neural network. Suan Sunandha Sci. Technol. J. 2019, 6, 34–40.
7. Zhang, Q.; Ou, J.; Zhou, H. Research on image recognition of Chinese herbal pieces based on Xception and transfer learning. Mod. Electron. Tech. 2024, 47, 29–33.
8. Azeez, Y.R.; Rajapakse, C. An application of transfer learning techniques in identifying herbal plants in Sri Lanka. In Proceedings of the 2019 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 28 March 2019.
9. Dai, W.; Zhao, Y.; Chen, Y.; Ma, J. A Chinese medicinal herb image classification algorithm combining local and global features. In Proceedings of the 2025 17th International Conference on Machine Learning and Computing (ICMLC 2025), Guangzhou, China, 14–17 February 2025.
10. Ramesh, S.; Hebbar, R.; Niveditha, M.; Pooja, R.; Shashank, N.; Vinod, P.V. Plant disease detection using machine learning. In Proceedings of the 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C), Bangalore, India, 25–28 April 2018.
11. Kumar, P.M.; Surya, C.M.; Gopi, V.P. Identification of ayurvedic medicinal plants by image processing of leaf samples. In Proceedings of the 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, 3–5 November 2017.
12. Aakif, A.; Khan, M.F. Automatic classification of plants based on their leaves. Biosyst. Eng. 2015, 139, 66–75.
13. Chaki, J.; Parekh, R.; Bhattacharya, S. Plant leaf recognition using texture and shape features with neural classifiers. Pattern Recognit. Lett. 2015, 58, 61–68.
14. Lee, S.H.; Chan, C.S.; Wilkin, P.; Remagnino, P. Deep-plant: Plant identification with convolutional neural networks. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015.
15. Zhao, J.H.; Chen, X.H.; Dou, X.T.; Cao, Y.E.; Wang, Y.R.; Cui, Y.Y.; Niu, X.Y. Automatic classification of medicinal materials based on three-dimensional point cloud and surface spectral information. In Proceedings of the International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2021), Harbin, China, 24–26 December 2021.
16. Khalid, F.; Romle, A.A. Herbal plant image classification using transfer learning and fine-tuning deep learning model. ARASET 2024, 35, 16–25.
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
18. Xing, C.; Huo, Y.; Huang, X.; Lu, C.; Liang, Y.; Wang, A. Research on image recognition technology of traditional Chinese medicine based on deep transfer learning. In Proceedings of the 2020 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA), Tianjin, China, 26–28 June 2020.
19. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
20. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
21. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
22. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
23. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
25. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
26. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
27. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
29. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019.
30. Tan, M.; Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual Event, 18–24 July 2021.

Figure 1. Comparison between MSPyraNet and other convolutional architectures.

Figure 2. Distribution of the number of Chinese herbal images.

Figure 3. This figure shows examples of new Chinese herb types that increase inter-class similarity in the dataset: (a) raspberry and matrimony vine have similar color features; (b) poria cocos and alumen are similar in both color and shape, being white and lump-like; (c) rhizoma phragmitis and radix stemonae have similar morphological features; (d) cinnamon and dangshen both have strip-like features.

Figure 4. This figure shows examples of morphological diversity in the newly added Chinese herbal medicines: (a) illustrates the diversity of raspberry in terms of color (black, red, white, brown) and texture (plump, shriveled); (b) shows the processed and unprocessed forms of poria cocos, as well as differences in shape under different processing methods (lump-like, slice-like); (c) demonstrates the morphological differences in rhizoma phragmitis in different parts and between processed and unprocessed forms; (d) shows differences in the shape (rolled, strip-like, slice-like) and color (reddish-brown, claybank, gray) of cinnamon.

Figure 5. Multi-scale representation module and FACNBlock structure diagram.

Figure 6. Comparative experimental results of MSPyraNet on CHMD1.

Figure 7. Comparative experimental results of MSPyraNet on CHMD2.

Table 1. Comparison Table of CHMD1 and CHMD2 Datasets.

Dataset	Number of Classes	Total Images	Average Images per Class
CHMD1	16	3552	222
CHMD2	20	4485	224.25

Table 2. Comparative experimental results of MSPyraNet on CHMD1.

Method	Accuracy	Precision	Recall	F1
ConvNeXtv2_b	78.37%	77.17%	75.95%	75.70%
ConvNeXt_b	72.32%	70.21%	69.03%	69.34%
ViT_b_16	76.80%	75.69%	73.33%	73.86%
Swin_Transformer	74.92%	72.35%	71.72%	71.79%
Resnet_18	63.95%	60.70%	60.14%	60.14%
Resnet_50	69.59%	66.09%	66.98%	66.02%
Resnet_152	72.73%	74.38%	69.45%	70.87%
EfficientNet_b0	69.91%	68.52%	68.52%	66.88%
EfficientNet_b1	68.97%	66.54%	66.13%	66.08%
EfficientNet_b2	71.79%	69.82%	69.40%	68.89%
EfficientNetv2_s	79.94%	78.07%	77.99%	77.28%
EfficientNetv2_m	74.29%	73.21%	71.28%	71.38%
CSA	81.50%	79.88%	79.68%	79.43%
MSPyraNet (ours)	86.22%	83.80%	83.81%	83.26%

Table 3. Comparative experimental results of MSPyraNet on CHMD2.

Method	Accuracy	Precision	Recall	F1
ConvNeXtv2_b	76.70%	76.57%	74.37%	74.67%
Resnet_50	81.11%	80.54%	78.27%	78.86%
EfficientNet_b2	77.13%	76.89%	75.64%	75.74%
EfficientNetv2_s	81.53%	80.37%	79.50%	79.59%
EfficientNetv2_m	84.68%	83.20%	82.67%	82.51%
CSA	77.24%	76.91%	74.25%	74.87%
MSPyraNet (ours)	89.22%	88.85%	87.87%	88.16%

Table 4. Comparative experiments with and without the multi-scale representation module on CHMD1 and CHMD2.

Dataset	Method	Accuracy	Precision	Recall	F1
CHMD1	Without Multi-scale Representation Module	76.41%	72.72%	70.16%	70.44%
CHMD1	With Multi-scale Representation Module (ours)	86.22%	83.80%	83.81%	83.26%
CHMD2	Without Multi-scale Representation Module	77.08%	77.74%	74.28%	75.08%
CHMD2	With Multi-scale Representation Module (ours)	89.22%	88.85%	87.87%	88.16%

Table 5. Comparative experiments of MSPyraNet and existing multi-scale fusion pyramid structures on CHMD2.

Method	Accuracy	Precision	Recall	F1
[conv 1 × 1, conv 3 × 3 (rate = 6), conv 3 × 3 (rate = 12), conv 3 × 3 (rate = 18)]	79.10%	79.08%	75.67%	76.50%
[conv 1 × 1, conv 3 × 3, conv 3 × 3 (rate = 2)] (ours)	89.22%	88.85%	87.87%	88.16%

Table 6. Experiments with different convolution unit configurations on CHMD2.

Method	Accuracy	Precision	Recall	F1
All (ours)	89.22%	88.85%	87.87%	88.16%
w/o conv 1 × 1	88.40%	88.61%	86.53%	87.09%
w/o conv 3 × 3	82.99%	82.96%	80.86%	81.42%
w/o conv 3 × 3 (rate = 2)	84.59%	84.67%	82.49%	83.04%

Table 7. Single Chinese herb identification accuracy on CHMD2.

Name of Chinese Herbal Medicine	ConvNeXtv2_b	CSA	MSPyraNet (Ours)
Amomum Tsao-ko	84.46%	92.57%	95.95%
Cordyceps Sinensis	89.52%	86.29%	92.74%
Ganoderma Lucidum	70.00%	76.00%	93.00%
Folium Artemisiae Argyi	75.00%	63.16%	89.47%
White Hyacinth Bean	59.62%	61.54%	76.92%
Radix Stemonae	55.36%	60.71%	64.29%
Alumen	92.31%	88.46%	100.00%
Lily	79.00%	85.00%	92.00%
Radix Paeoniae Alba	57.61%	53.26%	72.83%
Rhizoma Atractylodis Macrocephalae	61.36%	59.09%	70.45%
Mint	100.00%	98.53%	99.26%
Angelica Sinensis	73.08%	76.92%	82.69%
Dangshen	20.83%	30.56%	75.00%
Poria Cocos	75.00%	66.67%	92.86%
Raspberry	93.27%	95.19%	93.27%
Matrimony Vine	92.05%	97.73%	90.91%
Cinnamon	82.00%	89.00%	95.00%
Flos Sophorae	72.37%	59.21%	94.74%
Honeysuckle	88.89%	76.39%	98.61%
Rhizoma Phragmitis	65.62%	68.75%	87.50%

Table 8. Comparison of classification performance for Chinese herbal medicine with high inter-class similarity.

True Label	Predicted Label	ConvNeXtv2_b	CSA	MSPyraNet (Ours)
Flos Sophorae	Flos Sophorae	72.37%	59.21%	94.74%
Flos Sophorae	Honeysuckle	23.68%	38.16%	0.00%
Honeysuckle	Honeysuckle	88.89%	76.39%	98.61%
Honeysuckle	Folium Artemisiae Argyi	6.94%	8.33%	0.00%
Honeysuckle	Flos Sophorae	4.17%	12.50%	1.39%
Dangshen	Dangshen	20.83%	30.56%	75.00%
Dangshen	Angelica Sinensis	12.50%	22.22%	4.17%
Dangshen	Flos Sophorae	22.22%	13.89%	0.00%
Dangshen	Rhizoma Phragmitis	26.39%	16.67%	5.56%
White Hyacinth Bean	White Hyacinth Bean	59.62%	61.54%	76.92%
White Hyacinth Bean	Poria Cocos	23.08%	26.92%	15.38%

Table 9. Experiments under different dataset splits.

Method	Accuracy	Precision	Recall	F1
Split Method 1	89.22%	88.85%	87.87%	88.16%
Split Method 2	88.96%	88.81%	88.43%	88.69%
Split Method 3	89.82%	89.68%	89.52%	89.61%

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
