Article

A Hybrid Deep Learning Model for Aromatic and Medicinal Plant Species Classification Using a Curated Leaf Image Dataset

by Shareena E. M. 1,2,*, D. Abraham Chandy 1, Shemi P. M. 2 and Alwin Poulose 3

1 Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
2 Department of Electronics, MES College Marampally, Aluva 683105, India
3 School of Data Science, Indian Institute of Science Education and Research Thiruvananthapuram (IISER TVM), Vithura, Thiruvananthapuram 695551, India
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(8), 243; https://doi.org/10.3390/agriengineering7080243
Submission received: 15 June 2025 / Revised: 20 July 2025 / Accepted: 23 July 2025 / Published: 1 August 2025
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)

Abstract

In the era of smart agriculture, accurate identification of plant species is critical for effective crop management, biodiversity monitoring, and the sustainable use of medicinal resources. However, existing deep learning approaches often underperform when applied to fine-grained plant classification tasks due to the lack of domain-specific, high-quality datasets and the limited representational capacity of traditional architectures. This study addresses these challenges by introducing a novel, well-curated leaf image dataset consisting of 39 classes of medicinal and aromatic plants collected from the Aromatic and Medicinal Plant Research Station in Odakkali, Kerala, India. To overcome performance bottlenecks observed with a baseline Convolutional Neural Network (CNN) that achieved only 44.94% accuracy, we progressively enhanced model performance through a series of architectural innovations. These included the use of a pre-trained VGG16 network, data augmentation techniques, and fine-tuning of deeper convolutional layers, followed by the integration of Squeeze-and-Excitation (SE) attention blocks. Ultimately, we propose a hybrid deep learning architecture that combines VGG16 with Batch Normalization, Gated Recurrent Units (GRUs), Transformer modules, and Dilated Convolutions. This final model achieved a peak validation accuracy of 95.24%, significantly outperforming several baseline models, such as a custom CNN (44.94%), VGG-19 (59.49%), VGG-16 before augmentation (71.52%), Xception (85.44%), Inception v3 (87.97%), VGG-16 after data augmentation (89.24%), VGG-16 after fine-tuning (90.51%), MobileNetV2 (93.67%), and VGG16 with SE blocks (94.94%). These results demonstrate superior capability in capturing both local textures and global morphological features. The proposed solution not only advances the state of the art in plant classification but also contributes a valuable dataset to the research community. Its real-world applicability spans field-based plant identification, biodiversity conservation, and precision agriculture, offering a scalable tool for automated plant recognition in complex ecological and agricultural environments.

1. Introduction

The classification of aromatic and medicinal plants is a critical and multidisciplinary field essential for the systematic identification, documentation, and understanding of plant species with therapeutic and aromatic properties [1,2]. These plants have long held cultural, medicinal, and economic importance, playing a central role in traditional healing practices and the fragrance industry. Their classification encompasses a range of scientific and cultural domains, including botanical taxonomy, morphological and chemical characterization, and the preservation of traditional ecological knowledge. Taxonomic classification provides the foundational framework for organizing plant species into hierarchical categories, such as families, genera, and species [3,4]. Meanwhile, morphological traits, such as leaf shape, color, and venation, are often the primary indicators used in field identification [5].
Chemical analysis further refines classification by examining phytochemical profiles, including alkaloids, flavonoids, essential oils, and volatile compounds, which define a plant’s medicinal and aromatic potential [6]. In addition, indigenous knowledge systems make significant contributions to plant classification by providing contextual and experiential insights that have been passed down through generations. These systems often categorize plants based on practical applications in healing, rituals, and local ecology. Geographic factors also play a pivotal role, as environmental conditions can give rise to region-specific species and chemotypes with unique properties [7,8]. Recent advances in molecular biology and genomics have revolutionized plant taxonomy, with tools such as DNA barcoding and molecular markers enabling the precise and reproducible classification of plant species [9,10]. However, as the volume and complexity of morphological, chemical, and genetic data increase, traditional classification techniques alone may be insufficient to achieve scalable and accurate identification. In this context, artificial intelligence (AI), particularly machine learning and deep learning, provides powerful tools for automating and enhancing the classification process [11].
Deep learning algorithms, particularly Convolutional Neural Networks (CNNs), have demonstrated outstanding performance in image-based plant classification tasks by learning hierarchical and spatial features directly from raw leaf images [12]. These models excel at capturing intricate visual patterns, such as venation, margin shapes, and texture, which are attributes that are critical in distinguishing between aromatic and medicinal plant species, especially when inter-class similarities are high. Compared to traditional machine learning algorithms, such as Support Vector Machines (SVMs) and Random Forests, which rely on manual feature extraction and expert domain knowledge, CNNs automate the feature learning process. This significantly reduces preprocessing overhead and enhances classification accuracy, particularly when applied to high-resolution and diverse leaf datasets, such as the one used in this study.
Deep learning models employed in plant classification can be broadly categorized into three types based on their core objectives: generative, discriminative, and hybrid architectures. Generative models, including Generative Adversarial Networks (GANs), are primarily developed for synthesizing new data but also serve a valuable auxiliary role in augmenting datasets, especially for under-represented plant species or rare leaf variations, thereby indirectly enhancing the performance of classification models [13,14]. Discriminative models, such as CNNs and fine-tuned pre-trained networks like VGG16, are explicitly designed for classification tasks and are adept at mapping input features to plant species labels [15]. Hybrid models integrate both paradigms: the generative component improves dataset richness by producing realistic synthetic samples, while the discriminative component ensures accurate species identification. In the context of aromatic and medicinal plant classification, such hybrid models are highly effective, particularly when dealing with class imbalance, limited labeled data, or high morphological variability [16,17,18].
This study aims to leverage the strengths of deep learning for the classification of 39 aromatic and medicinal plant species using a unique dataset of leaf images. The research begins with a baseline CNN, followed by experiments using transfer learning with the VGG16 model. Enhancements such as data augmentation, Squeeze-and-Excitation (SE) blocks, Gated Recurrent Units (GRUs), Transformer modules, and Dilated Convolutions are progressively incorporated to improve classification accuracy. The proposed hybrid model achieves high performance while addressing challenges associated with limited data availability and species similarity.
Beyond demonstrating the potential of deep learning in botanical classification, this work contributes to the field by introducing a curated dataset collected from the Aromatic and Medicinal Plant Research Station, Odakkali, Kerala, India. The availability of such datasets is crucial for advancing AI-based agricultural research and supporting innovative farming initiatives.

2. Related Work

The classification of plant species, particularly aromatic and medicinal plants, has garnered significant attention with the advancement of deep learning and image-based techniques [19,20]. Among these, CNNs have emerged as the most widely adopted models due to their ability to learn hierarchical features directly from raw input data.
Ibrahim et al. [21] pioneered the use of CNN, AlexNet, and GoogleNet architectures for leaf recognition, achieving high classification accuracy on the Leafsnap dataset. Their work demonstrated that augmenting CNNs with multiple max-pooling layers enhances performance. Kadiwal et al. [22] proposed a CNN-based automated plant recognition system integrated with a mobile application developed in Java for Android, emphasizing the practicality and accessibility of deep learning-based classification. Malik et al. [23] presented a real-time plant species identification system utilizing an EfficientNet-B1 model, which incorporates crowdsourcing feedback and geospatial mapping. This method achieved an accuracy of 87% on the PlantCLEF and UBD Botanical datasets. Similarly, Van Hieu and Hien [24] conducted a large-scale plant classification using deep learning and the PlantCLEF2003 dataset, which comprises over 51,000 images from 609 species. By employing ResNet50V2, Inception ResNetV2, MobileNetV2, and VGG16 as feature extractors and SVM and KNN as classifiers, the study demonstrated that MobileNetV2 achieved the highest accuracy of 95.6%. Sharrab et al. [25] proposed a deep learning approach for medicinal plant classification using a VGG-16-based CNN model trained on ImageNet, handling a dataset of over 25,000 images. Lee et al. [26] introduced a hybrid CNN and Deconvolutional Network (DN) approach that emphasized venation patterns as a stronger classifier than leaf outlines. This model extracted features across hierarchical layers, mirroring the logic of botanical categorization. Roslan et al. [27] evaluated the performance of a CNN on a dataset of Malaysian medicinal herbs. The model achieved 75% accuracy on real data, which improved to 88% with image augmentation, demonstrating the effectiveness of synthetic data in enhancing classification. Gopal et al. [28] developed a basic medicinal plant identification system using boundary, moment, and color-based features, achieving 92% accuracy, showcasing a traditional yet efficient approach to feature-based classification.
In the area of image preprocessing and segmentation, Gao and Lin [29] implemented a segmentation technique that enhanced vein structures using gradient magnitude and angle images, followed by OTSU thresholding. The method outperformed several deep learning-based segmentation models, including Fully Convolutional Networks (FCNs). Maulana and Herdiyeni [30] proposed a hybrid retrieval system that combined image and text features for medicinal plant identification, employing Fuzzy Local Binary Patterns (FLBPs) and BM25 term weighting, achieving an average precision of 0.7081. Zin et al. [31] demonstrated the benefit of a multi-layer CNN with additional parameters and data augmentation techniques, such as flipping and rotating, in improving classification performance under small sample conditions. Taslim et al. [32] utilized ResNet-50 for the classification of five Malaysian plant species captured with mobile phones across varied backgrounds, obtaining over 98% accuracy. Mardiana et al. [33] demonstrated the effectiveness of a pre-trained VGG16 model and image augmentation using the Image Data Generator in boosting herbal plant classification accuracy from 82% to 97%, thereby mitigating issues related to overfitting.
These studies collectively highlight the progression from classical image processing and traditional machine learning methods to more advanced and scalable deep learning models. Traditional machine learning models, such as Support Vector Machines (SVMs) and Random Forests, have been recognized for their robustness in handling high-dimensional data and non-linear features. SVM’s kernel trick enables effective classification even in complex leaf structures, while Random Forests reduce overfitting by combining multiple decision trees and efficiently handling larger datasets. Conversely, CNN-based deep learning models excel by automatically learning multi-scale features from raw images without explicit manual intervention. Architectures like VGG16 and ResNet-50 are particularly effective in capturing fine-grained leaf characteristics. However, performance is not solely dictated by model architecture; it also heavily depends on dataset size, diversity, and the effectiveness of augmentation strategies. The literature clearly emphasizes that combining a strong model design with sufficient and representative data is vital for achieving high plant classification accuracy.

3. Rationale and Research Objectives

The accurate classification of medicinal and aromatic plants is a critical task in botany, agriculture, and pharmacognosy. However, this process remains highly challenging due to the vast morphological diversity observed across species. Traditional identification methods, whether through manual botanical observation or classical machine learning techniques, often rely on handcrafted features, which are time-consuming to engineer and prone to inconsistencies, especially when dealing with intra-species variability. Moreover, these methods frequently struggle to generalize across species due to rigid feature representations and limited adaptability. To address these limitations, this research proposes an automated classification approach using deep learning, specifically a modified CNN architecture based on the VGG16 model. CNNs have demonstrated superior performance in various image classification tasks by learning hierarchical visual features directly from input images, thereby removing the dependency on expert-driven feature design. The proposed method leverages the adaptability of CNNs to capture fine-grained and complex morphological traits across plant species, using a dataset comprising leaf images from 39 different medicinal and aromatic plant classes. The motivation for adopting deep learning, and particularly CNNs, for this study, is rooted in six core factors:
  • Morphological Diversity in Leaf Shape and Size: The 39 plant species selected exhibit diverse leaf morphologies, including lanceolate, cordate, and elliptical forms. These shapes serve as essential visual identifiers in plant taxonomy. CNNs excel at capturing such geometric patterns through convolutional filters, reducing the need for manual annotation or segmentation, which is often required in classical techniques.
  • Species-Specific Venation Patterns: Venation patterns, such as parallel, reticulate, and palmate types, are taxonomically significant but challenging to quantify using traditional image processing methods. CNNs are capable of learning these intricate structures at multiple levels of abstraction, enabling more reliable classification by focusing on both global and local vein arrangements.
  • Distinct Leaf Margins, Tips, and Bases: Features such as serrated versus entire leaf margins and apex types, including acuminate or mucronate, are subtle yet crucial for species differentiation. CNNs can effectively learn and recognize these nuanced features from high-resolution imagery, allowing for precise classification that would otherwise be challenging to encode algorithmically.
  • Texture and Color Variations: Variability in surface texture (e.g., succulent vs. waxy leaves), trichome presence, and subtle color gradients serve as additional distinguishing factors. Deep learning models are particularly adept at capturing these features, even under variable lighting conditions or environmental noise, thereby improving generalization across real-world data.
  • Handling Inter-Class Similarity and Intra-Class Variability: Morphologically similar species often lead to inter-class similarity, while environmental factors, seasonal changes, or developmental stages introduce intra-class variability. CNNs are inherently capable of learning robust and discriminative representations that generalize well across these variations, outperforming feature-specific or rule-based models in such complex classification scenarios.
  • Environmental Robustness and Scalability: By utilizing data augmentation techniques, CNNs exhibit resilience to common field-related challenges, including image rotation, scale variation, lighting changes, and background clutter. Additionally, these models are scalable and can be trained on large datasets to support classification across hundreds of plant species, making them ideal for real-world, large-scale deployment.

4. Materials and Methods

A novel image dataset collected from the Aromatic and Medicinal Plant Research Station (AMPRS), Odakkali, Kerala, was utilized for experimentation. Various model architectures were explored, including a custom CNN, transfer learning using VGG16, fine-tuning strategies, the integration of attention mechanisms such as Squeeze-and-Excitation (SE) blocks, and a proposed hybrid architecture that combines VGG16 with Gated Recurrent Units (GRU), Transformer modules, and Dilated Convolutions. The methodology aimed to evaluate and enhance model performance through systematic architectural modifications, data augmentation, and rigorous training protocols.

4.1. Dataset Collection

The dataset used in this study comprises high-resolution leaf images of 39 distinct medicinal and aromatic plant species, collected at the Aromatic and Medicinal Plant Research Station (AMPRS) in Odakkali, which operates under the Kerala Agricultural University. The station is geographically located between 10°5′40″ and 10°6′0″ N latitude and 76°32′35″ and 76°32′55″ E longitude, covering an area of approximately 12.5 hectares. The dataset represents a diverse collection of species, including Aaduthinnapala (Aristolochia indica), Adalodakam (Justicia adhatoda), Alpam (Thottea siliquosa), Arayal (Ficus religiosa), Asokam (Saraca asoca), Athy (Ficus racemosa), Attuvanchy (Homonoia riparia), Bengla Thippali (Piper chaba), Chathuramulla (Myxopyrum smilacifolium (Wall.) Blume), Chengazhuneerkizhang (Kaempferia galanga), Chuvanna Kadalavanakk (Jatropha gossypiifolia), Chuvannamantharam (Bauhinia purpurea), Dandhappala (Wrightia tinctoria), Ekanayakam (Salacia reticulata), Erukk (Calotropis gigantea), Ilanji (Mimusops elengi), Kallal (Ficus arnottiana), Kanjiram (Strychnos nux-vomica), Karinijotta (Quassia indica), Karinochi (Vitex negundo), Karpooram (Cinnamomum camphora), Karuva (Cinnamomum verum), Kattumunthiri (Cissus vitiginea L.), Kumizh (Gmelina arborea), Maramanjal (Coscinium fenestratum), Modakam (Fagraea ceilanica), Mullatha (Annona muricata), Murikkootti Chuvapp (Hemigraphis colorata), Nagadandhi (Couroupita guianensis), Noni (Morinda citrifolia), Palakapayyani (Oroxylum indicum), Pambuvalenchedi (Rauvolfia serpentina), Panikkoorka (Plectranthus amboinicus), Parijatham (Nyctanthes arbor-tristis), Samudrappacha (Argyreia nervosa), Thathiri (Woodfordia fruticosa), Theeppala (Alstonia venenata), Thippali (Piper longum), and Vathakkodi (Justicia gendarussa Burm.f.).
For each of these 39 medicinal and aromatic plant species in the dataset, leaves were collected from at least 100 healthy, disease-free individuals to ensure intra-species biological variability and to minimize individual-plant bias. When morphology and plant size permitted, leaves were sampled from multiple canopy positions, such as apical, middle, and basal, to represent positional variation in leaf shape, thickness, venation prominence, and pigmentation in the dataset. All material was collected during the mature vegetative growth stage for each species to avoid confounding ontogenetic effects associated with juvenile or senescent tissues. To further support downstream comparative and ecological analyses, each species entry in the dataset was annotated with its life-cycle category (perennial) based on regional agronomic and botanical records from AMPRS.
As illustrated in Figure 1, the dataset contains a total of 3900 images, with a balanced distribution of 100 images per species. Each image was captured in natural daylight field conditions using a SONY ALPHA 7R III (ILCE-7RM3) camera (Sony Corporation, Wuxi, China) equipped with ZEISS BATIS 40 mm f/2 CF (Carl Zeiss AG, Oberkochen, Germany), CANON EF 100 mm f/2.8 Macro lens (CANON Inc., Utsunomiyashi, Japan), and a SIGMA MC-11 (SIGMA Corporation, Aizuwakamatsu, Japan) mount converter. The images were taken with a resolution of 7952 × 5304 pixels at 350 dpi (both horizontally and vertically). While prior datasets often relied on controlled or synthetic environments, this study prioritizes realism by capturing leaves with natural backgrounds and varied orientations, thereby preserving subtle morphological variations, including venation, texture, margin shape, and pigmentation.
To facilitate model training and evaluation, the dataset was divided into a 70:30 stratified split, comprising 2730 training and 1170 testing images. Additionally, 10% of the training data (273 images) was reserved for validation, resulting in 2457 training, 273 validation, and 1170 testing images. This curated dataset serves as a high-quality, domain-specific benchmark for the development and validation of deep learning models tailored to fine-grained plant species classification.
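For reproducibility, the split described above can be expressed with scikit-learn’s stratified splitting utilities. The sketch below is illustrative only: the directory layout and list names are hypothetical stand-ins for the curated dataset, and the fixed random seed is an assumption.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the real file lists: 39 classes x 100 images each.
image_paths = [f"data/class_{c:02d}/img_{i:03d}.jpg"
               for c in range(39) for i in range(100)]
labels = [c for c in range(39) for _ in range(100)]

# 70:30 stratified split over the 3900 images: 2730 training / 1170 testing.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.30, stratify=labels, random_state=42)

# Reserve 10% of the training portion for validation: 2457 training / 273 validation.
train_paths, val_paths, train_labels, val_labels = train_test_split(
    train_paths, train_labels, test_size=0.10, stratify=train_labels,
    random_state=42)

print(len(train_paths), len(val_paths), len(test_paths))  # 2457 273 1170
```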

4.2. Preprocessing and Data Augmentation

To ensure input consistency across models and improve generalization, all images were resized to 224 × 224 × 3 (RGB) and normalized before training. Data augmentation was applied on-the-fly using the Keras ImageDataGenerator with the following parameters: rotation_range = 20, zoom_range = 0.2, width_shift_range = 0.2, height_shift_range = 0.2, and horizontal_flip = True. These controlled spatial and geometric perturbations exposed the network to natural variations in leaf pose, orientation, and scale that commonly occur in field imagery, thereby reducing sensitivity to acquisition conditions and mitigating overfitting [34].
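The augmentation settings above map one-to-one onto the Keras API. The following is a minimal sketch under the stated parameters; the 1/255 rescaling step and the directory names are assumptions, since the exact normalization scheme is not spelled out.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Reported augmentation parameters; rescaling to [0, 1] is an assumed choice.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# Validation/test images are only rescaled, never augmented.
eval_datagen = ImageDataGenerator(rescale=1.0 / 255)

# "dataset/train" is a placeholder path for the curated dataset layout.
train_gen = train_datagen.flow_from_directory(
    "dataset/train", target_size=(224, 224),
    batch_size=32, class_mode="categorical")
```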

4.3. Model Architectures and Experimental Setup

The performance of several deep learning models was evaluated for the classification of aromatic and medicinal plant species. The models include a custom CNN, transfer learning using VGG16, fine-tuned VGG16, a VGG16 enhanced with Squeeze-and-Excitation (SE) blocks, and a hybrid model integrating VGG16 with Batch Normalization, GRU layers, Transformer modules, and dilated convolutions.

4.3.1. Custom CNN Architecture

In the first part of this work, we employed a custom CNN designed specifically for multi-class image classification [35,36]. To ensure input consistency across all models evaluated in this study, the network was trained on RGB images resized to 224 × 224 × 3 . The architecture of the custom CNN is illustrated in Figure 2.
The network begins with a convolutional layer consisting of 64 filters, each of size 3 × 3 , using the ReLU activation function to introduce non-linearity. This is followed by a 2 × 2 max-pooling layer that reduces the spatial dimensions, thereby decreasing computational complexity and controlling overfitting. The second convolutional layer includes 128 filters, again with a 3 × 3 kernel and ReLU activation. This is also followed by a 2 × 2 max-pooling layer. Together, these convolutional and pooling layers extract hierarchical spatial features from the input image. After the convolutional blocks, the feature maps are flattened into a one-dimensional vector, which is then passed through a dropout layer with a dropout rate of 0.5 to prevent overfitting. This is followed by a fully connected dense layer consisting of 256 neurons with ReLU activation, enabling high-level feature abstraction. Another dropout layer with a dropout rate of 0.5 is applied before the output layer. The final output layer consists of 39 neurons (corresponding to the 39 image classes) and uses the softmax activation function to produce a probability distribution over the classes, supporting multi-class classification. The model is compiled using the categorical cross-entropy loss function, which is suitable for multi-class classification tasks. Optimization is performed using an appropriate optimizer (e.g., Adam). Additionally, EarlyStopping is employed to halt training when the validation loss does not improve over a predefined number of epochs, thereby preventing overfitting. ModelCheckpoint is also utilized to save the model weights corresponding to the epoch with the best validation performance.
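Read alongside Figure 2, the baseline translates into a compact Keras model. The sketch below follows the stated layer configuration; the Adam optimizer is used as the suggested choice, and the callback patience and checkpoint filename are assumed values.

```python
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, (3, 3), activation="relu"),   # first conv block
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),  # second conv block
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),           # high-level feature abstraction
    layers.Dropout(0.5),
    layers.Dense(39, activation="softmax"),         # one output per plant class
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# EarlyStopping halts training on a stalled validation loss; ModelCheckpoint
# keeps the best-performing weights. Patience and filename are assumptions.
cbs = [
    callbacks.EarlyStopping(monitor="val_loss", patience=5,
                            restore_best_weights=True),
    callbacks.ModelCheckpoint("best_custom_cnn.keras", monitor="val_loss",
                              save_best_only=True),
]
```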

4.3.2. Transfer Learning with VGG16

The second phase of analysis employs transfer learning using the pre-trained VGG16 model [37,38,39,40], a deep CNN trained initially on the ImageNet dataset. Transfer learning enables the leveraging of previously learned, rich feature representations from large datasets and adapting them to new, smaller datasets with limited samples. The architecture of the transfer learning model is illustrated in Figure 3.
The VGG16 model is used as a fixed feature extractor by removing its top fully connected layers (i.e., ‘include_top = False’) and retaining only the convolutional base. The input images are resized to 224 × 224 × 3 , which matches the expected input dimensions of the VGG16 model. These RGB images are passed through the frozen convolutional layers of VGG16, which extract robust features such as edges, textures, and patterns from the input. A GlobalAveragePooling2D layer is appended to the base model to reduce the spatial dimensions of the output feature maps by computing the average of each feature map. This not only minimizes the number of trainable parameters but also helps mitigate overfitting by providing a smooth transition from convolutional features to dense layers. Following the pooling layer, a fully connected Dense layer with 256 neurons and ReLU activation is added. This layer learns non-linear combinations of the features extracted by the base model. To further prevent overfitting, a Dropout layer with a dropout rate of 0.5 is applied. Finally, a softmax Dense layer with 39 neurons (corresponding to the number of target classes) is used to perform the multi-class classification. During training, the weights of the VGG16 convolutional base are kept frozen to retain the powerful features learned from the ImageNet dataset. Only the custom top layers (GlobalAveragePooling2D, Dense, and Dropout) are trained, ensuring efficiency and preventing overfitting on small datasets.
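A minimal sketch of this transfer-learning setup is given below; it mirrors the description exactly (frozen VGG16 base, GlobalAveragePooling2D, a 256-unit Dense layer, 0.5 dropout, and a 39-way softmax), with the Adam optimizer assumed.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen convolutional base pre-trained on ImageNet (include_top=False).
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # spatial average of each feature map
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(39, activation="softmax"), # 39 target species
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```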

4.3.3. Fine-Tuned VGG16

To enhance the model’s ability to adapt to domain-specific characteristics of medicinal and aromatic plant species, a fine-tuned version of the VGG16 model was employed. Initially, a standard transfer learning approach was used, where the pre-trained VGG16 served as a fixed feature extractor. In that setup, all convolutional layers were frozen, and only the custom classification head, which included a GlobalAveragePooling2D layer, a fully connected Dense layer with 256 neurons and ReLU activation, a Dropout layer (rate = 0.5), and a final softmax Dense layer with 39 output neurons, was trainable. To further improve classification performance and capture fine-grained features unique to plant species (such as vein patterns, leaf textures, and edge structures), the model was fine-tuned by unfreezing the final two convolutional blocks of the VGG16 architecture (as shown in Figure 4). These deeper layers are responsible for extracting high-level, abstract features, making them critical for distinguishing subtle morphological differences between classes.
The model input consisted of RGB images resized to 224 × 224 × 3, consistent with the VGG16 requirements. The complete fine-tuned architecture comprises five convolutional blocks with increasing filter depth: the first block contains two Conv2D layers with 64 filters each, followed by a MaxPooling2D layer. The second block uses 128 filters, the third and fourth use 256 and 512 filters, respectively, and the fifth block uses 512 filters again. Each convolutional layer uses a 3 × 3 kernel and rectified linear unit (ReLU) activation. Max-pooling is applied after each block to reduce spatial dimensionality and computational complexity. To reduce overfitting during fine-tuning, the learning rate was decreased to 1 × 10⁻⁵, enabling more controlled and stable updates to the pre-trained weights. This approach helped retain the general feature representations learned from ImageNet while allowing the model to adapt to the fine details of the new dataset.
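Continuing the sketch from Section 4.3.2, fine-tuning only requires unfreezing the last two convolutional blocks and re-compiling at the reduced learning rate; the layer-name matching below relies on Keras’s standard VGG16 naming (block4_*, block5_*).

```python
from tensorflow.keras.optimizers import Adam

# Unfreeze only the final two convolutional blocks; earlier layers stay frozen
# so the generic ImageNet features are preserved.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith(("block4", "block5"))

# Re-compile with the reduced learning rate (1e-5) for stable updates.
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```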

4.3.4. VGG16 with Squeeze-and-Excitation (SE) Blocks

To improve classification performance on the visually similar species of medicinal and aromatic plants, we propose a modified convolutional neural network by integrating Squeeze-and-Excitation (SE) attention blocks [41,42] into the VGG16 architecture. While VGG16 is a robust feature extractor with deep convolutional layers, it treats all feature channels equally, potentially overlooking the critical and class-specific features essential for distinguishing subtle differences in leaf texture, venation, and pigmentation.
To address this limitation, we incorporate SE blocks into each of the five convolutional blocks of the pre-trained VGG16 model (see Figure 5). The SE block introduces channel-wise attention by adaptively recalibrating feature responses. This allows the network to emphasize informative features and suppress less useful ones, enhancing its discriminative power for fine-grained classification tasks. The SE block consists of three core operations: Squeeze, Excitation, and Scaling.
  • Squeeze: This operation compresses the spatial dimensions of the feature map into a channel descriptor using Global Average Pooling (GAP). For a feature map $U \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ are the spatial dimensions and $C$ is the number of channels, GAP is applied independently to each channel to produce a vector $z \in \mathbb{R}^{C}$:

    $$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j)$$
  • Excitation: This step models channel-wise dependencies and learns which channels are more informative. The vector $z$ is passed through two fully connected layers with ReLU and sigmoid activations:

    $$S = \sigma\left(W_2 \cdot \delta(W_1 \cdot z)\right)$$

    where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the learned weight matrices, $\delta(\cdot)$ is the ReLU activation function, $\sigma(\cdot)$ is the sigmoid activation function, and $r$ is the channel reduction ratio (typically $r = 16$).
  • Scaling: The original feature map $U$ is scaled (recalibrated) by channel-wise multiplication with the learned attention weights $S$ to produce the refined output $\tilde{U}$; a minimal code sketch follows this list:

    $$\tilde{u}_c = s_c \cdot u_c$$
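A minimal Keras sketch of the three operations above, using the functional API and the reduction ratio r = 16 from Table 1; the wiring into VGG16 follows this paper’s description rather than an official reference implementation.

```python
from tensorflow.keras import layers

def se_block(u, reduction=16):
    """Squeeze-and-Excitation: squeeze (GAP), excitation (bottleneck MLP with
    ReLU then sigmoid), and channel-wise scaling of the input feature map."""
    c = u.shape[-1]
    z = layers.GlobalAveragePooling2D()(u)                  # squeeze: z in R^C
    s = layers.Dense(c // reduction, activation="relu")(z)  # W1 with ReLU (delta)
    s = layers.Dense(c, activation="sigmoid")(s)            # W2 with sigmoid (sigma)
    s = layers.Reshape((1, 1, c))(s)                        # broadcast over H x W
    return layers.Multiply()([u, s])                        # u_tilde_c = s_c * u_c
```

In the SE-VGG16 configuration of Table 1, this block is applied to the output of each of the five max-pooling stages before the features reach the fully connected layers.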
This architecture maintains the five convolutional blocks of VGG16, each followed by a max-pooling layer with a stride of 2. Each convolutional block is enhanced with an SE module, improving the network’s ability to focus on the most relevant feature maps. After the convolutional and SE-enhanced feature extraction stages, the output is flattened and passed through three fully connected layers: two dense layers with 4096 units (ReLU activation) and a final softmax layer tailored for the classification task.
The detailed configuration of the SE-VGG16 model used in this study is presented in Table 1. The architecture builds upon the pre-trained VGG16 model, with Squeeze-and-Excitation (SE) blocks integrated after each of the five max-pooling layers to introduce channel-wise attention. These SE blocks use a reduction ratio of 16 and employ ReLU and Sigmoid activations to recalibrate feature maps. The model is optimized using the Adam optimizer with a learning rate of 0.0001 and trained using a batch size of 32. Fully connected layers comprise two dense layers with 4096 units each, utilizing ReLU activation, followed by a softmax layer for classification. To mitigate overfitting, a dropout rate of 0.5 is applied within the fully connected layers. The model is trained using categorical cross-entropy loss, which is suitable for multi-class classification tasks.

4.3.5. Hybrid Model with VGG16, Batch Normalization, GRUs, Transformers, and Dilated Convolutions

To overcome the limitations observed in conventional deep learning models, we designed a hybrid architecture that integrates multiple complementary components: VGG16, Batch Normalization, Dilated Convolutions, Gated Recurrent Units (GRUs), and Transformer Encoder layers (see Figure 6). The superiority of this model lies in its ability to combine local feature extraction, multi-scale context modeling, and global dependency learning within a single pipeline.
Backbone: VGG16
The model begins with the five convolutional blocks of the VGG16 network, leveraging pre-trained ImageNet weights for hierarchical feature extraction. VGG16 offers robust local feature capture capabilities, particularly for capturing leaf textures and shape characteristics. Each convolutional layer uses 3 × 3 kernels followed by ReLU activation for non-linearity.
Batch Normalization
To stabilize and accelerate training, Batch Normalization [43] is applied after each convolutional layer. This normalization mitigates the internal covariate shift and enhances generalization, which is essential for handling variations in lighting, scale, and background commonly present in real-world plant images [44,45,46].
Dilated Convolutions
After the VGG16 backbone, we employ a Dilated Convolutional block with two 3 × 3 convolution layers at a dilation rate of 2. This allows for an expanded receptive field without additional parameters, enabling the capture of broader contextual information while preserving fine details, such as venation patterns and glandular spots.
GRU Layer for Spatial Dependencies
The output feature maps are reshaped into sequences and processed using a GRU layer with 256 units. Unlike typical CNNs, which struggle to model long-range dependencies, GRUs capture spatial continuity and structural patterns (e.g., vein alignment or margin curvature) across the leaf image, enhancing discriminative power for species with subtle morphological differences [47].
Transformer Encoder for Global Context
A Transformer Encoder layer follows, incorporating an eight-head self-attention mechanism and a feed-forward network with residual connections and layer normalization. This module captures global relationships across distant leaf regions, enabling the model to analyze contour shapes and interior patterns jointly, a task that convolutional layers alone cannot achieve effectively.
Classification Layers
Finally, features are aggregated through two fully connected layers (each with 4096 units and ReLU activation), followed by a softmax layer for classification into plant species.
Justification of Complexity
Each component was selected based on its unique contribution to addressing challenges in plant leaf classification:
  • VGG16 provides robust local and hierarchical features.
  • Batch Normalization accelerates convergence and reduces overfitting.
  • Dilated Convolutions enable multi-scale feature extraction without loss of resolution.
  • GRUs capture spatial dependencies in sequential form for elongated patterns.
  • Transformer Encoder models global context and long-range feature interactions.
Together, these components provide a comprehensive representation of both texture and structure, enhancing the classification of fine-grained plant species.
Mathematical Formulations
Batch Normalization: For an input activation x, Batch Normalization transforms x as follows:
$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$$
where $\mu$ and $\sigma^2$ are the batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable parameters.
Gated Recurrent Unit (GRU): For time step t, we have
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$\tilde{h}_t = \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $z_t$ is the update gate, $r_t$ is the reset gate, $\tilde{h}_t$ is the candidate hidden state, $h_t$ is the final hidden state, and $\odot$ denotes element-wise multiplication.
Self-Attention Mechanism:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
where $Q$, $K$, and $V$ are the query, key, and value matrices, and $d_k$ is the key dimensionality.
Dilated Convolution: For a 1D signal x and filter w, the Dilated Convolution is defined as follows:
$$y[i] = \sum_{k=0}^{K-1} x[i + r \cdot k] \cdot w[k]$$
where $r$ is the dilation rate and $K$ is the filter size.
The hyperparameter configuration used for the proposed deep learning architecture is detailed in Table 2. This configuration combines the strengths of a pre-trained VGG-16 backbone with additional modules, including Batch Normalization, Dilated Convolutions, Gated Recurrent Units (GRUs), and a Transformer Encoder, to effectively capture both spatial and sequential patterns in leaf images. Key design choices include the use of ReLU activation, 3 × 3 convolutional kernels, 256 GRU units, eight-headed multi-head attention, and dropout regularization. The model was trained using the Adam optimizer with a learning rate of 0.0001, a batch size of 32, and a categorical cross-entropy loss function. Gradient clipping and validation split strategies were also employed to ensure training stability and generalization.
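To make the pipeline concrete, the sketch below assembles the components under the Table 2 hyperparameters. It is a schematic reconstruction, not the authors’ exact code: the reshape of the 7 × 7 × 512 backbone output into a 49-step sequence, the Transformer feed-forward width, the attention key dimension, and the gradient-clipping value are assumptions where the text does not pin them down.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

inp = layers.Input(shape=(224, 224, 3))

# VGG16 backbone (pre-trained) for hierarchical local features: 7x7x512 output.
backbone = VGG16(weights="imagenet", include_top=False, input_tensor=inp)
x = layers.BatchNormalization()(backbone.output)

# Dilated Convolution block: two 3x3 convs at dilation rate 2 enlarge the
# receptive field without extra downsampling.
x = layers.Conv2D(512, 3, dilation_rate=2, padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(512, 3, dilation_rate=2, padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)

# Treat the 7x7 spatial grid as a 49-step sequence of 512-d feature vectors.
seq = layers.Reshape((49, 512))(x)

# GRU layer (256 units) models spatial continuity along the sequence.
seq = layers.GRU(256, return_sequences=True)(seq)

# Transformer Encoder: 8-head self-attention plus a feed-forward network,
# each wrapped with a residual connection and layer normalization.
attn = layers.MultiHeadAttention(num_heads=8, key_dim=32)(seq, seq)
seq = layers.LayerNormalization()(layers.Add()([seq, attn]))
ff = layers.Dense(512, activation="relu")(seq)
ff = layers.Dense(256)(ff)
seq = layers.LayerNormalization()(layers.Add()([seq, ff]))

# Classification head: two 4096-unit dense layers, then a 39-way softmax.
x = layers.Flatten()(seq)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(39, activation="softmax")(x)

model = models.Model(inp, out)
# clipnorm stands in for the gradient clipping noted in Table 2 (value assumed).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),
    loss="categorical_crossentropy",
    metrics=["accuracy"])
```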

5. Results and Discussion

The dataset used in this study comprised 39 distinct classes, presenting a challenging multi-class classification task, especially in the context of visually similar plant species. To evaluate the effectiveness of the proposed models, several deep learning architectures were trained and compared, including a custom CNN, transfer learning with VGG16 (with and without data augmentation), fine-tuned VGG16, and progressively enhanced hybrid models integrating SE blocks, GRU layers, Transformer Encoders, Batch Normalization, and Dilated Convolutions.

5.1. Model Performance Analysis

The baseline model, a custom CNN developed from scratch, served as a foundational reference for performance benchmarking. As shown in Figure 7a, the model exhibited suboptimal learning dynamics, achieving a training accuracy of 43.14% and a validation accuracy of 52.53%. The corresponding losses remained high (1.76 training, 1.71 validation), indicating poor feature extraction and generalization capabilities. This underperformance is attributed to the network’s limited depth and the absence of pre-trained filters, which are crucial for capturing the complex textures and structures of plant leaves.

Figure 7b illustrates a significant performance improvement through the use of transfer learning with the VGG16 architecture. Leveraging pre-trained weights from ImageNet, this model achieved training and validation accuracies of 78.93% and 89.87%, respectively, with corresponding losses of 0.76 and 0.74. The improvement highlights the effectiveness of pre-trained convolutional layers in extracting general visual features relevant to plant leaf classification. Introducing data augmentation (Figure 7c) further enhanced model generalization by exposing it to diverse transformations of leaf images. Training accuracy increased to 91.14%, and validation accuracy reached 92.10%, while training and validation losses decreased to 0.44 and 0.43, respectively. These results confirm the role of augmentation in mitigating overfitting and increasing robustness across unseen data.

Further gains were realized by fine-tuning the last two convolutional blocks of VGG16, as depicted in Figure 7d. The model achieved a training accuracy of 92.90% and a validation accuracy of 93.21%, with a corresponding decrease in loss to 0.24 (training) and 0.19 (validation). Fine-tuning allowed the network to adapt more closely to domain-specific patterns such as leaf venation, margins, and surface textures. The integration of Squeeze-and-Excitation (SE) blocks into the VGG16 backbone (Figure 7e) added channel-wise recalibration through attention, improving training and validation accuracies to 93.80% and 94.30%, respectively, while reducing losses to 0.14 and 0.12. The SE mechanism sharpened the model’s focus on subtle yet important visual cues, such as pigmentation or vein structure.

The proposed hybrid model, shown in Figure 7f, integrates multiple enhancements, including Batch Normalization, Dilated Convolution, GRU layers, and a Transformer Encoder, into the VGG16 framework. This composite architecture achieved the highest performance, with a training accuracy of 95.60% and a validation accuracy of 96.70%, while maintaining very low training and validation losses (0.13 and 0.11, respectively). The fusion of spatial and temporal learning capabilities, along with global contextual attention, enabled the model to capture the fine-grained morphological variations necessary for the accurate classification of medicinal and aromatic plants.
Table 3 presents a summary comparison of all models. It is evident that model performance consistently improves as the architecture evolves from simple CNNs to more sophisticated combinations integrating attention, recurrent, and Transformer modules. The final proposed hybrid model outperforms all others, confirming the value of architectural enhancements in solving high-dimensional, multi-class plant classification problems.
The classification performance of the baseline and proposed models was further analyzed using their respective confusion matrices (Figure 8). The custom CNN model exhibited scattered predictions with a weak diagonal pattern, indicating poor class discrimination and frequent misclassifications across multiple classes. The VGG-16 model before data augmentation showed a slightly improved diagonal structure, yet misclassifications remained, reflecting its moderate accuracy. With data augmentation, the confusion matrix of VGG-16 revealed a more evident diagonal dominance, signifying enhanced generalization and class separation. Fine-tuning VGG-16 further reduced off-diagonal errors, thereby improving the precision of class predictions. The integration of a Squeeze-and-Excitation (SE) block refined the attention mechanism, resulting in stronger diagonal and minimal misclassification. Finally, the proposed hybrid model, which combines VGG-16, GRU, and Transformer, demonstrated the most pronounced diagonal in the confusion matrix, with minimal off-diagonal values, showcasing its superior ability to capture spatial and temporal dependencies and context-aware features. This visual evidence, as depicted in Figure 8, aligns with the quantitative metrics and underscores the progressive improvement achieved by the proposed model in accurately classifying all 39 classes.

5.2. Baseline Deep Learning Model Comparison

To evaluate the effectiveness of the proposed hybrid model, we conducted a comparative analysis against several baseline deep learning models using key classification metrics: accuracy, precision, recall, and F1-score. As shown in Table 4, the custom CNN, developed from scratch, exhibited the weakest performance, with an accuracy of only 44.94%, indicating a limited ability to capture the discriminative features of medicinal plant leaves. Traditional transfer learning models, such as VGG-19 and VGG-16 (without data augmentation), demonstrated moderate improvements, achieving accuracies of 59.49% and 71.52%, respectively. Significant performance gains were observed when using advanced architectures, such as Xception and Inception v3, achieving accuracies of 85.44% and 87.97%, respectively. Data augmentation applied to VGG-16 further enhanced the model’s robustness, with accuracy increasing to 89.24%. Additional fine-tuning further improved performance to 90.51%. Lightweight yet efficient models, such as MobileNetV2, demonstrated strong generalization with an accuracy of 93.67%. Integrating Squeeze-and-Excitation (SE) blocks into VGG-16 yielded even better results, achieving 94.94% accuracy and a strong F1-score of 95.00. The proposed model, which integrates VGG-16 with GRU layers and Transformer Encoder blocks, outperformed all baselines across all metrics. It achieved the highest accuracy of 96.70%, precision of 96.13%, recall of 95.24%, and an F1-score of 95.05%. These results affirm the effectiveness of combining convolutional, sequential, and attention-based mechanisms for the fine-grained classification of medicinal plant species.

5.3. Ablation Study

To evaluate the contribution of each architectural component in the proposed hybrid deep learning model, we conducted an ablation study by systematically removing one module at a time while keeping the others intact. The results, summarized in Table 5, clearly demonstrate the importance of each module in contributing to the overall classification performance.
Removing Batch Normalization resulted in a noticeable drop in validation accuracy (92.19%) and slower convergence, suggesting its critical role in stabilizing and accelerating training. Eliminating the Dilated Convolution layers reduced the accuracy to 93.31%, indicating that a smaller receptive field impaired the model’s ability to capture fine-grained features. When the Gated Recurrent Unit (GRU) was excluded, accuracy dropped significantly to 86.97%, highlighting its essential function in spatial sequence modeling and capturing dependencies along the leaf margin. The absence of Transformer modules further degraded performance to 86.30%, underscoring their importance in modeling long-range dependencies and complex spatial interactions. The proposed model, which combines all components, achieved an accuracy of 96.70%, demonstrating the effectiveness of integrating these modules for robust and fine-grained classification of medicinal and aromatic plant species.
Figure 9 presents the impact of systematically removing key components from the proposed hybrid model to evaluate their contributions to overall performance. The performance is assessed based on training and validation accuracy and loss curves for each ablation variant. In Figure 9a, the absence of Batch Normalization leads to slower convergence and higher validation loss. The model exhibits increased generalization error, indicating the stabilizing and regularizing role of Batch Normalization in learning effective representations. Figure 9b shows the results without Dilated Convolution layers. Although the training converges, the validation accuracy saturates earlier and remains lower than that of the full model. This is attributed to the reduced receptive field, which limits the model’s ability to capture fine-grained spatial details and contextual dependencies across leaf structures. Figure 9c depicts the performance without GRU layers, where both training and validation accuracy are significantly degraded. The poor modeling of sequential spatial relationships (especially along leaf margins) results in a lower representational capacity, underscoring the importance of GRUs in capturing structural continuity across the input feature maps. Figure 9d highlights the effect of excluding the Transformer Encoder. The model shows the lowest validation accuracy among all variants. This confirms that the Transformer is essential for modeling long-range dependencies and enhancing global contextual understanding of spatial interactions across leaf structures. Overall, the results confirm that each component, such as Batch Normalization, Dilated Convolution, GRUs, and the Transformer Encoder, plays a vital role in enhancing the model’s ability to generalize and accurately classify complex leaf patterns. The proposed hybrid model, which integrates all these elements, achieves the highest accuracy and optimal convergence behavior, validating the effectiveness of the architectural design.

5.4. Computational Complexity and Parameter Analysis

To evaluate the efficiency of the proposed hybrid model, we analyzed the layer-wise computational complexity (in terms of floating-point operations, FLOPs) and the distribution of trainable parameters across the network layers, as shown in Figure 10 and Figure 11.
The computational complexity analysis (Figure 10) reveals that the majority of FLOPs are concentrated in the dense layers, particularly the two fully connected layers with 4096 units each (Dense_4096_1 and Dense_4096_2), which account for approximately 80% of the total computation. The VGG16 convolutional blocks (VGG_Block1 to VGG_Block5) exhibit a moderate level of complexity due to their hierarchical convolutional operations, while the GRU layer and Transformer Encoder modules (multi-head attention and feed-forward components) add a relatively minor computational overhead.
Similarly, the parameter count distribution (Figure 11) demonstrates that the dense layers dominate the trainable parameter space, with each of the 4096-unit layers contributing approximately 8 million parameters. In contrast, the convolutional layers of VGG16, the Dilated Convolutions, and the GRU-256 layer maintain a comparatively lower parameter count, highlighting the model’s efficient allocation of parameters across feature extraction and classification components.
Overall, this analysis demonstrates that while the dense layers are computationally intensive, they play a crucial role in consolidating high-level features for classification. In contrast, the convolutional and Transformer components strike a balance between performance and complexity. This design ensures that the proposed hybrid model achieves superior accuracy (95.24%) while maintaining computational feasibility for real-world applications.

5.5. State-of-the-Art Comparison

Over the past decade, the identification and classification of medicinal and aromatic plants have garnered substantial research interest, resulting in the development of numerous deep learning and hybrid models. Table 6 presents a comprehensive comparison of recent methodologies, highlighting the diversity in datasets, model architectures, and classification accuracies achieved.
Early works such as [35,48] used self-built datasets with 10–12 classes and employed custom CNN-based models like HerbSimNet, achieving relatively modest accuracies of 60% and 71.3%, respectively. More recent approaches adopted deeper architectures such as VGG16 with cascaded classifiers [50] or hybrid models like Ayur-PlantNet [63], achieving accuracies in the range of 81–92%. A significant performance leap was observed with the adoption of advanced architectures, such as EfficientNet B4 [57], Xception [60], MobileNetV3 [71], and Vision Transformers [72], all of which reported accuracies exceeding 91%. For example, Custodio et al. [64] utilized VGG19 to classify 40 medicinal plants with an accuracy of 92.67%, while Bandla et al. [66] achieved 93% accuracy using MobileNetV2 on a similar scale.
Compared to these, our proposed hybrid model, which combines VGG-16 with GRU and Transformer components, outperforms all previously reported methods. Evaluated on a self-built dataset containing 39 classes, the proposed model achieved an accuracy of 96.70%, demonstrating superior generalization, robust feature learning, and effective sequence modeling capabilities. This highlights the effectiveness of combining convolutional, recurrent, and attention-based mechanisms in medicinal plant classification tasks.

5.6. Limitations

Despite the promising results achieved through the proposed hybrid deep learning architecture, this study has some limitations that must be acknowledged:
  • Dataset Specificity: The dataset used in this work comprises high-resolution images of 39 aromatic and medicinal plant species collected under controlled conditions at a single research station (AMPRS, Odakkali, Kerala). While this ensures data quality and consistency, the model’s limited geographic scope may restrict its generalizability to other regions, environments, or species not included in the dataset.
  • Controlled Image Capture Conditions: All images were acquired using a SONY ALPHA 7R III camera, either against a white background with uniform lighting or under consistent natural field conditions. In real-world deployment scenarios, factors such as occlusion, variable lighting, background clutter, and different device cameras may degrade classification performance.
  • Computational Complexity: The proposed model combines multiple architectural modules, including VGG16, Batch Normalization, Dilated Convolutions, GRU layers, and a Transformer Encoder, which, while effective, introduce a high level of computational complexity. This could pose challenges for deployment on edge devices or mobile applications where resources are constrained.
  • Data-Hungry Architecture: Deep learning models with GRUs and Transformer layers typically require large datasets to generalize effectively. Although our dataset is balanced with 100 samples per class, its overall size (3900 images) remains relatively small compared to large-scale image recognition benchmarks. Further scaling or augmentation may be necessary to utilize the model’s capacity fully.
  • Limited Cross-Domain Evaluation: The model is trained and evaluated solely on the collected dataset, without using cross-validation on external datasets or unseen species. As such, its robustness and transferability across different domains remain to be evaluated in future work.

6. Conclusions and Future Scope

This study systematically explored and evaluated a diverse range of deep learning architectures for classifying medicinal and aromatic plant species using leaf images. Beginning with a baseline custom CNN model, which exhibited limited performance due to its constrained feature extraction capacity, successive enhancements progressively improved classification outcomes. The application of transfer learning with the pre-trained VGG16 model yielded notable accuracy gains, which were further amplified through data augmentation techniques. Fine-tuning of deeper VGG16 layers significantly enhanced the model’s ability to learn domain-specific characteristics. Further architectural refinement using Squeeze-and-Excitation (SE) blocks introduced effective channel-wise attention mechanisms, allowing the model to adaptively prioritize subtle yet crucial leaf features such as vein structures and surface textures. The final hybrid model, integrating VGG16 with Batch Normalization, GRU layers, Transformer modules, and Dilated convolutions, demonstrated significant improvements. This design achieved an impressive validation accuracy of 96.70%, highlighting its efficacy in fine-grained classification tasks.
A notable contribution of this research is the construction and utilization of a high-quality, diverse dataset comprising 39 medicinal and aromatic plant species collected from the Aromatic and Medicinal Plant Research Station (AMPRS) in Odakkali, Kerala, India. This dataset addresses a significant gap in the field and provides a valuable resource for future botanical AI research. The results of this work underscore the importance of both architectural innovation and domain-specific data in developing robust, high-performing classification models for specialized plant species.
Building on the promising results of this study, future work can focus on expanding the dataset to include additional medicinal and aromatic plant species, capturing seasonal and geographical variability to enhance model generalizability. Optimizing the proposed hybrid architecture for deployment on mobile and edge devices will enable real-time, field-level plant identification, facilitating practical use in the agriculture and herbal medicine industries. Further improvements could be achieved by incorporating multimodal data sources such as spectral signatures or environmental metadata to enrich feature representation. Additionally, integrating explainable AI (XAI) techniques will enhance interpretability and trust in model predictions. Embedding the classification framework into broader decision support systems can support conservation efforts, plant-based drug discovery, and precision agriculture applications.

Author Contributions

Conceptualization, S.E.M. and D.A.C.; formal analysis, S.E.M. and D.A.C.; investigation, S.E.M., D.A.C., S.P.M. and A.P.; writing—original draft preparation, S.E.M.; writing—review and editing, D.A.C., S.P.M. and A.P.; supervision, D.A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors gratefully acknowledge the support of Ancy Joseph, Aromatic & Medicinal Plants Research Station, Kerala Agricultural University, Odakkali. During the preparation of this work, the authors used Grammarly 14.1245.0, QuillBot 4.27.2, and ChatGPT-4o to improve the grammatical accuracy and clarity of the manuscript. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shasany, A.K.; Shukla, A.K.; Khanuja, S.P. Medicinal and aromatic plants. In Technical Crops; Springer: Berlin/Heidelberg, Germany, 2007; pp. 175–196. [Google Scholar]
  2. Pandey, A.K.; Kumar, P.; Saxena, M.; Maurya, P. Distribution of aromatic plants in the world and their properties. In Feed Additives; Elsevier: Amsterdam, The Netherlands, 2020; pp. 89–114. [Google Scholar]
  3. Stuessy, T.F. Plant Taxonomy: The Systematic Evaluation of Comparative Data; Columbia University Press: New York, NY, USA, 2009. [Google Scholar]
  4. Ereshefsky, M. Species, taxonomy, and systematics. In Philosophy of Biology; Elsevier: Amsterdam, The Netherlands, 2007; pp. 403–427. [Google Scholar]
  5. Aptoula, E.; Yanikoglu, B. Morphological features for leaf based plant recognition. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; pp. 1496–1499. [Google Scholar]
  6. Singhal, P.; Maheshwari, A.; Semwal, P. Medicinal and aromatic plants: Healthcare and industrial applications—A book review. Ethnobot. Res. Appl. 2025, 30, 1–5. [Google Scholar] [CrossRef]
  7. Paulson, A.; Ravishankar, S. AI based indigenous medicinal plant identification. In Proceedings of the 2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA), Cochin, India, 2–4 January 2020; pp. 57–63. [Google Scholar]
  8. Balick, M.J.; Cox, P.A. Plants, People, and Culture: The Science of Ethnobotany; Garland Science: New York, NY, USA, 2020. [Google Scholar]
  9. Wang, M.; Lin, H.; Lin, H.; Du, P.; Zhang, S. From species to varieties: How modern sequencing technologies are shaping Medicinal Plant Identification. Genes 2024, 16, 16. [Google Scholar] [CrossRef] [PubMed]
  10. Mosa, K.A.; Gairola, S.; Jamdade, R.; El-Keblawy, A.; Al Shaer, K.I.; Al Harthi, E.K.; Shabana, H.A.; Mahmoud, T. The promise of molecular and genomic techniques for biodiversity research and DNA barcoding of the Arabian Peninsula flora. Front. Plant Sci. 2019, 9, 1929. [Google Scholar] [CrossRef] [PubMed]
  11. Pacifico, L.D.; Britto, L.F.; Oliveira, E.G.; Ludermir, T.B. Automatic classification of medicinal plant species based on color and texture features. In Proceedings of the 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Brazil, 15–18 October 2019; pp. 741–746. [Google Scholar]
  12. Jiang, Y.; Li, C. Convolutional neural networks for image-based high-throughput plant phenotyping: A review. Plant Phenomics 2020, 2020, 4152816. [Google Scholar] [CrossRef] [PubMed]
  13. Mahadevan, K.; Punitha, A.; Suresh, J. A novel rice plant leaf diseases detection using deep spectral generative adversarial neural network. Int. J. Cogn. Comput. Eng. 2024, 5, 237–249. [Google Scholar] [CrossRef]
  14. Lokesh, G.H.; Chandregowda, S.B.; Vishwanath, J.; Ravi, V.; Ravi, P.; Al Mazroa, A. Intelligent Plant Leaf Disease Detection Using Generative Adversarial Networks: A Case-study of Cassava Leaves. Open Agric. J. 2024, 18, e18743315288623. [Google Scholar] [CrossRef]
  15. Gu, J.; Yu, P.; Lu, X.; Ding, W. Leaf species recognition based on VGG16 networks and transfer learning. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; Volume 5, pp. 2189–2193. [Google Scholar]
  16. Sambasivam, G.; Prabu Kanna, G.; Chauhan, M.S.; Raja, P.; Kumar, Y. A hybrid deep learning model approach for automated detection and classification of cassava leaf diseases. Sci. Rep. 2025, 15, 7009. [Google Scholar] [CrossRef]
  17. Barhate, D.; Pathak, S.; Dubey, A.K. Hyperparameter-tuned batch-updated stochastic gradient descent: Plant species identification by using hybrid deep learning. Ecol. Inform. 2023, 75, 102094. [Google Scholar] [CrossRef]
  18. Rashid, J.; Khan, I.; Abbasi, I.A.; Saeed, M.R.; Saddique, M.; Abbas, M. A hybrid deep learning approach to classify the plant leaf species. Comput. Mater. Contin. 2023, 76, 3897–3920. [Google Scholar] [CrossRef]
  19. Mulugeta, A.K.; Sharma, D.P.; Mesfin, A.H. Deep learning for medicinal plant species classification and recognition: A systematic review. Front. Plant Sci. 2024, 14, 1286088. [Google Scholar] [CrossRef]
  20. Dey, B.; Ferdous, J.; Ahmed, R.; Hossain, J. Assessing deep convolutional neural network models and their comparative performance for automated medicinal plant identification from leaf images. Heliyon 2024, 10, e23655. [Google Scholar] [CrossRef]
  21. Ibrahim, Z.; Sabri, N.; Isa, D. Multi-maxpooling convolutional neural network for medicinal herb leaf recognition. In Proceedings of the 6th IIAE International Conference on Intelligent Systems and Image Processing, Kitakyushu, Japan, 27–30 September 2018; pp. 327–331. [Google Scholar]
  22. Kadiwal, S.M.; Hegde, V.; Shrivathsa, N.; Gowrishankar, S.; Srinivasa, A.; Veena, A. Deep Learning based Recognition of the Indian Medicinal Plant Species. In Proceedings of the 2022 4th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 21–22 July 2022; pp. 762–767. [Google Scholar]
  23. Malik, O.A.; Ismail, N.; Hussein, B.R.; Yahya, U. Automated real-time identification of medicinal plants species in natural environment using deep learning models—A case study from Borneo Region. Plants 2022, 11, 1952. [Google Scholar] [CrossRef] [PubMed]
  24. Van Hieu, N.; Hien, N.L.H. Recognition of plant species using deep convolutional feature extraction. Int. J. Emerg. Technol. 2020, 11, 904–910. [Google Scholar]
  25. Sharrab, Y.; Al-Fraihat, D.; Tarawneh, M.; Sharieh, A. Medicinal plants recognition using deep learning. In Proceedings of the 2023 International Conference on Multimedia Computing, Networking and Applications (MCNA), San Diego, CA, USA, 13–15 December 2023; pp. 116–122. [Google Scholar]
  26. Lee, S.H.; Chan, C.S.; Wilkin, P.; Remagnino, P. Deep-plant: Plant identification with convolutional neural networks. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 452–456. [Google Scholar]
  27. Roslan, N.A.M.; Diah, N.M.; Ibrahim, Z.; Munarko, Y.; Minarno, A.E. Automatic plant recognition using convolutional neural network on malaysian medicinal herbs: The value of data augmentation. Int. J. Adv. Intell. Inform. 2023, 9, 136–147. [Google Scholar] [CrossRef]
  28. Gopal, A.; Reddy, S.P.; Gayatri, V. Classification of selected medicinal plants leaf using image processing. In Proceedings of the 2012 International Conference on Machine Vision and Image Processing (MVIP), Taipei, Taiwan, 7–8 December 2012; pp. 5–8. [Google Scholar]
  29. Gao, L.; Lin, X. Fully automatic segmentation method for medicinal plant leaf images in complex background. Comput. Electron. Agric. 2019, 164, 104924. [Google Scholar] [CrossRef]
  30. Maulana, O.; Herdiyeni, Y. Combining image and text features for medicinal plants image retrieval. In Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Jakarta, Indonesia, 28–29 September 2013; pp. 273–277. [Google Scholar]
  31. Zin, I.A.M.; Ibrahim, Z.; Isa, D.; Aliman, S.; Sabri, N.; Mangshor, N.N.A. Herbal plant recognition using deep convolutional neural network. Bull. Electr. Eng. Inform. 2020, 9, 2198–2205. [Google Scholar] [CrossRef]
  32. Taslim, A.; Saon, S.; Mahamad, A.K.; Muladi, M.; Hidayat, W.N. Plant leaf identification system using convolutional neural network. Bull. Electr. Eng. Inform. 2021, 10, 3341–3352. [Google Scholar] [CrossRef]
  33. Mardiana, B.D.; Utomo, W.B.; Oktaviana, U.N.; Wicaksono, G.W.; Minarno, A.E. Herbal leaves classification based on leaf image using cnn architecture model vgg16. J. RESTI (Rekayasa Sist. Dan Teknol. Inf.) 2023, 7, 20–26. [Google Scholar] [CrossRef]
  34. Javanmardi, S.; Ashtiani, S.H.M. AI-driven deep learning framework for shelf life prediction of edible mushrooms. Postharvest Biol. Technol. 2025, 222, 113396. [Google Scholar] [CrossRef]
  35. Akter, R.; Hosen, M.I. CNN-based leaf image classification for Bangladeshi medicinal plant recognition. In Proceedings of the 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE), Dhaka, Bangladesh, 21–22 December 2020; pp. 1–6. [Google Scholar]
  36. Rakib, A.F.; Rahman, R.; Razi, A.A.; Hasan, A.T. A lightweight quantized CNN model for plant disease recognition. Arab. J. Sci. Eng. 2024, 49, 4097–4108. [Google Scholar] [CrossRef]
  37. Anubha Pearline, S.; Sathiesh Kumar, V.; Harini, S. A study on plant recognition using conventional image processing and deep learning approaches. J. Intell. Fuzzy Syst. 2019, 36, 1997–2004. [Google Scholar] [CrossRef]
  38. Borugadda, P.; Lakshmi, R.; Sahoo, S. Transfer Learning VGG16 Model for Classification of Tomato Plant Leaf Diseases: A Novel Approach for Multi-Level Dimensional Reduction. Pertanika J. Sci. Technol. 2023, 31, 813–841. [Google Scholar] [CrossRef]
  39. Paymode, A.S.; Malode, V.B. Transfer learning for multi-crop leaf disease image classification using convolutional neural network VGG. Artif. Intell. Agric. 2022, 6, 23–33. [Google Scholar] [CrossRef]
  40. Mousavi, S.; Farahani, G. A novel enhanced VGG16 model to tackle grapevine leaves diseases with automatic method. IEEE Access 2022, 10, 111564–111578. [Google Scholar] [CrossRef]
  41. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  42. Jin, X.; Xie, Y.; Wei, X.S.; Zhao, B.R.; Chen, Z.M.; Tan, X. Delving deep into spatial pooling for squeeze-and-excitation networks. Pattern Recognit. 2022, 121, 108159. [Google Scholar] [CrossRef]
  43. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  44. Bjorck, N.; Gomes, C.P.; Selman, B.; Weinberger, K.Q. Understanding batch normalization. Adv. Neural Inf. Process. Syst. 2018, 31, 7694–7705. [Google Scholar]
  45. Lagnaoui, S.; En-Naimani, Z.; Haddouch, K. The Effect of Normalization and Batch Normalization Layers in CNNs Models: Application to Plant Disease Classifications. In Proceedings of the International Conference On Big Data and Internet of Things, Virtual, 22–24 December 2022; pp. 250–262. [Google Scholar]
  46. Ramya, R.; Kumar, P. High-performance deep transfer learning model with batch normalization based on multiscale feature fusion for tomato plant disease identification and categorization. Environ. Res. Commun. 2023, 5, 125015. [Google Scholar] [CrossRef]
  47. Lee, S.H.; Chang, Y.L.; Chan, C.S.; Alexis, J.; Bonnet, P.; Goeau, H. Plant classification based on gated recurrent unit. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 9th International Conference of the CLEF Association, CLEF 2018, Avignon, France, 10–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 169–180. [Google Scholar]
  48. Rani, N.S.; Bhavya, K.; Pushpa, B.; Devadas, R.M. HerbSimNet: Deep Learning-Based Classification of Indian Medicinal Plants with High Inter-Class Similarities. Procedia Comput. Sci. 2025, 258, 765–774. [Google Scholar] [CrossRef]
  49. Pushpa, B.; Rani, N.S.; Chandrajith, M.; Manohar, N.; Nair, S.S.K. On the importance of integrating convolution features for Indian medicinal plant species classification using hierarchical machine learning approach. Ecol. Inform. 2024, 81, 102611. [Google Scholar] [CrossRef]
  50. Rana, T.; Sinha, S.; Roy, R. A Novel Cascade Classifier Framework for Open-World Medicinal Plant Classification. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 1–4 July 2024; pp. 902–907. [Google Scholar]
  51. Kumar, G.; Kumar, V.; Hrithik, A.K. Herbal plants leaf image classification using machine learning approach. In Intelligent Systems and Smart Infrastructure; CRC Press: Boca Raton, FL, USA, 2023; pp. 549–558. [Google Scholar]
  52. Amuthalingeswaran, C.; Sivakumar, M.; Renuga, P.; Alexpandi, S.; Elamathi, J.; Hari, S.S. Identification of medicinal plant’s and their usage by using deep learning. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; pp. 886–890. [Google Scholar]
  53. Stella, K.; Thamizhazhakan, K.; Ruthika, S.; Sriharini, M. Precise Recognition of Medicinal Plants using Xception Architecture. In Proceedings of the 2024 5th International Conference on Data Intelligence and Cognitive Informatics (ICDICI), Tirunelveli, India, 18–20 November 2024; pp. 1397–1402. [Google Scholar]
  54. Anami, B.S.; Nandyal, S.S.; Govardhan, A. A combined color, texture and edge features based approach for identification and classification of indian medicinal plants. Int. J. Comput. Appl. 2010, 6, 45–51. [Google Scholar] [CrossRef]
  55. Kyalkond, S.A.; Aithal, S.S.; Sanjay, V.M.; Kumar, P.S. A novel approach to classification of Ayurvedic medicinal plants using neural networks. Int. J. Eng. Res. Technol. (IJERT) 2022, 11, 6. [Google Scholar]
  56. Begue, A.; Kowlessur, V.; Singh, U.; Mahomoodally, F.; Pudaruth, S. Automatic recognition of medicinal plants using machine learning techniques. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 166–175. [Google Scholar] [CrossRef]
  57. Khan, S.; Siddiqui, F.; Ahad, M.A. Deep Learning for Classification and Efficacy of Medicinal Plants for Managing Respiratory Disorders. Procedia Comput. Sci. 2025, 258, 1640–1650. [Google Scholar] [CrossRef]
  58. Rao, R.U.; Lahari, M.S.; Sri, K.P.; Srujana, K.Y.; Yaswanth, D. Identification of medicinal plants using deep learning. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 306–322. [Google Scholar] [CrossRef]
  59. Widneh, M.A.; Workneh, A.T.; Alemu, A.A. Medicinal Plant Parts identification and classification using deep learning based on multi label categories. Ethiop. J. Sci. Sustain. Dev. 2021, 8, 96–108. [Google Scholar]
  60. Roy, R.; Roy, R.; Chatterjee, D. AI-Driven Recognition of Indian Medicinal Flora using Convolutional Neural Networks. In Computational Intelligence and Machine Learning; SCRS: Delhi, India, 2025. [Google Scholar]
  61. Kalaiselvi, P.; Esther, C.; Aburoobha, A.; Nishanth, J.; Gopika, S.; Kabilan, M. Deep Learning Technique for Medicinal Plant Leaf Identification: Using Fine-Tuning of Transfer Learning Model. In Proceedings of the 2024 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 8–9 October 2024; pp. 1–6. [Google Scholar]
  62. Sivappriya, K.; Kar, M.K. Classification of Indian Medicinal Plant Species Using Attention Module with Transfer Learning. In Proceedings of the 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Kottayam, India, 20–22 September 2024; pp. 1–8. [Google Scholar]
  63. Pushpa, B.; Rani, N.S. Ayur-PlantNet: An unbiased light weight deep convolutional neural network for Indian Ayurvedic plant species classification. J. Appl. Res. Med. Aromat. Plants 2023, 34, 100459. [Google Scholar] [CrossRef]
  64. Custodio, E.F. Classifying Philippine Medicinal Plants Based on Their Leaves Using Deep Learning. In Proceedings of the 2023 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 7–10 June 2023; pp. 0029–0035. [Google Scholar]
  65. Pallavi, K.; Hegde, D.; Parimelazhagan, R.; Sanjay, S.; Acharya, S.P. Enhanced Medicinal Plant Leaf Classification via Transfer Learning: A VGG16-Driven Approach for Precise Feature Extraction. In Proceedings of the 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkuru, India, 4–5 December 2024; pp. 1–6. [Google Scholar]
  66. Bandla, R.; Priyanka, J.S.; Dumpali, Y.C.; Hattaraki, S.K.M. Plant Identification and Analysis of Medicinal Properties Using Image Processing and CNN with MobileNetV2 and AlexNet. In Proceedings of the 2024 International Conference on Innovation and Novelty in Engineering and Technology (INNOVA), Vijayapura, India, 20–21 December 2024; Volume 1, pp. 1–5. [Google Scholar]
  67. Kan, H.; Jin, L.; Zhou, F. Classification of medicinal plant leaf image based on multi-feature extraction. Pattern Recognit. Image Anal. 2017, 27, 581–587. [Google Scholar] [CrossRef]
  68. Lasya, S.; S, J.; Pushpa, B. Optimized Plant Species Classification through MobileNet-Enhanced Hybrid Models. In Proceedings of the 2024 5th International Conference for Emerging Technology (INCET), Belgaum, India, 24–26 May 2024; pp. 1–6. [Google Scholar]
  69. Pushpa, B.; Jyothsna, S.; Lasya, S. HybNet: A hybrid deep models for medicinal plant species identification. MethodsX 2025, 14, 103126. [Google Scholar] [CrossRef]
  70. Janani, R.; Gopal, A. Identification of selected medicinal plant leaves using image features and ANN. In Proceedings of the 2013 International Conference on Advanced Electronic Systems (ICAES), Pilani, India, 21–23 September 2013; pp. 238–242. [Google Scholar]
  71. Girinath, S.; Neeraja, P.; Kumar, M.S.; Kalyani, S.; Mamatha, B.L.; GruhaLakshmi, N. Real-Time Identification of Medicinal Plants Using Deep Learning Techniques. In Proceedings of the 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies, Bhimdatta, Nepal, 18–20 February 2024; pp. 1–5. [Google Scholar]
  72. Thendral, R.; Mohamed Imthiyas, M.; Aswin, R. Enhanced Medicinal Plant Identification and Classification Using Vision Transformer Model. In Proceedings of the 2024 International Conference on Emerging Research in Computational Science (ICERCS), Coimbatore, India, 12–14 December 2024; pp. 1–7. [Google Scholar]
  73. Kavitha, K.; Sharma, P.; Gupta, S.; Lalitha, R. Medicinal Plant Species Detection using Deep Learning. In Proceedings of the 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 16–18 February 2022; pp. 1–6. [Google Scholar]
Figure 1. Representative leaf images from the dataset comprising 39 distinct aromatic and medicinal plant species collected from AMPRS, Odakkali, Kerala.
Figure 2. Architecture of the custom CNN model. The model comprises two convolutional layers with max-pooling, followed by dropout and dense layers for classification into 39 classes.
Figure 3. Transfer learning architecture using pre-trained VGG16 as the base model. The top layers comprise a GlobalAveragePooling2D layer, a dense layer with 256 neurons (ReLU), a dropout layer (rate = 0.5), and a softmax output layer with 39 neurons for multi-class classification.
Figure 4. Fine-tuned VGG16 architecture for medicinal and aromatic plant species classification. The final two convolutional blocks are unfrozen to enable domain-specific feature learning. Custom top layers include GlobalAveragePooling2D, Dense (256 neurons, ReLU), Dropout (rate = 0.5), and a softmax output layer with 39 classes.
Figure 5. The proposed model combining VGG16 and Squeeze-and-Excitation (SE) blocks. SE blocks are embedded after each convolutional block to provide channel-wise attention, enabling adaptive recalibration of feature responses.
Figure 6. Proposed hybrid model architecture combining VGG16, Batch Normalization, Dilated Convolutions, GRU layers, and Transformer Encoders. Each component addresses different aspects of feature learning, enabling robust representation of complex leaf structures.
Figure 7. Training and validation accuracy and loss curves for all evaluated models: (a) Custom CNN. (b) Transfer learning with VGG16 (without data augmentation). (c) VGG16 after data augmentation. (d) Fine-tuned VGG16. (e) VGG16 with SE blocks. (f) Proposed hybrid model.
Figure 8. Confusion matrices of various deep learning models evaluated for multi-class classification. The classes are labeled as follows: Asokam (0), Noni (1), Nagadandhi (2), Chuvannamantharam (3), Karinjotta (4), Panikkoorkka (5), Theeppala (6), Maramanjal (7), Kallal (8), Samudrappacha (9), Karuva (10), Thippali (11), Bengla Thippali (12), Alpam (13), Arayal (14), Modakam (15), Chengazhuneerkizhang (16), Thathiri (17), Mullatha (18), Ilanji (19), Karinochi (20), Dandhappala (21), Kumizh (22), Erukk (23), Ekanayakam (24), Kanjiram (25), Parijatham (26), Pambuvalenchedi (27), Karpooram (28), Aaduthinnapala (29), Athy (30), Attuvanchy (31), Chuvanna Kadalavanakk (32), Palakapayyani (33), Vathakkodi (34), Adalodakam (35), Kattumunthiri (36), Murikkootti chuvapp (37), and Chathuramulla (38).
Figure 9. Ablation study results of the proposed hybrid model. Each subfigure shows performance degradation when a specific component is excluded.
Figure 10. Layer-wise computational complexity of the proposed hybrid model measured in floating-point operations (FLOPs).
Figure 11. Layer-wise parameter count of the proposed hybrid model.
Table 1. Model hyperparameters used in the SE-VGG16 model.

Hyper-Parameter | Description
Base Architecture | VGG16 pre-trained on ImageNet
SE Block Insertion | After each of the five max-pooling layers
SE Reduction Ratio | 16
Activation Function | ReLU + Sigmoid
Optimizer | Adam
Learning Rate | 0.0001
Batch Size | 32
Dropout Rate | 0.5 in fully connected layers
Dense Layers | 2 × 4096 (ReLU) followed by Softmax
Loss Function | Categorical Cross-Entropy
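For readers implementing the SE-VGG16 variant, the following minimal sketch expresses the Table 1 attention block in Keras (assuming TensorFlow 2.x). The squeeze-excite wiring follows the standard SE formulation [41]; the exact integration code used in our experiments may differ in detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation: global pooling, bottleneck, channel gating."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)               # squeeze: B x C
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # per-channel gates
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                     # recalibrate features

# Illustrative use on one VGG16 stage; per Table 1, the block follows
# each of the five max-pooling layers.
base = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
recalibrated = se_block(base.get_layer("block1_pool").output)
```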
Table 2. Model hyperparameters for the proposed system combining VGG16, Batch Normalization, Dilated Convolutions, GRUs, and Transformer Encoder.

Hyper-Parameter | Description
Base Architecture | VGG-16 (5 convolution blocks)
Convolutional Kernel Size | 3 × 3
Activation Function | ReLU (after convolution and dense layers)
Batch Normalization | After each convolution layer
GRU Units | 256
Dilated Convolution Layer | 2 layers, 3 × 3 kernels, dilation rate = 2
Transformer Encoder | 1 layer
GRU Direction | Uni-directional (spatial row-wise modeling)
Feed-Forward Network | 2 layers
Multi-Head Attention | 8 heads
Normalization | Layer normalization in Transformer
Residual Connections | Applied in Transformer Encoder
Fully Connected Layers | 2 × Dense (4096 units, ReLU), 1 × Dense (39 units, softmax)
Dropout Rate | 0.3 (after attention and fully connected layers)
Learning Rate | 0.0001
Optimizer | Adam
Loss Function | Categorical cross-entropy
Batch Size | 32
Gradient Clipping | 1
Validation Split | 0.2
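The following minimal sketch, written against TensorFlow 2.x Keras, shows one plausible wiring of the Table 2 components: VGG16 features, dilated convolutions with Batch Normalization, a row-wise uni-directional GRU, and a single 8-head Transformer encoder layer with residual connections and layer normalization. It is an illustrative reconstruction from the listed hyperparameters, not the authors’ exact training code; the output layer is sized to the 39 dataset classes.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 3))
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
x = base(inputs)                                   # B x 7 x 7 x 512 feature maps

# Two dilated 3 x 3 convolutions (rate = 2) widen the receptive field;
# Batch Normalization stabilizes training after each convolution.
for _ in range(2):
    x = layers.Conv2D(512, 3, padding="same", dilation_rate=2,
                      activation="relu")(x)
    x = layers.BatchNormalization()(x)

# Uni-directional GRU over feature-map rows (spatial row-wise modeling).
seq = layers.Reshape((7, 7 * 512))(x)              # rows become a 7-step sequence
seq = layers.GRU(256, return_sequences=True)(seq)

# One Transformer encoder layer: 8-head self-attention plus a 2-layer
# feed-forward network, each with a residual connection and layer norm.
attn = layers.MultiHeadAttention(num_heads=8, key_dim=32)(seq, seq)
attn = layers.Dropout(0.3)(attn)
seq = layers.LayerNormalization()(layers.Add()([seq, attn]))
ffn = layers.Dense(512, activation="relu")(seq)
ffn = layers.Dense(256)(ffn)
seq = layers.LayerNormalization()(layers.Add()([seq, ffn]))

# Classifier head: two 4096-unit dense layers, dropout, and softmax output.
x = layers.GlobalAveragePooling1D()(seq)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(39, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```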
Table 3. Accuracy and loss comparison chart for various deep learning models.

Model | Training Accuracy (%) | Validation Accuracy (%) | Training Loss | Validation Loss
Custom CNN | 43.14 | 52.53 | 1.76 | 1.71
VGG-16 before data augmentation | 78.93 | 89.87 | 0.76 | 0.74
VGG-16 after data augmentation | 91.14 | 92.10 | 0.44 | 0.43
VGG-16 after fine-tuning | 92.90 | 93.21 | 0.24 | 0.19
VGG-16 and SE | 93.80 | 94.30 | 0.14 | 0.12
Proposed Hybrid Model | 95.60 | 96.70 | 0.13 | 0.11
Table 4. Comparison of baseline deep learning models with the proposed model based on classification metrics.

Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Custom CNN | 44.94 | 45.79 | 45.92 | 41.00
VGG-19 | 59.49 | 64.47 | 59.49 | 58.20
VGG-16 (before data augmentation) | 71.52 | 75.44 | 71.52 | 69.61
Xception | 85.44 | 90.65 | 85.44 | 84.84
Inception v3 | 87.97 | 88.81 | 87.97 | 87.31
VGG-16 (after data augmentation) | 89.24 | 92.03 | 89.24 | 88.94
VGG-16 (after fine-tuning) | 90.51 | 92.40 | 90.51 | 90.19
MobileNetV2 | 93.67 | 94.99 | 93.67 | 93.65
VGG-16 + SE block | 94.94 | 96.01 | 94.94 | 95.00
Proposed Model (VGG-16 + GRU + Transformer) | 95.24 | 96.13 | 95.24 | 95.05
Table 5. Ablation study of the proposed hybrid model (✓ = component included, ✗ = excluded; BN = Batch Normalization, DC = Dilated Convolution, Trans. = Transformer).

Model Variant | BN | DC | GRU | Trans. | Val. Accuracy (%) | Observations
No Batch Normalization | ✗ | ✓ | ✓ | ✓ | 92.19 | Training slowed and generalization performance diminished
No Dilated Convolution | ✓ | ✗ | ✓ | ✓ | 93.31 | Limited receptive field; fine details are lost
No GRU | ✓ | ✓ | ✗ | ✓ | 86.97 | Poor learning of spatial relationships; leaf-margin modeling fails
No Transformer | ✓ | ✓ | ✓ | ✗ | 86.30 | Fails to model spatial interactions and fine details
Proposed model (all components) | ✓ | ✓ | ✓ | ✓ | 96.70 | With all components combined, overall performance is best
Table 6. Comparison of state-of-the-art methods for medicinal plant classification.

Ref. | Dataset | No. of Classes | Model | Architecture Summary | Val. Accuracy (%)
[48] | Self-built | 12 | HerbSimNet | Custom shallow CNN | 60.00
[35] | Self-built | 10 | CNN | Basic convolutional layers | 71.30
[49] | RTP40 | 40 | Hierarchical Classifier | Multi-stage classification model | 75.46
[50] | Self-built | 10 | VGG-16 + Cascade | Deep CNN + cascaded classifier | 81.66
[51] | Self-built | 25 | MLP | Multi-layer perceptron | 82.51
[52] | Self-built | 4 | DNN | Fully connected dense network | 85.00
[53] | Self-built | 30 | Xception | Depthwise separable CNN | 88.00
[54] | Self-built | 3 | Texture + SVM | Texture features + SVM | 90.00
[55] | Self-built | 100 | CNN | Deep CNN on large class count | 90.00
[56] | Self-built | 24 | RF Classifier | Texture + Random Forest | 90.10
[57] | DIMPSAR | 80 | EfficientNet B4 | EfficientNet scaled CNN | 91.37
[58] | Self-built | 50 | CNN | Conventional deep CNN | 91.80
[59] | Self-built | 5 | MobileNet | Lightweight CNN model | 92.00
[60] | Self-built | 98 | Xception | Residual separable convolution | 92.00
[61] | PlantVillage | 38 | ResNet-18 | Deep residual learning | 92.00
[62] | DIMPSAR | 40 | Attn-CNN | CNN with attention layers | 92.10
[63] | Self-built | 40 | Ayur-PlantNet | Hybrid CNN architecture | 92.27
[64] | Self-built | 40 | VGG19 | Deep CNN via transfer learning | 92.67
[65] | Self-built | 35 | VGG16 | Fine-tuned deep CNN | 93.00
[66] | Indian Med. Plants | 40 | MobileNetV2 | Compact CNN for mobile vision | 93.00
[67] | Self-built | 12 | SF + TF + SVM | Shape + Texture + SVM | 93.30
[68] | Self-built | 13 | AlexNet + MobileNet | Optimized hybrid CNN | 93.86
[69] | Self-built | 13 | MobileNetV2 + SE | MobileNet with SE block | 94.24
[70] | Self-built | 6 | ANN | Feedforward ANN | 94.40
[71] | DFCU 2020 | 20 | MobileNetV3 | Optimized lightweight CNN | 95.00
[72] | Self-built | 3 | Vision Transformer | Pure attention mechanism | 95.00
[73] | Medicinal Leaf | 30 | Inception v3 | Inception-based deep CNN | 95.16
Ours | Self-built | 39 | VGG-16 + GRU + Transformer | CNN + temporal + attention fusion | 96.70
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
