Article

Identification of Botanical Origin from Pollen Grains in Honey Using Computer Vision-Based Techniques

1 School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi 10000, Vietnam
2 Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi 10000, Vietnam
3 Research Center for Tropical Bees and Beekeeping, Vietnam National University of Agriculture, Hanoi 10000, Vietnam
* Author to whom correspondence should be addressed.
AgriEngineering 2025, 7(9), 282; https://doi.org/10.3390/agriengineering7090282
Submission received: 3 June 2025 / Revised: 16 July 2025 / Accepted: 23 July 2025 / Published: 1 September 2025

Abstract

Identifying the botanical origin of honey is essential for ensuring its quality, preventing adulteration, and protecting consumers. Traditional techniques, such as melissopalynology, physicochemical analysis, and PCR, are often labor-intensive, time-consuming, or limited to the detection of known species, while advanced DNA sequencing remains prohibitively costly. In this study, we aim to develop a deep learning-based approach for identifying pollen grains extracted from honey and captured through microscopic imaging. To achieve this, we first constructed a dataset named VNUA-Pollen52, which consists of microscopic images of pollen grains collected from flowers of plant species cultivated in the surveyed area in Hanoi, Vietnam. Second, we evaluated the classification performance of advanced deep learning models, including MobileNet, YOLOv11, and Vision Transformer, on pollen grain images. To improve the performance of these models, we proposed data augmentation and hybrid fusion strategies that increase the identification accuracy for pollen grains extracted from honey. Third, we developed an online platform to support experts in identifying these pollen grains and to gather expert consensus, ensuring accurate determination of the plant species and providing a basis for evaluating the proposed identification strategy. Experimental results on 93 images of pollen grains extracted from honey samples demonstrated the effectiveness of the proposed hybrid fusion strategy, achieving 70.21% accuracy at rank 1 and 92.47% at rank 5. This study demonstrates the capability of recent advances in computer vision to identify pollen grains from their microscopic images, thereby opening up opportunities for the development of automated systems that support plant traceability and quality control of honey.

Graphical Abstract

1. Introduction

Authenticating the origin of honey from pollen grains can help identify the geographical origin of honey products, verify the botanical sources of honey, and provide information on the environmental and ecological interactions of bees. To authenticate honey, an analysis of the pollen grains present in honey, known as melissopalynology, can be performed. This process identifies the floral source and geographical origin of honey based on the types of pollen present. In the past, pollen origin was usually analyzed using chemical treatment methods. More recently, DNA has been used for the identification of pollen taxa, as sequencing technologies have improved in both handling and affordability. However, chemical-based approaches require specialized equipment and expertise, and the DNA technique involves complicated procedures for extracting DNA sequences and is sensitive to factors such as DNA purity, amplicon purity, read length, quality score, and the completeness of reference databases [1]. In addition, these conventional techniques for authenticating honey suffer from the high cost of testing and the potential for incomplete databases, which may limit the ability to accurately identify all sources of honey. In this study, artificial intelligence (AI) and biological approaches are combined to identify the botanical origin of honey. We evaluated the feasibility of the proposed technique in the context of a honey farm in an urban area of Hanoi City, Vietnam. The proposed framework requires neither complicated procedures nor the high costs of conventional techniques. It presents a potential solution for developing an online honey origin tracing service.
In the context of rapidly advancing artificial intelligence (AI) technology, and thanks to state-of-the-art deep convolutional neural networks (CNNs), the automatic classification of pollen from images has seen significant improvements in both accuracy and efficiency [2]. For example, in [3], the authors report the performance of CNNs in a challenge on the automatic classification of pollen grain images. According to [3], the best performance on the largest dataset of microscope pollen grain images achieved an accuracy of 98.35%. CNN-based pollen classification from images is one of the prominent applications of AI, particularly useful in plant traceability and honey authentication. For tracing the botanical origin of honey, pollen images extracted from honey are queried against a reference dataset of pollen grains, and the plant species are identified using pollen grain classifiers. Therefore, the performance of a pollen grain classifier plays an important role in ensuring the success of this approach. Pollen grain samples prepared in two different domains and/or contexts pose a data distribution shift problem. The distribution drift arises from various sources, including lighting conditions, sensor mounting, and different viewpoints. These factors pose a barrier to the deployment of robust and automated systems on a large scale. For example, Figure 1 shows an example of the distribution/data shift problem for pollen classification [4]. In the upper row, the appearances of the queried pollen grains of a species, Eclipta alba, vary due to the conditions under which the pollen images were extracted from honey. The lower row displays images of the same species from the training dataset, which was constructed using carefully selected pollen grain images.
Although CNN-based techniques show promising results for pollen identification on microscope slides, applying this technique to trace the botanical origin of honey requires handling the two following problems: (1) how to solve the data distribution drift problem when reference and query images come from different contexts, and (2) how to verify the results of the classifier when pollen grains are extracted from honey. Therefore, this study builds a framework to evaluate this possibility by applying the technique to tracing the botanical origin of honey. The proposed solution matches the images of pollen grains extracted from flowers with those extracted from honey in a honeybee farm area. In more detail, we collect a new pollen grain dataset in an urban area in Vietnam. The collected dataset contains 52 species found in the area. We then extract pollen grains from honey produced in the same area. To handle the first issue, the proposed method prevents performance degradation using a late fusion and augmentation schema. In the experimental results, the best performance on pollen grain samples extracted from honey is achieved by the fusion method. The second issue is addressed using a consensus agreement of botanical experts. We implement the proposed framework on a website that makes it publicly available (http://mica.edu.vn:50208/, accessed on 12 November 2024). In summary, our contributions are as follows.
  • We first introduce a new pollen grain dataset consisting of pollen grain images of 52 plant species in an urban area of northern Vietnam.
  • We present the proposed data augmentation and fusion schema, combining the results of YOLO11 and Vision Transformer, which helps to handle the data distribution shift problem of pollen grain images.
  • The proposed classification allows us to trace the botanical origin of pollen grains extracted from honey samples. The proposed method achieves an accuracy of 70.21% at rank 1 and 92.47% at rank 5. The ground-truth dataset is constructed on the basis of the consensus of five botanical experts.
The remainder of this paper is organized as follows. Section 2 reviews related works in the literature. Section 3 describes the constructed pollen grain dataset and the proposed method in detail. Section 4 reports the performance of the proposed classifier, as well as the results that identify the botanical origin of honey using only microscopy images of pollen grains. The discussions concerning the proposed method are presented in Section 5. Finally, Section 6 concludes the paper and outlines future directions.

2. Related Work

Identification of the botanical origin of honey ensures its purity and quality, preventing adulteration and protecting consumers from fraud. Related activities include the evaluation of plant–pollinator networks and the reconstruction of ancient plant communities. Traditionally, the pollen found within honey is examined using light microscopy. The work in [5] successfully used real-time PCR to identify different species of plants frequently found within Corsican honey, but this method requires a priori knowledge of the species likely to be found. The authors in [6] verified the authenticity of Hungarian honey using physicochemical analysis, near-infrared spectroscopy, and melissopalynology. They used 87 samples of different botanical origins and achieved accuracies of 90% with the pollen method and higher than 55% with the NIR method. Recently, nanopore-based DNA sequencing has been deployed in [1], where pollen metabarcoding is carried out mainly with next-generation sequencing methods that generate short sequence reads. A recent review [7] provides an overview of the challenges of honey authenticity and the related analytical methods. These techniques identify pollen in honey by microscopic examination and require pollen reference collections. To the best of our knowledge, computer vision-based techniques have not yet been applied to the identification of the botanical origin of honey.
Developing methods for pollen grain classification using image processing has received a great deal of attention over several decades. In palynology, biological experts rely primarily on morphological characteristics to distinguish pollen grains. Therefore, shape, polarity, and symmetry are extracted from images and used to develop automatic pollen classification models. In [8], seven geometric features were extracted using an automated palynology system called Classifynder to investigate variations in pollen morphology between the two most important species of New Zealand honey. In [1], the authors applied machine learning and image processing techniques to segment and classify pollen from environmental samples. The process includes image preparation, segmentation, and extraction of features such as shape, size, and surface structure, followed by the application of machine learning models for classification. However, pollen grains of different species often share similar morphological features, making traditional feature-based methods less effective.
Recently, pollen grain classification has been a focus of several state-of-the-art convolutional neural networks (CNNs). In [9], the authors constructed a dataset with 46 different pollen grain classes. They utilized a pre-trained AlexNet to classify the constructed dataset. Using data augmentation and cross-validation techniques, the result achieved a precision of 98%. Other work deployed different CNN models on POLLEN73S, a common dataset composed of 2523 images of 73 pollen types in Brazil [10]. For example, in [11], the authors evaluated eight popular architectures (e.g., InceptionV3, VGG16, ResNet50, NASNet, Xception) on the POLLEN73S dataset. The best results show that DenseNet201 and ResNet50 outperform the other CNNs tested, achieving accuracy rates of 97.217% and 94.257%, respectively. In [12], the authors explored transfer learning and data augmentation on the largest pollen dataset, composed of 134 classes. They utilized pre-trained features and deployed two transfer learning approaches: feature extraction and fine-tuning. Their experiments on the introduced 134-class pollen dataset suggest that fine-tuning a pre-trained deep CNN is effective for pollen classification, achieving an accuracy of up to 96.24%. A study by Tahir Mahmood et al. [13] introduced an attention-guided neural network to classify pollen. This model focuses on the key characteristics of pollen, thereby increasing classification accuracy. As these results show, CNNs have been used successfully to classify pollen grain images of different species accurately. They have opened a crucial research area in biology and agriculture, with applications in plant traceability and biodiversity conservation. In particular, in palynology, they offer promising solutions for identifying the botanical origin of pollen grains extracted from honey.
The presence of distribution shifts is a significant challenge for many machine learning models. For example, as shown in Figure 1, the images of the pollen grains were collected from two different domains, one extracted from flowers and the other extracted from honey. Such images therefore present distribution shifts arising from lighting conditions, sensor mounting, and sample preparation conditions. Despite ongoing research efforts, there is no generic algorithm to handle distribution shift [14]. To mitigate the distribution shift problem, common approaches include pre-training a model on a large source dataset, data augmentation techniques, designing invariant architectures, and data adaptation methods. For natural datasets such as pollen grain microscopy images, a recent work [15] reports that geometric data augmentation techniques mitigate distribution shifts in pollen classification. Adapting from that report, in this study, we deploy image processing techniques to handle this issue. Unlike public datasets covering a wide range of contexts and environmental conditions, such as ImageNet, the collected pollen datasets come from a controlled environment, which shapes the nature of the distribution shift. The predominant shifts arise primarily from biodiversity and the varying species of input pollen grains. This makes pollen classification a distinctive setting for studying the effect of data augmentation in tackling natural distribution shifts.

3. Materials and Methods

The pipeline proposed in this study is illustrated in Figure 2. The main objective is to identify the botanical origin of honey produced on a honeybee farm. To this end, the proposed pipeline consists of four main parts, as follows:
  • Collecting a pollen grain dataset around the honeybee farm. This dataset is named VNUA-pollen52. Panel (a) in Figure 2 illustrates the steps of dataset collection. More details are described in Section 3.1.
  • Extracting pollen grains from honey, as shown in panel (c) of Figure 2. The detailed procedure for extracting pollen grains from honey is described in Section 3.2.
  • Developing the pollen grain classification and verification method, as shown in panel (b) and panel (d), respectively, in Figure 2. The VNUA-pollen52 dataset can be considered a reference image set of pollen grains from different plant species. To trace the botanical origin of honey, the extracted pollen grains from honey will be matched to the reference images. To address the data distribution drift problem, the schema for improving pollen grain classification is presented in Section 3.3. The procedure for identifying plant species of pollen grains extracted from honey is described in Section 3.4 to verify the results of the proposed method.

3.1. VNUA-Pollen52 Pollen Grains Dataset

In this study, a new dataset is constructed that contains pollen grain images of various plant species, named VNUA-Pollen52. Pollen grain samples were obtained from flowering species collected on the campus of the Vietnam National University of Agriculture (VNUA), an urban area in Hanoi City, Vietnam. The location of this area on the map is shown in Figure 3.
To ensure representativeness and minimize bias in the dataset, we randomly selected at least ten trees of each plant species, each planted at a different location throughout the study area. For each tree with multiple flowers, we randomly sampled one flower from any position—near the base, along the trunk, or near the top. The selected flowers were freshly bloomed, free of contamination, and not wet from dew or rainwater. Each flower sample collected was individually packaged with care and clearly labeled with information on the plant species, location, and collection time. The samples were then safely transported to the laboratory for pollen extraction. The extracted pollen grains were then sprinkled onto a microscope slide, which had been pretreated with a drop of distilled water. A microscope coverslip was gently placed on top to prevent the formation of air bubbles. The prepared sample was then examined using a Nikon YS100 microscope (Nikon Corporation, Tokyo, Japan). A 40× objective lens was used, and the focus was carefully adjusted to ensure that pollen characteristics could be identified. A microscope camera mounted on the microscope and connected to a computer running Future WinJoe software (V1.6.5.1129) was used to capture images of the pollen grains. For each species, we captured images of multiple pollen grains from different viewing angles to ensure that the collected dataset accurately represents the key characteristics of each plant species. Figure 2a illustrates the entire process of collecting and capturing microscopy images of pollen grains extracted from flowers.
In total, a dataset consisting of 11,776 pollen grain images was constructed from 52 different plant species, as presented in Table 1 and Figure 4. Among these, 49 species have more than 100 pollen images, while 3 species have fewer than 100 images due to limitations in the number of collected samples. This dataset includes many common pollen types, such as Lagerstroemia speciosa, Citrus maxima, Solanum lycopersicum, Brassica juncea, Chrysanthemum coronarium, Areca catechu, and many others. Each type of pollen is clearly labeled for classification and cross-referencing in the study process. Building this large and diverse database plays an essential role in training the AI model, enhancing the model’s ability to accurately identify and classify different pollen species. The images are carefully organized and stored to ensure accuracy and reproducibility for future research.
The initially collected dataset contains pollen grain images where the grain is not always centered, and may also include unwanted elements such as excessive background. To address this issue, the data was pre-processed, including cropping and noise reduction, to ensure that only the pollen grain was centered in each image while removing extraneous elements such as the background. This cropping helps standardize the images and makes it easier for the classification model to learn the essential features of the pollen grains. Cropping was performed using the Hough circle detection algorithm, which accurately detects and encloses pollen grains. This process involved three iterative detection rounds with varying circle sizes to selectively retain pollen grains and eliminate irrelevant parts. The VNUA-Pollen52 dataset is made public at https://doi.org/10.5281/zenodo.15377942. In addition, we have provided detailed statistics of our collected dataset along with existing datasets in Table 2.
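The following is a minimal sketch of this cropping step using OpenCV's Hough circle transform; the radius ranges, Hough parameters, and crop margin are illustrative assumptions rather than the exact values used in our pipeline.

```python
import cv2
import numpy as np

def crop_pollen_grains(image_path, radius_ranges=((80, 160), (50, 100), (30, 60))):
    """Detect pollen grains with the Hough circle transform and return square crops.

    Each (min_radius, max_radius) pair corresponds to one detection round; grains
    found in earlier rounds are masked out so later rounds do not re-detect them.
    """
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)  # suppress background noise before detection

    crops = []
    for min_r, max_r in radius_ranges:  # three iterative rounds with varying circle sizes
        circles = cv2.HoughCircles(
            gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=2 * min_r,
            param1=100, param2=40, minRadius=min_r, maxRadius=max_r)
        if circles is None:
            continue
        for x, y, r in np.round(circles[0]).astype(int):
            # enclose the grain with a small margin and keep it centered in the crop
            m = int(1.2 * r)
            x0, y0 = max(x - m, 0), max(y - m, 0)
            crop = image[y0:y + m, x0:x + m]
            if crop.size:
                crops.append(crop)
            # paint over the detected grain so later rounds skip it
            cv2.circle(gray, (x, y), max_r, 255, thickness=-1)
    return crops
```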

3.2. Dataset of Pollen Grains Extracted from Honey for Botanical Origin Identification

To facilitate the identification of the botanical origin of honey, we implemented the following steps to construct an image dataset of pollen grains extracted from honey.
  • Calibrate the balance with a preweighed pipette placed on it.
  • Use the pipette to extract exactly 10 g of honey from the sample.
  • Dilute the honey sample with 20 mL of distilled water.
  • Add 4 to 5 drops of 75% ethanol to the mixture and centrifuge it for 10 min.
  • After centrifugation, let the sample rest for 8 min to allow a deposit to form at the bottom of the pipette.
  • Once the sediment becomes visible, the supernatant is discarded, and the deposit is dissolved by mixing it with ethanol and distilled water.
  • Transfer the resulting solution onto a microscope slide for observation.
  • A digital camera is mounted on the microscope and connected to a computer to capture images of pollen grains.
  • Finally, adjust the focus of the microscope until a sharp and well-defined pollen image is visible on the computer screen.
As a result, we collected 139 images of pollen grains present in the honey sample. These images were sent to five plant experts from the Faculty of Agronomy of the Vietnam National University of Agriculture for the identification of plant species. Figure 5 illustrates the result of the extraction of honey pollen and an image of a recovered pollen grain used to identify the botanical origin.

3.3. The Proposed Pollen Grain Classification Method

This research presents a pollen classification system based on the comparative analysis of three state-of-the-art deep neural networks: YOLO11, Vision Transformer (ViT), and MobileNetV3. By thoroughly combining these models, this study aims to identify the most effective fusion schema for accurate and efficient pollen classification.

3.3.1. Architecture of Base Models

YOLO11 [18] represents a significant advancement in the YOLO family, designed with enhanced architectural features to improve accuracy and adaptability across various computer vision tasks. A crucial innovation is the decoupled detection head, which isolates the classification and localization branches to reduce task interference and accelerate convergence. Furthermore, YOLO11 supports both anchor-based and anchor-free paradigms, which increases flexibility and reduces dependency on manually set priors. Training is optimized using approaches such as cosine learning rate scheduling, label smoothing, and CIoU-based loss functions. In this work, YOLO11 is used to categorize individual pollen grain images after removing the detection head and replacing it with a classification layer.
Vision Transformer (ViT) [19] introduces a novel approach to image classification by leveraging the transformer architecture, originally developed for natural language processing tasks. Instead of using convolutional layers to extract local features, ViT splits the input image of size H × W × C into a sequence of non-overlapping patches, typically of size P × P. Each patch is flattened into a vector and passed through a trainable linear projection to obtain patch embeddings of a fixed dimension. These embeddings are then prepended with a learnable class token and combined with positional encodings to retain spatial information. The resulting sequence is processed through several transformer encoder layers, each with a multi-head self-attention (MSA) mechanism and a feed-forward network (FFN), surrounded by residual connections and layer normalization. The model's self-attention mechanism captures long-range dependencies and global contextual relationships across all image patches, allowing ViT to learn structural and semantic representations more effectively than traditional convolutional neural networks. Finally, the output corresponding to the class token is processed by a classification head, often a multilayer perceptron (MLP), to produce the final class prediction [19,20].
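To make the patch-embedding step concrete, the following PyTorch sketch (a toy illustration, not our training code) builds the token sequence that the transformer encoder consumes; the dimensions follow the standard ViT-Base/16 configuration assumed here.

```python
import torch
import torch.nn as nn

# Toy front end of a ViT: split the image into P x P patches, project them
# linearly, prepend a learnable class token, and add positional embeddings.
B, C, H, W, P, D = 2, 3, 224, 224, 16, 768             # batch, channels, image size, patch size, embed dim
x = torch.randn(B, C, H, W)

patch_proj = nn.Conv2d(C, D, kernel_size=P, stride=P)  # equivalent to a linear patch projection
patches = patch_proj(x).flatten(2).transpose(1, 2)     # (B, N, D) with N = (H/P) * (W/P) = 196

cls_token = nn.Parameter(torch.zeros(1, 1, D))
pos_embed = nn.Parameter(torch.zeros(1, patches.shape[1] + 1, D))
tokens = torch.cat([cls_token.expand(B, -1, -1), patches], dim=1) + pos_embed

# "tokens" would then pass through the transformer encoder; the output at the
# class token position feeds an MLP head that predicts one of the 52 species.
print(tokens.shape)  # torch.Size([2, 197, 768])
```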
MobileNetV3 [21] is a convolutional neural network architecture specifically designed for efficient computation on mobile and embedded vision systems. A key innovation of the MobileNet family is the use of depthwise separable convolutions, which decompose standard convolutions into a depthwise convolution followed by a pointwise (1 × 1) convolution. This factorization significantly reduces the number of parameters and floating-point operations while maintaining competitive accuracy for image classification tasks. To further optimize for resource-constrained environments, MobileNet introduces two hyperparameters: the width multiplier, which controls the number of channels in each layer, and the resolution multiplier, which scales the input image resolution. These parameters enable a trade-off between latency, model size, and accuracy, allowing deployment across a wide range of hardware capabilities. Due to its compact design and low computational footprint, MobileNetV3 is particularly well-suited for real-time applications on mobile devices and embedded platforms [22].
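As a brief illustration of this factorization, the PyTorch sketch below contrasts a standard convolution with its depthwise-plus-pointwise counterpart; the channel sizes are arbitrary examples, not layers taken from MobileNetV3.

```python
import torch.nn as nn

in_ch, out_ch, k = 32, 64, 3

# Standard convolution: one dense kernel mixing space and channels together.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1)

# Depthwise separable convolution: a per-channel spatial filter (groups=in_ch)
# followed by a 1x1 pointwise convolution that mixes channels.
depthwise_separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=1, groups=in_ch),
    nn.Conv2d(in_ch, out_ch, kernel_size=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(depthwise_separable))  # 18496 vs. 2432 parameters
```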

3.3.2. Augmentation Strategy

After data preprocessing as described in Section 3.1, the distribution of images across classes remained imbalanced, with a few species represented by only around 50 images, while others had more than 500. The imbalance in data distribution between classes presents challenges for model training, as the model can easily overfit classes with more data and struggle to learn effectively from classes with less data. To address this issue, we applied data augmentation techniques to ensure that each class contains at least 200 images. This approach helps to mitigate class imbalance and enhance the diversity of the training dataset, thereby improving the model’s generalization ability and overall accuracy.
To enhance the robustness of the model and improve its ability to generalize across different imaging conditions, we adopt a comprehensive augmentation pipeline using the Albumentations library. Each input image was subjected to a series of transformations that simulate potential real-world variations in microscopy imaging. These included geometric transformations such as affine warping (scale, translation, rotation, and shear), elastic deformation, and grid distortion to mimic possible variations in sample positioning and lens distortions. Moreover, to address domain shifts due to lighting and color differences, we applied color jittering (brightness, contrast, saturation, and hue adjustments), gamma correction, and CLAHE (Contrast Limited Adaptive Histogram Equalization). Furthermore, several forms of image degradation were simulated via Gaussian, motion, and median blurs, as well as Gaussian noise and coarse dropout. Finally, normalization and horizontal flipping were applied. Each original image was augmented three times, significantly increasing data diversity and domain robustness during training. This configuration was selected after empirical testing with multiple augmentation strategies and consistently yielded the highest accuracy on cross-domain validation sets.
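A minimal sketch of such a pipeline with the Albumentations library is shown below; the probabilities and parameter ranges are illustrative assumptions, not the exact configuration used in our experiments.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Illustrative pipeline covering the transformation families described above.
augment = A.Compose([
    # geometric variations: sample positioning and lens distortions
    A.Affine(scale=(0.9, 1.1), translate_percent=(0.0, 0.05),
             rotate=(-30, 30), shear=(-10, 10), p=0.7),
    A.ElasticTransform(p=0.2),
    A.GridDistortion(p=0.2),
    # lighting and color domain shifts
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05, p=0.5),
    A.RandomGamma(p=0.3),
    A.CLAHE(p=0.3),
    # image degradation
    A.OneOf([A.GaussianBlur(), A.MotionBlur(), A.MedianBlur(blur_limit=5)], p=0.3),
    A.GaussNoise(p=0.3),
    A.CoarseDropout(p=0.2),
    # normalization and flipping
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
])

# Each original image is augmented three times to enlarge the training set, e.g.:
# augmented_images = [augment(image=img)["image"] for _ in range(3)]
```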

3.3.3. Fusion Schema

Due to the data distribution drift problem, classifier performance degrades significantly when testing/query images are collected from contexts different from the training data. In this study, a key component is a late fusion schema, which combines predictions from different models to improve classification accuracy. Let $\mathrm{score}_{C_k}(I_i, S_j)$ denote the confidence score assigned by classifier $C_k$ to the prediction that a query image $I_i$ belongs to pollen species $S_j$; these scores are used to compute the fused confidence scores. The fused confidence scores are calculated using fusion methods based on the max, sum, product, and hybrid rules, respectively denoted as $\mathrm{score}_{max}(I_i, S_j)$, $\mathrm{score}_{sum}(I_i, S_j)$, $\mathrm{score}_{prod}(I_i, S_j)$, and $\mathrm{score}_{hybrid}(I_i, S_j)$, and described in the following. Here, $N$ denotes the number of base classifiers used.
  • Max Rule: This rule selects the highest confidence score among the individual confidence scores provided by the N base classifiers, based on the intuition that the most confident prediction is the most representative of the species.
    $\mathrm{score}_{max}(I_i, S_j) = \max_{k=1,\ldots,N} \mathrm{score}_{C_k}(I_i, S_j)$
  • Sum Rule: By summing the confidence scores from multiple classifiers, this rule takes advantage of their complementary strengths to improve the recognition accuracy for each species.
    $\mathrm{score}_{sum}(I_i, S_j) = \sum_{k=1}^{N} \mathrm{score}_{C_k}(I_i, S_j)$
  • Product Rule: Given the assumption of statistical independence among the classifiers, this rule computes the fused confidence score for each species by multiplying the corresponding confidence scores of all the classifiers.
    $\mathrm{score}_{prod}(I_i, S_j) = \prod_{k=1}^{N} \mathrm{score}_{C_k}(I_i, S_j)$
  • Hybrid fusion approach: This fusion utilizes the probability that a query image assigned to a species by a classifier is a true sample of that species. This probability is combined with the confidence score of each classifier and is learned from a set of clean query images. To avoid the data shift problem, this clean set is constructed from images captured under the same conditions as the training images (used to train the classifier $C_k$). Let $\mathrm{prob}_{C_k}(S_j)$ denote the probability that a query image is a true sample of species $S_j$ according to classifier $C_k$.
    $\mathrm{score}_{hybrid}(I_i, S_j) = \sum_{k=1}^{N} \mathrm{prob}_{C_k}(S_j) \cdot \mathrm{score}_{C_k}(I_i, S_j)$
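The four rules can be implemented directly from the per-classifier score matrices, as in the following NumPy sketch; the array shapes and the way the priors $\mathrm{prob}_{C_k}(S_j)$ are estimated from a clean query set are assumptions for illustration.

```python
import numpy as np

def fuse_scores(scores, priors=None, rule="hybrid"):
    """Fuse per-classifier confidence scores for one query image.

    scores: array of shape (N, S) -- score of each of N classifiers for S species.
    priors: array of shape (N, S) -- prob_{C_k}(S_j), estimated on a clean query set
            captured under the same conditions as the training images (hybrid rule only).
    Returns an array of shape (S,) with the fused score per species.
    """
    if rule == "max":
        return scores.max(axis=0)
    if rule == "sum":
        return scores.sum(axis=0)
    if rule == "prod":
        return scores.prod(axis=0)
    if rule == "hybrid":
        return (priors * scores).sum(axis=0)
    raise ValueError(f"unknown rule: {rule}")

# Example: 3 classifiers (YOLO11, ViT, MobileNetV3) over 52 species.
rng = np.random.default_rng(0)
scores = rng.random((3, 52))
priors = rng.random((3, 52))
top5 = np.argsort(fuse_scores(scores, priors, "hybrid"))[::-1][:5]  # ranked species indices
```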

3.4. Constructing a Subset of Data for Botanical Origin Identification

To build a reliable ground truth for our pollen classification dataset, we developed a custom web-based annotation tool designed to support expert labeling, as shown in Figure 6. This platform allows botanical experts to annotate pollen images by reviewing model-generated predictions. For each unlabeled image, predictions from three different classifiers—YOLO11, Vision Transformer (ViT), and MobileNetV3—are presented, showing their top five candidate classes ranked by confidence scores. A panel of five professional biologists independently examines these predictions and selects the most likely class from the suggested list. Experts identify pollen grain species using microscopic examination and compare morphological characteristics such as size, shape, and surface texture with known samples. The selected classes are ranked according to the expert’s confidence, with the first choice considered as the top-1 annotation. This multi-expert voting process helps mitigate uncertainty in labeling ambiguous samples and ensures a more robust and representative ground truth. The final annotations, including image filenames and expert-ranked class choices, are exported in JSON format for downstream model training and evaluation.
In this research, for each image, an expert sequentially suggested five species in descending order of priority from rank 1 to rank 5 (the first suggested species, corresponding to rank 1, has the highest priority, while the last suggested species, corresponding to rank 5, has the lowest priority). The level of consensus of the five experts in identifying image $I_i$ ($i = 1, \ldots, 139$) as belonging to species $S_j$ ($j = 1, \ldots, 52$), denoted as $a_{ij}$, is determined by the following formula:
$a_{ij} = \sum_{k=1}^{5} \frac{n_k}{5} \, w_k$
where:
  • $n_k$: cumulative count of times the experts identified image $I_i$ as belonging to species $S_j$ from rank 1 to rank $k$.
  • $w_k$: weighting factor for the level of consensus at rank $k$ in identifying image $I_i$ as belonging to species $S_j$. Here, after considering various sets of values, we set $k = 5$, and the weights $w_1, w_2, w_3, w_4$, and $w_5$ are assigned as 0.5, 0.3, 0.1, 0.05, and 0.05, respectively.
Finally, the species label $S_{j^*}$ is assigned to the image $I_i$ if it has the highest consensus level among all consensus levels $a_{ij}$ ($j = 1, \ldots, 52$) that satisfy the condition $a_{ij} \geq 0.7$:
$j^* = \arg\max_{j} \{ a_{ij} \mid a_{ij} \geq 0.7 \}$
As a result, we obtained a dataset consisting of 93 pollen grain images labeled with the actual plant species (ground truth).
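As an illustration of this labeling rule, the sketch below computes $a_{ij}$ and the final label from five ranked suggestions per expert; the input format is an assumption about how the exported annotations are organized.

```python
WEIGHTS = [0.5, 0.3, 0.1, 0.05, 0.05]   # w_1 ... w_5
THRESHOLD = 0.7

def consensus_label(expert_rankings, species_ids):
    """expert_rankings: list of 5 lists, each the ranked species (rank 1..5) of one expert.
    Returns the species with the highest consensus a_ij >= 0.7, or None if no species qualifies."""
    best_species, best_score = None, 0.0
    for sj in species_ids:
        a_ij = 0.0
        for k, w_k in enumerate(WEIGHTS, start=1):
            # n_k: cumulative count of experts naming species sj within ranks 1..k
            n_k = sum(sj in ranking[:k] for ranking in expert_rankings)
            a_ij += (n_k / 5.0) * w_k
        if a_ij >= THRESHOLD and a_ij > best_score:
            best_species, best_score = sj, a_ij
    return best_species
```

For instance, if all five experts place the same species at rank 1, its consensus level is 1.0; if only three experts do and the other two never list it, the level is 0.6 and the image remains unlabeled.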

4. Results

4.1. Pollen Grain Classification Setting

The training and testing processes were conducted on a dedicated computational setup consisting of two Intel(R) Xeon(R) Silver 4114 CPUs running at 2.20 GHz and equipped with 64 GB of RAM. Additionally, the environment used a single NVIDIA RTX A4000 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 16 GB of VRAM to accelerate deep learning computations. Both the model training and testing phases were executed within this configuration, ensuring consistency in hardware resources in all experimental evaluations. To train and test each classifier, the VNUA-Pollen52 dataset was split into training, validation, and test subsets in an 8:1:1 ratio. This split was performed entirely at random using the random module from the Python Standard Library.
The experiments involved training three distinct deep learning architectures, including MobileNetV3, YOLO11, and Vision Transformer (ViT). For the YOLO11-based classification model, we used the pre-trained weights provided by the yolo11x-cls.pt (https://docs.ultralytics.com/tasks/classify/, accessed on 26 October 2024) checkpoint. The model was trained for a total of 37 epochs, using a batch size of 128, an input image size of 224 × 224 pixels, and a learning rate of 0.01. In contrast, the ViT-based approach leveraged the pretrained model google/vit-base-patch16-224-in21k (https://huggingface.co/google/vit-base-patch16-224-in21k, accessed on 26 October 2024), fine-tuned over 38 epochs using a batch size of 128 with an input image size of 224 × 224 pixels and a significantly lower learning rate of 2 × 10⁻⁵. We hypothesize that prolonged training causes the model to overfit domain-specific artifacts, particularly those subtle yet consistent within a single data source (e.g., lighting gradients, texture noise, or microscope-specific distortions). In contrast, training for a limited number of epochs helps the model retain more generalizable features, avoiding reliance on domain-specific cues. This phenomenon highlights the importance of early-domain generalization, where shorter training not only acts as a regularizer but also preserves cross-domain performance in microscopic pollen classification. These hyperparameter choices were selected based on preliminary experiments to optimize performance and convergence behavior for pollen image classification.
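As an illustration of this setup, the sketch below shows an 8:1:1 random split and the YOLO11 classification fine-tuning call with the reported hyperparameters; the folder names and random seed are assumptions, and the ViT model was fine-tuned analogously from the HuggingFace checkpoint listed above.

```python
import random
from pathlib import Path
from ultralytics import YOLO

# --- 8:1:1 random split using the standard-library random module ---
# "VNUA-Pollen52/" and the folder layout are assumptions for illustration.
images = sorted(Path("VNUA-Pollen52").rglob("*.jpg"))
random.seed(42)
random.shuffle(images)
n = len(images)
train_set = images[: int(0.8 * n)]
val_set = images[int(0.8 * n): int(0.9 * n)]
test_set = images[int(0.9 * n):]
# (The three lists are then copied into the train/val/test class folders
# expected by the Ultralytics classification task.)

# --- YOLO11 classification fine-tuning with the hyperparameters reported above ---
model = YOLO("yolo11x-cls.pt")                       # pre-trained classification checkpoint
model.train(data="vnua_pollen52_split", epochs=37,   # assumed dataset folder name
            imgsz=224, batch=128, lr0=0.01)
metrics = model.val()                                # evaluate on the validation split
```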

4.2. Evaluation Metrics

To evaluate the performance of pollen image classification models, several standard metrics derived from the confusion matrix are employed, including micro-averaged precision, micro-averaged recall, micro-averaged F1-score, top-k accuracy, and macro-averaged accuracy. Let $K$ denote the total number of classes and let $n$ denote the total number of samples. $TP_i$, $FP_i$, and $FN_i$ represent the numbers of true positives, false positives, and false negatives for class $i$, respectively, where $i = 1, 2, \ldots, K$. The definitions of these evaluation metrics are given below.
- Micro-averaged precision [23]: This metric measures the ratio of true positives to the total number of predicted positives, aggregated over all classes. It is calculated as follows:
  $\mathrm{Precision}_{micro} = \dfrac{\sum_{i=1}^{K} TP_i}{\sum_{i=1}^{K} (TP_i + FP_i)}$
- Micro-averaged recall [23]: This metric measures the ratio of true positives to the total number of actual positives, aggregated over all classes. It is calculated as follows:
  $\mathrm{Recall}_{micro} = \dfrac{\sum_{i=1}^{K} TP_i}{\sum_{i=1}^{K} (TP_i + FN_i)}$
- Micro-averaged F1-score [23]: This metric is the harmonic mean of precision and recall, providing a balanced measure of classification performance across all classes. It is calculated as follows:
  $\mathrm{F1\text{-}score}_{micro} = \dfrac{2 \cdot \mathrm{Precision}_{micro} \cdot \mathrm{Recall}_{micro}}{\mathrm{Precision}_{micro} + \mathrm{Recall}_{micro}}$
- Top-k accuracy [24]: This metric is commonly used to evaluate image classification models, especially in the ImageNet Large Scale Visual Recognition Challenge. Top-k accuracy considers a prediction correct if the true class appears among the k classes with the highest predicted probabilities. It is calculated as follows:
  $\mathrm{Accuracy}_{top\text{-}k} = \dfrac{1}{n} \sum_{i=1}^{n} \delta\!\left(y_i \in \hat{Y}_i^{(k)}\right)$
  Here, $y_i$ is the true label, $\hat{Y}_i^{(k)}$ is the set of the top-k predicted labels for the i-th sample, and $\delta(\cdot)$ is the indicator function (which returns 1 if the true label $y_i$ is in the top-k predictions $\hat{Y}_i^{(k)}$, and 0 otherwise).
- Macro-averaged accuracy [25]: This metric involves computing the accuracy of each class separately and then averaging the results over all classes. It is calculated as follows:
  $\mathrm{Accuracy}_{macro} = \dfrac{1}{K} \sum_{i=1}^{K} \mathrm{Accuracy}_i^{top\text{-}1}$
  where $\mathrm{Accuracy}_i^{top\text{-}1}$ represents the top-1 accuracy for class $i$.
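For reference, these metrics can be computed directly from the model outputs, as in the short NumPy sketch below; this is a straightforward implementation of the definitions above, not the exact evaluation code used in the experiments.

```python
import numpy as np

def micro_precision(y_true, y_pred, num_classes):
    # Ratio of true positives to all predicted positives, aggregated over classes.
    tp = sum(np.sum((y_pred == c) & (y_true == c)) for c in range(num_classes))
    fp = sum(np.sum((y_pred == c) & (y_true != c)) for c in range(num_classes))
    return tp / (tp + fp)

def top_k_accuracy(y_true, scores, k):
    # scores: (n_samples, num_classes) confidence matrix; a prediction is correct
    # if the true label is among the k highest-scoring classes for that sample.
    topk = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    return float(np.mean([y in row for y, row in zip(y_true, topk)]))

def macro_accuracy(y_true, y_pred, num_classes):
    # Average of per-class top-1 accuracies over the classes present in y_true.
    per_class = [np.mean(y_pred[y_true == c] == c)
                 for c in range(num_classes) if np.any(y_true == c)]
    return float(np.mean(per_class))
```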

4.3. Pollen Grain Classification Results

We perform evaluations using the testing set comprising 1000 images (approximately 10% of the data) extracted from the VNUA-Pollen52 dataset. It is noticeable that these images were captured in controlled environments similar to those used in training (or of the same flower species). As shown in Table 3, among the three models tested, ViT performed best, achieving the highest scores on most evaluation metrics, all exceeding 90%. Notably, ViT reached a top-1 accuracy of 93%, which is 4% higher than the second-best model, YOLO11, and achieved a micro-averaged precision of 90%, surpassing YOLO11 by 3%. However, both YOLO11 and Vision Transformer (ViT) generally achieved consistently high performance on this test set. Accordingly, they both achieved the highest scores on macro-averaged accuracy and top-five accuracy, at 96% and 99%, respectively. The MobileNetV3 model, with its relatively simple architecture and limited number of parameters, exhibited the lowest performance across all metrics. For example, in terms of top-1 accuracy, MobileNetV3 achieved only 79%, which is 14% lower than the highest value achieved by ViT. Nevertheless, given its small model size and short inference time, MobileNetV3 can be considered a potential option for deployment in real-time systems or on resource-constrained devices. However, within the scope of this study, since we are more concerned with achieving high recognition accuracy to support the verification of the botanical origin of honey, we focus mainly on the two stronger models, YOLO11 and ViT, in the subsequent steps.
The confusion matrices of YOLO11 and ViT are shown in Figure 7 and Figure 8, respectively. Note that these results are achieved even without applying any data augmentation schema. These results are also consistent with common reports in the literature (e.g., the pollen grain challenge reported in [10]). They also indicate strong learning capacity and accurate classification within the same data distribution.

4.4. Identification of Pollen Grains Extracted from Honey

To evaluate the ability of the proposed models to recognize pollen grains extracted from honey, we used the top-k accuracy defined in Section 4.2, where k is the rank of the correct species. In this experiment, we calculated the recognition accuracy at five different ranks, from rank 1 to rank 5, with the total number of samples n—that is, the number of pollen grain images requiring identification—being 93.
The experimental results of three individual models based on the YOLO11, Vision Transformer, and MobileNetV3 architectures are presented in Table 4. Among the three models, MobileNetV3 still achieved the lowest recognition accuracy, reaching 22.58% at rank 1 and 45.16% at rank 5. By contrast, YOLO11 and ViT both yielded relatively high recognition accuracies, particularly at ranks 3, 4, and 5, all exceeding 80%. Specifically, ViT achieved the highest recognition accuracy at rank 1 (62.37%), while YOLO11 attained the highest values at rank 2 (78.49%), rank 3 (86.02%), and rank 4 (86.02%), and both models reached the same highest accuracy at rank 5 (87.10%). Nevertheless, compared to the results presented in Table 3, these values remain significantly lower at both rank 1 and rank 5. This aligns with the concern that model performance would substantially decrease due to data drift—a common challenge in practical deployments, where classifiers trained on pollen grains extracted from flowers are subsequently used to recognize pollen grains from previously unseen domains (i.e., honey).
The results in Table 4 also demonstrate the effectiveness of the proposed data augmentation techniques in mitigating the performance degradation caused by distribution shift. Without data augmentation, ViT achieved a recognition accuracy of only 55.37% at rank 1, whereas with data augmentation, this value increased by more than 7%, reaching 62.37%. The benefits of data augmentation techniques were not limited to rank 1, as improvements were also observed at higher ranks. Comparable trends were observed for the YOLO11 classifier, although its recognition accuracies at ranks 4 and 5 did not improve significantly over rank 3. The results obtained by the single models with data augmentation are also illustrated in Figure 9.
Additionally, the results of the fusion schemes are presented in Table 4 and Figure 10. These findings indicate that the fusion models tend to achieve higher recognition accuracies than the single models across most ranks. The proposed hybrid fusion scheme obtained the highest recognition accuracy at rank 1 (70.21%), rank 2 (82.80%), and rank 3 (89.25%), which surpasses the results achieved by any single classifier, regardless of whether data augmentation was applied. The product rule-based fusion model achieved notable recognition accuracies at rank 4 (91.45%) and rank 5 (94.62%). Although there was a considerable disparity in recognition accuracy at rank 1, the differences among the values achieved by the four fusion schemes at ranks 4 and 5 were negligible.
To facilitate visual observation of pollen grain recognition results using classifiers such as YOLO11, ViT, MobileNetV3, and the hybrid model, we designed a web-based identification tool, as illustrated in Figure 11. Given a pollen grain image extracted from a honey sample as input, each model outputs a list of five predicted plant species in descending order of confidence. The website is available at http://mica.edu.vn:50208/, accessed on 12 November 2024.

5. Discussion

To the best of our knowledge, this study is the first work using computer vision-based techniques to identify the botanical origin of honey by analyzing pollen grains. To illustrate this method, we constructed a new dataset of pollen images collected from flowers in an urban area of Hanoi City, Vietnam. A pollen grain classifier utilizing recent advanced deep learning models, such as YOLO11, Vision Transformer, and MobileNetV3, was built to classify pollen extracted from honey samples. A fusion schema and data augmentation strategies were used to address data distribution shifts between reference images and those from honey. Botanical experts were consulted to create a ground truth dataset, and the system achieved promising accuracy in identifying pollen plant species. Several critical points remain to be investigated further.

5.1. Low Performance of the Classification at Top Ranks

Although the proposed methods can achieve high accuracy at higher ranks (e.g., up to 94.62% at rank 5 with the product rule and up to 92.47% with the max and hybrid rules), the accuracy for identifying the single correct species (rank 1 accuracy) from honey samples is considerably lower. Individual models achieved a best rank 1 accuracy of only 62.37% (ViT). Even with a fusion scheme that combined the model predictions, the highest rank 1 accuracy achieved on honey samples was only 70.21% (hybrid fusion). This indicates that while the correct species is often among the top predicted options, determining the most likely species as the single best prediction remains a challenge compared to controlled laboratory settings. More improvements are needed to solve the distribution drift problem for pollen grain identification. Although this is highlighted as a significant challenge and a critical issue in this study, techniques such as data augmentation and fusion schemes are not yet robust enough to fully overcome distribution shifts.
In this work, we evaluated the consensus agreement of five experts. However, methods such as DNA sequencing or chemical-based pollen grain analysis need to be further examined. Some related techniques, as noted in Section 2, can help to verify the experts' results.

5.2. Tracing Based on Geographical Information

Although melissopalynology can determine geographic origin, the proposed computer vision-based framework is demonstrated for a particular location. Applying this method to different geographies would probably necessitate the creation of representative reference datasets for these regions and potentially require further research to address the challenges posed by variations in pollen appearance in diverse environments. This study opens the way for the construction of an online pollen grain identification system that leverages advanced computer vision techniques, particularly online training models. Extending the system to cover diverse geographical regions would likely require significant additional data and potentially an adaptation of the approach.

6. Conclusions

Identifying the botanical origin of honey based on taxonomic information from its pollen grains offers many applications within various biological disciplines. Based on matching microscopy images of pollen grains, this study developed a feasible solution that takes advantage of recent CNN-based classifiers. Since the pollen grain samples are extracted in different contexts, such as from the original flowers and from honey, the proposed method handles data distribution drift issues in the verification procedure. To achieve high performance in the identification procedure using the proposed classifiers, we adopted augmentation and fusion schemas over individual deep learning models such as YOLO11, ViT, and MobileNetV3. The experimental results showed that the augmentation and fusion schema mitigates the data distribution drift problem, increasing the rank 1 accuracy from 55.37% to 70.21% and allowing plant traceability through direct pollen classification. The dataset collected in this study, VNUA-Pollen52, is public, and its pollen grain images are available online. This solution would help automate traceability processes and support ecological conservation in agriculture, opening up broad application potential in the future.

Author Contributions

Conceptualization, H.-T.P. and H.V.; data curation, D.-M.N.; formal analysis, H.-T.P. and H.V.; funding acquisition, H.V.; investigation, T.-N.L., A.-C.G., and H.V.; methodology, H.-T.P., T.-L.L., and H.V.; project administration, T.-L.L. and H.V.; resources, D.-M.N. and H.-T.P.; software, T.-N.L., D.-M.N., and T.-L.L.; validation, D.-M.N. and A.-C.G.; visualization, T.-N.L., D.-M.N., A.-C.G., and T.-L.L.; writing—original draft, T.-N.L., D.-M.N., and H.V.; writing—review and editing, T.-L.L. and H.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education and Training (MOET) under grant number B2023-BKA-10.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The VNUA-Pollen52 dataset is available at https://doi.org/10.5281/zenodo.15377942.

Acknowledgments

This research is funded by the Ministry of Education and Training (MOET) under grant number B2023-BKA-10 “Research on developing digital atlas and tools for the trace of the origin of plants in pollen and honey, applying in the quality management of bee products in Vietnam”.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLO11    YOLO version 11 neural network
ViT       Vision Transformer neural network
MobileNetV3    MobileNet version 3 neural network

References

  1. Prudnikow, L.; Pannicke, B.; Wünschiers, R. A primer on pollen assignment by nanopore-based DNA sequencing. Front. Ecol. Evol. 2023, 11, 1112929. [Google Scholar] [CrossRef]
  2. Viertel, P.; König, M. Pattern recognition methodologies for pollen grain image classification: A survey. Mach. Vis. Appl. 2022, 33, 18. [Google Scholar] [CrossRef]
  3. Battiato, S.; Guarnera, F.; Ortis, A.; Trenta, F.; Ascari, L.; Siniscalco, C.; De Gregorio, T.; Suárez, E. Pollen Grain Classification Challenge 2020. In Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, Virtual Event, 10–15 January 2021; Del Bimbo, A., Cucchiara, R., Sclaroff, S., Farinella, G.M., Mei, T., Bertini, M., Escalante, H.J., Vezzani, R., Eds.; Springer: Cham, Switzerland, 2021; pp. 469–479. [Google Scholar]
  4. Cao, N.; Meyer, M.; Thiele, L.; Saukh, O. Pollen video library for benchmarking detection, classification, tracking and novelty detection tasks: Dataset. In Proceedings of the Third Workshop on Data: Acquisition To Analysis, Virtual Event, 16–19 November 2020; pp. 23–25. [Google Scholar]
  5. Laube, I.; Hird, H.; Brodmann, P.; Ullmann, S.; Schöne-Michling, M.; Chisholm, J.; Broll, H. Development of primer and probe sets for the detection of plant species in honey. Food Chem. 2010, 118, 979–986. [Google Scholar] [CrossRef]
  6. Bodor, Z.; Kovacs, Z.; Benedek, C.; Hitka, G.; Behling, H. Origin Identification of Hungarian Honey Using Melissopalynology, Physicochemical Analysis, and Near Infrared Spectroscopy. Molecules 2021, 26, 7274. [Google Scholar] [CrossRef] [PubMed]
  7. Zhang, X.-H.; Gu, H.-W.; Liu, R.-J.; Qing, X.-D.; Nie, J.-F. A comprehensive review of the current trends and recent advancements on the authenticity of honey. Food Chem. X 2023, 19, 100850. [Google Scholar] [CrossRef] [PubMed]
  8. Holt, K.A.; Bebbington, M.S. Separating morphologically similar pollen types using basic shape features from digital images: A preliminary study1. Appl. Plant Sci. 2014, 2, 1400032. [Google Scholar] [CrossRef] [PubMed]
  9. Sevillano, V.; Holt, K.; Aznarte, J.L. Precise automatic classification of 46 different pollen types with convolutional neural networks. PLoS ONE 2020, 15, e0229751. [Google Scholar] [CrossRef] [PubMed]
  10. Astolfi, G.; Gonçalves, A.B.; Menezes, G.V.; Borges, F.S.B.; Astolfi, A.C.M.N.; Matsubara, E.T.; Alvarez, M.; Pistori, H. POLLEN73S: An image dataset for pollen grains classification. Ecol. Inform. 2020, 60, 101165. [Google Scholar] [CrossRef]
  11. Garga, B.; Abboubakar, H.; Sourpele, R.S.; Gwet, D.L.L.; Bitjoka, L. Pollen Grain Classification Using Some Convolutional Neural Network Architectures. J. Imaging 2024, 10, 158. [Google Scholar] [CrossRef] [PubMed]
  12. Geus, A.R.d.; Barcelos, C.A.; Batista, M.A.; Silva, S.F.d. Large-scale Pollen Recognition with Deep Learning. In Proceedings of the 2019 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain, 2–6 September 2019; pp. 1–5. [Google Scholar]
  13. Mahmood, T.; Choi, J.; Park, K.R. Artificial intelligence-based classification of pollen grains using attention-guided pollen features aggregation network. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 740–756. [Google Scholar] [CrossRef]
  14. Yao, H.; Choi, C.; Cao, B.; Lee, Y.; Koh, P.W.W.; Finn, C. Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November 2022; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: New York, NY, USA, 2022; Volume 35, pp. 10309–10324. [Google Scholar]
  15. Cao, N.; Saukh, O. Mitigating Distribution Shifts in Pollen Classification from Microscopic Images Using Geometric Data Augmentations. In Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, Ocean Flower Island, China, 17–21 December 2023. [Google Scholar]
  16. Gonçalves, A.B.; Souza, J.S.; Silva, G.G.d.; Cereda, M.P.; Pott, A.; Naka, M.H.; Pistori, H. Feature extraction and machine learning for the classification of Brazilian Savannah pollen grains. PLoS ONE 2016, 11, e0157044. [Google Scholar] [CrossRef] [PubMed]
  17. Battiato, S.; Ortis, A.; Trenta, F.; Ascari, L.; Politi, M.; Siniscalco, C. Pollen13k: A large scale microscope pollen grain image dataset. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2456–2460. [Google Scholar]
  18. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  19. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  20. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
  21. Howard, A.G. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  22. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244. [Google Scholar] [CrossRef]
  23. Takahashi, K.; Yamamoto, K.; Kuchiba, A.; Koyama, T. Confidence interval for micro-averaged F 1 and macro-averaged F 1 scores. Appl. Intell. 2022, 52, 4961–4972. [Google Scholar] [CrossRef] [PubMed]
  24. Terven, J.; Cordova-Esparza, D.M.; Romero-González, J.A.; Ramírez-Pedraza, A.; Chávez-Urbiola, E. A comprehensive survey of loss functions and metrics in deep learning. Artif. Intell. Rev. 2025, 58, 195. [Google Scholar] [CrossRef]
  25. Wang, G.; Lochovsky, F.H. Feature selection with conditional mutual information maximin in text categorization. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, Washington, DC, USA, 8–13 November 2004; pp. 342–349. [Google Scholar]
Figure 1. The pollen grains of a species named Eclipta alba. The distribution shift in a real situation reduces the performance of pollen grain classification. Upper row: image of pollen grains extracted from honey, lower row: collected data in training conditions. These images are cleaner and sharper than those in the upper row.
Figure 2. The proposed pipeline for identifying the botanical origin of honey: (a) collecting pollen grains from flowers; (b) classifying the pollen grains; (c) extracting pollen grains from honey; (d) verifying the pollen grains by experts.
Figure 3. Map of the pollen collection area.
Figure 4. Images of pollen grains collected from flowers of 52 plant species, with their species IDs and scientific names provided in Table 1.
Figure 5. Results of pollen grain extracted from honey. (a) Sediment at the bottom of the pipette; (b) Microscopic image of the pollen; (c) The cropped image for botanical origin identification.
Figure 6. A web-based annotation platform used for expert labeling. The system displays unlabeled pollen images along with the sets of top-five predictions from three different classification models. Experts sequentially select the five candidates they consider most appropriate. The selected images are shaded and assigned priority ranks from 1 to 5 (shown as numbers below each selection). Finally, this selection information is exported to a result file.
Figure 6. A web-based annotation platform used for expert labeling. The system displays unlabeled pollen images along with sets of the top five predictions from three different classification models. Experts sequentially select the five candidates they consider most appropriate. The selected images are marked in shadow and indicated priority ranks from 1 to 5 (as given numbers below each selection). Finally, this selection information is exported to a result file.
Agriengineering 07 00282 g006
Figure 7. Confusion matrix of the YOLO11 classifier. The label of each class consists of the first four characters of the scientific name. Please refer to Table 1 for the full name of each species.
Figure 8. Confusion matrix of the ViT classifier. The label of each class consists of the first four characters of the scientific name. Please refer to Table 1 for the full name of each species.
Figure 9. Accuracy results aggregated over all species for three models (MobileNetV3, YOLO11, and ViT) with data augmentation. The error bars indicate variability around the mean: the lowest value corresponds to the worst-performing species and the highest value to the best-performing species.
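As a hedged illustration of the aggregation summarized in Figure 9 (not the authors' code), the following Python sketch computes per-species accuracy from predicted and true class indices and then reports the mean together with the worst- and best-performing species, which is the information conveyed by the error bars. The arrays preds and labels are placeholders for one model's outputs.

import numpy as np

def per_species_accuracy(preds, labels, n_classes):
    """Accuracy for each species; NaN if a species has no test images."""
    accs = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            accs[c] = np.mean(preds[mask] == c)
    return accs

# Dummy stand-ins: 500 test images over 52 species, roughly 90% predicted correctly.
rng = np.random.default_rng(1)
labels = rng.integers(0, 52, size=500)
preds = np.where(rng.random(500) < 0.9, labels, rng.integers(0, 52, size=500))

accs = per_species_accuracy(preds, labels, n_classes=52)
print(f"mean={np.nanmean(accs):.3f}  worst={np.nanmin(accs):.3f}  best={np.nanmax(accs):.3f}")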
Figure 10. Classification results on pollen grain images (extracted from honey) using the fusion models versus the best-performing single model (in this case, ViT). An error bar is added to the best fusion model, indicating the performance range from the species with the lowest recognition accuracy to the species with the highest.
Figure 11. (a) An example query image. (b) Results for the query image at ranks 1 to 5 with three models: YOLO11, ViT, and MobileNetV3. In each result image, the species name is abbreviated to its first four characters; the corresponding scientific names are listed in Table 1.
Table 1. List of 52 plant species in VNUA-Pollen52 dataset.

ID | Scientific Name | Number of Images
1 | Lagerstroemia speciosa | 216
2 | Citrus maxima | 223
3 | Solanum lycopersicum | 489
4 | Brassica juncea | 69
5 | Chrysanthemum coronarium | 252
6 | Millettia speciosa | 275
7 | Areca catechu | 203
8 | Tagetes patula | 247
9 | Ageratum conyzoides | 189
10 | Hibiscus rosa-sinensis | 160
11 | Allospondias lakonensis | 223
12 | Carica papaya | 401
13 | Bombax ceiba | 210
14 | Prunus persica | 139
15 | Bougainvillea spectabilis | 593
16 | Rosa chinensis | 181
17 | Portulaca grandiflora | 206
18 | Manilkara zapota | 242
19 | Ocimum basilicum | 178
20 | Averrhoa carambola | 212
21 | Arachis hypogaea | 211
22 | Spathiphyllum patinii | 203
23 | Lilium longiflorum | 216
24 | Hippeastrum puniceum | 219
25 | Barringtonia acutangula | 207
26 | Oryza sativa | 221
27 | Punica granatum | 205
28 | Macadamia integrifolia | 249
29 | Bauhinia variegata | 50
30 | Luffa aegyptiaca | 144
31 | Zea mays | 302
32 | Michelia alba | 214
33 | Murraya paniculata | 210
34 | Dimocarpus longan | 225
35 | Eclipta alba | 204
36 | Psidium guajava | 221
37 | Capsicum frutescens | 226
38 | Delonix regia | 219
39 | Coriandrum sativum | 235
40 | Ipomoea aquatica | 235
41 | Anethum graveolens | 257
42 | Wedelia calendulacea | 208
43 | Nelumbo nucifera | 231
44 | Sechium edule | 341
45 | Dalbergia tonkinensis | 224
46 | Ziziphus mauritiana | 208
47 | Dracaena fragrans | 233
48 | Muntingia calabura | 253
49 | Litchi chinensis | 204
50 | Khaya senegalensis | 203
51 | Euphorbia millii | 90
52 | Bidens pilosa | 200
Table 2. Detailed statistics of pollen image datasets.

Dataset | Number of Plant Taxa | Number of Images | Image Modality | Method | Performance
VNUA-Pollen52 | 52 species | 11,776 | Microscopy | YOLO11, Vision Transformer, MobileNetV3 | Precision: 93%, Recall: 93%, F1-score: 91%, Accuracy: 96%
Pollen23E [16] | 23 species | 805 | Microscopy | Transfer Learning + Feature Extraction + Linear Discriminant | Precision: 94.77%, Recall: 99.64%, F1-score: 96.69%
Pollen13K [17] | 3 plant taxa | 13,353 | Microscopy | RBF SVM + HOG features | Accuracy: 86.58%, F1-score: 85.66%
Pollen73S [10] | 73 pollen types | 2523 | Microscopy | DenseNet-201 | Precision: 95.7%, Recall: 95.7%, F1-score: 96.4%
New Zealand Pollen [9] | 46 pollen types | 19,500 | Microscopy | CNN | Precision: 97.9%, Recall: 97.8%, F1-score: 97.8%
Graz pollen [4] | 16 pollen types | 35,000 (per type) | Microscopy | MobileNet-v2, ResNet, EfficientNet, DenseNet | Mean Accuracy: 82–88%
Table 3. Experimental results of the three classifiers YOLO11, ViT, and MobileNetV3 on the test set.

Evaluation Metric | YOLO11 | ViT | MobileNetV3
Micro-averaged precision ↑ | 0.90 | 0.93 | 0.89
Micro-averaged recall ↑ | 0.92 | 0.93 | 0.91
Micro-averaged F1-score ↑ | 0.89 | 0.91 | 0.88
Top-1 accuracy ↑ | 0.89 | 0.93 | 0.79
Top-5 accuracy ↑ | 0.99 | 0.99 | 0.91
Macro-averaged accuracy ↑ | 0.96 | 0.96 | 0.90
Model size (MB) ↓ | 343.4 | 57.1 | 9.4
Time (ms) ↓ | 379 | 215 | 173
The arrows indicate the desired direction for each metric.
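For readers who wish to compute metrics of the kind reported in Table 3, the sketch below (not the authors' evaluation code) shows how micro-averaged precision, recall, and F1 as well as top-1 and top-5 accuracy can be obtained from model outputs with NumPy and scikit-learn. The arrays probs and labels are hypothetical placeholders for a classifier's softmax scores and the ground-truth class indices; the exact averaging choices in the paper may differ.

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def top_k_accuracy(probs, labels, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores per sample
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# Dummy stand-ins: softmax scores for 10 samples over 52 species, plus true labels.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(52), size=10)
labels = rng.integers(0, 52, size=10)

preds = probs.argmax(axis=1)
p, r, f1, _ = precision_recall_fscore_support(labels, preds, average="micro", zero_division=0)
print(f"micro P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
print(f"top-1={top_k_accuracy(probs, labels, 1):.2f}  top-5={top_k_accuracy(probs, labels, 5):.2f}")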
Table 4. Identification accuracy of the different classifier models applied to the pollen grain images extracted from honey.

Model | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5
Single classification model without data augmentation
YOLO11 | 52.31 | 73.42 | 80.02 | 83.02 | 86.10
ViT | 55.37 | 72.41 | 78.15 | 85.22 | 87.80
MobileNetV3 | 20.16 | 32.38 | 34.86 | 38.06 | 40.14
Single classification model with data augmentation
YOLO11 | 55.91 | 78.49 | 86.02 | 86.02 | 87.10
ViT | 62.37 | 74.19 | 80.65 | 83.87 | 87.10
MobileNetV3 | 22.58 | 35.48 | 40.86 | 45.16 | 45.16
The fusion schemes
Max rule | 62.37 | 70.97 | 80.35 | 90.20 | 92.47
Sum rule | 64.52 | 73.12 | 83.87 | 89.25 | 91.40
Product rule | 63.44 | 80.25 | 87.10 | 91.45 | 94.62
Hybrid | 70.21 | 82.80 | 89.25 | 91.25 | 92.47
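To make the max, sum, and product combination rules in Table 4 concrete, the following minimal sketch (an assumption-laden illustration, not the authors' implementation) fuses per-model probability matrices and evaluates rank-k accuracy. The three classifiers' outputs are simulated here; the paper's hybrid strategy and any model weighting are not reproduced.

import numpy as np

def fuse(prob_list, rule="sum"):
    """Combine per-model probability matrices (each n_samples x n_classes) with a fixed rule."""
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    if rule == "max":
        fused = stacked.max(axis=0)
    elif rule == "sum":
        fused = stacked.sum(axis=0)
    elif rule == "product":
        fused = stacked.prod(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return fused / fused.sum(axis=1, keepdims=True)  # renormalize each row to sum to 1

def rank_k_accuracy(fused, labels, k):
    """Share of query images whose true species appears among the top-k fused scores."""
    top_k = np.argsort(fused, axis=1)[:, -k:]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# Dummy softmax outputs for three hypothetical classifiers (e.g., YOLO11, ViT, MobileNetV3)
# on 93 query images over 52 species.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(52), size=93) for _ in range(3)]
labels = rng.integers(0, 52, size=93)

for rule in ("max", "sum", "product"):
    fused = fuse(probs, rule)
    print(rule, [round(rank_k_accuracy(fused, labels, k), 3) for k in (1, 5)])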
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
