Article

Pollen-YOLO: A Deep Learning Framework for Automated Pollen Identification and Its Application to Palaeoecological Reconstruction on the Tibetan Plateau

1 Key Laboratory of Plateau Surface Process and Ecological Conservation, Ministry of Education, Qinghai Normal University, Xining 810016, China
2 College of Geographical Science, Qinghai Normal University, Xining 810016, China
3 College of Computer Science, Qinghai Normal University, Xining 810016, China
4 Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining 810008, China
* Authors to whom correspondence should be addressed.
Quaternary 2026, 9(1), 6; https://doi.org/10.3390/quat9010006
Submission received: 29 September 2025 / Revised: 29 December 2025 / Accepted: 4 January 2026 / Published: 14 January 2026
(This article belongs to the Special Issue Environmental Changes and Their Significance for Sustainability)

Abstract

Automated pollen identification has become an increasingly important tool for palaeoecological research; however, its application to fossil pollen assemblages remains challenging due to complex backgrounds, morphological variability, and taxonomic similarity among pollen types. In this study, we propose Pollen-YOLO, a deep learning-based object detection framework designed for automated pollen identification from microscopic images, and evaluate its performance using the TPPOL23 dataset. The model integrates a tailored backbone architecture with attention-based feature enhancement and class-specific data augmentation strategies to address the characteristics of fossil pollen images. Experimental results indicate that Pollen-YOLO achieves stable and competitive detection performance for most pollen taxa under the tested conditions, particularly for dominant taxa with distinctive morphological features. Model behavior is further examined through ablation experiments and Grad-CAM-based interpretability analysis, which provide insights into feature learning and classification mechanisms. The applicability of the framework is explored using a fossil pollen sequence from the Shaqu profile on the Tibetan Plateau. Automated results show a high level of agreement with manual identification in capturing major stratigraphic trends and vegetation succession patterns, while discrepancies persist for morphologically similar or low-abundance taxa. Overall, this study suggests that object detection-based deep learning approaches have the potential to support fossil pollen analysis and palaeoecological reconstruction. Rather than replacing expert identification, Pollen-YOLO is intended as a complementary, high-throughput tool that may assist large-scale pollen analysis under appropriate quality control when combined with expert verification.

1. Introduction

The Tibetan Plateau, often referred to as the “Roof of the World” and the “Third Pole”, is one of the most climatically sensitive regions on Earth owing to its extreme elevation, complex topography, and strong interactions with the Asian monsoon system. Since the Quaternary, pronounced climatic oscillations, glacial–interglacial cycles, and stepwise uplift have continuously reshaped regional vegetation patterns, exerting far-reaching impacts on hydrology, atmospheric circulation, and ecosystem dynamics across Asia and beyond [1,2]. Reconstructing long-term vegetation and environmental changes on the Tibetan Plateau is therefore critical for understanding both regional ecological evolution and broader climate system behavior.
Fossil pollen preserved in sediments represents one of the most widely used and robust proxies for Quaternary palaeoecological reconstruction. Variations in pollen assemblages provide essential information on past vegetation composition, ecosystem structure, and climate-driven environmental changes [3]. However, the reliability of such reconstructions strongly depends on the accuracy, consistency, and scale of pollen identification. Traditional palynological analysis relies on manual microscopic identification by trained experts, a process that is labor-intensive, time-consuming, and inevitably affected by observer bias, particularly when dealing with large stratigraphic sequences or multi-site datasets.
These challenges are further amplified in high-altitude and high-energy depositional environments such as the Tibetan Plateau. Fossil pollen grains in plateau sediments commonly exhibit deformation, fragmentation, and mineral interference caused by aeolian abrasion, mechanical compression, and post-depositional alteration. Such conditions often obscure fine diagnostic features (e.g., apertures and exine ornamentation), making consistent and reproducible identification difficult even for experienced palynologists. As a result, many Quaternary palaeoecological studies in this region primarily rely on family- or genus-level pollen identification, which has proven sufficient and robust for reconstructing major vegetation types, functional groups, and long-term environmental trends.
In recent years, rapid advances in deep learning, particularly convolutional neural networks (CNNs), have opened new avenues for automated pollen recognition [4,5,6]. CNN-based approaches have demonstrated superior performance over traditional image-processing methods and have been successfully applied to pollen classification, detection, and counting tasks [7]. Object detection frameworks such as the You Only Look Once (YOLO) series further enhance efficiency by enabling simultaneous localization and classification of multiple pollen grains within a single image [8,9,10]. These developments highlight the potential of deep learning to support high-throughput and objective pollen analysis [11].
Nevertheless, most existing automated pollen identification studies focus on modern pollen samples, limited taxonomic scopes, or highly controlled imaging conditions. Their direct applicability to fossil pollen assemblages remains limited, particularly in complex sedimentary contexts where preservation quality, background noise, and morphological variability pose substantial challenges. Moreover, although species-level identification may be achievable for a small number of taxa with highly distinctive pollen morphology, such resolution is not consistently attainable across diverse fossil pollen groups and may reduce reproducibility when applied at scale.
Against this background, the primary aim of this study is to develop a scalable, reproducible, and palaeoecologically meaningful deep learning framework for automated fossil pollen identification, rather than to replace expert-based species-level taxonomy. We introduce Pollen-YOLO, an improved YOLOv11-based detection framework incorporating a ResNet-152 backbone with dual-path attention and dual-pooling mechanisms, specifically designed to enhance sensitivity to subtle morphological features such as apertures and exine ornamentation in fossil pollen images. To support model training and evaluation, we constructed a standardized microscopic image database of common Quaternary pollen taxa from the Tibetan Plateau (TPPOL23), integrating expert annotation and class-specific data augmentation strategies.
To demonstrate the applicability of the proposed framework in Quaternary palaeoecology, Pollen-YOLO was applied to fossil pollen assemblages from the Shaqu profile on the northeastern Tibetan Plateau. An independent chronological framework was established using optically stimulated luminescence (OSL) dating combined with Bayesian age–depth modeling, allowing direct comparison between model-identified and manually identified pollen assemblages. By focusing on taxonomic levels that are both reliable and palaeoecologically informative, this study aims to provide an efficient tool for large-scale pollen analysis and long-term vegetation reconstruction in complex sedimentary environments.

2. Study Area

The Tibetan Plateau (26°00′–39°47′ N, 73°19′–104°47′ E) is the largest and highest plateau in the world, with a mean elevation exceeding 4000 m and a total area of approximately 2.5 million km2. Owing to its unique topographic configuration, it exerts a profound “heat pump effect” on the Asian monsoon system and regulates hydrothermal patterns across Central Asia, making it one of the most climatically sensitive regions on Earth [12]. Since the Quaternary, stepwise uplift combined with climatic oscillations and glacial processes has significantly reshaped the plateau’s environmental gradients, thereby influencing vegetation dynamics, hydrological cycles, and atmospheric circulation on both regional and global scales.
Modern vegetation on the plateau is diverse and includes coniferous forests, mixed coniferous–broadleaf forests, broadleaf forests, shrublands, grasslands, meadows, herbaceous vegetation, alpine vegetation, deserts, and cultivated vegetation (Figure 1) [13]. The spatial distribution of these vegetation types is primarily controlled by hydrothermal conditions: forests and shrublands dominate the monsoon-influenced southeast; alpine meadows and steppe prevail in the central plateau; deserts and cushion-like alpine vegetation are characteristic of the arid northwest; while cultivated vegetation is mainly confined to river valleys and the eastern margins [14].
This hydrothermal–vegetation gradient not only creates high ecological heterogeneity but also strongly influences pollen production, dispersal, and deposition, thereby shaping distinct assemblage signals in lake, peat, and loess sediments. In recent years, regional and continental pollen datasets from East Asia and the Tibetan Plateau, including both surface and fossil samples, have been established. These datasets have enabled quantitative reconstructions of vegetation and climate since the Holocene and the Last Deglaciation [15]. Lake pollen records from different climatic zones of the plateau demonstrate strong sensitivity to variations in precipitation and aridification events, while also revealing considerable spatial heterogeneity across basins and catchments.
It is worth noting that the complex depositional environments of the plateau (e.g., aeolian input, salinization, and enhanced clastic sedimentation) often degrade or obscure diagnostic morphological traits of fossil pollen, such as apertures and exine ornamentation, thus increasing the difficulty of both manual and computer-assisted identification. Therefore, clarifying the linkages between modern vegetation patterns, environmental gradients, and pollen signals provides the essential context for developing automated pollen recognition methods and constructing large-scale image datasets.

3. Materials and Methods

3.1. Laboratory Preparation of Historical Samples

The pollen samples used in this study were derived from three sources:
(1)
Historical sediment profiles and core samples.
The samples were obtained from the Key Laboratory of Surface Processes and Ecological Conservation, Ministry of Education, during the 2016–2024 research program on “Human activities and environmental change on the Tibetan Plateau.” They cover stratigraphic and lacustrine sections from the eastern, central, and northern parts of the plateau. Specific study sites and profiles include the Daiqu Site [16], Laodaqiao Site [16], Donggicuona Lake [17], Shalongka [18], Nankanyan Site [19], Xiadawu Site [20], Eling Lake Site [21], and Zhongda Site [22]. The depositional facies mainly consist of lacustrine clay layers, overbank silt deposits, and aeolian loess horizons. Here, the term “sample” refers specifically to an individual stratigraphic sediment sample (i.e., a depth interval), which was subsequently processed into one or more microscopic slides for pollen analysis, rather than to individual pollen grains or images. Owing to the limitations of historical data management, only summary lithological and chronological information is available for these profiles; it is presented in Table 1 and Figure 2.
(2)
Modern pollen reference slides.
A series of modern pollen standard slides were produced by our research group (Figure 3). Preparation followed internationally recognized palynological protocols, including pretreatment of samples, chemical removal of organic and mineral impurities, concentration of pollen grains, and permanent slide mounting. To improve comparability and long-term preservation, an improved preparation procedure was adopted to minimize interference from clastic and organic debris.
(3)
Open access pollen resources.
We additionally utilized the Atlas of Common Pollen Morphology of Plants in Eastern Tibet (2020–2025), released by the National Tibetan Plateau Data Center, citing both the associated dataset publication and the online data release [23,24].
For quality assurance, the first two categories of samples strictly adhered to the following criteria:
(1)
Field documentation confirmed exclusion of modern bioturbation layers;
(2)
Microscopic inspection verified that ≥75% of pollen grains retained intact exine ornamentation.

3.1.1. Description of the Natural Profile and Sampling Strategy

The studied natural profile is located in Jiuzhi County, Golog Tibetan Autonomous Prefecture, eastern Tibetan Plateau (33°12′ N, 101°35′ E; 3820 m a.s.l.), and is hereafter referred to as the Shaqu profile. It is exposed on the sidewall of a river valley and reveals a 2.0 m thick sequence of continuous fluvial and aeolian loess deposits (Figure 4). Based on sedimentary features and color, three lithological units were distinguished: surface soil (0–20 cm), a peat layer (20–60 cm), and aeolian loess (60–200 cm). Pollen samples were collected at 10 cm intervals downward from 20 cm below the surface, yielding a total of 18 samples. These samples were used both for manual identification and for cross-validation of model performance in pollen recognition. The sampling strategy follows standard palynological field protocols.

3.1.2. OSL Dating and Age–Depth Modeling

To establish a reliable chronological framework for the Shaqu profile, six optically stimulated luminescence (OSL) samples were collected from depths of 40–200 cm (SQ-0 to SQ-5). Quartz grains (63–90 µm) were extracted following standard chemical pretreatment procedures, including removal of carbonates with 10% HCl, oxidation of organic matter with 30% H2O2, etching with 40% HF for 40 min to remove the outer α-irradiated layer, and purity screening (>95%) by infrared stimulation. Equivalent dose (De) measurements were carried out using the single-aliquot regenerative-dose (SAR) protocol on a Risø TL/OSL-DA-20 automated luminescence reader (DTU Nutech, Roskilde, Denmark) at the Key Laboratory of Plateau Environmental Processes and Ecological Restoration, Qinghai Normal University [25]. Environmental dose rates were determined from U, Th, and K concentrations measured by high-purity germanium γ-spectrometry at the Xi’an Center of Mineral Resources Testing, Ministry of Natural Resources. Cosmic-ray contributions, water content, and the α-efficiency factor (α = 0.04 ± 0.02) were also included in dose–rate calculations.
The obtained OSL ages range from 4485 ± 320 yr BP at 40 cm (SQ-0) to 12,115 ± 780 yr BP at 200 cm (SQ-5), covering an overall timespan of approximately 12–4 ka BP. These ages were subsequently incorporated into a Bayesian age–depth model using the Bacon 2.5 package in R [26,27]. Markov Chain Monte Carlo (MCMC) simulations produced posterior distributions of accumulation rates and calibrated ages, yielding both mean estimates and 95% confidence intervals. The resulting age–depth model indicates relatively stable sedimentation rates in the lower and middle sections of the profile, with greater variability in the upper layers. This chronological framework provides the temporal basis for pollen analysis and palaeoecological reconstruction.

3.1.3. Laboratory Treatment of Natural Profile Pollen Samples

For clarity, it should be noted that three different hierarchical units are involved in this study: (i) stratigraphic sediment samples, (ii) microscopic images, and (iii) individual pollen grains. Stratigraphic samples refer to discrete sediment depth intervals collected from the studied sedimentary sections and processed for pollen extraction. Microscopic images were acquired from prepared slides under standardized optical conditions, and each image may contain multiple pollen grains. Individual pollen grains were manually annotated within each image and constitute the fundamental objects used for model training and evaluation. A total of 18 soil samples were selected for laboratory pollen preparation. Approximately 60 g of dried sediment was processed for each sample, with one Lycopodium clavatum spore tablet (containing 10,315 ± batch standard deviation spores; Batch #BC-1023) added as a marker for calculating pollen concentration and recovery. Fossil pollen was concentrated using a combination of acid–alkali digestion and heavy-liquid separation. Carbonates were removed with 10% hydrochloric acid (HCl, 24 h), and silicate minerals were dissolved with two successive treatments of 40% hydrofluoric acid (HF, 48 h each). Residues were rinsed with deionized water and centrifuged (3000 rpm, 5 min) until a neutral pH (7.0 ± 0.5) was reached. Organic matter was then removed by treatment with 10% potassium hydroxide (KOH, 70 °C water bath, 30 min) to disperse colloids, followed by oxidation with 30% hydrogen peroxide (H2O2, 24 h, room temperature) to eliminate humic substances. Grain-size fractionation was performed by passing the residues through a 7 µm nylon sieve to retain pollen grains, and ultrasonic treatment (40 kHz, 10 min) was applied to remove clay particles. The purified residues were transferred to 15 mL of 50% glycerol solution, sealed, and refrigerated at 4 °C until analysis.
This workflow is consistent with internationally accepted palynological preparation protocols and incorporates improvements for optimizing pollen preservation and recovery [28,29].
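As an illustration, the marker-grain concentration estimate implied by the Lycopodium spike can be sketched in a few lines of Python. The function name and the example counts are ours; the tablet spore count and sediment mass follow the values given above.

```python
def pollen_concentration(pollen_counted, lycopodium_counted,
                         tablet_spores=10315, sediment_g=60.0):
    """Estimate pollen concentration (grains per gram of dry sediment)
    from the ratio of counted fossil pollen to counted marker spores."""
    return pollen_counted * tablet_spores / (lycopodium_counted * sediment_g)

# e.g., 300 fossil pollen grains counted against 100 Lycopodium marker spores
conc = pollen_concentration(300, 100)
```

Because the number of spores added per tablet is known, the ratio of counted pollen to counted marker spores scales directly to absolute concentration.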
Based on historical laboratory samples (Section 3.1), this study selected 23 common Quaternary fossil pollen taxa from the Tibetan Plateau to construct a pollen image database. Bright-field imaging was performed using an Olympus CX31 biological microscope (Olympus Corporation, Tokyo, Japan) equipped with a 40× objective lens (NA = 0.95) and a resolution of 2048 × 2048 pixels. Tests indicated that under this imaging condition, key micro-morphological features such as germination apertures could be effectively captured. A total of 1129 original microscopic images were acquired, within which 6616 individual pollen grains were manually identified and annotated by experienced palynologists (Figure 5).
Because pollen grains may undergo physical deformation during natural deposition and laboratory processing, and their apparent morphology may vary under the microscope due to orientation and focal differences (e.g., equatorial vs. polar views), class-specific data augmentation strategies were applied to enhance model robustness and generalization [30]. Based on the grain-level annotations described above, image-level data augmentation was performed using geometric and optical transformations, including flipping, color adjustment, scaling, cropping, Gaussian blur, and perspective transformation [31,32,33] (Figure 6). In this study, the term “original image count” (Table 2) refers to the number of individual pollen grains manually annotated from the original microscopic images for each taxon, rather than to the number of augmented training samples. As a result, the augmented dataset comprised 25,796 images (Table 2), and the coefficient of variation (CV) of sample distribution decreased from 0.618 to 0.159, indicating a substantial improvement in class balance among the 23 pollen taxa (Equation (1)).
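The class-balancing effect of class-specific augmentation can be sketched with per-class replication factors. This is a simplified illustration: the ceiling-ratio rule below is our assumption, not the authors' exact augmentation scheme.

```python
import numpy as np

def augmentation_factors(class_counts):
    """Replicate rare classes more often than abundant ones, so that
    every class approaches the size of the largest class."""
    counts = np.asarray(class_counts, dtype=float)
    return np.ceil(counts.max() / counts).astype(int)

# Hypothetical per-taxon grain counts before augmentation
counts = np.array([120, 480, 960])
factors = augmentation_factors(counts)   # rare taxa get more augmented copies
balanced = counts * factors              # class sizes after augmentation
```

With factors of this form, the post-augmentation class sizes converge toward the largest class, which drives the coefficient of variation toward zero.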
To ensure reliability, 20% of the augmented images from each class were randomly selected for morphological validation by a panel of three palynological experts, and anomalous samples were excluded [34]. The final outcome is the Tibetan Plateau Quaternary Pollen Image Database (TPPOL23), consisting of expert-verified, manually annotated images (Figure 7). This dataset of 25,796 pollen images was subsequently used for deep learning model training.
The Coefficient of Variation (CV) is a statistical measure of relative variability [35]. It is calculated as the ratio of the standard deviation to the mean of a dataset. The formula is expressed as:
$CV = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu\right)^{2}} \Bigg/ \dfrac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$
where $CV$ is the coefficient of variation, a dimensionless measure of relative dispersion; $n$ is the number of samples (i.e., the total number of data points or categories); $x_i$ is the value of the $i$-th sample (e.g., the number of augmented images in class $i$); $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean; and the numerator $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \mu)^2}$ is the sample standard deviation, measuring the absolute dispersion of the data.
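Equation (1) can be checked with a few lines of Python; numpy's `std` with `ddof=1` gives exactly the sample standard deviation used in the numerator.

```python
import numpy as np

def coefficient_of_variation(x):
    """CV = sample standard deviation / mean (Equation (1))."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

coefficient_of_variation([1000, 1000, 1000])   # perfectly balanced classes -> 0.0
coefficient_of_variation([200, 1000, 1800])    # strongly imbalanced classes -> 0.8
```

A perfectly balanced dataset has CV = 0; growing imbalance raises it, which is why the reported drop from 0.618 to 0.159 indicates improved class balance.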

3.2. Model Improvements Based on YOLOv11

To address the characteristics of fossil pollen images from sedimentary deposits—namely, the subtlety of germinal aperture structures, the diversity of exine ornamentation, and the complexity of morphological distortion—this study proposes a deep residual network with dual-path attention, termed PollenSENet, which replaces the original backbone of YOLOv11. PollenSENet is built upon the ResNet-152 architecture, incorporating an improved channel–spatial collaborative attention module (Dual-Path Attention Block, DPAB) and an adaptive multi-scale pooling strategy [36,37]. These modifications significantly enhance the network’s capacity to capture key morphological features in microscopic pollen images.

3.2.1. Compression Operation

In feature extraction for pollen micro-images, the objective of the compression operation is to achieve efficient dimensionality reduction and weight allocation, thereby enhancing the model’s sensitivity to key discriminatory traits such as subtle germinal-aperture structures and the diversity of exine ornamentation. Unlike the traditional SENet formulation that relies on global average pooling (GAP), we introduce a dual-modal pooling strategy that combines Soft Pooling and Global Max Pooling (GMP) to mitigate the loss of local saliency that GAP can induce during feature compression [38,39]. Our subsequent experiments show that this strategy markedly improves feature retention and model generalization in pollen classification. Specifically, germinal-aperture attributes (e.g., aperture size and position) and ornamentation types (e.g., reticulate or spiny textures) often produce highly localized responses that GAP tends to homogenize. To address this, PollenSENet incorporates Soft Pooling, which dynamically evaluates the importance of each spatial position via exponential weighting. For an input feature map $U \in \mathbb{R}^{H\times W\times C}$, the weight for activation $x_i$ is given by the Softmax function:
$w_i = \dfrac{e^{x_i}}{\sum_{j \in U} e^{x_j}} \qquad (2)$
thereby amplifying high-response activations and directing attention to salient pollen regions (e.g., aperture edges).
Based on the weights calculated by Equation (2), each weight $w_i$ is multiplied elementwise with the original activation $x_i$, and the results are summed to generate a compressed channel descriptor vector $y_c \in \mathbb{R}^{1\times 1\times C}$ (Equation (3)):
$y_c = \sum_{i \in U} w_i \, x_i \qquad (3)$
Compared with GAP, Soft Pooling exhibits a markedly reduced loss of discriminative information in pollen datasets, particularly in preserving delicate structural details such as the radial striations observed in Asteraceae pollen.
To compensate for the potential insensitivity of Soft Pooling to sparse features (e.g., isolated spiny structures typical of alpine steppe pollen), PollenSENet concurrently introduces Global Max Pooling (GMP), which enhances spatial invariance. For each channel $c$ of the feature map $U$, the channel-wise descriptor is computed as
$z_{\max,c} = \max_{(h,w)} U_{h,w,c}, \qquad Z_{\max} = \left[z_{\max,1}, \ldots, z_{\max,C}\right] \in \mathbb{R}^{1\times 1\times C} \qquad (4)$
By extracting local feature extrema, GMP strengthens the model’s robustness against morphological distortion (e.g., ruptured ornamentation due to diagenetic processes) while maintaining spatial invariance. Finally, the outputs of Soft Pooling ($y_c$) and GMP ($Z_{\max}$) are adaptively fused according to:
$Z_{\text{fused}} = \alpha\, y_c + (1-\alpha)\, Z_{\max}, \qquad \alpha \in [0,1] \qquad (5)$
where $\alpha$ is a learnable parameter optimized during backpropagation. The fused descriptor $Z_{\text{fused}}$ thus combines the detail sensitivity of Soft Pooling with the noise resistance of GMP, effectively reducing feature dimensionality while enhancing the extraction of key morphological traits. This design provides a robust foundation for the subsequent excitation and calibration operations.
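The compression stage of Equations (2)–(5) can be sketched in numpy. The (H, W, C) array layout and the fixed value of α are illustrative simplifications; in the real model α is a learnable parameter.

```python
import numpy as np

def soft_pool(U):
    """Soft Pooling (Eqs. (2)-(3)): softmax-weighted sum over spatial
    positions, computed independently for each channel."""
    H, W, C = U.shape
    flat = U.reshape(H * W, C)
    w = np.exp(flat - flat.max(axis=0))   # numerically stable softmax weights
    w /= w.sum(axis=0)
    return (w * flat).sum(axis=0)         # channel descriptor, shape (C,)

def global_max_pool(U):
    """GMP (Eq. (4)): channel-wise spatial maximum."""
    return U.max(axis=(0, 1))

def fused_descriptor(U, alpha=0.5):
    """Eq. (5): convex combination of the two descriptors."""
    return alpha * soft_pool(U) + (1 - alpha) * global_max_pool(U)

U = np.random.default_rng(0).normal(size=(8, 8, 4))
z = fused_descriptor(U)                   # fused descriptor, shape (4,)
```

Because the softmax weights grow with activation strength, the soft-pooled value always lies between the channel mean (GAP) and the channel maximum (GMP), which is the "amplify high responses without discarding context" behavior described above.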

3.2.2. Excitation Operation of PollenSENet

The subtle variations in germinal aperture structures and the diversity of exine ornamentation in pollen micro-images necessitate a mechanism that can dynamically recalibrate channel weights and apply spatial modulation to strengthen discriminative feature responses [40,41]. To this end, PollenSENet embeds a Dual-Path Excitation Block (DPEB) into the residual units of ResNet-152, consisting of two synergistic branches: channel recalibration and spatial attention (Figure 8). The channel descriptor vector $Z \in \mathbb{R}^{1\times 1\times C}$, obtained from the compression stage, serves as the input to this module, where $C$ is the number of channels.
The channel path employs a bottleneck structure that reduces and then restores dimensionality, using LeakyReLU and Hard-Sigmoid activations to obtain a sparse yet stable channel-weight vector:
$Z_{\text{mid}} = \phi\left(W_1 Z + b_1\right), \qquad W_1 \in \mathbb{R}^{(C/r)\times C}, \qquad \phi = \text{LeakyReLU}(0.2) \qquad (6)$
$A_{\text{channel}} = \sigma_h\left(W_2 Z_{\text{mid}} + b_2\right), \qquad W_2 \in \mathbb{R}^{C\times (C/r)}, \qquad \sigma_h = \text{Hard-Sigmoid} \qquad (7)$
with reduction ratio $r = 16$. The resulting channel-weight vector is broadcast to the spatial dimensions and multiplied with the original feature map $U \in \mathbb{R}^{H\times W\times C}$:
$U_{\text{recalibrated}} = A_{\text{channel}} \otimes U \qquad (8)$
This recalibration enhances high-weight channels corresponding to germinal-aperture regions, while suppressing low-weight channels such as background sediment. The learned channel-attention distribution is visualized in Figure 9, where the left panel shows the input pollen image and the right panel displays $A_{\text{channel}}$, highlighting apertures and exine ornamentation.
To further capture local details of exine ornamentation (e.g., reticulate meshes, spiny protrusions) and improve robustness to morphological distortion, the spatial branch applies a depthwise convolution (3 × 3) to U r e c a l i b r a t e d , yielding a spatial attention map:
$M_{\text{spatial}} = \sigma\!\left(f_{\text{dw}}^{3\times 3}\left(U_{\text{recalibrated}}\right)\right), \qquad M_{\text{spatial}} \in \mathbb{R}^{H\times W} \qquad (9)$
which is then fused with the recalibrated features:
$U_{\text{modulated}} = U_{\text{recalibrated}} + \text{Broadcast}\left(M_{\text{spatial}}\right) \qquad (10)$
Here, σ is the Sigmoid function. This dual-path mechanism (channel + spatial) enables dynamic optimization across both dimensions, thereby suppressing noise while preserving discriminative details. The properties of the activation functions (ReLU, LeakyReLU, Sigmoid, Hard-Sigmoid) are summarized in Figure 10 to support the design rationale of Equations (6) and (7).
Through the excitation operation, PollenSENet significantly enhances its ability to distinguish fine-scale pollen features, establishing a high-confidence feature representation for subsequent adaptive calibration.
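A minimal numpy sketch of the channel path of the DPEB (Equations (6)–(8)); the weight shapes follow the bottleneck structure with reduction ratio r, and random weights stand in for trained parameters. The descriptor fed in below is a plain channel mean, a stand-in for the dual-modal descriptor of Section 3.2.1.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def hard_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid; output clipped to [0, 1]."""
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def channel_excitation(Z, W1, b1, W2, b2):
    """Eqs. (6)-(7): bottleneck C -> C/r -> C with Hard-Sigmoid gating."""
    Z_mid = leaky_relu(W1 @ Z + b1)        # reduce to C/r
    return hard_sigmoid(W2 @ Z_mid + b2)   # restore to C, weights in [0, 1]

def recalibrate(U, A_channel):
    """Eq. (8): broadcast the channel weights over the H x W grid."""
    return U * A_channel[None, None, :]

rng = np.random.default_rng(1)
C, r = 16, 4                               # the paper uses r = 16; r = 4 keeps this demo tiny
W1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
W2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
U = rng.normal(size=(8, 8, C))
A = channel_excitation(U.mean(axis=(0, 1)), W1, b1, W2, b2)
U_recal = recalibrate(U, A)
```

Since the Hard-Sigmoid bounds every channel weight in [0, 1], recalibration can only attenuate channels, never amplify them, which is the "suppress background, keep apertures" behavior described above.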

3.2.3. Adjustment Operation of PollenSENet

On the basis of the multi-scale feature maps output from the excitation module (Figure 11), the goal of the adjustment operation is to perform dynamic weight allocation and feature fusion [42]. This design optimizes the model’s adaptability to morphological distortions in fossil pollen (e.g., exine folding or partial breakage caused by diagenesis) while simultaneously enhancing classification specificity for fine-grained structures such as germinal apertures and ornamentation edges.
Specifically, three groups of feature maps $X_1, X_2, X_3 \in \mathbb{R}^{1\times 1\times C}$, corresponding to different receptive scales of germinal-aperture and ornamentation features, are processed by a lightweight $1\times 1$ convolution to produce a weight matrix:
$W = \text{Conv}_{1\times 1}\left(\left[X_1; X_2; X_3\right]\right), \qquad W \in \mathbb{R}^{3\times C} \qquad (11)$
The channel-wise importance of each scale is then normalized using an independent Softmax function per channel (scale attention), following the channel–spatial attention paradigm:
$w_{k,c} = \dfrac{e^{W_{k,c}}}{\sum_{i=1}^{3} e^{W_{i,c}}}, \qquad k = 1, 2, 3 \qquad (12)$
Finally, the fused multi-scale representation is obtained as:
$X_{\text{fused}} = \sum_{k=1}^{3} w_{k,c}\, X_k, \qquad X_{\text{fused}} \in \mathbb{R}^{1\times 1\times C} \qquad (13)$
To handle the common issue of morphological distortion in fossil pollen, a deformable feature calibration is applied. A lightweight MLP predicts the sampling offsets $\Delta p \in \mathbb{R}^{3\times 3\times 2}$, which are used in a deformable convolution:
$X_{\text{calibrated}} = \text{DeformConv}\left(X_{\text{fused}}, \Delta p\right) \qquad (14)$
The calibrated features are then merged with the original input via a residual connection to stabilize training under the ResNet backbone:
$X_{\text{final}} = X_{\text{calibrated}} + X_{\text{input}} \qquad (15)$
The output $X_{\text{final}}$ serves as the input to the subsequent ResNet-152 layers for deeper feature extraction. This operation ensures robustness to deformed or broken exine structures while preserving the discriminability of fine-scale features.
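The scale-attention fusion of Equations (12)–(13) reduces to a per-channel softmax over the three scales, sketched below in numpy. The deformable calibration of Equation (14) is omitted here, since it requires a trained offset predictor; this sketch covers only the weighting and fusion step.

```python
import numpy as np

def scale_attention_fusion(X_scales):
    """Eqs. (12)-(13): per-channel softmax over the k = 1..3 scales,
    followed by a weighted sum of the scale descriptors.
    X_scales: array of shape (3, C), one row per receptive scale."""
    e = np.exp(X_scales - X_scales.max(axis=0))
    w = e / e.sum(axis=0)                 # each channel's three weights sum to 1
    return (w * X_scales).sum(axis=0)     # fused descriptor, shape (C,)

X = np.random.default_rng(2).normal(size=(3, 8))   # three hypothetical scale descriptors
X_fused = scale_attention_fusion(X)
```

Because the weights form a convex combination per channel, the fused descriptor always stays within the range spanned by the three scale responses, so no single scale can be silently extrapolated beyond the others.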

3.3. Integration of PollenSENet with the YOLOv11 Framework

After the proposed adjustments and improvements, PollenSENet (i.e., ResNet-152 integrated with SE-based dual-path attention) was adopted as the backbone of YOLOv11, forming the fused model Pollen-YOLO. In this configuration, the original YOLOv11 backbone was replaced by PollenSENet, while the Neck (PANet) and Head (Detection Head) modules were preserved to ensure the stability of the detection framework. As illustrated in Figure 12, PollenSENet outputs multi-scale feature maps (C3, C4, and C5), which are subsequently fed into PANet for feature fusion. The fused features are then passed to the YOLO Head for bounding-box regression and class prediction.
Experimental evaluations indicate that Pollen-YOLO achieves higher detection accuracy and better generalization ability in Quaternary pollen identification tasks than the vanilla YOLOv11 baseline. These results verify the effectiveness of integrating PollenSENet within the YOLOv11 framework, particularly in handling the morphological complexity of fossil pollen grains.

4. Results

4.1. Model Training Environment and Parameter Settings

In this study, the deep learning models were deployed locally on a 64-bit Windows 11 operating system. The training framework was PyTorch (version 2.1.0), running on an NVIDIA GeForce RTX 4070 Ti (12 GB) GPU with CUDA version 11.2. The implementation was written in Python 3.9, and code development and debugging were performed in PyCharm 2023.3.7.
To ensure experimental consistency, all models were trained without pre-trained weights [43]. The parameter settings were standardized across experiments, as summarized in Table 3.

4.2. Evaluation Metrics

The model performance was evaluated using five standard metrics: loss function, precision, recall, mean average precision (mAP), and the F1 score (Table 4).
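The precision, recall, and F1 relationships used throughout Section 4 follow directly from detection counts (true positives, false positives, false negatives); the function below is our illustrative formulation, not the evaluation code used in the study.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 (their harmonic mean) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 8 correct detections, 2 spurious boxes, 2 missed grains
p, r, f1 = detection_metrics(tp=8, fp=2, fn=2)
```

mAP extends this idea by averaging precision over recall levels and over classes; mAP@0.5 uses a single IoU threshold of 0.5, while mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95.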

4.3. Ablation Study

To evaluate the effectiveness of the proposed improvements in fossil pollen classification, an ablation study was conducted on the TPPOL23 dataset, with the results systematically recorded for each experiment. YOLOv11 with its original backbone was used as the baseline model. Using a controlled variable approach, we gradually incorporated the dual-modal pooling strategy, the improved excitation module, and the multi-scale adjustment operation, and compared their performance against the standard SE module.
All experiments were carried out under identical conditions to ensure fairness: input image size of 640 × 640, initial learning rate of 0.001 with the Adam optimizer, batch size of 16, and 200 training epochs. Performance evaluation employed standard metrics including precision, recall, F1-score, mAP@0.5, and mAP@0.5:0.95.
Table 5 summarizes the results of the ablation study. The baseline YOLOv11 achieved precision of 78.32%, recall of 88.20%, F1-score of 83.26%, mAP@0.5 of 80.56%, and mAP@0.5:0.95 of 70.25%. Replacing the backbone with ResNet-152 led to moderate improvements, particularly in recall and mAP. Introducing the proposed dual-modal pooling strategy further boosted the F1-score by 3.75 percentage points, demonstrating its effectiveness in feature fusion. Incorporating the standard SE attention improved precision but yielded limited gains in mAP@0.5:0.95. In contrast, the proposed Dual-Path Excitation Block (DPEB) provided a more balanced enhancement of precision and recall, raising the F1-score to 91.25%. Finally, the complete PollenSENet (our full model) achieved the best overall performance, with precision of 90.67%, recall of 93.16%, F1-score of 91.92%, mAP@0.5 of 95.29%, and mAP@0.5:0.95 of 75.79%.

4.4. Comparison with Other Models

Comparison experiments were conducted between the proposed Pollen-YOLO and several mainstream object detection models on the TPPOL23 dataset. The compared models included YOLOv3, YOLOv4, YOLOv7, and the lightweight EfficientDet-D3 [44,45,46,47]. All models were trained and validated using the same hyperparameter settings [48]. Performance was evaluated using precision, recall, F1-score, mAP@0.5, and mAP@0.5:0.95.
The quantitative comparison results are presented in Table 6. Among the baseline models, the YOLO series exhibited relatively strong detection performance, with YOLOv7 achieving the highest mAP values [49]. The proposed Pollen-YOLO achieved the highest performance across all evaluated metrics, with a precision of 90.7%, recall of 93.2%, F1-score of 91.9%, mAP@0.5 of 95.3%, and mAP@0.5:0.95 of 75.8%.

4.5. Full-Scale Evaluation of Pollen-YOLO

Full-scale experiments were conducted on the TPPOL23 dataset to evaluate the overall performance of Pollen-YOLO [50]. During training and validation, changes in precision and recall were monitored across epochs. Both metrics increased progressively from initial low values and reached higher levels toward the later stages of training, as shown in the evaluation curves.

4.5.1. Training and Validation Losses

During full-scale training, both the training and validation losses decreased steadily across epochs. As shown in Figure 13, the training losses (box_loss, cls_loss, and dfl_loss) decreased from 4.79, 11.66, and 3.59 in the first epoch to 0.74, 0.42, and 0.86 at epoch 200. Correspondingly, the validation losses declined from 2.95, 5.28, and 2.17 to 0.89, 0.56, and 0.93 over the same period.

4.5.2. Model Performance Metrics

During full-scale training, changes in precision and recall were monitored across 200 epochs (Figure 14a,b). Precision increased from 0.23 at epoch 1 to 0.90 at epoch 200, while recall increased from 0.056 to 0.94 over the same period. The average precision metrics also exhibited increasing trends during training. The mAP@0.5 value rose from 0.01019 at epoch 1 to 0.9529 at epoch 200 (Figure 14c). The mAP@0.5:0.95 curve showed larger fluctuations during early epochs and gradually stabilized toward the later stages of training (Figure 14d).
The F1-score evolution during training is shown in Figure 15. The F1-score increased from 0.09 at epoch 1 to 0.70 within the first 20 epochs. Between epochs 20 and 60, the F1-score varied within the range of 0.60–0.83. From epoch 70 onward, the F1-score remained above 0.85 and continued to increase, reaching approximately 0.88 at epoch 100. In the later training stage (epochs 150–200), the F1-score stabilized and reached 0.919 at epoch 200. The average inference time of Pollen-YOLO on the TPPOL23 validation set was 3.1 ms per image, excluding input/output operations.
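Per-image inference time excluding input/output can be measured by preloading batches and timing only the forward calls. The sketch below uses a placeholder `model_fn`, not the actual Pollen-YOLO inference call; on a GPU a device synchronization would also be needed before stopping the clock:

```python
import time

def mean_inference_ms(model_fn, batches, warmup=3):
    """Average per-image inference time in ms, excluding data loading:
    batches are preloaded, and only the forward calls are timed."""
    for batch in batches[:warmup]:   # warm-up passes (caches, lazy init, etc.)
        model_fn(batch)
    n_images = 0
    t0 = time.perf_counter()
    for batch in batches:
        model_fn(batch)
        n_images += len(batch)
    return (time.perf_counter() - t0) * 1000.0 / n_images
```

Warm-up iterations are excluded from the timed window because the first few forward passes typically pay one-off initialization costs that would inflate the average.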

4.5.3. Class-Wise Performance on the Validation Set

Figure 16 illustrates the recognition performance of Pollen-YOLO on the held-out evaluation set of the TPPOL23 dataset (5159 images), displayed as a percentage heatmap in which deeper blue indicates higher recognition accuracy. This set comprises 20% of the expert-annotated pollen grains, randomly selected from the augmented TPPOL23 dataset and withheld from model training. On this set, Pollen-YOLO achieved robust overall performance (Precision = 0.906, Recall = 0.932, mAP@0.5 = 0.953, mAP@0.5:0.95 = 0.758), indicating a good balance between precision and recall. Notably, pollen taxa such as Juglandaceae, Caprifoliaceae, Picea, and Poaceae showed the strongest detection results, with Precision and Recall values approaching or exceeding 0.95 and almost no observable omissions. By contrast, taxa such as Lamiaceae, Ranunculaceae, and Ulmus exhibited relatively weaker performance, with mAP@0.5:0.95 only in the 0.5–0.6 range, reflecting the difficulty of distinguishing morphologically similar pollen groups. The confusion matrix further revealed the primary sources of misclassification for these weaker taxa, notably overlaps between Lamiaceae and Apiaceae and between Ranunculaceae and Ulmus. Overall, Pollen-YOLO achieved highly accurate detection for most taxa while maintaining rapid inference (approximately 3.1 ms per image), underscoring its potential for large-scale applications in Quaternary palynology.
Table 7 summarizes the quantitative evaluation metrics of Pollen-YOLO for each pollen taxon, including Precision, Recall, mAP@0.5, mAP@0.5:0.95, and F1-score. Overall, most taxa achieved Precision and Recall values above 0.90, with F1-scores stabilizing around 0.90, demonstrating balanced performance in single-class pollen recognition. Among them, Juglandaceae, Lycopodiaceae, Picea, and Caprifoliaceae achieved nearly perfect scores (F1-score > 0.96), indicating that the model is highly reliable in detecting and classifying these pollen types. In contrast, taxa such as Lamiaceae, Ranunculaceae, and Ulmus exhibited lower mAP@0.5:0.95 values (approximately 0.51–0.59), and their F1-scores were noticeably lower (<0.90), highlighting these groups as the weak points of the model.
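A row-normalized confusion heatmap of the kind shown in Figure 16 can be assembled directly from per-grain true and predicted labels; a minimal sketch:

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Row-normalized confusion matrix in percent: rows are true taxa,
    columns are predicted taxa, as in a per-class recognition heatmap."""
    idx = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        counts[idx[t]][idx[p]] += 1
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([100.0 * v / total if total else 0.0 for v in row])
    return rows
```

The diagonal entries then give per-taxon recall in percent, and large off-diagonal entries point to confusable taxon pairs such as those noted above.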
Figure 17 presents examples of Pollen-YOLO classification results on the TPPOL23 validation dataset. The examples include pollen grains observed under different imaging conditions, such as variations in illumination (bright-field and dark-field), focal clarity, and background complexity. In some cases, pollen grains appear partially visible or exhibit reduced contrast with the surrounding background. The detected bounding boxes indicate that pollen grains are identified across a range of visual conditions, including images containing background debris and laboratory residues.

4.5.4. Grad-CAM Visualization and Interpretability Analysis

Deep learning models for image recognition are often regarded as “black boxes,” with their internal reasoning processes remaining opaque [51]. While the Pollen-YOLO model exhibited strong performance in pollen classification, its decision-making pathway differs fundamentally from that of experienced palynologists, who rely on direct morphological cues such as size, exine ornamentation, and aperture structures. With the advancement of explainable AI, several techniques have been developed to visualize model attention and interpret feature contributions, including CNN feature visualization, dimensionality reduction methods such as t-SNE and UMAP, and perturbation-based saliency approaches such as LIME and SHAP [52,53,54,55,56]. Compared to these perturbation-based methods, Grad-CAM provides more intuitive and spatially coherent localization, making it particularly effective for highlighting class-discriminative regions [57,58].
By applying Grad-CAM to Pollen-YOLO, we examined how the model attends to biologically meaningful areas during classification. In full microscope images containing both pollen grains and background noise, high-response regions (shown in red) are concentrated on pollen structures, while background and debris remain in low-response regions (blue), indicating that the model has effectively learned to distinguish pollen from non-pollen information (Figure 18). When applied to individual pollen grains, Grad-CAM highlights critical morphological traits such as germinal apertures and exine ornamentation (Figure 19). These results indicate that Pollen-YOLO not only localizes pollen accurately but also captures discriminative structural features consistent with palynological expertise, supporting its biological interpretability and enhancing trust in its predictions.
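The Grad-CAM procedure itself is compact: activations of a chosen layer are weighted by their spatially pooled gradients and passed through a ReLU. A minimal PyTorch sketch for a generic single-output classifier head (not the actual Pollen-YOLO detection head):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Minimal Grad-CAM: weight target-layer activations by their
    globally average-pooled gradients, then ReLU and normalize."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    try:
        model.zero_grad()
        score = model(image)[0, class_idx]      # class score to explain
        score.backward()
        weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # GAP over space
        cam = F.relu((weights * acts["v"]).sum(dim=1))       # weighted sum + ReLU
        cam = cam / (cam.max() + 1e-8)                       # scale to [0, 1]
    finally:
        h1.remove()
        h2.remove()
    return cam  # low-resolution heatmap, upsampled to image size for display
```

The resulting map is typically bilinearly upsampled to the input resolution and overlaid on the micrograph, which is how the red/blue response figures are produced.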

4.6. Application to the Shaqu Profile

An age–depth model for the Shaqu profile was established based on six optically stimulated luminescence (OSL) dates (Figure 20). The dated horizons range from SQ-0 (4485 ± 320 yr BP at 40 cm) to SQ-5 (12,115 ± 780 yr BP at 200 cm), indicating that the sedimentary sequence spans approximately 12–4 ka BP. The age–depth model shows relatively stable sedimentation rates in the lower (180–200 cm) and middle (124–130 cm) sections of the profile, whereas the upper section (40–60 cm) exhibits greater variability (Figure 20).
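Using only the two dated horizons quoted above (the four intermediate OSL dates are omitted here), a "classical" piecewise-linear age-depth model of the kind produced by CLAM-style codes can be sketched as:

```python
from bisect import bisect_left

# Dated horizons quoted in the text; intermediate OSL tie points are omitted.
DEPTHS_CM = [40, 200]      # SQ-0 and SQ-5
AGES_BP = [4485, 12115]    # corresponding OSL ages (yr BP)

def age_at_depth(depth_cm, depths=DEPTHS_CM, ages=AGES_BP):
    """Piecewise-linear interpolation between dated horizons; depths above
    (below) the dated interval are clamped to the youngest (oldest) age."""
    if depth_cm <= depths[0]:
        return ages[0]
    if depth_cm >= depths[-1]:
        return ages[-1]
    i = bisect_left(depths, depth_cm)
    frac = (depth_cm - depths[i - 1]) / (depths[i] - depths[i - 1])
    return ages[i - 1] + frac * (ages[i] - ages[i - 1])
```

With all six dates supplied, the same routine interpolates within each dated segment, and changes in the segment slopes correspond to the sedimentation-rate variability noted for the upper section.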
Based on this chronological framework, pollen concentration diagrams were constructed for both manually identified and model-identified assemblages to enable direct comparison (Figure 21A,B). The two diagrams display broadly consistent stratigraphic patterns, including herbaceous dominance during the early stage, increased representation of coniferous taxa in the middle stage, and higher proportions of shrub taxa in the late stage [59]. For dominant taxa such as Poaceae, Picea, and Chenopodiaceae, the stratigraphic concentration trends derived from automated recognition closely match those obtained by manual identification (Figure 21).
To quantitatively assess differences between the two identification approaches, normalized relative errors were calculated and visualized as a heatmap (Figure 22). For most dominant taxa, relative errors remain low across the majority of stratigraphic levels. In contrast, larger relative errors are concentrated in morphologically similar taxa and in taxa occurring at low abundances. Notable discrepancies are observed for taxonomic pairs prone to morphological confusion (e.g., Betulaceae versus Corylus), as well as for rare taxa that are occasionally overestimated or omitted in the model-identified assemblages (Figure 22).
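The text does not give the exact normalization used for Figure 22; one common choice, assumed here for illustration, divides the absolute manual-versus-model difference by the manual count (floored to avoid division by zero for taxa absent from a level):

```python
def normalized_relative_error(manual_counts, model_counts, floor=1.0):
    """Per-taxon error |model - manual| normalized by the manual count.
    The normalization choice is an assumption, not the paper's formula."""
    return {taxon: abs(model_counts.get(taxon, 0) - n) / max(n, floor)
            for taxon, n in manual_counts.items()}
```

Computed per stratigraphic level and stacked by taxon, such values yield a levels-by-taxa error matrix suitable for heatmap display; large values for low-count taxa illustrate why rare taxa dominate the discrepancies.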

5. Discussion

5.1. Mechanisms Underlying Model Performance

The overall strong performance of the Pollen-YOLO framework can be attributed to the combined effects of architectural design, tailored data augmentation, and attention-based feature enhancement. By adopting an object detection paradigm rather than a classification-only strategy, the model is able to simultaneously localize and classify pollen grains in complex microscopic images containing impurities, overlapping particles, and heterogeneous backgrounds [61]. This design choice is particularly important for fossil pollen samples, where idealized single-grain images are rarely available.
The incorporation of class-specific data augmentation further contributes to model robustness. Fossil pollen grains frequently exhibit deformation caused by natural deposition, laboratory processing, and variable observation angles under the microscope. Augmentation strategies designed according to the morphological characteristics of different pollen taxa enable the model to better generalize across such variability, thereby reducing sensitivity to orientation, scale, and focal differences. This mechanism partially explains the stable recognition performance observed across a wide range of taxa.
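Class-specific augmentation can be organized as a mapping from taxon to a pool of morphology-appropriate transforms. The taxon-to-operation assignments below are illustrative placeholders, not the strategy actually used for TPPOL23 (images are toy 2D arrays here):

```python
import random

def hflip(img):
    """Horizontal flip of a 2D image given as a list of rows."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]

# Hypothetical per-taxon augmentation pools, chosen by grain morphology.
CLASS_AUGS = {
    "Picea": [hflip, rot90],   # bisaccate grains: orientation varies strongly
    "Poaceae": [hflip],        # near-spherical grains: flips suffice
}

def augment(img, taxon, rng=random):
    """Apply one transform drawn from the taxon's pool (flip by default)."""
    ops = CLASS_AUGS.get(taxon, [hflip])
    return rng.choice(ops)(img)
```

In practice the pools would hold photometric and geometric transforms tuned to each taxon's deformation modes, applied on the fly during training.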
Performance heterogeneity among pollen taxa highlights the inherent limitations imposed by pollen morphology. Taxa with distinctive ornamentation or aperture features are more readily recognized, whereas morphologically similar groups remain challenging. Explainability analysis based on Grad-CAM indicates that, for highly confusable taxa, the model tends to focus on overlapping surface structures, leading to classification ambiguity. These results suggest that the current model performance is constrained not only by network architecture but also by the intrinsic morphological similarity among certain pollen types.
The mechanisms discussed above are supported by quantitative training dynamics and ablation results. The convergence behavior and performance stability of the model are illustrated by the precision, recall, mAP, and F1-score trends shown in Figure 14 and Figure 15, while class-wise recognition differences are summarized in Figure 16 and Table 7. Ablation experiments (Table 5) show the quantitative contributions of individual architectural components, while Grad-CAM visualizations (Figure 18 and Figure 19) provide insights into the interpretability of the model’s feature learning.

Several pathways may further improve discrimination capability in future work. From a data perspective, expanding both the quantity and diversity of annotated fossil pollen samples would allow the model to better capture subtle inter-taxon differences. From a methodological perspective, integrating more advanced multi-scale attention mechanisms or incorporating morphological priors such as pollen size and shape descriptors may enhance feature separability. Finally, from an application standpoint, combining automated recognition with expert verification offers a practical human–AI collaborative framework, in which high-throughput efficiency is complemented by domain expertise to ensure analytical reliability.

5.2. Implications of the Shaqu Application for Palaeoecological Reconstruction

The Shaqu case study indicates that automated pollen recognition can reproduce the primary stratigraphic signals for Quaternary vegetation reconstruction at a level comparable to expert manual identification. These palaeoecological comparisons are visually summarized in Figure 21 and Figure 22, which provide a direct stratigraphic evaluation of agreement and deviation between manual and model-based pollen identification. The close agreement for dominant taxa is particularly important because these taxa largely determine the first-order structure of pollen assemblages and, consequently, the main vegetation succession trends [62,63].
Discrepancies are concentrated in morphologically similar taxa and in taxa occurring at very low abundances (Figure 22), indicating that the practical reliability of automated recognition is taxon- and abundance-dependent. This pattern suggests that model-based identification is well suited for high-throughput reconstruction of major vegetation changes, while fine-grained interpretation of rare or highly confusable taxa may still benefit from targeted expert verification [1,13,64]. Therefore, automated recognition and manual analysis should be considered complementary: the model provides efficiency and objectivity at scale, whereas expert assessment provides calibration and quality control for challenging taxa.

5.3. Comparison with Existing Pollen Recognition Approaches

Automated pollen recognition has been investigated for more than two decades, with methodological paradigms evolving from handcrafted feature extraction to deep learning–based approaches [65]. Early studies typically relied on manually designed geometric or ornamentation descriptors combined with conventional classifiers such as support vector machines and random forests. Although these methods achieved moderate accuracy under controlled conditions, their strong dependence on expert-defined features limited their robustness and scalability when applied to large, taxonomically diverse fossil pollen datasets.
With the adoption of convolutional neural networks (CNNs), automated pollen recognition achieved substantial performance gains, particularly on standardized single-pollen image datasets [66]. However, most CNN-based studies focused on classification-only tasks and required prior manual segmentation or cropping of individual pollen grains [4]. As a result, these approaches did not explicitly address the challenges posed by real microscopic samples, including overlapping grains, impurities, and heterogeneous backgrounds. Consequently, reported accuracies from such studies are not directly comparable with results obtained from object detection frameworks operating on raw fossil pollen images [67].
By integrating localization and classification into a unified framework, Pollen-YOLO advances existing approaches: evaluated against representative models under identical conditions (Table 6), it achieves consistently superior performance across multiple metrics. More importantly, the model’s effectiveness is further supported by its application to a stratigraphic fossil pollen sequence, where automated results closely reproduce manually identified assemblage patterns (Figure 21) and exhibit low relative errors for dominant taxa across most stratigraphic levels (Figure 22). These results provide empirical evidence that object detection-based pollen recognition can operate reliably beyond idealized experimental settings.
Despite these advances, limitations remain when distinguishing morphologically similar or low-abundance taxa. As illustrated by the error distributions in Figure 22, confusion persists between taxa with overlapping morphological characteristics (e.g., Betulaceae versus Corylus), indicating that the discriminative capacity of automated models is constrained by intrinsic pollen morphology. In this respect, approaches based on deep feature embedding and clustering may offer complementary insights into inter-taxon relationships, although they are often less suited to high-throughput detection in complex images [68,69].
Overall, rather than replacing existing classification-based systems, Pollen-YOLO represents a complementary methodological pathway tailored to fossil pollen analysis under realistic laboratory and archaeological conditions. By prioritizing robust detection in complex microscopic images and emphasizing genus- or family-level identification, the proposed framework aligns with the practical requirements of Quaternary palaeoecological reconstruction, where dominant taxa and long-term vegetation trends are of primary interpretative importance.

6. Conclusions

This study presents Pollen-YOLO, a deep learning-based object detection framework for automated pollen identification, and evaluates its performance on fossil pollen images from the Tibetan Plateau. The results indicate that, under the experimental settings adopted in this study, the proposed framework can achieve stable and reliable detection performance for most dominant pollen taxa, while maintaining high computational efficiency. Ablation analysis and interpretability experiments suggest that architectural design choices, attention-based feature enhancement, and class-specific data augmentation contribute to the observed model performance. At the same time, recognition accuracy remains constrained for morphologically similar or low-abundance taxa, reflecting intrinsic limitations associated with pollen morphology rather than model design alone. Application to the Shaqu fossil pollen sequence shows that automated recognition is able to reproduce major stratigraphic patterns and vegetation succession trends derived from manual identification within the examined profile. Taken together, these results suggest that deep learning–based object detection approaches have practical potential for assisting palaeoecological reconstruction, particularly in large-scale or time-intensive pollen analyses. Importantly, automated pollen identification should be regarded as a complementary tool rather than a substitute for expert analysis. Future work may focus on expanding annotated fossil pollen datasets, improving discrimination of morphologically similar taxa, and further integrating automated recognition with expert validation workflows to enhance reliability and interpretability in palaeoecological applications.

Author Contributions

Conceptualization, X.S. and G.H.; Methodology, X.S. and F.W.; Software, X.S.; Validation, X.S., G.H. and H.L.; Formal analysis, X.S.; Investigation, X.S.; Resources, G.H.; Data curation, X.S.; Writing—original draft preparation, X.S.; Writing—review and editing, G.H., F.W. and H.L.; Visualization, X.S.; Supervision, G.H.; Project administration, G.H.; Funding acquisition, G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC), Grant No. 42571192, “Lithic technology and its implications for human dispersal and ecological adaptation in the Qaidam Basin since the Late Pleistocene”.

Data Availability Statement

The codes and part of the dataset used in this study are openly available at GitHub: https://github.com/zxht0878/Pollen-YOLO- (accessed on 3 January 2026). Due to size limitations, only a subset of the raw pollen images is included; the full dataset is available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank Haicheng Wei (Qinghai Institute of Salt Lakes, CAS) for his guidance in database construction, Hongyu Li for his support in database development, and Fubo Wang (School of Computer Science, Qinghai Normal University) for his valuable suggestions on model design and improvement. In addition, Lin Bing also offered some suggestions regarding language usage. Part of the dataset used in this study was derived from the Atlas of Common Pollen Morphology in the Eastern Tibetan Plateau (2020–2025) provided by the National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn accessed on 3 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

OSL	Optically Stimulated Luminescence
SAR	Single-Aliquot Regenerative-dose protocol
YOLO	You Only Look Once
CNN	Convolutional Neural Network
mAP	mean Average Precision
SE	Squeeze-and-Excitation
PANet	Path Aggregation Network
TPPOL23	Tibetan Plateau Pollen Dataset

References

  1. Han, Y.; Liu, L.; Sun, Z.; Hou, J.; Cao, X. A Synthesis of Pollen Sequences from the Central-Western Tibetan Plateau Reveals Regional Aridification since the Middle to Late Holocene. Glob. Planet. Change 2026, 256, 105195. [Google Scholar] [CrossRef]
  2. Herzschuh, U. Legacy of the Last Glacial on the Present-day Distribution of Deciduous versus Evergreen Boreal Forests. Glob. Ecol. Biogeogr. 2020, 29, 198–206. [Google Scholar] [CrossRef]
  3. Xu, Q.; Li, M.; Zhang, S.; Zhang, Y.; Zhang, P.; Lu, J. Modern processes of pollen in China during the Quaternary period: Progress and problems. Sci. China Earth Sci. 2015, 45, 1661–1682. [Google Scholar]
  4. Yu, X.; Zhao, J.; Xu, Z.; Wei, J.; Wang, Q.; Shen, F.; Yang, X.; Guo, Z. AIpollen: An Analytic Website for Pollen Identification Through Convolutional Neural Networks. Plants 2024, 13, 3118. [Google Scholar] [CrossRef]
  5. Mitra, B.; Craswell, N. An Introduction to Neural Information Retrieval. Found. Trends Inf. Retr. 2018, 13, 1–126. [Google Scholar] [CrossRef]
  6. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  7. Keita, E.; Takefumi, H.; Tomotaka, K.; Hiroyuki, S.; Tomohito, S.; Akane, S.; Chisa, S.; Ryota, F.; Yoshihiro, T. Estimation of the Amount of Pear Pollen Based on Flowering Stage Detection Using Deep Learning. Sci. Rep. 2024, 14, 13163. [Google Scholar] [CrossRef] [PubMed]
  8. Ishino, S.; Itaki, T.; Fukuda, M. Deep Learning Object Detection for Fossil Diatom Counting: Assessing the Impact of Fossil Preservation and Intraspecific Morphological Variation. Mar. Micropaleontol. 2025, 201, 102519. [Google Scholar] [CrossRef]
  9. Lin, T.-Y.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312. [Google Scholar]
  10. Everingham, M.; Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  11. Birks, H.J.B.; Lotter, A.F.; Juggins, S.; Smol, J.P. Tracking Environmental Change Using Lake Sediments; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2001. [Google Scholar]
  12. Wei, H.; Yuan, Q.; Xu, Q.; Qin, Z.; Wang, L.; Fan, Q.; Shan, F. Assessing the Impact of Human Activities on Surface Pollen Assemblages in Qinghai Lake Basin, China. J. Quat. Sci. 2018, 33, 702–712. [Google Scholar] [CrossRef]
  13. Chen, H.; Hou, G.; Wen, D.; Qiao, H.; Gao, J.; Jin, S. Distribution and evolution of Pinus vegetation on the Qinghai-Tibet Plateau during the Last Glacial Maximum and Holocene. Quat. Res. 2023, 43, 1211–1224. [Google Scholar]
  14. Cao, C.; Song, L.; Wang, Z.; Liu, F.; Han, Y.; Cao, X. Using Modern Pollen Assemblages in Vegetation and Climate Reconstructions from the Eastern Tibetan Plateau. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2026, 681, 113389. [Google Scholar] [CrossRef]
  15. Ma, L.; Xu, Q.; Zhang, S.; Li, Y.; Miao, Y.; Cao, X. Review of Relative Pollen Productivity Estimates for the Tibetan Plateau. Sci. China Earth Sci. 2025, 68, 2413–2425. [Google Scholar] [CrossRef]
  16. Jin, S. A Study on Prehistoric Human Activities and Survival Strategies in the Tongtian River Basin. Ph.D. Thesis, Qinghai Normal University, Xining, China, 2024. [Google Scholar] [CrossRef]
  17. Gao, J. A Study on the Living Environment and Adaptation Strategies of Prehistoric Humans in the Yellow River Basin of the Qinghai-Tibet Plateau. Doctoral Dissertation, Qinghai Normal University, Xining, China, 2019. [Google Scholar] [CrossRef]
  18. Gao, J.-Y. Prehistoric Human Living Environment and Adaptation Strategies in the Yellow River Basin of the Tibetan Plateau. Ph.D. Thesis, Qinghai Normal University, Xining, China, 2023. (In Chinese) [Google Scholar]
  19. Hou, Z.; Qi, B.; Qiao, H.; Sun, Y.; Wang, Y.; Hou, G. Environmental Evolution and Human Activities in the Gonghe Basin during the Middle and Late Holocene as Revealed by the Nankanyan Site in Qinghai. Sci. Bull. 2025, 70, 1365–1381. [Google Scholar]
  20. Zhao, Y.; Hou, G.; Chongyi, E.; Yang, L.; Wang, Q. Environmental evolution and human activities reflected by carbon dust concentration in the Dawu area of the Qinghai-Tibet Plateau. J. Earth Environ. 2016, 7, 19–26. [Google Scholar]
  21. Jin, S.; Hou, G. Microlithic Human Activities and Environmental Background in the Qinghai Lake and Eling Lake Areas of the Qinghai-Tibet Plateau. Master’s Thesis, Qinghai Normal University, Xining, China, 2020. [Google Scholar]
  22. Chen, X.; Hou, G. A Study on Prehistoric Human Activities and Environmental Adaptation in the Yangtze-Lancang River Source Area. Ph.D. Thesis, Qinghai Normal University, Xining, China, 2022. [Google Scholar] [CrossRef]
  23. Cao, X.; Tian, F.; Li, K.; Ni, J. Atlas of Pollen and Spores for Common Plants from the East Tibetan Plateau; National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2020. [Google Scholar] [CrossRef]
  24. Cao, X.; Wang, N.; Cao, Y.; Tian, F. Hostile Climate during the Last Glacial Maximum Caused Sparse Vegetation on the North-Eastern Tibetan Plateau. Quat. Sci. Rev. 2023, 301, 107916. [Google Scholar] [CrossRef]
  25. Murray, A.S.; Wintle, A.G. Luminescence Dating of Quartz Using an Improved Single-Aliquot Regenerative-Dose Protocol. Radiat. Meas. 2000, 32, 57–73. [Google Scholar] [CrossRef]
  26. Blaauw, M. Methods and Code for ‘Classical’ Age-Modelling of Radiocarbon Sequences. Quat. Geochronol. 2010, 5, 512–518. [Google Scholar] [CrossRef]
  27. Makó, L.; Cseh, P.; Nagy, B.; Sümegi, P.; Molnár, D. Paleoecological Reconstruction Derived from an Age–Depth Model and Mollusc Data, Pécel, Hungary. Quaternary 2025, 8, 37. [Google Scholar] [CrossRef]
  28. Riding, J.B. A Guide to Preparation Protocols in Palynology. Palynology 2021, 45, 1–110. [Google Scholar] [CrossRef]
  29. Traverse, A. Paleopalynology; Springer: Dordrecht, The Netherlands, 2007. [Google Scholar]
  30. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  31. Alsafadi, F.; Yaseen, M.; Wu, X. An Investigation on Machine Learning Predictive Accuracy Improvement and Uncertainty Reduction Using VAE-Based Data Augmentation. Nucl. Eng. Des. 2025, 445, 114433. [Google Scholar] [CrossRef]
  32. Zhou, H.; Zhang, X.; Wang, H. Highly Censored Survival Analysis via Data Augmentation. Biomed. Signal Process. Control. 2025, 106, 107675. [Google Scholar] [CrossRef]
  33. Hassan, M.; Illanko, K.; Fernando, X.N. Single Image Super Resolution Using Deep Residual Learning. AI 2024, 5, 426–445. [Google Scholar] [CrossRef]
  34. Murray, A.S.; Wintle, A.G. The Single Aliquot Regenerative Dose Protocol: Potential for Improvements in Reliability. Radiat. Meas. 2003, 37, 377–381. [Google Scholar] [CrossRef]
  35. Sokal, R.R.; Rohlf, F.J. Biometry: The Principles and Practice of Statistics in Biological Research, 3rd ed.; W.H. Freeman: New York, NY, USA, 1995. [Google Scholar]
  36. Wang, R.; Pang, J.; Han, X.; Xiang, M.; Ning, X. Automated Magnetocardiography Classification Using a Deformable Convolutional Block Attention Module. Biomed. Signal Process. Control. 2025, 105, 107602. [Google Scholar] [CrossRef]
  37. Muhammad, S.; Zhaoquan, G. Deep Residual Learning for Image Recognition: A Survey. Appl. Sci. 2022, 12, 8972. [Google Scholar] [CrossRef]
  38. Gabor, M.; Zdunek, R. Reduced Storage Direct Tensor Ring Decomposition for Convolutional Neural Networks Compression. Neural Netw. Off. J. Int. Neural Netw. Soc. 2025, 193, 107994. [Google Scholar] [CrossRef]
  39. Marandi, A.; Dahl, J.; De Klerk, E. A Numerical Evaluation of the Bounded Degree Sum-of-Squares Hierarchy of Lasserre, Toh, and Yang on the Pooling Problem. Ann. Oper. Res. 2018, 265, 67–92. [Google Scholar] [CrossRef]
  40. Unal, Y. Integrating CBAM and Squeeze-and-Excitation Networks for Accurate Grapevine Leaf Disease Diagnosis. Food Sci. Nutr. 2025, 13, e70377. [Google Scholar] [CrossRef] [PubMed]
  41. Li, Q.; Luo, S.; Tan, S.; Li, Z. SEAP: Squeeze-and-Excitation Attention Guided Pruning for Lightweight Steganalysis Networks. EURASIP J. Inf. Secur. 2025, 2025, 24. [Google Scholar] [CrossRef]
  42. Yuan, S.; Gao, H.; Gu, X.; Ma, J. Synthetic Aperture Radar Target Recognition with Limited Training Data Based on Frequency-Domain-Assisted Dual-Stream Attention Hierarchical Deformable Convolutional Networks. Eng. Appl. Artif. Intell. 2025, 161, 112309. [Google Scholar] [CrossRef]
  43. Kensert, A.; Desmet, G.; Cabooter, D. MolGraph: A Python Package for the Implementation of Molecular Graphs and Graph Neural Networks with TensorFlow and Keras. J. Comput.-Aided Mol. Des. 2024, 39, 3. [Google Scholar] [CrossRef]
  44. Vdoviak, G.; Sledevič, T.; Serackis, A.; Plonis, D.; Matuzevičius, D.; Abromavičius, V. Evaluation of Deep Learning Models for Insects Detection at the Hive Entrance for a Bee Behavior Recognition System. Agriculture 2025, 15, 1019. [Google Scholar] [CrossRef]
  45. Chuan-Jie, Z.; Teng, L.; Jinxu, W.; Danlan, Z.; Youxin, Z.; Yang, G.; Hui-Zhen, W.; Jialin, Y.; Min, C. Evaluation of the YOLO Models for Discrimination of the Alfalfa Pollinating Bee Species. J. Asia-Pac. Entomol. 2024, 27, 102195. [Google Scholar]
  46. Soeb, M.J.A.; Jubayer, M.F.; Tarin, T.A.; Al Mamun, M.R.; Ruhad, F.M.; Parven, A.; Mubarak, N.M.; Karri, S.L.; Meftaul, I.M. Tea Leaf Disease Detection and Identification Based on YOLOv7 (YOLO-T). Sci. Rep. 2023, 13, 6078. [Google Scholar] [CrossRef]
  47. Jiawei, Z.; Guangzhao, T.; Chang, Q.; Baoxing, G.; Kui, Z.; Qin, L. Weed Detection in Potato Fields Based on Improved YOLOv4: Optimal Speed and Accuracy of Weed Detection in Potato Fields. Electronics 2022, 11, 3709. [Google Scholar] [CrossRef]
  48. Dutta, M.; Ganguly, A. Incremental-Based YoloV3 Model with Hyper-Parameter Optimization for Product Image Classification in E-Commerce Sector. Appl. Soft Comput. 2024, 165, 112029. [Google Scholar] [CrossRef]
  49. Yi, F.; Zhang, H.; Yang, J.; He, L.; Mohamed, A.S.A.; Gao, S. YOLOv7-SiamFF: Industrial Defect Detection Algorithm Based on Improved YOLOv7. Comput. Electr. Eng. 2024, 114, 109090. [Google Scholar] [CrossRef]
  50. Jiang, J.; Yu, L.; Jiang, J.; Liu, Y.; Cui, B. Angel: A New Large-Scale Machine Learning System. Natl. Sci. Rev. 2018, 5, 216–236. [Google Scholar]
  51. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. arXiv 2013, arXiv:1311.2901. [Google Scholar] [CrossRef]
  52. Skrodzki, M.; van Geffen, H.; de Plaza, N.F.C.; Hollt, T.; Eisemann, E.; Hildebrandt, K. Accelerating Hyperbolic T-SNE. IEEE Trans. Vis. Comput. Graph. 2024, 30, 4403–4415. [Google Scholar] [CrossRef]
  53. Schmitz, S.; Weidner, U.; Hammer, H.; Thiele, A. Evaluating Uniform Manifold Approximation and Projection for Dimension Reduction and Visualization of Polinsar Features. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, 1, 39–46. [Google Scholar] [CrossRef]
  54. Brdar, S.; Panić, M.; Matavulj, P.; Stanković, M.; Bartolić, D.; Šikoparija, B. Explainable AI for Unveiling Deep Learning Pollen Classification Model Based on Fusion of Scattered Light Patterns and Fluorescence Spectroscopy. Sci. Rep. 2023, 13, 3205. [Google Scholar] [CrossRef]
  55. Allaoui, M.; Hedjam, R.; Bouanane, K.; Allili, M.S.; Kherfi, M.L.; Belhaouari, S.B. Exploring Non-Negativity for Improved Manifold Embedding: Application to t-SNE. Knowl.-Based Syst. 2025, 330, 114547. [Google Scholar] [CrossRef]
  56. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv 2016, arXiv:1602.04938. [Google Scholar]
  57. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  58. Fang, J.P.; Zhou, J.; Cui, Q.; Tang, C.Z.; Li, L.F. Interpreting Model Predictions with Constrained Perturbation and Counterfactual Instances. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2251001. [Google Scholar] [CrossRef]
  59. Chen, X.; Hou, G.; Jin, S.; Gao, J.; Duan, R. Pollen record of human activities in the Middle and Late Holocene of the Qinghai-Tibet Plateau. Earth Environ. 2020, 48, 643–651. [Google Scholar]
  60. Shu, J.; Huang, X.; Xu, D.; Chen, W.; Song, B.; Cui, A.; Grimm, E. New Tilia software: Chinese guide and usage tips. Acta Palaeontol. Sin. 2018, 57, 260–272. [Google Scholar]
  61. Kubera, E.; Kubik-Komar, A.; Kurasiński, P.; Piotrowska-Weryszko, K.; Skrzypiec, M. Detection and Recognition of Pollen Grains in Multilabel Microscopic Images. Sensors 2022, 22, 2690. [Google Scholar] [CrossRef]
  62. Zhou, S.; Zhang, J.; Cheng, B.; Zhu, H.; Lin, J. Holocene Pollen Record from Lake Gahai, NE Tibetan Plateau and Its Implications for Quantitative Reconstruction of Regional Precipitation. Quat. Sci. Rev. 2024, 326, 108504. [Google Scholar] [CrossRef]
  63. Ren, W.; Zhang, X.; Liang, C.; Li, Q.; Qin, F.; Yi, G. Holocene Vegetation Dynamics and Abrupt Biodiversity Shifts in the Zoige Basin, Northeastern Tibetan Plateau. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2025, 675, 113079. [Google Scholar] [CrossRef]
  64. Wei, H.; Chongyi, E.; Duan, R.; Zhang, J.; Sun, Y.; Hou, G.; Gao, J. History of pastoral activities in northeastern Tibetan Plateau since the mid-Holocene as recorded by fungal spores. Sci. China Earth Sci. 2021, 51, 1907–1922. [Google Scholar]
  65. Rapiejko, P.; Wawrzyniak, Z.M.; Jachowicz, R.S.; Jurkiewicz, D. Image Analysis in Automatic System of Pollen Recognition. Acta Agrobot. 2012, 59, 385–393. [Google Scholar] [CrossRef]
  66. Zhang, C.J.; Liu, T.; Wang, J.; Zhai, D.; Chen, M.; Gao, Y.; Yu, J.; Wu, H.Z. DeepPollenCount: A Swin-Transformer-YOLOv5-Based Deep Learning Method for Pollen Counting in Various Plant Species. Aerobiologia 2024, 40, 425–436. [Google Scholar] [CrossRef]
  67. Durand, M.; Paillard, J.; Ménard, M.P.; Suranyi, T.; Grondin, P.; Blarquez, O. Pollen Identification through Convolutional Neural Networks: First Application on a Full Fossil Pollen Sequence. PLoS ONE 2024, 19, e0302424. [Google Scholar] [CrossRef]
  68. Keller, A.; Danner, N.; Grimmer, G.; Ankenbrand, M.J.; von der Ohe, K.; von der Ohe, W.; Rost, S.; Härtel, S.; Steffan-Dewenter, I. Evaluating Multiplexed Next-Generation Sequencing as a Method in Palynology for Mixed Pollen Samples. Plant Biol. 2015, 17, 558–566. [Google Scholar] [CrossRef]
  69. He, P.; Glowacki, G.; Gkantiragas, A. Unsupervised Representations of Pollen in Bright-Field Microscopy. arXiv 2019, arXiv:1908.01866. [Google Scholar] [CrossRef]
Figure 1. Distribution map of vegetation types on the Qinghai–Tibet Plateau.
Figure 2. Schematic diagram of pollen sampling points. Black triangles indicate historical sediment profile samples, whereas red circles mark the newly discovered stratigraphic section used in this study.
Figure 3. Modern pollen standard glass slide.
Figure 4. Shaqu profile: (a) profile sedimentation (vertical height of the profile); (b) slanted view of the same profile (slope distance of the profile); (c) surrounding environment. The vertical height in (a) is 2 m, while the slanted length in (b) is 5.3 m. The scale bars in both images correspond to the respective measurements.
Figure 5. Representative imaging results of the TPPOL23 pollen database. (a) Apiaceae; (b) Asteraceae; (c) Betula; (d) Brassicaceae; (e) Caprifoliaceae; (f) Chenopodiaceae; (g) Corylus; (h) Cyperaceae; (i) Elaeagnaceae; (j) Gentianaceae; (k) Hippophae; (l) Juglandaceae; (m) Lamiaceae; (n) Lycopodiaceae; (o) Ostrya; (p) Picea; (q) Pinus; (r) Poaceae; (s) Ranunculaceae; (t) Salix; (u) Thymelaeaceae—Daphne; (v) Ulmus; (w) Zygophyllaceae.
Figure 6. Examples of data augmentation applied to Picea pollen images in the TPPOL23 database. (a) Original image; (b) HSV color adjustment; (c) Histogram equalization with color normalization; (d) Random cropping; (e) Horizontal flip; (f) Vertical flip; (g) Gaussian blur; (h) Perspective transformation; (i) Multi-scale Gaussian pyramid fusion; (j) Random scaling.
Figure 7. Manual annotation of pollen grains in the TPPOL23 database by palynological experts. (a) Picea; (b) Chenopodiaceae. Yellow bounding boxes indicate expert-annotated pollen grains used as ground-truth labels for model training.
Figure 8. Architecture of the Dual-Path Excitation Block (DPEB) embedded in PollenSENet. The input channel descriptor Z undergoes channel recalibration (Equations (6)–(8)) and spatial modulation (Equations (9) and (10)) to produce the modulated feature map U_modulated. Different colors and arrows are used only to visually distinguish processing steps within the block and do not represent additional categorical or functional meanings.
Figure 9. Visualization of channel attention in PollenSENet. (Left): input pollen micro-image; (Right): learned channel-attention distribution A_channel, highlighting discriminative responses associated with germinal apertures and exine ornamentation.
Figure 10. Characteristic curves of activation functions used in PollenSENet: ReLU, LeakyReLU (slope = 0.2), Sigmoid, and Hard-Sigmoid.
Figure 11. Structure of the adjustment module in PollenSENet (based on a ResNet-152 residual block). The module takes multi-scale feature maps X1, X2, X3 as input, generates scale-specific weights through 1 × 1 convolution and Softmax normalization (Equations (11) and (12)), fuses them into X_fused (Equation (13)), and applies deformable convolution calibration (Equation (14)) with residual connection (Equation (15)).
Figure 12. Architecture of the Pollen-YOLO model, where the YOLOv11 backbone is replaced with PollenSENet while the Neck (PANet) and Head modules remain unchanged. Arrows indicate the direction of feature propagation, and different colors are used only to visually distinguish network modules; they do not represent additional functional or categorical meanings.
Figure 13. Training and validation loss curves of Pollen-YOLO. (a) Training losses, including box_loss, cls_loss, and dfl_loss; (b) Validation losses, including val/box_loss, val/cls_loss, and val/dfl_loss. Epoch denotes one complete pass through the training dataset; the x-axis represents the number of training epochs.
Figure 14. Performance curves of Pollen-YOLO during training on the TPPOL23 dataset across 200 epochs. An epoch denotes one complete training cycle over the entire training dataset. The horizontal axis represents the epoch number, while the vertical axis represents the corresponding performance metric values (ranging from 0 to 1). (a) Precision; (b) Recall; (c) mAP@0.5; (d) mAP@0.5:0.95.
Figure 15. Evolution of the F1-score of Pollen-YOLO during training on the TPPOL23 dataset across 200 epochs. An epoch denotes one complete training cycle over the entire training dataset. The horizontal axis represents the epoch number, and the vertical axis represents the F1-score value (ranging from 0 to 1).
Figure 16. Recognition performance of Pollen-YOLO evaluated on the validation set of the TPPOL23 dataset, comprising 5159 images (20% of the augmented dataset), displayed as a heatmap. Recognition accuracy is calculated based on expert-annotated pollen grains within each class. Darker blue indicates higher recognition accuracy.
Figure 17. Representative examples of pollen detection and classification by Pollen-YOLO on the TPPOL23 validation set. All images have a resolution of 2048 × 2048 pixels. (a) Apiaceae; (b) Betula; (c) Brassicaceae; (d) Caprifoliaceae; (e) Chenopodiaceae; (f) Corylus; (g) Thymelaeaceae—Daphne; (h) Elaeagnaceae; (i) Gentianaceae; (j) Hippophae; (k) Lamiaceae; (l) Ostrya; (m) Picea; (n) Pinus; (o) Ranunculaceae; (p) Ulmus; (q) Zygophyllaceae; (r) Asteraceae; (s) Cyperaceae; (t) Juglandaceae; (u) Lycopodiaceae; (v) Poaceae; (w) Salix.
Figure 18. Grad-CAM visualization on full microscopic images. High-response regions (red) correspond to pollen grains, while background and debris are represented by low-response regions (blue), demonstrating the model’s ability to localize pollen against complex backgrounds. All images have a resolution of 2048 × 2048 pixels. (a) Asteraceae; (b) Lycopodiaceae.
Figure 19. Grad-CAM visualization on individual pollen grains. (a) Juglandaceae, (b) Apiaceae, (c) Picea. Red regions highlight critical morphological features such as apertures and exine ornamentation, illustrating how Pollen-YOLO focuses on biologically relevant structures during classification.
Figure 20. Age–depth model for the Shaqu profile based on six OSL dates using Bacon Bayesian modeling. The black shaded envelope represents the posterior age distributions estimated by the Bacon Bayesian age–depth model, with darker tones indicating higher probability density. The red line denotes the modeled mean age–depth curve, and the green violin-shaped distributions represent the OSL ages with their associated uncertainty ranges.
Figure 21. Comparison of pollen assemblages from the Shaqu profile obtained by traditional manual identification (“artificially identified”) and automated recognition using the Pollen-YOLO framework (“model-identified”). Both assemblages are based on the same sediment samples and prepared slides. (A) Artificially identified pollen concentration diagram; (B) Model-identified pollen concentration diagram generated by Pollen-YOLO. Both diagrams were produced using Tilia (version 2.6.1, Illinois State Museum, Springfield, IL, USA) [60].
Figure 22. Heatmap of normalized relative errors between artificial and model-based pollen identification results for the Shaqu profile. Relative errors were calculated as |Model−Artificial|/Artificial and normalized to [0, 1]. White represents low errors (close agreement), whereas red indicates large deviations. Crosses (×) denote taxa with zero counts in artificial results where errors could not be computed. The heatmap highlights that major discrepancies are concentrated in rare or morphologically similar taxa (e.g., Betulaceae vs. Corylus), while dominant taxa exhibit relatively low errors.
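The relative-error calculation described for the Figure 22 heatmap can be sketched in a few lines of NumPy. The counts below are illustrative placeholders, not actual Shaqu profile data; the handling of zero manual counts mirrors the crosses (×) in the figure.

```python
import numpy as np

# Illustrative pollen counts (not actual Shaqu profile data):
# rows = stratigraphic samples, columns = taxa.
artificial = np.array([[120.0, 15.0, 0.0],
                       [ 95.0,  8.0, 0.0]])
model      = np.array([[114.0, 11.0, 2.0],
                       [ 99.0, 12.0, 0.0]])

# Relative error |Model - Artificial| / Artificial; undefined (NaN)
# where the manual count is zero, matching the crosses in Figure 22.
with np.errstate(divide="ignore", invalid="ignore"):
    rel_err = np.abs(model - artificial) / artificial
rel_err[artificial == 0] = np.nan

# Normalize the finite errors to [0, 1] for the heatmap color scale.
finite = rel_err[np.isfinite(rel_err)]
norm_err = (rel_err - finite.min()) / (finite.max() - finite.min())
```

Cells where the manual count is zero stay NaN through the normalization, so they can be rendered as crosses rather than colors.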
Table 1. Names and detailed information of pollen sampling sites. “Pollen sample count” indicates the number of sediment samples analyzed for pollen at each site. 14C denotes radiocarbon dating, and OSL denotes optically stimulated luminescence dating.

| Section Name | Sediment Type | Stratigraphic Age (Dating Method) | Pollen Sample Count |
| --- | --- | --- | --- |
| Daiqu Site | Aeolian loess–fluvial deposit | 12.1–10.9 ka (OSL) | 65 |
| Laodaqiao Site | Aeolian loess–fluvial deposit | 13–3.3 ka (OSL) | 97 |
| Donggicuona Lake | Aeolian loess–lacustrine deposit | 2.01–8.77 ka (OSL) | 45 |
| Shalongka | Aeolian loess–flood deposit | 8.5–3.9 ka (14C) | 208 |
| Nankanyan Site | Aeolian loess–fluvial deposit | 13.7–0.8 ka (OSL) | 66 |
| Xiadawu Site | Aeolian loess–fluvial deposit | 0–6.3 ka (14C) | 62 |
| Eling Lake Site | Aeolian loess–lacustrine deposit | 2.1–14.6 ka (OSL) | 105 |
| Zhongda Site | Aeolian loess–fluvial deposit | 6.2–19 ka (OSL) | 31 |
Table 2. Composition of the TPPOL23 dataset. “Original image count” denotes the number of raw microscopic images acquired prior to data augmentation for each pollen taxon. These images constitute the primary dataset from which individual pollen grains were annotated and subsequently expanded through class-specific augmentation strategies.

| Pollen Type | Original Image Count | Augmentation Factor | Augmented Image Count | Key Augmentation Methods (Intensity Parameters) | Augmentation Purpose |
| --- | --- | --- | --- | --- | --- |
| Apiaceae | 1084 | 1.2 | 1300 | Horizontal flip (p = 0.5); HSV color jitter (ΔH = ±15°, ΔS = ±0.1, ΔV = ±0.1); Random crop (scale 0.8–1.0) | Improve robustness to rotational symmetry features |
| Asteraceae | 135 | 8 | 1080 | Vertical flip (p = 0.5); Gaussian blur (σ = 1.5); Perspective transform (rotation ±20°, scaling ±15%) | Simulate different section views, enhance radial pattern recognition |
| Betulaceae | 250 | 4 | 1000 | Random scaling (0.7–1.3×); Color inversion (p = 0.3); Local cropping (retain ≥70% area) | Enhance invariance to aperture position |
| Brassicaceae | 203 | 6 | 1218 | Perspective transform (tilt ±15°); Gaussian noise (SNR = 25 dB); Motion blur (length = 5 px) | Simulate microscope slide tilt effects |
| Caprifoliaceae | 345 | 4 | 1380 | Color normalization (histogram matching); Center crop (80%); Random flip (p = 0.5) | Suppress color bias, enhance mesh pattern consistency |
| Chenopodiaceae | 315 | 4 | 1260 | Elastic deformation (α = 50, σ = 5); Color enhancement (saturation ±20%); Multi-scale scaling (0.5–2.0×) | Improve generalization for irregular edge structures |
| Corylus (Betulaceae) | 181 | 7 | 1267 | Affine transform (translation ±10%); Gaussian blur (σ = 0.5–2.0); Random occlusion (≤15%) | Reduce overfitting, improve robustness of aperture ring detection |
| Cyperaceae | 92 | 8 | 736 | Perspective distortion (grid warp); Color channel shift (RGB offset ±3 px); Frequency domain filtering (hybrid high/low pass) | Simulate optical diffraction in thin-walled pollen |
| Elaeagnaceae | 726 | 1.5 | 1089 | Random flip + crop combination; Color balance (white point correction); Motion blur (random angle) | Improve multi-angle recognition of surface tubercles |
| Gentianaceae | 91 | 8 | 728 | Scaling + perspective joint transform; CLAHE (Contrast Limited Adaptive Histogram Equalization); Local Gaussian blur | Enhance stability of striate patterns across resolutions |
| Hippophae | 198 | 6 | 1188 | Color space conversion (LAB enhancement); Random grid deformation; Directional blur (along long axis) | Optimize representation of oily layer optical properties |
| Juglandaceae | 353 | 4 | 1412 | Non-rigid deformation (thin-plate spline); Color jitter (brightness ±10%); Random rotation (±180°) | Handle morphological variation due to folding |
| Lamiaceae | 375 | 3 | 1125 | Multi-view perspective projection; Color channel weighted blending; Anisotropic Gaussian filtering | Enhance symmetry-breaking feature extraction |
| Lycopodiaceae | 312 | 4 | 1248 | Frequency domain enhancement (high-frequency boost); Random cropping (retain spiny structures); Color normalization | Increase contrast of tiny germination structures |
| Ostrya (Betulaceae) | 103 | 8 | 824 | Light Gaussian noise (σ = 0.5); Random flip (p = 0.5); Local color perturbation | Prevent overfitting, preserve aperture ring authenticity |
| Picea (Pinaceae) | 527 | 2.5 | 1317 | 3D volume rendering view simulation; Multi-scale Gaussian pyramid fusion; Color channel shift | Correct morphological distortion due to angle in air sac structures |
| Pinus (Pinaceae) | 177 | 6.5 | 1150 | Selective enhancement of air sac regions; Polarized light effect simulation; Non-uniform illumination synthesis | Improve classification specificity of bisaccate structures |
| Poaceae | 329 | 3.5 | 1151 | Morphological dilation/erosion (kernel = 3 × 3); Random perspective deformation; Color space enhancement | Enhance rotational invariance of monoporate pollen |
| Ranunculaceae | 103 | 8 | 824 | Focus stack simulation (multi-plane blending); Color inversion (p = 0.2); Elastic deformation | Handle randomness in surface granule distribution |
| Salix (Salicaceae) | 238 | 5 | 1190 | Frequency-spatial joint enhancement; Random occlusion (≤10%); HSV color space perturbation | Improve generalization for tricolpate pollen |
| Ulmus | 135 | 8 | 1080 | Selective Gaussian blur; Elastic deformation; Spherical projection transform | Compensate for spherical projection distortion, highlight aperture features |
| Zygophyllaceae | 209 | 5.5 | 1149 | Morphological skeleton enhancement; Color channel separation enhancement; Motion blur synthesis | Optimize edge feature extraction of spiny protrusions |
| Daphne (Thymelaeaceae) | 135 | 8 | 1080 | Multi-directional lighting rendering; Gaussian–Poisson noise mixing; Non-rigid deformation | Enhance recognition of reticulate patterns under varying illumination |
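The class-specific strategies in Table 2 combine geometric and photometric operations. As a minimal, self-contained NumPy sketch (not the authors' actual pipeline), a few of the tabulated operations — horizontal flip (p = 0.5), random crop at scale 0.8–1.0, and light Gaussian noise (σ = 0.5) — can be composed as follows; the image size and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray, p_flip: float = 0.5,
            crop_scale: tuple = (0.8, 1.0),
            noise_sigma: float = 0.5) -> np.ndarray:
    """Toy augmentation mirroring a few Table 2 operations:
    random horizontal flip, random crop, light Gaussian noise."""
    out = img.astype(np.float32)
    if rng.random() < p_flip:                 # horizontal flip (p = 0.5)
        out = out[:, ::-1]
    scale = rng.uniform(*crop_scale)          # random crop (scale 0.8-1.0)
    h, w = out.shape[:2]
    ch, cw = int(h * scale), int(w * scale)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    out = out[y0:y0 + ch, x0:x0 + cw]
    out = out + rng.normal(0.0, noise_sigma, out.shape)  # noise (sigma = 0.5)
    return np.clip(out, 0, 255)

img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
aug = augment(img)
```

In practice a dedicated library would handle the perspective, elastic, and frequency-domain transforms listed in the table; the point here is only the composition of per-class operations with their intensity parameters.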
Table 3. Training parameters and experimental setup for the Pollen-YOLO model.

| Parameter | Value |
| --- | --- |
| Epochs | 200 |
| Batch size | 16 |
| Workers | 4 |
| Learning rate | 0.001 |
| Optimizer | Adam |
| Input image size | 640 |
| Ratio of training set to validation set | 8:2 |
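For readers reproducing a comparable setup, the Table 3 parameters map onto a standard YOLO training call. The sketch below is a hypothetical configuration, not the authors' code; the Ultralytics package, weights file, and dataset YAML names are assumptions.

```python
# Hypothetical reproduction of the Table 3 setup. The keyword names follow
# the Ultralytics YOLO training API; values are taken from Table 3.
train_cfg = dict(
    epochs=200,        # Table 3: Epochs
    batch=16,          # Table 3: Batch size
    workers=4,         # Table 3: Workers
    lr0=0.001,         # Table 3: Learning rate
    optimizer="Adam",  # Table 3: Optimizer
    imgsz=640,         # Table 3: Input image size
)

# from ultralytics import YOLO
# model = YOLO("yolo11n.pt")  # backbone later replaced by PollenSENet
# model.train(data="tppol23.yaml", **train_cfg)  # 8:2 split set in the YAML
```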
Table 4. Evaluation metrics used in this study, including their mathematical definitions and analytical purposes.

| Metric | Formula | Definition | Purpose |
| --- | --- | --- | --- |
| Loss function (L) | $L = -\sum_{i=1}^{C} y_i \log(p_i)$ | Measures the error between predicted and ground-truth values across C classes. In YOLO, it combines bounding box regression loss, classification loss, and distribution focal loss. | Evaluates training stability and effectiveness of learning rate scheduling. |
| Precision | $\mathrm{Precision} = \frac{TP}{TP + FP}$ | Proportion of correctly predicted positives among all predicted positives. | Reflects the accuracy of positive predictions. |
| Recall | $\mathrm{Recall} = \frac{TP}{TP + FN}$ | Proportion of actual positives correctly identified by the model. | Reflects the completeness of positive detection. |
| Mean Average Precision (mAP) | $\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i$ | The mean of AP values across all N classes. Includes mAP@0.5 (IoU = 0.5) and mAP@0.5:0.95 (averaged across IoU thresholds from 0.5 to 0.95). | Provides a comprehensive measure of detection precision and localization. |
| F1 Score | $F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ | Harmonic mean of precision and recall. | Provides a balanced evaluation when precision and recall are uneven. |
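The formulas in Table 4 translate directly into code. The helper below is an illustrative sketch (not part of Pollen-YOLO): it computes precision, recall, and F1 from raw detection counts, and mAP as the mean of per-class AP values.

```python
import numpy as np

def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from detection counts (Table 4 formulas)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

def mean_average_precision(ap_per_class: list) -> float:
    """mAP as the mean of per-class AP values."""
    return float(np.mean(ap_per_class))

# Example: 90 true positives, 10 false positives, 7 false negatives.
m = detection_metrics(tp=90, fp=10, fn=7)
# precision = 0.90, recall ~ 0.928, f1 ~ 0.914
```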
Table 5. Ablation study results on the TPPOL23 dataset.

| Experiment ID | Model Variant | Precision (%) | Recall (%) | F1-Score (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | YOLOv11 (Original Backbone) | 78.32 | 88.2 | 83.26 | 80.56 | 70.25 |
| 2 | +ResNet-152 Backbone | 78.59 | 89.5 | 84.05 | 82.32 | 72.1 |
| 3 | +ResNet-152 + Dual-Modal Pooling (Ours) | 84.5 | 91.1 | 87.8 | 81.69 | 73.95 |
| 4 | +ResNet-152 + Standard SE Attention | 88.8 | 90.3 | 89.55 | 93.2 | 73.2 |
| 5 | +ResNet-152 + DPEB (Ours) | 90.1 | 92.4 | 91.25 | 94.8 | 74.9 |
| 6 | +PollenSENet (Full, Ours) | 90.67 | 93.16 | 91.92 | 95.29 | 75.79 |
Table 6. Comparison of different models on the TPPOL23 dataset.

| Model | Precision (%) | Recall (%) | F1-Score (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3 | 82.5 | 87.3 | 84.8 | 86.2 | 71.4 |
| YOLOv4 | 86.9 | 89.2 | 88 | 90.5 | 73.6 |
| YOLOv7 | 88.3 | 90.1 | 89.2 | 92.7 | 74.5 |
| EfficientDet-D3 | 85.6 | 88.7 | 87.1 | 89.8 | 72.1 |
| Pollen-YOLO (Ours) | 90.7 | 93.2 | 91.9 | 95.3 | 75.8 |
Table 7. Quantitative performance metrics of Pollen-YOLO for individual pollen taxa.

| Class | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | F1-Score |
| --- | --- | --- | --- | --- | --- |
| Apiaceae | 0.864 | 0.806 | 0.888 | 0.514 | 0.834 |
| Asteraceae | 0.872 | 0.976 | 0.955 | 0.742 | 0.921 |
| Brassicaceae | 0.917 | 0.89 | 0.952 | 0.718 | 0.903 |
| Betulaceae | 0.853 | 0.937 | 0.951 | 0.762 | 0.893 |
| Caprifoliaceae | 0.951 | 0.964 | 0.992 | 0.928 | 0.957 |
| Chenopodiaceae | 0.938 | 0.913 | 0.968 | 0.83 | 0.925 |
| Corylus | 0.929 | 0.902 | 0.956 | 0.806 | 0.915 |
| Cyperaceae | 0.903 | 0.953 | 0.968 | 0.827 | 0.927 |
| Daphne | 0.948 | 0.925 | 0.981 | 0.789 | 0.936 |
| Elaeagnaceae | 0.896 | 0.947 | 0.937 | 0.703 | 0.921 |
| Gentianaceae | 0.95 | 0.923 | 0.983 | 0.778 | 0.936 |
| Hippophae | 0.843 | 0.965 | 0.906 | 0.765 | 0.900 |
| Juglandaceae | 0.972 | 0.997 | 0.993 | 0.927 | 0.984 |
| Lamiaceae | 0.762 | 0.877 | 0.839 | 0.512 | 0.815 |
| Lycopodiaceae | 0.968 | 0.993 | 0.994 | 0.784 | 0.980 |
| Ostrya | 0.946 | 0.913 | 0.979 | 0.78 | 0.929 |
| Picea | 0.942 | 0.996 | 0.994 | 0.945 | 0.968 |
| Pinus | 0.953 | 0.886 | 0.934 | 0.809 | 0.918 |
| Poaceae | 0.923 | 0.98 | 0.985 | 0.839 | 0.951 |
| Ranunculaceae | 0.838 | 0.881 | 0.919 | 0.585 | 0.859 |
| Salix | 0.938 | 0.953 | 0.979 | 0.778 | 0.945 |
| Ulmus | 0.842 | 0.941 | 0.903 | 0.581 | 0.889 |
| Zygophyllaceae | 0.902 | 0.909 | 0.963 | 0.81 | 0.905 |
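As a consistency check, the macro averages of the per-class values in Table 7 can be recomputed directly; up to per-class rounding they agree with the overall precision and recall reported for the full PollenSENet variant in Table 5 (90.67% and 93.16%).

```python
import numpy as np

# Per-class precision and recall copied from Table 7 (23 taxa, table order).
precision = np.array([0.864, 0.872, 0.917, 0.853, 0.951, 0.938, 0.929,
                      0.903, 0.948, 0.896, 0.950, 0.843, 0.972, 0.762,
                      0.968, 0.946, 0.942, 0.953, 0.923, 0.838, 0.938,
                      0.842, 0.902])
recall = np.array([0.806, 0.976, 0.890, 0.937, 0.964, 0.913, 0.902,
                   0.953, 0.925, 0.947, 0.923, 0.965, 0.997, 0.877,
                   0.993, 0.913, 0.996, 0.886, 0.980, 0.881, 0.953,
                   0.941, 0.909])

macro_p = precision.mean()  # ~0.9065, vs. 90.67% in Table 5 (row 6)
macro_r = recall.mean()     # ~0.9316, vs. 93.16% in Table 5 (row 6)
```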
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shi, X.; Hou, G.; Wang, F.; Li, H. Pollen-YOLO: A Deep Learning Framework for Automated Pollen Identification and Its Application to Palaeoecological Reconstruction on the Tibetan Plateau. Quaternary 2026, 9, 6. https://doi.org/10.3390/quat9010006

