1. Introduction
In a complex physical world characterized by high uncertainty, accurate autonomous localization from weak and sparse signals remains a frontier problem in robotics and automation [
1]. Odor source localization (OSL) is particularly relevant to hazardous source inspection in confined spaces, deep-space exploration, autonomous plume tracking by micro aerial vehicles, and post-disaster search-and-rescue missions [
2,
3,
4]. Unlike light or sound, however, odor molecules are transported primarily by turbulent fluid motion and therefore form fragmented, intermittent plumes rather than smooth spatial gradients. Conventional gradient-following chemotaxis or purely upwind anemotaxis can become trapped in local optima under high-Reynolds-number turbulence or degenerate into inefficient random wandering once odor cues are lost. These limitations have motivated increasing interest in biologically inspired approaches that can operate robustly under intermittent sensing conditions [
5,
6].
In nature, many insects can locate odor sources within highly turbulent environments. Among them, hymenopterans, especially bumblebees [7], honeybees [8,9], and wasps, display remarkable olfactory navigation and spatial cognition. In broad natural environments with highly complex airflow, these insects can precisely locate food sources and subsequently return to the nest using weak floral odor cues [
10]. Behavioral studies based on the Proboscis Extension Response (PER) and related conditioning paradigms indicate that this outstanding navigational ability is supported not only by instantaneous perception but also by strong coupling between olfactory learning and memory [
11]. A broad body of work further suggests that bumblebee olfactory memory spans multiple time scales [
12,
13]. Short-term memory helps insects respond to nearby turbulent odor patches, whereas longer-lasting memory supports directional persistence and reuse of previously rewarding spatial information when signals become intermittent [
14,
15]. This property is especially relevant to robotic search in turbulent environments, where brief signal loss is common. Taken together, these biological findings suggest that introducing bumblebee olfactory cognition and short- and long-term memory mechanisms into robotic navigation algorithms may provide an effective route to overcoming the bottlenecks of odor source localization in turbulent environments [
16,
17].
Despite these biological insights, translating such mechanisms into robotic systems remains challenging [
18]. In turbulent environments, sensor observations are sparse, noisy, and often reduced to binary plume encounter or non-encounter events. Accordingly, odor source localization is fundamentally a sequential decision-making problem in a partially observable environment [
19,
20]. The combination of a Hidden Markov Model (HMM) and a Partially Observable Markov Decision Process (POMDP) provides a natural foundation for this class of problems [
21]. Within this framework, the HMM models the time-evolving odor plume and wind field, while the POMDP maintains a belief state over possible source locations and supports action selection under uncertainty. However, both components depend strongly on incoming observations. When detections become rare, these methods often exhibit unstable trajectories, inefficient exploration, or failure under prolonged signal loss. In particular, conventional probabilistic models lack mechanisms to preserve directional continuity or to exploit accumulated experience when sensory input becomes unreliable.
To address this limitation, this paper proposes a novel bio-inspired navigation algorithm that integrates the POMDP-HMM framework with bumblebee olfactory cognition [
22]. The central idea is to retain probabilistic inference as the backbone for handling turbulence-induced uncertainty while introducing a higher-level spatial memory module with both short-term and long-term components. Short-term memory is used to discourage repeated local revisits and thereby improve global exploration efficiency, whereas long-term directional memory preserves attraction toward historically informative regions and stabilizes the search direction when instantaneous cues become unreliable. This design bridges purely mathematical estimation and persistent, experience-dependent biological behavior.
Situated at the intersection of bionics and robotic control engineering, this study aims to achieve robust odor source localization in complex turbulent environments. The work comprises three components (biological experiments, mathematical modeling, and high-fidelity simulations) and makes the following contributions:
(i) A bio-inspired navigation architecture integrating probabilistic modeling with biologically grounded short-term and long-term memory is constructed.
(ii) Behavioral experiments quantify how repeated odor–reward training strengthens both proboscis extension response retention and odor-triggered spatial preference in bumblebees.
(iii) Comparative simulations show that the proposed framework improves search success, path efficiency, and robustness under intermittent plume conditions.
The remainder of this paper is organized as follows. Section 2 presents the biological experiments, Section 3 establishes the integrated navigation framework, Section 4 reports comparative simulation studies under different flow field conditions, Section 5 discusses the main findings, Section 6 summarizes the limitations and engineering migration feasibility, and Section 7 concludes the paper. The proposed navigation framework consists of three layers: the probabilistic estimation layer, the memory stabilization layer, and the adaptive fusion and policy-generation layer. It is hereinafter referred to as “Bio-Nav”. The nomenclature of variables is given in Table A1.
2. Materials and Methods
To support the design of Bio-Nav at the behavioral level, the biological experiments in this study were not treated as independent observations of insect behavior. Instead, they were organized to provide direct evidence for the two functional requirements of the navigation algorithm: (i) odor–reward association must be learnable and strength-dependent, (ii) learned odor cues must be transferable into persistent spatial preference. Accordingly, this design allowed the behavioral evidence to map directly onto the memory formation, memory dominance, and adaptive fusion mechanisms later implemented in Bio-Nav.
In the context of robotic odor source localization, it is insufficient to show only that bumblebees can detect odor and respond to reward. What is more important is whether odor experience can be accumulated, whether it can be transformed into spatially structured guidance, whether the resulting memory weakens over time, and whether previously stored experience can compete with or complement current online odor input. These questions determine whether a navigation algorithm should include memory modules, how memory strength should be updated, and how online sensory evidence should be fused with stored internal states. For this reason, the experiments reported in this section were designed as a behavioral evidence chain for the later three-layer architecture of Bio-Nav.
2.1. Bumblebee Sample Preparation
Adult worker bumblebees were collected at Sun Yat-sen University (Shenzhen, China) (Latitude 22.46° N, Longitude 113.54° E) and reared in hives. The samples were adult worker bumblebees of Bombus terrestris, with a body mass of 200 ± 100 mg and an estimated age range of 20–201360 days after emergence. Temperature and humidity were maintained at 25 °C and 50%, respectively. Bumblebee behavior outside the hive was recorded using a digital camera. All experiments were conducted within 24 h of capturing bees from the hive. We confirmed that no special permits were required for the experimental location/activities. Experiments were conducted from 15 June to 15 July 2025.
2.2. Classical Conditioning Experiment
The classical conditioning experiment was designed to determine whether bumblebees could establish a stable odor–reward association and whether the strength of this association depended on training frequency [
23]. A modified centrifuge tube was used as a restraint device to gently immobilize the body while allowing free movement of the head, antennae, and proboscis. Carnation essential oil (concentration: 20%; manufacturer: Nefertari) was delivered as the conditioned stimulus, and sugar water was used as the unconditioned reward (
Figure 1a) [
24]. In each training trial, the bee was first exposed to the odor for 10 s, after which sugar water was presented. The 10 s odor-exposure period and the 5–10 min inter-trial interval were selected according to standardized PER-conditioning procedures reported in previous insect olfactory-learning studies and were further confirmed in preliminary trials as sufficient to evoke antennal sampling without causing apparent fatigue or habituation [
25]. Following the resting interval, the same procedure was repeated (
Figure 1b).
A total of 60 adult worker bumblebees with similar body size were randomly assigned to three groups (20 individuals per group). Group I received 1 training trial, Group II received 5 training trials, and Group III received 10 training trials. After training, the proboscis extension response (PER) to the same odor was tested at multiple post-training time points, and the temporal change in response probability was analyzed [
25].
The results showed that PER retention increased monotonically with training frequency. Bumblebees trained for 5 or 10 trials displayed both higher response probability and longer retention than bees trained for a single trial. Across groups, response probability decreased gradually with increasing post-training delay, but the rate of decline was substantially smaller in the higher-training groups. Statistical comparison indicated significant effects of training frequency and retention interval on PER expression. In Group I, a certain proportion of responses could be observed within a short period of time, but the retention time was relatively short (
Figure 1c). As the delay increased, the response rate decreased rapidly. The response of Group II to the same odor stimulus was more stable, and a higher PER incidence was maintained at multiple time points. Group III showed the clearest advantage in memory retention, with the highest overall response probability and a recognizable response level even after longer delays. Thus, the more training trials a bumblebee receives, the more stable its olfactory memory becomes and the longer its behavioral response to olfactory cues lasts [
25].
These findings indicate that odor memory in bumblebees is not a transient reflex-like reaction, but a trainable internal state whose behavioral expression depends on prior experience. In other words, repeated odor–reward pairing increases the behavioral salience of the associated cue. For Bio-Nav, this result provides the biological basis for assigning accumulated behavioral value to odor-associated states rather than treating each odor encounter as a memoryless instantaneous event.
2.3. Olfactorily Triggered Spatial Memory Experiment
To test whether learned odor cues could guide free movement in space rather than merely triggering reflexive responses in restrained individuals [
26], we performed a spatial memory experiment using a square arena of 200 mm × 200 mm with a 30 mm entrance channel at the bottom. Odor-release ports were located at the top (Source A) and at the lower-left sidewall (Source B), while bumblebee release positions were located at the middle of the bottom edge (Start) (
Figure 2a). A total of 60 bumblebees were divided into six groups for two bio-experiments.
In bio-experiment 1, 30 bumblebees were randomly divided into Groups I–III. Groups I and II received 5 and 10 pre-training trials, respectively, in which bumblebees were released from Start and rewarded at Source A. Group III received no pre-training. After pre-training, all bees were released again from Start under unchanged conditions, and their trajectories were recorded individually. In bio-experiment 2, another 30 bumblebees were divided into Groups IV–VI. Groups IV and V received 5 and 10 pre-training trials, respectively, from Start to Source A, while Group VI received no pre-training. However, during the testing phase, the experimental conditions changed. The odor and sugar water rewards were placed at Source B. The path of every bumblebee was recorded under the new condition (
Figure 2d).
In bio-experiment 1, most bumblebees eventually located Source A, but clear between-group differences were observed in path organization. Pre-trained bees exhibited more concentrated and efficient trajectories than untrained controls, and the effect was stronger in Group II than in Group I. These results indicate that repeated odor–space pairing improves path convergence and reduces blind exploration (
Figure 2e). In bio-experiment 2, all groups eventually located Source B, but Groups IV and V showed a marked tendency during the early phase of the trial to deviate toward the previously trained direction (toward Source A), even though no odor was released there during testing. By contrast, Group VI moved more directly toward the actual source and showed no stable directional bias (
Figure 2f).
To quantify this effect, the odor information perceived at the entrance was defined as an X-direction cue, whereas the spatial bias established during pre-training was defined as an orthogonal Y-direction cue. The Y-direction displacement was therefore used as an indicator of odor-triggered spatial memory strength (
Figure 2b). Both the mean and maximum Y-direction displacements were larger in Groups IV and V than in Group VI, and Group V exceeded Group IV (
Figure 2c).
These findings demonstrate that familiar odor cues can trigger a persistent directional spatial bias rather than simply eliciting local attraction [
7,
27]. For Bio-Nav, this result directly supports the introduction of long-term directional reference memory, because the behavioral evidence suggests that previously rewarding odor encounters are encoded as structured spatial preferences that continue to influence future navigation.
2.4. Memory Decay Experiment
To investigate whether odor-triggered spatial memory decays over time and whether weakened memory traces can be reactivated by familiar odor cues, a memory decay and reactivation experiment was designed based on the previous spatial memory paradigm. The group labels in this experiment were reset and do not refer to the Groups I–VI used in
Section 2.3. A total of 60 bumblebees were divided into six new groups according to a two-factor design: training intensity (5 or 10 pre-training trials from Start to Source A) and retention interval (10 min, 1 h, or 3 h). In the test phase, all bees were released from Start and rewarded at Source B, while Source A remained unrewarded (
Figure 3a).
Figure 3b further illustrates how the behavioral expression of odor-triggered spatial memory depends jointly on training intensity and retention interval. A larger initial yaw angle and longer correction latency indicate stronger interference from the previously rewarded direction, whereas a shorter path length and arrival time indicate more efficient adaptation to the current rewarded source. Therefore, these four parameters jointly describe the dynamic transition from memory-dominated navigation to current-cue-dominated correction.
Two-factor ANOVA was used to evaluate the effects of training intensity and retention interval. The results showed main effects of both factors on initial yaw angle, path length, arrival time, and correction latency. Interaction effects were also observed, indicating that memory decay was jointly modulated by prior training strength and retention duration (
Figure 3b).
2.5. Cue-Conflict Decision Experiment
To determine how bumblebees resolve conflict between previously learned spatial preference and current online odor evidence, a cue-conflict decision experiment was conducted. Eighty bumblebees were divided into 8 groups, and they were first trained from Start to Source A under repeated odor–reward pairing, thereby establishing a stable odor-linked directional memory in the arena (
Figure 4a). During the testing phase, every bumblebee was released from Start. Source A no longer provided sugar water rewards or emitted odors; Source B, according to the group, provided 0%, 5%, 10%, or 20% odor cues while also offering sugar water rewards (
Figure 4b). This design allowed “memory strength” and “current perception reliability” to be controlled separately, so that their relative weights in the bumblebee’s navigation decisions could be observed.
Four behavioral indicators were recorded. (i) Stay time in area A characterizes the individual’s dependence on the old memory location. (ii) Decision latency is the time from release to the formation of a stable orientation. (iii) Path length evaluates overall navigation efficiency under conflicting conditions. (iv) The conflict index describes how the individual weighs the conflicting cues: it is the normalized difference between the cumulative displacement toward the old memory location and the cumulative displacement toward the actual odor source over the first segment of the trajectory. Statistical analysis revealed interaction effects in all four parameters (
Figure 4c). The four behavioral variables in
Figure 4 were selected to characterize different aspects of cue-conflict resolution. Stay time in area A and conflict index mainly reflect the influence of the old memory trace, whereas decision latency and path length reflect the behavioral cost of resolving inconsistency between memory and current odor evidence. The interaction between training intensity and odor concentration therefore provides direct behavioral support for the adaptive fusion mechanism later implemented in Bio-Nav.
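As a rough illustration of the conflict index described above, the following Python sketch computes a normalized difference between early progress toward the old memory location and early progress toward the actual source. The function name `conflict_index`, the use of distance reduction per step as the measure of displacement toward a target, and the normalization by the sum of absolute values are all assumptions made for illustration; the exact definition used in the experiments is not given in the text.

```python
import math

def conflict_index(path, old_target, new_target, n_steps):
    """Normalized difference of cumulative progress toward two targets.

    path: list of (x, y) positions; only the first n_steps steps are used.
    Returns a value in [-1, 1]: positive means the early trajectory favored
    the old (memory) location, negative means it favored the actual source.
    """
    def progress(p, q, target):
        # Reduction in distance to the target achieved by the step p -> q.
        return math.dist(p, target) - math.dist(q, target)

    toward_old = 0.0
    toward_new = 0.0
    for p, q in zip(path[:n_steps], path[1:n_steps + 1]):
        toward_old += progress(p, q, old_target)
        toward_new += progress(p, q, new_target)
    denom = abs(toward_old) + abs(toward_new)
    return 0.0 if denom == 0 else (toward_old - toward_new) / denom
```

Under this assumed definition, a trajectory that initially heads straight toward Source A while making no net progress toward Source B yields an index close to 1.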
Behavioral results showed that the decision pattern was not all-or-none. Many individuals displayed an early trajectory component biased toward the previously learned direction, followed by gradual correction toward the currently rewarded source. The final success rate remained high, indicating that the new odor cue was still behaviorally effective, but the early trajectory structure revealed that the stored memory trace strongly influenced initial decision-making. This result provides the strongest behavioral rationale for the adaptive fusion mechanism in Bio-Nav. If navigation were driven only by current odor evidence, the early bias toward the old direction would not appear. If navigation were driven only by memory, bees would fail to correct their trajectories toward the actual source. The observed trajectory pattern therefore suggests that bumblebee navigation relies on dynamic weighting between online evidence and stored experience, which is the principle implemented by the adaptive fusion layer of the algorithm.
2.6. Summary of Biological Experiments
Taken together, the biological experiments demonstrate that bumblebee odor-guided navigation cannot be explained by instantaneous sensing alone. Classical conditioning shows that odor-associated value is learnable and training-dependent. The olfactorily triggered spatial memory experiment shows that this learned value becomes directionally organized and continues to influence later navigation. The memory decay experiment demonstrates that the internal representation is dynamic rather than static, and that weakened memory traces can be re-expressed under familiar sensory cues. Finally, the cue-conflict experiment reveals that navigation emerges from context-dependent arbitration between stored experience and current online odor information.
These results collectively motivate the three-layer design of Bio-Nav. The probabilistic estimation layer accounts for the uncertainty of the external sensory environment, the memory stabilization layer accounts for the persistence and updating of internal experience, and the adaptive fusion layer accounts for the dynamic coordination between the two.
3. Bio-Inspired Olfactory Navigation Framework
Bio-Nav is organized as a three-layer navigation architecture rather than as a simple combination of probabilistic modeling and bio-inspired heuristics. The first layer, probabilistic estimation, infers the most likely source location and the most likely plume distribution under partial observability and turbulent transport. The second layer, memory stabilization, preserves directional continuity and suppresses inefficient local revisits when sensory evidence becomes sparse or intermittent. The third layer, adaptive fusion, dynamically arbitrates between online probabilistic evidence and stored spatial experience to generate the reward landscape used for action selection.
This organization is motivated directly by the biological experiments presented in
Section 2. The classical conditioning experiment shows that odor-associated value can be learned and strengthened through repetition. The olfactorily triggered spatial memory experiment shows that learned odor cues become directionally organized and continue to bias later navigation. The memory decay experiment shows that this memory is dynamic and time dependent. The cue-conflict experiment further shows that stored experience and current odor input are not used independently but are weighted adaptively according to the current behavioral context. Bio-Nav translates these behavioral principles into a unified engineering framework for odor source localization in turbulent environments.
3.1. Probabilistic Estimation Layer
3.1.1. Search-Space Discretization
The odor source localization task is defined in a two-dimensional rectangular search domain $\Omega$, which is discretized into a regular grid to facilitate probabilistic state estimation and action selection. Suppose the domain is divided into $N_r$ rows and $N_c$ columns, yielding a total of $N = N_r N_c$ grid cells. Each cell is treated as a candidate hidden source state. Let $\mathcal{S} = \{1, 2, \ldots, N\}$ denote the discrete state space. The geometric center of cell $i$ is denoted by $\mathbf{c}_i = (x_i, y_i)$.
To map between the one-dimensional state index $i$ and the two-dimensional grid coordinates $(r, c)$, the following bijection is defined as
$$r = \left\lfloor \frac{i-1}{N_c} \right\rfloor + 1, \qquad c = \operatorname{mod}(i-1,\, N_c) + 1, \qquad i = (r-1)\,N_c + c,$$
where $\operatorname{mod}(a, b)$ represents the remainder of $a$ divided by $b$, and $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. This discretization provides the spatial support for all subsequent maps in Bio-Nav, including the source-belief map, the plume-distribution map, and the memory map.
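The index mapping can be sketched in a few lines of Python. This is a minimal illustration that assumes 1-based, row-major indexing and square cells of side `cell_size`; the function names and the `origin` parameter are hypothetical and not taken from the paper.

```python
def index_to_grid(i, n_cols):
    """Map a 1-based cell index to 1-based (row, col), row-major order."""
    return (i - 1) // n_cols + 1, (i - 1) % n_cols + 1

def grid_to_index(row, col, n_cols):
    """Inverse of index_to_grid."""
    return (row - 1) * n_cols + col

def cell_center(i, n_cols, cell_size, origin=(0.0, 0.0)):
    """Geometric center (x, y) of cell i, assuming square cells."""
    row, col = index_to_grid(i, n_cols)
    return (origin[0] + (col - 0.5) * cell_size,
            origin[1] + (row - 0.5) * cell_size)
```

Because the two functions are exact inverses, every map used later (belief, plume, memory) can be stored as a flat vector and reshaped only for display.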
3.1.2. Advection–Diffusion Plume Model
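In turbulent environments, odor does not form a smooth and continuous concentration gradient. Instead, it appears as a fragmented and intermittent collection of filaments transported by both mean flow and stochastic turbulent diffusion. The motion of an odor filament at time $t$ can be modeled as
$$\dot{\mathbf{X}}(t) = \mathbf{U}(\mathbf{X}, t) + \mathbf{N}(t),$$
where $\mathbf{U}$ is the deterministic wind velocity and $\mathbf{N}$ is a zero-mean Gaussian perturbation representing turbulent fluctuation. The covariance of $\mathbf{N}$ reflects turbulence intensity.
If a filament is released from a source located at $\mathbf{X}_s$ at time $t_l$, the position of this filament $\mathbf{X}(t_k)$ at a subsequent time $t_k$ is
$$\mathbf{X}(t_k) = \mathbf{X}_s + \int_{t_l}^{t_k} \mathbf{U}(\mathbf{X}(\tau), \tau)\, d\tau + \int_{t_l}^{t_k} \mathbf{N}(\tau)\, d\tau.$$
Accordingly, the conditional probability density function $S_{ij}(t_l, t_k)$, which quantifies the likelihood that a plume detected by the robot at cell $i$ at time $t_k$ originated from a source at cell $j$ released at time $t_l$, can be approximated as:
$$S_{ij}(t_l, t_k) \approx \frac{1}{2\pi \sigma_x \sigma_y} \exp\left( -\frac{(x_i - x_j - v_x)^2}{2\sigma_x^2} - \frac{(y_i - y_j - v_y)^2}{2\sigma_y^2} \right).$$
Here, $v_x$ and $v_y$ are the wind-induced displacement components over $[t_l, t_k]$ derived from the integral of historical anemometer data, $\sigma_x$ and $\sigma_y$ are the standard deviations of the accumulated turbulent displacement over the same interval, and $(x_i, y_i)$ and $(x_j, y_j)$ are the centers of cells $i$ and $j$. This physical model forms the basis for both the source-belief update and the plume prediction process.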
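A minimal Python sketch of this transport likelihood is given below. It assumes that the wind record is a list of uniformly sampled `(vx, vy)` velocities, that the integral is approximated by a rectangle rule, and that the turbulent spread is an axis-aligned Gaussian whose variance grows linearly with travel time. The function names and the parameters `sigma_x` and `sigma_y` are illustrative assumptions.

```python
import math

def wind_displacement(wind_samples, dt):
    """Integrate sampled wind velocities (vx, vy) over time (rectangle rule)."""
    vx = sum(w[0] for w in wind_samples) * dt
    vy = sum(w[1] for w in wind_samples) * dt
    return vx, vy

def transport_density(cell_i, cell_j, wind_samples, dt, sigma_x, sigma_y):
    """Density that a filament released at cell_j lies at cell_i after the
    time span covered by wind_samples.

    Assumption: independent Gaussian spread along x and y, with variance
    sigma^2 * elapsed_time on each axis.
    """
    elapsed = len(wind_samples) * dt
    if elapsed <= 0:
        raise ValueError("need at least one wind sample and dt > 0")
    vx, vy = wind_displacement(wind_samples, dt)
    var_x = sigma_x ** 2 * elapsed
    var_y = sigma_y ** 2 * elapsed
    dx = cell_i[0] - cell_j[0] - vx
    dy = cell_i[1] - cell_j[1] - vy
    norm = 2.0 * math.pi * math.sqrt(var_x * var_y)
    return math.exp(-0.5 * (dx * dx / var_x + dy * dy / var_y)) / norm
```

The density peaks at the point reached by pure advection and falls off with distance from it, which is why a detection far downwind of a candidate cell still lends that cell some support.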
3.1.3. POMDP-Based Source-Belief Update
Because the robot cannot directly observe the true source location, odor source localization is formulated as a Partially Observable Markov Decision Process (POMDP) [
28]. The hidden state is the true source cell, while the robot receives only incomplete and noisy observations from local odor sensing.
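Let the Observation Space ($\mathcal{Z}$) be
$$\mathcal{Z} = \{0, 1\},$$
where $z_k = 1$ denotes plume detection at time $t_k$ and $z_k = 0$ denotes no detection.
Observation Probability ($O$) defines the probability $P(z_k \mid s = j)$ of an observation given that the source is in cell $j$. If the robot at time $t_k$ fails to detect a plume (event $z_k = 0$), it implies that no filaments released from the source (at cell $j$) from $t_0$ to $t_k$ have reached the current location. Defining $\gamma_j(t_0, t_k)$ as the joint probability of “missed detection” given a source at $j$, the observation model is:
$$P(z_k \mid s = j) = \begin{cases} \gamma_j(t_0, t_k), & z_k = 0, \\ 1 - \gamma_j(t_0, t_k), & z_k = 1. \end{cases}$$
The Source-Belief State vector represents the belief map, and the core advantage of this method is the recursive Bayesian update of the belief state $\mathbf{b}_k$. $\mathbf{b}_k$ can be defined as:
$$\mathbf{b}_k = [\, b_k(1), b_k(2), \ldots, b_k(N) \,]^{\top}, \qquad b_k(j) = \frac{P(z_k \mid s = j)\, b_{k-1}(j)}{\sum_{m=1}^{N} P(z_k \mid s = m)\, b_{k-1}(m)},$$
where $b_k(j)$ is the posterior probability that the source is in cell $j$ and $N$ is the number of grid cells.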
This belief map answers the question: Where is the source most likely to be? It is therefore the exploitation-oriented component of the probabilistic estimation layer.
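The recursive update of the belief map can be illustrated with a short Python sketch. It assumes that the per-cell probability of a missed detection has already been computed from the plume model and is passed in as `miss_prob`; the function name and the fallback behavior when the observation has zero likelihood everywhere are assumptions.

```python
def update_belief(belief, miss_prob, detected):
    """One recursive Bayes step over candidate source cells.

    belief[j]: prior probability that the source is in cell j.
    miss_prob[j]: probability of no detection at the robot's current cell
        if the source were in cell j.
    detected: True for a plume detection, False otherwise.
    """
    likelihood = [(1.0 - g) if detected else g for g in miss_prob]
    unnorm = [l * b for l, b in zip(likelihood, belief)]
    total = sum(unnorm)
    if total == 0.0:
        # Observation impossible under every hypothesis: keep the prior.
        return list(belief)
    return [u / total for u in unnorm]
```

A detection concentrates probability on cells that could plausibly have delivered a filament to the robot, whereas a non-detection shifts probability toward cells from which the robot's position is unlikely to be reached.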
3.1.4. HMM-Based Plume Distribution Prediction
While the POMDP estimates “where the source is”, the robot also needs to know “where the plume is likely to be” to recover the trail if the signal is lost. We employ a Hidden Markov Model (HMM) to predict the dynamic propagation of the plume.
Let be the probability that cell contains a detectable filament at time . The propagation of filaments is modeled as a Markov process with a state transition matrix , where element represents the probability of a filament moving from to under current wind conditions.
The Plume-Distribution Map
is calculated by superimposing the distributions of filaments released at all historical timesteps
. Utilizing the forward algorithm of HMM, the map is updated recursively:
where
is the current source probability vector. This model leverages historical wind data to predict curved or segmented plume shapes, providing a crucial “re-acquisition” guide for the robot during the exploration phase.
The HMM-based plume-distribution map answers the question: Where is odor most likely to be encountered next? It is therefore the exploration-oriented component of the probabilistic estimation layer.
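One plausible reading of the recursive plume prediction is sketched below in Python. At each step, new filaments are injected in proportion to the current source belief, and the whole map is then advected through the transition matrix. The ordering of these two operations, the `release_rate` parameter, and the rescaling by the peak value are assumptions for illustration and may differ from the update actually used in Bio-Nav.

```python
def propagate_plume(plume, transition, source_belief, release_rate=1.0):
    """One forward step of the plume map.

    plume[i]: current relative likelihood of a filament in cell i.
    transition[i][j]: probability that a filament moves from cell i to j.
    source_belief[i]: current probability that the source is in cell i.
    """
    n = len(plume)
    # Inject newly released filaments at likely source cells.
    seeded = [plume[i] + release_rate * source_belief[i] for i in range(n)]
    # Advect every filament one step under the current wind.
    nxt = [0.0] * n
    for i in range(n):
        for j in range(n):
            nxt[j] += transition[i][j] * seeded[i]
    peak = max(nxt)
    # Rescale so the map is a relative likelihood in [0, 1].
    return [v / peak for v in nxt] if peak > 0 else nxt
```

With a transition matrix that shifts filaments one cell downwind per step, repeated calls trace out a plume extending from the believed source, including curved shapes when the matrix changes with the wind.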
3.1.5. Functional Role of the Probabilistic Estimation Layer
Together, the source-belief map and the plume-distribution map provide a statistically grounded estimate of the external environment under uncertainty. However, because both maps remain strongly dependent on incoming observations, their directional stability weakens when plume detection becomes rare, intermittent, or temporarily absent. This limitation motivates the introduction of the second layer, memory stabilization.
3.2. Memory Stabilization Layer
3.2.1. Biological Motivation
The biological experiments showed that bumblebee navigation is influenced not only by current odor input but also by previously acquired odor-linked spatial experience [
11,
14]. This influence is not static: it strengthens with repetition, decays with time, and can be reactivated by familiar cues. Bio-Nav translates this behavioral principle into a memory stabilization layer consisting of short-term memory (STM) and long-term memory (LTM) [
21].
STM is designed to reduce inefficient local revisits, which corresponds to an engineering analog of inhibition of return. LTM is designed to preserve the directional significance of previously informative odor encounters, thereby maintaining global search continuity under sparse sensory conditions.
3.2.2. Short-Term Memory
Let $M_S(i, t)$ denote the short-term memory intensity of cell $i$ at time $t$. When the robot visits cell $i$, its STM value is activated to a maximum level $M_S^{\max}$. Thereafter, it decays exponentially:
$$M_S(i, t) = M_S^{\max} \exp\bigl(-\lambda_S\,(t - t_i)\bigr),$$
where $t_i$ is the most recent visit time of cell $i$, and $\lambda_S$ is the STM decay constant.
Because STM represents recently explored locations, it is used as a repulsive term in decision-making. High STM values indicate that a region was recently sampled and should therefore be temporarily deprioritized. Functionally, STM suppresses oscillatory trajectories and improves local search efficiency.
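The exponential decay of STM can be written as a one-line function. In this sketch the decay constant is interpreted as a rate (per time step), and the default values are arbitrary placeholders; both choices are assumptions.

```python
import math

def stm_value(t, last_visit, m_max=1.0, decay_rate=0.2):
    """STM intensity of a cell at time t; zero if the cell was never visited."""
    if last_visit is None or t < last_visit:
        return 0.0
    return m_max * math.exp(-decay_rate * (t - last_visit))
```

A cell visited at the current step carries the full repulsive value `m_max`, and the penalty fades as the visit recedes into the past, so the robot is free to return once the region is no longer "fresh".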
3.2.3. Long-Term Directional Reference Memory
Let $M_L(i)$ denote the long-term memory intensity of cell $i$. LTM is updated when a strong odor event is detected. Unlike STM, which is strictly local and short-lived, LTM spreads the reinforcement toward the upwind region, reflecting the directional bias observed in the behavioral experiments.
If the robot detects a high-value odor event at cell $k$, the long-term memory update is defined as
$$M_L(i) \leftarrow M_L(i) + \eta \exp\left( -\frac{\lVert \mathbf{c}_i - (\mathbf{c}_k + \mathbf{d}_{\mathrm{up}}) \rVert^2}{2\sigma_L^2} \right),$$
where $\mathbf{c}_i$ is the center of cell $i$, $\eta$ is the reinforcement strength, $\mathbf{d}_{\mathrm{up}}$ is the upwind directional offset, and $\sigma_L$ controls the spatial spread of the reinforced region.
This formulation ensures that previously informative odor encounters continue to exert directional influence even when current observations are weak or absent. Functionally, LTM acts as a global directional anchor.
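The upwind reinforcement can be sketched as follows. The Gaussian kernel is an assumed choice for the spatial spread, and the function name and default parameter values are hypothetical.

```python
import math

def reinforce_ltm(ltm, centers, event_cell, upwind_offset,
                  strength=1.0, spread=2.0):
    """Add a Gaussian bump centered upwind of the detection cell (in place).

    ltm: list of LTM intensities, one per cell.
    centers: list of (x, y) cell centers.
    event_cell: index of the cell where the strong odor event occurred.
    upwind_offset: (dx, dy) shift from the detection cell toward upwind.
    """
    cx = centers[event_cell][0] + upwind_offset[0]
    cy = centers[event_cell][1] + upwind_offset[1]
    for i, (x, y) in enumerate(centers):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        ltm[i] += strength * math.exp(-d2 / (2.0 * spread ** 2))
    return ltm
```

Shifting the bump upwind rather than centering it on the detection cell is what turns a local odor encounter into a directional cue.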
3.2.4. Memory Decay and Reactivation
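Because biological memory is dynamic, both STM and LTM must evolve with time. STM decays rapidly by definition. LTM also decays, but at a slower rate:
$$M_L(i) \leftarrow \rho_L\, M_L(i) \quad \text{at each time step},$$
where $M_L(i)$ is the long-term memory intensity of cell $i$ and $\rho_L \in (0, 1)$ is the long-term memory retention coefficient.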
To model reminder-cue effects observed in the reactivation experiment, LTM can be refreshed when the current odor observation matches previously reinforced odor conditions:
where
is the reactivation gain and
denotes the current odor-triggered reactivation signal.
3.2.5. Composite Memory Map
The composite memory map $M(i)$ combines the repulsive STM component and the attractive LTM component:
$$M(i) = \beta_L\, M_L(i) - \beta_S\, M_S(i, t),$$
where $M_S$ and $M_L$ are the STM and LTM intensities of cell $i$, and $\beta_S$ and $\beta_L$ are non-negative coefficients controlling the relative influence of the two components.
After normalization, $M$ provides a stabilization field that indicates both where the robot should continue searching and where it should avoid revisiting. In this way, the memory stabilization layer complements the probabilistic estimation layer by preserving behavioral continuity when online sensory evidence becomes unreliable.
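A minimal sketch of the slow LTM decay and the composite memory map is given below. The multiplicative per-step decay, the weights `w_stm` and `w_ltm`, and the normalization by the largest absolute value (which keeps the sign of each cell) are assumptions made for illustration.

```python
def decay_ltm(ltm, retention=0.98):
    """Apply one step of slow multiplicative decay to every LTM cell."""
    return [retention * v for v in ltm]

def composite_memory(stm, ltm, w_stm=1.0, w_ltm=1.0):
    """Attractive LTM minus repulsive STM, scaled into [-1, 1]."""
    raw = [w_ltm * l - w_stm * s for s, l in zip(stm, ltm)]
    scale = max(abs(v) for v in raw)
    return [v / scale for v in raw] if scale > 0 else raw
```

Negative cells mark recently visited locations to avoid, and positive cells mark regions that earlier odor events flagged as worth revisiting.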
3.3. Adaptive Fusion and Policy Generation
Neither probabilistic estimation nor spatial memory is universally sufficient for navigation in turbulent environments. When odor observations are reliable and frequent, the robot should rely more strongly on the probabilistic source belief. When plume contact has been lost, however, the system should depend more heavily on plume prediction and stored directional memory. Therefore, Bio-Nav does not use fixed-weight integration. Instead, it performs adaptive fusion according to the current search context.
This design is consistent with the cue-conflict decision experiment, which showed that bumblebees do not follow either online odor evidence or stored spatial memory exclusively. Rather, they display context-dependent behavioral arbitration between the two [
22].
3.3.1. Fuzzy Inference Controller
To implement this arbitration mechanism, Bio-Nav uses a fuzzy inference system (FIS) [
29]. The controller takes three inputs and generates one output variable:
- •
Input 1: Sensed Plume Concentration (). Reflects the richness of information at the current location. Fuzzy sets: Low (L), Medium (M), High (H).
- •
Input 2: Time Since Last Detection (). Reflects the urgency of the search state. Fuzzy sets: Short (Sh), Average (Av), Long (Lo).
- •
Input 3: Local Memory Strength (). Reflects the historical value of the current vicinity. Fuzzy sets: Weak (W), Medium (M), Strong (S).
- •
Output: Fusion Weight (). A coefficient regulating the relative contribution of online probabilistic evidence and memory-based stabilization. Fuzzy sets: Very Small (VS), Small (S), Middle (MI), Large (L), Very Large (VL).
A rule base of 27 fuzzy rules is constructed based on biological heuristics. Representative rules include:
Rule A (Surging Behavior): IF Concentration is High AND Time Since Detection is Short, THEN the robot is likely in the plume center and confidence in the Source Map should be maximized; the Fusion Weight is Very Large (VL).
Rule B (Revisiting Behavior): IF Concentration is Low BUT Memory is Strong, THEN the robot has lost the plume but is in a high-value zone and should maintain a moderate focus on the source to prevent excessive drifting; the Fusion Weight is Middle (MI).
Rule C (Wide-Area Exploration): IF Time Since Detection is Long AND Memory is Weak, THEN the robot is lost and must rely on the Plume-Distribution Map (HMM) or explore new areas; the Fusion Weight is Small (S).
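The three representative rules can be illustrated with a minimal fuzzy-inference sketch. It is not the 27-rule controller itself: the normalization of inputs to [0, 1], the triangular membership breakpoints, the singleton output centers, and the weighted-average defuzzification are all assumptions introduced here for illustration, and the names `tri` and `fusion_weight` are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b.

    a == b gives a left shoulder; b == c gives a right shoulder.
    """
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed breakpoints on inputs normalized to [0, 1]; not taken from the paper.
LOW = (0.0, 0.0, 0.5)    # Low / Short / Weak
HIGH = (0.5, 1.0, 1.0)   # High / Long / Strong

# Assumed singleton centers for the five output fuzzy sets.
OUT = {"VS": 0.1, "S": 0.3, "MI": 0.5, "L": 0.7, "VL": 0.9}


def fusion_weight(conc, t_since, mem):
    """Evaluate Rules A-C (AND = min) and defuzzify by weighted average."""
    conc_low, conc_high = tri(conc, *LOW), tri(conc, *HIGH)
    t_short, t_long = tri(t_since, *LOW), tri(t_since, *HIGH)
    mem_weak, mem_strong = tri(mem, *LOW), tri(mem, *HIGH)

    fired = [
        (min(conc_high, t_short), OUT["VL"]),    # Rule A: surging
        (min(conc_low, mem_strong), OUT["MI"]),  # Rule B: revisiting
        (min(t_long, mem_weak), OUT["S"]),       # Rule C: wide-area exploration
    ]
    total = sum(strength for strength, _ in fired)
    if total == 0.0:
        # No representative rule fires; fall back to the middle weight.
        return OUT["MI"]
    return sum(strength * center for strength, center in fired) / total
```

With only three rules, intermediate inputs such as `fusion_weight(0.5, 0.5, 0.5)` fire nothing and fall back to the middle weight, which is one reason a complete rule base covering all input combinations is needed in practice.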
3.3.2. Dynamic Reward Construction
The adaptive fusion coefficient is then used to construct the immediate reward map [30] from the POMDP source belief, the HMM plume prediction, the composite memory map, and the memory-scaling factor.
This reward map answers the central decision question of Bio-Nav: How should online evidence and stored experience jointly determine the attractiveness of each candidate action region?
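As an illustration of how such a reward map might be assembled, the sketch below uses one plausible form: a convex combination of source belief and plume prediction weighted by the fusion coefficient, plus a scaled memory term. This specific form, the function name `reward_map`, and the parameter names `alpha` and `lam` are assumptions made here and should not be read as the exact equation used by Bio-Nav.

```python
def reward_map(belief, plume, memory, alpha, lam):
    """Assumed form: R = alpha * belief + (1 - alpha) * plume + lam * memory.

    belief, plume, memory: equally sized 2D lists (grid maps).
    alpha: fusion coefficient in [0, 1] produced by the fuzzy controller.
    lam:   memory-scaling factor.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * b + (1.0 - alpha) * p + lam * m
         for b, p, m in zip(b_row, p_row, m_row)]
        for b_row, p_row, m_row in zip(belief, plume, memory)
    ]
```

Under this assumed form, a large fusion coefficient makes the source belief dominate the reward, while a small one shifts attraction toward predicted plume regions; the memory term is added in both cases.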
3.3.3. Value-Iteration-Based Action Selection
Once the dynamic reward map is obtained, action selection is performed using value iteration. Let V denote the value function over grid cells. Based on the Bellman optimality equation [31], the value of each cell is computed from the dynamic reward map, the discount factor, and the set of 8-neighbor movement actions.
Importantly, value iteration is not treated here as an independent module parallel to probabilistic estimation or memory. Rather, it is the policy solver operating on the reward landscape generated by adaptive fusion. In this sense, adaptive fusion and value iteration together constitute the final decision layer of Bio-Nav.
The Bellman equation is repeatedly applied to update the value function V over the entire map until the value difference between two adjacent iterations becomes smaller than the predefined threshold ϵ, indicating convergence.
Policy generation: the action with the highest value under the converged value function is selected as the current optimal policy. The robot executes this action, moves to the next position, and repeats the process. At each time step, the algorithm thus generates a vector field pointing toward the globally optimal region, and the robot only needs to execute the optimal action for its current grid cell. This supports autonomous navigation through the whole process, from plume discovery to plume tracking and finally to source confirmation. The resulting path not only points toward the currently most probable source location but also accounts for plume re-acquisition and traversal of high-value memory regions.
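The procedure described above can be sketched as a small grid-based value iteration. The sketch assumes deterministic moves and a Bellman update of the form V(s) = R(s) + γ max over neighbors of V(s'); the default discount factor and convergence threshold are placeholders rather than the values used in the experiments, and only the 50-iteration cap is taken from the parameter settings. The grid must contain at least two cells.

```python
# 8-neighbor movement actions as (row, column) offsets.
ACTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbors(cell, rows, cols):
    """Yield (action, next_cell) pairs that stay inside the grid."""
    i, j = cell
    for di, dj in ACTIONS:
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield (di, dj), (ni, nj)


def value_iteration(reward, gamma=0.9, eps=1e-4, max_iter=50):
    """Synchronous value iteration on a 2D reward map (list of lists)."""
    rows, cols = len(reward), len(reward[0])
    V = [[0.0] * cols for _ in range(rows)]
    for _ in range(max_iter):
        new_V = [[0.0] * cols for _ in range(rows)]
        delta = 0.0
        for i in range(rows):
            for j in range(cols):
                best = max(V[ni][nj] for _, (ni, nj) in neighbors((i, j), rows, cols))
                new_V[i][j] = reward[i][j] + gamma * best
                delta = max(delta, abs(new_V[i][j] - V[i][j]))
        V = new_V
        if delta < eps:  # change between successive sweeps below threshold
            break
    return V


def greedy_action(V, cell):
    """Return the 8-neighbor action leading to the highest-valued cell."""
    rows, cols = len(V), len(V[0])
    action, _ = max(neighbors(cell, rows, cols), key=lambda pair: V[pair[1][0]][pair[1][1]])
    return action
```

For example, on a 5 × 5 reward map with a single rewarded cell in the top-right corner, `greedy_action` evaluated at the bottom-left corner returns the diagonal move toward that cell.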
3.4. Interpretation of the Three-Layer Architecture
In summary, Bio-Nav can be understood as a layered architecture in which each layer solves a different problem in odor-guided navigation.
The probabilistic estimation layer determines what the external environment currently suggests under uncertainty. It uses the POMDP source-belief map and the HMM plume-distribution map to estimate both source likelihood and plume reacquisition opportunity.
The memory stabilization layer preserves navigation continuity when online evidence becomes sparse, intermittent, or contradictory. By combining short-term repulsion from recently explored regions and long-term attraction toward historically valuable upwind areas, it prevents the robot from either oscillating locally or drifting aimlessly under signal loss.
The adaptive fusion layer determines how online evidence and stored experience should be weighted at each time step. By generating a context-dependent reward map and solving the corresponding action policy through value iteration, it allows Bio-Nav to switch smoothly between exploitation, plume reacquisition, and memory-guided persistence.
To clarify the complete information flow of Bio-Nav,
Figure 5 explicitly labels the sensor inputs, intermediate maps, and final action output. At each time step, the odor sensor provides local concentration and plume-encounter information, while the wind-speed sensor provides the local airflow vector for plume-transport prediction. These two sensory streams are not used independently; instead, they are jointly transformed into source-belief and plume-distribution maps by the probabilistic estimation layer. The robot-position history is then used to update STM and LTM, and the fuzzy inference system adaptively fuses online sensory evidence with memory-based stabilization. Finally, value iteration converts the fused reward landscape into an executable movement command for the robot controller.
The algorithm receives three categories of input at each time step: odor-sensor input, wind-sensor input, and robot-state/history input. The odor sensor provides the local odor concentration and the binary plume-encounter observation, which are used for source-belief updating, plume-contact evaluation, and fuzzy inference. The wind-speed sensor (anemometer) provides the local wind velocity vector, which is used by the advection–diffusion plume model and the HMM-based plume-distribution prediction module to estimate the likely propagation direction of odor packets. The robot-state/history input provides the current robot position, the recent trajectory, visited-cell records, and the time since the last plume detection. These inputs are first processed by the probabilistic estimation layer, where the POMDP module updates the source-belief map and the HMM module predicts the plume-distribution map. In parallel, the memory stabilization layer updates the short-term memory (STM) map, the long-term memory (LTM) map, and the composite memory map. The fuzzy inference system then integrates sensed plume concentration, time since last detection, and local memory strength to generate the adaptive fusion coefficient. Based on these maps and the fusion coefficient, the dynamic reward map is constructed and solved by value iteration to obtain the value function and the optimal policy. The final output of the algorithm is the optimal movement action, which is sent to the robot controller as the navigation command. After the robot executes the action and moves to the next grid cell, new odor, wind, and position measurements are collected, forming a closed-loop perception–memory–decision cycle.
3.5. Biological-to-Algorithmic Parameterization and Calibration Strategy
The current implementation of Bio-Nav uses biologically inspired but engineering-oriented parameters. Specifically, the STM decay constant, the LTM retention coefficient, the Gaussian spread of LTM reinforcement, and the fuzzy membership functions were initialized according to the qualitative behavioral tendencies observed in the experiments: recently visited regions should be temporarily suppressed, previously rewarding upwind regions should retain longer-lasting attraction, and the relative dominance of current sensory evidence should increase when the plume signal is reliable. This design provides a functional translation from biological behavior to robotic navigation, but it does not yet constitute a direct quantitative fit between every biological metric and every algorithmic hyperparameter.
To make this limitation explicit, we propose a data-driven calibration strategy for future work. First, the PER retention curves in
Figure 1c can be fitted by exponential or multi-exponential decay models to estimate a biological memory-retention time constant. This time constant can then be used to initialize the LTM decay coefficient. Second, the directional displacement statistics in
Figure 2c and the initial-yaw-angle data in
Figure 3b can be used to estimate the spatial spread and directional gain of LTM reinforcement. Third, the correction latency and cue-conflict index can be used to optimize the fuzzy membership functions and rule weights by minimizing the discrepancy between simulated decisions and biological trajectories. In practice, this calibration can be formulated as a constrained optimization problem or Bayesian parameter-estimation problem, with the objective function combining PER-retention error, directional-bias error, and path-correction error.
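The first calibration step can be sketched as a single-exponential fit to a retention curve, followed by conversion of the fitted time constant into a per-step decay coefficient. The sketch is illustrative only: the log-linear least-squares fit is one simple choice among several, the mapping ρ = exp(−Δt/τ) is an assumption about how the time constant would enter the LTM update, and the function names are hypothetical. No measured PER data are used.

```python
import math


def fit_exponential_decay(times, retention):
    """Fit retention(t) = r0 * exp(-t / tau) by least squares on log(retention).

    times:     sampling times (any consistent unit).
    retention: strictly positive retention scores at those times.
    Returns (r0, tau).
    """
    logs = [math.log(r) for r in retention]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                  # equals -1 / tau
    r0 = math.exp(mean_y - slope * mean_t)
    return r0, -1.0 / slope


def decay_coefficient(tau, dt):
    """Per-step retention factor for a simulation step of length dt (assumed mapping)."""
    return math.exp(-dt / tau)
```

A multi-exponential model would require a nonlinear fit, and the fitted coefficient would serve only as an initial value for the constrained or Bayesian optimization described above.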
Therefore, the present Bio-Nav framework should be interpreted as a biologically grounded engineering abstraction rather than a fully fitted neuroethological model. The advantage of this abstraction is that it preserves the functional principles revealed by bumblebee behavior while remaining computationally simple and transferable to robotic odor source localization.
4. Simulation Results and Comparison
Based on the bio-inspired navigation model that integrates olfactory perception and spatial memory, a series of experiments was carried out in a two-dimensional turbulent simulation environment. The purpose was to verify the rationality of introducing a bumblebee-inspired spatial memory mechanism and to compare the resulting algorithm (hereafter referred to as Bio-Nav) with mainstream olfactory-navigation methods in terms of search efficiency, success rate, and robustness under complex wind fields.
4.1. Simulation Environment and Parameter Settings
To evaluate algorithmic performance, a high-fidelity two-dimensional olfactory-navigation simulation platform was developed in MATLAB R2022b. The simulation environment was discretized into a 20 × 20 grid of cells with fixed physical dimensions.
Wind field and plume model: the airflow field was modeled as the superposition of a mean wind velocity and turbulent perturbations, with a fixed mean wind vector and a fixed initial turbulence intensity. The odor source was held at a fixed position and continuously released odor packets to generate an intermittent turbulent plume.
Sensor model: the robot’s odor-detection probability decayed exponentially with distance, using a proportionality coefficient of 1/15, and zero-mean Gaussian white noise was added to the measured concentration to simulate realistic sensor interference.
Spatial memory module: the initial long-term-memory strength was set to 0.4, and the directional vector was initialized to mimic a biological prior preference for a specific wind direction. The fusion weight between real-time memory and the initial memory was set to 0.6. The short-term-memory decay followed an exponential law.
Planning-based policy optimization via value iteration: the discount factor was chosen to encourage long-horizon planning, iteration stopped once the predefined convergence threshold was met, and the maximum number of iterations was 50.
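The sensor model above can be sketched as follows. The sketch reads the "proportionality coefficient of 1/15" as the rate in p(d) = exp(−d/15), which is an interpretation rather than a confirmed formula; the noise standard deviation is a hypothetical parameter, and clipping negative readings to zero is an added assumption.

```python
import math
import random


def detection_probability(distance, k=1.0 / 15.0):
    """Assumed form: detection probability decays as exp(-k * distance)."""
    return math.exp(-k * distance)


def sense(true_concentration, distance, noise_sd, rng):
    """Return (detected, measured_concentration) for one sensor reading.

    noise_sd is a hypothetical noise level; rng is a random.Random instance.
    """
    detected = rng.random() < detection_probability(distance)
    noisy = true_concentration + rng.gauss(0.0, noise_sd)  # zero-mean Gaussian noise
    return detected, max(0.0, noisy)                       # clip negative readings
```

Passing an explicitly seeded `random.Random` instance keeps Monte Carlo trials reproducible across algorithms.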
4.2. Verification of Bio-Inspired Behavioral Consistency
This section examines whether introducing spatial memory causes the robot’s search trajectories to exhibit biological characteristics consistent with real bumblebee foraging. Three simulation conditions were designed to reproduce Biological Experiment 2 in
Section 2: a no-memory baseline, an STM-only model, and a combined-memory model (STM + LTM). For each case, the algorithm generated the search route, airflow field, source probability map, plume prediction map, and spatial memory map.
Figure 6 was designed to examine whether the proposed memory modules generate behavioral effects consistent with the biological experiments. The no-memory baseline tests the limitation of relying only on probabilistic estimation, the STM-only condition evaluates whether short-term memory can suppress local revisits, and the complete STM + LTM condition evaluates whether long-term directional memory can stabilize the search direction after intermittent plume loss. This comparison provides a behavioral-level validation before the quantitative performance comparisons in later sections.
No-memory baseline (pure POMDP + HMM): when the robot failed to detect the plume for a prolonged period, the lack of new observations prevented effective belief-state updating. As a result, the robot tended to execute ineffective circular wandering within a few grid cells around the last detected odor location, becoming trapped in a local search deadlock (
Figure 6a).
Short-term memory only (STM introduced): when the robot repeatedly revisited a grid cell, the STM intensity at that location was immediately raised to its peak value, creating a persistent repulsive effect. The simulated trajectories showed that after only one or two local oscillations, the robot was pushed out of the previously visited region by the STM repulsion, thereby reproducing the biologically typical inhibition-of-return mechanism and improving the exploration efficiency of unknown areas (
Figure 6b).
Proposed Bio-Nav (STM + LTM): when the robot passed through the high-concentration core of the plume, long-term directional memory was rapidly activated and diffused upwind in the form of a Gaussian kernel. This generated a strong attractive field within the global reward function. Rather than wandering randomly, as it could under the STM mechanism alone, the robot exhibited a trapline-like oriented search pattern similar to that observed in bumblebees, exploring upwind along the memory direction and ultimately forming an efficient and nearly straight search route (
Figure 6c).
4.3. Comparative Experiments and Performance Evaluation
To quantitatively evaluate Bio-Nav, 100 independent Monte Carlo simulations were performed under identical initial conditions (the same wind field, the same initial robot position, and the same odor-source location), with the robot starting downwind of the source. The proposed method was compared against three classical odor source localization algorithms: a moth-inspired strategy based on the surge-casting logic [
5,
32], Infotaxis based on information-entropy maximization [
33], and a standard POMDP-based probabilistic grid algorithm without spatial memory or fuzzy inference.
The results reveal a marked improvement in search efficiency and statistical robustness. As shown in
Table 1, the proposed Bio-Nav achieved the highest success rate among all algorithms, with 96 successful trials out of 100 Monte Carlo simulations. In contrast, the success rates of the moth-inspired [
34], Infotaxis, and standard POMDP methods were 66.0%, 58.0%, and 81.0%, respectively. Statistical comparison using Fisher’s exact test or the chi-square test indicated that the success rate of Bio-Nav was significantly higher than those of the moth-inspired and Infotaxis methods (
p < 0.001) and higher than that of the standard POMDP baseline (
p < 0.01).
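The success-rate comparison can be checked with a two-sided Fisher's exact test on the reported counts. The sketch below is a plain standard-library implementation, not the analysis script used in the study; it follows the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are algorithms; columns are (successes, failures).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Reported counts out of 100 trials each.
p_vs_pomdp = fisher_exact_two_sided(96, 4, 81, 19)      # Bio-Nav vs. standard POMDP
p_vs_moth = fisher_exact_two_sided(96, 4, 66, 34)       # Bio-Nav vs. moth-inspired
p_vs_infotaxis = fisher_exact_two_sided(96, 4, 58, 42)  # Bio-Nav vs. Infotaxis
```

On these counts, the test yields p < 0.01 for the POMDP comparison and p < 0.001 for the other two, in line with the thresholds stated above.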
In terms of search efficiency, Bio-Nav required only 20.3 ± 6.1 steps and generated an average path length of 155.1 ± 37.8 cm, both of which were significantly lower than those of the three baseline algorithms. One-way ANOVA followed by Tukey’s post hoc test showed significant differences in search steps, path length, and distance ratio among the compared algorithms (p < 0.001). Compared with the standard POMDP baseline, Bio-Nav reduced the average number of search steps by approximately 43.0% and shortened the average path length by approximately 24.6%. The distance ratio of Bio-Nav was also the closest to 1, indicating that the proposed method produced smoother and more direct trajectories.
These improvements can be attributed to the coordinated effect of the spatial memory mechanism and the fuzzy adaptive fusion strategy. When the concentration was low and memory was weak, the fuzzy inference system encouraged wider exploration. Once the plume or a high-value memory region was detected, the controller increased the contribution of the source-belief and long-term memory, thereby reducing redundant lateral movement and improving trajectory convergence (
Figure 7a).
To make the trajectory differences more interpretable,
Figure 7 visualizes not only the final search paths but also the decision context of each algorithm. The moth-inspired method mainly relies on reactive plume encounter and upwind casting (
Figure 7(bi)); Infotaxis emphasizes uncertainty reduction (
Figure 7(bii)); and the standard POMDP method depends on recursive source-belief updating (
Figure 7(biii)). In contrast, Bio-Nav uses the same probabilistic source-estimation backbone but further introduces spatial memory and fuzzy adaptive fusion (
Figure 7(biv)). Therefore, when the plume signal becomes intermittent, Bio-Nav can continue to exploit memory-guided directionality rather than degenerating into local wandering or repeated uncertainty-driven exploration.
4.4. Environmental Adaptability and Robustness Analysis
In practical applications, wind fields and sensor conditions are dynamically variable. This section therefore evaluates the robustness of the algorithm under extreme boundary conditions.
4.4.1. Influence of Strong Turbulence and Dynamic Wind Fields
The turbulence intensity of the flow field was increased from 0 to 0.6 to simulate gusty and highly vortical environments, and the trajectory under each condition was recorded (
Figure 8a). As turbulence increased, plume continuity was increasingly disrupted, and the odor-concentration distribution became more fragmented (
Figure 8b).
Although the average number of localization steps increased under strong turbulence, the success rate of Bio-Nav remained above 91% (
Figure 8c). This result is consistent with the decay mechanism embedded in the spatial memory matrix acting as a low-pass filter that smooths decision oscillations caused by abrupt environmental variation.
Figure 8 provides a detailed visualization of how the internal decision variables of Bio-Nav respond to increasing turbulence. In low-turbulence conditions, the airflow field and plume distribution provide relatively consistent directional information, allowing the robot to approach the source directly. As turbulence intensity increases, plume encounters become intermittent and the fuzzy inference variables fluctuate more strongly. Under these conditions, the spatial memory map becomes particularly important because it preserves high-value regions associated with previous odor encounters and suppresses unstable decision switching caused by transient plume fragmentation.
4.4.2. Adaptability to Sidewind and Upwind Initial Positions
The initial deployment position of the robot was changed to test fully downwind, sidewind, and upwind blind-zone conditions (
Figure 9). Under sidewind initialization, the robot started outside the plume envelope and therefore had a very low observation probability. After repeated non-detections, the fuzzy controller reduced reliance on the current source map, allowing the memory-guided strategy to re-enter the main wind corridor and intercept the plume.
These results suggest that the proposed method is relatively insensitive to the initial deployment position, a useful property for randomly deployed search-and-rescue robots operating in open environments.
The detailed panels in
Figure 9 further show that the adaptation of Bio-Nav is not only reflected in the final trajectory but also in the internal decision process. Under downwind initialization, the robot can rapidly exploit plume information. Under crosswind initialization, the robot first needs to explore laterally before plume interception. Under upwind or blind-zone initialization, the plume signal is initially weak or absent, and the navigation process depends more strongly on airflow cues and memory-guided reward shaping. The consistent convergence across these three conditions indicates that the proposed memory–perception coupling strategy can support robust search even when the initial deployment position is not favorable.
4.5. Module Ablation and Comparison with Deep-Learning-Based Baselines
To verify the contribution of the main components of Bio-Nav, we performed module-level ablation experiments and additional learning-based baseline comparisons. All tests were conducted under the same simulation protocol as
Section 4.3, including the same 20 × 20 grid environment, wind-field distribution, source position, initial robot position, 8-neighbor action set, termination criterion, and 100 Monte Carlo trials. Continuous variables are reported as mean ± standard deviation (s.d.), and localization performance is reported as the success rate over 100 independent trials.
For statistical analysis, success rates were compared using Fisher’s exact test or the chi-square test, while continuous variables, including search steps, path length, and distance ratio, were compared using one-way ANOVA followed by Tukey’s post hoc test. A value of p < 0.05 was considered statistically significant. This reporting format was adopted to provide both error ranges and statistical evidence.
4.5.1. Module Ablation Experiments
The ablation experiments were designed to determine whether the improvement of Bio-Nav originated from a single module or from the coordinated contribution of memory stabilization, adaptive fusion, and policy planning. Four variants were constructed: Bio-Nav without long-term memory (without LTM), Bio-Nav without short-term memory (without STM), Bio-Nav without the fuzzy inference system (without FIS), and Bio-Nav without value-iteration-based planning (without Planning). In all variants, the POMDP-HMM probabilistic estimation layer was retained, so that the influence of each removed component could be isolated while keeping the basic odor-source belief update and plume prediction mechanisms unchanged.
As shown in
Table 2, removing any major module degraded the overall localization performance. When LTM was removed, the algorithm still responded to local plume observations, but its ability to preserve directional continuity after plume loss was weakened. This led to a lower success rate and a longer search path, indicating that LTM acts as a global directional anchor under sparse and intermittent odor observations. When STM was removed, the success rate remained relatively high, but search steps and path length increased markedly. This result indicates that STM mainly suppresses repeated visits and local oscillations rather than directly determining the final localization probability. Removing FIS produced a more pronounced performance decline because the system lost its context-dependent ability to switch between online sensory evidence and stored spatial experience. In this case, the robot was more easily disturbed by local false concentration peaks or by outdated memory cues. Finally, removing the value-iteration planning module also reduced path efficiency, indicating that policy optimization on the fused reward landscape contributes to generating smoother and more goal-directed trajectories.
These ablation results indicate that the advantages of Bio-Nav cannot be attributed to a single heuristic term. LTM mainly improves robustness after plume loss, STM reduces inefficient local revisits, FIS enables adaptive arbitration between current perception and memory, and value iteration converts the fused reward map into a globally smoother action policy. The complete Bio-Nav architecture therefore achieves better performance through the joint coupling of probabilistic estimation, biologically inspired spatial memory, adaptive fusion, and planning-based policy generation.
4.5.2. Comparison with Deep-Learning-Based Baselines
To further evaluate whether the proposed biologically inspired framework remains competitive against deep-learning-based navigation policies, we added representative deep-learning-based baselines, including Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and a Transformer-based sequence policy. The DQN and PPO baselines used the robot position, local odor concentration, binary plume-detection signal, wind vector, time since last detection, and recent heading information as state inputs. The Transformer policy used the same variables over a short temporal window to model recent plume-encounter history and to output the next movement action. For fairness, all learning-based baselines used the same grid resolution, action space, sensor input format, source position, wind-field distribution, maximum search horizon, and 100-trial evaluation protocol as Bio-Nav.
The deep-learning-based comparison was intended as a representative benchmark rather than an exhaustive hyperparameter search. DQN and PPO were trained for 5000 episodes under the simulated plume environment, while the Transformer policy was trained from 3000 successful simulated trajectories. After training, all policies were tested in the same Monte Carlo evaluation environment without changing the source position or wind-field statistics.
As shown in
Table 3, all deep-learning-based baselines were able to locate the odor source to some extent after training, but their performance remained inferior to that of Bio-Nav under intermittent turbulent plume conditions. DQN showed the lowest success rate among the learning-based methods, mainly because sparse plume encounters and delayed terminal rewards made value-function learning unstable. PPO improved trajectory stability and success rate compared with DQN, but it still generated redundant exploratory movements after plume loss. The Transformer-based sequence policy further improved performance by using recent observation history; however, it still lacked an explicit probabilistic plume model and a structured spatial memory mechanism.
In contrast, Bio-Nav achieved the highest success rate and the shortest search path without offline training. This result indicates that the explicit coupling of POMDP-HMM probabilistic estimation, STM-based revisit suppression, LTM-based directional persistence, fuzzy adaptive fusion, and value-iteration-based policy generation provides a more sample-efficient and interpretable strategy for odor source localization in sparse turbulent environments. Therefore, the proposed framework is not only superior to classical odor-source localization baselines but also remains competitive against representative data-driven policies when evaluated under the same sensing and action constraints.
5. Discussion
To address the challenging problem of odor source localization for mobile robots in complex turbulent environments, this study proposed and validated a bio-inspired navigation model, termed Bio-Nav, by drawing inspiration from bumblebee foraging behavior. The proposed framework links biological evidence, probabilistic estimation, spatial memory, and reward-based planning within a unified decision architecture.
First, the biological experiments showed that odor learning in bumblebees is closely coupled to spatially organized behavior: repeated odor–reward training strengthened both PER retention and odor-triggered directional preference. These findings support the view that olfactory information can be transformed into persistent spatial guidance rather than acting only as a transient trigger.
Second, the proposed algorithm integrates a POMDP-HMM perception backbone with short-term working memory, long-term directional reference memory, fuzzy inference, and value iteration. This design preserves the uncertainty-handling strengths of probabilistic navigation while adding inhibition of return and directional continuity, two properties that are especially valuable under intermittent plume conditions.
Finally, high-fidelity simulations demonstrated consistent gains in efficiency, success rate, and robustness relative to moth-inspired search, Infotaxis, and a standard POMDP baseline. Even under strong turbulence and unfavorable initial conditions, Bio-Nav maintained stable search behavior and high localization success. Future work should extend the framework to three-dimensional flow fields and real robotic platforms.
Overall, unlike existing approaches that rely solely on instantaneous information, Bio-Nav introduces a persistent internal state that reshapes the decision landscape. The proposed model thus provides a compact and biologically grounded strategy for odor-guided navigation in uncertain turbulent environments.
6. Limitations and Engineering Migration Feasibility
Several limitations should be acknowledged. First, the present Bio-Nav model is inspired by bumblebee behavioral evidence, but the algorithmic parameters are not yet fully fitted to biological data. The STM decay constant, LTM retention coefficient, LTM Gaussian spread, fuzzy membership functions, and fuzzy rules were selected to reproduce the qualitative behavioral principles observed in the experiments. Future work should calibrate these parameters quantitatively using PER-retention curves, directional-displacement statistics, initial-yaw-angle measurements, correction latency, and cue-conflict index.
Second, the current simulation is two-dimensional. This simplification is useful for isolating the effect of memory–perception coupling and for comparing algorithms under controlled turbulence, but real odor plumes are inherently three-dimensional. Vertical dispersion, boundary layer effects, plume meandering, robot body disturbance, and sensor response delay can all influence odor encounter statistics. Therefore, the present 2D results should be interpreted as a controlled proof of principle rather than as a complete representation of real turbulent odor transport.
Third, migration to engineering platforms is feasible but requires several adaptations. The 20 × 20 grid can be replaced by a local occupancy–probability map or a continuous-state particle filter. The binary plume observation can be extended to concentration intensity and sensor confidence, while the wind input can be obtained from a compact anemometer array or estimated from onboard airflow sensors. For ground robots, Bio-Nav can be implemented as a local planner that outputs velocity commands; for aerial robots, the state space should be extended to 3D and the action space should include altitude control and safety constraints. The computational components of Bio-Nav—Bayesian belief updating, HMM plume prediction, fuzzy fusion, and value iteration—are lightweight enough for real-time execution on embedded processors when the map size is moderate. A practical migration route is therefore: 2D simulation validation, 3D CFD or wind-tunnel validation, ground-robot plume tracking, and finally aerial-robot testing in outdoor turbulent environments.
7. Conclusions
This study proposed Bio-Nav, a bumblebee-inspired spatial memory navigation framework for robotic odor source localization in turbulent environments. Biological experiments showed that repeated odor–reward training strengthens PER retention, that familiar odors can trigger directional spatial preference, that memory traces decay with time, and that bumblebees dynamically arbitrate between stored memory and current sensory evidence under cue conflict. These findings motivated a three-layer engineering architecture consisting of probabilistic estimation, memory stabilization, and adaptive fusion.
Simulation results demonstrated that coupling POMDP-HMM-based probabilistic estimation with STM, LTM, fuzzy inference, and value iteration improves search success, trajectory efficiency, and robustness under intermittent plume conditions. Compared with moth-inspired search, Infotaxis, and standard POMDP navigation, Bio-Nav achieved a higher success rate and shorter search paths in 100 Monte Carlo trials. The ablation experiments and deep-learning baseline comparisons further clarify how each module contributes to the final performance.
Overall, the study suggests that memory–perception coupling is a useful design principle for robotic search under uncertainty. Future work will focus on quantitative calibration of memory parameters from biological data, extension to three-dimensional plume transport, and validation on real robotic platforms.