#### 3.1. Network Model

In this paper, we employ an elementary model for describing WSNs as undirected and weighted graphs. As such, let $G=(V,E,W)$ be a simple weighted graph, in which V is the set of nodes denoted by positive integer numbers from one to $\left|V\right|$, representing the sensor nodes and the sink. The set E is composed of links, here expressed by ${e}_{i,j}$, where $i,j\in V$. For each ${e}_{i,j}\in E$, there is a ${w}_{i,j}\in W$ attached, which denotes the energy-related function.

Under this network model, a source node periodically senses and collects data from its surroundings, then forwards the data hop by hop until they reach the sink node. The goal is to find an energy-efficient route such that the energy consumption is balanced and the lifetime of the network is prolonged. Moreover, we assume that:

All nodes are isomorphic;

Links are symmetric. If the target’s transmitted power is known, nodes can calculate the approximate distance of senders according to the Received Signal Strength Indication (RSSI);

Depending on the recipient’s distance, the node can adjust its transmit power.

We use the first-order radio model described in [3].

Based on these assumptions, the transmitter power level can be adjusted to the minimum energy required to reach the intended next-hop receiver. The energy consumed per unit of transmitted information therefore depends on the choice of the next-hop node, i.e., on the routing decision.
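To illustrate why the per-bit energy depends on the routing decision, the following minimal sketch assumes that the radio model of [3] has the common first-order form, in which transmission cost grows with the square of the distance. The constants `E_ELEC` and `EPS_AMP` and the function names are illustrative assumptions, not values taken from this paper.

```python
# Assumed, illustrative constants for a first-order radio model.
E_ELEC = 50e-9     # J/bit spent in the transceiver electronics (assumption)
EPS_AMP = 100e-12  # J/bit/m^2 spent in the transmit amplifier (assumption)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy to transmit `bits` over `distance_m` with adjusted power."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Energy to receive `bits` (independent of distance in this model)."""
    return E_ELEC * bits


def hop_costs(bits: int, distances: dict) -> dict:
    """Transmit energy for each candidate next hop.

    `distances` maps a neighbour id to its estimated distance in metres,
    e.g. as derived from RSSI under the link-symmetry assumption.
    """
    return {j: tx_energy(bits, d) for j, d in distances.items()}


if __name__ == "__main__":
    costs = hop_costs(1000, {2: 30.0, 3: 12.5})
    for node, cost in sorted(costs.items()):
        print(f"next hop {node}: {cost:.3e} J")
```

Under these assumptions, a nearer next hop costs less transmit energy per packet, which is exactly the coupling between power control and route choice described above.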

#### 3.3. Initial Route Establishment

In ACO-based routing algorithms, the pheromone is initialised to a small uniform value at the start of route discovery. The probability distributions for route selection are not updated until the first artificial ant arrives at the sink and returns. Until then, the distributions do not guide the artificial ants, which may end up constructing (and reinforcing) paths of very poor quality. As discussed in Section 2, AFSA demonstrates exceptional global search abilities in the early stage of route discovery and can obtain satisfactory routes quickly. Furthermore, AFSA is not sensitive to the initial parameter values. Therefore, we exploit these strengths of AFSA and use the preferred routes it obtains to set the initial pheromone values of the subsequent ACO routing algorithm.

Initialise the state vector of the artificial fish swarm attached to each sensor node as $X=({x}_{1},{x}_{2},\dots ,{x}_{n})$, $n\in N$; N represents the number of sensor nodes; ${x}_{i}(i=1,2,\dots ,n)$ denotes the residual energy of the current node. The sensory distance and the crowd factor are represented by $visual$ and $\delta (0<\delta <1)$, respectively. In our network model, $visual$ represents the communication radius of a sensor node, which means that the nodes within the $visual$ of node i are the neighbour nodes of node i. ${N}_{i}$ denotes the set of nodes within the $visual$ of node i. The crowd factor $\delta $ represents the degree of congestion around a given sensor node and is used to avoid overcrowding or collisions with neighbouring regions. We also define a fitness function $Y=f\left({x}_{i}\right)$, which stands for the average energy within the communication radius of sensor node i.
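The neighbour set ${N}_{i}$ and the fitness function can be sketched as follows. This is only an illustration under stated assumptions: nodes have known planar coordinates, the average is taken over the neighbours of node i (excluding node i itself), and an isolated node gets fitness 0. The names `neighbours` and `fitness` are hypothetical.

```python
import math


def neighbours(i, positions, visual):
    """Return the ids of all nodes within distance `visual` of node i."""
    xi, yi = positions[i]
    return [
        j
        for j, (xj, yj) in positions.items()
        if j != i and math.hypot(xj - xi, yj - yi) <= visual
    ]


def fitness(i, positions, energy, visual):
    """Average residual energy of the nodes within the visual of node i.

    Assumption: node i itself is excluded, and an isolated node scores 0.
    """
    n_i = neighbours(i, positions, visual)
    if not n_i:
        return 0.0
    return sum(energy[j] for j in n_i) / len(n_i)


if __name__ == "__main__":
    positions = {1: (0, 0), 2: (3, 4), 3: (10, 0), 4: (0, 6)}
    energy = {1: 1.0, 2: 0.8, 3: 0.2, 4: 0.4}
    print(neighbours(1, positions, visual=6))
    print(fitness(1, positions, energy, visual=6))
```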

All the AF search for the optimal route toward the sink node using three distinct behaviours, namely preying, swarming and following.

#### 3.3.1. Preying Behaviour

Preying is the basic biological behaviour adopted by AF looking for food. Generally, an AF perceives the region with more residual energy by vision or another sense and moves quickly towards this sensed region. Suppose the current state of an AF is ${x}_{i}$, i.e., the residual energy of the current sensor node; it selects the next node randomly within its $visual$ distance (sensory distance), such that:

where ${x}_{j}$ is the candidate (new) state and ${x}_{i}$ is the state before the move. If the fitness function satisfies $f\left({x}_{j}\right)>f\left({x}_{i}\right)$, i.e., there is a higher energy density around node j, the AF in node i moves forward to node j; if not, it selects a node ${x}_{j}$ randomly again and checks whether it satisfies the forward requirement. If the forward requirement cannot be satisfied after try_number attempts, the AF moves one step randomly, which helps it escape from a local extremum.
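The preying loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitness values are assumed to be precomputed in a dictionary `Y`, the candidate is drawn uniformly from the neighbour list, and the function name `prey` and the `rng` parameter are hypothetical.

```python
import random


def prey(i, neighbours_of_i, Y, try_number, rng=random):
    """Return the node the AF at node i moves to under preying behaviour.

    Y maps node id -> fitness value. The AF tries up to `try_number`
    random neighbours and moves to the first one with higher fitness;
    if none qualifies, it takes one random step.
    """
    if not neighbours_of_i:
        return i  # isolated node: the AF cannot move
    for _ in range(try_number):
        j = rng.choice(neighbours_of_i)
        if Y[j] > Y[i]:
            return j  # forward requirement satisfied
    # Random step, which helps the AF leave a local extremum.
    return rng.choice(neighbours_of_i)


if __name__ == "__main__":
    Y = {1: 0.5, 2: 0.9, 3: 0.2}
    print(prey(1, [2, 3], Y, try_number=5, rng=random.Random(0)))
```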

#### 3.3.2. Swarming Behaviour

Let the current state of an AF be ${x}_{i}$ and $\left|{N}_{i}\right|$ be the number of its neighbour nodes within the $visual$ distance. If $\left|{N}_{i}\right|\ne 0$, the maximum fitness value among the neighbour nodes is defined as:

where ${N}_{i}$ represents the set of neighbour nodes of node i. If $\frac{{Y}_{max}}{\left|{N}_{i}\right|}>\delta \times {Y}_{i}$, the node with maximum fitness is not very crowded. If, in addition, ${Y}_{max}>{Y}_{i}$, the AF moves towards the node j that attains ${Y}_{max}$; otherwise, the AF executes the preying behaviour according to Equation (2). Here, the crowd factor $\delta $ limits the scale of the artificial fish swarm, so that more AF cluster in the area with higher average residual energy, which ensures that the AF move towards an optimum over a wide field.
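The swarming test can be sketched as follows. The sketch assumes that both conditions (the crowding test and ${Y}_{max}>{Y}_{i}$) must hold for the AF to move, and it takes the preying behaviour as a caller-supplied fallback; the names `swarm` and `prey_fallback` are hypothetical.

```python
def swarm(i, neighbours_of_i, Y, delta, prey_fallback):
    """Return the node the AF at node i moves to under swarming behaviour.

    Y maps node id -> fitness, delta is the crowd factor (0 < delta < 1),
    and prey_fallback(i) implements the preying behaviour.
    """
    if neighbours_of_i:
        # Neighbour with the maximum fitness value Y_max.
        j = max(neighbours_of_i, key=lambda k: Y[k])
        y_max = Y[j]
        not_crowded = y_max / len(neighbours_of_i) > delta * Y[i]
        if not_crowded and y_max > Y[i]:
            return j
    return prey_fallback(i)


if __name__ == "__main__":
    Y = {1: 0.5, 2: 0.9, 3: 0.6}
    stay = lambda node: node  # placeholder for the preying behaviour
    print(swarm(1, [2, 3], Y, delta=0.5, prey_fallback=stay))
```

The following behaviour has the same shape, with the sink node's fitness ${Y}_{s}$ taking the place of ${Y}_{max}$.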

#### 3.3.3. Following Behaviour

Following behaviour accelerates AF moving to better states and at the same time accelerates AF moving to the global extreme value from the local extreme values. When an AF finds food, neighbouring AF will trail behind and reach the food.

Suppose ${x}_{s}$ represents the sink node within the $visual$ distance of ${x}_{i}$, and ${Y}_{s}=f\left({x}_{s}\right)$ is the fitness value of the sink node. If $\frac{{Y}_{s}}{\left|{N}_{i}\right|}>\delta \times {Y}_{i}$ and ${Y}_{s}>{Y}_{i}$, then the AF swims toward the sink node; otherwise, the preying behaviour is executed.

After several iterations, AFSA will find satisfactory feasible routes to the sink node. Furthermore, the AF leave positive feedback information on each link, which will be used in the ACO-based route discovery process described in Section 3.4.

#### 3.4. Hybrid Route Discovery

In our proposed hybrid routing algorithm based on AFSA and ACO, we use the heuristic information from AFSA as an initial pheromone value in the early stage in the ACO route discovery process, which is expected to avoid chaos and falling into a local optimum.

We introduce a novel probabilistic route discovery scheme, which takes its inspiration from the state transition rule of the Ant Colony System (ACS) [6]. A sensor node i releases a forward ant toward the sink node at some interval and selects the next node j to move to by applying the rule given by Equation (4):

where q is a random number uniformly distributed in $[0,1]$, ${q}_{0}$ is a control parameter $(0\le {q}_{0}\le 1)$ that balances route exploitation and exploration, and ${p}_{ij}$ is a random variable selected according to the probability distribution given in Equation (5):

where ${\tau}_{ij}$ and ${\eta}_{ij}$ refer to the global pheromone trail and the local heuristic desirability of link $(i,j)$, respectively. $\alpha $ and $\beta $ are two parameters that control the relative importance of the pheromone trail and the heuristic value. ${N}_{i}$ is the set of neighbours of node i. M is the tabu list, which stores the visited nodes and is carried by the forward ant.

According to this method, a sensor node will select either the optimal neighbour node (exploitation) or a stochastic one (exploration). Here, in Equation (4), exploitation implies that the forward ant exploits prior and accumulated knowledge, while exploration means that the artificial ant pays more attention to exploring new paths toward the sink node. This method reinforces the positive effect of the artificial ant learning process and thereby increases the convergence speed of route discovery, so that a better path is found in the early route discovery phase.
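As an illustration of the exploitation/exploration choice, the sketch below assumes that Equations (4) and (5) follow the standard ACS pseudo-random proportional rule: with $q\le {q}_{0}$ the ant takes the neighbour maximising ${\tau}_{ij}^{\alpha}{\eta}_{ij}^{\beta}$, otherwise it samples a neighbour with probability proportional to that product. The function name, the default values of `alpha` and `beta`, and the requirement that all weights be positive are assumptions.

```python
import random


def select_next_hop(i, neighbours_of_i, tau, eta, visited, q0,
                    alpha=1.0, beta=2.0, rng=random):
    """Pick the next hop for a forward ant at node i.

    tau and eta map a link (i, j) to its pheromone and heuristic value
    (both assumed positive); `visited` is the ant's tabu list M.
    Returns None when every neighbour has already been visited.
    """
    candidates = [j for j in neighbours_of_i if j not in visited]
    if not candidates:
        return None
    weights = [tau[(i, j)] ** alpha * eta[(i, j)] ** beta
               for j in candidates]
    if rng.random() <= q0:
        # Exploitation: follow the best-rated link.
        return candidates[weights.index(max(weights))]
    # Exploration: sample proportionally to the link weights.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    tau = {(1, 2): 0.1, (1, 3): 0.5}
    eta = {(1, 2): 1.0, (1, 3): 1.0}
    print(select_next_hop(1, [2, 3], tau, eta, visited=set(), q0=0.9,
                          rng=random.Random(42)))
```

A larger `q0` makes the ants greedier, while a smaller `q0` spreads them over more candidate links.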

However, at the later stage of the algorithm, the concentration of pheromone on a certain path increases continuously. The ACO routing protocol then has a higher probability of being trapped in a local optimum according to Equations (4) and (5) [39]. Here, we introduce the crowd factor ${\delta}_{ij}$ from AFSA to limit the scale of the pheromone increment.

where ${\tau}_{ij}$ and ${\delta}_{ij}$ refer to the pheromone value and the crowd value of link $(i,j)$, respectively. At the same time, we use $\delta \left(t\right)$ to indicate the crowd threshold at time t. If ${\delta}_{ij}<\delta \left(t\right)$, which means the current link is less crowded, the artificial ant will follow that link toward the next sensor node; otherwise, the artificial ant has to re-select the next hop according to Equation (5). The crowd threshold value is a function of time t:

where $\lambda $ is a constant and t is the number of route discovery iterations. In this way, the crowd threshold value changes with the number of route discovery iterations: the larger t is, the larger $\delta \left(t\right)$ will be.

In the initial phase of the algorithm, the crowd factor ${\delta}_{ij}$ and the crowd threshold $\delta \left(t\right)$ are both small; thus, the crowd factor does not restrict the route selection process. As the iterative process continues, the concentration of pheromone on some links increases significantly, and the crowd threshold also gradually increases. In this case, the crowd factor limits the influx of artificial ants into high-pheromone links and thereby controls the number of artificial ants on an overcrowded path. The hybrid algorithm can thus effectively prevent the artificial ants from aggregating prematurely on a high-pheromone path, which would cause premature convergence of the routing.
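The crowd check can be sketched as a gate on the ant's preferred link. The definitions of ${\delta}_{ij}$ and $\delta \left(t\right)$ are not reproduced in the text above, so this sketch takes the crowd values as an input and uses $1-{e}^{-\lambda t}$ purely as a hypothetical stand-in for an increasing threshold with a constant $\lambda $. All function names are assumptions.

```python
import math


def crowd_threshold(t, lam):
    """Hypothetical increasing threshold; NOT the paper's definition."""
    return 1.0 - math.exp(-lam * t)


def gated_next_hop(preferred, crowd, t, lam, reselect):
    """Accept the preferred link only if it is not too crowded.

    `preferred` is the link (i, j) chosen by the state transition rule,
    `crowd` maps a link to its crowd value delta_ij (computed elsewhere),
    and reselect() draws a new next hop from the probability distribution.
    """
    if crowd[preferred] < crowd_threshold(t, lam):
        return preferred[1]  # less crowded: follow the link
    return reselect()        # crowded: re-select the next hop


if __name__ == "__main__":
    crowd = {(1, 2): 0.30, (1, 3): 0.05}
    for t in (1, 5, 20):
        hop = gated_next_hop((1, 2), crowd, t, lam=0.1, reselect=lambda: 3)
        print(t, round(crowd_threshold(t, 0.1), 3), hop)
```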

At the end of route discovery, the impact of the crowd factor becomes insignificant according to Equations (6) and (7). The hybrid route discovery algorithm based on AFSA and ACO then degenerates into the ACO routing algorithm, whose route selection depends on Equations (4) and (5).

In general, in the initial stage, this hybrid routing algorithm avoids overcrowding of a sub-optimal route, while in the later stage, it ensures quick convergence to the optimal route.

#### 3.5. Global Pheromone Update Strategy

The global pheromone update procedure is applied when the forward ant (FANT) arrives at the sink node. The global pheromone includes information about a route obtained by long-term learning from a FANT. After arriving at the sink node, the FANT is converted into a backward ant (BANT). The BANT inherits all route statistics from the FANT, including the path length, the energy levels and the list of visited nodes along that path. Meanwhile, the sink node calculates the amount of pheromone attached to this path according to the route statistics. We calculate $\Delta \tau $ in the following manner:

Here, ${E}_{min},{E}_{avg}$ and ${E}_{init}$ represent the minimum energy, average energy and initial energy in the currently discovered path, respectively, and ${F}_{ant}$ denotes the length of the path. The BANT carries $\Delta \tau $ at the start of its journey along the reverse path. Each node in the path updates the global pheromone trail (i.e., routing table) according to Equation (9):

where $0<\rho <1$ is the pheromone evaporation coefficient, $(1-\rho )$ is the pheromone residue factor, and $\Delta {\tau}_{ij}$ is given by:

where $0<\xi <1$ is a control coefficient, which normalises $\Delta {\tau}_{ij}$ to the interval $(0,1)$. ${E}_{j}$ is the residual energy of node j, from which the BANT has come. ${B}_{ant}$ denotes the path length from the sink node to the current node. The pheromone value is a function of both the energy levels and the length of the path. As a result:

A shorter path (fewer hops) gets a larger pheromone increment.

When the minimum energy value on a path is larger, the pheromone increment is also larger, which steers data traffic away from paths that contain a nearly depleted node.

A path with relatively high average energy receives a larger increment and therefore attracts more data flow.

The closer a node is to the sink (the smaller ${B}_{ant}$), the more pheromone it obtains.
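The backward pass can be sketched as follows, assuming that Equation (9) has the usual evaporate-and-deposit form ${\tau}_{ij}\leftarrow (1-\rho ){\tau}_{ij}+\Delta {\tau}_{ij}$. The per-link increments are taken as an input, since the formulas for $\Delta \tau $ and $\Delta {\tau}_{ij}$ are not reproduced in the text above; the function name `backward_update` is hypothetical.

```python
def backward_update(path, tau, delta_tau, rho):
    """Update the pheromone on every link of `path` as the BANT returns.

    path      : nodes visited by the FANT, from source to sink
    tau       : dict mapping link (i, j) -> current pheromone value
    delta_tau : dict mapping link (i, j) -> increment for that link
    rho       : evaporation coefficient, 0 < rho < 1
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Walk the links in reverse order: (last hop) ... (first hop).
    for i, j in zip(reversed(path[:-1]), reversed(path[1:])):
        tau[(i, j)] = (1.0 - rho) * tau[(i, j)] + delta_tau[(i, j)]
    return tau


if __name__ == "__main__":
    tau = {(1, 4): 1.0, (4, 7): 1.0, (7, 9): 1.0, (1, 2): 1.0}
    inc = {(1, 4): 0.1, (4, 7): 0.2, (7, 9): 0.3}
    print(backward_update([1, 4, 7, 9], tau, inc, rho=0.5))
```

In this illustrative run, the link nearest the sink receives the largest increment and links off the path are left unchanged.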