The Role of Genetic Algorithm Selection Operators in Extending WSN Stability Period: A Comparative Study

A genetic algorithm (GA) contains a number of genetic operators, such as parent selection, crossover, and mutation, that can be tuned to improve the performance of specific implementations. Selection is one of the most important of these operations. This paper evaluates and compares the performance of a GA that uses various parent selection methods to address the single-objective problem of extending the stability period of a wireless sensor network (WSN). Six GA selection operators are used: roulette wheel, linear rank, exponential rank, stochastic universal sampling, tournament, and truncation. According to the simulation results, the truncation selection operator is the most efficient operator in terms of extending the network stability period and improving reliability. It outperforms the other selection operators; compared with the well-known roulette wheel operator, it increases the stability period by 25.8% and data throughput by 26.86%. Furthermore, the truncation selection operator yields the highest network residual energy after each protocol round.


Introduction
As a result of rapid advances in micro-electro-mechanical systems (MEMS), small sensor nodes have become inexpensive and self-sufficient [1]. These sensor nodes can sense and monitor the environment, analyze and aggregate data, and communicate data to one another or to a central point, commonly known as the sink. Sensor nodes can be interconnected through wireless sensor networks (WSNs) to serve an application-specific purpose [2,3].
Sensor nodes have limited battery power, and their batteries are rarely rechargeable. Communication typically accounts for most of a sensor node's energy consumption [4]. When a node's energy source runs out, the node is declared dead and is no longer useful. WSNs are used in a wide variety of real-world applications, including agriculture, industry, health care, surveillance, target tracking, and security management, in both the civilian and military sectors [5][6][7]. Figure 1 shows a typical sensor node block diagram [8].
Sensor nodes can send their collected data directly to the sink node [9], but this consumes more energy and leads to premature node death, which in turn introduces other issues into the network, such as the coverage hole problem [10]. Clustering balances energy consumption among sensor nodes by grouping nodes that are near each other or that have similar characteristics or functions. A cluster contains two types of nodes: cluster heads (CHs) and member nodes (MNs). A CH can communicate with the sink node either directly or over multiple hops [11]. In general, CHs consume more energy than MNs, so the CH role must be rotated among network nodes to keep the overall network energy level stable and to extend the network's lifespan [12]. Numerous protocols have been designed for creating, operating, and maintaining clusters; examples in the literature include the low energy adaptive clustering hierarchy (LEACH) and the stable election protocol (SEP) [13]. A limitation of clustering and routing protocols in WSNs is that they cannot simultaneously reduce overall network energy consumption and balance energy consumption among sensor nodes over the network's lifetime [11]. In some cases, a significant reduction in the overall energy consumption of the network is achieved at the expense of unequal remaining energies among sensor nodes. In other cases, the remaining energies of sensor nodes are balanced at the expense of higher overall energy consumption.
Meta-heuristic algorithms, also known as nature-inspired or intelligent optimization algorithms, are used in WSNs to balance energy consumption among sensor nodes while also reducing overall energy consumption [14,15], and they have proven successful at managing this trade-off. Meta-heuristic algorithms fall into four types: bio-inspired, human-inspired, geography-inspired, and physics-inspired. The vast majority of nature-inspired algorithms draw their inspiration from biological systems. Bio-inspired algorithms are further classified into three categories: evolutionary, swarm-based, and plant-based [16,17]. The genetic algorithm (GA) is a meta-heuristic, bio-inspired, evolutionary technique that is widely used in WSNs to solve fundamental problems such as sensor node localization, energy-efficient clustering, data aggregation, and optimal coverage [18]. With a GA, the number of clusters in the network can be optimized and sensor node energy consumption can be balanced, allowing the WSN to operate for longer. Three-dimensional WSNs (3D-WSNs) are being researched in place of two-dimensional WSNs (2D-WSNs) because they are more realistic, and the third dimension is crucial in determining a WSN's lifespan [5].
GA is made up of three main components: selection, crossover, and mutation. The selection operator is responsible for the production of the next generation; crossover and mutation are responsible for the manipulation of the selected individuals to form the next generation. The mechanism of selection determines which individuals are selected for reproduction (mating) and how many offspring each selected individual produces. The selection strategy's core principle is that the better an individual is, the greater the likelihood that it will be a parent. As a rule of thumb, crossover and mutation expand the search area, while selection narrows the search area within a population by eliminating poor solutions [19].
This study makes the following contributions. First, a central dynamic clustering protocol based on GA is proposed for extending the 3D-WSN stability period. Second, six GA selection operators are utilized, evaluated, and compared in terms of reliability and network lifetime. Third, a new GA fitness function is used to optimize the number of clusters in the network; it incorporates information about the type of connections (short or long range) that MNs establish with CHs within a cluster and that CHs establish with the sink.
The rest of the paper is organized as follows: Section 2 overviews the related work. Section 3 describes the GA operators' details. In Section 4, the simulation setup and configuration are provided. The simulation results are explained in Section 5. Finally, we conclude our research and provide some future work in Section 6.

Related Work
Energy conservation in WSNs is critical for extending network lifetime and ensuring the smooth operation of resource-constrained nodes. Many techniques have been proposed in the literature to increase the lifespan and reliability of WSNs, including clustering/routing, mobile relays and sinks, optimal node deployment, data correlation, energy harvesting, beamforming, and various optimization techniques [20,21]. In this section, we limit the related work to optimization techniques that use nature-inspired algorithms, specifically the GA. Numerous studies have used meta-heuristic algorithms to extend the life of WSNs, and the GA has shown a wide range of applicability in this area. The authors in [22] proposed an energy-efficient clustering protocol (GAEEC) that uses GA to divide the network into an optimal number of clusters and then uses GA again to select the appropriate node within each cluster as the CH, based on its fitness function in each algorithm round. They found that their protocol outperformed the renowned LEACH protocol in terms of network throughput and stability. The authors in [23] proposed GAECH, a GA-based energy-efficient clustering hierarchy protocol that aims to reduce overall network energy consumption while extending the network's lifespan. Their results show a significant increase in network lifetime compared with other well-known clustering protocols when the sink node is located outside the network area. The authors in [8] proposed PEGASIS protocol variants that use genetic algorithms for energy-efficient clustering and routing in 3D WSNs. GA was used to create data transmission chains with a remarkably short overall length, and an improved CH selection method that takes distance and remaining energy into account was developed. Their results showed that the normal PEGASIS protocol was superior in terms of the first-node-die (FND) metric compared with other well-known protocols.
The fuzzy C-means genetic algorithm (FCM-GA) protocol was proposed by the authors in [24]. The protocol consists of two phases. In the first phase, clustering is performed using FCM based on three objectives: the appropriate distribution of online energy among clusters, the distances within each cluster, and the separation between each CH and the sink. Compared with other protocols, such as direct transmission, SH-MEER, and MH-FEER, FCM-GA shows significant gains in both network throughput and lifespan. In [25], the authors proposed an improved version of the LEACH protocol (O-LEACH) that relies on GA to select the best route, reducing energy usage by 17.40%.
To the best of our knowledge, no research has been done on the impact of using different GA selection operators on 3D-WSN reliability, lifetime, and energy conservation. The following studies compared the outcomes of GA selection operators in other domains. The authors in [26] compared the GA's performance in solving the traveling salesman problem (TSP) using different parent selection strategies. Their results show that, in several TSP tests, the tournament selection strategy outperforms proportional roulette wheel and rank-based roulette wheel selection, achieving the best solution quality with the lowest computing times. The results also show that tournament and proportional roulette wheel selection can be superior to rank-based roulette wheel selection only for smaller problems and can become susceptible to premature convergence as the problem size increases. To solve the network topology design problem, the authors in [27] employed three selection strategies: tournament, ranking, and roulette wheel. They were compared in terms of solution quality and computation time. For a 10-node network, tournament selection yielded the best solution in terms of quality and computation time, while ranking and scaling delivered the worst. For networks with 21 and 36 nodes, the solution quality of ranking and scaling equaled that of tournament selection, but tournament selection required less computation time.

Genetic Algorithm Overview
GA is a stochastic, meta-heuristic optimization algorithm built from three operators: selection, crossover, and mutation. Each GA solution is encoded as a binary string known as a chromosome, whose elements are called genes. All chromosomes should have the same number of genes. An objective function is used to assess each chromosome's fitness value, and a group of chromosomes is called a population [19].
GA is an iterative process: in each iteration (generation), it applies the three operators to find new chromosomes that may yield a better solution; see Algorithm 1. Among the GA operators, selection has the greatest effect in directing the GA toward the optimal solution and narrowing the search space. It aims to exploit the best characteristics of good candidate solutions in order to improve them over generations, which should, in theory, guide the GA to converge to an acceptable solution of the optimization problem at hand. However, despite decades of research, there are no general guidelines or theoretical results for choosing a good selection method for a given problem. This can be a serious issue, because a poor selection operator can degrade the GA's performance in terms of both speed and reliability [28].
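As a rough illustration of the iterative loop described above, the Python sketch below runs a binary GA with a pluggable selection operator, single-point crossover, and single-gene mutation. It is not the authors' MATLAB code (Algorithm 1): the survivor scheme (offspring merged with the current population, best `pop_size` kept), the `patience` stall criterion, and all function names and defaults are assumptions. The sketch maximizes fitness; to minimize a non-negative objective such as Y, one could pass a transformed value such as `1 / (1 + Y)`.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[int]
# Hypothetical selector signature: (fitness values, number of parents, RNG) -> indices.
Selector = Callable[[Sequence[float], int, random.Random], List[int]]


def single_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random):
    """Exchange the tails of two equal-length parents after a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def single_gene_mutation(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """With probability `rate`, flip one randomly chosen gene."""
    c = list(c)
    if rng.random() < rate:
        i = rng.randrange(len(c))
        c[i] = 1 - c[i]
    return c


def run_ga(fitness: Callable[[Chromosome], float], n_genes: int, pop_size: int,
           select: Selector, generations: int = 100, mutation_rate: float = 0.1,
           patience: int = 10, seed: int = 0) -> Chromosome:
    """Maximise `fitness` over binary chromosomes of length n_genes (n_genes >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    stall = 0
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        parents = [pop[i] for i in select(scores, pop_size, rng)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            for child in single_point_crossover(a, b, rng):
                children.append(single_gene_mutation(child, mutation_rate, rng))
        # Offspring join the current population; the best pop_size survive.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if fitness(pop[0]) > fitness(best):
            best, stall = pop[0], 0
        else:
            stall += 1
            if stall >= patience:  # stop after `patience` generations without improvement
                break
    return best
```

Because the current population is kept in the merge step, the best fitness never decreases from one generation to the next; any of the selection operators discussed in this section can be passed as `select`.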
This research uses six selection operators to evaluate the performance of a 3D-WSN [19]: roulette wheel selection (RWS), linear rank selection (LRS), exponential rank selection (ERS), stochastic universal sampling (SUS), tournament selection (TOS), and truncation selection (TRS). The steps of each algorithm are given as MATLAB-style pseudo-code.

1.
Roulette wheel selection (RWS): This method assigns each chromosome $i$ in the current population a selection probability $p_i$ proportional to its fitness value $f_i$:

$$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}$$

where $n$ is the population size. Algorithm 2 shows the RWS pseudo-code. A well-known disadvantage of this technique is the risk of premature convergence of the GA to a local optimum, because a dominant individual always wins the competition and is chosen as a parent.

2.
Linear rank selection (LRS): This variant of RWS attempts to overcome the premature convergence of the GA to a local optimum. It is based on a chromosome's rank rather than its fitness: the best chromosome receives rank $n$, and the worst receives rank 1. Each chromosome $i$ with rank $r_i$ is then chosen with probability

$$p_i = \frac{r_i}{\sum_{j=1}^{n} j} = \frac{2\,r_i}{n(n+1)}$$

Algorithm 3 shows the LRS pseudo-code.

3.
Exponential rank selection (ERS): ERS follows the same principle as LRS but assigns the selection probabilities differently:

$$p_i = \frac{w - 1}{w^{n} - 1}\, w^{\,n - r_i}$$

where $r_i$ is the rank of chromosome $i$ (as in LRS, the best chromosome has rank $n$) and $w$ is the exponent base, typically $0 < w < 1$. Algorithm 4 shows the ERS pseudo-code.

4.
Stochastic universal sampling (SUS): SUS is closely related to RWS. The difference is that, instead of a single pointer spun once per parent, SUS places as many equally spaced pointers around the wheel as there are parents to select, so all parents are chosen in a single spin of the wheel according to their fitness values. This setup ensures that a highly fit chromosome whose expected number of copies is at least one is chosen at least once per spin. Algorithm 5 shows the SUS pseudo-code.

5.
Tournament selection (TOS): This is a rank-based selection method. Its basic idea is to pick a group of $k$ chromosomes at random, rank them by relative fitness, and choose the fittest for reproduction. TOS is a popular selection method in evolutionary algorithms because it works well for a wide range of problems, is easy to implement, and is parallelizable. Algorithm 6 shows the TOS pseudo-code.

6.
Truncation selection (TRS): TRS sorts chromosomes by fitness value, and only the best chromosomes are chosen as parents of the next population. The main parameter of truncation selection is the TRS threshold, the proportion of the population chosen as parents, which typically ranges from 10% to 50%. Individuals below the threshold do not reproduce and are discarded. Algorithm 7 shows the TRS pseudo-code.

Experiment Models and Setup
This section introduces the assumptions and models required to construct the simulation model used in this paper. All WSN and GA models are programmed in MATLAB. Table 1 defines the variables used in the analysis and simulation.

Network Setup
A three-dimensional WSN is used in this paper because 3D-WSNs are more realistic than 2D-WSNs. Compared with the 2D case, the third dimension is critical in deciding which CH a node connects to. For instance, suppose a set of nodes is deployed in 3D space and then projected into 2D space by removing the third dimension, and the nodes go through the same CH election process in both cases. Some nodes that join one CH in 3D space will join a different CH in 2D space [29]. The third dimension is also crucial in calculating energy expenditure during a protocol round; it has been shown that the lifespan of 3D-WSNs is shorter than that of 2D-WSNs [5]. The following assumptions are made about the network model:

• The sink node is a resource-rich device located at the network's center.
• All sensor nodes are homogeneous: they have identical hardware capabilities and the same initial energy level.
• Sensor nodes are distributed uniformly in a 3D cube.
• All sensor nodes are stationary after deployment.
• Every sensor node must connect to a CH, whether or not it is the closest one (connecting to the closest CH is the default).
• For a given signal-to-noise ratio, the communication channel is symmetric (i.e., the energy required to transmit a data report from node $s_1$ to node $s_2$ is the same as that required to transmit it from node $s_2$ to node $s_1$).
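The deployment assumptions and the 2D-versus-3D effect described above can be illustrated with a small Python sketch. The function names `deploy_cube` and `nearest_ch`, the cube side, and the initial energy are hypothetical placeholders, not the settings of Table 3.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def deploy_cube(n_nodes: int, side: float, e0: float, seed: int = 0):
    """Uniform random 3D deployment, sink at the cube centre, equal initial energy."""
    rng = random.Random(seed)
    nodes: List[Point] = [tuple(rng.uniform(0.0, side) for _ in range(3))
                          for _ in range(n_nodes)]
    sink = (side / 2, side / 2, side / 2)
    energy = [e0] * n_nodes  # homogeneous nodes
    return nodes, sink, energy


def nearest_ch(node: Point, chs: Sequence[Point], dims: int = 3) -> int:
    """Index of the closest CH; dims=2 ignores the z coordinate (the 2D view)."""
    return min(range(len(chs)), key=lambda j: math.dist(node[:dims], chs[j][:dims]))
```

For example, a node at the origin with CHs at (1, 0, 0) and (0, 0.5, 5) joins the first CH in 3D (distances 1 and about 5.02) but the second CH when the z coordinate is dropped (distances 1 and 0.5).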
A simple radio model is used for energy dissipation: the transmitter dissipates energy to run the radio electronics and the power amplifier, and the receiver dissipates energy to run the radio electronics. Figure 2 shows the radio energy model adopted in this study. The transmitter's energy expenditure to process and send an $l$-bit packet over a distance $d$ is

$$E_{Tx}(l, d) = \begin{cases} l\,E_{elec} + l\,\varepsilon_{fs}\,d^{2}, & d < d_0 \\ l\,E_{elec} + l\,\varepsilon_{mp}\,d^{4}, & d \ge d_0 \end{cases}$$

where $E_{elec}$ is the energy consumed per bit by the radio electronics (J/bit), and $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the amplification parameters of the free-space and multipath fading propagation models, respectively. The distance threshold $d_0$ is used to switch between the free-space and multipath models.

The receiver consumes the following energy to receive an $l$-bit packet:

$$E_{Rx}(l) = l\,E_{elec}$$
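The radio model can be written as two small Python functions. The numeric constants (50 nJ/bit, 10 pJ/bit/m², 0.0013 pJ/bit/m⁴) are values commonly used with this model in the clustering literature and are assumed here; they are not necessarily the settings of Table 3. Setting d0 = sqrt(eps_fs / eps_mp) (about 87.7 m with these constants) is also an assumption, chosen because it makes the two branches meet at d0.

```python
import math

# Assumed first-order radio constants (typical literature values, not Table 3)
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance where both branches agree


def tx_energy(l_bits: int, d: float, e_elec: float = E_ELEC, eps_fs: float = EPS_FS,
              eps_mp: float = EPS_MP, d0: float = D0) -> float:
    """Energy (J) to transmit an l-bit packet over distance d (m)."""
    if d < d0:
        return l_bits * (e_elec + eps_fs * d ** 2)  # free-space (d^2) loss
    return l_bits * (e_elec + eps_mp * d ** 4)      # multipath (d^4) loss


def rx_energy(l_bits: int, e_elec: float = E_ELEC) -> float:
    """Energy (J) to receive an l-bit packet."""
    return l_bits * e_elec
```

With these constants, sending a 4000-bit packet over 50 m costs 2×10⁻⁴ J for the electronics plus 1×10⁻⁴ J for the amplifier (3×10⁻⁴ J in total), and receiving it costs 2×10⁻⁴ J. The d⁴ term is why long-range links are penalized in the fitness function.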

Clustering Protocol Setup Phases
GA dynamic clustering extends the life of 3D-WSNs by optimizing the number of clusters in the network and balancing energy consumption among sensor nodes. Dynamic clustering refers to forming optimal network clusters in each protocol round. At the start of each round, the sink node forms a random number of CHs, computes the fitness value of this configuration, and then searches for better cluster configurations by minimizing the fitness value. The process is repeated until the fitness value no longer improves, which completes the round. Rounds continue until the network's energy is exhausted.
Dynamic clustering has the following two steps:

1.
Formation of network clusters: GA dynamic clustering is performed in the sink node because it has the geographical location of every node. Binary encoding is used: each gene corresponds to a sensor node, with CHs encoded as binary '1' and member nodes as binary '0'. The GA dynamic clustering algorithm applies all GA operators to a randomly generated initial population until the termination condition is met. The best chromosome, which represents the new cluster configuration, is then produced, and the sink node distributes the cluster settings to the network's nodes.

2.
Data communication: Data packets collected by member nodes are transferred to the CHs during this phase. The collected data are then processed, aggregated, and compressed by CHs before being transferred to the sink node. CHs organize data collection from their member nodes using time division multiple access (TDMA) scheduling and send data reports using the code division multiple access (CDMA) technique.
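To make the binary encoding concrete, the following Python sketch decodes a chromosome into clusters, with one gene per node and each member joining its nearest CH (the default connection rule in the network assumptions). The function name `decode` and the choice to reject a chromosome that encodes no CH are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def decode(chromosome: Sequence[int], nodes: Sequence[Point]) -> Dict[int, List[int]]:
    """Map each CH index to the member nodes that join it (nearest CH in 3D)."""
    if len(chromosome) != len(nodes):
        raise ValueError("one gene per node is required")
    chs = [i for i, gene in enumerate(chromosome) if gene == 1]
    if not chs:
        raise ValueError("chromosome encodes no cluster head")
    clusters: Dict[int, List[int]] = {c: [] for c in chs}
    for i, gene in enumerate(chromosome):
        if gene == 0:  # member node: join the closest CH
            nearest = min(chs, key=lambda c: math.dist(nodes[i], nodes[c]))
            clusters[nearest].append(i)
    return clusters
```

For instance, with four nodes on a line at x = 0, 1, 10, and 11 and the chromosome `[1, 0, 0, 1]`, node 1 joins CH 0 and node 2 joins CH 3.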
The fitness (objective) function Y is a key defining component of the GA. The GA dynamic clustering fitness function comprises the following components.

1.
Overall energy consumption for a typical round of data collection ($E_O$). Reducing network energy consumption is a primary aim of existing algorithms, so it is used directly in calculating the fitness of the chromosomes. Total energy consumption equals the sum of intra-cluster and inter-cluster energy consumption.

2.
Number of long-range wireless connections (L). A long-range connection is defined by Equation (6). A node can be connected to either a CH or the sink node. A long-range connection is used if the distance between a CH and the sink node, or between a member node and its CH, is greater than the threshold value $d_0$. A long-range connection consumes more energy when transmitting data reports than a short-range connection. Long-range connections consist of two parts: long-range connections within clusters and long-range connections to the sink node.

3.
Cluster distance ($d_c$): the sum of the distances between member nodes and their corresponding CHs, plus the distances between CHs and the sink node.

4.
Number of CHs (m). CHs consume more energy than member nodes, so it is important to keep the number of CHs to a minimum.

5.
Variability of energy consumption between clusters (v). Most existing algorithms reduce overall energy consumption but do not balance energy consumption among clusters. Overall energy savings may come at the expense of one or more individual clusters with higher energy costs, causing certain nodes in the network to die prematurely and reducing the network's overall stability. The standard deviation of the cluster energies is used to measure this variability:

$$v = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_i - \mu\right)^{2}}$$

where $E_i$ denotes the energy consumed by cluster $i$, $m$ is the number of clusters, and $\mu$ is the average energy consumed per cluster.
The fitness function Y of GA dynamic clustering is a function of these five independent variables:

$$Y = w_1 E_O + w_2 L + w_3 d_c + w_4 m + w_5 v$$

where $w_1$, $w_2$, $w_3$, $w_4$, and $w_5$ are constant weighting coefficients whose values are application dependent.
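Assuming Y is a weighted sum of the five components, a Python sketch of its evaluation could look like the following. Further assumptions: $E_O$ is taken as the sum of the per-cluster energies $E_i$ (computed elsewhere, for example with the radio model), a connection counts as long range when its distance exceeds $d_0$, $v$ is the population standard deviation, and the function name `fitness_y` and the example weights are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def fitness_y(member_dists: Sequence[Sequence[float]],
              ch_sink_dists: Sequence[float],
              cluster_energies: Sequence[float],
              d0: float,
              weights: Tuple[float, float, float, float, float]) -> float:
    """Weighted-sum fitness Y (lower is better) for one cluster configuration.

    member_dists[c]     -- distances from the members of cluster c to its CH
    ch_sink_dists[c]    -- distance from the CH of cluster c to the sink
    cluster_energies[c] -- energy E_c consumed by cluster c in one round
    """
    w1, w2, w3, w4, w5 = weights
    # E_O: assumed to be the sum of per-cluster energies (intra + inter-cluster)
    e_o = sum(cluster_energies)
    # L: long-range links inside clusters plus long-range CH-to-sink links
    n_long = (sum(d > d0 for ds in member_dists for d in ds)
              + sum(d > d0 for d in ch_sink_dists))
    # d_c: member-to-CH distances plus CH-to-sink distances
    d_c = sum(sum(ds) for ds in member_dists) + sum(ch_sink_dists)
    m = len(ch_sink_dists)  # number of CHs (one per cluster)
    # v: population standard deviation of the per-cluster energies
    v = statistics.pstdev(cluster_energies)
    return w1 * e_o + w2 * n_long + w3 * d_c + w4 * m + w5 * v
```

Since the components have very different scales (joules, counts, metres), the weights in practice also act as normalization factors; this is one reason their values are application dependent.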

GA Complexity Analysis
The overall complexity (T) of our WSN dynamic clustering technique using the GA meta-heuristic is composed of the following:

1.
Measuring the complexity of the mutation, crossover and selection operators.

2.
Determining the number of generations, the population size and chromosome length.

3.
Measuring the complexity of the fitness function.
The overall complexity combines these three contributions. Table 2 shows the time complexity of the different GA operators used in the simulation, assuming single-gene mutation and single-point crossover.

Table 2. Time complexity of the GA operators (columns: Operator, Complexity).

The complexity of the fitness function is then combined with the operator complexities to give the overall complexity of the algorithm, which depends on the selection operator used.

Simulation Setup and Results
This section provides an overview of the simulation metrics and parameter settings for the WSN and GA algorithm, followed by a discussion of the results.

Simulation Metrics and Parameters
A variety of network metrics are used to assess the performance of the various GA selection methods. These include the following:

1.
Network lifetime (NLT): the time elapsed between the start of network operation and the death of the last node. The NLT is divided into two parts: the stability period (network reliability), which lasts from the start of network operation until the death of the first node (FND), and the instability period, which lasts from the death of the first node to the death of the last node (LND).

2.
Network remaining energy: represents the total amount of energy that remains after each protocol round.

3.
Network throughput: the total number of data reports received by the sink node.

Table 3 shows the parameter settings used in the simulation. The reported results are averages over five simulation runs.
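The lifetime metrics can be computed from a per-round record of alive nodes. The sketch below uses hypothetical helper names and an assumed input format: `alive[r - 1]` holds the number of nodes alive at the end of round r, and the stability period is measured in rounds up to the FND. The same helper gives the round at which a given percentage of nodes is dead, as reported in the results tables.

```python
from typing import Dict, Optional, Sequence


def death_round(alive: Sequence[int], n_nodes: int, dead_target: int) -> Optional[int]:
    """First round (1-based) at which at least `dead_target` nodes are dead."""
    for r, a in enumerate(alive, start=1):
        if n_nodes - a >= dead_target:
            return r
    return None  # target not reached within the recorded rounds


def lifetime_metrics(alive: Sequence[int], n_nodes: int,
                     percents: Sequence[int] = (1, 10, 50, 100)) -> Dict[str, object]:
    fnd = death_round(alive, n_nodes, 1)        # first node dies: end of stability period
    lnd = death_round(alive, n_nodes, n_nodes)  # last node dies: end of network lifetime
    return {
        "FND": fnd,
        "LND": lnd,
        "stability_rounds": fnd,
        "instability_rounds": None if fnd is None or lnd is None else lnd - fnd,
        # round at which p% of the nodes are dead (ceil(p * n / 100) nodes)
        "dead_pct_round": {p: death_round(alive, n_nodes, -(-p * n_nodes // 100))
                           for p in percents},
    }
```

With 100 nodes, the 1% level coincides with the FND, which matches how the results tables are read.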

Simulation Results
This section describes the simulation results in detail. Table 4 gives the numerical results for the six selection operators: the time in rounds (r), the average number of clusters (c̄), and the average energy consumption (ē), at percentages of dead nodes (dn) ranging from 1%, which corresponds to the FND because the network has 100 nodes, to 100%, the death of the last node. For example, with the RWS selection operator, the first node (1%) dies at round r = 252, and 10% of the nodes are dead at r = 329. From the start of the network to the death of the first node, the average number of CHs and the average consumed energy are c̄ = 11.73 and ē = 0.0236, respectively; from the death of the first node to the death of 10% of the nodes, they are c̄ = 11.60 and ē = 0.0227. Figure 3 shows the NLT metric. It clearly shows that the TRS selection operator prolongs the network stability period and reliability compared with the other selection operators. Table 5 shows the improvement (γ) of the TRS operator over the other selection operators, computed as

$$\gamma = \frac{SO_{TRS} - SO_i}{SO_i} \times 100\%$$

where $SO_{TRS}$ is the property value of the TRS selection operator and $SO_i$ is the property value of any other selection operator $i$. The table shows that, with the exception of the ERS operator near the end of network operation, the TRS operator outperforms all other selection operators throughout the network's lifetime. The TRS operator extends the network's stability period compared with all other operators, and by 25.8% compared with the well-known RWS operator. Extending the stability period in a WSN is important because it adds reliability to the network. When the node population begins to decline, the number of CHs per round becomes unstable (lower than intended), and there is no guarantee that an optimal number of CHs will be generated per round. Furthermore, because fewer nodes are alive, the field is sensed by fewer nodes than intended.
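Assuming γ is the relative percentage improvement of TRS over operator i, the computation is a one-liner in Python. The function name `improvement` is hypothetical, and any values passed to it below are illustrative rather than taken from Tables 4 to 6.

```python
def improvement(so_trs: float, so_i: float) -> float:
    """Relative improvement (%) of TRS over operator i for one property value."""
    if so_i == 0:
        raise ValueError("reference value SO_i must be non-zero")
    return (so_trs - so_i) / so_i * 100.0
```

For example, `improvement(120.0, 100.0)` returns 20.0 for hypothetical values of 120 and 100 rounds. A negative result indicates that TRS is worse for that property, as happens against ERS near the end of network operation.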
Figure 4 depicts the total remaining energy for the six selection operators. The TRS operator saves the most energy, since its remaining energy is the highest throughout network operation, and the ERS operator ranks second. TRS performs well because in each round it selects the best chromosomes for mating (crossover), which reduces the solution search space, and it selects the CH configuration that minimizes the fitness value. The offspring are added to the existing population, and the next selection is made from this combined population. Figure 5 shows that the TRS operator also outperforms the other selection operators in the total number of packet reports that reach the sink node, because it reduces energy consumption over the network's lifetime by using the optimal number of clusters per protocol round.
Table 6 shows the cumulative throughput at the sink node for the six selection operators. With the truncation operator, throughput increases by a minimum of 5.67% compared with the exponential ranking operator, by a maximum of 29.45% compared with the tournament operator, and by 26.86% compared with the well-known roulette wheel operator.

Conclusions and Future Work
This paper proposes a central dynamic clustering protocol based on GA for extending the 3D-WSN stability period. The GA uses six parent selection operators, whose performance is evaluated and compared in terms of network stability period and reliability, total remaining energy, and data throughput. The truncation operator has the greatest impact on network stability period and reliability, followed by the exponential ranking operator. The truncation operator outperforms the other selection operators; compared with the well-known and widely used roulette wheel operator, it improves the stability period and the cumulative throughput at the sink node by 25.8% and 26.86%, respectively. Several challenges remain for future work. First, other nature-inspired optimization algorithms, such as ant colony optimization (ACO) and particle swarm optimization (PSO), can be compared with the GA approach used in this paper to identify which one extends network reliability and lifetime more. Second, the impact of different GA selection and crossover techniques can be studied to identify which combinations further extend network reliability and lifetime. Finally, the same study can be carried out for heterogeneous WSNs, whose nodes differ in structure, capabilities, computational power, and energy storage; a new fitness function could be proposed to exploit these properties and increase the reliability and lifetime of heterogeneous WSNs.