FEHCA: A Fault-Tolerant Energy-Efficient Hierarchical Clustering Algorithm for Wireless Sensor Networks

Abstract: Technological advancements have led to increased confidence in the design of large-scale wireless networks comprising small, energy-constrained devices. Despite this boost in technology, energy dissipation and fault tolerance remain amongst the key deciding factors when designing and deploying wireless sensor networks. This paper proposes a Fault-tolerant Energy-efficient Hierarchical Clustering Algorithm (FEHCA) for wireless sensor networks (WSNs), which demonstrates energy-efficient clustering and fault-tolerant operation of cluster heads (CHs). It treats CHs not as special nodes but as equally prone to faults as the normal sensing nodes of the cluster. The proposed scheme addresses some of the limitations of prominent hierarchical clustering algorithms, such as the randomized election of cluster heads after each round, which results in significant energy dissipation, and the non-consideration of the residual energy of the sensing nodes while selecting cluster heads. It utilizes the capability of vector quantization to partition the deployed sensors into an optimal number of clusters and ensures that almost the entire area to be monitored remains alive for most of the network's lifetime. This supports better decision-making compared to decisions made on the basis of limited-area sensing data after a few rounds of communication. The scheme is implemented for both friendly as well as hostile deployments. The simulation results are encouraging and validate the proposed algorithm.


Introduction
The world is witnessing the increasing involvement of sensors in human life due to the technological advancements that have been achieved in recent decades. Various arrangements of sensors are serving humankind in the present scenario. Wireless sensor networks can be visualised as various sensing nodes sensing a specified domain, gathering information normally pertaining to environmental changes (movement of human beings, animals or vehicles; temperature changes, etc.) and forwarding it to a central control station, generally referred to as a sink or base station (BS), for decision-making purposes [1][2][3]. Figure 1 shows a typical hierarchical architecture, where sensing nodes collect the data from the environment and forward it to a leader node in the cluster, referred to as a cluster head (CH), which, after aggregation, further forwards it towards the base station. The analysis of data for decision-making purposes is generally performed at the BS level, but some basic computations can also be performed at the CH level or even at the normal sensing node level depending on the approach used [1,4].
Inherent to the wireless networks are a few constraints such as limited battery life, bandwidth, memory, processing capabilities, abrupt sensor breakdown or abrupt behaviour, etc., which all need to be efficiently addressed. These networks may be deployed in friendly environments such as homes and offices or may be air-dropped into hostile environments where sometimes it is not possible to replace the batteries of the sensors or calibrate the sensors [1,3]. The sensing nodes of WSNs are generally fixed but they may have limited mobility as well, such as sensors floating on water bodies.
Continuing advancements in micro-electro-mechanical systems (MEMS) technology have made it possible to fabricate smaller, inexpensive, energy-efficient sensors with increased memory and increased computational capability [1,[5][6][7] (e.g., Mica motes from Crossbow, Tmote Sky from Moteiv, the MKII nodes from UCLA and SunSpot from Sun), which have further boosted confidence in the use of WSNs. The integration of sensing capabilities, computation and communication into a single unit has also been much refined. This has tempted researchers to analyse the sensors' data for the purpose of efficient decision-making.
Various methodologies have been adopted in relation to WSNs; of late, many researchers have been tempted to use machine learning (ML) in the context of WSNs [8,9]. It provides generalised solutions through an architecture that can learn and improve its performance. The application of ML has led to great improvements in the performance of WSNs. Thus, advancements in machine learning have been and are being applied to solve various WSN issues [10,11].
Routing strategies also play a key role in the efficient working of WSNs. These strategies in WSNs, on the basis of the network structure, are broadly classified into flat network routing, location-based routing and hierarchical network routing. In flat network routing, all the sensors are uniformly deployed at the same level and each sensor serves as the other's peer. This can further be divided into proactive or reactive routing [12]. On the other hand, in the case of location-based routing, the sensors are clustered based on their location in the deployment, which is determined by the received signal strength [13,14].
Amongst the three, hierarchical routing protocols have attracted the attention of many researchers as they ensure better energy efficiency [15,16]. Here, the protocols arrange the deployed sensors into groups and each group has a designated cluster head (generally the sensor with the maximum energy). The cluster head coordinates with all the nodes inside the group, generally called a cluster, and with the activities outside the cluster, whether communicating with other CHs or the BS.
One of the most prominent hierarchical routing protocols is Low-Energy Adaptive Clustering Hierarchy (LEACH). It has shown great energy efficiency and is the basis of various other subsequent routing protocols [17]. It makes use of randomisation for the purpose of evenly distributing the energy load across all the sensing nodes and arranges various sensors into groups, where each group has a special sensor behaving as the leader of the group, coordinating the activities of the group. Here, a few sensors designate themselves as cluster heads based on a certain probability and the number of times they have been the cluster head so far. This leader's role is rotated so that it does not drain any single sensor's energy [15].
Since the deployed battery-operated sensing nodes are energy-constrained, it is important to utilize the energy wisely to extend the network's lifetime. Moreover, the nodes deployed in hostile environments are very prone to faults, which may be related to a harsh environment, energy depletion or hardware failure. Although it is quite evident that a non-CH faulty node will degrade the network performance, a faulty CH node can be much more problematic, as it will hamper the whole data aggregation and dissemination process of its cluster and will render normal working nodes useless. This paper discusses the basic concept of the hierarchical clustering protocols and proposes a base-station-controlled Fault-tolerant Energy-efficient Hierarchical Clustering Algorithm (FEHCA), which focuses on:

• The issue of energy-efficient clustering to enhance the network lifetime;
• The optimal cluster count for the network;
• The use of k-means clustering, which partitions N nodes into k clusters, in which each node is associated with the cluster with the nearest mean;
• The election of CHs near the centroids of the respective clusters for effective inter-cluster communication;
• Minimization of energy dissipation by deferring the re-election of new CHs after each round;
• Attempting to make the cluster heads (CHs) fault-tolerant by appointing a secondary cluster head (sCH) in case the current cluster head goes down between two successive rounds.
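The k-means partitioning named above can be sketched as follows, assuming 2-D node coordinates; this is a minimal illustration under those assumptions, not the paper's implementation.

```python
import random

def kmeans(nodes, k, iters=50):
    """Partition 2-D node coordinates into k clusters using the nearest-mean rule."""
    centroids = random.sample(nodes, k)  # initial centroids: k random nodes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in nodes:
            # assign each node to the cluster with the nearest centroid
            j = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            clusters[j].append((x, y))
        # recompute each centroid as the mean of its members
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters, centroids
```

Each node ends up associated with exactly one cluster, and the centroids converge towards the cluster means, which is the property FEHCA exploits when electing CHs near the centroids.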
In essence, this paper proposes a centralised cluster formation technique controlled by the base station (BS), where the deployed sensors are divided into an optimal number of clusters. Once the number of clusters is decided and cluster formation is done, the algorithm, instead of electing any arbitrary CH, as shown in Figure 2, elects the CH near the centroid of the respective clusters. The algorithm has been developed to be fault-tolerant in the context of the CH and does not require re-cluster formation between two successive rounds if the CH is found to be permanently defective.
The work was carried out by considering both friendly and hostile environments; thus, the placement of the BS was assumed to be within the deployment and far away from the hostile deployment, respectively. The performance of the network depends highly on the placement of the base station. A BS placed within the monitored area (friendly environments such as homes, offices, etc.) will be in close proximity to the cluster heads, whereas a base station outside the monitored area (generally the case in hostile deployments such as harsh terrains, border areas, etc.) will be at a distance from most of the cluster heads.
Most of the implementations discuss either friendly or hostile environment deployments, but the proposed algorithm was simulated and validated for both scenarios. We also compared our work with some state-of-the-art existing techniques under common parameters by positioning the BS within the monitored area. Furthermore, the paper simulates the proposed algorithm in hostile environments by integrating high-energy nodes into the network, with the primary aim of keeping the entire monitored area alive for the majority of the network's lifetime rather than monitoring only the surrounding region after a certain number of rounds of communication, as shown in Figure 3, and the results are promising.
The rest of the paper is organized as follows. Section 2 presents a summary of the related work done in this domain, whereas the proposed system model is explained in Section 3, which includes the network model and energy model for the proposed system. Section 4 discusses the proposed algorithm for cluster head selection and fault tolerance. Investigational outcomes are discussed in detail in Section 5, which also contains the comparative study of our algorithm. Finally, the summary and conclusions are presented in Section 6.


Related Work
Clustering and fault tolerance are the focus of various researchers working on wireless senor networks [3,18]. Clustering is generally studied as either centralised or distributed [18]. Centralised clustering involves clustering that is dictated by the base station and the decisions are communicated in the field to the elected CHs; it is generally used in location-aware sensor networks. On the other hand, distributed clustering can be utilised in the case of location-unaware sensor networks, where the sensors are not aware of their relative positioning in the network.
Mechta et al. [30] have proposed a protocol named LEACH-CKM that makes use of k-means classification for grouping the nodes. The protocol is based on Centralised LEACH, where CHs are appointed centrally with the help of the CHs' and sensing nodes' local information. LEACH-CKM addresses the node isolation problem, where a remote node is not able to communicate with the base station, and utilises a Minimum Transmission Energy routing algorithm for routing the information from remote nodes.
An energy-efficient, fault-tolerant, distributed clustering and routing scheme for wireless sensor networks has been proposed by Azharuddin et al. [31]. The proposed scheme is a set of two algorithms jointly named Distributed Fault-tolerant Clustering and Routing (DFCR). It makes use of a distributed run time recovery strategy for the sensor nodes in case of unexpected failure of the cluster heads. The scheme is shown to be capable of handling the sensor nodes that do not have any cluster head to be associated with.
In another work, LEACH's threshold formula has been modified by Xingguo L et al. [20], where the residual energy is taken into account while selecting a cluster head. Traditional LEACH pays no attention to the residual energy of the nodes while choosing the cluster heads, which might lead to a low-energy sensor node being elected as the cluster head; such a node will exhaust its energy early, wasting the resources utilised for cluster head selection. The protocol uses direct BS-CH communication.
Bhatti et al. [32] have proposed a clustering technique using fuzzy c-means. The paper presents a cluster-based cooperative spectrum sensing scheme to reduce energy consumption and proposes an algorithm that creates clusters and elects the cluster heads in such a way that focuses on both the issues of energy consumption and efficiency in a positive manner. The proposed scheme ensures maximum probability of detection under an imperfect channel, utilising minimal consumption of energy when compared with the conventional clustering approaches.
Further, the role of energy balancing in extending the network lifespan has been explored by Azharuddin et al. [33]. The authors propose a Particle Swarm Optimization (PSO)-based routing and clustering algorithm for wireless sensor networks, where the routing algorithm creates a balance between energy efficiency and energy balancing. The clustering algorithm is responsible for the CH and normal sensing node's energy consumption. The proposed algorithms are also sophisticated enough to handle the failure of cluster heads.
A comparison between Particle Swarm Optimisation and Genetic Algorithm when applied to wireless sensor networks for network optimisation has been presented by Parwekar et al. [34]. It shows how both the algorithms can optimise cluster formation. Their experimental results found that PSO outperformed GA for clustering purposes.
Saveros et al. [35] have proposed a low-power handover design algorithm for wireless sensor networks (WSNs). The algorithm is designed to place the majority of the scanning responsibility on the mains-powered access points. The proposed algorithm has been implemented and empirically evaluated. The evaluation results show that the energy consumption can be reduced by several orders of magnitude compared to existing algorithms for WSNs.
Nigam, G.K. et al. [36] have pointed out some of the drawbacks of the LEACH protocol, such as the unnecessary cluster head election after each round, which results in significant energy depletion; the disregard of the remaining energy of the sensors; and the non-uniformity of the number of cluster heads. The paper proposes an algorithm, namely ESO-LEACH, to address some of these issues. It makes use of metaheuristic particle swarm enhancement for the initial clustering of the nodes, introduces the concept of advanced nodes and also takes the residual energy of the nodes into consideration. An enhanced set of rules and a fitness function are also defined for the purpose of cluster formation.
Based on the above observations, there is a need for a centrally administered, fault-tolerant and energy-efficient clustering algorithm. In order to achieve this, we have proposed an improved, centrally administered, fault-tolerant and energy-efficient clustering algorithm that is capable of enhancing the energy load balance and achieves encouraging results. Therefore, in the next section, a system model is considered for implementing the proposed algorithm.

System Model
A system model has prime importance when it comes to algorithm discussions. In this section, the system model considered in the proposed algorithm is discussed in terms of network model, energy model and fault model.

Network Model
The proposed work assumes the random deployment of N sensing nodes that, once deployed, are stationary. The deployed nodes are homogeneous in nature, with identical initial energy and hardware. The basic working principle is divided into rounds as in [37], where each round comprises local data gathering by the sensing nodes, which is forwarded to the cluster head of the particular cluster. On receiving the data, the cluster head aggregates the received data, removes the redundancies and transmits these data to the base station (BS). Between successive rounds, the nodes are assumed to switch off their radios in order to conserve their batteries. The following assumptions have been considered for this purpose:

Assumption 1. All the sensing nodes, once deployed, and the base station are stationary.

Assumption 2a. All the sensing nodes start with the same initial energy (in the case of a BS positioned at the centre of the deployment).
Assumption 2b. The deployment is equipped with advanced nodes in an incremental fashion with increasing distance from the BS, as per Lemma 2 (in the case of a BS positioned outside the deployment area).

Assumption 3. The BS controls the clustering process, has high computational power and is not energy-constrained.

The above-listed assumptions are taken into consideration in order to study the performance of the proposed scheme in line with the other popular schemes mentioned in the paper and to highlight the improvements of the proposed scheme in terms of the common parameters.

Energy Model
The proposed work makes use of the popular energy model described in [37], which uses both the free-space (d² power loss) and the multipath propagation (d⁴ power loss) model, depending on the distance between the communicating entities.
If the distance between the sender and the receiver is less than a threshold d_0, then the free-space model is utilised; otherwise, the multipath model is utilised. Let E_elec be the amount of energy dissipated by the radio to run the transmitter or receiver circuitry per bit, and let ε_fs and ε_mp be the energies required for amplification in the free-space and multipath models, respectively. The energy required by the radio to transmit a k-bit message over a distance d is given by Equation (1):

E_Tx(k, d) = k·E_elec + k·ε_fs·d²,  if d < d_0
E_Tx(k, d) = k·E_elec + k·ε_mp·d⁴,  if d ≥ d_0        (1)

To receive the k-bit message, the radio expends E_Rx energy, as given by Equation (2):

E_Rx(k) = k·E_elec        (2)
The parameter E_elec depends on numerous factors, such as digital coding, modulation, filtering and spreading of the signal, while the amplification energies ε_fs and ε_mp are subject to the distance between the sender and the receiver and to the acceptable bit-error rate.
For simulation purposes, the parameters are set to: E_elec = 50 nJ/bit, ε_fs = 10 pJ/bit/m²,
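Under these parameters, the first-order radio model of Equations (1) and (2) can be expressed directly. This is a sketch under stated assumptions: the ε_mp value (0.0013 pJ/bit/m⁴, the commonly used figure for this model) is an assumed placeholder, since the text truncates before giving it, and d_0 is taken as the usual crossover distance √(ε_fs/ε_mp).

```python
import math

# Parameters from the text; EPS_MP is an assumed placeholder value, as the
# source truncates before stating it.
E_ELEC = 50e-9       # J/bit, transmitter/receiver electronics energy
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier energy
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier energy (assumed)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance between the two models

def e_tx(k_bits, d):
    """Energy to transmit k bits over distance d, per Equation (1)."""
    if d < D0:
        return k_bits * E_ELEC + k_bits * EPS_FS * d ** 2
    return k_bits * E_ELEC + k_bits * EPS_MP * d ** 4

def e_rx(k_bits):
    """Energy to receive k bits, per Equation (2)."""
    return k_bits * E_ELEC
```

For a 1000-bit packet, the free-space branch applies below d_0 (about 88 m under these assumed values) and the steeper d⁴ branch above it, which is why CH placement near cluster centroids pays off.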

Fault Model
Failures are an inherent part of WSNs, and different types of faults may affect them [38]. If the CH goes down between CH selection rounds, then either the aggregated data of that cluster will be lost or a new CH selection must be initiated immediately. The proposed Fault-tolerant Energy-efficient Hierarchical Clustering Algorithm (FEHCA) does not initiate immediate CH selection between two successive rounds, as such a practice leads to energy depletion. A CH going down between two successive rounds is managed by the sCH node, which is on standby in every cluster and takes over the task of the CH temporarily.
The algorithm ensures the selection of the CH at or near the centroid of the clusters. A random CH selection does not present an ideal scenario, as shown in Figure 2; unpredictable CH selection generally leads to non-uniform energy dissipation in the network, which ultimately reduces the network lifetime due to rapid energy discharge. Election of the CH near the centroid makes it easily accessible and helps achieve uniform energy dissipation; once the current CH's energy is depleted beyond a certain threshold, preference is given to the eligible sensors near the previous round's CH that satisfy the defined energy threshold.
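The sCH standby mechanism described in this fault model can be sketched as follows; the `Cluster` class and its fields are illustrative assumptions, not the paper's data structures.

```python
class Cluster:
    """Minimal cluster state for the CH/sCH failover sketch (illustrative)."""

    def __init__(self, ch, sch, members):
        self.ch = ch            # current cluster head
        self.sch = sch          # standby secondary cluster head
        self.members = members  # remaining member nodes

    def on_ch_failure(self):
        """If the CH dies between rounds, the sCH takes over temporarily,
        avoiding an immediate (and costly) network-wide re-clustering."""
        if self.sch is not None:
            self.ch, self.sch = self.sch, None
            return True   # cluster keeps operating until the next setup phase
        return False      # no standby available; defer to the next setup phase
```

The point of the design is that a CH failure is absorbed locally: the cluster keeps aggregating data under the sCH, and full re-clustering is deferred to the next scheduled setup phase.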

Proposed Algorithm
In the proposed algorithm, the network lifetime is divided into several rounds. The algorithm works in two phases, the setup phase and the steady-state phase, as shown in Figure 4. In the setup phase, the base station (BS) gathers the energy and location information of the deployed nodes, based on which it clusters the entire region into an optimal number of clusters (7 in our case), as shown in Figure 5. The BS, after deciding the optimal cluster ratio (C_r) for the network, performs the clustering using the k-means clustering algorithm [39]. Once the clusters are determined, the BS elects the CHs at or near the centroids of the clusters, which is followed by the steady-state phase.
In the steady-state phase, when the cluster formation is accomplished, the non-cluster-head sensing nodes sense the region and forward the sensed data to the appropriate CHs. The CHs aggregate the data, check for redundant and uncorrelated data and send the result to the BS in a single hop. For better energy efficiency, the proposed algorithm elects the CHs near the centroid of the cluster and, unlike LEACH, delays the subsequent setup phases for as long as the residual energy of the current CH (CH_E_res) remains >= the specified threshold energy (Cluster_E_avg), thus conserving energy, as each setup phase consumes a significant amount of energy in exchanging the control packets required for the cluster formation. Algorithm 1 describes the proposed scheme, followed by Lemma 1's analysis of its complexity. The clustering and cluster head selection processes used in the proposed algorithm are discussed in the following section.
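The deferred re-election rule above (retain the current CH while CH_E_res remains at or above Cluster_E_avg) can be sketched as a simple predicate evaluated by the BS between rounds; the function and argument names are illustrative assumptions.

```python
def needs_reelection(ch_residual_energy, cluster_energies):
    """Trigger a new setup phase only when the current CH's residual energy
    drops below the cluster's average energy (the threshold rule described
    above); otherwise the CH is retained and the setup-phase control traffic
    is saved."""
    cluster_avg = sum(cluster_energies) / len(cluster_energies)
    return ch_residual_energy < cluster_avg
```

Compared with LEACH's unconditional per-round re-election, this predicate lets a healthy, well-placed CH serve multiple rounds before any control packets are spent on re-clustering.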


Clustering Phase
The proposed algorithm uses the popular unsupervised vector quantisation k-means algorithm to divide the randomly distributed sensors into clusters and performs iterative computations to optimise the centroids. The algorithm minimises the distance between the sensor nodes within each cluster and the respective centroid. FEHCA implements a centralised approach in which the BS performs the task of clustering. The BS elects as CH the sensor node nearest to the centroid with energy greater than or equal to the average cluster energy, i.e., E_res >= Cluster_E_avg. The current CH retains its role until its energy falls below the average energy of the cluster. Moreover, the number of clusters formed is not changed unless the ideal number of clusters required changes. In essence, the clustering phase is divided into two sub-phases, namely centralised clustering and cluster head selection.

Centralised Clustering
In centralised clustering, each ith node is represented by N_i, where i ∈ {1, 2, ..., n}, and the base station is represented as N+. Therefore, the overall set of participating entities is N = {N_1, N_2, ..., N_n, N+}. N+, based on each sensor's location and residual energy, partitions the network into clusters; for this, the proposed scheme makes use of k-means clustering, as shown in Algorithm 2. Once the clustering decision is made, N+ conveys the clustering information to the sensing nodes (N_i). N+ elects the CHs and sends the corresponding CH parameters to each N_i. On receiving the values from the BS, each node determines whether it is a CH or a normal participating node in the current round. If a node is not selected as a CH in the current round, then N+ directly transmits to it the N_i value corresponding to the CH with which that node needs to associate. The k-means clustering initially performs a random partition of the network into k centroid-based clusters; it then repeatedly minimises the distance between each node and the respective centroid. Ultimately, this process ensures that the sensor nodes within a cluster are in close proximity, which leads to reduced energy dissipation.
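As an illustration, the BS-side partition step can be sketched with a plain k-means implementation. This is a minimal sketch only; the function and variable names are our own, and the paper's Algorithm 2 is not reproduced here.

```python
import math
import random

def kmeans(nodes, k, iters=50, seed=1):
    """BS-side sketch: partition sensor coordinates into k clusters.
    nodes: list of (x, y) positions; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(nodes, k)  # random initial centroids
    labels = []
    for _ in range(iters):
        # assignment step: each node joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(n, centroids[c]))
                  for n in nodes]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [n for n, lab in zip(nodes, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels
```

Repeating the assignment and update steps shrinks the within-cluster distances, which is the property FEHCA relies on to keep cluster members in close proximity.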

Cluster Head Selection
In this process, the BS (N+) elects the CHs based on the distance from the centroid and the specified energy requirements, and it communicates the N_i value corresponding to the selected CH to the participating nodes in the network. If the received N_i value corresponds to the value of the receiving node, then the recipient recognises itself as the CH and creates and broadcasts a transmission schedule to its member nodes for the coordination of the data transmission. Otherwise, the node recognises itself as a normal member node of the cluster, sends a cluster joining request to the CH whose N_i value it has received from N+ and subsequently waits for the schedule generated by its CH. Once this process is completed, each member node in the cluster transmits the sensed data and its residual energy to the respective CH as per the received schedule. Between subsequent transmission slots, the sensors switch off their radios to conserve energy. The CH compares its energy with the average energy of the cluster and communicates this information to the BS. If its energy is greater than or equal to the average energy of the cluster, then it remains the CH; otherwise, the BS elects a new CH in the next round by considering the residual energy and the distance from the centroid. New CHs are elected from the sensors within the current clusters only, and new clusters are created only when the ideal number of clusters to be formed changes (depending on C_r). The cluster head election in FEHCA is shown in Algorithm 1, which has a worst-case time complexity of O(Cn^2 + Cn_CN^2 log n_CN), as discussed in Lemma 1.
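The election and retention rules above can be sketched as follows. This is an illustrative sketch under our own naming (the member-record fields are assumptions, not the paper's notation): the BS picks the eligible node nearest the centroid, and the incumbent CH keeps its role while its energy stays at or above the cluster average.

```python
import math

def elect_cluster_head(members, centroid):
    """Among a cluster's members, pick the node nearest the centroid
    whose residual energy is at least the cluster average
    (E_res >= Cluster_E_avg). members: dicts {'id', 'pos', 'e_res'}."""
    e_avg = sum(m['e_res'] for m in members) / len(members)
    eligible = [m for m in members if m['e_res'] >= e_avg]
    return min(eligible, key=lambda m: math.dist(m['pos'], centroid))

def retain_or_reelect(ch, members, centroid):
    """The current CH keeps its role until its energy drops below the
    cluster average; only then does the BS elect a new CH."""
    e_avg = sum(m['e_res'] for m in members) / len(members)
    return ch if ch['e_res'] >= e_avg else elect_cluster_head(members, centroid)
```

Deferring re-election until the CH falls below the cluster average is what lets FEHCA skip the per-round setup phases that LEACH pays for.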

Algorithm 1. Cluster head election in FEHCA.
Input: (N, E, n, C_r)
  N: set of alive nodes (N_i, ∀ i ∈ {1, 2, ..., n}) with coordinates (x_i, y_i)
  E: initial energies (E_1, E_2, E_3, ..., E_n)
  n: number of alive nodes
  C_r: optimal cluster ratio
Output: CH, where CH represents the elected cluster heads
Procedure: FEHCA(N, E, C_r, n)

Generally, the network lifetime is considered to last until the last cluster head is able to communicate with the base station; once the base station stops receiving the sensed data of the region, the network is assumed to be exhausted. Though this is a valid way of computing the lifetime of a network, it does not give a true picture of the sensed region. Initially, the BS receives sensed values from the entire region being monitored but, as time elapses, the CHs far away from the BS start dying earlier than the CHs near the BS (in the case of direct, non-multi-hop networks), until only the CHs near the BS are able to communicate, as depicted in Figure 3; this fact is validated in the results analysis section. The BS is then forced either to make decisions for the entire monitored area based on the nearby cluster readings (which is not justified) or to make decisions only for the areas with which it can still communicate, reserving the rest of the area for further monitoring. The same issue occurs when CHs transmit multi-hop data to the BS, with the exception that CHs closer to the BS wear out faster than CHs farther away due to the overhead of relaying the additional packets from faraway nodes to the BS. This is taken into account by the proposed algorithm (FEHCA), which focuses on covering the entire region for almost the entire network lifespan. It supports the inclusion of advanced nodes within the clusters in a uniform fashion (in case the BS is within the deployment zone) as well as in an increasing fashion as the distance between the cluster and the BS increases (in case the BS is far away from the deployment zone).
This is crucial in order to monitor a larger area rather than only monitoring clusters in close proximity to the BS for most of the time and making decisions for the entire area based solely on inputs from neighbouring CHs. With a uniform distribution of high energy nodes (or none at all), the CHs far away from the BS die out early and only the CHs and clusters near the BS remain alive for a longer period of time. For better decision-making, we need sensed data from the entire area being monitored, which means that the CHs of the entire area must stay alive; one way to achieve this is to deploy advanced nodes in increasing numbers as the distance from the BS increases.
If two CHs are at different distances from the BS, then the transmission energy consumption must be different according to Equation (1). Therefore, in order to equate the transmissions of both the CHs to the BS, we require additional energy for the distant CH. The computation of the additional amount of energy is given in Lemma 2.

Lemma 2. Let the energy requirement for the transmission of k bits from a cluster head (CH) to the base station (BS) in the multipath case be

E_Tx(k, d) = k·E_elec + k·ε_mp·d^4,

where d is the distance between the BS and a CH. If the distances of two CHs (CH_1 and CH_2) from the BS are d_1 and d_2 such that d_2 > d_1, then, in order to equate the transmissions of CH_1 and CH_2, the additional energy required by CH_2 is

ΔE = θ · (E_Tx(k, d_2) − E_Tx(k, d_1)) / E_Tx(k, d_1),

where θ is the initial energy of CH_1 and CH_2.

Proof. The energy requirement for transmitting k bits over the distance d, as discussed in Equation (1), is E_Tx(k, d) = k·E_elec + k·ε_mp·d^4. Since E_Tx(k, ·) is a strictly increasing function of d, E_Tx(k, d_2) > E_Tx(k, d_1) for the given d_2 > d_1. With the initial energy θ, CH_1 can perform θ/E_Tx(k, d_1) transmissions, whereas CH_2 can perform only θ/E_Tx(k, d_2) < θ/E_Tx(k, d_1) transmissions. Equating the number of transmissions of CH_2 to that of CH_1 requires a total energy θ′ such that θ′/E_Tx(k, d_2) = θ/E_Tx(k, d_1), i.e., θ′ = θ · E_Tx(k, d_2)/E_Tx(k, d_1). The additional energy is therefore ΔE = θ′ − θ = θ · (E_Tx(k, d_2) − E_Tx(k, d_1))/E_Tx(k, d_1). □

Results Analysis
In this section, the simulation setup along with two scenarios is considered; the setup was prepared for the evaluation of the proposed algorithm (FEHCA). LEACH and the proposed algorithm were simulated in Python on an Intel Core i5-9300H CPU @ 2.40 GHz with 8 GB RAM running Windows 10. The sensors were randomly deployed in a 100 m × 100 m area, and the initial energy of each sensor was assumed to be 0.5 J. The CH probability was not fixed in advance as in LEACH but was computed at run time, as depicted in Figure 5. FEHCA's effectiveness was assessed on the same basic parameters as in [37], and the results are shown in Table 1. We tested our proposed algorithm extensively for the following two scenarios and compared the results with the existing LEACH protocol and ESO-LEACH [36].

Scenario 1.
A wireless sensor network (WSN 1) is deployed and the BS is placed at the centre (50, 50) of the deployment area (x, y).

Scenario 2.
A wireless sensor network (WSN 2) is deployed and the BS is placed at position (50, 200), which is far away from the deployment area (x, y).

Experimental Evaluation for Scenario 1
In Scenario 1, the deployment area (x, y) is 100 m × 100 m, and 100 nodes are deployed at random, with the base station at the centre (50, 50), as shown in Figure 6. Using k-means clustering, the randomly deployed nodes are grouped into an optimal number of clusters, and the centroid of each cluster (marked as 'x') is chosen, as shown in Figures 7 and 8.
The results were collected by keeping the base station at position (50, 50) in the monitored 100 m × 100 m area. We first simulated the LEACH algorithm and then the proposed FEHCA with the same simulation parameters and found that the proposed algorithm lasted, on average, 70 percent longer than the LEACH algorithm, as shown in Figure 9 and Table 2a.


Table 2. Number of alive nodes per round: (a) LEACH vs. FEHCA; (b) ESO-LEACH vs. FEHCA: H_energy_UD.

(a)
Round | No. of Alive Nodes in LEACH | No. of Alive Nodes in FEHCA
100 | 100 | 100
200 | 100 | 99
300 | 100 | 99
400 | 100 | 99
500 | 100 | 99
600 | 100 | 97

(b)
Round | No. of Alive Nodes in ESO-LEACH | No. of Alive Nodes in FEHCA: H_energy_UD
100 | 100 | 99
200 | 93 | 99
300 | 88 | 99
400 | 85 | 99
500 | 81 | 99
600 | 71 | 99

Furthermore, we discovered that, similar to LEACH, most of the nodes in FEHCA were operational for the majority of the time, as shown in Figure 9, whereas this is not the case in many proposed algorithms [36], where the nodes' batteries start to become depleted immediately, as shown in Table 2b; after a certain number of communication rounds, this forces decisions to be made on the data received from limited regions only. Furthermore, FEHCA lasted 50 percent longer than LEACH, with over 90% of nodes still alive.
The LEACH protocol is valued for its ability to perform with almost all nodes alive, resulting in higher area coverage for the majority of its lifetime, but its random clustering and unpredictable, random cluster head election mechanism result in high energy dissipation fluctuations, as shown in Figure 10, which must be addressed. Furthermore, LEACH makes no mention of the data packets lost during the algorithm's operation. FEHCA, on the other hand, addresses both of these issues, and the findings are supported by Figures 11 and 12.
The Fault-tolerant Energy-efficient Hierarchical Clustering Algorithm (FEHCA), similarly to LEACH, operates at nearly full strength for the majority of its lifespan, which is critical for reliable networks. It achieves minimal energy dissipation variation by determining the optimum number of cluster heads and then focusing on cluster head placement at or near the centroids, as opposed to LEACH, which elects cluster heads at random. The fault-tolerant behaviour of cluster heads has also been demonstrated by FEHCA: in the event of a cluster head failure, secondary cluster heads (sCHs) take over the task. As shown in Figure 12, failed packet transmissions have been handled and re-transmitted.
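The failover path can be sketched as follows. This is a minimal illustrative sketch; the `deliver` and `send` names and the acknowledgement semantics are assumptions on our part, as the sCH handshake is not specified in this section.

```python
def deliver(packet, ch, sch, send):
    """Try the primary cluster head first; on failure, the secondary
    cluster head (sCH) takes over and the packet is re-transmitted.
    send(node, packet) -> bool models a transmission with an ack."""
    if send(ch, packet):    # primary CH alive, transmission acknowledged
        return ch
    if send(sch, packet):   # CH failed: sCH handles the re-transmission
        return sch
    raise RuntimeError("both cluster heads failed")
```

For example, with a `send` stub in which only the sCH answers, `deliver("data", "CH", "sCH", lambda node, p: node == "sCH")` returns `"sCH"`, modelling the takeover shown in Figure 12.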
In addition, we compared FEHCA: H_energy_UD to ESO-LEACH [36] with the same parameters. FEHCA: H_energy_UD uniformly distributes high energy nodes with twice the energy of normal nodes (i.e., 1 J initial energy) in each cluster. The high energy nodes are contained in the set of N deployed nodes; thus, the normal node count equals N minus the number of high energy nodes.
FEHCA: H_energy_UD, on average, keeps 90% of the network alive for more than 80% of the network lifetime, while ESO-LEACH keeps around 50% of the network alive for approximately 40% of the network lifetime, as shown in Table 2b.

Experimental Evaluation for Scenario 2
In Scenario 2, the deployment area (x, y) is 100 m × 100 m, 100 sensor nodes are randomly deployed and the base station is placed at position (50, 200), i.e., far away from the deployment area, as in hostile environments such as inaccessible border regions, wildlife monitoring, etc., as shown in Figure 13. Figure 14 shows the clustered network after the optimal clusters have been determined, with the CHs chosen by the BS near or at the centroids. Each cluster communicates directly with the BS with the help of the elected CH, as depicted in Figure 15.
In Figure 16, a typical scenario involving the distance between the BS and the communicating CHs is illustrated, in which three CHs with 0.5 J of initial energy communicating directly with the BS are located at distances of 117 m, 151 m and 185 m from the BS. When considering multipath propagation (ε_mp), we can see that the CH furthest from the BS is functional for 80 rounds, while the CHs closer to the BS are able to communicate for 173 and 426 rounds, respectively. Thus, with knowledge of the distance (d_i) from the BS and the residual energies (E_res) of the member nodes in the network, a ratio of the energy requirements in accordance with the increasing distance can be formulated to keep the entire monitored area operational. We can compute the lifetime of the closest node in terms of the number of rounds required for it to function, and we can then calculate the energy requirements of the farthest nodes to match that performance in terms of rounds. As a result, as the distance from the base station increases, we need to introduce a greater number of high energy nodes.
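The round counts above can be reproduced with the multipath radio model. The radio parameters and the 4000-bit packet size below are the usual LEACH simulation settings and are assumed here, as this section does not restate them.

```python
E_ELEC = 50e-9        # J/bit, electronics energy per bit (assumed)
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier energy (assumed)
K_BITS = 4000         # bits per packet (assumed)
THETA = 0.5           # J, initial CH energy

def rounds(d):
    """Whole transmission rounds a CH at distance d m can complete."""
    per_round = K_BITS * E_ELEC + K_BITS * EPS_MP * d ** 4
    return int(THETA // per_round)

for d in (117, 151, 185):
    print(d, rounds(d))
```

Flooring gives 425, 172 and 79 rounds for 117 m, 151 m and 185 m, i.e., the 426, 173 and 80 rounds quoted above up to rounding, which illustrates how sharply the d^4 term penalises the distant CHs.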
Due to different energy requirements, cluster heads that are far away from the base station tend to lose energy at a much faster rate than cluster heads that are close to the base station. As a result, after a few rounds of communication, the network is left with only nearby cluster heads communicating with the base station and responsible for sustaining the network lifetime. The simulation results validate this, as shown in Figure 17. Based on the known distance from the BS and the initial energy, the lifetime of the nodes closer to the BS and of the farthest nodes can be calculated in terms of the number of rounds. Furthermore, in accordance with Lemma 2, the ratio of high energy nodes to be deployed as the distance from the BS increases has been determined, and the results are presented in Figure 18, which shows data received from distant clusters for the majority of the network lifetime. Compared to the result of the uniform distribution of high energy nodes shown in Figure 17, the induction of high energy nodes in an increasing manner shows promising results, and the base station is able to receive data from a larger area rather than only from the nearby region, as shown in Figure 18, for a prolonged period of time. This assists in making practical decisions that are much more reliable than decisions made solely on the basis of nearby-region cluster heads.
The simulation results depicted in Figure 19 also show that the proposed FEHCA, with the help of controlled energy dissipation variations due to (a) the selection of an optimal number of clusters, (b) better positioning of cluster heads and (c) energy conservation by delaying the new cluster head election procedure in the case of CH failures, almost doubled the performance compared to LEACH for the same parameters. Figure 19 presents the performance of FEHCA and LEACH in terms of operational nodes per round when the BS is positioned outside the deployment area. As shown in Figure 19, the proposed algorithm was tested with high energy nodes that are uniformly distributed (FEHCA: H_energy_UD) and incrementally distributed (FEHCA: H_energy_ID).
When comparing FEHCA and its variants to LEACH, Figure 20 shows the energy consumption variations per transmission, which are again minimised in the case of FEHCA and its variants.


Conclusions
Energy conservation and fault tolerance are two major aspects of wireless sensor networks. This paper presents a centralised clustering mechanism with an energy-efficient data dissemination scheme that can cope with a certain number of faulty cluster heads. We simulated the proposed Fault-tolerant Energy-efficient Hierarchical Clustering Algorithm (FEHCA) for scenarios with the base station inside as well as outside the monitored area and found the results encouraging. The energy consumption per transmission was much better than that of LEACH. The proposed FEHCA demonstrated, on average, a 70 percent increase in network lifetime in terms of the number of rounds compared to LEACH when the base station was positioned inside the deployment area. Unlike many well-known schemes, FEHCA surpassed LEACH in keeping most of the nodes alive for the majority of the time through proper energy management; in fact, FEHCA lasted for 50 percent more rounds than LEACH with more than 90 percent of the nodes alive. The proposed algorithm was also compared with ESO-LEACH in terms of common parameters, and it succeeded in keeping more than 90 percent of the nodes alive for more than 80 percent of the network lifetime, whereas almost 50% of the ESO-LEACH network was down within approximately 40% of the network lifetime. The algorithm also doubled the number of communication rounds compared to LEACH in the case of the base station being placed far away from the deployment area. Moreover, with the induction of high energy nodes in each cluster in an increasing manner with respect to the distance from the base station, in accordance with Lemma 2, we observed extended coverage for a prolonged period of time, with the last few surviving sensing nodes spread across the entire monitored area. FEHCA is also capable of handling lost packets in the case of faulty cluster heads. However, for fault tolerance, FEHCA considers only permanent failures of CHs.
In the future, we will attempt to introduce multi-hopping in FEHCA in order to minimise energy consumption and prolong the network lifetime.