A Formal Energy Consumption Analysis to Secure Cluster-Based WSN: A Case Study of Multi-Hop Clustering Algorithm Based on Spectral Classification Using Lightweight Blockchain

Wireless Sensors Networks are integrating human daily life at a fast rate. Applications cover a wide range of fields, including home security, agriculture, climate change, fire prevention, and so on and so forth. If WSN were initially flat networks, hierarchical, or cluster-based networks have been introduced in order to achieve a better performance in terms of energy efficiency, topology management, delay minimization, load balancing, routing techniques, etc. As cluster-based algorithms proved to be efficient in terms of energy balancing, security has been of less importance in the field. Data shared by nodes in a WSN can be very sensitive depending on the field of application. Therefore, it is important to ensure security at various levels of WSN. This paper proposes a formal modeling of the energy consumed to secure communications in a cluster-based WSN in general. The concept is implemented using the Proof of Authentication (POAh) paradigm of blockchain and applied to a Multi-hop Clustering Algorithm based on spectral classification. The studied metrics are residual energy in the network, the number of alive nodes, first and last dead node.


Introduction
WSN face many challenges: resource constraints in terms of energy and computational power, low transmission bandwidths and processing capabilities, and hidden terminals provoking collisions during data transmission [1,2]. Clustering techniques were proposed to tackle some of these issues. Compared with flat networks, cluster-based networks organize the network in smaller groups based on network characteristics and application requirements.
They logically divide WSN in units or clusters where different types of nodes are identified and assigned with roles: • Cluster Members (CM): these nodes are attached to the same cluster. They read data from the physical environment. • Cluster-Heads (CH): coordinate CMs, collect, aggregate, and forward data to a closer CH or to the sink. • Gateways (optional): in some cluster-based WSN (CBWSN), gateways serve as relay point to forward data to other clusters or to maintain contact with close range clusters. • Base Station (BS) or Sink: the final destination of data.
Ref. [2] defines clustering as a type of topology management techniques which can group nodes to improve the efficiency of the network by managing resources and rotating responsibilities among nodes to provide fairness. Key characteristics of CBWSN [1] include data fusion and load management, efficient energy management, relay node, collision avoidance, latency reduction, fault tolerance, deadlock prevention, quality of service (QOS), etc. In order to evaluate the performance of CBWSN, some elements to consider are: • Cluster formation: the network model is either distributed or centralized. Cluster heads are selected using the node's id, neighbor numbers, average hop distance, remaining energy, communication cost, and distance to the BS or to other cluster members. Clusters can be formed by grouping nodes and choosing CHs in these groups or by selecting first CHs and requesting CMs to connect to CHs. CHs are elected based on the availability of resources, randomly or in a predetermined way. CH election can be centralized or distributed. • Cluster complexity: referred to the number of rounds required to perform the cluster formation and the transformation of data using broadcasting, multi-casting, or data aggregation • Cluster communications and data forwarding: as nodes transfer data to CHs, it can be forwarded to the BS raw or combined with other information. Intra-and inter-cluster transmissions are executed using single hop or multi-hop routing depending on the position of the receiver and the distance between the sender and the endpoint. • Cluster management: deals with network topology changes over time. Re-clustering may occur after a specific interval of time (time-based), or an event (event-based). For example, how is the topology affected when a a node enters, exits the network or runs out of energy?
According to [2], clustering algorithms have been developed to meet specific objectives, such as load balancing, scalability, packets delivery, throughput, delay, and 94.8% of surveyed articles in the CA field listed energy consumption as their objective. These clustering techniques were implemented using clear communications and did not integrate security features, such as cryptography. This study formally evaluates the costs generated by the addition of security elements in cluster-based WSN. The concept is implemented using PoAh, a lightweight Blockchain paradigm, applied to a Multi-hop Clustering Algorithm based on spectral classification. The paper is organized as follow: Section 2 examines clustering algorithms and security features. Section 3 covers Multi-hop Clustering Algorithm based on spectral classification and formal modeling of costs generated to secure communications, as well as the simulation environment. Section 4 presents results and interpretations with regards to residual energy, the number of alive nodes, first, half, and last dead nodes.
Current applications require modern and powerful security techniques because they handle sensitive data depending on the field of application. Several propositions emerged to address security in cluster-based WSN.
An Energy Aware Trust-based Secured Routing Algorithm (EATSRA) [12] was proposed to evaluate the trust reputation of different nodes in the network. The end goal was to create a secure path to reach the BS using trust metrics such as trust scores and node behavior in order to identify safe paths.
Ref. [13] focused on CH election based on trust with the objective of detecting malicious cluster heads. Ref. [14] proposed a lightweight trust-based secure clustering in WSN using honeybee mating intelligence approach, while [15] used blockchain technology in WSN.
Previous works [16,17] introduced Proof-of-Authentication (PoAh) paradigm in blockchain which ensures integrity, confidentiality, and authentication with less requirements in terms of CPU usage. Ref. [18] adapted it to secure a flat and centralized network. Although these techniques address security with regards to various attacks, most of them do not evaluate the impact of security on energy consumption. The objective of this paper is to secure a clustering algorithm in WSN and evaluate the induced cost in terms of energy. This paper implements a secure Multi-hop Clustering Algorithm based on spectral classification (MHCA-SC) [19] using PoAh.

Methodology
The Multi-hop Clustering Algorithm based on spectral classification steps are summarized in the following subsections:

Pre-Processing Phase
Using the RSS (Received Signal Strength), each node calculates its distance to the BS in order to define the region it belongs to ( Figure 1). The information is conveyed to the BS which builds a graph and determines a similarity matrix A based on the following equation: where d(i, j) is the euclidean distance between i and j and σ is the standard deviation or the mean distance of a point to its nearest neighbors. The degree and Laplacian matrices are then generated.

Clustering Phase
Based on K-means classification and the Laplacian matrix, the main goal of this step is to determine the optimal number of clusters and assign each node to its corresponding cluster.

Cluster Head Election Phase
CHs are then determined and selected based on nodes' id and their remaining energy in each cluster. For a node to be elected as a CH, its residual energy must be greater than the mean residual energy in the cluster, while the node's id equals to Ck = (r mod|Sk|) where r is the round and Sk the number of nodes in the cluster. All nodes with a remaining power lower than Ermin notify node Ck + 1 of its energy.

Intra-Cluster Routing
In the same cluster, nodes communicate directly with their CH using schedule TDMA (Time Division Multiple Access).

Inter-Cluster Multi-Hop Routing
Although nodes easily reach their CH using single hop communications in the same cluster, packets are sent from CHs to the sink using a multi-hop routing technique ( Figure 1).

Securing MHCA-SC
Compared with HCA-SC [20], SCNOC [21], DECSA [22], and LEACH-C [23], MHCA-SC proved to be more efficient, considering the FDN (first dead node) and the total residual energy in the network. However, with the growing number of attacks on WSN and their consequences, it is important to introduce security techniques in order to protect the network. In [18], a lightweight blockchain for secured WSN in single hop and centralized wireless networks was implemented. Prior to sending a message to the sink, each node hashes its MAC address and the message to send using SHA256 or MD5. Mac and message hashes are merged, then encrypted using AES256 and sent to the sink as detailed in Algorithm 1. The sink decrypts data (hashes) and authenticates the node by comparing the received values with a list of trusted nodes details, as described in Algorithm 2.

Algorithm 1 Node
Require: MACAddress Ensure: Send Message to the sink nodeMACadd ← readContikiMAC(); textData ← "Hello"; hashMAC ← hash(nodeMACadd); while true do startTimer(timer); hashText ← hash(textData); dataSend ← merge(hashMAC, hashText); if timer is expired then send(dataS, sinkId); end if end while The main challenge of securing clustering algorithms in WSN and specifically MHCA-SC, is to identify where cryptography takes place in the communication model. Messages in clustering in general and MHCA-SC are exchanged from ( Figure 1): • Node to CH (intra cluster): Each node within a cluster sends data to its CH (1 in Figure 1); • CH to CHs (inter cluster): CHs cannot directly reach the sink and forward their data to CHs that relay to other CHs (2 in Figure 1) or to CHs that can easily reach the BS (3 in Figure 1); • CH to BS/sink: these CHs directly access the sink with single hop communications (4 in Figure 1).

1.
Securing Node to CH communications: each node computes hashes, encrypts the data and sends it to the CH. Here, the added energy is the energy required to hash the message and the MAC address and finally cypher the combined data. Extra energy is calculated as follows: where E nch is the energy spent to secure communications between a node and its CH, E hash the energy spent to compute a hash, E enc the energy used to encrypt, t is the number of nodes in the cluster.

2.
Securing CH to CHs (inter cluster) communications: The choices available in this scenario are: (a) A CH deciphers messages received from every single node within its cluster in order to authenticate the sender. Upon a successful authentication, the CH aggregates hashes, encrypts and forwards data to the next hop (steps b or c depending on the position of the CH and the cluster). Added energy in this case is calculated as follows: where E CH1 is the energy spent to secure communications between a CH and another CH, E dec the energy spent to decrypt each message, E auth the energy spent to authenticate a node.
3. CH aggregates data received from nodes within its cluster (already encrypted by nodes) and forwards it to the next hop (steps b or c). Here, the added energy is 4.
If a CH receives data from another CH, it either decrypts the data, authenticates the sender (another CH) and relays it (encryption) with the resulting equation:

5.
Or directly forwards it to the next CH.
Securing CH to BS communications: CHs are directly in contact with the Base Station/sink.
(a) CH deciphers messages received from other CHs. If the sender (CH) is authenticated, data are prepared and securely sent to the sink. The resulting energy is: where E CHS is the energy spent to secure communications between a CH and the sink.
Otherwise, the CH conveys data to the sink without any processing. Table 1 highlights the required amount of energy to send messages from cluster1 to the base station based on Figure 1 in various configurations (E 1-8 ). The overall consumption of energy in the network includes energy costs generated by clusters 2, 3, and 4. Where: It is clearly noticeable that hashing and encrypting messages at every stage of the transmission increases the energy consumption in the network. In order to ensure minimal energy consumption, path E 8 is the best choice. In this case, hashes are computed and then encrypted by the node, while cluster heads only forward the encrypted data to the base station. The sink deciphers and authenticates each received message. Figure 2 illustrates a packet sent by a node to its CH and forwarded to the base station combined with packets from the same cluster. Encryption provides confidentiality while hashing ensures integrity: any data alteration by a malicious node during transportation is detected by the sink because the sink maintains a list of trusted nodes. Any node spoofing a trusted node MAC is considered as malicious by the sink because the node's ID, MAC address, and the corresponding hashes will not match. It is also important to note that the sink is considered as a device with unlimited power and CPU resources compared with other nodes. As a result, it manages the majority of the computation load in the network.

Simulation and Environment
Simulations were carried out in two phases on a DELL Latitude 55220 with 16GO RAM. The first one consisted in the evaluation of the energy consumed to implement the MAC address hashing using SHA256, as well as the encryption of messages using AES256. This part was implemented in a Tmote Sky in Contiki 3.0, an emulator for WSN and IoT where network devices are written in C. Hashing and encryption were implemented on a single node in order to calculate the energy spent for these operations. Energest is a module in Contiki that measures the number of clock ticks in different states, such as transmission, reception, CPU, sleep mode, and low power mode. During our simulations, the CPU clock ticks was the only needed value. Algorithm 1 was repeated 100 times and the average value was calculated (1078 ticks). Using the Tmote datasheet specifications, the consumed energy was obtained. The next phase consisted in simulating the SMHCA-SC. MHCA-SC was originally implemented in MATLAB. During simulations, 300, 400, and 500 nodes were randomly scattered on a 300 m × 300 m area and the sink was positioned at x = 150 m, y = 350 m during our simulations. Three 100 m wide areas were created containing nodes. Figures 3 and 4 represent simulations with 300 and 500 nodes.
The message was 4000 octets in size and the energy for driving the electronics (E e lec) was 50 nJ/bit. The coefficients E f s and E mp were respectively 10 pJ/bit/m 2 and 0.0013 pJ/bit/m 4 . These parameters, as shown in Table 2 are used to calculate the amplifier energy E amp . The energy consumed by hashing and encryption was introduced in the model when nodes in a same cluster send messages to their cluster head. In total, 10 randomly networks were generated for each model and the mean values were calculated.

Mean Residual Energy
Integrating security features increased the energy spent by nodes and therefore affected the overall lifespan of the network. Figure 6a-c show that mean residual energy of SMHCA-SC remained below MHCA-SC in all scenarios (numbers of nodes). In terms of energy (Figure 6d), a 3% decrease in residual energy was observed at round 100 between the two models. The decline grew gradually to 5,8,16, and 37% at rounds 200, 300, 400, and 500, respectively (300/400 nodes). This means that the MHCA-SC lifespan is greater than its secure version and that security features introduce a cost in terms of residual energy. The same patter was recorded for the 500 nodes model.
The graph representing the number of alive nodes over time is related to the residual energy in the network. In Figure 7a, with 300 nodes at the beginning of the simulation, the curves for MHCA-SC and SMHCA-SC remained relatively close up to rounds 199 and 171, respectively, where 294 nodes were still alive. From 294 alive nodes to 150 alive nodes, the curves decline by 23% for MHCA-SC and 30% for SMHCA-SC, respectively. From 150 alive nodes, MHCA-SC curve declines 30% less than SMHCA-SC as they reached 18 alive nodes at rounds 989 and 761, respectively, and, finally, decreased by 19% and 34% before all nodes die. In Figure 7b,c, MHCA-SC decline less than SMHCA-CS, thus showing the impact of the security features on the number of alive nodes over time.      Figure 9a for the 300-node model. It also occurred in simulations 2, 6, and 8 for 400 nodes (Figure 9b), and in simulations 8 and 9 for 500 nodes (Figure 9c). These deviations could be explained by the fact that as nodes are randomly scattered in the network, for the same model simulation 5 may produce a different model for MHCA-SC and SMHCA-SC. This also suggests that other factors, such as distance and numbers of nodes in the cluster may have more impact on nodes energy and should, therefore, be thoroughly investigated.

Half Dead Node
As shown in Table 4, 50% of nodes died faster in SMHCA-SC compared with MHCA-SC. For 300 nodes, it was spotted at rounds 639 and 801, respectively, while the same event was observed at rounds 687 and 873 for the 400-nodes model and 707 and 910 for the 500-nodes model.

Last Dead Node
The network lifespan is also affected by cryptographic techniques. As illustrated in Table 5 for the 300-node model, all nodes were dead at rounds 1080 and 813, respectively, for MHCA-SC and SMHCA-SC. With an initial number of 400 nodes, the batteries of all sensors drained at rounds 1099 and 810 while the values were 1110 and 815 for the 500-node model . These values are shown in Figure 10.  [24], a Location-Energy Spectral Clustering Algorithm, was more effective than LEACH-C and DECSA in terms of residual energy. Compared with LESCA, SMHCA-CS lasted longer in terms of mean residual energy for 500 nodes (Figure 11). FDN also occurred faster with LESCA, as shown in Figure 12. The number of alive nodes decreased at a linear rate when 90% nodes were still alive, but remained lower than SMHCA-CS up to 20% alive nodes. From 20% alive nodes, LESCA was more effective in terms of lifespan ( Figure 13).  From the above results, as energy consumption was evaluated using different metrics, SMHCA-CS drained more energy in nodes compared with MHCA-CS. The number of alive nodes in SMHCA-CS declined faster compared with MHCA-CS. Half of the alive nodes and the last dead node occurred also faster in all cases. Compared with some open (not secured) cluster-based WSN, such as LESCA and SMHCA-CS, was less energy consuming. The main advantage of this work is to identify communications within cluster-based WSN and formally assess energy consumption induced to secure them. Lightweight blockchain paradigm (PoAh) ensures confidentiality, integrity, and authentication. This study's main disadvantage is the use of two separate environments: Contiki measures energy and MATLAB evaluates the impact of security on the network related to energy. This method is not suitable to test the secured version against well known attacks such as sinkhole, wormhole, sybil attacks, and Denial Of Service because security was not implemented within MATLAB. This issue can be addressed by fully implementing a cluster-based WSN in Contiki or even NS2 for example and performing the previously mentioned attacks. It will also give an opportunity to study other metrics, such as throughput and latency, that are important in WSN.

Conclusions
Clustering usage improved WSN lifespan, load balancing, scalability, and packet delivery by rotating cluster head's responsibilities over different nodes in clusters. CBWSN are evaluated in cluster formation, complexity, communication, and management. Security issues in WSN have been raised, especially in sensitive areas, such as military, medical, water delivery, and even smart grids, as more and more attacks surface, aiming to compromise these networks. Although some research projects propose several solutions to increase WSN safety using cryptography and ensure trustworthiness of nodes, authentication, integrity, and confidentiality, the cost in terms of energy consumption and lifespan is not formally assessed. The study formally assesses energy costs induced to secure CBWSN. The concept is applied to Multi-hop Clustering Algorithm using blockchain's paradigm namely Power of Authentication at different levels of the communication model.
Compared with MHCA-CS, SMHCA-CS consumed more energy in general. This was precisely measured with FDN, LDN, residual energy, and the number of alive nodes. For example, with an initial number of 300 nodes, FDN was recorded at rounds 165 and 150, respectively, for MHCA-SC and SMHCA-SC. Although, for 400 nodes, FDN was spotted at rounds 204 and 187, it was noted at round 216 for MHCA-SC and SMHCA-SC. The half dead node happened at rounds 801 and 639 while LDN at rounds 1080 and 813 for 300 nodes. Compared with LESCA, SMHCA-CS exhibited better performances in terms of residual energy, FDN and the number of alive nodes at some point. For instance, the number of alive nodes decreased at a linear rate when 90% of nodes were alive but remained below SMHCA-CS.This illustrates the need for lightweight security algorithms. Although this study focuses on Multi-hop Clustering Algorithm based on spectral classification, it was implemented on two different environments, thus preventing simulations of well-known attacks that may occur in a WSN. Future works include a full implementation of a clusterbased WSN and its secured version, its robustness against several know attacks, as well as the evaluation of other metrics. Funding: The APC was sponsored by SMARTiLab/EMSI.

Acknowledgments:
The authors would like to thank SMARTiLab/EMSI and Public Safety Canada for support and infrastructure.

Conflicts of Interest:
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Sample Availability: Data are available on request to authors.

Abbreviations
The following abbreviations are used in this manuscript: