A Secure Protocol against Selﬁsh and Pollution Attacker Misbehavior in Clustered WSNs

: Wireless sensor networks ( WSNs ) have been widely used for applications in numerous ﬁelds. One of the main challenges is the limited energy resources when designing secure routing in such networks. Hierarchical organization of nodes in the network can make efﬁcient use of their resources. In this case, a subset of nodes, the cluster heads ( CH s), is entrusted with transmitting messages from cluster nodes to the base station ( BS ). However, the existence of selﬁsh or pollution attacker nodes in the network causes data transmission failure and damages the network availability and integrity. Mainly, when critical nodes like CH nodes misbehave by refusing to forward data to the BS , by modifying data in transit or by injecting polluted data, the whole network becomes defective. This paper presents a secure protocol against selﬁsh and pollution attacker misbehavior in clustered WSNs , known as ( SSP ). It aims to thwart both selﬁsh and pollution attacker misbehaviors, the former being a form of a Denial of Service ( DoS ) attack. In addition, it maintains a level of conﬁdentiality against eavesdroppers. Based on a random linear network coding ( NC ) technique, the protocol uses pre-loaded matrices within sensor nodes to conceive a larger number of new packets from a set of initial data packets, thus creating data redundancy. Then, it transmits them through separate paths to the BS . Furthermore, it detects misbehaving nodes among CH s and executes a punishment mechanism using a control counter. The security analysis and simulation results demonstrate that the proposed solution is not only capable of preventing and detecting DoS attacks as well as pollution attacks, but can also maintain scalable and stable routing for large networks. The protocol means 100% of messages are successfully recovered and received at the BS when the percentage of lost packets is around 20%. Moreover, when the number of misbehaving nodes executing pollution attacks reaches a certain threshold, SSP scores a reception rate of correctly reconstructed messages equal to 100%. If the SSP protocol is not applied, the rate of reception of correctly reconstructed messages is reduced by 90% at the same case.


Introduction
Wireless sensor networks (WSNs) are receiving much attention from both research and industrial fields. The main attractive aspects are their low cost, and effective and easy-to-implement sensors. They are re-configurable to many types of applications to solve real-world problems. However, these small sensors are resource-constrained and battery-limited. In addition, they can be deployed in open and remote environments on large scales that are possibly exposed to several risks. Consequently, they are vulnerable to numerous security attacks. Counteracting these attacks and implementing security techniques can involve a great energy cost to these networks. It can damage the balance of energy consumption in the network, decreasing the nodes' connectivity and the network survival time.
One way to preserve some battery power during communications involves the choice of network architecture and of the protocol used to route data from a sensor to the BS. A number of methods designed to handle security issues in WSNs exploit network hierarchization and cluster-based routing [1]. Clustering a sensor network typically leads to possibilities for scaling, better control of nodes and energy efficiency gains provided by partitioning [2,3]. Other ways to preserve battery power are sleep scheduling and energy harvesting. The former aims at minimizing the number of sensors to activate to cover a desired portion of the region of interest preserving the connectivity among sensors [4]. The latter aims to resolve the issue that the nodes are often unreachable after deployment and introduces the concept of renewable energy that can be harvested from the surrounding environment [5]. As mentioned before, WSNs are vulnerable to several attacks, which can be classified into two types:inside attacks, performed by authorized nodes and outside attacks, performed by unauthorized nodes.
Previous studies have shown that inside attacks are far more difficult to control than outside ones [6,7]. Selfishness and pollution attacks are two forms of critical inside attacks, where nodes can easily launch a type of Denial of Service (DoS) in the first, or pollute data in transit to interrupt the normal functioning of the network, in the second. More specifically, in the former attack, selfish nodes refuse to relay packets and use network resources for their own benefit. This operation is one variety of a DoS attack which can be performed in different ways [8] and has multiple forms [9]. In fact, a DoS attack can also be achieved by congesting the network or draining the energy from its components [10]. This attack is considered one of the most frequent attacks in WSNs [11,12]. In the latter attack, pollution attacker nodes modify native data packets or inject fake ones into the network. A pollution attack is potentially more damaging when applied in protocols for WSNs based on network coding (NC) since it may have devastating impact on the data routing [13]. Figure 1 illustrates the aforementioned considered attacks. In a clustered network, if the misbehaving nodes that invoke a selfishness or a pollution attack are the CH nodes, then the network will be more disordered and the consequences more serious. The CHs have a critical role in forwarding the cluster data to the BS [14]. DoS and pollution attacks are challenging issues that aim to disable the service that a WSN is supposed to provide, and create routing failure. This paper proposes a secure protocol for WSNs that defends against the misbehavior of selfish and pollution attacker nodes in the network. It extends our previous work [15], which ensured confidentiality of exchanges in clustered WSNs using an optimized (NC) technique. This extension provides a solution for preventing an attack on network avail-ability as well as integrity. It also proposes attack detection and a punishment mechanism for malicious nodes causing the DoS attack or the pollution attack. The solution takes advantage of the matrix format of keys specific to each sensor node used for confidential communication in [15] and implements a new scheme of NC by adding data redundancy. It converts a file/message into a set of p pieces in such a way that the file/message can be reconstructed from any predefined subsets of t out of p pieces. Consequently, our solution ensures data availability and integrity with considering the confidentiality requirement already existing.
Our proposition is a secure protocol against selfish and pollution attacker misbehavior in clustered WSNs, named SSP. It aims to shield against a coalition of misbehaving CH nodes that initiate a DoS attack by dropping received packets or a pollution attack by altering them. The main contributions of this paper are as follows: 1. Preventing a DoS attack by a group of CH nodes and ensuring a reliable transfer of data between source and BS. 2. Ensuring data integrity and error correction despite the existence of a pollution attack performed by a group of CH nodes. 3. Locating the misbehaving nodes among CHs and implementing a punishment mechanism against them to enhance the network availability and integrity. 4. Providing simulation results that prove the effectiveness of our protocol compared to a defenseless protocol, regarding the percentage of correct messages received and also considering the case when the number of misbehaving nodes is varying. 5. Proving the scalability of the protocol.
The present paper is structured as follows. Section 2 gives feedback on the literature dealing with existing secure protocols for DoS and pollution attacks prevention/detection. Section 3 presents the assumptions about the system and the threat model used in the paper. The design steps for the protocol and a theoretical discussion are given in Section 4. The simulation set-up and the results are presented in Section 5. Section 6 summarizes the paper and provides conclusions.

Related Work
In this section, we present feedback on the literature on secure protocols dealing with availability and integrity requirements in WSNs.
Many studies have been conducted to propose solutions that provide data availability to routing protocols, whether by preventing or detecting DoS attacks in WSNs.
In Ref. [16], the authors proposed a protection against DoS attacks in WSNs. Their solution is based upon two phases. The first phase partitions the network into clusters via the Hybrid Energy-Efficient Distributed clustering (HEED) protocol. The second sets up a Key Distribution Server (KDS), which supplies each node with session keys and unique IDs. Then, it executes a mutual authentication process between server and CHs and then between CHs and cluster members using a hash function. When a CH detects a malicious node after an unsuccessful authentication process, it requests the KDS to delete its secret key and requests all the other CHs to block it from inter-cluster communication. Consequently, this node becomes keyless and all the services related to it are arrested. After a certain period of time, the KDS calculates new session keys for each CH to communicate with the server and with the cluster members. Although the proposed mechanism can prolong the network lifetime and reduce the overhead using HEED clustering protocol, the cryptography-based mechanism used for defending DoS attacks is computationally expensive and also energy consuming.
Mansouri et al. designed a clustering-based approach to address DoS attacks in WSNs [17]. Their approach is conducted in two phases. The first phase elects a control node (Cnode) with a recursive low-energy adaptive clustering hierarchy (LEACH) processing algorithm, for each cluster. The second phase permits detecting and blocking of compromised nodes by the Cnode. When a node sends a number of messages that are greater than a threshold, the Cnode identifies that node as a compromised node. All messages sent by the compromised node will be rejected and ignored by the neighboring nodes. The proposed protocol tends to provide significant results in term of time detection and throughput. However, it only deals with a specific type of DoS attack, which is excessive data transmission to drain the node's energy. In addition, it does not consider the case when a CH node executes the attack.
In Ref. [18], the authors adopted recursive clustering based on the LEACH algorithm. However, they enhanced it by using a novel algorithm named Fast and Flexible Unsupervised Clustering Algorithm (FFUCA). In this approach, the recursive clustering is used to identify the CH node and control nodes (Cnode) in each cluster, taking into account the node's location and energy consumed. The authors compared the relevance of these two algorithms (FFUCA and LEACH). FFUCA generates an optimal solution for minimizing distances between Cnode and sensor nodes when deploying the network compared to LEACH. It also achieves better DoS detection for false positive and false negative rates. However, this approach still does not deal with the case when the CHs are compromised. Sujit et al. proposed a new approach for detection of selfish node's behavior in WSN [19]. They used average and maximum values of re-transmissions in several nodes aimed to detect selfish nodes in the network. First, the protocol sets the shortest routes using Dijkstra's algorithm. Then, it decides on a node type, whether it is a partially selfish node, a fully selfish node or a non-selfish node, using a threshold value and the calculated maximum value of re-transmissions. This system detects but does not prevent selfish behavior when routing data in the network.
Virmani et al. [20] introduced an exponential trust-based mechanism to identify malicious nodes in a clustered network. Heads of clusters are selected based on their higher energy level. When a source node in a cluster sends a packet to a particular node, its streak counter stores the count of consecutive packets dropped. The mechanism calculates a trust factor (TF) formula, depending on the counter value, which falls exponentially with the increase in the number of packets dropped by the particular node. When TF goes below a certain threshold value, the node is stated as an adversary. In this case, the cluster head sends a request to the source node to re-transmit the packet. The proposition tends to detect the black hole attack, a type of DoS attack, where packets are consecutively dropped. However, it does not ensure protection from the attack and uses re-transmission of dropped packets, a process which has an extra energy cost.
Kalkha et al. [21] proposed an approach for preventing black hole attack in WSN using the Hidden Markov Model (HMM) to model the succession of choices of shortest path made by a source node to reach a destination node. Their approach uses a Viterbi algorithm to determine the path with the greatest probability of being malicious. Next, it identifies the likeliest fake nodes in the selected malicious path and sets a new routing algorithm that avoids these malicious nodes. The approach helps to avoid the selection of malicious path and node for routing data to the destination. However, it is dependent on a probabilistic, not a deterministic method. Moreover, it uses flat routing where each source node performs a multiple paths selection and re-selection after the maliciousness analysis, which will exhaust its resources quickly.
The network coding (NC) methodology has been used in many research papers to ensure security against various attacks [22]. Specifically, this methodology was introduced as an alternative to conventional networking, in which intermediate nodes merely forwarded incoming packets without alteration. In fact, it performs computations on received data prior to forwarding the data to the next hop. The NC technique exists in a range of types that are further clarified and detailed in [23]. We have proposed, in a previous study, a secure network coding-enabled approach for a confidential cluster-based routing in wireless sensor networks [15], called SNCR. Without the need to use expensive traditional cryptographic-based methods, it offers an optimized version of NC methodology to overcome eavesdropping attacks on transmitted data. It exploits pre-loaded hidden encoding vectors and transmits only a single digit instead of transferring all the coding coefficients with the coded vector. Therefore, it minimizes the overheads compared to conventional network coding systems. SNCR has been proved to guarantee confidential data transmission whether the adversary attack is internal or external to the network. However, it does data coding at two hierarchical levels of the network, source nodes and CHs, which reduce the overall power of the network.
The specific communication mode of NC, in which intermediate nodes are allowed to deliberately change the received packets, creates opportunities for malicious nodes to disrupt correct data routing. When a packet (possibly encoded) is polluted, the decoding of the set of encoded packets associated with it will not generate the correct original message data at the destination node. Thus, a pollution attack in an environment where network coding is used is more likely, and preventing it becomes more important [24].
In this context, several solutions have been proposed to thwart pollution attacks in NC-enabled networks. Adeli et al. proposed in [25] a secure linear network coding scheme built upon a cryptographic hash function. This latter is used to introduce different random noisy terms within the information symbols. However, their scheme imposed a restriction on the linear network code design. A new refined scheme was proposed by Kim and Young-Sik in [26] to remove this restriction but this type of scheme still needs computation of the hash values of each packet, thereby increasing the transmission delay and the required operations' complexity.
Yu et al. proposed an efficient homomorphic signature scheme based on the RSA signature [27]. In their scheme, the source signs its message using its private key. Intermediate nodes verify the received messages using the source's public key. The authors used a novel homomorphic function where the encoded message's signature is composed from those of the input messages (used to create the encoded message). Their scheme can accomplish source authentication and data verification. However, it increases transmission overhead, since the size of an RSA signature is typically very large. The authors of [28] proposed an identity-based digital signature scheme to detect pollution attacks in intra-session network coding. Their scheme does not involve a third-party query to certificate authority and does not have the key issue, and it does not take into consideration the energy consumption parameter in resource-constrained WSNs.
Liu Xiang et al. [29] presented a privacy-preserving signature scheme for linear network coding-based networks. They used a homomorphic signature and applied new signing and verification processes that enhance the computational efficiency. Their scheme optimizes the inherent security potential of random network coding and provides countermeasures against both eavesdropping attacks and pollution attacks. By encrypting a message before signing it, the intermediate nodes are able to capture and discard fake packets. Although the homomorphic signature NC scheme can solve the problem of data integrity and confidentiality, the signature generation and verification processes remain high-energy expenses, reducing the transmission efficiency.
Another approach, proposed in [30], used an identity-based linearly homomorphic signature scheme for NC-enabled wireless sensor networks to ensure data integrity and authenticity. In their scheme, the authors applied a signature generation and verification that are both independent of the size of the data packet to reduce the computation cost. They used the unique serial numbers of source nodes as an identity-based public key that is free of certificate management, allowing the destination to track the source of data.
In the previous studies on solutions, whether for DoS attacks or pollution attacks, we noted several disadvantages. In the solutions tackling DoS attacks, some ignored the case where the CHs are the malicious nodes in the network, despite their major role in controlling data routing, which can affect network availability. Others focused only on solutions for detecting DoS attacks and not for preventing them. Others approaches used cryptographic mechanisms with high computational costs. Solutions attempting to resist pollution attacks, using hash functions or homomorphic signatures, are computationally expensive and may lead to traffic overhead.
Our solution tackles both DoS and pollution attacks, along with eavesdropping attacks. In contrast to the previous work concerning network availability, the solution proposes a prevention as well as a detection scheme for misbehaving CH nodes. It ensures original packets reach their destination even if a number of packets are dropped. In addition, it considers the case where the CH nodes are malicious. Moreover, our solution takes advantage of the NC technique used in [15] to guarantee data confidentiality and optimizes it by achieving data coding only at source nodes. Hence, it minimizes overheads and energy depletion in the network. Second, in relation to the previous work attempting to oppose pollution attacks, the solution does not involve an extra cryptographic mechanism, which drains sensor node energy, to ensure data integrity; it requires only extra computations at the level of the BS, which is supposed to be an unlimited-resource device.

Model Establishment
In this section, we start by defining the basics of the NC technique and the modifications brought to it and used in this paper for ensuring data availability and integrity and minimizing the overheads in the network. Then, we introduce the assumptions used for the network and the sensor nodes. We finish this section by presenting the threat model along with the targeted security objectives upon which our approach is built.

About the Network Coding Technique
We begin by introducing the basic principles of the standard NC and then we present the modified version of NC that uses data redundancy to protect against DoS and pollution attacks.
In NC-enabled networks, intermediate nodes are able to re-code and combine the received packets in order to generate new ones. More specifically, a fixed set of original data messages received at a relay node r from different source nodes is linearly combined in a hierarchical network to generate new coded messages. As a result that the combination is linear, the technique is called linear network coding (LNC) [31,32]. Let us note x = (x 1 , x 2 , . . . , x L ) the set of original data messages of size L. Once received at a relay node r, a resulting coded packet y is generated using an encoding vector v = (v 1 , v 2 , . . . , v L ). Eventually, it processes the coding operation as follows: If two data packets meet in the network at a relay node, the node can derive a new linear combination using random encoding vectors. The standard NC technique thus allows data re-coding multiple times. The final destination, which is supposed to be the BS, needs to receive at least L coded packets (y 1 , y 2 , . . . , y L ) from different relay nodes to be able to retrieve the original data messages (x 1 , x 2 , . . . , x L ) using the same encoding vectors v = (v 1 , v 2 , . . . , v L ). Figure 2 shows an example of NC, where nodes S and D correspond to the source node and the destination node, respectively. It illustrates the way the NC technique operates with a set of original data messages of size 2.
There are many existing NC techniques in the literature. Ref. [23] depicts the evolution of NC theory and its families.
In order to ensure availability and integrity and counteract the misbehavior of the adversary, we incorporate new features into the NC technique. In fact, the NC technique used in this paper creates an (L, M) data redundancy. It produces M coded data messages from the set of L original ones, so that L of them suffice for reconstructing the original messages, where L < M. Let . . , M}, denote the encoding vectors associated with the coded packets y i and M is the number of relay nodes. The set of the coded data messages is generated through the following system of equations: The BS will need L data coded messages out of M to retrieve the original ones. It will proceed to solve the linear equation system that should be contained in the set of coded received packets (y q 1 , y q 2 , . . . , y q L ), extracted from the initial constructed coded packets (y 1 , y 2 , . . . , y M ), using the L corresponding encoding vectors, as follows: This BS can achieve this process with a simple elimination of Gauss or by inverting the matrix made up of the different rows (q 1 , . . . , q L ) of the matrix of encoding vectors used by the L out of M existing relay nodes.

Assumptions about the Network and Sensor Nodes
Our scheme makes the following assumptions.

•
The network has a hierarchical topology. Sensor nodes are grouped in clusters. Each cluster is headed by a cluster head (CH) selected at the beginning of the protocol by the base station (BS). • The clustering is static and determined by a centralized K-means algorithm, which provides optimized choices of cluster heads and their member nodes that favor minimized distances between them. • Member nodes of each cluster are responsible for detecting external events. They are called source nodes. CHs are the nodes in charge of forwarding received data to the BS. They are the ones susceptible to being compromised. • Each sensor node is equipped with a unique matrix of randomly chosen integers of large size (l × L) considered as a secret key, where l > t and L > p. These matrices are pre-loaded before the network deployment and are shared with the BS.

•
The BS cannot be compromised and has essentially unlimited processing power compared to sensor nodes. Moreover, the BS is aware of the positions of all sensor nodes and their distances to each other in the network.
• The total number of CHs in the network is a minimum of p, chosen once by the user before deploying the network according to its size and scale, which are related to the network domain of application. • Each source node has several paths to the BS. A node can send data to the BS through p different CHs. For simplicity, we assume that these CHs are of approximately similar distances to the source node. • Each source node proceeds to send sensed data to the BS through the routing protocol after t sensed data measurements. In other words, each routing round ensures the transmission, for each source node, of a data message composed of t sensed original data packets.

Threat Model and Security Objectives
In the threat model, maliciousness is manifested in two ways. The first is when the malicious node refuses to cooperate in the routing process, i.e., it is a non-cooperative node that discards data packets once receiving them. The second is when the malicious node alters data packets before forwarding them or even injects a new packet with the same expected format. This type of node is known as a misbehaving node (MN). Since the next hop from source node in the path to the BS is always a CH node, the MN, in our proposition, is a CH node. The adversary is modeled as a coalition of CH nodes varying from 1 to K (Figure 3). They can proceed to: 1. reject packets received from source nodes, and stop them from reaching the BS. This attack, considered as a form of DoS attack, causes loss of packets and degrades the performance of the network; 2. modify packets received from source nodes or inject fake ones into the network. This is known as a pollution attack where the BS receives polluted packets that cannot be separated from the original ones in a conventional standard routing. The proposed solution guarantees that with a coalition of MNs of a size in the range of a threshold K = p − t − 1, the adversary fails to stop the successful reception of any message from a source node at the BS. The proposed solution is also protected against eavesdropping attacks as in [15]. These occur when an adversary has unauthorized access to useful data transmitted by any link in the network. The adversary, in this case, can be internal to the network as any relay node except the source nodes and the BS; or external to the network with the ability to wiretap inter-cluster or intra-cluster links.
Our proposition deals with the following security objectives: • Availability: ensuring that messages transmitted by a source node reach the BS, thus providing survivability of the WSN services.
• Integrity: ensuring the reception of the correct original messages of source nodes when there are attempts by hackers to modify data en route to the BS. • Detection and isolation: identifying adversarial behavior sources, and isolating adversary nodes from the routing to prevent any damage to the network. • Confidentiality: ensuring secrecy of the wireless communication links to avoid eavesdropping.

Design and Implementation
This section provides the design of our new secure routing protocol, referred to as SSP. As described in Section 3.2, it applies to networks with hierarchical topology. Its main goal is to defend against DoS and pollution attacks and to detect the malicious CHs responsible for it through a modified NC-based routing. The protocol consists of three phases: (1) initialization; (2) DoS and pollution attack prevention; and (3) attack detection and punishment.

Initialization
The network is composed of randomly deployed sensor nodes and a BS. The solution we propose begins with clustering. The BS divides the network into N clusters, i.e., N groups of source nodes, headed by a CH. Obviously here, N must be greater than p since p is a sub-group of CH nodes. p is predefined by the user before deploying the network, taking into consideration the network size.
Next, the BS chooses, for each source node, a set of the nearest p CHs and sends their identities to it. As a result, the paths to be used for routing data elements from each source node are established and shared with the BS. Data are transmitted from source nodes to CH nodes and then from CH nodes to the BS.

DoS and Pollution Attacks Prevention
Our solution main phase is the prevention against DoS attacks and pollution attacks, ensuring data availability and integrity despite the existence of misbehavior in the network. It prevents a coalition of malicious CHs executing a DoS attack by dropping all the packets they receive or achieving a pollution attack by modifying the packets before their transmission to the BS. It guarantees the reconstruction of original data messages at the BS. For this purpose, our protocol uses the NC-techniques as described in Section 3.1. It produces p parts from a particular data message of size t, where t < p, so that t of them suffice for reconstructing the original one at the final destination, the BS. This phase is composed of: (i) encoding and parts creation at the source node and (ii) decoding and parts reconstruction at BS.

Encoding and Parts Creation at Source Node
The encoding process is conducted by each source node N i in the network. Each node is pre-equipped with its specific unique matrix. A i = [a k,q ] of large size (l × L) denotes the pre-loaded matrix, where a q k are A i 's integer coefficients, k ∈ 1, l and q ∈ 1, L , chosen so that each combination of l columns forms an invertible matrix of size (l × l).
When measuring events, a node N i registers the measurements in its memory. After t periods, it has collected t parts of these measurements in a vector of size t, X i = (x i,1 , x i,2 , . . . , x i,t ), forming a data message of size t. Then, N i randomly chooses a number p i such that l ≤ p i ≤ l × L and pinpoints the coefficient of A i having the position p i . Next, it extracts a sub-matrix A i = [a k,q ] of size t × p starting at the indicated p i position in the matrix A i , where k ∈ 1, t and q ∈ 1, p . Figure 4 shows the process of extraction of the sub-matrix A i used to generate the coded packets.
The value p i , required to decode data at the BS, is attached to each data packet to be sent through a distinct CH.
After encoding the original data parts, N i transmits these parts along the p distinct CHs, chosen at the beginning of the protocol. More specifically, N i sends the packets (Id i , m i,j , p i ), where j ∈ 1, p , through the nearest p different CHs saved in its routing table, where Id i is the node N i identity and p i is the random number chosen for conducting the network coding phase of N i . Next, each CH forwards the received packets from multiple source nodes to the BS.
Each node N i has p distinct paths to the BS through p different CHs. The scheme tolerates (p − t − 1) of them being compromised. In other words, the coalition of misbehaving CHs involved in the attack here is assumed to be of a size of maximum K with K = p − t − 1 to ensure complete reception of all source nodes messages. In fact, the source node will have at least t + 1 parts received in the BS, to be sure it can rebuild the original measures. Beyond the threshold K, the performance of the protocol in term of availability and integrity decreases.

Decoding and Parts Reconstruction at BS
The decoding process is conducted by the BS for each source node. When the BS receives all the packets from CH nodes, it collects the packets belonging to the same source node N i using its identity Id i . Since the adversary model guarantees that the BS receives at least t + 1 parts for each node, the BS is able to reconstruct the original data packet corresponding to each source node. In fact, the BS receives the parts (m i,q 1 , m i,q 2 , . . . , m i,q rp ) from different rp paths corresponding to different rp CHs, where t ≤ rp ≤ p (rp represents the number of received packets for a source node). In this case, it triggers the decoding process as follows: The BS checks for the p i values received from the different packets, memorize the value p i existing at least t + 1 times and distinguishes it as the true one (p T i ) that has not been modified by an adversary. This is true because we assumed that the protocol guarantees that at least t + 1 packets are correctly received for each source node N i . Next, it checks the pre-loaded matrix A i and extracts from it the sub-matrix A i of size t × p at the position value p T i . Afterwards, to reconstruct the original measurements, the BS uses each combination c of t data parts out of the rp received parts to resolve the system of equations S 1 (4). More specifically, it resolves a number value of C t rp systems to extract a set of possible original messages X (c) where {q j 1 , q j 2 , . . . , q j t } are C t rp extracted combinations of size t from the set {q 1 , q 2 , . . . , q t , . . . , q rp }.
Next, the BS detects the most apparent message in the set of the constructed messages X (c) i and recognizes it as the original one. The correct message will appear a number of times equal to C t rp−pp whereas each of the wrong messages will only appear once. pp depicts the number of polluted packets received at the BS.
The process of resolving the system of equations above can be done with a simple elimination of Gauss or by inverting the matrix made up of the different t encoding vectors of A i to obtain a matrix of t columns q j 1 , . . . , q j t corresponding to the different t CH nodes that sent the packets m i,q j 1 , . . . , m i,q j t and which are stored in the routing table at the BS. If the matrix A i is chosen randomly and so is the matrix A i , there is no known method to rebuild the original data, even partially, from (t − 1) parts.

Attack Detection and Punishment
Our solution then has an attack detection and punishment phase, which locates the misbehavior that affects the network availability or integrity. It identifies the MNs among the CHs executing the DoS or pollution attacks and applies a punishment mechanism to exclude them from the routing process. The BS is aware of all the nodes' locations and the different groups of clusters in the network. In addition, the BS has a control counter (cc) for each CH, which decides on its maliciousness.
At the end of each routing round, the BS is supposed to receive the set of packets . . , p} through p different paths coming from the nodes CH q j . q j are the p CH identities saved in the routing table for each source node N i . The BS's decision depends on the source nodes and the packets received for each one. It begins by scanning the malicious CHs invoking a DoS attack then continues by scanning the malicious CHs effecting a pollution attack. Once it has collected the list of packets received for a source node N i , the BS compares it to the list L i .
First, for DoS attack detection, it notes the set of packets missing and identifies the CH identities {q j } j∈{1,...,p} in charge of transmitting them. Then, it increments their counter cc by 1, as they are suspected of being malicious nodes.
Second, for pollution attack detection and after collecting the list of packets received for a source node N i , the BS proceeds via two steps: 1. Since it has already saved the true value p T i in its memory when calculating the original routed message, the BS checks for received packets containing a p i value different from the p T i value. It recognizes the CHs that have sent these packets as potentially malicious nodes and increments their counter cc by 1.

It checks the C t
rp−pp combinations of packets that generated the original message and notes the list of CH identities that have transmitted these packets. Then, it classifies the CHs with identities not belonging to this list as potentially being malicious and increments their cc by 1. The reason is that one polluted packet in a combination of t packets will generate a fake calculated message.
Lastly, if the cc of a specific CH node Id q S reaches the value r, where r is a threshold chosen by the user before deploying the network, the BS recognizes this CH as a selfish node. In other words, if the node with identity Id q S has effected a number of attacks, whether by refusing to send a data packet or by faking it, equal to r, the BS proceeds as follows: • It excludes the node with identity Id q S from the network and bans it from the routing process by deleting it from the network nodes list. • It invokes a network re-clustering using the updated list of existing nodes. • It resets the paths used for each source node by choosing new p CH identities for each one.
In this way, the protocol increases the security level of the network, not only by preventing DoS and pollution attacks, but also by conducting active steps to localize their source and punish the MNs responsible.

Discussion
We present, in this subsection, some main points for discussing our solution in regards to the method used for ensuring the security aspects of availability and integrity, and confidentiality.
The main algorithm, the NC technique that creates a (t, p) data redundancy, is effective with respect to size; that is to say that rebuilding t measures only needs t parts whose total size is identical. It is, however, possible to use more parts (p > t) to help protect the network against DoS attacks or pollution attacks. In other words, an attacker needs to compromise at least p − t separate paths, i.e., in the worst case, (p − t) CH nodes. If the attacker does not know the topology set up to route data in the network, it has no choice other than to randomly compromise the nodes. As a result, it will likely need to capture more than (p − t) CH nodes.
This strong availability and integrity is obtained at the cost of extra data parts transmission for each source node, i.e., transmitting p parts instead of t parts where p > t. However, our approach does not require heavy operations: dispersal and reconstruction are computationally efficient [33] and the systems of equations have to be solved only by the BS.
It is not possible for a few compromised nodes to reconstruct the measures. In other words, there is no technique for rebuilding, even partially, the measurements from t − 1 parts. This is the major advantage of dispersing the information in the network.
The solution, based on NC, provides complete confidentiality, so that no information can be obtained by intercepting coded symbols or random values of p i . In fact, a wiretapper cannot recover the original data packets of a source node N i , whether it is an outsider node or a captured CH inside the network. The main reason is that the corresponding matrix A i belonging to a source node N i is secret and not communicated to the destination when routing data packets. Moreover, the chosen values of p i for each node N i are changing and chosen randomly each round, so no other node but the BS is able to reconstruct the original data. A detailed demonstration is available in our previous paper [15].
Apart from security requirements, our solution minimizes the overheads compared to conventional network coding systems. First, instead of transferring all the coded vectors, i.e., the coefficients a k,q , where k, q ∈ { 1, t , 1, p }, along with the coded p symbols, a source node N i adds only a single digit p i to the coded vector to be sent. Second, the solution performs encoding only at the level of source nodes; there is no re-coding at the level of intermediate nodes, i.e., the CH nodes. This decreases the energy consumption and increases the network lifespan.

Simulation Setup and Results
In this section, we provide a simulation setup where we introduce the general parameters used for the experiments, and simulation results where we present the different scenarios implemented to show the benefits of our proposed solution.

Simulation Setup
For our experiments, we assume a fixed number of static nodes that are randomly distributed in a large two-dimensional area and a high processing power BS located randomly within the nodes' geographic area. The nodes are generated through a uniform random distribution where all values of the interval have equal probability. We assume that all nodes have the same amount of energy, at the beginning of the protocol. The benefit of the protocol is tested through a series of simulations with MATLAB by varying the number of rounds and the number of misbehaving nodes (MN). For comparison purposes, SSP is compared with the SNCR protocol; the same confidential routing protocol but without incorporating the security scheme against DoS and pollution attacks. The benefit of the protocol is also tested by varying the size of the network, i.e., the number of total sensor nodes in the network, to verify its scalability.
The clustering phase is operated with a K-means algorithm for simplicity and efficiency. Both protocols use direct transmission from CHs to BS as a communication pattern. The simulations are performed into two phases. The first allows routing of packets from member nodes to CH nodes while the second phase processes the required computations within the BS and verifies which packets have been successfully recovered.
Data availability against selfish nodes is evaluated in terms of the parameters of the Packet Delivery Ratio (PDR) and correct messages ratio, whereas data integrity against pollution attackers is evaluated in terms of the number of correct messages received. Table 1 shows the input parameters for the experiments. We provide an example of the BS coordinates used in this simulation and illustrate it later, in the network presentation figures.

Simulation Results
The sensor network, which is composed of 100 nodes with IDs [1; 100] randomly deployed, is illustrated in Figure 5. We execute a K-means algorithm, and the obtained result is 15 clusters distributed as shown in Figure 6. The K-means algorithm processes an unsupervised learning with the aim of minimizing the distance between member nodes and their CHs. Consequently, it reduces the energy consumption of inter-cluster communication.  Figure 7 depicts the PDR ratio and the correct messages ratio for both SNCR and SSP when a DoS attack is performed. The first ratio is between the number of data packets received at the BS and the number of data packets sent in the network. The second ratio is between the number of successfully constructed messages within the BS and the number of total messages sent by source nodes. Calculations were done by varying the number of rounds and fixing the number of selfish CH nodes at the value t − p − 1. The resulting values were similar for each round for each protocol. This is why we presented a single value for the PDR ratio and for the correct messages ratio for both protocols. Figure 7 shows that in the case of a defenseless SNCR, the value of the correct messages ratio successfully received at the BS decreased significantly compared to the PDR ratio value. It also shows that for the SSP protocol, the value of the correct messages ratio is greater than the PDR ratio value. This graph shows that a DoS attack conducted by the MNs and affecting the PDR ratio parameter has a negative influence on the total number of successfully received messages for the SNCR protocol, which decreased to 30%. It does not affect the total number of successfully received messages in the case of the SSP protocol, which maintains a value of almost 100%. In fact, as the PDR ratio decreases, the correct messages ratio increases. We can conclude from these results that the proposed solution is beneficial in maintaining reception of a high number of correct data packets at the final destination despite an adversarial attack. In fact, if a number of data parts are lost due to attacks on the network, the BS is still capable of constructing all the original data packets. Figure 8 presents the impact of increasing the number of malicious CH nodes, performing a pollution attack, on the variation in the number of correct messages received at the BS for SNCR and SSP protocols. The number of correct messages received at the BS is the number of correctly constructed messages at the end of the routing process. This variation decreases significantly as the intensity of pollution attack increases in the case of SNCR, as shown by the red line in Figure 8. As can be seen in the graph, the presence of two or three misbehaving CH nodes in the network reduced the number of correct messages received at the BS by more than half. This number continued to decrease until it reached 0 when the number of MNs was 7. This same number of correct messages has a certain stability in the case of the SSP protocol, when the intensity of the pollution attack increases. From the same graph, it is clear that the number of correct messages is higher in SSP compared to SNCR, as shown by the blue line. From point 0 to point 6 on the x-axis where the number of MNs goes from 0 to 6, the number of correct messages is constant and equal to O = 85 (total number of source nodes of original messages to be sent from source nodes), i.e., all the messages sent from source nodes are successfully received at the BS. Then, as more MNs are present, the number of correct messages starts decreasing and reaches a value of 0 only when the number of misbehaving CH nodes is equal to 10. This is because for the interval [0, 4], where the number of malicious CH nodes is less than the threshold K = 4(K = p − t − 1) as assumed previously, the remaining number of honest (CH) nodes is at least t = 5. This guarantees that the number of correct parts transmitted for each source node is at least t + 1 parts, which is a sufficient number to reconstruct the original data parts. Then, for the interval beyond 4, where the number of malicious CH nodes is greater than the threshold K = 4, the number of collected parts starts going below t for a member node. The number of correct messages is equal to O in some cases in this interval, when the number of malicious CHs is 5 and 6. This is explained by the fact that the total number of CH nodes (15 here) in the network is greater than the limit p = 10. Therefore, when the network is more scalable, the tolerance for choosing the threshold value K is enhanced. Figure 9 demonstrates the scalability aspect of the protocol. It depicts the variation in the percentage of correct messages received at the BS when a DoS attack is executed, related to the size of the network, i.e., the total number of sensor nodes in the network and the number of clusters in it. We chose for this simulation a number of malicious CH nodes that belonged to the interval beyond the threshold K = p − t − 1. K is set here to 6. It can be seen that the percentage of correct messages increases as the network size increases. In fact, when the number of sensor nodes increases, the number of clusters increases. Consequently, the number of CH nodes increases, as does the number of possible paths for a source node to route data packets to the BS. In this case, despite S, the number of MNs is greater than the threshold K =, the probability of collecting t + 1 parts is, obviously, raised. The reason is that S is distributed among a higher number of honest CHs even with a number of MNs S greater than the threshold K. The protocol enhances tolerance to the existence of MNs in the network while guaranteeing greater prevention against a DoS attack when the network gets larger. By evaluating the scalability factor, we can conclude that the proposed solution is capable of supporting network expansion without degrading the security performance.

Conclusions
In this study, we designed a scheme to prevent misbehaving nodes influencing network availability and integrity, using the NC technique and data redundancy creation at source nodes. The approach integrates a module to detect these malicious nodes and executes a punishment mechanism to exclude them from the routing process. It also ensures confidential data routing in the presence of eavesdroppers.
Our approach was implemented in a cluster-based network exposed to a DoS attack and pollution attack using MATLAB. These attacks were generated by a coalition of selected misbehaving CH nodes. They dropped the packets received from source nodes on their way to the BS when executing a DoS attack, and they incorrectly altered the packet in transit or injected a new fake one into the network when executing a pollution attack. The protocol that we have designed improves the percentage of correct messages received as compared to a defenseless SNCR protocol. The BS successfully rebuilds the original data from a source node despite selfish behavior that diminishes the number of packets received compared to the number of packets sent. In addition, it ensures a complete and correct reception of all original packets sent by source nodes with a number of malicious CH nodes initiating a pollution attack and varying in the range of a certain threshold K. The BS successfully reconstructs the original data of a source node even when the adversary alters the packets in transit before being sent to the BS. Moreover, the protocol favors scalability. It reduces the percentage of lost packets as the size of the network and so the number of malicious nodes increases. Funding: This work is partially supported by the Région Nouvelle-Aquitaine under the grant for project "SVP-IoT" and by the MIRES research federation under grants for projects "CANIoT" and "SECIoT".

Conflicts of Interest:
The authors declare no conflict of interest.