Combining Network Coding and Retransmission Techniques to Improve the Communication Reliability of Wireless Sensor Network

This paper addresses the use of network coding algorithms combined with adequate retransmission techniques to improve the communication reliability of Wireless Sensor Networks (WSNs). Specifically, we assess the recently proposed Optimized Relay Selection Technique (ORST) operating together with four different retransmission techniques, three of them applying network coding algorithms. The target of this assessment is to analyze the impact of each of the proposed retransmission techniques upon the communication reliability of WSN applications. In addition, this paper presents an extensive state-of-the-art study concerning the use of network coding techniques in the WSN context. The initial assumption of this research work was that the ORST operating together with network coding would improve the communication reliability of WSNs. However, the simulation assessment highlighted that, when using the ORST technique, retransmission without network coding is the better solution.

Figure 1a,b show how communication occurs over wireless networks. Nodes A and C want to exchange packets via an intermediate node: node A wants to send packet X_A to node C via node B and, similarly, node C wants to send packet X_C to node A via node B. Figure 1a presents conventional communication in a wireless network. Figure 1b presents communication using NC techniques. In this process, only three transmission steps are required. First, node A and node C individually transmit packets X_A and X_C to node B. Then, node B receives both packets and performs an XOR operation on them, creating a new encoded packet X_A ⊕ X_C. Finally, node B retransmits the encoded packet. Node A then decodes X_A ⊕ (X_A ⊕ X_C) to get packet X_C, and node C decodes X_C ⊕ (X_A ⊕ X_C) to obtain packet X_A. In this way, NC reduces the number of packet transmissions. Network coding can be performed in different ways. In the example shown in Figure 1b, binary coding based on XOR was used. In Section 2.2, we present the major state-of-the-art techniques to perform NC.
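The three-step exchange through node B can be illustrated with a minimal Python sketch. The packet contents and the helper name `xor_bytes` are illustrative assumptions; the sketch also assumes both packets have the same length, which the XOR operation requires.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Steps 1 and 2: nodes A and C each send their packet to relay B.
x_a = b"hello from A"
x_c = b"reply from C"

# Step 3: B broadcasts a single coded packet X_A xor X_C.
coded = xor_bytes(x_a, x_c)

# Each endpoint removes its own packet to recover the other one.
recovered_at_a = xor_bytes(x_a, coded)   # node A obtains X_C
recovered_at_c = xor_bytes(x_c, coded)   # node C obtains X_A
```

Decoding works because XOR is its own inverse: applying a node's own packet to the coded packet cancels that packet and leaves the other one.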
Figure 2 illustrates how cooperative diversity can work together with an NC technique in the retransmission step. It shows a network composed of ten nodes: nine final devices and one coordinator node. Nodes N_2, N_6, and N_8 are relay nodes, and nodes N_1, N_3, and N_7 are outside the coverage area of the coordinator node (node C). In this network, we consider that communication takes place in two steps. In the first step, each node transmits a message to the coordinator, but not all messages successfully arrive at the coordinator; messages from nodes N_1, N_3, and N_7, due to interference, do not reach the coordinator node. Then, in a second step, each of the relay nodes applies an NC technique (Figure 2 presents a linear NC, which will be explained in Section 2.2) to the messages it has heard. In this case, relay node N_2 codes the messages from nodes N_1, N_3, N_4, and N_5 together with its own message. Relay node N_6 codes the messages from nodes N_5 and N_7 together with its own message, and relay node N_8 codes the message from node N_1 together with its own message. Then, each relay node retransmits its coded message to the coordinator. Some state-of-the-art works show benefits from using NC together with cooperative diversity. In [6,19], the authors use relays and NC in the retransmission step, which improved the reliability of communications in industrial wireless networks. According to Liu et al. [20], combining cooperative communication and NC can increase the packet loss-resistant capability due to the packet redundancy. In addition, the network may be able to overcome node failures via cooperative communications.
In previous work, we studied solutions for relay selection and proposed an Optimized Relay Selection Technique (ORST) [12,21,22]. We also investigated the best parameters to be considered when selecting relay nodes.
Given the benefits of using both cooperative communication and NC techniques reported in the literature, in this paper we adopt a holistic approach to improve the communication reliability: the ORST technique is combined with an effective retransmission mechanism, with the aim of evaluating the operation of the ORST technique together with NC approaches. Random and Sparse Linear Network Coding will be used as retransmission mechanisms, considering a scheme that allows the relay nodes and the coordinator node to agree a priori on which coefficients will be used. In this way, the NC technique would be able to improve the retransmission reliability and reduce the overhead generated when sending the coding coefficients. The main contributions of this paper are:
• An extensive state-of-the-art study concerning relay selection and NC techniques, presenting relevant and current works;
• A simulation assessment of both proposed schemes, relay selection and NC, working together in the communication. In addition, we present an analysis of the advantages and drawbacks of the combined implementation of both schemes;
• A discussion about some of the negative results obtained when combining the ORST technique with specific NC approaches.
This paper is structured as follows: Section 2 presents the state-of-the-art in what concerns relay selection techniques in WSNs and retransmission mechanisms using NC techniques. In addition, this section presents a classification framework of how to perform network coding. Section 3 describes the proposed relay selection technique and the related NC technique, which aims to improve the communication reliability in WSNs. Section 4 presents the simulation assessments of the relay selection and the retransmission mechanisms using NC. Finally, conclusions are presented in Section 5.

Related Work
This section presents some of the most relevant state-of-the-art works related to cooperative communication and NC techniques within the WSN communication context. For cooperative communication, we present relay selection techniques, which are decisive to improve communication reliability. For NC, we first introduce a classification of the ways to perform network coding, which we divide into four categories: Physical Layer Network Coding, Analog Network Coding, Binary Coding-XOR, and Linear Network Coding, the last of which is subdivided into Random Linear Network Coding, Deterministic Linear Network Coding, and Sparse Linear Network Coding, as presented in Figure 3. In addition, we select the most relevant state-of-the-art works with a focus on how the coefficients used to encode the messages are selected and sent; consequently, we analyze how to reduce the overhead generated by the transport of the coefficients.

Relay Selection Related Work
Tripathi et al. [23] proposed an Energy Balance Load Aware Relay Selection in Cooperative Routing (EBLCR) protocol. If the packet reception ratio (PRR) of a routing node falls below a given threshold, a new relay node is selected to carry out the data transmission. The router node broadcasts a control message for relay node selection. After receiving the control message, each node checks its residual energy. If the energy is greater than a predetermined threshold, the node can work as a relay node. The candidate relay nodes start their timers after receiving the control packet, and the node with the lowest timer value is selected as the relay. The authors compared the EBLCR protocol with just one other state-of-the-art technique, and the results show that EBLCR improves the throughput and the energy consumption per packet.
Yang et al. [24] proposed a relay selection method based on Q-learning (QL), named QL-RSA, which selects the relays using the maximum cumulative reward to obtain the maximum throughput of the cooperative network. The authors considered that the interaction between the agent and the environment is a Markov decision process (MDP), which consists of a finite and discrete set of environmental states, a finite and discrete set of learner actions, scalar reinforcement signals, and a learner's strategy. In each iteration, the source node (learner) perceives the state of the environment and selects actions to act upon the environment, according to the current strategy. Then, a reinforcement signal, called a reward, is generated and fed back to the source node. Based on this, the strategy is updated and the next iteration is initiated. The ultimate goal of learning is to find the best strategy for each state, aiming to maximize the expected long-term cumulative reward and, consequently, to reach the maximum throughput at the destination node after the action is performed. The reward value is obtained through the feedback channel between the source and the destination and is used to update the Q-matrix and guide future policy selection. The authors compared their proposed technique only with a random relay selection algorithm (R-RSA); the results showed that the throughput obtained by QL-RSA is better than that of R-RSA.
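The learning loop described above can be sketched with the textbook tabular Q-learning update. This is not the QL-RSA implementation: the state definition (the relay used in the previous slot), the deterministic per-relay reward values, and all hyperparameters are assumptions made only for illustration.

```python
import random

def train_relay_q_table(relay_reward, episodes=20000, alpha=0.2,
                        gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; q[state][action] is the learned Q-matrix.

    relay_reward[a] is a hypothetical normalized throughput obtained
    when relay a forwards the packet (assumed known and constant).
    """
    rng = random.Random(seed)
    n = len(relay_reward)
    q = [[0.0] * n for _ in range(n)]
    state = 0
    for _ in range(episodes):
        # Epsilon-greedy choice of the next relay (the action).
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: q[state][a])
        reward = relay_reward[action]     # feedback from the destination
        next_state = action               # assumed state: last relay used
        # Standard Q-learning update toward reward + discounted best value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q = train_relay_q_table([0.2, 0.9, 0.5])
best_relay = max(range(3), key=lambda a: q[1][a])
```

With these assumed rewards, the greedy policy read from the Q-matrix settles on the relay with the highest reward, which is the behavior the cumulative-reward criterion is meant to produce.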
Mei and Lu [25] proposed and analyzed three relay selection schemes, named random relay node selection (RRS), best relay node selection (BRS), and all relay nodes selection (ARS). The proposed cooperative communication system considers two phases of transmission. In the first phase, the source node transmits signals to the destination node and to M relay nodes. During the second phase, depending on the given relay node selection scheme, relay nodes that correctly decoded the signals received in the previous phase are selected to relay information. Let D(s) be the set of relay nodes that can correctly decode the signals transmitted from the source node during phase I. The relay selection techniques operate as follows. The RRS scheme randomly selects a single relay node R_m from D(s) for relaying information during the second transmission phase. In BRS, the best relay node B_m, defined as the relay with the highest instantaneous channel gain across the relay-destination link, is selected for relaying information. The ARS scheme selects every node in D(s) to retransmit the decoded signals to the destination node. The authors analyzed only the outage probability, and the results showed that ARS and BRS performed better than RRS. However, other metrics also need to be considered for a better assessment.
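The three selection rules can be summarized in a short Python sketch. The function name, the relay identifiers, and the channel gain values are hypothetical; only the selection logic follows the description above.

```python
import random

def select_relays(decoding_set, gain_to_destination, scheme, rng=random):
    """Return the relays that forward in phase II.

    decoding_set: relay IDs that decoded the source signal in phase I, D(s).
    gain_to_destination: assumed map relay ID -> instantaneous channel gain.
    """
    candidates = sorted(decoding_set)
    if not candidates:
        return []
    if scheme == "RRS":   # one relay picked uniformly at random
        return [rng.choice(candidates)]
    if scheme == "BRS":   # relay with the highest relay-destination gain
        return [max(candidates, key=lambda r: gain_to_destination[r])]
    if scheme == "ARS":   # every relay in D(s) forwards
        return candidates
    raise ValueError(f"unknown scheme: {scheme}")

gains = {1: 0.3, 2: 1.7, 3: 0.9, 4: 2.5}
d_s = {1, 2, 3}           # relay 4 failed to decode in phase I
best = select_relays(d_s, gains, "BRS")
everyone = select_relays(d_s, gains, "ARS")
```

Note that relay 4 has the highest gain but is never selected, since only members of D(s) are eligible in any of the three schemes.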
Zhang et al. [26] presented a cooperative relay selection technique for a cluster-tree network. The objective is to reduce energy consumption. The authors consider that a node spends more energy on a long-distance single-hop transmission than on a transmission assisted by cooperating nodes. As selection parameters, the authors consider the residual energy of each node and the node density, defined as the number of neighbors of a node divided by the number of nodes within the cluster. The authors compared their technique to a network that considers only one hop, and the proposed technique showed lower energy consumption.
Su et al. [27] proposed a Deep Q-Network (DQN)-based relay selection scheme for WSNs, named DQ-RSS. The scheme combines deep learning with Q-learning to accelerate learning when selecting the optimal relay among the relay candidates according to outage probability and channel information. A source node collects the channel state information (CSI) from the environment and then sends the integral system state to the DQN to evaluate the optimal policy for relay selection. Simulation results show that their relay selection scheme outperforms the Q-learning-based relay selection and the random relay selection scheme, achieving lower outage probability and lower energy consumption. However, the proposed technique only works for static networks.
Elsamadouny et al. [28] proposed a relay selection technique for multihop communication that allows L relays to be selected between the source and destination nodes. The authors modeled the network as a Markov chain, where each Markov chain state is parameterized by (L − 1) adjacent numbers representing the number of packets in the queues of the (L − 1) intermediate relay nodes. The transmission of a single packet over a specific hop causes the system to move from one state to another. The first and the last state index represent the possibility of packet transmission from the source node to the first relay and from the last relay to the destination node, respectively. All the intermediate states represent the possibility of data transmission from an intermediate relay node to the subsequent relay node. The technique works as follows: during each time slot, the highest quality hop (best SNR) is activated for transmission as long as the corresponding relay node has packets to transmit and the buffer of the corresponding receiving node is not full. Otherwise, the second-best hop is activated, and so on. If the selected hop has an SNR below a certain threshold, this is considered an outage event. The threshold SNR is predetermined according to the required quality of service. The results presented by the authors showed that their scheme outperforms the conventional multihop scheme in terms of outage probability.
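The per-slot hop activation rule can be sketched as follows. The function name and data layout are assumptions, and the sketch further assumes that the source always has a packet to send, that the destination can always receive, and that every relay has the same finite buffer size.

```python
def select_hop(snr_db, queue, buffer_size, snr_threshold_db):
    """Pick the hop to activate in one time slot.

    snr_db[h] is the SNR of hop h (node h -> node h+1); node 0 is the
    source and the last node is the destination. queue[i] is the number
    of packets buffered at relay i+1.
    Returns (hop index or None, outage flag).
    """
    n_hops = len(snr_db)
    # Try hops from best to worst SNR until one is allowed to transmit.
    for hop in sorted(range(n_hops), key=lambda h: snr_db[h], reverse=True):
        sender_has_data = hop == 0 or queue[hop - 1] > 0
        receiver_has_room = hop == n_hops - 1 or queue[hop] < buffer_size
        if sender_has_data and receiver_has_room:
            # Outage if the activated hop is below the QoS threshold.
            return hop, snr_db[hop] < snr_threshold_db
    return None, True

snr = [12.0, 20.0, 8.0]   # two relays, three hops
slot_a = select_hop(snr, queue=[0, 2], buffer_size=2, snr_threshold_db=10.0)
slot_b = select_hop(snr, queue=[2, 2], buffer_size=2, snr_threshold_db=10.0)
```

In the first slot the best hop is skipped because its sender has an empty queue, so the source hop is activated. In the second slot both relay buffers are full, so only the last hop can transmit, and its low SNR produces an outage.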

Brief Explanation of the Main Types of Network Coding
In this section, we present a framework for the classification of how to perform network coding, dividing it into six categories: Physical Layer Network Coding, Analog Network Coding, Binary Coding-XOR, Random Linear Network Coding, Sparse Linear Network Coding, and Deterministic Linear Network Coding (Figure 3).
Physical Layer Network Coding exploits the overlap of electromagnetic waves that occurs in wireless communication and applies the concept of NC to the physical layer. In this way, nodes A and C, as shown in Figure 4, transmit their messages simultaneously to the intermediate node, node B, which receives the overlapping signals. Then, the intermediate node extracts a linear combination from the received signal, without the need to obtain the individual messages, and proceeds similarly to the network coding technique [29].
Analog Network Coding uses the interference generated by simultaneous transmissions as an ally. The idea is the following: when two nodes A and B transmit simultaneously, the packets collide. The signal resulting from a collision is the superposition of the different signals. Thus, node A, after receiving the summed signal, calculates the phase shift of node B by using its own signal in the sum, thereby recovering the node B signal destined to it; node B can similarly recover the signal that it expects from node A [29,30].
In Binary Coding-XOR, the simplification assumed is to use just a basic bitwise XOR (exclusive OR) operation among messages. This basic NC technique uses the finite field F_{2^1}, which represents a field in network coding theory with 2^1 symbol combinations, being able to encode up to 2 messages into a single message. In this way, when using binary coding, XOR operations are performed considering just two packets heard by the intermediate node [18].
In Linear Network Coding, there are three subcategories: Random Linear Network Coding (RLNC), Sparse Linear Network Coding (SLNC), and Deterministic Linear Network Coding (DLNC), which are described as follows.
Random Linear Network Coding (RLNC) allows for the use of a finite field larger than that of binary coding. In this way, it can be used in conventional network coding schemes with multiple source nodes [31]. RLNC performs a random selection of the encoding coefficients from a q-element finite field denoted by F_q [32]. The larger the finite field, the less likely it is to generate linearly dependent packets at the destinations. If all nodes systematically used the same coefficients, destinations would not be able to decode the received packets, given the high probability of redundant packets, which would generate linearly dependent systems [33].
This technique consists of linearly combining several messages using randomly selected coefficients within a finite field F_{2^n}, where n can be any positive integer [34]. Assuming a network with a set of n_d nodes, when an intermediate node i wants to transmit k messages (m_1, m_2, m_3, ..., m_k) heard from its neighbors, it first randomly selects k coefficients (c_{i1}, c_{i2}, c_{i3}, ..., c_{ik}) from the finite field. Then, it linearly combines the packets it has heard using Equation (1). Together with the linear combination, the node sends a list of the coefficients used. At destination node t, the received packets are represented by Equation (2), where M^(t) is a matrix whose rows are the k coded messages received at destination node t, M is a matrix whose rows are the original k messages, and G is a matrix in which each row is the vector of coefficients used by an intermediate node to encode the messages. Thus, the destination node recovers the original messages by building and solving a linear system using Equation (2) [6]. This kind of NC requires the node that performs the coding operation to send all the coefficients used in the linear combination together with the coded message. Its main drawbacks are therefore the complexity of the decoding operation and the overhead resulting from the encoding vector.
Sparse Linear Network Coding (SLNC) is an NC technique presented as an improvement over RLNC. In SLNC, the intermediate node does not encode all the messages it has heard; it encodes only a small number of messages in each transmission. Thus, the decoding complexity at the receiver is reduced. In addition, the communication overhead generated by sending the coefficients is also reduced, considering that the number of coefficients is proportional to the number of coded messages [35,36,37].
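For reference, the RLNC relations cited above as Equations (1) and (2) can be written out from the definitions given in the text. The symbol y_i for the coded message produced by intermediate node i is an assumed notation, since the text does not name it; the remaining symbols follow the definitions above.

```latex
\begin{align}
  y_i &= \sum_{j=1}^{k} c_{ij}\, m_j, \qquad c_{ij} \in \mathbb{F}_{2^n} \tag{1}\\
  M^{(t)} &= G \cdot M \tag{2}
\end{align}
```

Under this reading, the destination can recover the original messages as M = G^{-1} M^{(t)} whenever the k coefficient vectors that form G are linearly independent.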
Within the SLNC context, there are a number of approaches that exploit low-density parity-check (LDPC) codes [38,39,40]. In these approaches, each relay packet includes the coefficients in a small bit-map field to reduce the overhead.
In Deterministic Linear Network Coding (DLNC), the coefficients used by the intermediate nodes to perform the NC are deterministically selected. That is, the coefficients are not randomly selected from the finite field but are chosen by techniques that aim to optimize the network coding process [41]. This guarantees the validity of the coding scheme, that is, it ensures that the encoded messages are linearly independent. The disadvantage of this type of coding is the control overhead required to construct and maintain a linear coding scheme among nodes [42].

Network Coding Related Work
In Migabo et al. [34], the authors proposed a Cooperative and Adaptive Network Coding technique for Gradient-Based Routing (GBR). The technique considers that the network density is dynamic, according to the average number of neighbor nodes, to encode interest messages. The encoding is performed by means of linear combinations with random coefficients from a finite Galois Field of variable size GF(2^s). The decoding is performed using Gaussian elimination.
When a relay node wants to transmit n accumulated data packets (P_1, P_2, ..., P_n), it first randomly selects n coefficients C_1, C_2, ..., C_n from the Galois Field of order 2^s, with s being a positive integer. It then linearly combines the accumulated data packets using the randomly generated coefficients. The decoding is performed by a Gaussian elimination process in which the accumulated header data (coefficients) are grouped to form an n × n matrix C_{n×n}, which is then reduced to row-echelon form. The n encoded data packets from the transmitter node can then be decoded by solving a set of linear equations, provided that the obtained equations are linearly independent of each other.
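The encode-then-eliminate procedure can be sketched in Python for the case s = 8. This is an illustrative sketch, not the authors' implementation: the reduction polynomial 0x11B, the function names, and the sample packets are assumptions, and the decoder reduces the matrix all the way to the identity (Gauss-Jordan) instead of stopping at row-echelon form.

```python
import random

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254, valid for a != 0."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result

def encode(packets, coeffs):
    """One coded packet: sum over j of coeffs[j] * packets[j], per byte."""
    coded = bytearray(len(packets[0]))
    for c, p in zip(coeffs, packets):
        for i, byte in enumerate(p):
            coded[i] ^= gf_mul(c, byte)
    return bytes(coded)

def decode(coeff_rows, coded_packets):
    """Gauss-Jordan elimination on [C | coded]; None if C is singular."""
    n = len(coeff_rows)
    rows = [list(c) + list(p) for c, p in zip(coeff_rows, coded_packets)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return None   # linearly dependent coefficient vectors
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[col])]
    return [bytes(row[n:]) for row in rows]

rng = random.Random(7)
packets = [b"temp=21", b"hum=040", b"bat=3.1"]
decoded = None
while decoded is None:    # redraw if the random matrix happens to be singular
    C = [[rng.randrange(256) for _ in packets] for _ in packets]
    coded = [encode(packets, row) for row in C]
    decoded = decode(C, coded)
```

The retry loop makes the linear-independence condition explicit: decoding succeeds only when the n coefficient vectors form a full-rank matrix.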
In Heide et al. [43], a technique called Generation-based RLNC is used. This technique consists of dividing large amounts of data into smaller blocks, named generations, so that both the encoding and decoding operations are applied per generation and not on the entire data. The authors proposed a random linear network coding in which the coefficient vector can be sent in two ways. First, the authors define the density as the ratio of nonzero scalars in a coding vector. If the density is low, the coding vector is sparse and mostly consists of 0s. Thus, the authors represent each nonzero scalar by an index-scalar pair. In this way, the coding vector is formed by index-scalar pairs, and the number of index-scalar pairs must be sent together with the coding vector, which reduces the information to be sent. Second, the authors note that the coding vector can also be represented by a bit array that indicates which scalars are nonzero, together with the values of these scalars.
Each scalar can be represented by log_2(q) bits, and the maximal number of nonzero scalars is g, where q represents the size of the finite field and g represents the size of the generation. In addition, each index takes log_2(g) bits. In this way, the overhead generated depends on the size of the generation and the number of nonzero scalars.
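The dependence of the overhead on the generation size and on the number of nonzero scalars can be made concrete with a small calculation. The function below is a sketch: the size of the field that carries the number of index-scalar pairs is an assumption, as the text does not state it, and bit counts are rounded up to whole bits.

```python
import math

def coding_vector_overhead_bits(g: int, q: int, nonzero: int) -> dict:
    """Header bits for one coding vector under three representations."""
    scalar_bits = math.ceil(math.log2(q))      # log_2(q) bits per scalar
    index_bits = math.ceil(math.log2(g))       # log_2(g) bits per index
    count_bits = math.ceil(math.log2(g + 1))   # assumed size of the pair counter
    return {
        # Every scalar sent, zero or not.
        "dense": g * scalar_bits,
        # Pair counter plus one (index, scalar) pair per nonzero scalar.
        "index_scalar_pairs": count_bits + nonzero * (index_bits + scalar_bits),
        # One flag bit per position plus the nonzero scalar values.
        "bit_array": g + nonzero * scalar_bits,
    }

overhead = coding_vector_overhead_bits(g=64, q=256, nonzero=4)
```

For this example (g = 64, q = 256, four nonzero scalars) the dense vector costs 512 bits, the index-scalar pairs 63 bits, and the bit array 96 bits, so the pair representation is the most compact at low density.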
Akhtari et al. [44] used a random linear network coding. The coefficients are selected randomly and sent in the coded packet header. The authors considered a finite field of size 2^8. In addition, the authors consider that the packets are recoded at each hop between the source node and the destination node.
The authors check the linear dependency of newly arrived coded packets at the destination node. For this, the destination node runs a specific algorithm. In this algorithm, M is a triangular matrix of k rows, some of which may be missing. For the coding vector u of a newly received packet, the nonempty rows of M are multiplied by the corresponding coefficients and added to it. If the vector is independent of the stored rows, the result will not be zero. In that case, the independent vector is added to the matrix M in the empty slot. This check of the independence of the packet's coding vector is necessary to ensure that the packet is innovative.
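The innovation check can be illustrated with a simplified sketch. To keep it short, coefficients are restricted to GF(2), so a coding vector is a bit mask and row addition is XOR; the authors work in a field of size 2^8, where each stored row would additionally be scaled by a coefficient. The function name and the dictionary layout are assumptions.

```python
def insert_if_innovative(basis: dict, vector: int) -> bool:
    """Reduce `vector` against the stored rows; keep it if something is left.

    basis maps a pivot position (index of the leading 1 bit) to a stored
    row, so missing keys play the role of the empty rows of the
    triangular matrix M.
    """
    while vector:
        pivot = vector.bit_length() - 1
        if pivot not in basis:
            basis[pivot] = vector   # fill the empty slot: packet is innovative
            return True
        vector ^= basis[pivot]      # cancel the leading coefficient
    return False                    # reduced to zero: linearly dependent

basis = {}
results = [insert_if_innovative(basis, v) for v in (0b110, 0b011, 0b101, 0b001)]
```

The third vector, 0b101, is the XOR of the first two, so it reduces to zero and is rejected, while the other three are stored as innovative.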
Wu et al. [45] propose an algorithm to optimize the finite field size and to improve the efficiency of RLNC. They analyze the relationship between the finite field size and the completion time for the finite-buffer relay transmission scenarios. Based on the analysis, the field size is optimized via numerical search to maximize the effective data rate.
In Dong et al. [32], the authors used RLNC and defined a new method to minimize the overhead generated by the transport of coefficients. They generated the encoding matrix using a pseudorandom generator O(N, K), where the generator function uses the number of symbols (N) participating in the network coding process and state of the generator (K) as seeds. The coefficient matrix [α] is generated in both transmitting and receiving nodes.
The source node can send the seed used to generate the random coefficient matrix to the receiving nodes in two ways: the first is within the first encoded packet, and the second is over a different, secured channel. As soon as the encoded packet is received at the destination nodes, the encapsulated seed can be used to generate the decoding coefficient matrix.
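The idea of regenerating the coefficients from a shared seed can be sketched as follows. Python's `random.Random` is only a stand-in for the pseudorandom generator used by the authors, whose internals are not described here; the function name, the field size, and the exclusion of zero coefficients are assumptions.

```python
import random

def coefficient_matrix(seed: int, n_coded: int, n_symbols: int,
                       field_size: int = 256):
    """Derive the coding matrix from a seed shared by sender and receiver."""
    rng = random.Random(seed)
    # Nonzero coefficients drawn from {1, ..., field_size - 1}.
    return [[rng.randrange(1, field_size) for _ in range(n_symbols)]
            for _ in range(n_coded)]

# Both ends run the same generator with the same seed, so only the
# seed has to travel over the network, not the matrix itself.
sender_matrix = coefficient_matrix(seed=2024, n_coded=4, n_symbols=4)
receiver_matrix = coefficient_matrix(seed=2024, n_coded=4, n_symbols=4)
```

Since the two matrices are identical by construction, the per-packet header no longer needs to carry the coefficient vector.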
Li et al. [46] defined a sparse coding scheme where packets are encoded from sequentially formed random subsets of source packets called batches. The relay recodes only from the buffered packets belonging to the same batch to maintain the code sparsity. A sparse coding scheme is used to minimize the coding coefficient delivery cost. Sparse means that the number of source packets involved in generating each coded packet is much smaller than the total number of source packets. Therefore, the coding vector is sparse. Each packet only needs to carry a small number of nonzero coding coefficients (which are uniformly randomly selected from F_q) in the header.
The authors considered that the relay has a finite buffer of size m << M, where M is the number of source nodes. In addition, they consider that the number of nonzero elements in each encoding vector is limited to d << M; d distinct source packet indexes are uniformly randomly drawn from {0, 1, ..., M − 1} with replacement to form a batch with a sequence number (SEQ). The corresponding d source packets are referred to as the content of the batch, and d is referred to as the batch degree. For each of the b transmissions on the source-relay link, a coded packet, which is a random linear combination of the d source packets, is transmitted, where b is the batch transmission size (BTS). After b transmissions, a new batch with SEQ increased by 1 is started. The process continues until the destination successfully decodes all the data. The SEQ and the encoding vector are delivered in the header of each coded packet.
Considering TCP communication and based on the MPTCP (Multipath TCP) standard, Xu et al. [47] proposed the pipeline network coding technique (MPTCP-PNC). The authors aim to reduce encoding and decoding delays and to save bandwidth by using new coding coefficient rules. The operation occurs as follows: the sender divides the original packets P_1 ∼ P_m with continuous Data Sequence Numbers (DSN) into N groups. Original packets within the same group are combined to form coded packets; that is, the original packets included in each coded packet of a group follow a one-to-all progressive approach. For instance, consider three groups, namely G1, G2, and G3. In G1, the first original packet P_1 is encoded to C_1 with coefficient 1, and the second coded packet C_2 is a linear combination of original packets P_2 and P_1, using a random coefficient from the finite field for P_1 and coefficient 1 for P_2. The third coded packet C_3 in G1 contains P_1, P_2, and P_3. The m-th coded packet C_m of a group can be obtained using a linear combination of original packets P_1 ∼ P_m.
When establishing a connection, the sender and receiver negotiate and agree to maintain three structures: a Coded Packet Coefficients Matrix (CPCM), a Redundant Packet Coefficients Matrix (RPCM), and a Mapping Rule (MR). The mapping rule has the following structure: MR: (S_1, S_N, DSN, flag) → Coding Vector (CV), where S_1 is the smallest DSN of the original packets within the group, S_N is the largest DSN of the original packets within the group, DSN is the data sequence number of the coded packet, and flag is an identifier that determines the use of either CPCM or RPCM.
CPCM and RPCM are coefficient matrices for coded packets and redundant coded packets, respectively. MR is a mapping rule from the tuple information of a packet to its corresponding coding vector. Elements in the two matrices are generated from the finite field GF(2^8), and linear independence checks among vectors are performed beforehand. The CPCM and RPCM are generated at the beginning of connection establishment and are used throughout the life of the connection. After negotiation in the connection establishment stage, the sender and receiver maintain the same CPCM, RPCM, and MR. At the sender side, the Pipeline Network Coder can use this information to select a coding vector and perform the encoding operation. At the receiver side, when determining the coding coefficients for a coded packet, the Pipeline Network Decoder directly selects the coding vector from CPCM or RPCM, which is enabled by the Mapping Rule from (S_1, S_N, DSN, flag) to the corresponding coding vector. The tuple (S_1, S_N, DSN, flag) together with the coded data is then sufficient to reconstruct the coded packet.
Guo et al. [48] proposed a decode-and-forward network coding (DFNC) scheme. In this scheme, the authors do not send the coefficients used in the network coding; they just send an m-bit bit-map signaling which packets were involved in the encoding. Thus, if the relay node receives the packet from source node s_1, the corresponding position in the bit-map is set to 1; otherwise, it is set to 0. The coding coefficients are generated using a pseudorandom algorithm considering three criteria: (1) the coefficient is set to 0 if the relay node does not receive the corresponding packet successfully; (2) the coefficient is set to 1 for the relay's own packet; (3) the other coefficients are selected according to the mapping function h: (s, r) → GF(2^q)\{0, 1}, where s is the origin of the coded packet (the transmitting user ID) and r indicates the sequence number of the 1 in the bit-map. The destination is equipped with the same mapping function, so that it can derive the coding coefficients from the bit-map received in each packet.
The authors consider two cases for recovering the original packets. In the first case, all sent packets (by direct transmission and by retransmission) arrive at their destination successfully. Thus, the coefficient matrix is full rank and the data packets can be recovered by solving the set of equations with the Gaussian elimination algorithm. In the second case, not all packets are received correctly, so it is not possible to decode all packets by solving the set of linear equations. For this case, the authors applied decoding at the physical layer (at symbol level) to attempt to recover 'failed' received packets that may include correctly received symbols.
Bao and Li [38,39] proposed a sparse linear network coding framework, named adaptive network coded cooperation (ANCC). Basically, ANCC defines two communication phases and works as follows: in the broadcasting phase, the relay nodes listen to and store the correctly received messages of their neighbors; in the retransmission phase, each relay node randomly selects a predefined number of the overheard messages, performs an encoding process (binary checksum), and retransmits the result to the destination. A bit-map field is included in each coded packet retransmitted by the relays to inform the destination how the parity checks have been formed, so that the destination can replicate the code graph and perform the decoding.
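A relay's retransmission step in this style of scheme can be sketched as follows. This is a simplified illustration, not the ANCC implementation: the function name, the message format, and the assumptions that all messages have the same length and that at least one message was overheard are ours.

```python
import random

def ancc_relay_packet(heard: dict, degree: int, n_nodes: int,
                      rng: random.Random):
    """Build one relay packet: (bit-map, XOR checksum of chosen messages).

    heard maps node ID -> message (bytes) correctly overheard in the
    broadcasting phase; degree is the predefined number of messages to mix.
    """
    chosen = rng.sample(sorted(heard), min(degree, len(heard)))
    # The bit-map tells the destination which messages form the checksum.
    bitmap = [1 if node in chosen else 0 for node in range(n_nodes)]
    length = len(next(iter(heard.values())))
    parity = bytearray(length)
    for node in chosen:
        for i, byte in enumerate(heard[node]):
            parity[i] ^= byte
    return bitmap, bytes(parity)

heard = {0: b"\x01\x02", 2: b"\x10\x20", 3: b"\xff\x00"}
bitmap, parity = ancc_relay_packet(heard, degree=2, n_nodes=5,
                                   rng=random.Random(1))
```

The header cost is one bit per node regardless of which messages were chosen, which is how the bit-map avoids sending explicit coefficients.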
In Han et al. [41], the authors proposed an NC technique named weighted Vandermonde echelon fast coding (WVEFC). To perform the network coding, the authors use a coding matrix F_PC. F_PC is an n_1 × n_2 matrix, where n_1 is the number of source packets that need to be encoded and n_2 = n_1 + k; the value of k refers to the number of redundant coded packets added to improve the packet delivery rate. In the F_PC matrix, the first n_1 column vectors form an upper triangular matrix, while the rest of the matrix is a Vandermonde expanding matrix.
For the network coding to work correctly, before each coding operation, the authors need to specify the row and column numbers of the generated coding matrix and specify the sequence in which packets need to be coded. The authors do not specify how the destination node obtains the coefficients used in the NC. In addition, the authors cite that the probability of linearly dependent columns appearing in the WVEFC coding matrix is lower but it still exists.
Valle et al. [6] proposed a communication scheme for WSNs, named NetCoDer. They proposed a simple relay selection technique and a random linear network coding. The network was limited to a maximum size of 256 nodes, and the finite field used in the network coding is F_{2^8}. The random linear network coding is performed as follows: each node has an identification in hexadecimal format, representing its position in the slot scale, with addresses ranging from 00 to FF. The selection of the coefficient used to encode the messages in the relay node is based on the address of relay node i and the address of neighbor t, using the following forming rule: c_{it} = (i + t) mod 256, where i is the identification of the i-th node and t is the identification of the t-th neighbor.
To inform the coordinator which messages each relay node was able to capture and encode, each node needs to forward the addresses of its neighbor nodes. Each relay node i sends a sequence of bits, which represents the presence (1) or absence (0) of the message from a node t. The coordinator is aware of this forming rule and is able to reconstruct the coefficients used in the coded messages.

Wrap-Up
Among the works cited in the state of the art, the focus is only on assessing the relay selection behavior. These works do not address all steps of the communication; that is, they do not mention the scheme or protocol used to retransmit the messages heard by the relay nodes. Thus, the question remains whether retransmission mechanisms based on network coding can maximize the reception of messages at the destination when relay nodes are used. Table 1 summarizes the described network coding works and compares them according to the following classifiers: the type of network coding used; whether the coefficients used for coding are sent together with the coded message; whether a different strategy has been created for sending the coefficients; and whether additional messages must be exchanged to send the coefficients. A number of works use random linear network coding (RLNC), which may be due to the following advantages. First, the generated linear system has a high probability of being solvable if all the coefficients of all the encoding vectors are selected randomly, independently, and uniformly from the finite field $\mathbb{F}_q$, provided that the finite field size is sufficiently large relative to the size of the network [33]. Second, in this type of NC, there is no control overhead to construct and maintain a linear coding scheme among nodes [42], which allows this type of network coding to be used in commercial devices.
In addition, it is also possible to observe among the works that do not send the coefficients together with the coded message, that just three of those works do not generate additional messages on the network. Considering that the goal is to reduce the overhead of sending the coefficients, sending extra messages to configure the coefficients is just another form of network overhead.
In this way, the methodology proposed in this paper will consider both the communication mechanisms used in the transmission and the retransmission steps. In the transmission, we consider that the relay nodes are optimally selected and will be able to listen to the packets that the coordinator did not successfully receive, using the proposed relay selection technique, as described in Section 3.2. The retransmission step will consider advantageous characteristics of the following state-of-the-art works Bao and Li [38,39], Guo et al. [48], and Valle et al. [6], which proposed new strategies for sending the coefficients without generating additional messages. Besides, we will adapt the proposed mechanism to be applied upon a sparse version of the communication network.

WSN Communication
In this paper, a novel communication scheme is proposed intended to improve the communication reliability in wireless sensor networks. The proposed scheme uses Optimized Relay Selection Technique (ORST) [12] to select the best set of relay nodes and combines it with two novel network coding approaches based on random linear network coding and sparse linear network coding to perform message retransmissions.

System Model
Consider a cooperative WSN communication system with n source nodes (S), one destination node (D), and m relay nodes (R). It is assumed that each node is fitted with a single antenna and that signals on the S − R − D and S − D paths use orthogonal channels through time division multiple access (TDMA). Considering the advantages of star topologies, such as synchronization, latency, and energy efficiency, the star topology is usually regarded as suitable for industrial usage and is adopted in this paper [24,49].
The IEEE 802.15.4e amendment using the LLDN (Low Latency Deterministic Network) MAC operation mode is adopted for the PHY (Physical) and MAC (Medium Access Control) layers of the network. The IEEE 802.15.4e amendment was proposed to address the critical requirements of industrial IoT applications, such as low latency, high reliability, and robustness in industrial environments [50]. In this paper, IEEE 802.15.4e is configured to send the group acknowledgments (GACK) in the same superframe in which the corresponding data were sent.
WSN communication occurs in two steps. In the first step, called transmission, it is assumed that, in each beacon interval, each node in the network has one message to transmit and performs the transmission in its timeslot. At the end of the first step, the coordinator node sends a GACK message indicating which messages it has not received. The GACK message is a bit-map: position i of this vector is 1 if the coordinator received the message from node i, and 0 otherwise. The second step is the retransmission, in which the selected relay nodes apply NC and transmit the heard messages in their retransmission slots, which are previously allocated by the coordinator node.
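The GACK bit-map described above can be sketched as follows. This is only a minimal illustration: the function names `build_gack` and `lost_ids` are hypothetical, and the mapping of node ids 1..n to list positions 0..n-1 is an assumption made for the example, not something defined by the standard or by this paper.

```python
def build_gack(num_nodes, received_ids):
    """Return the GACK bit-map as a list of 0/1 values.

    Position i-1 is 1 if the coordinator received the message from
    node i in the transmission step, and 0 otherwise.
    (Assumed indexing: node ids run from 1 to num_nodes.)
    """
    received = set(received_ids)
    return [1 if node_id in received else 0
            for node_id in range(1, num_nodes + 1)]


def lost_ids(gack):
    """Ids of the nodes whose messages the coordinator did not receive."""
    return [i + 1 for i, bit in enumerate(gack) if bit == 0]


# Hypothetical 8-node network in which nodes 1, 3 and 7 were not heard.
gack = build_gack(8, received_ids=[2, 4, 5, 6, 8])
print(gack)            # bit-map broadcast by the coordinator
print(lost_ids(gack))  # messages that still need a retransmission
```

The relay nodes only need the zero positions of this vector to know which of their stored messages are still useful to the coordinator.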
To perform the retransmission, the proposed relay selection technique selects the optimal set of relay nodes that can improve the diversity order and, thus, it can achieve higher throughput [24], as described in Section 3.2.

The Optimized Relay Selection Technique
The Optimized Relay Selection Technique (ORST) [12] was designed as an optimization problem using an objective function. The objective function (Equation (3)) takes into consideration the available energy in the nodes (e). This parameter was selected among the set of available parameters because it was later demonstrated [21] that this was the parameter with the highest impact upon the quality of the network operation.
The objective function aims to ensure that appropriate nodes are selected as cooperating nodes. Each node $x_i$ calculates its objective function value $W_i$ and sends this information to the coordinator. In Equation (3), $e_i$ is the normalized remaining energy of node $x_i$ (a real number between 0 and 1), obtained from the remaining energy $RE_i$ and the initial energy $IE_i$ of the node; the expression $1/e_i$ is used so that the node with the largest amount of energy has the lowest cost in the objective function.
In order to select the minimum number of relay nodes while ensuring that every node has a reachable relay, an optimization problem is formulated (Equation (4)). In the constraint presented in Equation (4b), A is the adjacency matrix of order $n \times n$, whose element $a_{i,j} = 1$ if node $x_i$ is a neighbor of node $x_j$ and $a_{i,j} = 0$ otherwise. Matrix A is formed in the coordinator node based on the list of neighbors sent by each node of the network. Therefore, whenever the list of neighbors of a node $x_j$ has not been received by the coordinator, all elements of row j of matrix A will be equal to zero; y is a vector of order $n \times 1$, where $y_i$ is equal to 1 when node $x_i$ is selected as relay and 0 otherwise; and b is a vector whose elements $b_i$ are defined as 1, representing the minimum number of relay nodes of each node $x_i$. Considering the WSN presented in Figure 2, the coordinator will build matrix A from the lists of neighbors of nodes $N_2$, $N_4$, $N_5$, $N_6$ and $N_8$, which are the nodes from which the coordinator receives the message with the list of neighbors. All elements of the $N_1$, $N_3$ and $N_7$ rows of matrix A will be equal to zero.
The constraint presented in Equation (4c) is determined by the coordinator node, where matrix C represents the set of nodes that do not have an adequate communication link with the coordinator node. Each row of matrix C represents a node x i that is not able to directly communicate with the coordinator and each column represents a node that is able to hear this node. In this case, d will be equal to 1, in order to guarantee that at least one of these nodes will cooperate with node x i .
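For reference, the optimization problem described in the two paragraphs above can be written in the usual weighted set-cover form. The display below is a sketch reconstructed from the prose only (weights $W_i$, decision vector y, matrices A and C, right-hand sides b and d); the sub-equation labels follow the references to Equations (4b) and (4c) in the text, and the exact form used in [12] may differ.

```latex
\begin{align}
\min_{y} \quad & \sum_{i=1}^{n} W_i \, y_i \tag{4a} \\
\text{subject to} \quad & A\,y \ge b, \tag{4b} \\
& C\,y \ge d, \tag{4c} \\
& y_i \in \{0, 1\}, \qquad i = 1, \dots, n. \tag{4d}
\end{align}
```

Under this reading, constraint (4b) asks for at least one selected relay in the neighborhood of each node, and constraint (4c) asks for at least one selected relay among the nodes that can hear each node without an adequate link to the coordinator.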
The proposed ORST scheme aims to find a set of relays among the WSN nodes, ensuring two conditions: (1) each node x i (1 ≤ i ≤ n) is covered by at least one relay node; (2) the sum of the weights of the relays is minimized. In this scheme, x i is used as node identifier and n is the total number of nodes in the network. There is one node called a coordinator in the WSN (C). The ORST scheme is a resource allocation algorithm that may be reduced to the classic set-covering problem applied to WSNs [51]. Considering the WSN presented in Figure 2, the rows of matrix C will be filled with nodes N 1 , N 3 and N 7 , which are nodes that do not communicate with the coordinator node. The columns will be filled with the nodes that listen to each of these nodes, that is, N 2 for node N 1 and N 3 ; N 8 for node N 1 ; and N 6 for node N 7 . In this way, the coordinator will find the relay nodes solving the optimization problem with the mentioned constraints.
The set-covering problem seeks a minimum number of subsets that together contain all elements of a given set. According to [52], the set-covering problem can be formally defined as follows. An instance $(X, \mathcal{F})$ of a set-covering problem consists of a finite set X and a family $\mathcal{F} = \{s_1, s_2, \ldots, s_z\}$ of subsets of X (z is the total number of subsets in $\mathcal{F}$), such that every element of X belongs to at least one subset in $\mathcal{F}$:

$$X = \bigcup_{s \in \mathcal{F}} s \qquad (5)$$

A subset $s \in \mathcal{F}$ covers its elements. Thus, the problem is to find a minimum-size subset $\mathcal{C} \subseteq \mathcal{F}$ whose members cover all of X:

$$X = \bigcup_{s \in \mathcal{C}} s \qquad (6)$$

When a subset $\mathcal{C}$ satisfies Equation (6), it covers X.
The ORST problem considers a WSN composed of a set of nodes $X = \{x_1, x_2, \ldots, x_n\}$, where every node has an associated positive weight value ($W_i$) and a specific communication range. We construct a directed and weighted graph G = (X, E) in the following way. Each node $x_i$ corresponds to a vertex $x_i \in X$, and two vertices $x_i$ and $x_j$ have an edge $e_{i,j} \in E$ if $x_i$ is able to hear a message sent by $x_j$ with an RSSI value ≥ −87 dBm, defined by Srinivasan and Levis [53] as the minimum value for adequate communication in WSNs.
Every graph with X and E has subsets F = {s 1 , . . . , s k }, where each subset s k is known as a set cover of the graph G. Each subset of F is formed by vertices that accomplish conditions (1) and (2).
The WSN problem treated in this paper consists of finding the set cover with the minimum sum of weights. The corresponding decision problem generalizes the well-known NP-complete vertex-cover problem and is therefore also NP-hard [52,54].
Based on the $y_i \in \{0, 1\}$ variables of the ORST problem introduced in Section 3.2, the minimum set cover problem was formulated as a Binary Integer Problem (BIP). In [22], different solutions were investigated to solve the ORST problem, and the branch-and-bound (B&B) algorithm was identified as the best solution.
The B&B algorithm uses a tree search strategy to implicitly enumerate each of the possible solutions of a given problem [55]. The computational complexity of B&B algorithms depends on two factors: the branching factor b of the tree, which is the maximum number of elements (subproblems) generated at any node in the tree, and the search depth d of the tree, which is the length of the longest path from the root of the tree T to a leaf. Thus, the B&B algorithm has a worst-case running time of $O(M b^d)$, where M is the maximum time to solve a subproblem [56]. For further details, the reader is referred to [22].
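To make the branch-and-bound idea concrete, the sketch below searches for a minimum-weight set of relays that covers the nodes the coordinator cannot hear. It is a simplified illustration, not the implementation used in [22] nor the lp_solve model: it handles only a coverage constraint of the kind described for matrix C, the function name `min_weight_cover` is hypothetical, and the energy values in the example are invented. The coverage sets follow the Figure 2 description given above ($N_2$ hears $N_1$ and $N_3$, $N_8$ hears $N_1$, $N_6$ hears $N_7$).

```python
def min_weight_cover(weights, covers, targets):
    """Branch-and-bound search for a minimum-weight set of relays.

    weights: dict relay_id -> cost W_i (here taken as 1 / e_i)
    covers:  dict relay_id -> set of node ids that the relay can hear
    targets: node ids that must be covered by at least one relay
    Returns (best_cost, sorted list of relays), or (inf, None) if the
    targets cannot be covered.
    """
    candidates = sorted(weights)
    best = [float("inf"), None]

    def branch(index, chosen, cost, uncovered):
        if cost >= best[0]:           # bound: cannot beat the incumbent
            return
        if not uncovered:             # feasible solution found
            best[0], best[1] = cost, sorted(chosen)
            return
        if index == len(candidates):  # nothing left to try on this branch
            return
        relay = candidates[index]
        # Branch y_i = 1: select the relay.
        branch(index + 1, chosen + [relay], cost + weights[relay],
               uncovered - covers[relay])
        # Branch y_i = 0: skip the relay.
        branch(index + 1, chosen, cost, uncovered)

    branch(0, [], 0.0, set(targets))
    return best[0], best[1]


# Hypothetical normalized energies e_i for the candidate relays.
energy = {2: 0.8, 6: 0.5, 8: 0.9}
weights = {node: 1.0 / e for node, e in energy.items()}
covers = {2: {1, 3}, 8: {1}, 6: {7}}
cost, relays = min_weight_cover(weights, covers, targets={1, 3, 7})
print(relays, cost)
```

In this small instance, $N_3$ can only be covered by $N_2$ and $N_7$ only by $N_6$, so the search returns those two relays and discards $N_8$, whose selection would only add cost.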

Network Coding Technique
The retransmission scheme proposed in this paper combines the advantages of three previous methodologies proposed by Valle et al. [6], Bao and Li [38,39], and Guo et al. [48]. We use the equation proposed by [6] as a rule for forming the coefficients and the method of sending the coefficients used in [38,39,48]. The equation for the generation of the coefficients proposed in [48] requires that the destination node receives all messages sent in the network to be able to decode, making it impossible to use in a real network, in which message losses occur.
The coefficients are sent based on a bit-map representation. Thus, if the relay node listens to the packet from n neighbors, the corresponding position to each one of the n neighbors in bit-map is set to 1; otherwise, it will be set to 0. We consider that the relay nodes will never have a message from the coordinator to encode. In this way, we consider that the first position of the m-bit bit-map represents node 1, the second position represents node 2 and so on.
It is important to remark that sending coefficients via a bit-map technique induces a reduction in the overhead generated by sending the coefficients. A traditional RLNC technique sends a list with each of the used coefficients. If we consider that each coefficient has 8 bits and in the worst case, 255 coefficients are sent (star topology), there is an overhead of 2040 bits. Using m-bit bit-map, the overhead is reduced to m bits. Thus, in the same scenario, only 255 bits would be needed.
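The overhead comparison above is simple arithmetic, reproduced here as a short sketch. The function names are hypothetical; the 8-bit coefficient size and the 255-node worst case are the values stated in the text.

```python
def coefficient_list_overhead(num_coefficients, bits_per_coefficient=8):
    """Bits needed when every coefficient is sent explicitly."""
    return num_coefficients * bits_per_coefficient


def bitmap_overhead(num_nodes):
    """Bits needed when one bit per node signals presence or absence."""
    return num_nodes


nodes = 255  # worst case considered in the text (star topology)
explicit = coefficient_list_overhead(nodes)
bitmap = bitmap_overhead(nodes)
print(explicit, bitmap)       # 2040 255
print(1 - bitmap / explicit)  # fraction of the overhead saved: 0.875
```

With 8-bit coefficients, the bit-map is always one eighth of the explicit list, regardless of the number of nodes.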
To use the coefficient formation rule, we modified the operation of the technique proposed in [6]. The authors considered that each node in the network has an identification in hexadecimal format, representing its position in the slot scale, with addresses ranging from 00 to FF. Our scheme uses the id assigned to the node when the network is formed, which starts at 1 and goes up to the total number of nodes in the network; this number is limited to 255 nodes, considering a star topology in which the coordinator is node 0. Thus, the coefficients used by the relay nodes when encoding the listened messages are generated by the formation rule of Equation (7), where i is the id of the i-th relay node, j is the id of its j-th neighbor, and q is the finite field size; the finite field was defined as $\mathbb{F}_{2^8}$ in [6]. The coefficient is set to 0 if the relay node does not successfully receive the corresponding packet. The coordinator node is equipped with the same formation rule to obtain the coding coefficients, according to the bit-map received in each packet. Figure 5 shows an example to illustrate the structure of the bit-map and the mapping between the coding coefficients and the bit-map.

To illustrate how the structure of the bit-map works, consider node $N_2$: this relay node listened to and encoded the messages from neighbors $N_1$, $N_3$, $N_4$, $N_5$, and its own message, generating the m-bit bit-map with the following content: 11111000. This is the bit-map that will be sent to the coordinator along with the encoded message.
When the coordinator node receives the coded message from node $N_2$, it checks the bit-map and applies the coefficient formation rule: knowing which node i sent the coded message and which messages were encoded (from which neighbors j), the coordinator obtains the coefficient used for each message. Then, the coordinator node has to solve the system of linear equations presented in Equation (2) to recover the original messages. According to [6], the requirement that the coefficient matrix (matrix G in Equation (2)) be full rank was verified, and any set of coefficients that follows the coefficient formation rule presented in Equation (7) can be used as elements of the coding vector on any relay.
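The exchange between relay and coordinator can be sketched as follows. The sketch assumes the forming rule quoted from [6] in Section 2 (sum of the two ids reduced modulo the field size, here q = 256) as a stand-in for Equation (7); the function names `make_bitmap` and `coefficients_from_bitmap` are hypothetical, and the finite-field arithmetic used for the actual encoding and decoding is not shown.

```python
def make_bitmap(heard_ids, num_nodes):
    """Relay side: one bit per node id 1..num_nodes, '1' if encoded."""
    heard = set(heard_ids)
    return "".join("1" if node in heard else "0"
                   for node in range(1, num_nodes + 1))


def coefficients_from_bitmap(relay_id, bitmap, q=256):
    """Coordinator side: rebuild the coding vector of a relay.

    Assumed rule: c = (i + j) mod q for every position j whose bit is
    set, and 0 for every position whose bit is cleared.
    """
    return [(relay_id + j) % q if bit == "1" else 0
            for j, bit in enumerate(bitmap, start=1)]


# Example from the text: relay N2 encoded N1, N3, N4, N5 and its own message.
bitmap = make_bitmap([1, 2, 3, 4, 5], num_nodes=8)
print(bitmap)
print(coefficients_from_bitmap(2, bitmap))
```

Because the coordinator recomputes the coefficients from the sender id and the bit-map alone, no coefficient value ever travels in the packet.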

Simulation Assessments
The network simulation tool OMNeT++ [57] and the WSN framework Castalia [58] were used to assess the operation of the relay selection technique and the retransmission scheme using network coding. The open-source solver library lp_solve [59] was used to solve the optimization problem.

Simulation Settings
In the Castalia framework, several extensions were added to the available IEEE 802.15.4e LLDN model, including the collision-free period (CFP), which is subdivided into guaranteed time slots (GTS) for uplink messages forwarded from the nodes to the coordinator, and the group acknowledgment (GACK) timeslot. This was necessary because Castalia still does not have a fully functional implementation of the LLDN communication mode.
The simulation assessment was performed considering networks with 21, 41, 61, 81, and 101 nodes, one of the nodes being the personal area network (PAN) coordinator. Nodes were randomly deployed in an area of 50 × 50 m², with the PAN coordinator positioned in the center. The channel model used was the free space model without time variation. Other simulation parameters are described in Table 2. The simulation execution time was set to 450 s, during which the coordinator is able to send up to 50 beacons. The radio model used was the CC2420, which is compliant with the IEEE 802.15.4e PHY Standard. All the nodes use the same constant transmission power of 0 dBm. To reduce the statistical bias, each simulation was performed 60 times, reaching a confidence interval of 95%. For each simulation round, the positions of the nodes around the coordinator node were randomly rearranged. That is, the distance between the coordinator and the nodes also varies in each simulation round.
Additionally, simulations were performed considering a dynamic topology, where only 50% of the nodes were associated with the network at time zero and the remainder were subsequently associated in groups of five nodes: the first group at time instant 50 s and then the other groups every 30 s. Considering the scenario with the highest number of nodes (100 nodes), after 320 s, all nodes were associated. Later, from the time instant 320 s of simulation, 20% of the nodes of the network randomly left the coverage of the coordinator node. This leaving operation was performed in groups of four nodes, every 10 s of simulation. Finally, all nodes rejoined the network in the same order in which they had left (in groups of four) from the time instant 350 s of simulation, with an interval of 10 s between groups, except for the network with 100 nodes, where only 10% of the outgoing nodes returned.
The dynamic topology mode was designed to force the list of neighbors to undergo multiple changes during the simulation time, in order to assess the reliability of the dynamic relay selection procedure.

Network Coding Technique Application Scenarios
Based on the results obtained in previous works [12,22], we know that relay nodes are optimally selected. However, in those previous works, the listened messages were not actually sent; instead, only a list of the nodes heard by each relay node was sent. In this way, it was possible to identify whether the selected relay node could hear all or almost all the nodes that the coordinator did not hear. In order to assess the delivery of messages heard by each relay node, we consider three different retransmission scenarios. Thus, it is possible to identify the impact of the retransmission step on communication when the ORST and the network coding technique are used together.
In all scenarios, the method of generating and sending the coefficients used in the network coding will be the method described in Section 3.3.
1st Scenario: The first scenario is a typical RLNC scenario. Relay nodes store all messages heard during the transmission step. Next, relay nodes encode all stored messages and retransmit the encoded message to the coordinator node.
2nd Scenario: The second scenario is a typical SLNC scenario. Relay nodes store all messages heard during the transmission step. However, they encode just a small number of messages among the set of listened messages, so the network coding technique becomes sparse linear network coding. Each relay node randomly selects three of the listened messages, applies the network coding technique to generate a single message, and retransmits it to the coordinator node.
3rd Scenario: In the third scenario, the GACK is used as a resource. It contains an M-bit bit-map that indicates successful and failed transmissions in the same order as the transmissions. Thus, after all nodes have carried out their transmissions, the coordinator node sends a GACK message containing the bit-map that indicates which messages failed. After receiving the GACK message, each relay node randomly selects three messages from those that were not received by the coordinator, encodes them, and retransmits the result to the coordinator. In this scenario, we continue to apply sparse linear network coding; however, we strategically select only messages that the coordinator was unable to correctly receive in the transmission step.
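The difference between the three scenarios lies only in which stored messages a relay chooses to encode, which can be sketched as below. The function name `select_messages` is hypothetical, the GACK is represented as a list with one 0/1 entry per node id starting at 1, and the behavior when fewer than three candidates are available (encode whatever is available) is an assumption, since the text does not specify it.

```python
import random


def select_messages(scenario, heard_ids, gack, k=3, rng=random):
    """Ids of the messages a relay encodes in Scenarios 1 to 3."""
    heard = sorted(heard_ids)
    if scenario == 1:    # RLNC: encode every stored message
        return heard
    if scenario == 2:    # SLNC: k random messages among those heard
        pool = heard
    elif scenario == 3:  # SLNC restricted to messages the coordinator lost
        pool = [node for node in heard if gack[node - 1] == 0]
    else:
        raise ValueError("scenario must be 1, 2 or 3")
    # Assumption: if fewer than k candidates exist, encode all of them.
    return sorted(rng.sample(pool, min(k, len(pool))))


gack = [0, 1, 0, 1, 1, 1, 0, 1]  # coordinator lost nodes 1, 3 and 7
heard = {1, 3, 4, 5}             # messages stored by one relay
print(select_messages(1, heard, gack))
print(select_messages(2, heard, gack, rng=random.Random(0)))
print(select_messages(3, heard, gack))
```

In Scenario 2 the relay may spend part of its single coded packet on messages the coordinator already has, whereas Scenario 3 uses the GACK to avoid that.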

Simulation Assessment
The simulation assessment was performed considering the following metrics to measure network performance: success rate, energy consumption, and the correlation between the average number of retransmitted messages per node and the average number of recovered messages in the decoding process.
The success rate represents the ratio between the number of messages that successfully reached the coordinator and the number of sent messages. This metric considers messages transmitted in both the transmission and retransmission attempts. In the retransmission attempts, only the messages that have been successfully decoded are considered. Energy consumption represents the average amount of energy spent by each node, obtained through the resource management module available in the Castalia framework. The average number of retransmitted messages per node represents the average number of retransmissions each node performed, i.e., the average number of coded messages sent by each node. Finally, the average number of recovered messages in the decoding process represents the average number of messages that were recovered by the coordinator node by solving the linear system generated by the network coding.

Figure 6 illustrates the energy consumption of the network. Communication scenarios that use network coding spend more energy. This was an expected result, considering that relay nodes remain awake longer, listening to the messages in the transmission step and retransmitting the messages in the retransmission step.

Figure 6. Energy consumption.

Figure 7 illustrates the success rate considering the three communication scenarios presented in Section 4.2 compared to a network without relay nodes. This result was surprisingly negative due to the small number of selected relays, as will be shown in the following. At first sight, it was expected that the network coding linked to the ORST technique would increase the success rate of the network and consequently increase the reliability of communications. However, the behavior of Scenario 1 is similar to that of the network without relay nodes (where the node itself retransmits the messages for which it did not receive the ACK).
Scenarios 2 and 3 show a clear improvement compared to this behavior, especially for networks with fewer than 60 nodes.

Figure 8 presents the correlation between the average number of retransmitted messages per node and the average number of recovered messages in the decoding process. In Scenario 1, the number of retransmitted messages is much greater than the number of recovered messages in the decoding process. In networks with 40, 60, 80, and 100 nodes, the average number of recovered messages was very close to zero. That is, there was almost no message recovery. In Scenarios 2 and 3, a greater number of messages was retrieved when compared to Scenario 1. However, we expected the coordinator to recover more messages, considering that each coded retransmission contains at least three messages. Investigating the obtained results, it is clear that the number of relay nodes has a direct impact upon the operation of the network coding scheme.

Figure 9 presents the average number of selected relay nodes, considering all scenarios. As can be seen, the average number of relay nodes is very small. Even in networks with 100 nodes, the average number of relay nodes is smaller than 4. From an equation-solving perspective, to decode the received messages the coordinator node must solve a linear system that has a single solution. A linear system with more unknowns than equations may have no solution or an infinite number of solutions, but it will never have just one solution. In the context of network coding, the number of equations corresponds to the number of coded retransmissions received by the coordinator, and the number of unknowns corresponds to the number of different messages that were coded, which the coordinator did not receive successfully in the transmission step. This fact explains why Scenario 1 performed so poorly.
In this communication scenario, each relay node codes all the heard messages. That is, there were more unknowns than equations. Thus, it was impossible to decode the received messages. This is a typical problem that often arises in network coding applications.
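The counting argument above can be checked numerically: a coding matrix with fewer rows (coded packets) than columns (unknown messages) can never reach full column rank. The sketch below uses exact rational arithmetic instead of the $\mathbb{F}_{2^8}$ arithmetic of the actual scheme, which is a simplification made for illustration; the bound "rank cannot exceed the number of equations" holds over any field. The function names and the example matrices are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows))
                      if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def uniquely_decodable(coding_matrix):
    """True only if the rank equals the number of unknowns (columns)."""
    return rank(coding_matrix) == len(coding_matrix[0])


# Scenario 1 style: 3 coded packets over 8 unknown messages.
dense = [[3, 4, 5, 6, 7, 0, 0, 0],
         [2, 0, 4, 0, 6, 0, 8, 0],
         [0, 0, 1, 1, 0, 1, 1, 1]]
# Sparse style: 3 coded packets over 3 unknown messages.
sparse = [[1, 1, 1],
          [1, 1, 0],
          [0, 1, 1]]
print(uniquely_decodable(dense), uniquely_decodable(sparse))
```

With fewer than four relays on average, a relay that encodes every heard message produces a system like `dense`, which is why decoding almost never succeeded in Scenario 1.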
We have also analyzed why there was no significant improvement in the success rate of Scenarios 2 and 3. Two facts can be considered: first, the number of messages that did not successfully reach the coordinator in the transmission stage was high, around 40%, and second, the relay nodes coded only a small number of messages (three messages for each retransmission). Thus, even when the coordinator recovers some messages in the decoding process, a large number of messages that were lost were not retransmitted, which resulted in unreliable communication.
To maximize the success rate of the network, we propose a different communication approach, represented by two new scenarios, Scenarios 4 and 5. In Scenario 4, we increased the number of relay nodes and continued using SLNC. In Scenario 5, we do not use network coding. Each relay node retransmits the messages it listened to one by one, as described below.
4th Scenario: This scenario simply increases the number of relay nodes. When performing the relay selection, the coordinator node selects two auxiliary nodes for each relay node and signals this information in the beacon. Thus, the nodes that will cooperate and assist in the retransmission know that they have been selected. In this way, if there are three relay nodes, there will be six auxiliary nodes, two for each relay node. Auxiliary nodes are selected as follows: for each neighbor of the relay node that communicates with the coordinator, its list of heard nodes is checked, and the nodes that it listens to but the coordinator does not are counted. The two neighbors of the relay node that communicate with the coordinator and have the largest number of neighbors that do not communicate with the coordinator are selected to assist the relay node.
In the retransmission step, the relay node $C_i$ intersects its list of listened messages ($L_{C_i}$) with the list of messages lost by the coordinator ($L_{LostCoord}$) and with the lists of messages heard by the auxiliary nodes ($L_{Aux_j}$ and $L_{Aux_{j+1}}$), according to Equation (8):

$$I = L_{C_i} \cap L_{LostCoord} \cap L_{Aux_j} \cap L_{Aux_{j+1}} \qquad (8)$$

The result of Equation (8) is the set of common messages that the coordinator needs and that the relay node and the two auxiliary nodes also have. The auxiliary nodes of each relay also perform the intersection operation presented in Equation (8). The list of messages resulting from Equation (8) (I) is ordered by the id of the nodes that sent them. Thus, both in the relay nodes and in the auxiliary nodes, the list I presents the same messages in the same order. When coding and retransmitting, the relay node selects a message that only it listened to and the first two messages from the list I, retransmitting a coded message ($M_{C_i}$) that contains three listened messages ($M_{C_i} = m_{C_i} + m_{I_1} + m_{I_2}$). The auxiliary node $Aux_j$ selects the first two messages resulting from the intersection ($M_{Aux_j} = m_{I_1} + m_{I_2}$). The second auxiliary node $Aux_{j+1}$ selects its own message and the second message from the list resulting from the intersection ($M_{Aux_{j+1}} = m_{Aux_{j+1}} + m_{I_2}$).
This organization of the messages that are encoded and retransmitted by the relay and auxiliary nodes allows the coordinator to effectively solve the linear system, decoding the lost messages and recovering them.
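The message selection of Scenario 4 and the reason it is decodable can be sketched as follows. The function names `scenario4_plan` and `peel` are hypothetical. Two points are assumptions made for the example: "a message that only it listened to" is read as a lost message heard by the relay but by neither auxiliary node, and the coordinator is assumed to have already received the own message of the second auxiliary node. Finite-field coefficients are omitted; the peeling check only requires that every coefficient be nonzero.

```python
def scenario4_plan(relay_heard, lost_by_coord, aux_j_heard, aux_j1_heard,
                   aux_j1_id):
    """Message ids combined by the relay and its two auxiliary nodes."""
    lost = set(lost_by_coord)
    # List I: intersection of the four lists, ordered by sender id.
    common = sorted(set(relay_heard) & lost
                    & set(aux_j_heard) & set(aux_j1_heard))
    if len(common) < 2:
        raise ValueError("this sketch assumes at least two common messages")
    # Lost messages heard by the relay only (assumed interpretation).
    exclusive = sorted((set(relay_heard) & lost)
                       - set(aux_j_heard) - set(aux_j1_heard))
    if not exclusive:
        raise ValueError("this sketch assumes one message heard only by the relay")
    return {
        "relay": [exclusive[0], common[0], common[1]],
        "aux_j": [common[0], common[1]],
        "aux_j1": [aux_j1_id, common[1]],
    }


def peel(equations, known):
    """Ids recoverable by repeatedly solving equations with one unknown."""
    equations = [list(ids) for ids in equations]
    known = set(known)
    progress = True
    while progress:
        progress = False
        for ids in equations:
            unknown = [m for m in ids if m not in known]
            if len(unknown) == 1:
                known.add(unknown[0])
                progress = True
    return known


plan = scenario4_plan(relay_heard={1, 3, 7, 9}, lost_by_coord={1, 3, 7},
                      aux_j_heard={1, 3, 9}, aux_j1_heard={1, 3},
                      aux_j1_id=5)
print(plan)
print(peel(plan.values(), known={5}))
```

The three coded packets form a triangular system: the packet of the second auxiliary node yields the second common message, the packet of the first auxiliary node then yields the first common message, and the relay packet finally yields the message heard only by the relay.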
5th Scenario: The relay nodes will retransmit each of the messages listened to in individual slots without using NC. The coordinator node is the one who will allocate slots for each relay node, according to the number of messages that each relay will retransmit.
For the correct operation of the network, there is a configuration period, which precedes each relay selection in the ORST technique [12]. During this period, the coordinator receives the neighbors' list of every node that communicates directly with it. The neighbors' list is a bit-map in which each index represents a network node: a value of "1" indicates that the nodes are neighbors and "0" that they are not. The coordinator node uses this neighborhood information of each relay node, together with the GACK information from the previous beacon interval, to determine the number of slots that each relay node will receive to carry out the retransmissions.
The process to allocate slots for each relay node occurs as follows. First, the coordinator checks its GACK bit-map to identify which messages were lost. Then, the bit-map that represents the neighborhood of each relay node is updated, keeping in "1" only the positions that represent neighbors that were heard and whose messages were not received by the coordinator by direct transmission. At this stage, the coordinator can identify how many slots each relay node would need if it were to retransmit all the messages it heard among those that the coordinator lost. The coordinator keeps this information for each relay node.

In order to optimize the allocation of the slots and to prevent the relay nodes from retransmitting repeated messages, a binary AND operation is performed between the neighbors' lists of the relay nodes. Thus, it is possible to identify which relay nodes have heard the same messages. For example, consider a network with five nodes, in which nodes $N_1$ and $N_3$ are relay nodes and the coordinator node lost the messages from nodes $N_2$, $N_4$ and $N_5$. Considering the neighbors' list of each relay node already updated with the GACK information, the neighbors' list of node $N_1$ is $N_1 = \{N_4, N_5\}$ and the neighbors' list of node $N_3$ is $N_3 = \{N_2, N_4, N_5\}$. In this example, the binary AND operation performed by the coordinator results in the elements that both relay nodes heard, that is, $\{N_4, N_5\}$.

After identifying which messages were listened to by more than one relay, the coordinator verifies which relay node has the least number of messages to be retransmitted and selects it to be the retransmitter. The retransmitting relay is selected according to the number of messages to be retransmitted in order to balance the energy consumption among the relay nodes, because the more messages a relay has to retransmit, the greater its energy consumption will be.
In the cited example, both relay nodes listened to the messages from nodes $N_4$ and $N_5$. Disregarding the messages they both listened to, node $N_3$ has one element in its relay list (the message from node $N_2$), and node $N_1$ does not have any element. Thus, the first message that the two relay nodes heard will be assigned to node $N_1$, and the second can be assigned to either relay, since both then have the same number of messages to be retransmitted.
After selecting the relay node that will retransmit each message, the coordinator decreases the number of slots that would be assigned to the other relay node that had heard the same message and that was not assigned to it. If at the end of this process, it is identified that the number of messages lost is greater than the number of available slots, inevitably some messages will not be retransmitted.
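The allocation procedure of Scenario 5 can be sketched as a small greedy routine. The function name `allocate_retransmissions` is hypothetical; set intersection stands in for the binary AND on bit-maps; breaking ties in favor of the lowest relay id is an assumption (the text allows either relay in that case); and the limit on the total number of slots is not modeled.

```python
def allocate_retransmissions(relay_heard, lost_ids):
    """Assign each lost message to exactly one relay that heard it.

    relay_heard: dict relay_id -> ids of the nodes heard by that relay
    lost_ids:    ids of the messages the coordinator lost (from the GACK)
    Returns dict relay_id -> sorted list of message ids to retransmit.
    """
    lost = set(lost_ids)
    # Keep only heard messages that the coordinator actually lost.
    useful = {r: set(heard) & lost for r, heard in relay_heard.items()}
    assignment = {r: [] for r in relay_heard}

    def holders(msg):
        return [r for r in sorted(useful) if msg in useful[r]]

    # Messages heard by a single relay can only go to that relay.
    for msg in sorted(lost):
        if len(holders(msg)) == 1:
            assignment[holders(msg)[0]].append(msg)
    # Shared messages go to the relay with the fewest messages so far.
    for msg in sorted(lost):
        candidates = holders(msg)
        if len(candidates) > 1:
            target = min(candidates, key=lambda r: (len(assignment[r]), r))
            assignment[target].append(msg)
    return {r: sorted(msgs) for r, msgs in assignment.items()}


# Example from the text: relays N1 and N3, lost messages N2, N4 and N5.
slots = allocate_retransmissions({1: {4, 5}, 3: {2, 4, 5}}, lost_ids=[2, 4, 5])
print(slots)
```

In the example, the message of $N_2$ can only go to $N_3$, the first shared message goes to $N_1$ because it has no load yet, and the second shared message is a tie that this sketch resolves in favor of $N_1$.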
We limit the total number of slots, counting both transmissions and retransmissions, to 140, as the goal is to maximize the network's success rate with the fewest retransmitted messages. In a network that uses only the ARQ (Automatic Repeat reQuest) protocol, each node retransmits whenever it does not receive an ACK. The number of retransmissions can therefore exceed the number of nodes in the network, since some approaches allow a node to repeat the same retransmission up to x times if it does not receive an ACK. The aim of this scenario is that, even in a network of 100 nodes, retransmission is possible without reserving a retransmission slot for each node in the network. Thus, the success rate is maximized without increasing the beacon interval between transmissions.
The information about which relay node should retransmit each message is sent in the next GACK message. When a relay node receives the GACK, it checks, for each message that the coordinator lost, whether the retransmitter id indicates that it must retransmit that message. The relay knows that it must retransmit a lost message if the retransmitter id in the GACK is either "0" or its own id. A value of zero means that only this relay heard the missed message; its own id means that more nodes heard the message, but this relay was the one selected to retransmit it.
Figure 10 illustrates the success rate for all the scenarios. Scenario 4 presents a significant improvement over Scenarios 1, 2, and 3. However, the best results were obtained with Scenario 5, in which the success rate was above 95% in networks with 20 and 40 nodes and remained above 90% in networks with 60, 80, and 100 nodes. Finally, Figure 11 presents the energy consumption in all scenarios. Scenario 4, where auxiliary nodes were selected, presented a higher energy consumption. This is understandable, since a greater number of nodes keep the radio on for the entire transmission stage listening to neighbors. Scenario 5 presented an energy consumption similar to that of Scenarios 1, 2, and 3. Thus, Scenario 5 was the one with the best results, and it can be considered the best retransmission scheme to be used together with the ORST technique.
Figure 11. Energy consumption.

Conclusions
Nowadays, with the Industry 4.0 paradigm, smart devices with sensing, communicating and actuating capabilities are common in the industrial environment. Sensors and actuators are integrated into the environment and communicate transparently, which increases the use of WSNs. However, WSNs present challenges in maintaining reliable communication, so extra mechanisms must be applied to improve their performance.
Cooperative communication has been proposed to enhance the reliability of wireless communication. When applying this type of solution, it is necessary to determine which nodes will be the relay nodes and how they collaborate in the retransmission process. This paper focused on combining an adequate selection of relay nodes with retransmission techniques. We proposed the use of the ORST scheme, whose target is to adequately select relay nodes without generating overheads or excessive energy consumption, combined with four different retransmission techniques, three of them applying network coding techniques.
Contrary to what we assumed when starting this study, the ORST technique working together with network coding approaches did not present interesting results. The main reason is that the ORST technique selects a small number of relay nodes. As a consequence, each relay node generates a new encoded packet from a large number of received messages, which results in a small number of equations in the linear system, and the coordinator is not able to decode the incoming messages. A key aspect of the linear network coding technique is that a node needs to receive a number of coded messages greater than or equal to the number of original messages in order to successfully decode the original set of messages. However, the efficiency of the ORST technique is demonstrated when the relay nodes retransmit a set of the messages they heard in individual slots without using network coding: the success rate is greater than 90% and the energy consumption is only slightly above the case without relays.
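The decoding condition can be stated compactly. As an illustrative sketch (the symbols n, m, x_i, c_j, a_ji, A and the field F_q are notation introduced here, not taken from the evaluation), suppose the coordinator lost n original messages x_1, ..., x_n and receives m coded packets c_1, ..., c_m from the relay nodes, each a linear combination of the originals over a finite field:

```latex
c_j = \sum_{i=1}^{n} a_{ji}\, x_i, \quad j = 1, \dots, m
\qquad\Longleftrightarrow\qquad
\mathbf{c} = A\,\mathbf{x}, \quad A \in \mathbb{F}_q^{m \times n},
\qquad
\text{unique solution} \iff \operatorname{rank}(A) = n \;\Rightarrow\; m \ge n .
```

When few relay nodes are selected and each one combines many messages into a single coded packet, m stays small while n grows, so m < n and the system is underdetermined.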
As future work, we intend to assess the implementation feasibility of the proposed schemes using available COTS (commercial off-the-shelf) WSN nodes. The current implementation uses a centralized topology, in which the PAN coordinator can be implemented in one of two ways: as a device that has extra resources to perform the calculation, or as a device with limited computational resources that is connected to a computer, which performs the processing and returns the solution to the coordinator. All the other nodes can be devices with limited computational resources.