Optimal Number of Message Transmissions for Probabilistic Guarantee of Latency in the IoT

The Internet of Things (IoT) is now experiencing its first phase of industrialization. Industrial companies are completing proofs of concept and many of them plan to invest in automation, flexibility and quality of production in their plants. Their use of a wireless network is conditioned upon its ability to meet three Key Performance Indicators (KPIs), namely a maximum acceptable end-to-end latency L, a targeted end-to-end reliability R and a minimum network lifetime T. The IoT network has to guarantee that at least R% of messages generated by sensor nodes are delivered to the sink with a latency ≤L, whereas the network lifetime is at least equal to T. In this paper, we show how to provide the targeted end-to-end reliability R by means of retransmissions to cope with the unreliability of wireless links. We present two methods to compute the maximum number of transmissions per message required to achieve R. MFair is very easy to compute, whereas MOpt minimizes the total number of transmissions necessary for a message to reach the sink. MFair and MOpt are then integrated into a TSCH network with a load-based scheduler to evaluate the three KPIs on a generic data-gathering application. We first consider a toy example with eight nodes where the maximum number of transmissions MaxTrans is tuned per link and per flow. Finally, a network of 50 nodes, representative of real network deployments, is evaluated assuming MaxTrans is fixed. For both TSCH networks, we show that MOpt provides a better reliability and a longer lifetime than MFair, which provides a shorter average end-to-end latency. MOpt provides more predictable end-to-end performances than Kausa, a KPI-aware, state-of-the-art scheduler.


Introduction
The Internet of Things (IoT) is transforming our daily life at home [1], at work, in our cities [2], in transportation [3], in sport training and healthcare, in process control and automation, in smart farming and beyond.

Context
At home or in the office, the IoT allows us to save time and energy by controlling lights and appliances and knowing our resource consumption habits. In business and industry, it increases productivity and efficiency by streamlining processes. In transportation, it helps people to enjoy services of better quality. IoT devices are electronic devices able to communicate with a network and perform a task. We can draw a distinction between consumer devices, smart home devices, enterprise and industrial IoT devices. In this paper, we do not focus on IoT devices themselves, but rather on applications using these devices. More precisely, we will consider IoT applications that are responsible

Related Work
Different ways to estimate the quality of a wireless link exist. They all rely on link measurements obtained by an active, passive or hybrid link monitoring. A usual classification distinguishes between hardware and software-based link quality estimators [9,10]. Hardware estimators have the advantage of being directly provided by hardware without the need for any additional processing overhead. However, their accuracy is not very good for two reasons. First, they are measured only on successfully received packets, and they do not take into account the number of packet losses. Second, they are not evaluated on the whole packet received but only on eight symbols of this packet. The main hardware-based quality estimators are RSSI (Received Signal Strength Indicator), LQI (Link Quality Indicator), and SNR (Signal-to-Noise Ratio). In addition, LQI depends on the manufacturer of the radio transceiver used. For the software-based quality estimators, we distinguish those based on PRR (Packet Reception Rate), from those based on RNP (Required Number of Packet retransmissions) and finally those using a score. For instance, ETX (Expected Transmission Count) [11] is classified as an RNP-based estimator, whereas F-LQE (Fuzzy Link Quality Estimator) [12] is a score-based one assessing link quality in terms of four properties: Smoothed Packet reception Ratio, Stability factor, ASymmetry Level and channel Average Signal-to-Noise Ratio. In [13], the authors introduce Bmax, the maximum count of consecutive transmissions needed for a frame to be transmitted on a link. This metric evaluates the link burstiness, where a burst is defined as a period of continuous packet loss. It is obtained empirically for each link based on previous transmissions and can be seen as an observed worst case. Then, they build a schedule that allocates Bmax cells for every frame that a link is supposed to carry. 
The problem with such a method is that it may require a long delay (more than 140 hours according to the authors) before a representative Bmax is known for all the links.
Whatever the link quality estimator used, authors usually agree on the following conclusions:
• An ideal link quality estimator should be [10] energy efficient (i.e., requiring low processing, communication and memory overhead), accurate (i.e., reflecting the real link behavior), reactive (i.e., able to promptly react to persistent link state changes) and stable (i.e., able to tolerate transient link state changes).
• To better reflect the real behavior of a link, several link properties should be taken into account. That is why link quality estimators tend to combine several simple estimators [14] and use sophisticated techniques (e.g., simple average, filtering, machine learning [15,16], regression, and fuzzy logic [17,18]) to produce a metric from link measurements.
Link quality estimators have been used to improve the quality of service provided to end-users [19]. By using links of better quality, the network throughput is maximized, the delivery times are minimized, routes are more stable [18], etc. These improvements can be increased by link quality prediction using online machine learning techniques as in [20]. In such a case, packets are then able to avoid links before their quality degrades below an acceptable threshold, and routing reactivity is improved. Furthermore, if both the link quality estimator and the routing protocol take energy into account, network lifetime is maximized, as in [21].
Closer to our work, Gaillard et al. propose a greedy algorithm that optimizes the distribution of links used by flows network-wide [22]. These authors extend TASA [23], a well-known centralized scheduling algorithm for FTDMA networks, to take into account retransmissions and fragmented packets. They build a schedule that complies with reliability expectation by adding extra cells that are used in the case of consecutive retransmissions. They take into account link quality and packet fragmentation. The same authors further extend their research in [8] by proposing Kausa, a centralized scheduling algorithm that builds resource paths that guarantee QoS per flow, when multiple applications are using the same network, and each application has its own requirements and traffic flows.
In this paper, we propose to use a simple link quality estimator: PDR (Packet Delivery Rate) that is easily computed by the sender as the ratio of the number of acknowledged frames to the number of sent frames. Even if this estimator is not ideal, as previously defined, it provides an accurate estimation of link behavior because, unlike hardware-based link estimators, it takes into account packet losses. We use the PDR link quality estimator to compute the maximum number of transmissions per message and per flow over any link visited by the flow considered.
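As an illustration of the PDR estimator described above, the following minimal sketch keeps two counters per link on the sender side and returns their ratio. The class name `LinkCounters` and its methods are hypothetical and are not taken from the paper; they only mirror the definition of PDR as acknowledged frames over sent frames.

```python
from dataclasses import dataclass


@dataclass
class LinkCounters:
    """Per-link counters kept by the sender (hypothetical structure)."""
    sent: int = 0    # frames transmitted on the link, retransmissions included
    acked: int = 0   # frames for which an acknowledgment was received

    def record(self, ack_received: bool) -> None:
        """Account for one transmission attempt and its outcome."""
        self.sent += 1
        if ack_received:
            self.acked += 1

    def pdr(self) -> float:
        """Packet Delivery Rate: acknowledged frames / sent frames."""
        if self.sent == 0:
            raise ValueError("no frame sent yet, PDR is undefined")
        return self.acked / self.sent


link = LinkCounters()
for ack in [True, False, True, True]:
    link.record(ack)
print(link.pdr())  # 3 acknowledged frames out of 4 sent
```

Because lost frames increase `sent` without increasing `acked`, packet losses are reflected in the estimate, which is the property the text contrasts with hardware-based estimators.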
We share the same goal as Kausa [8]: namely, to build a centralized schedule ensuring that flows meet their required end-to-end latency and delivery rate, by means of message retransmissions. Like [8], we adopt a per flow approach, allowing us to differentiate QoS per flow. However, unlike [8], we consider the optimal solution for any flow to be the one that minimizes the total number of transmissions per message of this flow, while achieving the desired end-to-end reliability. Furthermore, traces obtained from real IoT networks like [24,25] allow us to better understand the challenges of wireless networking and make realistic assumptions. That is why, in this paper, the performance evaluation is based on PDR values computed from a realistic model, as explained in Section 6.1.

Optimal Retransmission Estimation
To cope with the unreliability of network links, unacknowledged messages are retransmitted up to a maximum number of retransmissions. We have to determine this maximum number that of course depends on the unreliability of the link considered as well as the targeted end-to-end reliability R.
Since each flow may request its own QoS, we adopt a per-flow approach. Without loss of generality, we assume that each message is labeled with its flow tag, which includes the origin node of the flow.

Assumptions and Basic Properties
Let us consider the flow f originating from any sensor node $N_h$ with h ∈ [1, n]. Let h denote the path length (i.e., the number of hops) from $N_h$ to the sink $N_0$. To simplify the notation, we assume that the path is defined by $N_h \to N_{h-1} \to \cdots \to N_1 \to N_0$. For any link j ∈ [1, h] of f's path, we denote by $M_j$ the maximum number of transmissions of any message of f transmitted by node $N_j$ to its parent in the routing tree, and by $P_j$ the probability of successful acknowledgment receipt after a single transmission, whereas $R_j$ is the probability of successful acknowledgment receipt on link j after a maximum number $M_j \geq 1$ of transmissions. Table 1 gives the notations used in this paper. In a TSCH network, this assumption means that the scheduling algorithm has assigned enough transmission cells to each node. As a consequence, the unreliability of a transmission is only due to the unreliability of the wireless link considered. If this assumption were not true, then the computation of the transmission reliability should include not only the message loss due to the unreliability of the wireless link considered but also the message loss due to Transmit queue overflow.
We first recall some basic properties:

Property 1. The end-to-end reliability on a path is equal to the product of the reliabilities of the links composing that path, that is $\prod_{j=1}^{h} R_j$.

Property 2. The probability of successful acknowledgment receipt over any link j increases with the maximum number of message transmissions $M_j$ according to the following equation:
$$R_j = 1 - (1 - P_j)^{M_j} \qquad (1)$$
Proof. After $M_j$ transmissions, the acknowledgment of a message is not received successfully with a probability equal to $(1 - P_j)^{M_j}$. Hence, the probability of successful acknowledgment receipt after $M_j$ transmissions is equal to $1 - (1 - P_j)^{M_j}$.
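The two basic properties can be checked numerically with a short sketch. The function names are illustrative and not from the paper; the code assumes, as the text does, that successive transmissions on a link succeed independently with probability P_j.

```python
from math import prod
from typing import Sequence


def link_reliability(p: float, m: int) -> float:
    """Probability that an acknowledgment is received within m transmissions
    on a link whose single-transmission success probability is p."""
    return 1.0 - (1.0 - p) ** m


def path_reliability(ps: Sequence[float], ms: Sequence[int]) -> float:
    """End-to-end reliability: product of the reliabilities of the links."""
    return prod(link_reliability(p, m) for p, m in zip(ps, ms))


# One link with P = 0.5: reliability grows with the number of transmissions.
print([link_reliability(0.5, m) for m in (1, 2, 3)])
# Two such links, one transmission each.
print(path_reliability([0.5, 0.5], [1, 1]))
```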

A Fair Method
The question is: knowing the end-to-end reliability that must be met on a given path, how to distribute this targeted end-to-end reliability into the targeted reliability of each link on that path? The maximum number of transmissions on each link is then deduced from the targeted reliability on the link and the probability of successful transmission on that link. In this paper, we propose two methods that can be applied to any network meeting Assumptions 1-3.
Since we adopt a per-flow approach, we consider any flow f, originating from a node h hops away from the sink. In other words, the path of f consists of h links, h ≥ 1. Since the end-to-end reliability on the path is equal to the product of the reliability of each of its links, which should be ≥ R, a simple solution consists of fairly and uniformly sharing the end-to-end reliability over each link of the path. Hence, each link j has to meet a reliability equal to $R_j = R^{1/h}$. The maximum number of transmissions on any link j, rounded up to an integer, is then equal to
$$M_j = \left\lceil \frac{\log(1 - R^{1/h})}{\log(1 - P_j)} \right\rceil.$$
This principle is applied by the MFair Algorithm given hereafter (see Algorithm 1).

Algorithm 1: MFair: compute the number of transmissions on each link j to reach a reliability $R^{1/h}$ over this link per message of flow f visiting h links.
Require: R, the targeted end-to-end reliability; $P_j$, the probability of receiving the acknowledgment after a single message transmission over link j, with 1 ≤ j ≤ h
Ensure: $M_j$ is the minimum number of transmissions on link j to achieve $R^{1/h}$
for each link j do
  if $P_j = 1$ then
    $M_j = 1$
  else
    $M_j = \left\lceil \log(1 - R^{1/h}) / \log(1 - P_j) \right\rceil$
  end if
end for

Property 3. The processing complexity of the MFair Algorithm is in O(h).

Proof. The complexity of the MFair Algorithm is equal to the complexity of computing $\left\lceil \log(1 - R^{1/h}) / \log(1 - P_j) \right\rceil$ for each link, times h, the number of links, hence the property.

Figure 1 depicts the maximum number of transmissions over a link whose probability of successful acknowledgment receipt ranges from 0.5 to 0.9 for a 4-hop flow when the targeted end-to-end reliability ranges from 0.9 to 0.9999.
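The fair method can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the per-link target is R^(1/h), the number of transmissions is the smallest integer M with 1 - (1 - P_j)^M at least equal to that target, and the helper `min_transmissions` (a hypothetical name) adjusts the logarithmic estimate by at most a few steps to absorb floating-point rounding.

```python
import math
from typing import List, Sequence


def min_transmissions(p: float, target: float) -> int:
    """Smallest M >= 1 such that 1 - (1 - p)**M >= target."""
    if p >= 1.0:
        return 1  # a perfect link needs a single transmission
    # Logarithmic estimate, then corrected to absorb floating-point rounding.
    m = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p)))
    while m > 1 and 1.0 - (1.0 - p) ** (m - 1) >= target:
        m -= 1
    while 1.0 - (1.0 - p) ** m < target:
        m += 1
    return m


def mfair(R: float, ps: Sequence[float]) -> List[int]:
    """Fair sharing: every link of the h-hop path must reach R**(1/h)."""
    if not 0.0 < R < 1.0:
        raise ValueError("R must be in (0, 1)")
    if any(not 0.0 < p <= 1.0 for p in ps):
        raise ValueError("each P_j must be in (0, 1]")
    link_target = R ** (1.0 / len(ps))
    return [min_transmissions(p, link_target) for p in ps]


# 4-hop flow, targeted end-to-end reliability 0.9999
print(mfair(0.9999, [0.5, 0.5, 0.5, 0.5]))
print(mfair(0.9999, [0.9, 0.9, 0.9, 0.9]))
```

For a 4-hop flow with P_j = 0.5 on every link and R = 0.9999, the sketch returns 16 transmissions per link, which is the MFair value used later in the text.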

An Optimal Method
For any sensor node $N_i$, let f denote the flow originating from $N_i$. The problem consists of minimizing the total number of transmissions needed to deliver any message of f to the sink, under the constraint that the end-to-end reliability is greater than or equal to R.
The optimization problem can be defined as follows:
Goal: Find the values of $M_j$ for each link j visited by f, j = 1 ··· h, that minimize $\sum_{j=1}^{h} M_j$ under the constraints:
$$\prod_{j=1}^{h} \left(1 - (1 - P_j)^{M_j}\right) \geq R, \qquad M_j \geq 1 \text{ integer for } j = 1 \cdots h.$$
If several solutions exist for the same total number of transmissions, choose the solution that favors transmissions on the links farthest from the sink (see the tie-breaking rule of Algorithm 2).

Lemma 1. Each link j = 1 ··· h should provide a reliability $R_j$ at least equal to the requested end-to-end reliability R, that is $M_j \geq \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$.

Proof. If there is a link j that provides a reliability $R_j < R$, then the product of the reliability of all other links should be greater than 1 to obtain an end-to-end reliability of R, which is impossible. Therefore, each link has to provide a reliability at least equal to R by means of retransmissions. Since the reliability of a link j increases with the maximum number of transmissions according to Equation (1), to obtain an end-to-end reliability of R, the number of transmissions $M_j$ should be greater than or equal to that needed to obtain a link reliability of R, leading to $M_j \geq \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$.

Lemma 2. There is no solution ensuring an end-to-end reliability ≥ R with a total number of transmissions strictly less than $M = \sum_{j=1}^{h} \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$.
Proof. It is deduced from Lemma 1.

Lemma 3. If there is no solution ensuring an end-to-end reliability greater than or equal to R with a total number of transmissions $M = \sum_{j=1}^{h} \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$ and if increasing $M_j$ to $M_j + 1$ for the link j maximizing the value of $P_j (1/R_j - 1)$ does not achieve R, then there is no solution ensuring an end-to-end reliability R in M + 1 transmissions.

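Proof. Assuming that there is no solution ensuring an end-to-end reliability ≥ R with a total number of transmissions $M = \sum_{j=1}^{h} \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$, we compute, for any link k, the benefit on the end-to-end reliability brought by increasing $M_k$ to $M_k + 1$. Since $1 - (1 - P_k)^{M_k + 1} = R_k + P_k (1 - R_k)$, the new end-to-end reliability can be written as:
$$\mathit{newR} = \left(\prod_{j \neq k} R_j\right) \left(R_k + P_k (1 - R_k)\right) = \left(\prod_{j=1}^{h} R_j\right) \left(1 + P_k \left(\frac{1}{R_k} - 1\right)\right).$$
Hence, maximizing newR means selecting the link k that maximizes $P_k (1/R_k - 1)$. Hence, if with that link k, the end-to-end reliability newR is not greater than or equal to R, then no other link can meet the end-to-end reliability R with a total number of transmissions equal to M + 1.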

Property 4. The MOpt algorithm given in Algorithm 2 finds the optimal solution.
Proof. The algorithm starts with the smallest possible number of transmissions M, according to Lemma 2. If the requested end-to-end reliability is met, then the solution is found. Otherwise, the algorithm increases the total number of transmissions by one and checks again whether the requested end-to-end reliability is met by adding one transmission on the link providing the highest reliability gain. If yes, the solution is found; otherwise, there is no solution for M + 1 transmissions and the same step is repeated.

Algorithm 2: MOpt: compute the number of transmissions $M_j$ on link j to achieve an end-to-end reliability ≥ R and minimize the total number of transmissions per message of flow f visiting h links.
Require: R, the targeted end-to-end reliability, 0 < R < 1; $P_j$, the probability of successful acknowledgment receipt after a single transmission over link j, with 1 ≤ j ≤ h
Ensure: $M_j$ is the minimum number of transmissions to achieve R
for each link j do
  if $P_j = 1$ then
    $M_j = 1$
  else
    $M_j = \left\lceil \log(1 - R) / \log(1 - P_j) \right\rceil$
  end if
  $R_j = 1 - (1 - P_j)^{M_j}$
end for
while $\prod_{j=1}^{h} R_j < R$ do
  for each link j do $Gain_j = P_j (1/R_j - 1)$ end for
  select the link j with the highest $Gain_j$. If several links provide the same Gain, take the farthest link j from the sink
  $M_j = M_j + 1$
  $R_j = 1 - (1 - P_j)^{M_j}$
end while

We now evaluate the complexity of this algorithm and more particularly we upper bound the number of iterations needed to find the maximum number of transmissions over each link such that the total number of transmissions per message of the flow considered is minimized.

Property 5. The MOpt Algorithm, given in Algorithm 2, finds the optimal solution in a number of iterations less than or equal to $\sum_{j=1}^{h} \left\lceil \frac{\log(h)}{-\log(1 - P_j)} \right\rceil$.

Proof. Let us consider any link j visited by the flow considered. Since $1 - R^{1/h} \geq (1 - R)/h$, we have $\log(1 - R^{1/h}) \geq \log(1 - R) - \log(h)$. We then obtain
$$M_{j,fair} - M_{j,init} \leq \left\lceil \frac{\log(h)}{-\log(1 - P_j)} \right\rceil$$
where $M_{j,fair}$ denotes the maximum number of transmissions over link j computed by the Fair algorithm, whereas $M_{j,init}$ denotes the first value tried by the optimal algorithm. Since $\sum_{j=1}^{h} M_{j,fair}$ is the maximum number of transmissions of a message to reach the sink provided by the Fair algorithm and this number is never exceeded by the Optimal algorithm, the maximum number of iterations of the Optimal algorithm is upper bounded by $\sum_{j=1}^{h} \left\lceil \frac{\log(h)}{-\log(1 - P_j)} \right\rceil$, hence the property.
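The greedy procedure described by Lemmas 1 to 3 and Property 4 can be sketched as below. This is an interpretation, not the authors' code: each link starts from the smallest number of transmissions giving a link reliability of at least R, and one transmission at a time is added on the link with the largest gain P_j (1/R_j - 1). The list `ps` is assumed to be ordered from the link closest to the sink (index 0) to the link farthest from it, so that ties are resolved in favor of the highest index. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def link_reliability(p: float, m: int) -> float:
    """Reliability of a link after at most m transmissions (Equation (1))."""
    return 1.0 - (1.0 - p) ** m


def min_transmissions(p: float, target: float) -> int:
    """Smallest M >= 1 such that the link reliability is >= target."""
    if p >= 1.0:
        return 1
    m = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p)))
    # Correct the estimate for floating-point rounding in both directions.
    while m > 1 and link_reliability(p, m - 1) >= target:
        m -= 1
    while link_reliability(p, m) < target:
        m += 1
    return m


def mopt(R: float, ps: Sequence[float]) -> List[int]:
    """Greedy search for the per-link maximum numbers of transmissions."""
    if not 0.0 < R < 1.0:
        raise ValueError("R must be in (0, 1)")
    if any(not 0.0 < p <= 1.0 for p in ps):
        raise ValueError("each P_j must be in (0, 1]")
    ms = [min_transmissions(p, R) for p in ps]           # starting point (Lemma 1)
    rs = [link_reliability(p, m) for p, m in zip(ps, ms)]
    while math.prod(rs) < R:
        gains = [p * (1.0 / r - 1.0) for p, r in zip(ps, rs)]
        # Highest gain first; on a tie, the link farthest from the sink.
        best = max(range(len(ps)), key=lambda j: (gains[j], j))
        ms[best] += 1
        rs[best] = link_reliability(ps[best], ms[best])
    return ms


print(mopt(0.9, [0.5, 0.5]))  # 2-hop flow, both links with P = 0.5
```

On this 2-hop example, the sketch returns 4 and 5 transmissions, a total of 9, whereas sharing the reliability fairly (a per-link target of 0.9^(1/2)) requires 5 transmissions on each link, a total of 10.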

Expected vs. Maximum Number of Transmissions
For any flow considered and whatever the method adopted to compute the maximum number of transmissions over each link visited by the flow considered, the real number of transmissions used is much smaller than the maximum one, which occurs with a very low probability. In this section, we want to evaluate the energy saving obtained with a variable number of transmissions instead of always considering the worst case that has a very low occurrence probability. With the Assumptions 1-3, we can evaluate E (M j ) the expected number of transmissions on any link j, knowing that M j , the maximum number of transmissions, is computed either by MFair or by MOpt. We then have:

Property 6. The expected number of transmissions on any link j is equal to
$$E(M_j) = \frac{1 - (1 - P_j)^{M_j}}{P_j}$$
where $M_j$ denotes the maximum number of transmissions used for link j.

Proof. A message is transmitted exactly k times, with $k < M_j$, with probability $P_j (1 - P_j)^{k-1}$, and exactly $M_j$ times with probability $(1 - P_j)^{M_j - 1}$. Hence, $E(M_j) = P_j \sum_{k=1}^{M_j - 1} k (1 - P_j)^{k-1} + M_j (1 - P_j)^{M_j - 1}$. Let us consider $S_n(x) = \sum_{k=1}^{n} (1 + x)^k$ and let $S'_n(x)$ be its derivative. We have $S'_n(x) = \sum_{k=1}^{n} k (1 + x)^{k-1}$. Since $S_n(x)$ is the sum of a geometric progression, we have:
$$S_n(x) = \frac{(1 + x)^{n+1} - (1 + x)}{x}.$$
Hence, the derivative is:
$$S'_n(x) = \frac{1 + (n x - 1)(1 + x)^n}{x^2}.$$
By replacing x by $-P_j$ and n by $M_j - 1$, we get $P_j S'_{M_j - 1}(-P_j) = \frac{1 - (1 + (M_j - 1) P_j)(1 - P_j)^{M_j - 1}}{P_j}$. Adding $M_j (1 - P_j)^{M_j - 1}$ gives $E(M_j) = \frac{1 - (1 - P_j)^{M_j}}{P_j}$, hence the property.

Figure 2 depicts the maximum and the expected numbers of transmissions per message of a 4-hop flow, for a targeted end-to-end reliability of 0.9999, over a link whose success probability per transmission ranges from 0.5 to 0.9. We observe that, in all the cases studied, $E(M_j)$ decreases when $P_j$, the success probability per transmission, increases, and never exceeds 2, a value almost reached for $P_j = 0.5$. By stopping its retransmissions as soon as it receives an acknowledgment, the sender saves energy. For instance, on any link j with $P_j = 0.5$ visited by a 4-hop flow, the sender saves on average $M_j - E(M_j)$ transmissions, that is 16 − 2 = 14 transmissions for MFair. Since the flow visits four hops and assuming that its four links have the same success probability, each of the four transmitting nodes saves 14 transmissions. Assuming that the schedule is periodic with a period of 101 time slots (the default value in the TSCH network), and this flow generates a message per schedule period, these four nodes save 140 ms each schedule period of 101 × 7.25 = 732.25 ms, for a slot duration of 7.25 ms. This corresponds to an increase of 19% in sleeping time per schedule period for each node.

It is useful to compare this value with the value of ETX on link j, denoted by $ETX_j$. Let $D_j$ be the delivery rate from node j to its parent, whereas $D_{r(j)}$ is the delivery rate in the reverse direction.
Since $P_j$ is defined as the probability for j to receive the acknowledgment of its message, we have to take into account the delivery rates of both directions, which gives: $P_j = D_j \times D_{r(j)}$. $ETX_j$ is defined as: $ETX_j = \frac{1}{D_j \times D_{r(j)}} = \frac{1}{P_j}$. As a consequence, $ETX_j$ does not depend on the targeted end-to-end reliability R but only on the reliability of both directions of the link considered. This explains the main difference between $ETX_j$ and $M_j$.
In addition, $ETX_j$ is not equal to $E(M_j)$. Indeed, $E(M_j)$ assumes a maximum number of transmissions equal to $M_j$, whereas $ETX_j$ assumes a number of transmissions that may be infinite.
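The difference between the expected number of transmissions and ETX can be illustrated numerically. The sketch below assumes that the sender stops at the first acknowledgment and never sends more than M_j times, so that the expected count has the closed form (1 - (1 - P_j)^M_j) / P_j, and it takes ETX_j = 1 / P_j, with P_j the product of the delivery rates in both directions. A second function evaluates the expectation directly from its definition as a cross-check. Function names are illustrative.

```python
def expected_transmissions(p: float, m_max: int) -> float:
    """Closed form of the expected number of transmissions, capped at m_max."""
    return (1.0 - (1.0 - p) ** m_max) / p


def expected_transmissions_by_sum(p: float, m_max: int) -> float:
    """Same expectation, computed directly from the distribution:
    success at attempt k < m_max, or m_max attempts whatever the outcome."""
    early = sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, m_max))
    return early + m_max * (1.0 - p) ** (m_max - 1)


def etx(p: float) -> float:
    """Expected Transmission Count with an unbounded number of attempts."""
    return 1.0 / p


p, m_max = 0.5, 16  # link with P_j = 0.5 and a maximum of 16 transmissions
print(expected_transmissions(p, m_max), etx(p))
print(m_max - expected_transmissions(p, m_max))  # average number of unused cells
```

With P_j = 0.5 and a maximum of 16 transmissions, the capped expectation stays just below the ETX value of 2, so about 14 of the 16 scheduled cells are unused on average.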

Framework for a TSCH Network
The MOpt and MFair methods are now applied to compute the maximum number of transmissions per link and per flow. Flows are generated by a low-power network based on the TSCH technology [7].
We focus on data gathering applications with end-to-end requirements in terms of reliability and latency, as well as requirements with regard to network lifetime. The network supporting these applications is a TSCH network [7].

TSCH Network
In a TSCH network, the medium access is time-slotted and several transmissions are done on different channels in the same time slot. More precisely, transmissions are scheduled in cells, where a cell is defined by its channel offset and its time slot offset. There are two types of cells: shared cells where any node having a message to transmit is allowed to do so, and dedicated cells, where only the transmitter defined in the schedule is allowed to. The choice of a wireless TSCH network helps to meet Assumptions 1-3 because the mapping between logical channels and physical ones changes at each time slot. Thus, even if a message is retransmitted in the next slot and on the same logical channel as previously, it will be transmitted on a different physical channel.
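The effect of channel hopping on a retransmission can be illustrated with the mapping commonly used in IEEE 802.15.4 TSCH, where the physical channel is looked up in a hopping sequence at index (ASN + channel offset) modulo the sequence length, ASN being the absolute slot number. The text above only states that the mapping changes at each time slot; the exact rule and the ordered 16-channel sequence below are assumptions made for this sketch (deployed networks typically use a permuted sequence).

```python
from typing import Sequence

# Illustrative hopping sequence: the 16 channels of the 2.4 GHz band, in order.
HOPPING_SEQUENCE = list(range(11, 27))


def physical_channel(asn: int, channel_offset: int,
                     hopping_sequence: Sequence[int] = HOPPING_SEQUENCE) -> int:
    """Physical channel used in slot `asn` by a cell with `channel_offset`."""
    return hopping_sequence[(asn + channel_offset) % len(hopping_sequence)]


# A first transmission and its retransmission in the next slot,
# both scheduled on the same logical channel (channel offset 3).
first = physical_channel(asn=100, channel_offset=3)
retry = physical_channel(asn=101, channel_offset=3)
print(first, retry)
```

The two attempts use the same channel offset but different physical channels, which is why consecutive transmissions on a link are less likely to suffer from the same narrow-band interference.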
The schedule of transmissions is periodic and conflicts in dedicated cells are avoided. In addition, nodes know from the schedule in which slots they are allowed to transmit or to receive. They sleep in any other slot in order to save energy.

Scheduling Function
The scheduling algorithm, which is assumed to be centralized in this paper, works per flow: it allocates the cells needed to transmit a message from the flow origin to the sink. More precisely, it proceeds hop by hop, starting from the flow origin and allocating to each visited node the number of cells needed to receive the message from its child and then the number of cells needed to transmit this message to its parent. Since the scheduler does not know a priori which message transmission will be successful, it has to take into account the worst case where a message is received by the next hop after the maximum number of transmissions for this link and this flow. Hence, for each message, the scheduler allocates to each visited link a number of cells corresponding to the maximum number of transmissions on that link for the flow in question.
However, this does not mean that each message is transmitted a maximum number of times. In fact, as soon as the sender has received the acknowledgment of any message msg, it stops retransmitting msg and may use the slot foreseen for a retransmission of msg for the transmission of another message, if it has one in its Transmit queue.
Any sensor node first transmits the message in its Transmit queue that has the highest flow priority, as the primary criterion, and the smallest timestamp within the same flow, as the secondary criterion. This assumes that messages are timestamped when they are generated by their origin node.
The Load-based scheduler is selected because of its simplicity combined with its very good performance [26]. This scheduler schedules first the flow originating from the most loaded node. The load of a node is computed as the number of cells needed to transmit its own flows, plus the number of cells needed to receive and transmit the flows originating from its descendants.
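The load definition and the resulting scheduling order can be sketched on a small routing tree. The data layout is hypothetical: `parent` maps each sensor node to its parent (the sink has no entry), and `max_trans[(origin, sender)]` is the maximum number of transmissions of the flow generated by `origin` on the link from `sender` to its parent. Breaking load ties by node name is an assumption of this sketch, and the three-node chain is an invented example, not the topology evaluated in the paper.

```python
from typing import Dict, List, Tuple


def path_to_sink(node: str, parent: Dict[str, str]) -> List[str]:
    """Nodes that transmit a message generated by `node`, origin first."""
    path = []
    while node in parent:  # the sink has no parent entry
        path.append(node)
        node = parent[node]
    return path


def node_loads(parent: Dict[str, str],
               max_trans: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Load = TX cells for own and relayed flows + RX cells for relayed flows."""
    load = {n: 0 for n in parent}
    for origin in parent:
        for sender in path_to_sink(origin, parent):
            cells = max_trans[(origin, sender)]
            load[sender] += cells        # cells needed to transmit
            receiver = parent[sender]
            if receiver in load:         # the sink is not a sensor node
                load[receiver] += cells  # cells needed to receive
    return load


def scheduling_order(load: Dict[str, int]) -> List[str]:
    """Flows are scheduled by decreasing load of their origin node."""
    return sorted(load, key=lambda n: (-load[n], n))


# Hypothetical chain C -> B -> A, where A is the sink.
parent = {"B": "A", "C": "B"}
max_trans = {("B", "B"): 2, ("C", "C"): 3, ("C", "B"): 2}
loads = node_loads(parent, max_trans)
print(loads, scheduling_order(loads))
```

In this example, node B needs 2 cells for its own flow, 3 cells to receive the flow of C and 2 cells to forward it, hence a load of 7 cells, and its flow is scheduled first.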

Computation of Key Performance Indicators
In this paper, we consider three Key Performance Indicators (KPIs) that matter for Industry 4.0 and the IoT. We now show how to compute them for the TSCH network and the scheduling function defined in Sections 4.1 and 4.2, respectively. These three KPIs are:
• The maximum end-to-end latency L is the maximum time elapsed between data generation by a sensor node and its delivery to the sink. To compute this value within the framework defined in Section 4.2, we make an additional assumption (Assumption 4). With Assumption 4, the maximum end-to-end latency [27] is obtained when the last slot assigned to the node considered has just elapsed and then only the last transmission of the message is successful. This gives:
$$\mathit{MaxLatency} = (\mathit{SlotframeSize} - 1 + \mathit{UsedSlots}) \times \mathit{SlotDuration}$$
where UsedSlots is the number of slots used by the schedule for data gathering. Hence, the smallest maximum end-to-end latency that can be achieved is obtained by an optimal schedule, which uses MinSize, the minimum number of slots for data gathering, and for a slotframe duration equal to this number of slots. This smallest maximum latency is equal to $(2 \times \mathit{MinSize} - 1) \times \mathit{SlotDuration}$.
• The end-to-end reliability R provided by the network. It is evaluated by the ratio of the total number of user-data messages delivered to the sink over the total number of user-data messages sent by the sensor nodes.
• Network lifetime T is defined as the time the first node runs out of battery. Network lifetime can be expressed as:
$$T = \min_{N} \left( \frac{\mathit{Initial\_Energy}(N)}{\mathit{Average\_Energy\_Consumption}(N)} \right) \times \mathit{SFDuration}$$
where Initial_Energy(N) denotes the initial energy of node N, Average_Energy_Consumption(N) is the average energy consumption of N per slotframe, and SFDuration is the slotframe duration.
To evaluate the network lifetime, defined as the time up to first battery depletion of the busiest node, we use the parameters whose values are given in Table 2. To summarize, the IoT network has to guarantee that at least R percent of the messages generated by sensor nodes are delivered to the sink with a latency ≤ L, whereas the network lifetime is at least equal to T.
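The latency and lifetime indicators can be computed with a few helper functions. The latency expression used here, (SlotframeSize - 1 + UsedSlots) × SlotDuration, is read off the worked values given later in the text, for example (101 − 1 + 52) × 7.25 ms; the energy figures in the usage example are invented placeholders, not the parameters of Table 2.

```python
from typing import Dict


def max_latency_s(slotframe_size: int, used_slots: int,
                  slot_duration_s: float = 7.25e-3) -> float:
    """Maximum end-to-end latency, in seconds, when only the last
    transmission succeeds and the message just missed its first slot."""
    return (slotframe_size - 1 + used_slots) * slot_duration_s


def smallest_max_latency_s(min_size: int,
                           slot_duration_s: float = 7.25e-3) -> float:
    """Case where the slotframe is exactly as long as the schedule."""
    return max_latency_s(min_size, min_size, slot_duration_s)


def network_lifetime_s(initial_energy: Dict[str, float],
                       energy_per_slotframe: Dict[str, float],
                       slotframe_duration_s: float) -> float:
    """Time until the first node depletes its battery, in seconds.
    Both energy arguments must use the same unit."""
    slotframes = min(initial_energy[n] / energy_per_slotframe[n]
                     for n in initial_energy)
    return slotframes * slotframe_duration_s


print(max_latency_s(101, 52))      # schedule of 52 slots, slotframe of 101 slots
print(smallest_max_latency_s(46))  # schedule and slotframe of 46 slots

# Hypothetical energy budget and consumption, in arbitrary units.
energy = {"B": 1000.0, "C": 1000.0}
per_slotframe = {"B": 2.0, "C": 1.0}
print(network_lifetime_s(energy, per_slotframe, 101 * 7.25e-3))
```

The lifetime is driven by the node with the highest consumption per slotframe (node B in the placeholder data), which matches the definition of lifetime as the time to first battery depletion.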

Generalization of the Theoretical Bound on the Maximum Latency
We now compute a theoretical bound on the maximum latency when TXCell f (N i ) < MaxTrans, where TXCell f (N i ) denotes the number of TX cells assigned to flow f on node N i .
Let f be the flow whose message m has the maximum end-to-end latency. Let $N_k$ be the source node of f, which is k hops away from the sink. Flow f successively visits $N_k, N_{k-1}, \cdots, N_1$ and then the sink. We adopt an additional assumption: on any visited node, m is never delayed by another flow. The worst case occurs when, on $N_k$, message m is generated just after the last slot assigned to $N_k$. Hence, $N_k$ has to wait for the next slotframe to transmit m. In addition, on any node, only the last transmission (i.e., the MaxTrans-th transmission) is received in the worst case; the previous ones are lost. According to the framework defined in Section 4.2, when any node $N_i$ receives a message of flow f in a slotframe, it has $TXCell_f(N_i)$ cells to transmit it to its parent in the current slotframe. In each slotframe, any node $N_i$ has $\sum_g TXCell_g(N_i)$ opportunities to transmit a message to its parent, where g is a flow visiting $N_i$. Hence, we get the following formula:
$$\frac{\mathit{MaxTrans} - TXCell_f(N_h)}{\sum_g TXCell_g(N_h)} \times \mathit{SlotframeSize} + \mathit{SlotUsed} \times \mathit{SlotDuration}. \qquad (9)$$
If only Depth, the routing tree depth, and MinTXCell, the minimum number of TX cells per pair (sensor node, flow), are known, the bound becomes, taking into account that each sensor node generates its own flow:
$$\frac{\mathit{MaxTrans} - \mathit{MinTXCell}}{\mathit{MinTXCell} \times (\mathit{Depth} - h + 1)} \times \mathit{SlotframeSize} + \mathit{SlotUsed} \times \mathit{SlotDuration}. \qquad (10)$$
Notice that Equations (9) and (10) generalize Equation (6), which is valid only when MinTXCell ≥ MaxTrans.

Performance Results for a Toy Example
We first consider a toy example of a wireless TSCH network comprising a sink and seven sensor nodes. Each sensor node generates an application message of 27 bytes every 10 s. The slot duration is assumed to be 7.25 ms. The routing tree is depicted in Figure 3, where node A denotes the sink. The value associated with each link j gives $P_j$, the probability of successful receipt of a single transmission over that link. We notice that links have heterogeneous qualities, ranging from 0.5 to 0.9. For each of the seven flows generated by a sensor node, we compute the maximum number of transmissions per link for any message of this flow. All the flows, except that generated by B, are multi-hop, which gives six multi-hop flows. In this example, we assume that Assumption 4 is met: MaxTrans is dynamically tuned according to the value computed by MFair or MOpt.
• The total number of transmissions per message of any given multi-hop flow obtained by MOpt is always less than or equal to that obtained by MFair. For instance, for R = 0.9 (see Table 3), we observe a gain on the total number of transmissions per message and per flow, which is equal to 1 for the 2-hop flows (i.e., flows originating from C and E), and for the 3-hop flow originating from D. This gain becomes 2 for the 4-hop flow originating from G and 3 for the 4-hop flow originating from H. To summarize the results obtained for the six multi-hop flows considered, we observe five improvements for R = 0.9, four improvements for R = 0.99, two improvements for R = 0.999, four improvements for R = 0.9999 and five improvements for R = 0.99999, leading to a total of 20 improvements over the 30 cases tested.
• Even if the total number of transmissions is the same for both methods, the distribution over the links may differ as exemplified in Table 7 Table 4, where the maximum transmission number for link HD is 9, whereas it is 8 for link CB. Since the nodes close to the sink usually have a larger load, decreasing their load improves the network performance. Notice, however, that the maximum number of transmissions on a given link depends on the flow. For instance, the maximum number of transmissions on Link B → A is equal to 11 for all flows, except the flow originating at B, where it is 10, for a targeted R = 0.99999.

Number of Transmissions and End-To-End Reliability
The number of iterations of MOpt never exceeds h + 1 in all the cases evaluated.

Load-Based Scheduling
Let us see how these transmissions are scheduled, assuming a per-flow approach and more precisely the selection of the Load-based scheduler, which schedules first the flow originating from the most loaded node. We recall that the load of a node is computed as the number of cells needed to transmit its own flows, plus the number of cells needed to receive and transmit the flows originating from its descendants. Figures 5 and 6 depict the Load-based schedule of the seven flows generated by sensor nodes with MFair and MOpt, respectively, assuming a targeted end-to-end reliability of 0.9. In both cases, the Load-based scheduler schedules the flows in the same order, starting with the most loaded node B: the scheduling order is B, C, D, E, H, F, G, although Load(B) = 52 cells with MFair and only 46 with MOpt. The two resulting schedules are optimal in terms of slots needed because node B, the most loaded node, is kept busy in all slots of both schedules. However, MFair requires exactly 52 slots to schedule the 72 transmissions (see Figure 5), whereas MOpt requires only 46 slots to schedule the 64 transmissions (see Figure 6). In this simple configuration with seven flows, MOpt saves eight transmissions, which represents an improvement of 11% in the number of transmissions to schedule. In addition, MOpt saves six slots, which reduces the number of slots used by 11.5%.
In the Load-based schedule obtained with MFair and depicted in Figure 5, we have MinSize = 52 slots. Hence, for a slot duration of 7.25 ms, the smallest maximum latency that can be achieved with MFair is equal to (52 + 51) × 7.25 × 10^−3 = 0.74675 s. The average energy consumption per slotframe of node B, the most loaded node, is equal to: (22 × TXCharge + 30 × RXCharge)/SFDuration, where SFDuration denotes the slotframe duration.
With an initial energy of 2821.5 mAh, the default slotframe size of 101 slots and a slot duration of 7.25 ms, this node will have a lifetime of 39 days and a maximum latency of (101 − 1 + 52) × 7.25 × 10^−3 = 1.102 s. To meet a lifetime of one year, the slotframe size should be greater than or equal to 933 slots, with a maximum latency of (933 − 1 + 52) × 7.25 × 10^−3 = 7.134 s.
With the Load-based schedule obtained with MOpt and depicted in Figure 6, we have MinSize = 46 slots. Hence, for a slot duration of 7.25 ms, the smallest maximum latency that can be achieved with MOpt is equal to (46 + 45) × 7.25 × 10⁻³ = 0.65975 s, which represents an improvement of 13.67%. The average energy consumption of B per slotframe becomes (20 × TXCharge + 26 × RXCharge)/SFDuration, leading to a network lifetime of 44 days for the default slotframe size of 101 slots and a maximum end-to-end latency of (101 − 1 + 46) × 7.25 × 10⁻³ = 1.0585 s. To meet a lifetime of one year, the slotframe size should be greater than or equal to 830 slots, with a maximum latency of (830 − 1 + 46) × 7.25 × 10⁻³ = 6.34375 s, which represents an improvement of 12.38% with regard to MFair. For a slotframe size of 933 slots, MOpt would provide a maximum latency of 7.0905 s, a decrease of 11.77% with regard to MFair and a network lifetime of 410 days instead of 365 for MFair, an increase of 12.40%. Table 8 points out the trade-off between the maximum end-to-end latency and network lifetime by listing the results obtained by MFair and MOpt for different slotframe sizes: 52, 101 and 933 slots. Increasing the slotframe size to extend the network lifetime, provided that the application still generates the same number of messages per slotframe, leads to an increase in maximum latency. MOpt provides a shorter maximum end-to-end latency because of a smaller schedule size (i.e., a smaller number of slots used).
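Because the average consumption of node B is expressed as a fixed charge per slotframe divided by SFDuration, the lifetime grows linearly with the slotframe size as long as the charge per slotframe stays the same. The sketch below relies on that proportionality only; it is an assumption that neglects any consumption outside the scheduled cells, and the function name is illustrative.

```python
def scaled_lifetime_days(known_days, known_slotframe, new_slotframe):
    """Extrapolate a lifetime to another slotframe size.

    Assumption: the charge drawn per slotframe is unchanged, so the
    lifetime is proportional to the slotframe duration, hence to the
    slotframe size for a fixed slot duration.
    """
    return known_days * new_slotframe / known_slotframe


if __name__ == "__main__":
    # MOpt reaches 365 days with 830 slots; extrapolate to 933 slots.
    print(round(scaled_lifetime_days(365, 830, 933)))  # 410
```

The extrapolated value matches the 410 days quoted for MOpt with a slotframe of 933 slots.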

Performance Results of a TSCH Network with 50 Nodes
For the performance evaluation of a TSCH network with 50 nodes, we use the 6TiSCH simulator [28], which has been designed for fast prototyping. In [8], network performances are evaluated on two specific applications running on the same network. In this particular configuration, randomly deployed sensors are only in charge of generating messages that they forward to a nearby relay, whereas relays are deployed according to a triangular grid. Since a more generic configuration is representative of many more applications, we focus on a generic data-gathering application running on random network topologies.

Simulation Parameters
The network topology is a random topology such that any mote has at least three neighbors (i.e., three motes with which PDR ≥ 0.5). For each wireless link, the PDR value is computed according to the Pister-Hack model [29]. The 6TiSCH protocol stack is used with RPL as the routing protocol, ETX as the routing metric and the Load-based scheduler as the scheduling function. RPL, MFair and MOpt use the PDR values computed for the wireless links considered. For each wireless link i, RPL deduces ETX(i) from PDR(i). The simulation parameters used to evaluate the KPIs are those given in Table 9. The deepest routing tree observed in the simulations is 7 hops deep and the shallowest is 4 hops deep, with a median of 5 hops. Notice that, in the simulations done with the 6TiSCH stack, MaxTrans, the maximum number of transmissions of any message on any link, is fixed, as in the standardized TSCH MAC protocol. In the 6TiSCH simulations, its value is set to 6: if the acknowledgment has not been received after six transmissions, the message is discarded. This behavior has a strong impact on the latency. Since Assumption 4 is not met, the theoretical bound for the maximum latency given in Equation (6) is no longer valid; we use instead the new theoretical bound given in Section 4.4.
A legitimate question is why a schedule provides, for a given flow, a number of transmission (TX) cells greater than MaxTrans, since a message that has not been acknowledged after MaxTrans transmissions is discarded. The justification is the decrease in both the average and the maximum end-to-end latency, as we will see in Section 6.3. This decrease is due to a greater number of opportunities to transmit.
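To see why a fixed MaxTrans can cap the achievable reliability, the following sketch computes the probability that a message survives a path when each link allows at most MaxTrans attempts. It assumes independent losses on every attempt and every link, which is a simplification. The 5-hop path with PDR = 0.5 on each link is a hypothetical worst case built from the median tree depth and the neighbor threshold given above, not a measured topology.

```python
def capped_path_reliability(pdrs, max_trans=6):
    """End-to-end delivery probability over a path.

    `pdrs` lists the packet delivery ratio of each link on the path.
    A message is dropped on a link after `max_trans` failed attempts.
    Assumption: attempts and links fail independently.
    """
    reliability = 1.0
    for pdr in pdrs:
        # Probability that at least one of max_trans attempts succeeds.
        reliability *= 1.0 - (1.0 - pdr) ** max_trans
    return reliability


if __name__ == "__main__":
    worst_case_path = [0.5] * 5  # hypothetical: 5 hops, PDR = 0.5 each
    print(capped_path_reliability(worst_case_path, max_trans=6))
    print(capped_path_reliability(worst_case_path, max_trans=20))
```

Under these assumptions, six attempts per link yield about 0.924 end to end, below a 0.999 target, whereas 20 attempts per link exceed it.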

End-To-End Delivery Rate
Simulation results for the average end-to-end delivery rate are depicted in Figure 7. As expected, the end-to-end reliability is better with MOpt than with MFair because MOpt maximizes the end-to-end reliability obtained for a minimum total number of transmissions per message.
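The difference between an equal split of the reliability target and a minimum-total allocation can be illustrated on a single path. The sketch below is not the definition of MFair or MOpt, which is given elsewhere in the paper; it uses two stand-ins that we assume for illustration: `fair_split_counts` gives every link the same per-link target R^(1/h), and `greedy_min_total_counts` adds one transmission at a time to the link where it improves the log-reliability most. Losses are assumed independent, and the PDR values are hypothetical.

```python
import math


def link_success(pdr, n):
    """Probability that at least one of n attempts succeeds on a link."""
    return 1.0 - (1.0 - pdr) ** n


def path_reliability(pdrs, counts):
    """Product of per-link success probabilities (independent losses)."""
    r = 1.0
    for pdr, n in zip(pdrs, counts):
        r *= link_success(pdr, n)
    return r


def fair_split_counts(pdrs, target):
    """Equal split: each of the h links must reach target ** (1 / h).

    Assumes 0 < PDR <= 1 on every link and target < 1.
    """
    per_link = target ** (1.0 / len(pdrs))
    counts = []
    for pdr in pdrs:
        n = 1
        while link_success(pdr, n) < per_link:
            n += 1
        counts.append(n)
    return counts


def greedy_min_total_counts(pdrs, target):
    """Greedy allocation: repeatedly add one transmission to the link
    with the largest gain in log-reliability until the target is met.

    Assumes 0 < PDR <= 1 on every link and target < 1.
    """
    counts = [1] * len(pdrs)

    def gain(i):
        return (math.log(link_success(pdrs[i], counts[i] + 1))
                - math.log(link_success(pdrs[i], counts[i])))

    while path_reliability(pdrs, counts) < target:
        best = max(range(len(pdrs)), key=gain)
        counts[best] += 1
    return counts


if __name__ == "__main__":
    pdrs = [0.9, 0.7, 0.5]  # hypothetical 3-hop path
    print(fair_split_counts(pdrs, 0.9))        # [2, 3, 5]
    print(greedy_min_total_counts(pdrs, 0.9))  # [2, 3, 4]
```

On this hypothetical path, both allocations reach the 0.9 target, but the greedy one needs 9 transmissions instead of 10, which mirrors the kind of saving reported for MOpt.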

End-To-End Latency
Simulation results for the average end-to-end latency are depicted in Figure 8. As MFair tends to compute a greater number of transmissions than MOpt, the schedule for MFair includes a greater number of cells in a slotframe whose size is kept identical for MOpt and MFair. The more cells a node has, the more chances it has to send or forward a packet. Since an application packet is generated at a random point in time in the slotframe, the more transmission (TX) cells the node has, the higher the chance of sending the packet immediately. This is why the average end-to-end latency is shorter with MFair. Figure 9 shows the percentage of messages delivered in a single slotframe. It is smaller with MOpt than with MFair due to the greater number of TX cells granted by MFair.
With MFair, 98% of messages reach the sink in one slotframe, that is, with an end-to-end latency ≤7 s. The gap between MFair and MOpt is smaller than 1%. The maximum end-to-end latency obtained by simulation with MFair and MOpt is illustrated in Figure 11. Since MFair allocates more TX cells and the slotframe size is kept identical for MFair and MOpt, MFair provides a shorter maximum end-to-end latency. For targeted reliabilities ≥0.999, MFair and MOpt provide very close maximum latencies.
Figure 12 depicts the schedule size, expressed as the number of slots used in the schedule, for MFair and MOpt. Unsurprisingly, the schedule size increases with the targeted end-to-end reliability, due to the greater number of transmissions needed on each link to reach the targeted end-to-end reliability. Whatever the targeted reliability, the schedule size is always smaller with MOpt than with MFair, as expected.
Table 10 compares the theoretical bound for the maximum end-to-end latency obtained by Equation (10) with the simulation results, for different values of the targeted end-to-end reliability and a given random topology.
The maximum end-to-end latency obtained by simulation decreases when the targeted end-to-end reliability increases: a greater number of TX cells assigned to nodes gives them more opportunities to transmit in a slotframe. With the theoretical bound, the decrease is obtained when the number of TX cells assigned to a node increases to reach the targeted end-to-end reliability. Whatever the targeted reliability, the bound given by Equation (10) and the simulation results are not close. The theoretical bound could be refined to take more information into account.
Table 10 also provides the end-to-end reliability. As expected, MOpt provides a better end-to-end reliability than MFair. However, for a targeted end-to-end reliability ≥0.999, MFair fails to achieve the requested reliability when MaxTrans is kept fixed at 6 instead of being dynamically tuned according to the values computed by MFair or MOpt.

Duty Cycle
With regard to network lifetime and energy consumption, we consider the busiest node excluding the root (sink) node, since the root is supposed to be mains-powered. This busiest node determines the network lifetime. The duty cycle on this busiest node is computed according to Equation (11). Note that Equation (11) is meant only for the following comparison; it does not represent the actual radio duty cycle, since in TSCH a device does not keep its radio on all the time, even during an active slot. Figure 13 depicts the number of cells assigned to the busiest node in one simulation run for each targeted end-to-end reliability.
Table 11 gives the duty cycle on the busiest node with MFair and MOpt. This gives an insight into the network lifetime, which is determined by the lifetime of the busiest node. As a consequence, for high targeted reliabilities (i.e., ≥0.999), MFair and MOpt give close end-to-end latencies, at a greater energy cost for MFair.

Impact of MaxTrans, a TSCH Parameter
We now study the impact of MaxTrans, the maximum number of transmissions of a message in TSCH, whose default value is 6. We set MaxTrans to 20, which is greater than the maximum number of transmissions on each link computed by MFair or MOpt. We run the same simulations as previously and evaluate the impact on the end-to-end reliability and the end-to-end latency.
As expected, the end-to-end reliability depicted in Figure 14 increases with both MOpt and MFair when the maximum number of transmissions per message in TSCH is set to 20. It is very close to 100% for all the values of the targeted reliability tested. Notice, however, that increasing MaxTrans may lead to a network overload resulting in violations of the maximum acceptable latency L. To avoid that, messages whose lifetime is greater than or equal to L should not be transmitted and should be discarded.
The consequence of increasing MaxTrans to 20 is an increase in the average latency for both MFair and MOpt, as shown in Figure 15. The shortest average latency is still provided by MFair. However, the gap decreases for high targeted end-to-end reliabilities (i.e., ≥0.999). Using a value of MaxTrans greater than the value required by MFair or MOpt on a given link may result in an increase in end-to-end latency and energy consumption for a very limited gain in end-to-end reliability.
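The discarding rule mentioned above, dropping messages whose lifetime has reached L before they are transmitted, can be sketched as a filter applied to a node's transmit queue. The `Message` type, its fields and the function name are hypothetical and only serve the illustration; they do not come from the 6TiSCH simulator.

```python
from dataclasses import dataclass


@dataclass
class Message:
    created_at_s: float  # generation time at the source, in seconds
    payload: bytes = b""


def drop_expired(queue, now_s, max_latency_s):
    """Keep only the messages that can still meet the latency bound L.

    A message whose age (now - creation time) is greater than or equal
    to `max_latency_s` is discarded instead of being transmitted.
    """
    return [m for m in queue if now_s - m.created_at_s < max_latency_s]


if __name__ == "__main__":
    queue = [Message(9.0), Message(3.0), Message(2.0)]
    kept = drop_expired(queue, now_s=10.0, max_latency_s=7.0)
    print([m.created_at_s for m in kept])  # [9.0]
```

Applying the filter just before each TX cell avoids spending cells and energy on messages that would violate L anyway.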

Comparison with Kausa
Since Kausa [8] is a well-known centralized scheduler that takes into account the unreliability of wireless links and adopts a per-flow approach to meet KPIs, we now compare MOpt to Kausa when run in a 6TiSCH network.

Our Kausa Implementation in the 6TiSCH Simulator
In the 6TiSCH simulator, as in any 6TiSCH implementation compliant with the standard, routing is done by the RPL protocol with the ETX metric. We simulated Kausa on the 6TiSCH simulator and ran it with the generic configuration defined in Section 6.1. We recall that the value of MaxTrans is fixed to 6. In these conditions, the main differences between Kausa and our approach are:
• Kausa selects first the flow requesting the highest end-to-end reliability, then the shortest end-to-end latency and finally the flow originating from the node farthest from the sink. Our approach selects the flow originating from the most loaded sensor node (i.e., the sensor node needing the largest number of Tx+Rx cells).
• For any flow f, Kausa starts by assigning cells to the most loaded node visited by f. Then, Kausa goes backward to the source of f. Finally, Kausa goes upward from the most loaded node to the sink. In our approach, cells are assigned to nodes visited by f in a cascading way from the source of f up to the sink. It follows that our approach is easier to implement.
• For any flow f, Kausa minimizes the number of retransmissions on the most loaded node, whereas we minimize the total number of retransmissions on the path of f.
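The first difference, the order in which flows are selected, can be sketched with two sort keys. The `Flow` record and its field names are hypothetical, the values are made up, and the code only reproduces the selection criteria as described above; it is not taken from the Kausa or Load-based implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str           # node generating the flow
    reliability: float    # requested end-to-end reliability
    max_latency_s: float  # requested end-to-end latency
    hops_to_sink: int     # distance of the source from the sink
    source_load: int      # Tx+Rx cells needed by the source node


def kausa_order(flows):
    """Highest reliability first, then shortest latency, then the
    flow whose source is farthest from the sink."""
    return sorted(
        flows,
        key=lambda f: (-f.reliability, f.max_latency_s, -f.hops_to_sink),
    )


def load_based_order(flows):
    """Flow originating from the most loaded node first."""
    return sorted(flows, key=lambda f: -f.source_load)


if __name__ == "__main__":
    flows = [
        Flow("A", 0.99, 5.0, 2, 40),
        Flow("B", 0.999, 7.0, 1, 30),
        Flow("C", 0.99, 5.0, 3, 20),
    ]
    print([f.source for f in kausa_order(flows)])       # ['B', 'C', 'A']
    print([f.source for f in load_based_order(flows)])  # ['A', 'B', 'C']
```

With these made-up flows, the two criteria produce different scheduling orders, which is one reason the resulting schedules can differ.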
Notice that we implement an optimization of Kausa for 6TiSCH, enabling any node to use its next Tx cell to transmit any message to its parent, as done in our approach. This is not the case in the published version of Kausa, where a node N is not allowed to use cells assigned to a flow f for another flow f′ visiting N. Figure 16 depicts the average end-to-end latency obtained by MOpt and Kausa. Both provide close values. However, MOpt provides a much smaller variance of the average end-to-end latency than Kausa, making it more predictable, even for high targeted reliabilities.

End-To-End Latency
The same conclusion applies to the maximum end-to-end latency depicted in Figure 17. The predictability of performance is a property sought by industrial applications.

End-To-End Reliability
With regard to the end-to-end PDR, we observe in Figure 18 that both MOpt and Kausa ensure a PDR greater than the targeted one when the target belongs to the interval [0.9, 0.999]. For greater values, the target is not met because MaxTrans = 6 is too small to reach the targeted end-to-end reliability. In addition, MOpt tends to provide a greater median value than Kausa.

Duty Cycle
Since the energy consumption can be deduced from the duty cycle of the busiest node, we compare the number of cells scheduled at the busiest node by Kausa and MOpt, as depicted in Figure 19. Both provide close values for a targeted end-to-end reliability less than or equal to 0.999. For a higher reliability, Kausa provides a smaller number of cells due to a number of transmissions scheduled per message less than or equal to MaxTrans, whereas MOpt may schedule a larger number, explaining this result.

Schedule Size
The schedule size, illustrated in Figure 20, is a little greater with MOpt than with Kausa. As a consequence, a node has fewer opportunities to transmit with Kausa than with MOpt, leading to a higher and less predictable end-to-end latency.

Conclusions
TSCH is a very promising technology for the IoT. It is now necessary to evaluate the performances it can provide to IoT applications. These performances are evaluated by means of three KPIs: maximum end-to-end latency L, end-to-end reliability R and network lifetime T. The IoT network has to guarantee that at least R% of messages generated by sensors are delivered to the sink with a latency less than or equal to L, while the network lifetime is at least T. In this paper, we have proposed two methods, MFair and MOpt, to achieve a targeted end-to-end reliability taking into account the unreliability of wireless links. The trade-offs between end-to-end latency, network lifetime and end-to-end reliability have been pointed out. In addition, we have shown that minimizing the total number of transmissions of a message to reach the sink with MOpt extends the network lifetime by 12% in a small network of eight nodes, assuming a Load-based schedule of flows. As expected, MOpt provides a better end-to-end reliability and a longer network lifetime than MFair. However, the average end-to-end latency provided by MFair is smaller. These results scale up to a 50-node network representative of real deployments, as shown by simulation results obtained with the 6TiSCH simulator. However, using a fixed maximum number of transmissions MaxTrans equal to the default value of the TSCH protocol, instead of using the value computed by MFair or MOpt for the link and the flow considered, may lead to a violation of the targeted end-to-end reliability: messages that have not been acknowledged after MaxTrans transmissions are discarded. Compared to Kausa, a KPI-aware, state-of-the-art scheduler, our approach is simpler to implement and ensures more predictable end-to-end performances, which is an essential property for industrial applications.
Author Contributions: Y.T. implemented the MFair and MOpt algorithms, as well as the Load-based scheduler and Kausa in the 6TiSCH simulator. He designed the performance evaluation and performed it for many random network configurations and four values of targeted end-to-end reliability. He drew the graphs corresponding to simulation results and analyzed them. P.M. designed MOpt and compared MFair and MOpt on the toy example. She also computed the theoretical values of the end-to-end latency and wrote the paper.
Funding: This research received no external funding.