A Scheme to Smooth Aggregated Traffic from Sensors with Periodic Reports

The possibility of smoothing aggregated traffic from sensors with varying reporting periods and frame sizes to be carried on an access link is investigated. A straightforward optimization would take O(p^n) time, whereas our heuristic scheme takes O(np) time, where n and p denote the number of sensors and the size of periods, respectively. Our heuristic scheme performs local optimization sensor by sensor, proceeding from the smallest to the largest period. This is based on the observation that sensors with larger periods have more choices of offsets to avoid traffic peaks than sensors with smaller periods. A MATLAB simulation shows that our scheme outperforms the known scheme by M. Grenier et al. in a similar situation (aggregating periodic traffic in a controller area network) for almost all possible permutations. The performance of our scheme is very close to that of the straightforward optimization, which compares all possible permutations. We expect that our scheme will contribute greatly to smoothing the traffic from an ever-increasing number of IoT sensors to the gateway, reducing the burden on the access link to the Internet.


Introduction
The Internet of Things (IoT) is large scale by nature. This is not only manifested by the large number of connected devices but also by the huge volume of traffic that must be accommodated [1]. With the exponential growth of IoT devices [2,3], IoT networks will have to face the growth of traffic with the increasing amount of data exchanged between IoT sensors and servers [4]. For instance, in smart cities and smart buildings, IoT sensors with periodic transmission are used on a large scale [5,6].
When periodic transmissions from a large number of wireless sensors with different periods and frame sizes are aggregated in the gateway to be carried on the access link as in Figure 1, the instant aggregated traffic can be bursty and far exceed the average of the aggregated traffic [7]. To avoid congestion during possible bursty intervals, the access link bandwidth should be much higher than that needed for non-bursty traffic with the same average. The primary motivation for our work is to make the instant aggregated traffic as close to the average as possible by adjusting the offsets for individual sensors, thus reducing the access-link bandwidth needed to avoid instant congestion. Figure 2b,c in Section 3, Problem Definition, show two different aggregation scenarios for two sensors with periods 4 and 6. The average of the aggregated traffic during 12 (least common multiple (LCM) of 4 and 6) timeslots is the same for both scenarios, whereas the maximum of the instant aggregated traffic load is 2 for (b) and 1 for (c). In Figure 2c, the offset for sensor 2 is changed from 0 to 1 (O_2 = 1). Our objective is to arrange O_i (i = 0, 1, ..., n − 1) for n sensors in such a way as to minimize the maximum of the instant number of aggregated unit traffic loads during the L timeslots, where L is the LCM of P_0, P_1, ..., P_{n−1}, as the unit loads from n sensors are aggregated.
The efficiency of resource allocation and quality of service (QoS) that IP networks provide depends critically on effective traffic management [8]. Although a few studies have been performed, they mainly focus on traffic shaping [9] or traffic policing [10] by controlling the outbound gateway [11,12].
Traffic shaping, also known as packet shaping, is a network-management technique that delays certain types of packets to optimize overall network performance [13]. For example, Bell Canada (Montreal, Canada) revealed that it throttles traffic from peer-to-peer (P2P) file-sharing applications in its broadband access networks to 256 Kbps per flow [14].
Traffic shaping is also applied to the aggregate traffic produced by multiple network flows. For instance, Comcast handles congestion in its access network by throttling users who consume a large portion of their provisioned access bandwidth over a 5-min time window [14,15].
Because these approaches focus on controlling the throughput, the gateway must monitor the traffic continuously, and this can be burdensome for the gateway. Additionally, this approach can introduce delay due to queuing, particularly deep queues [13].
Traffic policing allows us to control the maximum rate of traffic transmitted or received on an interface. Traffic policing is often configured on interfaces at the edge of a network to limit traffic in or out of the network. In most traffic-policing configurations, traffic that falls within the rate parameters is transmitted, whereas traffic that exceeds the parameters is dropped or transmitted with a different priority [16]. This approach drops excess packets (when configured), throttling Transmission Control Protocol (TCP) window sizes and reducing the overall output rate of affected traffic streams. Overly aggressive burst sizes may lead to excess packet drops and throttle the overall output rate [10].
Another method is to change the quality of the transmitted data in real time [17]. However, although suitable for voice or video data transmission, this method is not appropriate for sensor data transmission.
The solution we propose is to distribute the periodic traffic from different sensors as evenly as possible to the access link in time and thus minimize the maximum of the instant traffic load on the access link. This can be achieved by scheduling sensors with offsets [18]. Precisely, the first instance of a stream of periodic frames is released with a delay, called the offset, with regard to a reference point, which is the first time at which the sensor is ready to transmit. Subsequent frames of the streams are then sent periodically, with the first transmission as the time origin. M. Grenier et al. proposed a scheme that schedules messages with offsets in a controller area network (CAN) to enhance CAN network performance [19]. The offset of each stream is chosen such that the release of its first frame is as far as possible from the other frames already scheduled. Similarly, we assume that if IoT sensors periodically transmit frames with optimal permutation of offsets, this can minimize the maximum of the instant traffic.
Goossens [20] has shown that the problem of choosing the optimal permutation of offsets has a complexity that grows exponentially with the periods of the tasks, and there is no known optimal solution that can be used in practical cases. Thus, in [20], only a few distinct values for the periods are allowed. A straightforward optimization would take O(p^n) time, where n and p denote the number of sensors and the size of periods, respectively.
In this paper, we propose a heuristic scheme that takes O(np) time. Our heuristic scheme performs local optimization sensor by sensor, proceeding from the smallest to the largest period. This is based on the observation that sensors with larger periods have more choices of offsets to avoid traffic peaks than sensors with smaller periods. A MATLAB (R2015b, MathWorks, Natick, MA, USA) simulation shows that our scheme outperforms the known scheme by M. Grenier et al. in a similar situation (aggregating periodic traffic in a CAN) for almost all possible permutations. The performance of our scheme is very close to that of the straightforward optimization, which compares all possible permutations. We expect that our scheme will contribute greatly to smoothing the traffic from the ever-increasing number of IoT sensors to the gateway, reducing the burden on the access link to the Internet.
The rest of our paper is organized as follows: a wireless IoT sensor network model is provided in Section 2. The problem definition and proposed scheme are described in Sections 3 and 4, respectively. The performance evaluation and time complexity of the proposed scheme are provided in Sections 5 and 6, respectively, and Section 7 concludes the paper.

Wireless IoT Sensor Network Model
We model a wireless sensor network, as shown in Figure 1. A number of wireless sensors are connected to the gateway, and all traffic from the sensors is aggregated by the gateway to be carried on the access link to the Internet. Table 1 summarizes the notations and variables used in this paper.

Table 1. Notations and variables.
B[j] (j = 0, 1, ..., L − 1): One-dimensional array, the length of which is L. B[j] represents the number of unit traffic loads in timeslot j.
TEMP[j] (j = 0, 1, ..., L − 1): One-dimensional array, the length of which is L.
BLOCK[j] (j = 0, 1, ..., L − 1): One-dimensional array; the incremental traffic loads from sensor i are loaded into this array.

We assume a network in which sensors are characterized by the tuple sensor i = (O_i, P_i, S_i). We further assume that we can control all sensors' bandwidth evenly. The offset of sensor i lies in the interval [0, P_i − S_i]. Transmission is periodic; thus, every sensor i (i = 0, 1, ..., n − 1) transmits frames repeatedly at times O_i + k*P_i (k is a non-negative integer).

Problem Definition
The n sensors, which transmit frames periodically, are connected to a gateway. Each sensor i is characterized by a 3-tuple (P_i, S_i, O_i), (i = 0, 1, ..., n − 1), where P_i, S_i, and O_i denote the period, frame size, and offset from the start of the period, respectively. Figure 2a illustrates how a 3-tuple (P_i, S_i, O_i) is used. Sensor i generates a packet of size 2 (S_i = 2) at the offset of 3 (O_i = 3) from the start of each period of length 6 (P_i = 6). We quantize the traffic from sensor i to be carried on the access link in such a way that transmission of a frame of S_i unit loads occupies S_i successive timeslots, contributing one unit load to each timeslot. We further assume that we may change O_i (a non-negative integer) as long as 0 ≤ O_i ≤ P_i − S_i.
All traffic from the n sensors is aggregated by the gateway to be carried on an access link to the Internet. Figure 2b,c illustrates how unit loads from sensor 1 and sensor 2 are aggregated. Because the same pattern of aggregation is repeated every LCM of P_1 and P_2, we only show 12 timeslots, which correspond to the LCM of 4 and 6. Figure 2b shows the aggregation of unit loads from sensors 1 (P_1 = 4, S_1 = 1, O_1 = 0) and 2 (P_2 = 6, S_2 = 1, O_2 = 0). Note that two unit loads are to be carried in timeslot 0, which implies that in timeslot 0 the access link needs twice the bandwidth needed in timeslots 4, 6, or 8. The maximum of the instant aggregated traffic among the 12 (LCM of P_1 and P_2) timeslots is 2. For this aggregation scenario, we need to assign enough bandwidth for the access link to accommodate the two unit loads to avoid congestion. Now, consider another aggregation scenario described in Figure 2c, in which the offset of sensor 2 is changed to 1 (O_2 = 1). Note that the maximum of the instant aggregated traffic among the 12 timeslots is now reduced to 1, needing half the bandwidth needed in Figure 2b. This illustrates how we can reduce the bandwidth needed for an access link to carry aggregated traffic from sensors simply by coordinating their individual offsets.
Our objective is to arrange O_i (i = 0, 1, ..., n − 1) for n sensors in such a way as to minimize the maximum of the instant number of aggregated unit traffic loads during the L timeslots, where L is the LCM of P_0, P_1, ..., P_{n−1}, as the unit traffic loads from n sensors are aggregated. Because O_i has P_i − S_i + 1 possible choices, a straightforward optimization must compare all $\prod_{i=0}^{n-1}(P_i - S_i + 1)$ cases. We will show our efficient heuristic algorithm, which compares only $\sum_{i=0}^{n-1}(P_i - S_i + 1)$ cases, in Section 4.

The following is a more formal definition of our problem. Let L be the LCM of P_0, P_1, ..., P_{n−1}. If frames of sensor i with (P_i, S_i, O_i) are carried on the access link, then B[j] (j = 0, 1, ..., L − 1) is updated as follows; note that L/P_i frames of size S_i are carried during the L timeslots:

$$B[O_i + kP_i + q] \leftarrow B[O_i + kP_i + q] + 1, \qquad k = 0, 1, \ldots, \frac{L}{P_i} - 1, \quad q = 0, 1, \ldots, S_i - 1.$$

Our objective is to find the O_i (i = 0, 1, ..., n − 1) that minimize max(B) (the largest element of the array B) after this update has been performed for all n sensors.
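The two-sensor scenario of Figure 2b,c can be checked numerically. The following is a minimal sketch, not the authors' MATLAB code; the function name `aggregate` and the (P, S, O) tuple layout are illustrative assumptions.

```python
from math import lcm

def aggregate(sensors):
    """Return B, the number of unit loads per timeslot over L = LCM of all periods.

    sensors: list of (P, S, O) tuples (period, frame size, offset),
    with 0 <= O <= P - S assumed for every sensor.
    """
    L = lcm(*(p for p, _, _ in sensors))
    B = [0] * L
    for p, s, o in sensors:
        for k in range(L // p):      # L/P frames are carried in L timeslots
            for q in range(s):       # each frame occupies S successive timeslots
                B[o + k * p + q] += 1
    return B

fig_2b = aggregate([(4, 1, 0), (6, 1, 0)])  # both offsets 0
fig_2c = aggregate([(4, 1, 0), (6, 1, 1)])  # offset of sensor 2 changed to 1
print(max(fig_2b), max(fig_2c))  # 2 1
```

Both scenarios carry the same total load over the 12 timeslots; only the peak differs.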

Proposed Scheme
We use the notations in Table 1 to describe our scheme. Sensor i is characterized by a 3-tuple (P_i, S_i, O_i), (i = 0, 1, ..., n − 1), where P_i, S_i, and O_i denote the period, frame size, and offset from the start of the period, respectively. Without loss of generality, we assume that P_i ≤ P_{i+1} for i = 0, 1, ..., n − 2.

Description of the Proposed Algorithm
The following is a brief description of our scheme.
(1) Sort the n sensors in such a way that P_i ≤ P_{i+1} for i = 0, 1, ..., n − 2.
(4) For i = 0 to n − 1 (for all n sensors, in increasing order of period), find t ∈ [0, P_i − S_i] that minimizes max(TEMP) and std(TEMP) after the incremental traffic of sensor i with trial offset t is added to B.
Algorithm 1 provides a pseudocode of our scheme using the notations in Table 1.
Algorithm 1. Loop headers of the proposed scheme.
for i = 0 to n − 1 // find the best offset for sensor i, from the smallest to the largest period
for k = 0 to L/P_i − 1 // L/P_i repetitions of the period P_i
for q = 0 to S_i − 1 // frame of size S_i
The offset is determined by the find_best_offset subroutine. First, the array ADD_ONE[j] is declared and all elements are initialized to 0 (j = 0, 1, ..., L − 1). TEMP[j] and ADD_ONE[j] are used to try each offset. ADD_ONE[j] represents the incremental traffic with each trial offset, and TEMP[j] represents the incremented traffic with the trial offset (TEMP[j] = B[j] + ADD_ONE[j], j = 0, 1, ..., L − 1). In other words, ADD_ONE[j] is circularly shifted right with increasing t (t ∈ Z and 0 ≤ t ≤ P_i − S_i). The t that minimizes the maximum value and the standard deviation of TEMP[j] (j = 0, 1, ..., L − 1) is returned as O_i.
The primary goal of the proposed algorithm is to minimize the maximum value of B[j]. However, if our search for the offset stops at the first minimum of max(TEMP), some gaps may not be filled, causing the final max(B) to increase in the next or a later round. Consider Figure 3, in which P_i = 10 and S_i = 4. The minimum of max(TEMP) is 2 for O_i = 0, 1, 2, 3, 4, 5, and 6. If we choose a number other than 6 for O_i, we will end up with a gap in timeslot 9 and/or 8. Based on this reasoning, we choose an offset that minimizes not only max(TEMP) but also std(TEMP). Our algorithm uses max(TEMP) and std(TEMP) together (as shown in the find_best_offset subroutine in our pseudocode) to fill the possible gaps in timeslots P_i − (P_i mod S_i), P_i − (P_i mod S_i) + 1, ..., P_i − 1 as well as possible.
The standard deviation std(TEMP) is calculated by Equation (1). Figure 4 illustrates our point. The solid line in Figure 4 shows the variation of unit traffic loads per timeslot when only max(TEMP) is used, whereas the dotted line shows the variation when max(TEMP) and std(TEMP) are used together. Note that the variation of the dotted line is far smoother than that of the solid line. The final max(B) for the dotted line is 5, whereas the final max(B) for the solid line is 6.
However, std(TEMP) alone is not enough to choose the best offset to minimize the final max(B). Figure 5 illustrates this point. For an offset of 0, max(TEMP) is 4 and std(TEMP) is 1.38, whereas for an offset of 2, max(TEMP) is 5 and std(TEMP) is 1.26. A lower std(TEMP) does not mean a lower max(TEMP). If we chose the offset with the smallest std(TEMP), we would choose an offset of 2, which results in a max(TEMP) of 5 (bottom of the right side in Figure 5). However, max(TEMP) for an offset of 0 is 4, which is lower than 5 (top of the right side in Figure 5).
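As a reading aid, the standard deviation over the L timeslots can be written as below. This is a sketch that assumes the population form normalized by L; the normalization used in the original simulation is not stated here, and MATLAB's default std normalizes by L − 1. Because L is the same for every trial offset, either form ranks the candidate offsets identically.

```latex
\overline{\mathrm{TEMP}} = \frac{1}{L}\sum_{j=0}^{L-1}\mathrm{TEMP}[j],
\qquad
\mathrm{std}(\mathrm{TEMP}) = \sqrt{\frac{1}{L}\sum_{j=0}^{L-1}\left(\mathrm{TEMP}[j]-\overline{\mathrm{TEMP}}\right)^{2}}
```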

Based on this observation, we compare max(TEMP) and std(TEMP) when choosing the best offset. After the offset is determined, we update array B with the incremental traffic loads from sensor i. The subroutine ADD_BLOCK creates a one-dimensional array BLOCK[j], j = 0 to L − 1, which is initialized to all 0s. The incremental traffic loads from sensor i are loaded into BLOCK[j], j = 0 to L − 1, as follows: for k = 0 to L/P_i − 1 and for q = 0 to S_i − 1, BLOCK[O_i + k*P_i + q] is set to 1. Then, the summation B[j] = B[j] + BLOCK[j] is performed to update B with the traffic from sensor i.
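The proposed scheme (sorting by period, trying every offset, and updating B) can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation. It assumes that max(TEMP) and std(TEMP) are combined lexicographically (smallest max first, smallest std among ties) and that the population standard deviation is used; the names `add_block`, `find_best_offset`, and `smooth` are chosen here for illustration.

```python
from math import lcm
from statistics import pstdev

def add_block(B, P, S, O):
    """Return a copy of B with the unit loads of one sensor (P, S, O) added."""
    out = list(B)
    for k in range(len(B) // P):   # L/P frames within the L timeslots
        for q in range(S):         # each frame occupies S successive timeslots
            out[O + k * P + q] += 1
    return out

def find_best_offset(B, P, S):
    """Try every t in [0, P - S]; keep the t with the smallest (max, std) pair."""
    best_key, best_t = None, 0
    for t in range(P - S + 1):
        temp = add_block(B, P, S, t)           # TEMP = B + incremental traffic
        key = (max(temp), pstdev(temp))        # assumed lexicographic rule
        if best_key is None or key < best_key:
            best_key, best_t = key, t
    return best_t

def smooth(sensors):
    """sensors: list of (P, S). Returns ([(P, S, O), ...], B)."""
    ordered = sorted(sensors, key=lambda s: s[0])  # increasing period
    L = lcm(*(p for p, _ in ordered))
    B = [0] * L
    assigned = []
    for P, S in ordered:
        O = find_best_offset(B, P, S)
        B = add_block(B, P, S, O)                  # commit the chosen offset
        assigned.append((P, S, O))
    return assigned, B

assigned, B = smooth([(6, 1), (4, 1)])
print(assigned, max(B))
```

For the two sensors of Figure 2, this sketch avoids the collision in timeslot 0 and reaches a peak load of 1. Each call to `find_best_offset` scans all L timeslots for every trial offset, so the sketch favors clarity over speed.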

Comparison against a Previous Work by M. Grenier et al.
Assigning offsets for "traffic shaping" is a problem that has been addressed in [20,21] concerning the preemptive scheduling of tasks. M. Grenier et al.'s [19] work is the closest to our scheme to the best of our knowledge. It adjusts offsets for messages in such a way that spreads the messages over time as much as possible on the CAN (a shared bus for the transmission of messages in a car) to minimize worst case response time (WCRT).
Automotive message sets have certain specific characteristics (a small number of different periods, etc.) shared by the periodic frames from IoT sensors. However, the aim of our scheme is to minimize the demand for instant bandwidth, reducing the burden on the access link that connects the gateway (collecting traffic from many periodic frames from IoT sensors) to the Internet.
Here, we implement M. Grenier et al.'s scheme for comparison against our scheme. A brief description of their scheme is provided below for convenience. We assume that the streams are sorted by increasing value of their period, i.e., k < h implies T_k ≤ T_h. The algorithm iteratively sets the offsets of the streams from f_1 to f_n. Let us consider that the stream under analysis is f_k.
1. Set the offset for f_k to maximize the distance between its first release f_{k,1} and the releases right before and right after f_{k,1}. Concretely,
(a) Look for the smallest load in the interval [0, T_k];
(b) Look for one of the longest least-loaded intervals in [0, T_k], for which ties are broken arbitrarily. The first (resp. last) possible release time of the interval is noted by B_k (resp. E_k);
(c) Set the offset O_k in the middle of the selected interval; the corresponding possible release time is r_k;
(d) Update the release array R to store the frames of f_k released in the interval [0, T_max], for all i ∈ N such that r_k + i·T_k/g ≤ T_max/g (g: granularity of offsets).

An Illustrative Example of the Proposed Algorithm
We provide an illustrative example of our scheme in Figure 6. In this example, we assume that the current traffic load is represented as an array B[j] (j = 0, 1, ..., 7) in Figure 6. A block represents a unit traffic load. For example, B[0], B[1], and B[2] have three, two, and three units of traffic loads, respectively. We show the process of determining the offset for a sensor with P_i = 8 and S_i = 3, as in Figure 7.

Performance Evaluation
We present the simulation results of the proposed scheme. Furthermore, we compare the simulation results against M. Grenier et al. and a base implementation of random offset assignment. Thus, we implement three different simulations: (1) random offset (base implementation); (2) M. Grenier et al.; and (3) the proposed scheme.
When sensors periodically transmit frames with random offsets (the base implementation), max(B) may differ between individual instances of the simulation. We performed 100 iterations to obtain the confidence interval and the average for max(B). The confidence interval is obtained by using the mean of max(B), denoted by µ, and its standard deviation, denoted by σ. Let B_r denote the array B in iteration r, r = 0, 1, ..., R − 1, where R is the number of iterations (R = 100 in our simulation). The values of µ and σ are obtained as in Equations (2) and (3), and the confidence interval with a 95% confidence level is obtained using the normal distribution, as in Equation (4) [22]. The simulation is performed with varying values of n (number of sensors), P_i (transmission period of sensors), and S_i (size of frames for sensors).
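The random-offset baseline and its confidence interval can be sketched as follows. This sketch makes several assumptions that are not confirmed by the text above: the sample standard deviation is used for σ, the interval is taken as µ ± 1.96·σ/√R, and the sensor mix is hypothetical. It is not a reproduction of Equations (2) to (4).

```python
import random
from math import lcm, sqrt
from statistics import mean, stdev

def max_load_random(sensors, rng):
    """max(B) for one iteration with uniformly random offsets in [0, P - S]."""
    L = lcm(*(p for p, _ in sensors))
    B = [0] * L
    for P, S in sensors:
        O = rng.randint(0, P - S)          # inclusive on both ends
        for k in range(L // P):
            for q in range(S):
                B[O + k * P + q] += 1
    return max(B)

def confidence_interval(samples, z=1.96):
    """Mean and normal-approximation interval (assumed form: mu +/- z*sigma/sqrt(R))."""
    mu = mean(samples)
    sigma = stdev(samples)                 # sample standard deviation (assumption)
    half = z * sigma / sqrt(len(samples))
    return mu, (mu - half, mu + half)

rng = random.Random(0)                     # fixed seed for repeatability
sensors = [(4, 1), (6, 1), (12, 2)] * 5    # hypothetical mix of 15 sensors
samples = [max_load_random(sensors, rng) for _ in range(100)]  # R = 100
mu, (lo, hi) = confidence_interval(samples)
print(mu, lo, hi)
```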
[n: variable, P_i and S_i: fixed] We compare the performances of three schemes: random offset (base), M. Grenier et al., and the proposed scheme. In the case of random offsets, which exhibit different results on each iteration, the mean and confidence intervals for max(B) are shown. Figure 8 shows the performance of the three schemes, with max(B) for the random offsets (base implementation) set to 100%. Our scheme results in the minimum max(B), followed by M. Grenier et al. and then the random offset (33%, 66%, and 100% for n = 10; 56%, 65%, and 100% for n = 300). As the number of sensors increases, the relative differences tend to diminish due to statistical multiplexing. Our scheme outperforms the other schemes in that the minimum max(B) implies the lowest burden on the access link.
[n, P_i: fixed, S_i: variable] The simulation is performed with the frame size increasing from 1 to 10. Figure 9 compares the three schemes, with the proposed scheme exhibiting the minimum. When the frame size is as small as 1 or 2, our scheme shows a similar or smaller max(B) compared with M. Grenier et al. because, for small frames, trying all the possible offsets within the period to find the best timeslot to fit the frame does not help very much in reducing max(B). However, as the frame size increases, the proposed scheme outperforms M. Grenier et al. by a growing margin. This is more evident for greater periods (the difference is greater for Figure 9b).
[n: fixed, P_i, S_i: variable] Figure 10 compares the three schemes with each third of the sensors having a different P_i, with the frame size increasing from 1 to 10. It is interesting to note that M. Grenier et al. performs better than the random offset (base scheme) with smaller frames; however, it performs worse than the random offset scheme as the frame size increases. This becomes more evident with n increasing from 30 to 300. This implies that M. Grenier et al. is only applicable for small-sized frames. The proposed scheme exhibits a stable gain.
Figure 11 compares the three schemes with each fifth of the sensors having a different P_i, with frame sizes increasing from 1 to 10. We find a similar tendency as in Figure 10.
[n: fixed, P_i, S_i: variable (P_i and S_i change together as a pair)] Figure 12 shows that the proposed scheme exhibits a robust gain against both schemes, whereas the performance gain of the random offset or M. Grenier et al. against each other depends on the mixture of periods and frame sizes.
Based on the above simulation, we conclude that our scheme is very robust in smoothing traffic on the access link, which carries aggregated traffic from sensors with periodic transmission (possibly with different periods and/or frame sizes, or various mixtures of them).

Time Complexity of the Proposed Scheme
We consider the time complexity of determining the optimal permutation of offsets for all of the sensors. Sensor i has $P_i - S_i + 1$ choices of offsets because the offset can be chosen from $0, 1, 2, \ldots, P_i - S_i$. If we chose any offset greater than $P_i - S_i$, say $P_i - S_i + x$ with $x > 0$, the frame of size $S_i$ would have to be transmitted beyond the current period of size $P_i$ (since $P_i - S_i + x + S_i > P_i$ for $x > 0$). Here, we assume that a frame of size $S_i$ consists of $S_i$ unit traffic loads and takes $S_i$ timeslots. Because the offset can be chosen independently for each sensor, we have $\prod_{i=0}^{n-1} (P_i - S_i + 1)$ permutations. Straightforward optimization would evaluate all of these permutations to obtain the best performance in smoothing the aggregated traffic. Its time complexity can be represented by $O(p^n)$, where n and p denote the number of sensors and the size of the periods, respectively.
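As a minimal sketch of the straightforward optimization described above (not the authors' MATLAB code), the following Python enumerates every offset tuple and keeps the one with the lowest peak load over L timeslots. The names `add_load` and `brute_force` are hypothetical, `math.lcm` with several arguments assumes Python 3.9 or later, and the sample input (periods 4 and 6 with unit frames) is the two-sensor example from the Introduction.

```python
from itertools import product
from math import lcm, prod


def add_load(load, period, size, offset):
    """Add one sensor's unit loads to the per-timeslot load list (in place)."""
    for start in range(offset, len(load), period):
        for k in range(size):
            load[start + k] += 1


def brute_force(periods, sizes):
    """Try every offset tuple; return (best peak, best offsets, tuples tried)."""
    L = lcm(*periods)  # the aggregated pattern repeats every L timeslots
    # Sensor i may use offsets 0 .. P_i - S_i, i.e. P_i - S_i + 1 choices.
    ranges = [range(p - s + 1) for p, s in zip(periods, sizes)]
    best_peak, best_offsets = None, None
    for offsets in product(*ranges):
        load = [0] * L
        for p, s, o in zip(periods, sizes, offsets):
            add_load(load, p, s, o)
        peak = max(load)
        if best_peak is None or peak < best_peak:
            best_peak, best_offsets = peak, offsets
    return best_peak, best_offsets, prod(len(r) for r in ranges)


print(brute_force([4, 6], [1, 1]))  # (1, (0, 1), 24)
```

For this small input the search visits 4 × 6 = 24 offset tuples; the number of tuples is the product of the per-sensor choices, which is why the approach becomes impractical as n grows.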
In contrast, our heuristic scheme evaluates only $\sum_{i=0}^{n-1} (P_i - S_i + 1)$ offset candidates. The complexity is reduced because we do not evaluate all $\prod_{i=0}^{n-1} (P_i - S_i + 1)$ possible permutations. Instead, we optimize sensor by sensor, starting with the sensor with the smallest period. We evaluate $P_0 - S_0 + 1$ choices of offsets for sensor 0. For each possible offset, we evaluate the incremental unit traffic loads contributed by sensor 0 and choose the offset $O_0$ that best smooths the resulting traffic (i.e., minimizes the maximum of the instant number of aggregated unit traffic loads during the L timeslots, where L is the LCM of $P_0, P_1, \ldots, P_{n-1}$, as the unit loads from n sensors are aggregated). This is repeated for sensors 1, 2, . . . , in sequence. Because each sensor i needs $P_i - S_i + 1$ evaluations, our scheme performs $\sum_{i=0}^{n-1} (P_i - S_i + 1)$ evaluations in total, which can be represented by the time complexity of O(np).
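The sensor-by-sensor procedure above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the name `greedy_offsets` is hypothetical, ties in the peak load are broken by the sum of squared loads (which ranks candidates the same way as the standard deviation, since every candidate offset adds the same total load), and remaining ties go to the smallest offset.

```python
from math import lcm


def greedy_offsets(periods, sizes):
    """Pick offsets one sensor at a time, in ascending order of period.

    Returns (offsets in input order, final peak load, offsets evaluated).
    """
    n = len(periods)
    L = lcm(*periods)  # requires Python 3.9+ for multi-argument lcm
    load = [0] * L     # aggregated unit loads per timeslot so far
    offsets = [0] * n
    evaluations = 0
    for i in sorted(range(n), key=lambda j: periods[j]):
        p, s = periods[i], sizes[i]
        best_key, best_load = None, None
        for o in range(p - s + 1):  # P_i - S_i + 1 candidate offsets
            trial = load.copy()
            for start in range(o, L, p):
                for k in range(s):
                    trial[start + k] += 1
            # Primary goal: lowest peak. Tie-break (assumption): lowest
            # sum of squares, equivalent to lowest standard deviation here.
            key = (max(trial), sum(x * x for x in trial))
            evaluations += 1
            if best_key is None or key < best_key:
                best_key, best_load, offsets[i] = key, trial, o
        load = best_load
    return offsets, max(load), evaluations


print(greedy_offsets([4, 6], [1, 1]))  # ([0, 1], 1, 10)
```

For periods 4 and 6 with unit frames, the sketch evaluates 4 + 6 = 10 candidate offsets and reaches a peak of 1, matching the offset of 1 for the second sensor shown in the two-sensor example of the Introduction.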
Our heuristic scheme greatly saves computation time. For example, if there are five sensors with $P_1 = 15$, $P_2 = 25$, $P_3 = 25$, $P_4 = 40$, $P_5 = 70$ and $S_1 = 2$, $S_2 = 3$,
The difference in max(B) between our scheme and the straightforward optimization was zero in the above particular simulation. We expect that the difference would remain zero or be very small in most cases. Our scheme determines offsets sensor by sensor, from the smallest period to the largest. In each iteration, we examine all possible offsets of the current sensor to find the one that minimizes the current max(B) and std(B). Note that sensors with smaller periods have fewer choices of offsets, whereas sensors with larger periods have more choices and are therefore better able to avoid traffic peaks. We did not consider many cases of the straightforward optimization because even a few instances would take a formidable amount of computation time.
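The two complexity figures can be related to the evaluation counts by a short bound. The derivation below assumes that p denotes the largest period, $p = \max_i P_i$, and that every frame occupies at least one timeslot ($S_i \ge 1$); both are our reading of "size of periods" rather than definitions stated in the text. It counts evaluated offset combinations only, not the cost of each evaluation.

$$
1 \le S_i \le P_i \le p \;\Longrightarrow\; 1 \le P_i - S_i + 1 \le p
$$

$$
\prod_{i=0}^{n-1} (P_i - S_i + 1) \le p^n, \qquad \sum_{i=0}^{n-1} (P_i - S_i + 1) \le np
$$

For the five periods quoted above, replacing each factor by its period gives the upper bounds $15 \cdot 25 \cdot 25 \cdot 40 \cdot 70 = 26{,}250{,}000$ for the straightforward optimization and $15 + 25 + 25 + 40 + 70 = 175$ for the heuristic scheme; the exact counts depend on the frame sizes and are somewhat smaller.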

Conclusions
We have investigated the possibility of smoothing aggregated traffic from sensors with varying reporting periods and frame sizes via a gateway to be carried on an access link by adjusting the offsets of the periodic transmission from individual sensors. A straightforward optimization would consider all possible permutations of offset values, i.e., $\prod_{i=0}^{n-1} (P_i - S_i + 1)$ permutations, where $P_i$ and $S_i$ denote the period and frame size of sensor i, respectively. Its time complexity can be represented by $O(p^n)$, where n and p denote the number of sensors and the size of the periods, respectively.
Our heuristic scheme evaluates only $\sum_{i=0}^{n-1} (P_i - S_i + 1)$ offset candidates. We perform local optimization sensor by sensor in ascending order of periods, from sensor 0 with the smallest period to sensor n − 1 with the largest period. We evaluate $P_i - S_i + 1$ choices of offsets for sensor i. For each choice, we evaluate the incremental unit traffic loads contributed by sensor i and choose the offset $O_i$ that best smooths the resulting traffic (i.e., minimizes the maximum of the instant number of aggregated unit traffic loads during L timeslots, where L is the LCM of $P_0, P_1, \ldots, P_{n-1}$, as the unit loads from n sensors are aggregated). This is performed for i = 0 to n − 1 in sequence, which gives a time complexity of O(np). The scheme of M. Grenier et al. is the closest to ours. It adjusts the offsets of messages so as to spread messages of different periods over time as much as possible on the CAN to minimize the WCRT. It is similar to our scheme in that it performs local optimization in increasing order of periods.
The difference from our scheme is that it looks for the longest least-loaded interval and sets the offset to the middle of the selected interval. Its time complexity can be represented by the same O(np) as ours because finding the longest least-loaded interval takes O(p) time for each sensor.
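To illustrate why this step takes O(p) time, the sketch below finds the middle of the longest run of least-loaded timeslots in a single pass. It is a reading of the one-sentence description above, not the published algorithm of M. Grenier et al.: the name `middle_of_longest_least_loaded` is hypothetical, the window is treated as non-circular, frame sizes are ignored, and the first longest run wins ties.

```python
def middle_of_longest_least_loaded(window):
    """Return the middle index of the longest run of minimum-load slots.

    `window` holds the current load of each candidate timeslot.
    Runs are not wrapped around the end of the window (assumption).
    """
    low = min(window)  # one O(p) pass to find the least load
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for t, x in enumerate(window):  # second O(p) pass to find the runs
        if x == low:
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start + (best_len - 1) // 2


print(middle_of_longest_least_loaded([1, 0, 0, 0, 1, 0]))  # 2
```

Unlike an exhaustive check of every offset, this rule inspects only the current load profile and never evaluates the peak that would result from each candidate.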
The advantage of our scheme over the scheme of M. Grenier et al. lies in the performance of the traffic smoothing (i.e., the maximum of the instant number of aggregated unit traffic loads). The maximum in our scheme is as low as half of that in their scheme, depending on the number of sensors, periods, and frame sizes, but never higher because our scheme also includes their scheme in its evaluation. An extensive MATLAB simulation shows that our scheme outperforms the scheme of M. Grenier et al. for almost all combinations of the number of sensors, periods, and frame sizes. In particular, the performance gap tends to grow as the frame sizes increase.
The performance of our scheme is very close to that of the straightforward optimization, which compares all possible permutations. In our scheme, the computational overhead is greatly reduced from exponential $O(p^n)$ time to linear O(np) time. We expect that our scheme would greatly contribute to smoothing the traffic from the ever-increasing number of IoT sensors to the gateway, reducing the burden on the access link to the Internet.
The proposed scheme is naturally heuristic because it does not consider all possible permutations of offsets for all of the sensors. However, it has been shown to be very efficient in smoothing traffic on the access link. The local optimization of each sensor's traffic proceeds from the sensors with the smallest periods to those with the largest periods. The time complexity of our scheme is greatly reduced compared to brute-force optimization over all possible permutations.