In order to show the approximation quality of CPLR, we compared it to two other piecewise linear approximation techniques from the literature. We decided to use the emSWAB implementation from [
5], as it was specifically designed for resource constrained architectures and we implemented the SW algorithm as well. Since the emSWAB implementation is made for 8 bit data, we implemented our CPLR algorithm and the SW approximation for 8 bit data as well to stay comparable, while internally using single precision floating point data types for the variables like slope, error, and means. This is a fair comparison as all implementations were tested on an x86-64 system, from which the emSWAB implementation benefits in the same way as our implementations, as emSWAB does not save intermediate results for slope and error calculation in 8-bit variables. The algorithms for our experiments are implemented in C. Our implementations can be obtained by mailing the paper’s first or last author. For the following experiments, we used emSWAB with a buffer size of 50 and SW with a buffer size of 100. This is a fair comparison, as the emSWAB algorithm actually scales the buffer size between half and double of the specified value. This means, in the following experiments, the emSWAB’s maximum buffer size is 100 as well.
  5.1. Data Sets
As the approximation quality of the algorithms depends on the actual data, we used different data sets for the evaluation. In our first test, we recorded accelerometer and gyroscope data with a wireless sensor node in two scenarios. In the 
Kitchen scenario, the sensor was attached to the wrist of a user, while the user performed kitchen tasks like cutting carrots with a knife, stirring a bowl of ingredients with a wooden spoon, blending ingredients in a bowl with a hand-held blender and using a hand mixer. For both data (accelerometer and gyroscope), we used the vector lengths of the three-dimensional signals, in order to derive one-dimensional signals. An example of the accelerometer signal is partially shown in 
Figure 3.
In a second scenario, we recorded accelerometer and gyroscope signals of a 
Walking person. The sensor was attached to the shoe of the user and sampled the data at 100 Hz. The walking dataset includes straight paths, turns, walking upstairs and downstairs and also different walking speeds. An example of the accelerometer signal is partially shown in 
Figure 3 as well.
As in the literature PLA algorithms are not only used for activity recognition scenarios, we also included experiments on available data sets from literature, which were used for the evaluation of PLA algorithms in other application domains. We used the timeseries data set (
Timeseries) that came with the implementation of mSWAB from [
4] as well as the data sets from [
14] including ECG signals (
ECG), valve time series of a Marotta Space Shuttle (
Shuttle), and the time series of a patient’s respiration measured by the thorax extension (
Respiration), which are freely available at [
15]. These datasets are shown in 
Figure 3 as well.
  5.2. Approximation Quallity
We found two criteria that are the most important to describe the approximation quality of a PLA algorithm: the compression ratio and the resulting approximation error per sample, which is referred to as average residual error in the following. Note that the average residual error needs to be distinguished from the segment error. The latter is part of the PLA algorithms for calculating the absolute residual error of a segment ( for CPLR), in order to guarantee a maximum error per segment, while the average residual error is used for evaluation purposes: when a data set is approximated with one of the PLA algorithms, the average sum of squares error per sample of the approximated signal to its original signal is calculated, in order to evaluate the overall approximation error of that approximation. The average residual error of an approximated signal is its  to the original signal, divided by the number of samples of that data set.
At different compression ratios, the average residual error differs accordingly. Although we cannot assume monotonicity, as a general trend, the more the average residual error increases, the more the data is compressed. In general, we want a small average residual error, while simultaneously achieving a high compression ratio or a low inverse compression ratio (ICR), respectively. Thus, these two criteria are conflicting.
At a certain threshold, a PLA algorithm produces an approximation with an average residual error and an ICR. We refer to this pair of error and ICR as an operating point. While CPLR uses the same segment error metric like SW (i.e., ), the authors of emSWAB decided to implement their segment error metric as the sum of absolute distances, in order to avoid computationally costly square and square-root functions. However, this causes a different behavior on the threshold. While CPLR and SW reach an operating point at a certain threshold, emSWAB leads to a higher approximation error, but also higher compression ratio at the same threshold, thus another operating point. The interesting question is whether this operating point is also met by SW and CPLR at a higher threshold, or whether their approximation error is smaller or higher at the same compression ratio. Thus, it is not sufficient to compare two algorithms with the same threshold in order to compare the approximation quality. Both algorithms might lead to the same operating point at different thresholds. Furthermore, as the approximation of a time series for different thresholds is a discrete problem, it is not always possible to find a threshold, which leads to the exact same ICR or average residual error, respectively, of two PLA algorithms. In order to compare different PLA algorithms, we approximated the same time series multiple times with different thresholds and plotted the the resulting average residual error over the corresponding ICR (amount of compressed data to the amount of uncompressed data) for each operating point controlled by the threshold. This leads to curves sketching the dependency between both quality indicators of the PLA algorithms. Note that we used the ICR instead of the compression ratio, as it gives an intuition about the resulting relative data size after approximating a sensor signal. The nearer the curve is to the origin of the plot, the better is the approximation quality of the algorithm.
In our experiments, we evaluated 29 data sets with in total 536,175 samples, all of them approximated 10,000 times with emSWAB from threshold 1 to 10,000 and 100,000 times with both SW and CPLR from threshold 1 to 100,000. The range of thresholds at which emSWAB was performed in our experiments is smaller than for CPLR and SW, as the error metric of emSWAB is a sum of absolute distances. This is explained in the following. At a certain threshold, the emSWAB algorithm might lead to an operating point that can be achieved with CPLR and SW at a much higher threshold, since the sum of squares error in CPLR and SW reaches the threshold faster than the sum of absolute distances used in emSWAB. Thus, the range of thresholds at which CPLR and SW were evaluated in our experiments needed to be larger as for emSWAB in order to reach the same range of operating points for comparisons. Note, that the segment error is an absolute error for a whole segment while the average residual error is a per-sample error. Thus, the segment errors and correspondingly the thresholds are higher then the resulting average residual error in the plots.
As can be seen in 
Figure 4, the general trend is a higher average residual error for lower ICRs. Comparing the quality plots 
Kitchen, 
Walking, and 
Timeseries with 
ECG, the ICR of the latter is smaller at similar average residual errors. This means for the 
ECG data set a higher compression rate for comparable approximation errors can be achieved. This mainly results from the signal structure of the 
ECG signal, which contains a higher amount of non-fluctuating signal parts compared to the other data sets.
For the respiration data set, the compression ability appears to be even higher. Note that, due to the high compression of this data set, we plotted the quality curve of the Respiration data at a smaller ICR range, in order to distinguish the curves. However, by having a closer look at the signal, the high compression results out of an over quantization of the signal when converting it to 8 bit because of the signal peak, which can be seen in 
Figure 3, representing the respiration of the person while waking up. The over quantization leads to a signal mostly consisting of perfect straight parts, which can be seen in 
Figure 5. Thus, the possible compression rate for this data set is much higher. Since for the Respiration data set a significant compression is possible, the bound on the possible achievable compression of SW due to its maximum buffer size can be observed in 
Figure 4. For SW, the ICR is bounded to 0.01 as a minimum, as the maximum buffer size is set to 100, while CPLR allows for unbounded segment sizes and thus a higher compression or a lower ICR, respectively.
Note that the curves in 
Figure 4 represent operating points of different integer threshold values starting from 1. Due to the nature of the error metrics implemented in the algorithms, the curves start at different operating points for a threshold of 1. By using floating point thresholds, the operating points below 1 could be reached as well, which are expected to follow the trend towards a lower error and a higher ICR as well. However, this is not further investigated in our experiments, as the operating points of interest with lower ICRs can already be seen in the conducted experiments. Furthermore, we did not include 
 in 
Figure 4, as it might cause misleading assumptions. In fact, CPLR as well as SW will recreate the original signal at 
, while compressing perfect straight parts of the signal that can be approximated by a segment without any approximation error. However, the emSWAB implementation from [
5] does not behave in the same way and does not compress the signal at 
. This is basically caused by the implementation of emSWAB, which merges segments when the residual error is below 
 instead below or equal 
, which is too conservative. However, an emSWAB implementation with the same behavior as our CPLR and SW implementations is possible, but was not available for our studies. Furthermore, the ICR at 
 is approx. 32% with CPLR and SW for the 
Respiration data set shown in 
Figure 3. This is caused by the coarse quantization of the signal. In activity recognition scenarios with MEMS (Micro-Electro-Mechanical Systems) accelerometers and gyroscopes, the signals typically use 16 bit fixed-point data or floating point data and include significant noise on the least significant bits. In such cases, we do not expect a significant compression at 
, and thus excluded 
 from our evaluation in 
Figure 4.
In this experiment, we compared the approximation quality of our CPLR algorithm with SW and emSWAB. In general, our evaluation shows no significant differences in the approximation quality of the chosen algorithms, except the ability of CPLR to create much longer segments and the approximation of the shuttle dataset, for which emSWAB slightly sacrifices in its approximation quality. Thus, we can show, that although CPLR is calculated in a time and memory efficient manner, its approximation quality is not compromised by this.
  5.3. Execution Time
As a first evaluation of the execution time behavior on the introduced data sets in 
Section 5.1, we used the x86-64 platform, as it allows a feasible evaluation of this amount of data. We decided to use an emulative approach based on the Valgrind framework [
16]. We did not include the simulation of caching effects, although our algorithm would draw profit from it, as all calculation data will fit in first level caches. We evaluated the binaries compiled with GNU C (GCC) compiler version of 7.3.0 [
17] with default configuration on an x86-64 system. As an evaluation metric, we use the measured instruction count spent in the function, which is called for every new sample of an input data series. By using the instruction count, architectural effects on the timing behavior like pipelining, branch prediction, caching, out of order execution, and the processor frequency are avoided.
For these experiments, we created Callgraphs with the tool Callgrind [
18] to evaluate CPLR, SW, and emSWAB over all 29 data sets using the same buffer sizes as in 
Section 5.2, i.e., 100 for SW and 50 for emSWAB and a threshold of 100. The minimum, maximum, and average instruction count per algorithm invocation as well as the standard deviation and the number of invocations can be seen in 
Table 1. The latter column only differs in the row emSWAB, as emSWAB handles the initialization of the PLA within the same function used for approximation while CPLR and SW are implemented with a separate initialization for the very first sample, for which they are not called in each data set.
By this experiment, we can show that our CPLR algorithm has a maximum instruction count of 150. The best case is 94 instructions, which need to be executed when no segment is outputted. Note that the average instruction counts for our algorithm depends on the actual signals, as the number or length of segments respectively is decisive for the fraction and is bounded on 150 instructions in case no compression is achieved and a segment is created for each sensor sample. The execution time behavior of CPLR can be seen in 
Figure 6. This shows the worst-case execution time characteristic of the CPLR algorithm.
For SW, we observe that the instruction count of each invocation grows by approximately 51 instructions for each new sample that is added to a segment. Thus, the execution time of SW linearly increases with the current segment’s length. In the case, that a segment is created due to an exceeding threshold, 26 instructions are needed additionally to the error calculation. If a segment is created due to a maximum segment size, only 46 instructions have to be performed, because the error calculation can be skipped. Therefore, the maximum and average instruction count depends on the maximum segment’s length and can be bounded by bounding the buffer size. In 
Figure 7 the execution time dependency of SW on the segment’s length is shown.
For emSWAB, the error calculation does not happen in each invocation and mostly the new sample is simply added to the buffer. This is the best case for emSWAB with 53 instructions. The segmentation within the buffer starts when the slope sign of the signal changes, and is not necessarily correlated to segment points. In this case, the execution time drastically increases, as multiple segments within the buffer are created and merged until none of them can be merged any further due to an exceeding error threshold. Within this process, the segment error calculation similar as in SW (iterating over all samples covered by that segment) is performed but repeatedly for multiple created and merged segments. Thus, the execution time is highly data and buffer size dependent and is two orders of magnitude higher than CPLR in this experiment. The execution time behavior of emSWAB is shown in 
Figure 8.
From this experiment, it can be seen that our CPLR algorithm not only has the minimum average computation time, but also a minimal guaranteed maximum execution time of 150 instructions on the x86-64 architecture without compromising its compression abilities.
In order to be able to perform this experiment on a large amount and variety of signals, we used the same threshold for all algorithms on all data sets. However, this is not a fair comparison, as the algorithms might lead to different segment lengths for the same thresholds, which influences the timing behavior of SW and emSWAB. Furthermore, among different data sets, the resulting compression ratios can differ as well as the same threshold. Therefore, we performed a second experiment on one of the data sets (i.e., the 
 signal from 
Figure 3) and chose the thresholds for CPLR, SW, and emSWAB leading to approximately the same compression ratio. This was done by examining the results of the in total 210,000 invocations at different thresholds on the corresponding 
 data set from the quality evaluation in 
Section 5.2 and picking those thresholds for CPLR, SW, and emSWAB, which just achieved an inverse compression ratio of at least 0.2. The chosen thresholds, the corresponding compression ratios and the instruction counts observed in this experiment are shown in 
Table 2.
It can be seen that the maximum instruction count for emSWAB and SW decreased w.r.t the results in 
Table 1, as the lower thresholds lead to shorter segments and thus lower execution times. The average instruction count for CPLR has slightly increased, as at a lower threshold smaller and thus more segments are created. As the segment creation needs additional instructions, the average instruction count slightly increased. The maximum instruction count of 150 for CPLR does not appear in this data set. This only happens due to the truncation to 8 bit, if the approximated segment point is above 255 or below 0, which needs two instructions. Thus, the maximum instruction count of 148 in this second experiment is less than in 
Table 1. Furthermore, it can be seen that the average instruction count for emSWAB increased, although the maximum instruction count decreased. This is explained by the fact of a smaller threshold leading to smaller segments, and thus the computationally costly segmentation process within the buffer is faster but performed more frequently. For SW, the minimum instruction count is 57 in this experiment. This is caused by the fact that the lower threshold leads to shorter segments and no segment is created due to an exceeding maximum segment size, which needs 46 instructions. Despite the latter, an instruction count of 57 is the shortest execution time of SW, which happens when only the first sample is added to a newly created segment. This small execution time is only achieved at the very first invocation of the SW algorithm after initialization. In all other invocations, the first sample is already added to the buffer upon segment creation, since a new segment’s end point is always created for the previous sample.
  ARM Cortex-M4 Microcontroller Architecture
On the x86-64, we were able to show the execution times of CPLR, SW, and emSWAB on a large amount and variety of data. However, as the target architectures of our algorithm are embedded microcontrollers, we additionally analyzed the instruction count on an ARM Cortex-M4 microcontroller. Due to the lack of an appropriate hardware setup on which high amounts of data can be approximated while acquiring the execution time of each invocation in an automated manner, we performed a static code analysis on the assembler code. We compiled emSWAB, SW and our CPLR algorithm for the ARM Cortex-M4 microcontroller architecture using the Arm Embedded GCC version 7.3.0 of the GNU Arm Embedded Toolchain. As compiler settings, we used Cortex-M4 as the target platform with its included hardware floating point unit. From the produced assembler code, we extracted the control flow graphs with corresponding instruction counts of each basis block. For SW and our CPLR algorithm, the simplistic control flow allows for easily analyzing the worst-case instruction count and its data dependency.
The best- and worst-case instruction counts per invocation for CPLR are 131 and 203 instructions, respectively. For SW, the best case is 59 instructions when a segment is created due to an maximum segment length. The data dependent worst case is 
 instructions with 
n being the segment size. For a maximum buffer size of 100 samples as used in the experiments on the x86-64 architecture (see 
Table 1), the maximum 
n is 100 and the worst-case instruction count of SW would be 5706 instructions on the ARM Cortex-M4 microcontroller. For emSWAB, we were able to analyze the shortest path consisting of 64 instructions, in which only a new sample is added to the buffer. However, when the segmentation process within the buffer is performed, the control flow consists of several nested function calls and nested loops, which makes a manual analysis infeasible. This data dependent behavior aligns with the timing behavior observed on the x86-64 architecture in our experiments (see 
Figure 8) and is not further investigated in this paper. The minimal and maximal instruction counts of CPLR, SW, and emSWAB on the ARM Cortex-M4 architecture are summarized in 
Table 3.