Evaluating the Privacy and Utility of Time-Series Data Perturbation Algorithms

Abstract: Data collected from sensor-rich systems may reveal user-related patterns that represent private information. Sensitive patterns from time-series data can be protected using diverse perturbation methods; however, choosing the method that provides the desired privacy and utility level is challenging. This paper proposes a new procedure for evaluating the utility and privacy of perturbation techniques and an algorithm for comparing perturbation methods. The contribution is significant for those involved in protecting time-series data collected from various sensors, as the approach is sensor-type-independent, algorithm-independent, and data-independent. The methodology is complemented by an analysis of the impact of data integrity attacks on the perturbed data. Experimental results obtained using actual data collected from a VW Passat vehicle via the OBD-II port demonstrate the applicability of the approach to measuring the utility and privacy of perturbation algorithms. Moreover, important benefits have been identified: the proposed approach measures both privacy and utility, various distortion and perturbation methods can be compared (no matter how different), and an evaluation of the impact of data integrity attacks on perturbed data is possible.


Introduction
Time-series data collected from various sensor-rich systems (e.g., auto vehicles, wearable devices, industrial equipment) may not reveal tangible personal identifying information, such as name, physical address, or email addresses. However, such data may still reveal essential user-related information (e.g., geolocation, biometrics). For example, time-series data collected from automotive systems, wearable devices, or smart grids contain information that may lead to identifying the end-user [1][2][3]. Thus, sensitive information should be hidden before leaving the sensor-based device and reaching external data processing and analysis systems.
The state-of-the-art research proposes several time-series data perturbation algorithms capable of protecting sensitive data, while exposing useful data for aggregation and analysis purposes [3][4][5][6][7][8][9][10]. Furthermore, these algorithms aim to eliminate sensitive patterns that may lead to user identification, while introducing a minor utility loss for third-party processing systems. However, additional research is necessary to establish a proper (or desired) balance between data privacy and data utility.
This paper proposes a novel methodology for assessing the privacy and utility of time-series perturbation algorithms: a systematic procedure to evaluate and compare existing data perturbation techniques from the perspectives of data privacy and data utility.
The proposed technique is inspired by the cyber attack impact assessment (CAIA) methodology [11], an approach based on system dynamics research [12]. CAIA studies the behavior of complex physical processes in the presence and absence of deliberate or accidental interventions to evaluate cyber-assets' significance in large-scale, hierarchical, and heterogeneous installations. The impact of the interventions on the system is measured, and decisions involving response adjustments are taken.
Our proposed approach treats a data protection system (from the sensors to the perturbation method) similarly to a physical process. It measures the impact of various user behaviors (normal and sensitive) and data interventions on the time-series perturbation system, and evaluates the utility and privacy of the resulting perturbed data. Furthermore, the computed impact allows comparing various perturbation algorithms and additionally identifies perturbation methods that preserve information on data interventions (e.g., from data integrity attacks). The approach is validated using real data captured from a VW Passat vehicle via the on-board diagnostics 2 (OBD-II) port.
We consider the present work to be complementary to previous studies on measuring the utility and privacy of perturbation methods. Previous studies on time-series data perturbation techniques [3,8,9,13,14] focused on demonstrating their performance using specific datasets and conveniently selected parameters. However, more general approaches are necessary for enabling data perturbation on a large scale and for miscellaneous sensor-based data-collection scenarios.
The objectives of the current research were to: (i) define a testing procedure for measuring the data privacy and data utility provided by time series perturbation methods in the case of normal and sensitive behavior, as well as in the case of data interventions (attacks); (ii) describe an algorithm to compare two perturbation mechanisms; and (iii) propose a procedure to identify the types of attacks that can be detected after applying a specific perturbation mechanism.
Our main contributions are the following:
• A systematic procedure for evaluating the utility and privacy of perturbation algorithms. The approach is sensor-type-independent, algorithm-independent, and data-independent.
• A methodology for comparing data perturbation methods.
• A demonstration of applicability by assessing the impact of data integrity attacks on perturbed data.
• An analysis of the approach on actual driving data, using a dataset built following the stated requirements.
The remainder of the paper is organized as follows: Section 2 provides an overview of perturbation techniques of interest, explains our choices for the privacy and utility metrics, and briefly presents the CAIA methodology that inspired our research. The proposed approach is presented in Section 3. The experimental results are documented in Section 4, discussed in Section 5, and the conclusions are formulated in Section 6.

Time-Series Data Privacy Techniques
Time-series data collected continuously from selected data sources (e.g., sensors) are often regarded as time-domain signals. The signal characteristics (amplitude, average, peak and trough, trend, and periodicity) may reveal user-related patterns, and, as they represent private information, they should be subject to protection. Moreover, time-series data protection must consider both the time and frequency domains.
Various techniques have been proposed to protect the privacy of potentially sensitive sensor data streams when transmitted over the Internet to third-party processing systems. The protection of private information from time-series sensor data is mainly achieved using encryption, de-identification, and perturbation [15]. Encryption is a standard privacy approach for protecting data from unauthorized access [16]. However, due to processing power constraints, encryption may be challenging to implement in sensor-based systems. Additionally, encryption methods assume that the entities that handle the personal information are trustworthy [17]. Anonymization (also called sanitation [18] or de-identification) consists of deleting, replacing, or hashing all the personally identifiable information (PII) within a dataset. This process does not reduce the data quality, and anonymized data can be safely transported over the Internet. Popular approaches for data anonymization include k-anonymity [19], l-diversity [20], and t-closeness [21]. However, anonymization is difficult to enforce on time-series data that do not contain explicit personal information.
Perturbation or random perturbation, the method of interest for the paper at hand, is a well-known approach for protecting time-series private data. It partially hides information, while maintaining the possibility of information extraction through data-mining techniques. The main advantage of perturbation is that it does not require additional knowledge of the data to be protected. Moreover, the computational complexity of perturbation is low [18]. Data perturbation techniques include randomization-based methods (additive perturbation [4], multiplicative perturbation [5,6], geometric perturbation [6], nonlinear perturbation [22], differential privacy (DP) [23]) and transformation-based methods [3,[7][8][9][10] (collected values are first translated from the original feature space into a lower-dimensional feature space where noise is added). Whether the perturbation preserves the essential characteristics of the data depends on (i) how much perturbation is added (i.e., in differential privacy) and (ii) how the perturbation is added. Generally, the perturbation amount is predefined by the user.
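As an illustration of the randomization-based family described above, the following sketch shows an additive Gaussian perturbation and a Laplace-mechanism-style perturbation of a time series. This is a minimal illustration, not code from the cited works; the function names and default parameters are our own assumptions.

```python
import numpy as np

def additive_perturbation(x, sigma=1.0, rng=None):
    """Additive randomization: add i.i.d. Gaussian noise to the series."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, sigma, size=x.shape)

def laplace_perturbation(x, epsilon=1.0, sensitivity=1.0, rng=None):
    """Differential-privacy-style additive noise via the Laplace mechanism:
    the noise scale grows as the privacy budget epsilon shrinks."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    return x + rng.laplace(0.0, sensitivity / epsilon, size=x.shape)
```

Smaller epsilon values yield stronger perturbation (more privacy, less utility), in line with the trade-off discussed above.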
Differential privacy (DP) or ε-differential privacy [24] is a widely accepted technique for implementing perturbation-based privacy protection, including for deep learning approaches [25]. Classical techniques (such as anonymization or aggregation) have been the subject of various privacy attacks, and even more modern techniques (such as k-anonymity [19]) have failed to protect data from certain types of attacks. Differential privacy protects data by adding a selected amount of noise to the original data using various mathematical mechanisms (e.g., Laplace, Gaussian). Differential privacy, initially proposed to protect database queries, guarantees that changing the value of a single record has minimal effect on the statistical output of the results [13]. However, obtaining the desired trade-off between privacy and accuracy may be challenging for time series, and it may reduce data utility [15,26].

Privacy and Utility Metrics for Data Perturbation
Data perturbation involves adding noise to data. The more noise is added, the more data privacy is achieved and, consequently, the more data is hidden, which also reduces data utility. Many perturbation mechanisms, and more precisely those that use data transformation, do not only add noise but also use other parameters for implementing privacy. Consequently, finding the proper perturbation is not only a matter of increasing or decreasing noise. Several methods for computing these parameters have been proposed [3,9,13], with limited success. In a recent survey, Dwork et al. [27], referring to finding the optimum privacy budget for differential-privacy-based algorithms, stated that there is no clear consensus on choosing privacy parameters for a practical scenario. Thus, to the best of our knowledge, this issue remains an open question.
After applying a perturbation mechanism, the usefulness of the resulting data needs to be measured to ensure that sufficient information is preserved for data analysis and other types of data processing (e.g., anomaly or tampering detection [14,28-30]). Utility metrics are essential to data analysts and data-processing applications. There is no universally accepted definition of data utility in relation to privacy mechanisms (i.e., the differential privacy approach). However, generally speaking, data utility or usefulness is measured by the extent to which the chosen data privacy approach preserves aggregate statistical information [18].
Two main aspects are considered when measuring privacy and utility [6]: the privacy loss (PL) and the information loss (IL). Privacy loss measures the capacity of an attacker to extract relevant information about the original data from the perturbed data. In contrast, information loss measures the reduction in the statistical utility of the perturbed data compared to the original data. Therefore, the ideal approaches minimize privacy loss (maximize perturbation) and the loss of information (maximize utility).
The state-of-the-art literature lists several common metrics for measuring information loss (utility), such as variance [31,32], mean relative error (MRE) [33,34], mean squared error (MSE) [35], and mean absolute error (MAE) [13,36]. For all, a lower value implies better utility. To validate the proposed approach, we chose to compute the mean absolute error (MAE) to measure the utility, a metric often utilized for comparing time-series perturbation methods [13,37]:

MAE = (1/N) Σ_{t=1}^{N} |X_t − X̂_t|,    (1)

where N is the length of the time series, X is the original data vector, and X̂ is the perturbed data vector.
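The MAE utility metric from Equation (1) can be sketched as a straightforward NumPy implementation (the function name is ours; this is not code from the cited works):

```python
import numpy as np

def mae(original, perturbed):
    """Mean absolute error between the original series X and the perturbed
    series X-hat (Equation (1)); lower values mean better utility."""
    original = np.asarray(original, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    return float(np.mean(np.abs(original - perturbed)))
```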
For quantifying privacy loss, three main approaches have been proposed in the literature [7]: measuring how closely the original values of a perturbed attribute can be estimated [4,13], using information theory [38], or using the notion of privacy breach [39]. For our research, we calculate the probability density function (PDF) of queries executed on perturbed data and compute the probability of obtaining the original value (computed from the unperturbed data) from the perturbed data. The lower the probability, the better the data privacy.
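The privacy measurement described above can be sketched as a Monte Carlo estimate: run a query on many independently perturbed copies of the data and estimate the probability that the query result lands close to the value computed from the unperturbed data. The relative tolerance `tol`, the number of runs, and the function names are our own assumptions, not part of the original method description.

```python
import numpy as np

def privacy_probability(x, perturb, query=np.mean, runs=500, tol=0.05, rng=None):
    """Estimate the probability that a query on perturbed data reproduces the
    original query value within relative tolerance `tol`.
    Lower probability means better privacy."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, dtype=float)
    true_value = query(x)
    scale = max(abs(true_value), 1e-12)
    hits = sum(
        abs(query(perturb(x, rng)) - true_value) <= tol * scale
        for _ in range(runs)
    )
    return hits / runs
```

As expected, a weak perturbation yields a probability near one (little privacy), while a strong perturbation drives the probability toward zero.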
As previously shown, the scientific literature provides a rich palette of perturbation algorithms for time-series data and many metrics for measuring their performance. Despite the extensive research, there is currently no standardized approach for comparing these perturbation methods. The approach outlined in this paper stands out from previous works in the following ways: first, to the best of our knowledge, the presented procedure is the first to simultaneously measure both the privacy provided by the perturbation and the utility of the resulting data; second, the comparison framework (including data generation) can be applied to diverse perturbation techniques without prior knowledge of the implemented algorithms. We note, however, that the methodology presented in this paper may be perceived as supplementary to the prevailing data privacy and utility metrics.

Cyber Attack Impact Assessment (CAIA) Methodology
The cyber attack impact assessment (CAIA) [11] builds on the behavioral analysis of physical processes proposed by Ford [12]. Additionally, the sensitivity assessment approach of Huang et al. [40] computes the relative variance between model behavior with a control loop activated and with it deactivated. Control loops rely on observed variables and cause changes to the physical process state via control variables. The objective of the sensitivity assessment is to quantify the contribution of the control loop to the behavior of a certain variable of interest.
The CAIA methodology computes the covariance of the observed variables before and after the execution of a specific intervention involving the control variables. Metrics quantify the impact of deliberate interventions on the control variables. The cross-covariance values, comprising the impact matrix, are computed between the observed variable with no intervention and the observed variable with an intervention on the control variable. The impact matrix provides useful information on (i) the impact of cyber attacks on the studied system, (ii) the propagation of disturbances to remote assets, and (iii) equally significant observed variables.

Proposed Approach
The proposed methodology is inspired by research in system dynamics, sensitivity analysis, and the CAIA framework. The perturbation method is modeled as a dynamic system, and the perturbation is analyzed first in the absence and then in the presence of sensitive user behavior. The main symbols used throughout this research are described in Table 1.
C — relative impact of behavior data on the observed variable (attribute)
C̄ — mean relative impact of behavior data on the observed variable (attribute)
α_p — behavior-privacy parameter
α_u — behavior-utility parameter

Perturbation System Architecture and Design Consideration
Sensor data can be protected either by performing the perturbation locally, on the system that gathers the data, or on a remote system, by securely transferring the data from the data source to the third-party processing systems. However, due to recent regulations, which explicitly stipulate to "wherever possible, use processes that do not involve personal data or transferring personal data outside of the vehicle (i.e., the data is processed internally)" [41], local perturbation is preferred (see Figure 1). Implementing a protection system for time-series data involves choosing the perturbation method while taking into account data and equipment restrictions, such as:
• the type of data leaving the system and the potentially sensitive information it carries;
• the amount of information to be hidden, considering possible sensitive information or other external factors;
• utility restrictions (how much information about the data should still be available after perturbation);
• the processing power of the equipment.
Because many data protection mechanisms have been proposed and implemented during the last decade, choosing the most suitable one is challenging. Consequently, the main purpose of this research is to make such decisions simpler.

Formal Description
Consider time-series data X, collected from a sensor at time instants T = {1, 2, . . ., t, . . .} for one observed variable (time-series attribute); X_r, a vector containing measurements corresponding to regular (typical) user behavior (r ∈ R); X_s, corresponding to sensitive behavior (s ∈ S); X_b, corresponding to any type of regular or sensitive behavior (b ∈ B, B = R ∪ S and R ∩ S = ∅); and X_a, corresponding to a data intervention (a ∈ A). Moreover, consider σ the standard deviation of X; [σ_min, σ_max] the feasible interval of minimum and maximum standard deviations (available from the sensor specifications); σ^r the standard deviation of a regular (typical) user behavior; [σ^r_min, σ^r_max] the standard deviation interval corresponding to regular behavior; σ^s the standard deviation of a sensitive behavior; and σ^a the standard deviation of a data intervention.

Definition 1. Time-series data X, collected by a sensor, contains information associated with regular behavior if its standard deviation σ^r is in the interval of regular operation, σ^r ∈ [σ^r_min, σ^r_max].
The interval of regular operation [σ^r_min, σ^r_max] is obtained by computing the standard deviation for several time-series data collected during what is subjectively considered the typical operation of the device or equipment.

Definition 2. Time-series data X, collected by a sensor, contains information associated with sensitive behavior (user-specific information) of the system user if the standard deviation of X, σ^s, is outside the regular operation interval, σ^s ∈ [σ_min, σ^r_min) or σ^s ∈ (σ^r_max, σ_max].

From the privacy point of view, sensitive behavior corresponds to patterns that may lead to user identification or the identification of specific user behavior (e.g., aggressive driver behavior, nervous tics). Thus, such patterns should be recognized and protected by the perturbation system.
Data interventions are conducted either by altering the sensor or by injecting erroneous data before the perturbation process occurs (see Figure 1); in this research, we associate them with data integrity attacks (e.g., pulse attacks, scaling attacks, random attacks). From the utility point of view, the information that may lead to attack or anomaly detection should be maintained after data perturbation. The working hypotheses are that the impact of intervention data is more significant than the impact of the sensitive behavior data, and that the impact of sensitive behavior data is higher than, but reasonably close to, that of regular behavior.

Definition 3. An intervention is an intentional modification of the collected time-series data X such that the standard deviation of X during the attack (σ^a) is either greater than the standard deviation of all collected sensitive behavior data (σ^a > σ^s, ∀s ∈ S) or smaller than the standard deviation of all collected sensitive behavior data (σ^a < σ^s, ∀s ∈ S).
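Definitions 1-2 classify a series purely by its standard deviation, which can be sketched as follows (the function name and the string labels are our own; the thresholds would come from the sensor specifications and the collected regular behavior data):

```python
import numpy as np

def classify_series(x, sigma_r_min, sigma_r_max, sigma_min=0.0, sigma_max=float("inf")):
    """Classify time-series data by its standard deviation, per Definitions 1-2:
    inside [sigma_r_min, sigma_r_max] -> regular; inside the feasible sensor
    interval but outside the regular interval -> sensitive."""
    sigma = float(np.std(np.asarray(x, dtype=float)))
    if sigma_r_min <= sigma <= sigma_r_max:
        return "regular"
    if sigma_min <= sigma < sigma_r_min or sigma_r_max < sigma <= sigma_max:
        return "sensitive"
    return "out-of-range"  # outside the feasible sensor interval
```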
Consider M a perturbation mechanism that protects the information in X_b, b ∈ B, while maintaining the possibility of partial information extraction through data-mining techniques, such that:

Y_b = M(X_b).    (2)

Let X_0 denote the reference data of regular operation, called the landmark regular behavior, X_bt the t-th measurement of the observed variable for the behavior data b, and Y_bt the perturbation of X_bt.
The mean of the observed values for behavior b is defined by:

Ȳ_b = (1/N) Σ_{t=1}^{N} Y_bt.    (3)

Further, let C(Y_b) be the impact that behavior b has on the observed variable of the perturbation system, computed as the cross-covariance between the perturbed landmark regular behavior Y_0 = M(X_0) and the perturbed behavior data Y_b:

C(Y_b) = (1/N) Σ_{t=1}^{N} (Y_0t − Ȳ_0)(Y_bt − Ȳ_b).    (4)

The impact (Equation (4)) is a measure of how much the output of the system deviates from regular behavior.
The relative impact C of a behavior b on the observed variable is defined as the impact on the perturbed data normalized by the corresponding impact on the original data:

C(Y_b, X_b) = C(Y_b) / C(X_b),    (5)

where C(X_b) is computed as in Equation (4) on the unperturbed data X_0 and X_b. As any perturbation method introduces a certain degree of uncertainty due to the added noise, the mean relative impact is used to quantify the impact of interventions under uncertainty:

C̄(Y_b, X_b) = (1/P) Σ_{i=1}^{P} C_i(Y_b, X_b),    (6)

where P is the number of times the perturbation is performed, i.e., Y_0 and Y_b are recomputed. The larger the P, the more accurate the relative impact.
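A sketch of the impact computation under one plausible reading of Equations (4)-(6): the cross-covariance of Equation (4), normalized by the same quantity on the unperturbed data, and averaged over P perturbation runs. The exact normalization in the original formulas is not fully recoverable here, so treat this as an assumption; the function names are ours.

```python
import numpy as np

def cross_cov_impact(ref, obs):
    """Absolute cross-covariance between a reference series and an observed
    series (one reading of Equation (4))."""
    ref = np.asarray(ref, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return abs(float(np.mean((ref - ref.mean()) * (obs - obs.mean()))))

def mean_relative_impact(x0, xb, perturb, P=50, rng=None):
    """Mean relative impact of behavior data xb on the perturbation system
    (Equations (5)-(6)), averaged over P independent perturbation runs."""
    rng = np.random.default_rng(0) if rng is None else rng
    base = cross_cov_impact(x0, xb) + 1e-12  # impact on unperturbed data
    runs = [
        cross_cov_impact(perturb(x0, rng), perturb(xb, rng)) / base
        for _ in range(P)
    ]
    return float(np.mean(runs))
```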

Definition 4.
Let M be a perturbation mechanism that takes as input time-series data X_b, b ∈ B, corresponding to a regular or sensitive behavior, such that Y_b = M(X_b), and let α_p be a real positive number. M satisfies α-behavior-privacy for the observed variable if it holds that:

C̄(Y_b, X_b) ≤ α_p, ∀b ∈ B,    (7)

where α_p is the behavior-privacy parameter. Definition 4 imposes that the relative impact of any behavior on the perturbed data should be less than a pre-defined value, α_p, for the observed variable. The behavior-privacy parameter α_p in Equation (7) defines the desired level of privacy, and it should be as small as possible for high data protection. When the mean relative impact C̄(Y_b, X_b) of a behavior data value b is higher than α_p, we conclude that the perturbation method M does not provide sufficient protection, meaning that it does not hide enough information (sensitive patterns can be detected).

Definition 5.
Let M be a perturbation mechanism that takes as input time-series data X_b, b ∈ B, corresponding to a regular or sensitive behavior, such that Y_b = M(X_b), and let α_u be a real positive number. M satisfies α-behavior-utility for the observed variable if it holds that:

C̄(Y_b, X_b) ≥ α_u, ∀b ∈ B,    (8)

where α_u is the behavior-utility parameter. Definition 5 states the condition to be met by the perturbation mechanism on any behavior data such that the perturbed result is useful. The behavior-utility parameter α_u in Equation (8) defines the desired level of utility, and it should be as large as possible for high data utility. When the mean relative impact C̄(Y_b, X_b) of a behavior data value b is lower than α_u, we conclude that the perturbation method M does not provide sufficient utility, meaning that it hides too much information.
An ideal perturbation mechanism for the observed variable satisfies both the α-behavior-privacy and α-behavior-utility conditions, such that the mean relative impact C̄(Y_b, X_b) of any behavior b is in the interval [α_u, α_p], α_u ≤ α_p.
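Definitions 4-5 together reduce to an interval check on the mean relative impact of every behavior; a trivial sketch (the function name is ours):

```python
def satisfies_alpha_bounds(mean_relative_impacts, alpha_u, alpha_p):
    """Check alpha-behavior-privacy (impact <= alpha_p, Definition 4) and
    alpha-behavior-utility (impact >= alpha_u, Definition 5) for all behaviors."""
    assert alpha_u <= alpha_p, "need alpha_u <= alpha_p for a non-empty interval"
    return all(alpha_u <= c <= alpha_p for c in mean_relative_impacts)
```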

Comparing Perturbation Methods
Consider M_1 and M_2, two perturbation mechanisms that satisfy α-behavior-privacy and α-behavior-utility with the targeted α_u and α_p, the behavior-utility and behavior-privacy parameters, respectively.

Definition 6. For the perturbation methods M_1 and M_2, we define the following privacy-utility operators: M_1 ≻_u M_2 if M_1 provides higher utility than M_2, and M_1 ≻_p M_2 if M_1 provides higher privacy than M_2.
Next, consider X_0 the landmark regular operation data and X_s a sensitive behavior data value. Then, the mean relative impacts obtained with M_1 and M_2 are defined in Equations (9) and (10), respectively, following Equation (6): C̄_1(Y_s, X_s), computed with Y_s = M_1(X_s), and C̄_2(Y_s, X_s), computed with Y_s = M_2(X_s). Let us denote C̄_1^s = C̄_1(Y_s, X_s) and C̄_2^s = C̄_2(Y_s, X_s).
Consequently, when the impact that any sensitive behavior s has on data perturbed with M_1 is higher than the impact of at least one sensitive behavior s on data perturbed with M_2, more information about the sensitive behavior is maintained in the data perturbed with M_1 for all s ∈ S, providing higher overall data utility; then, according to Definition 6, M_1 ≻_u M_2. Conversely, when the impact that any sensitive behavior s has on data perturbed with M_1 is smaller than the impact of at least one sensitive behavior s on data perturbed with M_2, less information about the sensitive behavior is held in the data perturbed with M_1 for all s ∈ S, providing higher overall data privacy; then, according to Definition 6, M_1 ≻_p M_2.
Taking into account Propositions 1-3, we propose the methodology described in Algorithm 1 for comparing two perturbation methods M 1 and M 2 .
Before applying Algorithm 1, data preparation is required. First, we describe the regular user behavior for the tested system, collect regular behavior data, compute the standard deviations for all data, and find the interval of regular operation [σ^r_min, σ^r_max]. Second, we choose the landmark regular behavior, X_0, selected from the collected regular behavior data such that its standard deviation σ is the closest to the middle of the interval, (σ^r_min + σ^r_max)/2. Third, we collect the sensitive behavior data, X_s. The constituent steps are also illustrated in Figure 2.
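The landmark selection step above can be sketched as follows: among the collected regular behavior series, pick the one whose standard deviation is closest to the middle of the regular interval. The function name is ours; this is an illustration, not the authors' implementation.

```python
import numpy as np

def select_landmark(regular_series):
    """Pick the landmark regular behavior X0 from collected regular behavior
    data: the series whose standard deviation is closest to the middle of
    [sigma_r_min, sigma_r_max]."""
    sigmas = [float(np.std(np.asarray(x, dtype=float))) for x in regular_series]
    target = (min(sigmas) + max(sigmas)) / 2.0  # middle of the regular interval
    best = min(range(len(sigmas)), key=lambda i: abs(sigmas[i] - target))
    return regular_series[best]
```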

Evaluation of the Utility of a Perturbation Method in Case of Data Interventions
External entities can alter sensor data, for instance, by modifying the sensor or changing the data after it is collected. Therefore, monitoring data interventions is essential for maintaining data integrity and detecting anomalies or attacks. We evaluate the impact data interventions have on the perturbed data and estimate the utility of the resulting data. From the utility point of view, enough information to detect anomalies or attacks is expected to be maintained after perturbation. Our research focuses on a type of data intervention called an integrity attack, which consists of modifying data after it is collected using predefined patterns.
Consider A the set of possible data interventions. Let an intervention (attack) data value X_a, a ∈ A, be the input of a perturbation method M. Let Y_a be the perturbed values, Y_a = M(X_a), and compute the mean relative impact of X_a on the perturbed data Y_a for an observed variable analogously to Equation (6):

C̄(Y_a, X_a) = (1/P) Σ_{i=1}^{P} C_i(Y_a, X_a).    (11)

Denote C̄_a = C̄(Y_a, X_a) and C̄_s = C̄(Y_s, X_s), ∀s ∈ S.
Proposition 4. If M satisfies the condition C̄_a > max_{s∈S}(C̄_s), then M preserves the intervention information such that the perturbed data Y_a is useful for detecting the data intervention a, a ∈ A.
Proof. For ∀s ∈ S and ∀b ∈ B, max_{s∈S}(C̄_s) ≥ C̄_b, by the working hypothesis that sensitive behavior has a higher impact than regular behavior. Then, the mean relative impact C̄_a of a data intervention a, a ∈ A, is higher than the mean relative impact of any behavior b (regular or sensitive). Thus, the perturbed data resulting from applying M on a preserves information about the intervention/attack, maintaining data utility.
The consequence of Proposition 4 is that, if the impact of an intervention (attack) is higher than the impact of all defined sensitive behavior data, then the perturbed data is useful from the point of view of the attack or anomaly detection.
The proposed approach for evaluating the utility of a perturbation method in the case of interventions is described in Algorithm 2. The same data preparation is necessary as in the case of Algorithm 1. Additionally, the intervention data X_a, a ∈ A, is collected. If the result of the evaluation is positive, then the perturbation method provides utility for the considered intervention data. Otherwise, the usefulness of the perturbation method is low or uncertain.
Algorithm 2: Intervention Impact on Perturbed Data.
Input: X_0 (landmark regular behavior), X_s (vector of sensitive behavior data, s ∈ S), X_a (intervention data, a ∈ A), M (perturbation method).
Output: the evaluation result.
Function: EvaluateInterventionImpactOnPerturbedData(X_0, X_s, X_a, M).
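Under Proposition 4, the core of Algorithm 2 reduces to a single comparison between precomputed mean relative impacts; a sketch under that assumption (the impact values are supplied by the caller, e.g., via Equation (11); the function name is ours):

```python
def evaluate_intervention_impact(c_attack, c_sensitive_list):
    """Positive evaluation (the intervention remains detectable after
    perturbation) if the attack's mean relative impact exceeds the impact
    of every sensitive behavior (Proposition 4)."""
    return c_attack > max(c_sensitive_list)
```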

Experimental Results
The proposed framework is evaluated from several perspectives. Beforehand, the approach to collecting and generating the necessary data is described. Next, several standard perturbation methods are compared using the impact coefficients and the proposed algorithm for univariate time series. Finally, the method's applicability is showcased for identifying the possibility of perturbation methods to detect specific types of integrity attacks. We consider the ability of the perturbation method to maintain information about the intervention as a measure of its utility.
The proposed framework is evaluated in the context of three time-series distortion algorithms that leverage the discrete Fourier transform (DFT) as the transformation method: (i) a basic distortion method that transforms the data into the frequency domain and keeps only the first k coefficients (denoted within this article as filtered FFT), which introduces no noise perturbation (we use it to emphasize the validity of the proposed method); (ii) the compressible perturbation algorithm (CPA) [8], which, based on the Fourier representation of the time series, adds noise to a fraction of the frequencies; and (iii) the Fourier perturbation algorithm (FPA) [9], the first differentially private (DP) approach that offers practical utility for time-series data. The CPA and FPA algorithms are widely accepted as classical perturbation methods with many applications and variants. Thus, by demonstrating the validity of the proposed approach on these algorithms, we expect the framework to be applicable to other, similar perturbation techniques.

Data Collection and Intervention Generation
The first step in using the proposed framework is collecting data for both regular and sensitive behavior. Further, interventions are generated by simulating various integrity attacks based on the regular behavior data.
This research uses data extracted from in-vehicle CAN traffic. Data were collected via the on-board diagnostics 2 (OBD-II) port of a VW Passat vehicle using the OBD Fusion mobile app. Data were recorded every second during driving, and 136 features were extracted through the OBD-II port. All driving data (regular behavior and sensitive behavior) were collected on the same route in similar traffic conditions.
The dataset preparation consists of the following steps:
• Step 1: Collect several normal behavior time-series data, compute the standard deviation σ for each one, and identify [σ^r_min, σ^r_max], the interval of minimum and maximum standard deviations possible for normal behavior.

• Step 2: Choose the landmark normal behavior (X_0), the data further used for computing the impact coefficients and for attack generation. For instance, choose the normal behavior whose standard deviation is closest to the middle of the [σ^r_min, σ^r_max] interval.
• Step 3: Identify possible sensitive behaviors and collect the corresponding data. The collected data qualifies as sensitive behavior if its standard deviation is outside the interval [σ^r_min, σ^r_max], according to Definition 2.
Intervention data is generated from the landmark regular behavior by simulating four integrity attacks commonly utilized in the literature for security studies [42,43], plus a step attack that can be associated with a defective sensor. The list of interventions is not aimed to be exhaustive, but is provided to showcase the methodology in the context of possible attack scenarios. Given the attack interval [T_start, T_stop], the following types of attacks on time-series data are considered:
• Pulse attack: the altered value X*_j(t) is obtained by dividing the value of attribute j at time t, X_j(t), by an attack parameter a_p: X*_j(t) = X_j(t)/a_p, with t in the attack interval [T_start, T_stop].
• Step attack: a constant step equal to the attack parameter a_p is added to the values within the attack interval.
Figure 3 displays a sample of the collected data, and the generated intervention data are illustrated in Figure 4. The interventions fulfill Definition 3, which states that the standard deviation of intervention data a is higher than the standard deviation of all defined sensitive behavior data s. Table 2 explains how the sensitive behavior and the attacks are generated and lists their corresponding standard deviations (σ).
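The two attacks whose definitions survive in the text above can be sketched as follows. The half-open window convention `[t_start, t_stop)` and the function names are our own choices, not from the original description.

```python
import numpy as np

def pulse_attack(x, a_p, t_start, t_stop):
    """Pulse attack: divide the values inside the attack window by a_p."""
    y = np.array(x, dtype=float)
    y[t_start:t_stop] /= a_p
    return y

def step_attack(x, a_p, t_start, t_stop):
    """Step attack: add a constant step a_p inside the attack window
    (mimicking a defective sensor)."""
    y = np.array(x, dtype=float)
    y[t_start:t_stop] += a_p
    return y
```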

Experiments

Comparing Perturbation Methods
Auto vehicles, as well as many devices and industrial equipment, are enhanced with a large number of sensors. Only a fraction of the observed variable data collected from these sensors is sent to third-party processors. One or more collected or computed values are transmitted and must be protected.
To demonstrate the usage of the proposed approach, first consider a univariate time series consisting of the vehicle speed observed variable. According to the procedure described in Section 3.3, the regular and sensitive behavior time series are perturbed by applying the selected distortion methods described in Table 3. Figure 5 presents a sample of the perturbed data. The perturbation is applied for all tested algorithms for each behavior data value, and the relative impact (Equation (5)) is computed. Finally, the process is repeated a significant number of times (e.g., more than 100), and the mean relative impact is obtained (Equation (6)). Table 4 summarizes the computed impact that the selected sensitive behaviors have on the various perturbation systems, and Figure 6a illustrates the minimum and maximum impact coefficients for all perturbation methods (highlighted in Table 4).
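The repeated perturbation and averaging step can be sketched as below. The exact form of Equations (5) and (6) is not reproduced here; this sketch assumes the relative impact compares a statistic (the standard deviation) of the perturbed behavior against the perturbed landmark behavior, and the noise-based perturbation is purely illustrative.

```python
import numpy as np

def relative_impact(perturb, x_regular, x_behavior):
    """Assumed form of Equation (5): relative change of the perturbed
    behavior's standard deviation w.r.t. the perturbed landmark."""
    sigma_r = np.std(perturb(x_regular))
    sigma_b = np.std(perturb(x_behavior))
    return abs(sigma_b - sigma_r) / sigma_r

def mean_relative_impact(perturb, x_regular, x_behavior, runs=100):
    """Equation (6): average the relative impact over many runs,
    since the perturbation is typically randomized."""
    return np.mean([relative_impact(perturb, x_regular, x_behavior)
                    for _ in range(runs)])

# Illustrative randomized perturbation: additive Gaussian noise
rng = np.random.default_rng(0)
perturb = lambda x: x + rng.normal(0.0, 0.05, size=len(x))
```

In the experiments, this computation would be repeated for every perturbation method and every sensitive behavior to populate a table like Table 4.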
Furthermore, we investigate the proposed approach's utility in the case of multiple observed variables. We selected additional data attributes (instant fuel consumption, CO2 flow, and magnetometer X) from the collected dataset, besides the already presented vehicle speed observed variable, and computed their mean relative impact coefficients (illustrated in Figure 7).

Evaluate the Utility of a Perturbation Module for Detecting Data Interventions/Attacks

Section 3.4 describes the procedure to evaluate the utility provided by data resulting from the considered perturbation methods in case of data intervention. Consider the integrity attacks presented in Table 2 and apply the proposed approach for computing the impact of attacks on normal behavior data. Table 5 lists the mean relative impact coefficients (Equation (11)) for all integrity attacks and all tested perturbation methods. For all methods, min(C_s) and max(C_s) are extracted from Table 4.

Discussion
This paper proposes a comparison approach that identifies the minimum and maximum relative mean impact for each perturbation method. A suitable perturbation method is identified if it simultaneously holds the highest minimum and the lowest maximum impact coefficients.
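The selection criterion can be expressed as a short sketch. The function name is illustrative; the M6 and M1 ranges below are made-up values for demonstration, while the M5 range matches the coefficients reported later in this section.

```python
def best_method(impact_ranges):
    """Return the method that simultaneously holds the highest minimum
    and the lowest maximum impact coefficient, or None if no single
    method satisfies both criteria at once.
    impact_ranges: {method_name: (min_C, max_C)}"""
    best_min = max(impact_ranges.values(), key=lambda r: r[0])[0]
    best_max = min(impact_ranges.values(), key=lambda r: r[1])[1]
    for method, (lo, hi) in impact_ranges.items():
        if lo == best_min and hi == best_max:
            return method
    return None

# M5 values from the paper; M6 and M1 are illustrative placeholders
ranges = {"M5": (0.228, 0.245), "M6": (0.220, 0.250), "M1": (0.050, 0.400)}
```

When no method wins on both criteria, the function returns None, which mirrors the situation discussed later where different variables favor different methods.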
For one observed variable (vehicle speed), the computed minimum and maximum impact coefficients C for all tested perturbation methods are listed in Table 4 and illustrated in Figure 6a. As observed, the perturbation method M5 holds both the highest minimum impact (C = 0.228) and the smallest maximum impact (C = 0.245). Based on Propositions 1-3, we conclude that M5 provides the best privacy and utility among the tested algorithms, for the considered dataset and the proposed sensitive behaviors. Perturbation M6 also provides good privacy and utility, with impact coefficients close to those computed for M5, and it may be considered an alternative to M5.
The result is validated from the utility point of view by computing the mean absolute error (MAE) utility metric between the non-perturbed sensitive behavior data and the perturbed version for each perturbation method (Table 6). Figure 6b highlights perturbation methods M5 and M6 as the ones that provide the smallest information loss. To measure the privacy provided by the tested perturbation methods, we calculated the probability distribution function (PDF) of queries executed on perturbed data and computed the probability of obtaining the actual query result from the original data (Table 7). Figure 6c shows the mean probability for all sensitive behaviors, computed from PDFs generated from 1000 queries. The smaller the probability, the higher the privacy, as more information is hidden. Again, the computed probabilities emphasize the perturbation methods M5 and M6 as the ones providing the best privacy protection.

In addition, we tested the proposed framework on more observed variables with the objective of finding the best perturbation method that can be applied to all variables. Figure 7 illustrates the computed mean relative impact coefficients for the attributes instant fuel consumption, CO2 flow, and magnetometer X. According to the stated requirements, the perturbation method M6 holds the highest minimum and lowest maximum impact coefficients for all variables, thus providing the best utility and privacy for the described scenario. Moreover, the result is confirmed from the utility and privacy points of view: Figure 8 shows that M6 has the minimum information loss (MAE) of all tested perturbation methods and, moreover, the lowest maximum probability of the real query result (lower probability is better).

Further, we investigated the possibility of using the mean relative impact coefficients (Equation (11)) for detecting data integrity attacks. Based on Proposition 4, the data from Table 5 indicate that all data protection methods hide important information about the attacks. Attack detection may be possible but challenging for most perturbation methods, as the impact is similar to that of the sensitive behaviors. As the impact coefficients for method M1 are all smaller than the corresponding impact coefficients calculated for sensitive behaviors, we conclude that M1 hides essential information about the attacks. Thus, M1 cannot be regarded as a suitable privacy protection method when attack or anomaly detection is an objective. In the case of M4, several attacks (a4 and a5) may be detected according to the proposed criteria, and the impact of the other attacks (a1, a2, and a3) is also significant.
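The two validation metrics can be sketched as follows. The MAE is standard; the query-probability estimate is an assumed Monte Carlo approximation of the PDF-based procedure described above (the tolerance parameter and function names are hypothetical).

```python
import numpy as np

def mae(x_original, x_perturbed):
    """Mean absolute error between the original and perturbed series:
    a direct measure of information loss (lower = better utility)."""
    return float(np.mean(np.abs(np.asarray(x_original)
                                - np.asarray(x_perturbed))))

def query_probability(x, perturb, query=np.mean, runs=1000, tol=0.01):
    """Estimate the probability that a query on the perturbed data
    matches the true query result within a tolerance. A lower
    probability means more information is hidden (better privacy)."""
    truth = query(x)
    hits = sum(abs(query(perturb(x)) - truth) <= tol
               for _ in range(runs))
    return hits / runs
```

In the experiments, both metrics would be computed per perturbation method and compared, as in Tables 6 and 7.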
We demonstrated that the proposed methodology could be used to measure the utility and privacy of various perturbation algorithms, and the following advantages have been identified:

• Compared to the other mechanisms, the proposed approach measures both privacy and utility;
• Various distortion and perturbation methods can be compared, no matter how different they are;
• An evaluation of the impact of various data integrity attacks on perturbed data is possible.
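The last point, evaluating attacks on perturbed data, can be sketched with a simple decision rule. This is an assumed reading of Proposition 4: an attack whose mean relative impact exceeds the largest sensitive-behavior impact remains visible after perturbation, while one below the smallest is effectively hidden.

```python
def attack_visibility(c_attack, min_cs, max_cs):
    """Classify an attack by comparing its mean relative impact
    coefficient against the [min(Cs), max(Cs)] range of the
    sensitive behaviors (assumed criterion, per Proposition 4)."""
    if c_attack > max_cs:
        return "detectable"   # impact clearly above any sensitive behavior
    if c_attack < min_cs:
        return "hidden"       # perturbation masks the attack entirely
    return "ambiguous"        # overlaps sensitive-behavior impacts
```

Under this rule, a method like M1, whose attack coefficients all fall below the sensitive-behavior range, would classify every attack as hidden.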
However, a few observations on its limitations are necessary. Firstly, the accuracy of the evaluation depends on the set of collected normal behavior time series and the set of defined sensitive behaviors. The more accurately these cover the possible sensitive behaviors, the more accurate the comparison is.
Secondly, the experiments have shown that the proposed approach may identify a suitable perturbation method for only some of the observed variables. For example, certain algorithms provide high privacy for some variables but lack utility, or vice versa.
Additionally, when more observed variables are evaluated, it is possible to identify different desired perturbation methods depending on the variable. This can be anticipated, as the impact of a sensitive behavior or intervention may not be the same on all variables. In this case, we propose an improvement to the perturbation system illustrated in Figure 1: instead of using one perturbation method for all observed variables, add several perturbation methods and assign each variable to the one implementing the best privacy and utility (see Figure 9). The approach proposed for comparing various perturbation methods on time-series data can be expanded to evaluate the utility of the perturbed data in the case of data interventions (e.g., integrity attacks). The computed impact coefficients show how much information about the intervention is hidden or preserved after perturbation. However, more research is necessary to test various attack and anomaly detection techniques and assess their performance on perturbed data.
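The per-variable assignment proposed above can be sketched as follows. The function name and data layout are illustrative, and the rule is simplified to "highest minimum impact coefficient"; the full selection rule also requires the lowest maximum, as described earlier.

```python
def assign_methods(impacts):
    """Assign each observed variable to the perturbation method with
    the highest minimum impact coefficient for that variable
    (simplified selection rule).
    impacts: {variable: {method: (min_C, max_C)}}"""
    return {var: max(methods, key=lambda m: methods[m][0])
            for var, methods in impacts.items()}

# Illustrative per-variable impact ranges (made-up values)
impacts = {
    "vehicle_speed": {"M5": (0.228, 0.245), "M1": (0.05, 0.40)},
    "fuel_consumption": {"M6": (0.20, 0.30), "M1": (0.10, 0.50)},
}
```

The result is a variable-to-method mapping that a multi-module perturbation system, as in Figure 9, could use for routing each sensor stream.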

Conclusions
This paper presented a new approach for measuring the privacy and utility provided by time-series perturbation algorithms. The main novelty and contribution to the state of the art is the proposed procedure for comparing various perturbation methods. The framework involved collecting data corresponding to sensitive behavior and measuring the impact this behavior had on the perturbation system. As shown, the presented metrics were helpful for simultaneously measuring the privacy and utility of the perturbed data. The research contribution is meaningful for those protecting time-series data collected from various sensors, as the approach is sensor-type-independent, algorithm-independent, and data-independent.
The experiments demonstrated that the approach has significant benefits. It can be applied to diverse perturbation algorithms and to various data, under the condition that sensitive behavior can be defined and corresponding data can be collected. Moreover, the research suggested evaluating the impact of data integrity attacks on perturbed data. Data were collected via the OBD-II port of a VW Passat vehicle for both regular/typical and sensitive behavior. The experiments showed that the approach is also promising in measuring the impact of sensitive behavior on the perturbed data in terms of privacy and utility. Furthermore, having exemplified the approach on two classical perturbation algorithms, we expect our method to be applicable to other perturbation techniques.
In future work, we intend to test the proposed method on publicly available datasets and on more diverse perturbation algorithms. A key challenge will be the detection of sensitive user behavior in such data. As a result, further adjustments to the presented approach may be required. Lastly, additional evaluation of the impact of integrity attacks on perturbed data and, consequently, of the impact on the accuracy of anomaly and attack detection algorithms will be included in future research work.

Figure 2 .
Figure 2. Methodology for comparing two perturbation methods.

Figure 3. Figure 4.
Figure 3. Regular and sensitive user behavior normalized data (vehicle speed): (a) regular behavior; (b) sensitive behavior s1 (brake pressed every 30 s); (c) sensitive behavior s2 (stop and go every 60 s); (d) sensitive behavior s3 (sharp acceleration every 60 s); (e) sensitive behavior s4 (brake and acceleration alternately every 60 s); (f) sensitive behavior s5 (stop and go and acceleration alternately every 60 s).

Figure 5 .
Figure 5. Perturbation of time series (vehicle speed) normalized data using various perturbation methods: (a) regular/normal behavior; (b) sensitive behavior (s3); (c) sensitive behavior (s4); (d) regular behavior perturbed with M1; (e) sensitive behavior (s3) perturbed with M1; (f) sensitive behavior (s4) perturbed with M1; (g) regular behavior perturbed with M3; (h) sensitive behavior (s3) perturbed with M3; (i) sensitive behavior (s4) perturbed with M3; (j) regular behavior perturbed with M5; (k) sensitive behavior (s3) perturbed with M5; (l) sensitive behavior (s4) perturbed with M5.

Figure 6 .
Figure 6. Sensitive behavior data (vehicle speed): (a) minimum and maximum impact coefficients for all tested perturbation methods; (b) maximum and mean MAE (information loss) for all tested perturbation methods; (c) the maximum probability of the real query result for all tested perturbation methods.

Figure 8 .
Figure 8. (a) Maximum MAE (information loss) for all tested perturbation methods; (b) maximum probability of the real query result for all tested perturbation methods.

Figure 9 .
Figure 9. Multi-sensor equipment with several perturbation modules.

Table 1 .
Symbols and their description.
σ: Standard deviation
σ_r_min: Minimum standard deviation of regular user behavior data, ∀r ∈ R
σ_r_max: Maximum standard deviation of regular user behavior data, ∀r ∈ R
σ_r: Standard deviation of a regular user behavior data, r ∈ R
σ_s: Standard deviation of a sensitive user behavior data, s ∈ S
σ_a: Standard deviation of an intervention data (integrity attacks), a ∈ A
B: Set of user behavior data, B = R ∪ S and R ∩ S = ∅
A: Set of intervention data (integrity attacks), A ∩ B = ∅
X_s: Sensitive user behavior data, s ∈ S
X_b: Regular or sensitive user behavior data, b ∈ B
X_a: Intervention (attack) data, a ∈ A

Table 2 .
Standard deviation for Vehicle speed collected data values and generated interventions/attacks.

Table 3 .
Distortion/perturbation methods and parameters.

Table 4 .
Mean relative impact coefficients Cb for tested perturbation methods applied on sensitive behavior data.

Table 5 .
Mean relative impact coefficients Ca for tested perturbation methods applied on intervention data (vehicle speed observed variable).