PAVS: A New Privacy-Preserving Data Aggregation Scheme for Vehicle Sensing Systems

Air pollution has become one of the most pressing environmental issues in recent years. According to a World Health Organization (WHO) report, air pollution has led to the deaths of millions of people worldwide. Accordingly, expensive and complex air-monitoring instruments have been deployed to measure air pollution. By comparison, vehicle sensing systems (VSS) have received considerable attention, since they can be used for many purposes and can bring large financial benefits by reducing maintenance and repair costs. However, the privacy issues of VSS, including vehicles' location privacy, have not been well addressed. Therefore, in this paper, we propose a new privacy-preserving data aggregation scheme, called PAVS, for VSS. Specifically, PAVS combines privacy-preserving classification and privacy-preserving statistics on both the mean E(·) and variance Var(·), which makes VSS more promising because, with minimal privacy leakage, more vehicles will be willing to participate in sensing. Detailed analysis shows that the proposed PAVS achieves privacy preservation, data accuracy and scalability. In addition, performance evaluations via extensive simulations demonstrate its efficiency.


Introduction
Air pollution has become a major environmental risk factor for ill health and death. Epidemiological studies have shown that long-term exposure to PM 2.5 can cause heart disease, stroke, and lung cancer, among other conditions [1]. To monitor air pollution, a series of solutions have been proposed [2][3][4]. However, traditional monitoring equipment is usually stationary, complex, and expensive due to the high cost of construction and maintenance. In contrast, vehicle sensing systems (VSS) have attracted more attention, since vehicles can be equipped with various kinds of sensors that collect and measure the concentrations of a range of pollutants [5]. Specifically, the sensing data are first collected by vehicle sensors [6], transferred to roadside units (RSUs) by vehicle wireless transmitters via vehicular ad hoc networks (VANET) [7,8], and then relayed to remote servers by RSUs.
In recent years, VSS has been regarded as a new tool to monitor gas concentration and has attracted increasing attention. Lee et al. [9] pointed out that VSS can be used to collect data when criminals spread poisonous chemicals in flight. Hu et al. [10] proposed exploiting VSS to achieve
• We propose new privacy-preserving data classification and privacy-preserving aggregation algorithms, so that service providers can efficiently compute the mean E(·) and variance Var(·) from aggregation results. In addition, the proposed PAVS captures data accuracy, i.e., the E(·) and Var(·) computed from each aggregation result map to a specific area and time period.
• The proposed PAVS holds the privacy-preserving property. Specifically, it can resist sensing data link attacks. After executing PAVS, RSUs cannot get any valuable information about vehicles, including vehicles' previous location information and real identities.
• The PAVS scheme achieves scalability. If a service provider holds the aggregation results of areas $Area_1, ..., Area_t$, respectively, it can further compute the statistical data of a larger area that consists of $Area_1, ..., Area_t$ by performing aggregation operations, without re-executing the whole PAVS scheme.
• To demonstrate the utility and validate the efficiency of the proposed PAVS, we theoretically analyze its performance in terms of computational cost, communication cost and storage cost. Additionally, we develop a Java simulator to simulate the computational cost on the vehicle side, RSU side and service provider side. The experimental results show that the proposed PAVS is efficient on all three sides.
The rest of the paper is organized as follows. In Section 2, we formalize the system model and security model and identify the design goal. In Section 3, we introduce bilinear pairing, related complexity assumptions, and properties of the group $\mathbb{Z}^*_{p^2}$ as preliminaries. The proposed PAVS scheme is described in Section 4, followed by the security analysis in Section 5 and the performance evaluation in Section 6. The related work is given in Section 7, and we conclude this work in Section 8.

Models and Design Goal
In this section, we formulate the system model, the security model and identify the design goal.

System Model
In VSS, the sensing data are collected by vehicles, transmitted to RSUs, and then transferred to the service providers [6]. In our system model, the service provider further deals with the data and publishes the results of statistical analysis in public. Our model consists of four kinds of entities: trusted authority, service provider, RSUs, and vehicles (as shown in Figure 1).
• Trusted Authority (TA): TA's duty is to manage and distribute key materials to service providers, RSUs, and vehicles in the system.
• Service Provider (SP): SP processes each aggregation result received from an RSU and obtains E(·) and Var(·) for each area.
• RSUs: Each RSU serves as a message aggregator in the system. An RSU aggregates the messages sent from vehicles and forwards the aggregation results to SP. Before executing aggregation operations, RSUs first classify the messages according to where and when the sensing data are collected.

Security Model
In our security model, TA and SP are fully trusted. RSUs, on the one hand, will follow the designated protocol specification. On the other hand, RSUs are curious and may try to disclose vehicles' private information. Specifically, RSUs can obtain all the messages transferred in the protocol. After RSUs obtain all the messages, they may try to decrypt the ciphertexts to get the sensing data and launch sensing data link attacks by linking the messages sent by vehicles with the statistical results.
We will show that PAVS can resist sensing data link attacks by introducing two levels of privacy: basic privacy and full privacy. Specifically, we will prove that PAVS achieves full privacy, which demonstrates that RSUs cannot link the messages sent by vehicles with the statistical results published by SP. Note that collusion between RSUs and SP is beyond the scope of this paper.
Definition 1 (Basic Privacy). When a run of the protocol is completed, RSUs cannot obtain vehicles' real identity information by communicating with vehicles.
Definition 2 (Full Privacy). When a run of the protocol is completed, RSUs cannot obtain vehicles' real identity information or any other information by communicating with vehicles.

Design Goal
Under the aforementioned system model and security model, our design goal is to propose an efficient privacy-preserving data aggregation scheme for VSS, so that SP can obtain richer information from each aggregation result without leaking vehicles' privacy. In particular, the following four objectives should be met:
• Privacy preservation. The private information of vehicles, including previous location information and the real identities of vehicles, should be protected.
• Accuracy. The mean E(·) and variance Var(·) computed from each aggregation result should map to a specific area and time period. Additionally, aggregation results should be generated by real RSUs, and all the sensing data should be collected by registered vehicles.
• Scalability. If SP already holds the aggregation results for some small areas, E(·) and Var(·) for a larger area that consists of these small areas should be efficiently computable without re-executing the whole scheme.
• Efficiency. The computation on the vehicle side, the RSU side and the SP side should be efficient.

Preliminaries
In this section, we introduce bilinear pairing, related complexity assumptions, and properties of the group $\mathbb{Z}^*_{p^2}$, which serve as the basis of our scheme.

Bilinear Pairing and Complexity Assumptions
Let $G$ and $G_T$ be two multiplicative groups of order $q$ for some large prime $q$, and let $g$ be a generator of $G$. A bilinear map $\hat{e}: G \times G \to G_T$ satisfies the following properties:
• Bilinearity: $\hat{e}(g^a, g^b) = \hat{e}(g, g)^{ab}$ for all $a, b \in \mathbb{Z}^*_q$.
• Non-degeneracy: $\hat{e}(g, g) \neq 1$.

Definition 3 (Bilinear Generator). A bilinear parameter generator Gen is a probabilistic algorithm that takes a security parameter $\kappa$ as input and outputs a 5-tuple $(q, g, G, G_T, \hat{e})$, where $q$ is a $\kappa$-bit prime number, $(G, \times)$ and $(G_T, \times)$ are two groups of the same order $q$, $g \in G$ is a generator, and $\hat{e}: G \times G \to G_T$ is an admissible bilinear map.
Definition 4 (Decisional Bilinear Diffie-Hellman (DBDH) Assumption). Let $(q, g, G, G_T, \hat{e})$ be the output of the bilinear parameter generator. Let $g, g^a, g^b, g^c \in G$ and $R \in G_T$ be given, where $a, b, c$ are random elements in $\mathbb{Z}^*_q$ and $R$ is a random element in $G_T$. We say an algorithm $B$ that outputs $l \in \{0, 1\}$ has advantage $\varepsilon$ in solving the DBDH problem in $G$ if
$$\left| \Pr[B(g, g^a, g^b, g^c, \hat{e}(g, g)^{abc}) = 1] - \Pr[B(g, g^a, g^b, g^c, R) = 1] \right| \geq \varepsilon.$$

Given the security parameter $\lambda$, we choose a safe prime $p = 2p' + 1$, where $|p| = \lambda$ and $p'$ is also a prime. Then, we can calculate Euler's totient function $\varphi(p^2)$ as $\varphi(p^2) = p^2(1 - 1/p) = p(p - 1) = 2pp'$. That is, the order of $\mathbb{Z}^*_{p^2}$ is $2pp'$. Let $x \in \mathbb{Z}^*_p$. According to Fermat's Little Theorem, we have $x^{p-1} \equiv 1 \bmod p$. Thus, for some integer $k$, the equality $x^{p-1} = 1 + k \cdot p$ holds. Furthermore, we obtain
$$x^{p(p-1)} = (1 + kp)^p = \sum_{i=0}^{p} \binom{p}{i} (kp)^i \equiv 1 \bmod p^2,$$
since every term with $i \geq 1$ is divisible by $p^2$. Let $y = p + 1$. When $k = 1$, we obtain
$$y^p = (1 + p)^p \equiv 1 \bmod p^2.$$
Thus, we get the following properties of the group $\mathbb{Z}^*_{p^2}$: (1) for any $x \in \mathbb{Z}^*_p$, we have $x^{p(p-1)} = 1 \bmod p^2$; and (2) for $y = p + 1$, the equality $y^p = 1 \bmod p^2$ holds.
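The two properties can be checked numerically on a toy instance. The following is a minimal sketch, assuming a small safe prime chosen only for illustration (a deployment would use a prime of $\lambda$ bits); the function name `check_group_properties` is hypothetical.

```python
def check_group_properties(p):
    """Brute-force check of both properties of Z*_{p^2} for a small prime p."""
    modulus = p * p
    # Property 1: x^(p(p-1)) = 1 mod p^2 for every x in Z*_p.
    prop1 = all(pow(x, p * (p - 1), modulus) == 1 for x in range(1, p))
    # Property 2: (p + 1)^p = 1 mod p^2.
    prop2 = pow(p + 1, p, modulus) == 1
    return prop1, prop2


P_TOY = 23  # toy safe prime (assumption): 23 = 2 * 11 + 1
print(check_group_properties(P_TOY))  # prints (True, True)
```

Property 2 implies that $y = p + 1$ has order $p$ in $\mathbb{Z}^*_{p^2}$, so powers of $y$ can carry values modulo $p$.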

Proposed PAVS Scheme
In this section, we present our PAVS scheme, which mainly consists of the following parts: System Initialization, Data Collection at the vehicle, Data Aggregation at RSU, and Statistical Analysis at SP.

Overview
In the System Initialization phase, TA mainly executes the Parameter Generation algorithm to generate public parameters and the Key Generation algorithm to generate key materials for vehicles, RSUs and SP.
In the Data Collection phase, the vehicles will encrypt the sensing data by performing a Data Encryption algorithm and sign the ciphertexts by running a Message Signing algorithm. After that, the vehicles will send the messages to RSUs.
In the Data Aggregation phase, RSUs will classify the messages according to where and when the sensing data are collected, and aggregate the data that are collected in the same area and the same time period. Then, RSUs send the aggregation results to SP.
In the Statistical Analysis phase, SP will decrypt the aggregation results and get the mean E(·) and variance Var(·) for each area.
In the vehicle sensing system, the sensing data are first collected by vehicle sensors, transferred to RSUs by vehicle wireless transmitters via VANET, and then relayed to remote servers by RSUs. Because the data may not all arrive at the aggregation point at the same time, time stamps are included in PAVS. RSUs classify the messages according to the time stamps and the areas where the data are sensed; that is, only data with the same time stamp and collected in the same area are aggregated together. Finally, SP computes the statistical data, i.e., E(·) and Var(·), from each aggregation result, which maps to a specific area and time stamp.

System Initialization
This phase is mainly comprised of the Parameter Generation algorithm, the Key Generation algorithm, and the List Generation algorithm.
Parameter Generation (PG): On input security parameter $\lambda$, TA publishes system parameters p is a generator of $\mathbb{Z}^*_{p^2}$; and $(q, g, G, G_T, \hat{e})$ is the output of the bilinear parameter generator.
Key Generation (KG): On input system parameters, TA generates its secret key $s_0$, its master private key $s$, area key $k_0$, and public parameter $P_{pub}$, where $s_0, s, k_0 \in \mathbb{Z}^*_q$ and $P_{pub} = g^s$. Then, the following steps are executed:
1: TA computes the private key $S_{L_j}$ for each RSU $R_j$, $j \in \{1, ..., \alpha\}$, where $S_{L_j} = H_1(L_j || R_j)^s$, $R_j$ is the label of the RSU, and $L_j$ is the location of $R_j$.
2: TA generates the pseudo-identity $PID_i$ for each vehicle $V_i$, where AES is the symmetric encryption algorithm and $s_0$ is used to generate the symmetric encryption key.
3: After TA authenticates $V_i$'s real identity, TA generates $V_i$'s private key $s_i \in \mathbb{Z}^*_q$ and computes $V_i$'s public key $g^{s_i}$ and authority key $g^{s_i r_i}$.
4: TA transfers $s_i$, $k_0$ and $PID_i$ to $V_i$, transfers $S_{L_j}$ to $R_j$, and sends $k_0$, $\{PID_1, ..., PID_\beta\}$ and $\{g^{s_1 r_1}, ..., g^{s_\beta r_\beta}\}$ to SP.
List Generation (LG): TA generates the vehicles' public key list (as shown in Table 1), the area list (as shown in Table 2), the RSU private key list (as shown in Table 3), the random value lists (as shown in Table 4, R-value list-1, and Table 5, R-value list-2), and the vehicle authority key list (as shown in Table 6, A-key list). The vehicles' public key list, the area list, and R-value list-2 are public; the A-key list is maintained secretly by TA and SP; and the RSU private key list and R-value list-1 are kept secretly by TA.
Table 1. Public key list. Table 3. Private key list. Table 4. R-value list-1. Table 5. R-value list-2. Table 6. A-key list (columns: PID, A-key).

The communications between TA and each vehicle, between TA and SP, and between TA and each RSU are all via private and authenticated channels. TA's secret key $s_0$ is used to generate vehicles' PIDs. TA uses its master private key $s$ to generate RSUs' private keys. The area key $k_0$ is also known by vehicles and SP, and SP utilizes $k_0$ to recover the area. By using R-value list-1, TA can recover the real identity $v_i$ according to $PID_i$. R-value list-2 is used by vehicles to encrypt sensing data. The A-key list is utilized by SP to compute E(·) and Var(·).

Data Collection at Vehicle
After the vehicle $V_i$ collects sensing data, $V_i$ executes the following Data Encryption, Message Generation and Message Signing algorithms.
Data Encryption (DE): Assume that $V_i$ collects $m_{i1}, ..., m_{i\theta_i}$ in $Area_1, ..., Area_{\theta_i}$, respectively, during the same time period. Let where $t$ is the number of areas and $\beta$ is the number of registered vehicles in the system. In order that SP can compute E(·) and Var(·) of the sensing data of $Area_r$, $V_i$ encrypts $m_{ir}$ as follows: where $T_i$ is the time stamp.

Remark 2.
Note that the data may not be able to arrive at the data aggregation at the same time; therefore, time stamps are included in PAVS. Thus, RSUs classify the messages according to the time stamps and the area where the data are sensed. That is, only the data with the same time stamp and collected in the same area will be aggregated together. The unit of time stamp is set by TA. In real life, the unit of time stamp can be an hour or half an hour. For simplicity, the time stamp is denoted as T i . That is, the subscript of T is denoted as i. In fact, the subscript can be set as any variable, since the time stamp is not related to the identities of the vehicles.
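As an illustration of Remark 2, a raw collection time can be mapped to a time stamp by integer division by the unit. This is a minimal sketch under assumptions: the function name `time_slot`, the use of UTC epoch seconds, and the default one-hour unit are illustrative choices, not details given in the scheme.

```python
from datetime import datetime, timezone


def time_slot(collected_at, unit_seconds=3600):
    """Map a collection time to the index of the time-stamp unit containing it."""
    return int(collected_at.timestamp()) // unit_seconds


early = datetime(2017, 1, 1, 10, 5, tzinfo=timezone.utc)
late = datetime(2017, 1, 1, 10, 55, tzinfo=timezone.utc)

# With a one-hour unit both readings share a time stamp and may be aggregated;
# with a half-hour unit they fall into different time stamps.
print(time_slot(early) == time_slot(late))              # True
print(time_slot(early, 1800) == time_slot(late, 1800))  # False
```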
Message Generation (MG): The messages sent from vehicle $V_i$ should include the PID of $V_i$, so that RSUs can retrieve the public key from the public key list to verify the signature generated by $V_i$. In addition, $a_{ir}$ is randomly chosen in $\mathbb{Z}^*_q$, $r \in [1, \theta_i]$, and $Area_r$ is the area where $m_{ir}$ is collected. Here, $M_{ir}$ includes two messages $H_1(Area_r || T_i || k_0)^{a_{ir}}$ and $g^{a_{ir}}$, which are used by RSUs to classify the ciphertexts. [32]. After that,

Data Aggregation at RSUs
After RSU $R_j$ receives messages containing $\langle H_1(Area_{\theta_i} || T_i || k_0)^{a_{i\theta_i}}, g^{a_{i\theta_i}}, C_{i\theta_i} \rangle$ and $\langle H_1(Area_{\theta_l} || T_l || k_0)^{a_{l\theta_l}}, g^{a_{l\theta_l}}, C_{l\theta_l} \rangle$, $R_j$ will first recover $T_i$ and $T_l$ to check whether $T_i = T_l$ holds. If $T_i = T_l$ is satisfied, $R_j$ will verify whether the following equality holds:
$$\hat{e}(H_1(Area_{\theta_i} || T_i || k_0)^{a_{i\theta_i}}, g^{a_{l\theta_l}}) = \hat{e}(H_1(Area_{\theta_l} || T_l || k_0)^{a_{l\theta_l}}, g^{a_{i\theta_i}}).$$
If so, we have $Area_{\theta_i} = Area_{\theta_l}$. That is, $m_{i\theta_i}$ and $m_{l\theta_l}$ are collected in the same area.
Data Accuracy. We can see that the scheme achieves data accuracy, since each aggregation result maps to a specific area and time period. Specifically, (1) in the Data Classification phase, RSUs classify the data according to where and when the data were collected, which means that only data collected in the same area and during the same time period are aggregated together; (2) the aggregation results are signed by RSUs and then sent to SP, so the aggregation results are generated by real RSUs and not by impersonators; and (3) all the sensing data are generated by registered vehicles.
$R_j$ will verify whether $M_i$ is valid by checking whether $PID_i$ is included in the public key list (as shown in Table 1) and verifying $\sigma_i$ with the public key corresponding to $PID_i$. If $PID_i$ is not on the list or the signature is not valid, $M_i$ is discarded.
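The classification step can be sketched in a toy model. The code below assumes that the RSU's test cross-pairs the two tags, i.e., it compares $\hat{e}(H_1(\cdot)^{a_i}, g^{a_l})$ with $\hat{e}(H_1(\cdot)^{a_l}, g^{a_i})$. To stay self-contained it does not use a real pairing library: each group element $g^x$ is stored as its exponent $x$, so the "pairing" is multiplication of exponents. This exposes every discrete logarithm and is not secure; it only demonstrates the algebra. All names (`h1`, `make_tag`, `same_area`, `classify`) and the modulus are hypothetical.

```python
import hashlib
import secrets

# Toy group order (assumption): a Mersenne prime standing in for q.
Q = 2**127 - 1


def h1(area, time_stamp, k0):
    """Stand-in for H_1(Area || T || k_0), returned as a nonzero exponent."""
    digest = hashlib.sha256(f"{area}|{time_stamp}|{k0}".encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def pairing(x, y):
    """Toy pairing: e(g^x, g^y) = e(g, g)^(x*y), stored as the exponent x*y."""
    return x * y % Q


def make_tag(area, time_stamp, k0):
    """Vehicle side: return (H_1(...)^a, g^a) for a fresh random a."""
    a = secrets.randbelow(Q - 1) + 1
    return (h1(area, time_stamp, k0) * a % Q, a)


def same_area(tag_i, tag_l):
    """RSU side: cross-pair the two tags; they agree when the hashed inputs match."""
    (h_i, g_i), (h_l, g_l) = tag_i, tag_l
    return pairing(h_i, g_l) == pairing(h_l, g_i)


def classify(tags):
    """Group tag indices, comparing each tag with one representative per group."""
    groups = []
    for index, tag in enumerate(tags):
        for group in groups:
            if same_area(tags[group[0]], tag):
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


k0 = 42  # hypothetical area key
tags = [make_tag("Area_1", 7, k0), make_tag("Area_2", 7, k0), make_tag("Area_1", 7, k0)]
print(classify(tags))
```

Because each incoming tag is compared with a single representative per group, the number of pairing checks grows with the number of messages times the number of distinct areas rather than with the square of the number of messages.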
1: RSU $R_j$ aggregates $(C_{1r}, C_{2r}, ..., C_{kr})$ by computing $C_r = \prod_{i=1}^{k} C_{ir}$, and then we obtain
2: Let $B_r = (L_j, PID_1, PID_2, ..., PID_k, T_l, M_{1r}, C_r)$. $R_j$ generates an ID-based signature $\sigma_r = (u_r, v_r)$ [33] on $B_r$ by using its private key $S_{L_j} = H_1(L_j || R_j)^s$, where $u_r = H_1(L_j || R_j)^a$, $a$ is randomly chosen from $\mathbb{Z}^*_q$, and $v_r = S_{L_j}^{a + H(B_r, u_r)}$. Afterwards, $R_j$ sends $(B_r, u_r, v_r)$ to SP.
Remark 3. $M_{1r}$ is included in $B_r$, where the format of $M_{1r}$ is $(H_1(Area_r || T_l || k_0)^{a_{1r}}, g^{a_{1r}}, C_{1r}, T_l, PID_1)$. Thus, SP can recover $Area_r$ from $M_{1r}$ in the following Statistical Analysis phase by using $H_1(Area_r || T_l || k_0)^{a_{1r}}$ and $g^{a_{1r}}$. Since only the sensing data collected in the same area are aggregated, SP can conclude that all sensing data are collected in $Area_r$.
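Property 2 of the group $\mathbb{Z}^*_{p^2}$ explains why multiplying ciphertexts can yield a sum of readings. The sketch below illustrates only this homomorphism and is not the PAVS ciphertext: readings are encoded as $(1 + p)^m \bmod p^2$ with no keys or blinding, so nothing is hidden. The toy prime and the function names are assumptions.

```python
from math import prod

P = 2039         # toy safe prime (assumption): 2039 = 2 * 1019 + 1
MODULUS = P * P  # arithmetic is done in Z*_{p^2}


def encode(m):
    """Encode a reading as (1 + p)^m mod p^2, which equals 1 + m*p mod p^2."""
    return pow(P + 1, m, MODULUS)


def aggregate(encoded):
    """Multiply the encoded readings, mirroring C_r = prod C_ir."""
    return prod(encoded) % MODULUS


def decode_sum(c):
    """Recover the sum of the readings; valid only while the sum is below p."""
    return (c - 1) // P


readings = [12, 30, 7]
c_sum = aggregate([encode(m) for m in readings])
c_sq = aggregate([encode(m * m) for m in readings])
print(decode_sum(c_sum), decode_sum(c_sq))  # prints 49 1093
```

Encoding the squares alongside the readings is one hypothetical way for the aggregate to carry enough information for a variance; the source does not spell out how this is done.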

Statistical Analysis at SP
After SP receives the messages (B r , u r , v r ), SP will verify if B r is valid by performing the Data Verification algorithm. If B r is valid, SP will execute the Area Recovery algorithm to recover Area r , and run the Data Decryption algorithm to decrypt C r and compute E(·) and Var(·) in Area r .
Data Verification (DV): After SP receives $(B_r, u_r, v_r)$, SP verifies whether the following equality holds:
$$\hat{e}(g, v_r) \stackrel{?}{=} \hat{e}(P_{pub}, u_r \times H_1(L_j || R_j)^{H(B_r, u_r)}).$$
If so, (u r , v r ) is a valid signature of B r . SP concludes that B r is generated by R j .
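The signing and verification equations can be checked for consistency in a toy model. As a loud assumption, the sketch stores each group element $g^x$ as its exponent $x$, so a product of group elements becomes a sum, an exponentiation becomes a product, and the pairing becomes multiplication of exponents. In this model $P_{pub}$ literally equals the master key, so the code is insecure and demonstrates only that a correctly formed $(u_r, v_r)$ passes the check. All function names are hypothetical.

```python
import hashlib
import secrets

Q = 2**127 - 1  # toy group order (assumption): a Mersenne prime


def hash_to_group(label):
    """Stand-in for H_1(L_j || R_j), returned as a nonzero exponent."""
    digest = hashlib.sha256(b"H1|" + label.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def hash_to_scalar(message, u):
    """Stand-in for H(B_r, u_r)."""
    digest = hashlib.sha256(b"H|" + message + u.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % Q


def pairing(x, y):
    """Toy pairing on exponents: e(g^x, g^y) -> x*y mod Q."""
    return x * y % Q


def extract_key(master_s, rsu_label):
    """TA side: S_Lj = H_1(L_j || R_j)^s."""
    return hash_to_group(rsu_label) * master_s % Q


def sign(private_key, rsu_label, message):
    """RSU side: u = H_1(...)^a and v = S_Lj^(a + H(B, u))."""
    a = secrets.randbelow(Q - 1) + 1
    u = hash_to_group(rsu_label) * a % Q
    v = private_key * (a + hash_to_scalar(message, u)) % Q
    return u, v


def verify(p_pub, rsu_label, message, u, v):
    """SP side: check e(g, v) == e(P_pub, u * H_1(...)^H(B, u))."""
    h = hash_to_scalar(message, u)
    lhs = pairing(1, v)  # g is stored as exponent 1
    rhs = pairing(p_pub, (u + hash_to_group(rsu_label) * h) % Q)
    return lhs == rhs


s = secrets.randbelow(Q - 1) + 1  # master private key; P_pub = g^s is stored as s
key = extract_key(s, "L_1||R_1")
u, v = sign(key, "L_1||R_1", b"B_r")
print(verify(s, "L_1||R_1", b"B_r", u, v))       # True
print(verify(s, "L_1||R_1", b"tampered", u, v))  # False
```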
If, for some Area i , the equality holds, then SP concludes that all the data aggregated in B r are collected in Area i .
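For the last step of the Statistical Analysis phase, the statistics follow from aggregated sums. The relation below is a sketch under the assumption that decryption yields both $\sum_{i=1}^{k} m_{ir}$ and $\sum_{i=1}^{k} m_{ir}^2$ for the $k$ readings aggregated in $Area_r$; the variance shown is the population variance.

```latex
E(m) = \frac{1}{k} \sum_{i=1}^{k} m_{ir},
\qquad
\mathrm{Var}(m) = \frac{1}{k} \sum_{i=1}^{k} m_{ir}^{2} - E(m)^{2}
```

For example, for three readings 12, 30 and 7, the sums are 49 and 1093, so $E(m) = 49/3 \approx 16.33$ and $\mathrm{Var}(m) = 1093/3 - (49/3)^2 = 878/9 \approx 97.56$.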

Security Analysis
Following aforementioned security requirements, our analysis will focus on how the proposed PAVS scheme can achieve the vehicles' privacy-preserving property.
Assume vehicle $V_i$ collects sensing data in different areas and submits ciphertexts to RSU $R_j$, and $R_j$ classifies and aggregates the messages and sends them to SP. We will show that the proposed PAVS scheme can resist sensing data link attacks by showing that it achieves full privacy, which means that $R_j$ will not get any valuable information from vehicles. To prove that the proposed scheme achieves full privacy, we use a game sequence [34,35] to show that $R_j$ cannot distinguish the messages $M_i$ generated by vehicle $V_i$ from random strings. A game sequence is used because it is a useful tool for taming the complexity of security proofs that might otherwise become so complicated as to be nearly impossible to verify [35]. In our security proof, the attack games are played between an RSU $R_j$ and a challenger. Both $R_j$ and the challenger are probabilistic processes. In the proof, Game 0 and Game 1 are constructed, where Game 0 is the original attack game. If $R_j$ cannot distinguish Game 0 and Game 1, we can conclude that it cannot distinguish the messages generated by a vehicle from random strings. The challenger generates private keys for $n$ vehicles so that it can act as real vehicles.
Game 0. If $R_j$ submits $(T_l, PID_l)$ to the challenger, the challenger will choose $(m_{l1}, ..., m_{l\theta_l})$ randomly as sensing data, answer $R_j$'s query by executing the scheme normally, and return the messages generated by $V_l$ to $R_j$.
At some point, $R_j$ submits $(T_i, PID_i)$, where $T_i$ has not been queried before. (If $T_i$ has been queried, $R_j$ may verify whether $M^*_0$ and $M^*_1$ are generated by real vehicles through executing the Data Classification algorithm; however, $R_j$ still cannot get any valuable information.) The challenger generates two messages $M^*_0$ and $M^*_1$. Here, $\omega_1, \omega'_1, \omega_2, \omega'_2, ..., \omega_{\theta_i}, \omega'_{\theta_i}$ are all random strings, and $C_{i1}, ..., C_{i\theta_i}$ are generated by performing the scheme normally. The challenger selects a random $b \in \{0, 1\}$ uniformly, and then sends $M^*_b$ to $R_j$. $R_j$ will return 0 if it thinks that the whole message is generated by a real vehicle $V_i$. Otherwise, $R_j$ returns 1. We say $R_j$ can win Game 0 with advantage $Adv_{G0}(R_j)$.
Game 1. When $R_j$ submits $(T_l, PID_l)$ to the challenger, the challenger will choose $(m_{l1}, ..., m_{l\theta_l})$ randomly as sensing data, answer $R_j$'s query by executing the scheme normally, and return the messages generated by $V_l$ to $R_j$.
At some point, assume $R_j$ queries on $(T_i, PID_i)$, where $T_i$ or $PID_i$ has not been queried before. The challenger generates two messages $M^*_0$ and $M^*_1$. Here, $\alpha_1, ..., \alpha_{\theta_i}$, $\alpha'_1, ..., \alpha'_{\theta_i}$, $\alpha''_1, ..., \alpha''_{\theta_i}$, $\beta_1, ..., \beta_{\theta_i}$, $\beta'_1, ..., \beta'_{\theta_i}$ are all random strings. The challenger selects a random $b \in \{0, 1\}$ uniformly and then sends $M^*_b$ to $R_j$. $R_j$ will return 0 if it thinks that the messages include some information generated by the real vehicle $V_i$. Otherwise, $R_j$ will return 1. We say $R_j$ can win Game 1 with advantage $Adv_{G1}(R_j)$. If the advantages with which $R_j$ wins Game 0 and Game 1 are both negligible, we can conclude that $R_j$ cannot get any valuable information.
We conclude that $R_j$ cannot distinguish the messages generated by registered vehicles from random strings. Firstly, the advantage $Adv_{G0}(R_j)$ with which $R_j$ wins Game 0 is negligible. That is, $R_j$ cannot distinguish $(H_1(Area_k || T_i || k_0)^{a_{ik}}, g^{a_{ik}})$ from $(\omega_k, \omega'_k)$. Let $h = g^{a_{ik}}$. Then, $H_1(Area_k || T_i || k_0)$ can be denoted as $h^{b_k}$ for some unknown $b_k$. Similarly, $\omega_k$ can be denoted as $(\omega'_k)^{c_k}$ for some unknown $c_k$. Since $h$, $\omega'_k$, $b_k$, $c_k$ are all random elements, the two pairs are indistinguishable to $R_j$.
Secondly, the advantage $Adv_{G1}(R_j)$ with which $R_j$ wins Game 1 is negligible. Assume that the challenge is to break a DBDH problem instance, i.e., to distinguish $c_0$ and $c_1$ given $g^x, g^y, g^z$, where $c_0 = \hat{e}(g, g)^{xyz}$, $x, y, z \in \mathbb{Z}^*_q$, and $c_1$ is a random element in $G_T$. The challenger sets $V_i$'s public key $g^{s_i}$ as $g^y$, and $g^{r_i}$ as $g^z$. In Game 1, $H_1$ is treated as a random oracle [36]. The output of $H_1(T_i, PID_i)$ is set as $g^x$. Specifically, the challenger generates $\langle C_{i1}, ..., C_{i\theta_i} \rangle$ for $R_j$. $R_j$ returns a bit $b$ and guesses that $M^*_b$ is generated by vehicle $V_i$. If $R_j$ can distinguish a valid ciphertext from a random string with a non-negligible advantage $\varepsilon$, then the challenger can break the DBDH assumption with non-negligible advantage.
Therefore, we can conclude that R j cannot get any valuable information from the messages generated by registered vehicles. Thus, the proposed PAVS scheme captures full privacy, and can resist a sensing data link attack.

Performance Evaluation
In this section, we evaluate the performance of the proposed PAVS scheme in terms of computational cost, communication cost, and storage cost. To ease the presentation, we give the corresponding notations in Table 7.
Table 7. Notations.
Notation     Description
$S_q$        Bit size of an element in $\mathbb{Z}^*_q$
$S_{p^2}$    Bit size of an element in $\mathbb{Z}^*_{p^2}$

Theoretical Analysis
According to the proposed PAVS scheme, the computational cost, communication cost and storage cost at vehicle side, RSU side, and SP side will be analyzed in this section.

Computational Cost
In the proposed PAVS scheme, a vehicle needs to encrypt each piece of sensing data. Additionally, the vehicle generates a signature. Since the vehicle can choose a generic signature scheme to sign the messages, the performance of the signature is not analyzed here. To encrypt each piece of sensing data, vehicle $V_i$ needs to perform one pairing operation, five exponentiations, and three multiplication operations.
If RSU $R_j$ receives $n_{dr}$ encrypted sensing data collected from $n_a$ different areas, it will classify the data. Assume $m_c$ and $m_d$ are collected in the same area. For any message $m_e$, when checking whether $m_e$ is collected in the same area as $m_c$ and $m_d$, once $Area_c = Area_e$ has been verified it is not necessary to verify whether $Area_d = Area_e$ holds. Therefore, $R_j$ will execute at most $O(n_{dr} n_a)$ pairing operations to achieve data classification. For $k$ ciphertexts, $R_j$ aggregates them by executing $(k - 1)$ multiplication operations.
Assume SP receives $n_{ds}$ aggregation results and all the sensing data are collected in $n_a$ areas. SP will execute at most $O(n_{ds} n_a)$ pairing operations to recover the areas. SP executes $k$ multiplication operations and two exponentiation operations to compute D. In addition, SP needs to use Pollard's method to recover $\sum_{i=1}^{k} m_{ir}$ and performs one multiplication operation to calculate D.
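Recovering a bounded sum from a group element of the form $g^M$ is a discrete logarithm search over a small interval. The source names Pollard's method for this step; the sketch below swaps in baby-step giant-step, a different generic algorithm with comparable square-root cost, because it is shorter to state. The modulus, the base and the function name `bounded_dlog` are toy assumptions, and the code needs Python 3.8 or later for modular inverses via `pow`.

```python
from math import isqrt


def bounded_dlog(g, h, modulus, bound):
    """Return M in [0, bound] with g**M % modulus == h, or None if there is none."""
    m = isqrt(bound) + 1
    # Baby steps: remember g^j for j = 0 .. m-1.
    baby = {}
    current = 1
    for j in range(m):
        baby.setdefault(current, j)
        current = current * g % modulus
    # Giant steps: test h * g^(-m*i) against the table for i = 0 .. m.
    step = pow(g, -m, modulus)
    gamma = h % modulus
    for i in range(m + 1):
        if gamma in baby:
            candidate = i * m + baby[gamma]
            if candidate <= bound:
                return candidate
        gamma = gamma * step % modulus
    return None


MODULUS = 2039  # toy safe prime (assumption): 2039 = 2 * 1019 + 1
G = 4           # a square modulo 2039, hence of prime order 1019
total = 777     # hypothetical aggregated sum hidden in the exponent
print(bounded_dlog(G, pow(G, total, MODULUS), MODULUS, bound=1000))  # prints 777
```

The search costs about $\sqrt{bound}$ group operations and table entries, which is why such a recovery is practical only when the aggregated sum is known to lie in a modest range.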

Communication Cost
In the system initialization phase, TA will send long-term secrets to vehicles, RSUs and SP. After vehicles encrypt the sensing data, they will transfer the ciphertexts and a signature to RSUs. After that, RSUs will send messages to SP. The corresponding communication cost is listed in Table 8.

Table 8. Communication cost.
Entity     Communication Cost
TA         $(n_r + n_v)S_g + 2n_v S_{id} + (2n_v + 1)S_q$
Vehicle    $S_r + n_{dr} S_{id} + S_t + 2S_\eta + 2S_q$
RSU        $n_{dr}(2S_g + S_\eta) + S_t + S_{id} + S_g$

Storage Cost
The storage cost is related to the system initialization phase. The storage overhead at TA is $n_r S_g + (3 + 2n_v)S_q + 2n_v S_g + tS_a + n_v S_{id}$, where $n_r S_g + (3 + 2n_v)S_q$ is the cost to store long-term secrets for vehicles, RSUs, SP and TA itself, and $2n_v S_g + tS_a + n_v S_{id}$ is the cost to store the public lists. The storage overhead at vehicle $V_i$ is $2S_q + S_{id}$, at an RSU is $S_g$, and at SP is $n_v(S_{id} + S_g) + S_q$, as shown in Table 9.

Table 9. Storage cost.
Entity     Storage Cost
TA         $n_r S_g + (3 + 2n_v)S_q + 2n_v S_g + n_a S_a + n_v S_{id}$
Vehicle    $2S_q + S_{id}$
RSU        $S_{rsu} + S_g$
SP         $n_v(S_{id} + S_g) + S_q$

Experimental Simulation

Implementation and Experimental Settings
The performance of PAVS is independent from the security parameters and the number of hash functions. Accordingly, Table 10 shows the parameter settings. The experiment is run on a test machine with an Intel(R) Core(TM) i5-4200U 1.6 GHz four-core processor, 8 GB RAM, and a Windows 8 platform, based on a Java Pairing-Based library [37].
A vehicle $V_i$ needs to encrypt the sensing data, generate $M_i$ and sign $M_i$. Thus, in the experiments, the computational cost of $V_i$ is measured as the total runtime of the encryption, message generation, and signature generation algorithms on the vehicle side. On the vehicle side, the amount of sensing data $n_{dv}$ varies from 10 to 100. The trend of the computational cost on the vehicle side is shown in Figure 2. The computational cost is 1.235 s when $n_{dv}$ is 10 and 11.141 s when $n_{dv}$ equals 100. Therefore, the algorithms for vehicles are efficient enough.
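As a quick reading of the two reported measurements, dividing each total by the amount of sensing data gives the average cost per piece of data; the symbol $t_{item}$ is introduced here only for this calculation.

```latex
t_{item}(n_{dv} = 10) = \frac{1.235\ \mathrm{s}}{10} \approx 0.124\ \mathrm{s},
\qquad
t_{item}(n_{dv} = 100) = \frac{11.141\ \mathrm{s}}{100} \approx 0.111\ \mathrm{s}
```

The per-item cost stays at roughly 0.11 to 0.12 s across a tenfold increase in $n_{dv}$, which is consistent with a vehicle-side cost that grows approximately linearly in the amount of sensing data.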

RSU's Computational Cost
On the RSU side, the RSUs need to verify whether the received messages are valid, classify the messages and aggregate the messages. Therefore, the computational cost of an RSU is measured by the total runtime of the Message Verification, Data Classification, and Data Aggregation algorithms. In the proposed PAVS, the performance of data classification depends not only on the number of messages received by RSUs but also on the number of areas that the vehicles pass through. That is, with different numbers of vehicles and areas, the computational cost of an RSU will be different. Thus, we set the number of vehicles $n_v$ to {5, 10, ..., 50} and the number of areas the vehicles pass through $n_a$ to {1, 2, ..., 10}. As shown in Figure 3, although increasing $n_v$ and $n_a$ increases the computational cost of the RSU, the maximum running time is less than 48 s. Therefore, PAVS is efficient on the RSU side, since the computation need not be performed in real time.

SP's Computational Cost
On the SP side, SP will verify if B r is valid, recover Area r , and decrypt C r . Therefore, the computational cost on the SP side is measured by the total runtime including the Data Verification algorithm, Area Recovery algorithm, and Data Decryption algorithm.
On the SP side, the number of vehicles $n_v$ and the number of areas $n_a$ that the vehicles pass through are still the two core parameters. Accordingly, $n_v$ is chosen from 5 to 50 and $n_a$ is chosen from 1 to 10 to measure the computational overhead in different situations. The results are shown in Figure 4. Even as $n_v$ and $n_a$ increase, the running time for SP to obtain E(·) and Var(·) remains below 36 s, which is also acceptable.

Scalability
Assume SP has received the aggregation results $C_1, C_2, ..., C_n$ of $Area_1, Area_2, ..., Area_n$, respectively. If SP wants the statistical data E(·) and Var(·) of a larger area that includes some of $Area_1, Area_2, ..., Area_n$, SP can compute the new E(·) and Var(·) without re-executing the whole scheme.
For instance, SP can get E(·) and Var(·) of a larger area that consists of $Area_1$, $Area_2$, $Area_3$, and $Area_4$ (as shown in Figure 5) by further aggregating $C_1, C_2, C_3, C_4$ and then executing Step 2, Step 3 and Step 4 of the Data Decryption algorithm.
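The reason no re-execution is needed can be seen on plaintext summaries. The sketch below assumes that each area's aggregate boils down to a triple (count, sum, sum of squares); it works on plain numbers rather than ciphertexts, and the function names are hypothetical.

```python
from fractions import Fraction


def summarize(readings):
    """Per-area summary: (count, sum, sum of squares)."""
    return (len(readings), sum(readings), sum(m * m for m in readings))


def merge(summaries):
    """Summary of the larger area: add the per-area summaries componentwise."""
    return tuple(sum(part) for part in zip(*summaries))


def mean_and_variance(summary):
    """Mean and population variance from a (count, sum, sum of squares) triple."""
    count, total, total_sq = summary
    mean = Fraction(total, count)
    return mean, Fraction(total_sq, count) - mean * mean


areas = [[12, 30], [7], [20, 21, 22]]  # hypothetical readings of three small areas
merged = merge([summarize(r) for r in areas])
print(merged)                     # prints (6, 112, 2418)
print(mean_and_variance(merged))  # prints (Fraction(56, 3), Fraction(491, 9))
```

Merging the summaries gives the same statistics as recomputing them from all readings pooled together, which is the property the scalability argument relies on.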

Related Works
In this section, we will mainly explore some of the existing work about VSS, since we propose a privacy-preserving data aggregation scheme for VSS.
In Ref. [10], Hu et al. constructed a VSS to monitor the concentration of carbon dioxide (CO 2 ) gas. The VSS can collect CO 2 concentration in a large field. Then, the collected data are reported to a remote server. The authors monitored the CO 2 concentration in Hsin-Chu city, Taiwan, and the data are displayed on a Google map. However, the authors did not consider security issues in their scheme.
In Ref. [38], the authors proposed deploying mobile agents to collect sensor data from some specific road segments. The mobile agent moves among vehicles and communicates with the neighbour vehicles via wireless broadcast which may not reach all the vehicles in the given segment. In order to solve the problem, they proposed an agent-based data collection scheme that can help achieve close to 100% data collection rate. Similarly, in order to enhance sensing coverage, Masutani [39] proposed a route control method. The simulation experiment shows that the sensing coverage can be enhanced significantly without increasing the number of sensing vehicles.
In contrast to Refs. [38,39], Zhang et al. [40] formulated the problem of maximizing coverage quality under a budget constraint. They proposed a new algorithm that selects a subset of mobile users to maximize the coverage quality. The results of the simulation experiments showed that their algorithm achieved better performance than a random selection scheme.
Freschi et al. proposed a data aggregation method [41] to monitor the roughness of road surfaces. In addition, a series of data aggregation schemes [17][18][19][20] have been proposed. However, security issues are not considered in these studies. In Ref. [42], the proposed scheme achieves authentication and integrity of aggregation data by aggregating the data and the message authentication codes. In order to tolerate duplicate messages, they also presented a probabilistic data aggregation scheme. However, privacy-preservation is not considered in [42].
Wu et al. proposed a hybrid routing scheme in urban hybrid networks [43]. They firstly presented a location-based crowd sensing framework. Then, they constructed a routing switch mechanism by utilizing ad hoc solutions and RSU resources to guarantee quality of data dissemination. In Ref. [44], the authors proposed a broadcast protocol that can support dense and sparse traffic regimes.
Lee et al. [9] proposed MobEyes to support urban monitoring. In MobEyes, vehicle-local processing capabilities are utilized to extract features, and mobile agents move and collect summaries from mobile nodes. If the agents identify data of interest, they contact the vehicles involved. In Ref. [45], Lee et al. further described MobEyes. They introduced the analytic model for MobEyes performance, the effects of concurrent execution of multiple harvesting agents, the valuation network overhead, and so on. As with the studies above, privacy issues are not addressed in their work.

Conclusions
In this paper, we have proposed PAVS, an efficient privacy-preserving data aggregation scheme for VSS. Compared with existing schemes, the proposed PAVS scheme can compute statistical data from aggregated encrypted data. To realize PAVS, we have designed concrete privacy-preserving data classification and privacy-preserving aggregation algorithms. Detailed analysis shows that it can resist sensing data link attacks and achieves data accuracy and scalability. The efficiency of PAVS has been evaluated with theoretical analysis and experiments. Through extensive performance evaluations, we have demonstrated that the proposed PAVS scheme is efficient on the SP, RSU, and vehicle sides.