Achieve Location Privacy-Preserving Range Query in Vehicular Sensing

Modern vehicles are equipped with a plethora of on-board sensors and large on-board storage, which enables them to gather and store various local-relevant data. However, the wide application of vehicular sensing has its own challenges, among which location-privacy preservation and data query accuracy are two critical problems. In this paper, we propose a novel range query scheme, which helps the data requester to accurately retrieve the sensed data from the distributive on-board storage in vehicular ad hoc networks (VANETs) with location privacy preservation. The proposed scheme exploits structured scalars to denote the locations of data requesters and vehicles, and achieves the privacy-preserving location matching with the homomorphic Paillier cryptosystem technique. Detailed security analysis shows that the proposed range query scheme can successfully preserve the location privacy of the involved data requesters and vehicles, and protect the confidentiality of the sensed data. In addition, performance evaluations are conducted to show the efficiency of the proposed scheme, in terms of computation delay and communication overhead. Specifically, the computation delay and communication overhead are not dependent on the length of the scalar, and they are only proportional to the number of vehicles.


Introduction
Nowadays, the fast development of automotive industry and the wide deployment of on-board sensors have created a huge opportunity of vehicular sensing [1]. Other than protecting the normal operations of vehicles, the data generated by on-board sensors (e.g., chemical spill detectors, vibration sensors, and acoustic detectors [2]) can also provide unprecedented spatial-temporal coverage and witness unpredictable incidents with virtually zero investment in the deployment and maintenance of fixed surveillance infrastructures [3]. Meanwhile, since modern vehicles are normally not constrained by energy supply and equipped with adequate on-board storage (up to terabytes), they can keep the continuously harvested data in their on-board storage, which avoids the network congestion caused by the sensed data uploading. Motivated by the practical profits of vehicular sensing, various vehicular sensing based applications have appeared. For example, MobEyes [2] proposes a proactive urban monitoring application with the data collected from cameras and chemical detection sensors, a road surface condition detection application [4] is devised by opportunistically gathering data from vibration and GPS sensors. Meanwhile, to attract sufficient participants to join in the sensing process, an incentive mechanism for mobile crowdsensing has been devised in [5].
However, the wide application of vehicular sensing has met several challenges [6][7][8]. The first challenge is related to the location privacy of data requesters and data uploading vehicles. To retrieve the wanted data from the distributive on-board storage, each data query should specify the corresponding time and location. However, if the location information of a data query is disclosed, it may bring social reputation or economic damage to the querying location. Meanwhile, since vehicles are dynamically moving, their data reports should also be spatial-temporal tagged. The location information of a vehicle can be correlated with certain personal affairs (such as churches and hospitals), or it may reveal the identities of the vehicle owners (such as residences and offices). Without location privacy preservation, data requesters and vehicles are reluctant to issue data queries and upload sensed data, which leads to the under utilization of vehicular sensing. Thus, the location privacy of data requesters and vehicles should be preserved.
The second challenge is related to the accuracy of data query results. Due to the sheer volume of on-board data generation and limited transmission bandwidth, it is impossible for vehicles to upload all the sensed data immediately, and some of the sensed data are maintained in their on-board storage (except those real-time data for applications with stringent delay requirements) [9]. Since vehicles are dynamically and opportunistically moving, the sensory data maintained in their on-board storage captured during a past time period may or may not be generated within the target query area. Moreover, it is highly possible that the queried sensory data are partially generated within the target query area. Thus, it is difficult for the data requester to accurately identify and acquire the wanted data from the on-board storage of the massive and dynamically moving vehicles. However, most of the current secure range query schemes are devised for the outsourced central cloud storage [10,11], which cannot be directly applied to the distributive vehicular storage. Thus, a secure and accurate range query scheme is needed for the distributive on-board storage scenario to fully exploit the potential of vehicular sensing.
In this paper, to overcome the above challenges, we propose a novel privacy-preserving range query scheme from the distributive on-board storage in vehicular ad hoc networks (VANETs) in a practical scenario: acquiring the data harvested by the on-board air pollution sensors within the defined query area, i.e., an industrial area, to monitor the air quality. The proposed scheme exploits the Paillier cryptosystem [12] to preserve the location privacy of data requesters and vehicles, and protect the confidentiality of the sensed data. Specifically, the contributions of this paper are threefold.
First, the proposed scheme structures each multi-dimension scalar in one dimension, where the multi-dimension scalars denote the positions of the data requesters and vehicles in the format of grid cells; meanwhile, the proposed scheme supports secure scalar product computation for location matching. Thus, the location-based data query and data reports can be transmitted to the data server with high efficiency; meanwhile, the data harvested within the target query area can still be identified with location privacy preservation.
Second, since the vehicles are dynamically moving, the average of the sensory data captured during a short-time period may contain both the sensory data generated within and outside the query area. The proposed scheme enables the identification of the number of data reports generated within the target query area, and guarantees the accuracy of the sensory data captured within the target area.
Third, we give detailed security analysis to show that the proposed scheme is secure under the defined security model and achieves the security requirements in terms of location privacy preservation and confidentiality. Meanwhile, we conduct comparative performance evaluation to show that the proposed scheme is more efficient than the existing homomorphic secure scalar product schemes in terms of computational complexity and communication overhead.
The remainder of this paper is organized as follows. We describe the system model, the security requirements, and the design goals in Section 2. We recall the preliminaries and propose our privacy-preserving range query scheme in Section 3, followed by our security analysis and performance evaluations in Sections 4 and 5, respectively. We show related work in Section 6, and finally conclude our work in Section 7.

System Model
In the system model, we present the proposed range query scheme in a practical vehicular sensing application: help the data requester to acquire the data generated by vehicles (data harvested by the on-board air pollution sensors [13]) located within the target query area (an industrial district) during a past short-time period, which can help to investigate the air quality of the given industrial district. The system model consists of five entities, as shown in Figure 1.

•
The data requester aims to gather the air quality data in a given industrial district from the on-board air pollution sensors. The data requester submits a location-based data query ((1) Data Query) towards the data server and waits to hear the reply, as shown in Figure 1.

•
When the data server receives a data query request from a data requester, it forwards the received data query request ((2) Data Query) to all the vehicles through road side units (RSUs). Meanwhile, the data server receives all the on-board sensed data reports from RSUs, performs data filtering according to the data query, and delivers the data query response towards the data requester ((6) Data Query Result), as shown in Figure 1.

•
Each RSU serves as a gateway between the data server and data uploading vehicles, it helps to broadcast received data queries towards all the vehicles under its coverage ((3) Data Query) and forward the received on-board sensed data reports towards the data server ((5) Data Report), as shown in Figure 1.

•
Each data uploading vehicle is equipped with the required on-board sensor and enough on-board storage. When a piece of data is harvested by the on-board sensor of a vehicle, it is maintained in the on-board storage with time and location tagged. When a vehicle receives a data query, it first checks whether the queried data is still maintained. If the vehicle owns the queried data, it uploads the location-based sensed data to the nearest RSU((4) Data Report), as shown in Figure 1.

•
The trusted authority is a trusted and powerful entity, which is mainly responsible for the system bootstrap, key materials management, and the registration of new data requesters and vehicles, as shown in Figure 1.
The wireless connection between a vehicle and an RSU is realized through the IEEE 802.11p standard, a short-to medium-range communication technology operating at 5.85-5.925 GHz band with 3-27 Mbps data rates [14], which is mainly designed for the intelligent transportation systems radio service. The connections between RSUs and the data server, and those between data requesters and the data server, are realized through either wired links or any other links with high bandwidth and low transmission delay.

Security Requirements
In the security model, we consider the trusted authority is fully trusted, while the data server and RSUs are assumed to be honest-but-curious, that is, they follow the protocols, but they may try to infer the generation location and the content of data queries and data reports, which may violate the location privacy and confidentiality. Therefore, to preserve the location privacy of data requesters and vehicles, and protect the confidentiality of each individual sensed data report, the following security requirements should be satisfied: Location Privacy. Protect the location privacy indicates that the location information contained in each data query and data report should be protected [6], and the location information in this context indicates the grid cell scalars, which the data query and data generation locations are mapped to. Even if the data server obtains all the possible data queries and data reports, it cannot identify the grid cell scalar contained in any location-based data query or data report. Location privacy protection also includes that the data uploading vehicles cannot learn the grid cell scalar of the data query, or vice versa. In this way, the location privacy of data requesters and data uploading vehicles can be preserved.
Confidentiality. Protect the confidentiality of individual sensed data report means that, even if the data server obtains all the possible on-board sensed data reports, it cannot recover the content of an individual sensed data report [15]. Thus, the individual on-board sensed data can achieve the security requirement of confidentiality .
Note that there may exist other types of attacks such as impersonation, eavesdropping, and violation of data integrity [16]. Since we mainly focus on the privacy-preserving range query from the distributive on-board storage in VANETs, these security threats are beyond our study scope. We also assume that there is no collusion attack in the system, which is in accordance with previous research on secure data query [17].

Design Goals
Under the aforementioned system model and security requirements, our design goal is to develop a privacy-preserving range query scheme from the distributive on-board storage in VANETs. Specifically, the following three objectives should be achieved.
According to the above statement, if the proposed range query scheme does not take security into consideration, the location privacy of data requesters and data uploading vehicles could be threatened. Then, data requesters and vehicles may not be willing to participate in the range query process, and the system cannot properly operate. Therefore, the proposed scheme should achieve the security goal of location privacy.
To accurately reflect the air quality, it is important for data requesters to accurately obtain the data harvested by vehicles in the given query area. Since vehicles are dynamically moving and their positions are opportunistically changing, each received sensed data report may or may not be able to reflect air quality. Thus, to achieve the goal of the high accuracy in air quality monitoring, the data server should accurately filter all the received data reports.
Since vehicles are featured with the fast-moving characteristic, the connections between RSUs and vehicles are relatively short and intermittent, and the communication overhead introduced by the data query and sensed data reports should also be minimized. To extract the desired data from the massive amount of sensory data reports efficiently, the computational complexities brought to data server should also be deliberately evaluated.

Proposed Scheme
In this section, we propose the privacy-preserving range query scheme from the distributive on-board storage in VANETs, which mainly consists of five parts: preliminaries, system initialization, data query generation, data report generation, and data filtering.

Preliminaries
In our proposed scheme, the Paillier cryptosystem is exploited due to its additive homomorphic property and the homomorphic multiplication property of one plaintext and one ciphertext [12], which has been widely employed in many privacy-preserving data processing applications [15]. Specifically, the Paillier cryptosystem consists of three components: key generation, encryption, and decryption.
• Encryption: Given a message m ∈ Z n , choose a random number r ∈ Z * n , and the ciphertext can be calculated as c = E(m) = g m · r n mod n 2 .
• Decryption: Given the ciphertext c ∈ Z * n 2 , the corresponding message can be recovered as m = D(c) = L(c λ mod n 2 ) · µ mod n.

System Initialization
For the range query system under consideration, we assume a trusted authority, located at the management authority of vehicles and traffics, will bootstrap the whole system.
Given a security parameter κ, the trusted authority selects two large prime numbers p 1 and q 1 , where |p 1 | = |q 1 | = κ. The trusted authority also calculates the Paillier cryptosystem's public key (n = p 1 q 1 , g), and the corresponding private key (λ, µ). Moreover, the trusted authority chooses a secure cryptographic hash functions H(), where H : {0, 1} * → Z * n . In addition, the trusted authority also chooses a random number α. The trusted authority publishes the system parameter as params = {n, g, H()}.
The trusted authority assigns the private key (λ, µ) towards the data server, but the trusted authority does not share α with the data server. During the registration of each data uploading vehicle (w vehicles in total) or each data requester, the trusted authority checks its eligibility (whether the vehicle has the on-board air pollution sensors installed) and securely returns α towards it, as shown in Figure 2.

On-Board Data Query Generation
In this subsection, we first describe the query area construction. Then, we identify the data query generation process.

Query Area Construction
A data requester, the environmental protection agency in this context, aims to monitor the air quality of an industrial district through collecting the data generated by the air pollution sensor (Data_Type), during a past time period (from the starting time t s to the ending time t e ). We exploit the grid cell system defined in [18] to construct a grid cell architecture, which reflects the locations of the data requester and vehicles: the data requester also specifies a large region contains the target industrial district, which is large enough for the data requester and vehicles to be comfortable with revealing the fact that they are somewhere within this query area, and the bottom-left vertex of the large region is with the bottom-left vertex (x 0 , y 0 ). The large region is constructed into a grid cell structure, which is divided into k = a × b equal-sized grid cells with the side length of L, i.e., L = 0.8 km, when we take an industrial area with the coverage of 2.4 km 2 as an example [19]. Given location L i with coordinates (x i , y i ), the corresponding grid cell identifier can be computed as As shown in Figure 3, the data query area and the moving trajectory of vehicles are denoted with grid cells.

Data Query Generation
The data requester generates a query area that includes a group of grid cells I d , which is overlapped with the target industrial district. Meanwhile, the data requester generates a grid cell scalar u = {u 1 , ..., u k } to denote the defined grid cell architecture shown in Figure 3, which is initially set to zero. If i ∈ I d , i.e., grid cell i belongs to the target industrial district, we set the corresponding position in the scalar u to u i = 1. Moreover, the data requester selects a number β, which satisfies the condition that 2 · k < 2 β and 2 (2k+1)β < n. In addition, we structure the multi-dimension grid cell scalar to one composite value by calculating m u = ∑ k i=1 u i · 2 i·β , and generate the ciphertext of m u , which is where TS is the current timestamp, and r u,1 , r u,2 ∈ Z * n are two random numbers. Finally, the data requester formulates a data request and delivers it to the data server: where t is the on-board sensed data sampling interval. After receiving Request, the data server further delivers the data request towards the data uploading vehicles through RSUs.

Data Report Generation
For each registered data uploading vehicle, the previously sensed data (on the scale of a few past days) and the related information (such as number, time, data type, sensed data, location coordinates, etc.) are maintained in its on-board storage in the format of n, t, Data_Type, data, (x, y) . Upon receiving Request 1 , it checks its on-board storage to identify the data in consistent with the data query. If the queried data exists, the vehicle performs the following steps.
The vehicle identifies the on-board air pollution sensor data (data 1 , data 2 , ..., data d ) harvested at the past time points (t s , t s + t, ..., t e ), where d = t e −t s t + 1, d < k, t is the sampling interval. Then the vehicle calculates the sum of the sensed data reports, which is denoted as sum v = ∑ d i=1 data i and it satisfies the condition that ∑ w v=1 sum v < n, where w denotes the total number of data uploading vehicles. In addition, the vehicle generates the ciphertext of the on-board sensed data sum v : where r v,0 ∈ Z * n is a random number. Then, the vehicle maps the location coordinates of the on-board sensed data into grid cell architecture shown in Figure 3, and generates a vector v = {v 1 , ..., v k } with Algorithm 1. Calculates v j = v j + 1; 5: end if 6: end for Output: v Then, the vehicle structures the multi-dimensional scalar v into one dimension : m v = ∑ k j=1 v j · 2 (k+1−j)·β , and generates the ciphertext

Algorithm 1 Vehicle Scalar Generation
where r v,1 , r v,2 ∈ Z * n are two random numbers, and Finally, the vehicle formulates the Data_Report and delivers it to its nearest RSU:

Privacy Preserving Data Filtering
Upon receiving all the data reports, each RSU forwards all the collected on-board sensed data reports to the data server, and the data server filters within all the received data reports to identify the on-board sensed data reports generated within the industrial district.
The data server first computes the scalar product of u and v to identify the data reports generated within the industrial district, based on the idea that the data uploading vehicle must share at least one grid cell with the industrial district. Then, the data server decrypts C u,1 with its private key sk and recovers the value of m u + H(α||1||TS). Meanwhile, the data server also computes Based on C u,v , the data server recovers the value of s u,v = m u m v + η v + 1 with the private key sk. With s u,v , the data server can calculate the scalar product of u · v, where k v indicates the pieces of sensed data generated within the industrial district. If k v = u · v ≥ 1, at least one piece of sensed data is generated in industrial district, and the vehicle becomes a member of the group N in ; otherwise, all the data reports are generated out of the industrial district, and the vehicle becomes a member of the group N out , such that |N in | + |N out | ≤ w. Finally, the data server Correctness. The correctness of Equation (8) can be clearly illustrated with the following equation: As shown in Equation (9), s u,v consists of three parts: . Thus, with Equation (8), u · v can be computed.

Data Report Aggregation
The data server first aggregates the group of sensed data generated outside the industrial district N out , which is Then, the data server decrypts C out with the private key sk, and calculates the average of the sensed data generated outside the industrial district, which is S out = ∑ v∈N out (sum v + H(α||0||TS)). Meanwhile, the data server aggregates the sensed data within the group N in , which is In addition, the data server decrypts S in = ∑ v∈N in (sum v + H(α||0||TS)) with the private key sk; meanwhile, the data server delivers S in , S out , K in , |N in | and |N out | towards the data requester.
After receiving S in , S out , K in , |N in | and |N out |, the data requester first calculates the average of the sensed data outside the industrial district, which is Then, the data requester calculates the average of the sensed data inside the industrial district, which is

Security Analysis
Following the earlier discussed security requirements, we analyze the security properties of the proposed privacy-preserving range query scheme in this section. Specifically, our analysis will focus on how the proposed scheme can achieve the location privacy preservation of data requesters and data uploading vehicles, and protect the confidentiality of the individual on-board sensed data.
The proposed range query scheme can preserve the location privacy of data requester and data uploading vehicles. In the proposed scheme, the data requester first identifies a large region to construct the grid cell structure, which contains the industrial district. Based on the constructed grid cell structure, nothing could be learned besides the fact that the data query occupies one or a few grid cells in the defined grid cell structure. Since the location of the data requester and the vehicles are mapped to the constructed grid cell architecture, protecting the grid cell scalar of the data requester and the vehicles means preserving the location privacy. The following two paragraphs illustrate how the proposed scheme can protect the grid cell scalar during the secure scalar product computation process.
In the proposed scheme, the data requester's multi-dimensional grid cell scalar u is structured into one dimension m u , and then encrypted by the public key of the data server pk to generate (C u,1 , C u,2 ). After receiving (C u,1 , C u,2 ), the data server decrypts C u,1 with its private key sk, and obtains m u + H(α||1||TS). To prevent the data server from recovering the value of m u , H(α||1||TS) and H(α||2||TS) are introduced in (C u,1 , C u,2 ). Since the secret α is only shared between the data requester and data uploading vehicles and the defined security model does not take collusion attack into consideration, the data server cannot recover the value of m u . As (C u,1 , C u,2 ) are valid ciphertexts of the Paillier cryptosystem, which is known to be semantically secure against the chosen plaintext attack, and the composite grid cell scalar contained in (C u,1 , C u,2 ) is also semantically secure. Meanwhile, the locations of data uploading vehicles are also encrypted by the public key of the data server and protected by the secret α. Since only the data server with the private key sk can decrypt (C u,1 , C u,2 ) and (C v,1 , C v,2 ), and the data server cannot recover the value of m u and m v with the decrypted results, the composite grid cell scalars of the data requester and vehicles can be protected during transmission.
Due to the homomorphic property of the Paillier cryptosystem, the data server first computes (C v,1 ) m u +H(α||1||TS) and then recovers the value of s u,v = m u · m v + η v + 1 through decrypting . To avoid the exhaustive attack against m u · m v , a random number η v is also included in s u,v , but the data server can still obtain the scalar product of u · v by computing . In this way, the content of each composite grid cell scalar can still be protected. Meanwhile, the location privacy of the data requester and data uploading vehicles can be preserved during the secure scalar product computation.
The individual on-board sensed data report is confidential in the proposed scheme. In the proposed scheme, each vehicle's on-board sensed data report is formed as C v,0 = g sum v +H(α||0||TS) · (r v,0 ) n mod n 2 . Since C v,0 is a valid ciphertext of the Paillier cryptosystem, the sensed data sum v in C v,0 is also semantically secure. After decrypting C v,0 with sk, the data server still cannot recover the value of sum v + H(α||0||TS), since α is a secret shared between the data requester and vehicles. In addition, the data server aggregates all the sensed data within the group N in and the group N out , respectively, and delivers the aggregated results S in and S out to the data requester. Without α, the data server also cannot learn the aggregation of the sensed data reports sum v , and the confidentiality of the aggregation results can be achieved. Furthermore, the data requester can only obtain the aggregated results contained in S in and S out , but it cannot recover the value of each individual sensed data. Therefore, the confidentiality of individual sensed data can also be protected.

Performance Evaluation
In this section, we describe the parameter setup, and evaluate the performance of the proposed scheme in terms of computation complexity and communication overhead.

Parameter Setup
We conduct the experiments with the Java Paillier Library [20] on a desktop with 3.40 GHz processor and 8.00 GB memory to study the operation costs. The experimental result shows that the cost of a single exponentiation operation in Z * n 2 (|n 2 | = 2048) is C e = 8.82 ms. The proposed scheme integrates all the elements in a scalar to one composite value, which requires us to choose the proper parameter β and the length of the scalar k. Given |n| = 1024, the maximum value of β max is 7 and the maximum length of the scalar is k max = 63, which satisfies the condition that 2 · k max < 2 β max and (2 · k max + 1) · β max < |n|.
We compare the proposed scheme with the traditional homomorphic secure scalar product when Paillier encryption is exploited, i.e., each element in a scalar is encrypted separately at the data requester side, and then transmitted to the data server, i.e., the ciphertext of the data requester are generated as C u,1,i , C u,2,i , i = 1, 2, . . . , k. At the vehicle side, each element in a scalar is also encrypted and delivered individually, i.e., the ciphertext of each vehicle are generated as C v,1,i , C v,2,i , i = 1, 2, . . . , k. At the data server side, the product of two corresponding elements in the two scalars are also , and then aggregated to C u,v = ∏ k i=1 C u,v,i . Finally, the aggregated results are decrypted to derive the final results.

Computation Complexity
In the proposed scheme, when a data requester generates a data query, it requires 4 exponentiation operations in Z * n 2 to generate C u,1 ||C u,2 . Note that the computation of 2 i·β , i ∈ {1, 2, ..., k} can be conducted in the setup phase, and multiplication operation in Z * n 2 is negligible in comparison with the exponentiation operation in Z * n 2 . In the traditional scheme, the data requester needs to generate the ciphertext C u,i,1 ||C u,i,2 for each element in the scalar separately. For a scalar with the length of k, the data requester needs to consume 4 × k exponentiation operations in Z * n 2 to generate the ciphertext. In the proposed scheme, when a vehicle receives a data query, it generates an encrypted location-based data report, which requires 6 exponentiation operations in Z * n 2 . However, for the traditional scheme, each vehicle needs to consume 2 + 4 × k exponentiation operations to generate one data report.
After receiving all the data reports from w vehicles, the data server should compute the scalar product of the data requester and each vehicle, to identify whether the on-board data report is harvested within the industrial district. The data server takes one exponentiation operation in Z * n 2 to recover the value of m u + H(α||1||TS) by Paillier decryption, and it also consumes two exponentiation operations in Z * n 2 to obtain the value of m u · m v + 1. Since the multiplication operation in Z * n 2 is considered negligible in comparison to the exponentiation operation in Z * n 2 , the computation cost of aggregation is negligible, and it takes two exponentiation operations in Z * n 2 for the Paillier decryption to recover the aggregated results. For the traditional scheme, it takes 3 × k exponentiation operations in Z * n 2 to recover the scalar product and costs two exponentiation operations to obtain the aggregated results.
Thus, in the proposed scheme, totally for the data requester, vehicle, and the data server, the computation cost will be 4 × C e , 6 × C e , and (3 × w + 2) × C e . For the traditional scheme, the computation cost for the data requester, vehicle, and data server are 4 × k × C e , (2 + 4 × k) × C e , and (3 × k × w + 2) × C e , respectively, since each element in a scalar are encrypted and processed independently and the computation complexity is proportional to the length of the scalar. Figure 4a,b show the computation complexity of the data server with both schemes in terms of the scalar length and the number of vehicles. Simulation results show that the proposed scheme greatly reduces the computation complexity of the data server in comparison with the traditional scheme. As shown in Figure 4a, the computation complexity increases with both scalar length and the number of vehicles in the traditional scheme, while in the proposed scheme, the computation complexity only increases with the number of vehicles as shown in Figure 4b. Meanwhile, Figure 5a,b present the computation complexity of the data requester and vehicle in terms of the scalar length, and compare the proposed scheme with the traditional scheme, respectively. Figures 4 and 5 indicate that the proposed scheme can achieve lower computation complexity compared to the traditional scheme.

Communication Overhead
We evaluate the communication overhead that introduced the data query, which is sent from the data query towards the data server, and the data report sent from the vehicles towards the data server, since the data query sent from the data server towards the RSU and the data query response sent from the data server do not involve the transmission of ciphertexts in the proposed scheme.
We first consider the data requester to data server communication, where the data requester generates a location-based data query and delivers the data query to the data server. The location-based query is in the format of C u,1 C u,2 , and its size is 2048 × 2 bits, if we choose 1024-bit n. If the traditional homomorphic scalar product protocol is adopted, the corresponding communication overhead is 2048 × 2 × k bits, when there are k partitioned grid cells. In Figure 6, we plot the data requester to data server communication overhead in terms of the scalar length, and compare the proposed scheme to the traditional scheme, which shows that the proposed scheme reduces the communication overhead. During the vehicle to data server communication, each vehicle generates a location-based data report and delivers the data report to the data server via the RSU. The data report is in the format of C u,0 C u,1 C u,2 , and its size is 2048 × 3 bits. The data server collects w data reports from the total w vehicles, the communication overhead between the vehicles and the data server is 2048 × 3 × w bits. For the traditional homomorphic scalar product protocol, the communication between the vehicles and the data server is 2048 × 3 × w × k bits, when the scalar length is k. Figure 7a,b plot the communication overhead introduced by vehicles of both schemes in terms of the scalar length and the number of vehicles. Simulation results show that the proposed scheme greatly reduces the vehicle to data server communication overhead in comparison with the traditional scheme. As shown in Figure 7a, the vehicle to data server communication overhead increases with both scalar length and the number of vehicles in the traditional scheme, while in the proposed scheme, vehicle to data server communication overhead only increases with the number of vehicles as shown in Figure 7b.

Related Works
In this section, we briefly review some of the existing related schemes in secure scalar product computation and secure location-based query schemes.

Secure Scalar Product Computation
Research on secure scalar product computation has been focused on performing privacy-preserving profile matching [21][22][23], and secure multi-party computation [24,25]. A fine-grained privacy-preserving profile matching scheme is proposed with the Paillier homomorphic cryptosystem in [22], which enables two users to measure the level of similarity in their fine-grained personal files. An efficient privacy-preserving binary scalar product computation protocol is proposed in [23], which computes the similarity in symptom characters. A secure multi-party computation algorithm is proposed in [25] with the BGN homomorphic encryption technique, which allows the cloud server to execute secure scalar product and addition operations without the data content disclosure.
However, in the above schemes, each element in a scalar needs to be encrypted separately, which leads to the result that the size of the ciphertexts is proportional to the length to the scalar. In our proposed scheme, the elements in a given scalar is first structured to one-dimension and then encrypted, which is highly efficient in terms of communication and computation overhead.

Secure Location-Based Query
There mainly exist three types of secure location-based data query schemes. The first type of solutions are through pseudonyms [15,26]. However, the pseudonym-based solutions are based on the idea of disrupting the connection between the identities of vehicles and their locations, and the locations of the data queries can still be disclosed [11]. The second type of solutions are through location cloaking [27,28], and the identification of the exact location of each participant can be prevented by introducing uncertainty or error into location information. K-Anonymity is a commonly used technique in location cloaking, which preserves the location privacy of one user through hiding one user among a group of K users [29]. Even though location cloaking can achieve a satisfactory level of location privacy preservation, it may not be able to guarantee that there are enough users in the neighborhood and not be able to achieve high accuracy in their query results [30]. The third type of solutions rely on the cryptographic techniques, which can effectively preserve the accuracy of the data query results [31,32]. Various privacy-preserving data query schemes are proposed, such as range query [33]. An efficient spatial range query solution is proposed in [11], which permits data queries over encrypted location based data. The authors in [30] propose a hybrid approach based on location cloaking and the additive homomorphic encryption, which achieves both efficiency and privacy preservation. A spatial range query scheme over ciphertext is proposed in [34], which achieves the location privacy of the user's query and the location-based service confidentiality.
However, these schemes are based on the scenario of the outsourced central cloud storage data query and introduce heavy cryptographic operations, which also can not be applicable to the dynamically moving and distributive on-board storage. In this paper, we propose an efficient privacy-preserving location-based data query scheme, which can be applied to the scenario of the distributive vehicular on-board storage.
Based on the above analysis, in this paper, we propose an efficient location-based vehicular on-board sensory data querying scheme, which greatly reduces the network complexity and preserve the location privacy of both the data requester and vehicles.

Conclusions
In this paper, we have proposed a privacy-preserving range query scheme from the distributive on-board storage in VANETs, which achieves the secure scalar product computation with the homomorphic Paillier cryptosystem. The proposed scheme structures the multi-dimensional scalar into one dimension, identifies the data harvested within the industrial district, and computes the aggregation results of the identified sensed data. Security analysis has been conducted to demonstrate its security properties, in terms of location privacy preservation and confidentiality. Performance evaluations have also been done, which indicates that the proposed scheme can significantly reduce the computation complexity and communication overhead. For future work, we will take the possible behaviors of the collusion attack into consideration and design new solutions to resist such attacks.