Efficient Privacy-Preserving Data Sharing for Fog-Assisted Vehicular Sensor Networks

Vehicular sensor networks (VSNs) have emerged as a paradigm for improving traffic safety in urban cities. However, there are still several issues with VSNs. Vehicles equipped with sensing devices usually upload large amounts of data reports to a remote cloud center for processing and analyzing, causing heavy computation and communication costs. Additionally, to choose an optimal route, it is required for vehicles to query the remote cloud center to obtain road conditions of the potential moving route, leading to an increased communication delay and leakage of location privacy. To solve these problems, this paper proposes an efficient privacy-preserving data sharing (EP2DS) scheme for fog-assisted vehicular sensor networks. Specifically, the proposed scheme utilizes fog computing to provide local data sharing with low latency; furthermore, it exploits a super-increasing sequence to format the sensing data of different road segments into one report, thus saving on the resources of communication and computation. In addition, using the modified oblivious transfer technology, the proposed scheme can query the road conditions of the potential moving route without disclosing the query location. Finally, an analysis of security suggests that the proposed scheme can satisfy all the requirements for security and privacy, with the evaluation results indicating that the proposed scheme leads to low costs in computation and communication.


Introduction
Vehicular sensor networks (VSNs) [1][2][3], which combine the wireless communication provided by vehicular ad hoc networks [4] with the sensing devices installed in vehicles, can improve traffic conditions in urban cities and have recently received considerable attention. In VSNs, vehicles equipped with sensing devices can record a myriad of data reports on road conditions and the surrounding environment, and these data reports need to be uploaded to the remote cloud center [5,6] for processing and analysis. In addition, vehicles often need to query the remote cloud center for the road conditions of potential moving routes. However, uploading a large number of data reports to the cloud center consumes heavy bandwidth and leads to an increased communication delay.
Recently, fog computing [7] has been proposed to extend the capabilities of cloud computing [8] near vehicles [9], which can locally handle the data reports uploaded by vehicles. These new properties will bring about benefits such as location awareness and low latency. Fog computing has already been used to provide low latency services in vehicular sensor networks, such as navigation services [10] and surface condition monitoring [11].
A typical architecture of fog-assisted vehicular sensor networks (F-VSNs) [12][13][14] contains the trusted authority, cloud center, fog nodes, and vehicles. The trusted authority is responsible for generating system parameters, and the registration of all entities (cloud center, fog nodes and vehicles). The cloud center provides centralized control with strong computing power and large storage capacity from a remote location. Fog nodes have available computing, storage, and communication

Our Contributions
To solve the aforementioned problems, this paper proposes an efficient privacy-preserving data sharing (EP2DS) scheme for fog-assisted vehicular sensor networks. The main contributions of this paper are as follows:
• First, the proposed EP2DS scheme exploits the super-increasing sequence [20] to achieve multi-dimensional data aggregation while calculating the average sensory data in each road segment, greatly saving communication and computation resources.
• Secondly, by utilizing the modified oblivious transfer [28], the proposed EP2DS scheme is able to query the road conditions of potential moving routes without disclosing the query location.
• Thirdly, the security analysis indicates that the proposed EP2DS scheme is secure under the elliptic curve discrete logarithm (ECDL) assumption in the random oracle model and satisfies all the requirements for security and privacy.
• Finally, the computation and communication costs are evaluated through quantitative calculations, and the results show that the proposed EP2DS scheme is more efficient than the compared schemes.

Organization
This paper is organized as follows. The related work is surveyed in Section 2. We introduce the background in Section 3. The concrete scheme is proposed in Section 4. Section 5 provides an analysis of the security. In Section 6, the performance evaluation is performed. Section 7 concludes the paper.

Related Works
Some works closely related to this paper are briefly reviewed below. In F-VSNs, massive sensory data is produced in each data dimension, and needs to be uploaded for further processing and analysis; data aggregation schemes [16][17][18][19][20][21][22][23] have received considerable attention recently, and are roughly classified into two categories: single-dimensional data aggregation [16][17][18][19] and multi-dimensional data aggregation [20][21][22][23]. Zhuo et al. [16] introduced a data aggregation scheme, which protects each involved entity's identity privacy, and allows the requester to examine the correctness of the obtained results. Rabieh et al. [17] employed the data aggregation technique to find out the routes for the vehicle to be in each road segment; however, it only can calculate the data aggregation result, and cannot recover the content in each data dimension.
Xu et al. [18] constructed a privacy-preserving data aggregation scheme that can classify messages based on where and when the sensor data is collected, and aggregate the data collected in the same area and period. Sun et al. [19] designed a data aggregation mechanism considering data integrity and access control. However, the schemes [16][17][18][19] are unable to determine the number of the data reports produced in each data dimension, and further fail to calculate the average sensory data in each data dimension. Lin et al. [20] integrated the perturbation technique and super-increasing sequence to combine multiple aggregated data into one data report to improve the energy efficiency.
Lu et al. [21] employed the homomorphic Paillier encryption, one-way hash chain technique and Chinese remainder theorem to achieve lightweight multi-dimensional data aggregation. On the basis of the super-increasing sequence and modified homomorphic Paillier encryption, Wang et al. [22] introduced a multi-subtasks aggregation scheme, in which each aggregated datum is mapped to a specific area and period. Kong et al. [23] designed a privacy-preserving multi-dimensional data sharing scheme using the Chinese remainder theorem and modified Paillier encryption, which counts the number of sensory data collected at each segment and calculates the average sensory data in each segment.
Although schemes [20][21][22][23] are able to calculate the average sensory data in each data dimension, they incur heavy computation costs and communication overhead. In addition, the query vehicle usually wants to know the road conditions of the potential moving route, which could lead to the query location being disclosed in the data query process. The schemes in [23][24][25][26][27] have been proposed to solve this problem.
Ghinita et al. [24] and Paulet et al. [25] employed the oblivious transfer to hide the query location in the data query process, but the communication cost of schemes [24,25] is directly proportional to the data dimension. Zhu et al. [25,26] utilized an improved homomorphic encryption technology to protect the query location in location-based services, but it does not support scenarios with a high vehicle density. Kong et al. [23] utilized the proxy re-encryption technique to hide the query location, but it does not support queries of whole network sensory data during the data query phase.
To sum up, from the review above, the available data aggregation schemes [16][17][18][19][20][21][22][23] either fail to determine the number of data reports produced in each data dimension or bring heavy computation and communication costs. In addition, the communication costs of the existing schemes [23][24][25][26][27] are either directly proportional to the data dimension or bring heavy communication costs in the data query process.
To address the issues above, we propose an EP2DS scheme for fog-assisted vehicular sensor networks, which can not only reduce the computation and communication costs, but also calculate the average sensory data in each road segment. Additionally, the proposed EP2DS scheme can query the road conditions of potential moving routes without disclosing the query location.

System Model
The system model, presented in Figure 1, is composed of five entities: the trusted authority (TA), the cloud center (CC), the data collection vehicles $V_i$ ($i = 1, 2, \cdots, \delta$), the fog nodes $FN_j$ ($j = 1, 2, \cdots, n$), and the data query vehicle $V_q$. The road area is divided into $m$ segments, and each segment $k$ ($k = 1, 2, \cdots, m$) is represented by a unique two-dimensional identifier $(u_k, v_k)$ that approximates its location coordinates [23]. For readability, the notations employed in this study are defined in Table 1.

Table 1. Definitions of notations.
Notation: Definition
$V_i$: The $i$-th data collection vehicle
$(ID_i, PID_i)$: $V_i$'s real identity and pseudo identity
$(x_i, R_i)$: $V_i$'s private key
$FN_j$: The $j$-th fog node
$V_q$: The data query vehicle
$(ID_q, PID_q)$: $V_q$'s real identity and pseudo identity
$(x_q, R_q)$: $V_q$'s private key
$(u_k, v_k)$: Identifier of the segment $k$
$d$: Maximum value of sensory data
$m$: The total number of segments
$n$: The total number of fog nodes
$\delta$: The total number of vehicles
$|d|$: Maximum length of sensory data
$\phi$: The vehicles' sharing key
$d_{i,k}^j$: The sensory data captured by $V_i$ at segment $k$ under $FN_j$
$\oplus$: The exclusive OR operation
$p, q$: Two large prime numbers
$F_p$: The finite field over $p$
$G$: An additive group with the order $q$ on the elliptic curve $E$ over $F_p$
$P$: A generator of $G$

The wireless connections between the vehicles and the fog nodes follow the Institute of Electrical and Electronics Engineers (IEEE) 802.11p standard [29]. The connections between the fog nodes and CC are achieved via either wired links or other links with low transmission delay and high bandwidth.
TA: A fully trusted entity, which is responsible for the management of the security parameters for the system and the registration of the cloud center, fog nodes, and vehicles, and periodically updates the system information.
CC: An honest-but-curious entity, which is responsible for providing centralized control with powerful storage and computing capabilities from a remote location. In addition, it can perform computational analytics from data reports uploaded by the fog nodes, and distribute data to all fog nodes for further sharing with vehicles [30].
V i : It is equipped with smart sensors, periodically formatting a data report from the collected sensory data and uploading the data report towards the fog node.
FN j : This consists of a road side unit and an edge server [13], and aggregates the data reports uploaded by the data collection vehicles under its communication range and transmits the aggregated data report towards CC. Meanwhile, each fog node manages one or more segments, and can assist in sharing the sensory data to the query vehicle [31].
V q : To choose an optimal route, V q usually sends a query report to the fog node, then the fog node returns a response report to V q .
In our system model, we assume the fog node is honest-but-curious, i.e., it correctly executes the operations defined in the protocol, but it may try to violate the privacy of a vehicle by analyzing the vehicle's data report and query report. We also assume that neither the fog nodes nor the query vehicles collude with each other in the proposed EP2DS scheme. Additionally, we assume there exists an attacker that can eavesdrop on the data transmission and launch attacks.

Security Requirement
The following security requirements should be achieved.
Authentication and data integrity: The proposed EP2DS scheme should guarantee that reports are not modified during transmission and should detect any modification of the reports; moreover, any entity in F-VSNs should be able to be authenticated to ensure the reliability of the data source.
Confidentiality: To ensure the privacy of sensory data, the proposed EP2DS scheme should provide confidentiality, i.e., no attacker can obtain the sensory data from a data report.
Location privacy preservation: To protect a vehicle's query location, the query location must not be disclosed to the fog nodes that provide location-based services in the data query process.
Identity privacy preservation: Apart from the TA, no entity should be able to trace or recognize the identity of the data collection vehicle by analyzing the received data reports.
Traceability: TA should be able to reveal the identity of a malicious vehicle that uploads a bogus data report.
Unlinkability: Apart from the TA, neither fog nodes nor malicious vehicles can determine whether two data reports are from the same vehicle.
Resistance to attacks: The proposed EP2DS scheme should be able to withstand common attacks such as the modification attack, replay attack, impersonation attack, and man-in-the-middle attack.

Elliptic Curve
Let $F_p$ be a finite field with a prime number $p$. The elliptic curve $E$ over $F_p$ is defined as the set of all points $(x, y)$ satisfying $y^2 = x^3 + ax + b \bmod p$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \bmod p$ [32,33].
The point at infinity $O$ and the other points on $E$ form an additive cyclic group $G$ with the order $q$ and generator $P$. For $P \in G$ and $k \in Z_q^*$, the scalar multiplication over $G$ is defined as $kP = P + P + \cdots + P$ ($k$ times).
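The group law and scalar multiplication described above can be sketched in a few lines. This is a minimal illustration on a textbook toy curve, $y^2 = x^3 + 2x + 2$ over $F_{17}$ with generator $(5, 1)$ of prime order 19; the curve, the function names `add` and `scalar_mul`, and the double-and-add strategy are assumptions chosen for readability, not the parameters or implementation used in the paper.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed for illustration; far too small for real use).
P_MOD, A, B = 17, 2, 2
assert (4 * A**3 + 27 * B**2) % P_MOD != 0   # non-singularity condition
O = None                                     # the point at infinity
G = (5, 1)                                   # generator of a group of prime order 19

def add(p1, p2):
    """Add two points of the curve using the chord-and-tangent rule."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                             # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD           # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD        # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mul(k, point):
    """Compute kP = P + P + ... + P (k times) by double-and-add."""
    result, addend = O, point
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

print(scalar_mul(2, G))    # (6, 3)
print(scalar_mul(19, G))   # None, because 19 is the group order and 19P = O
```

Double-and-add needs about $\log_2 k$ doublings instead of $k - 1$ additions, which is why a scalar multiplication is treated as a single unit operation in cost comparisons.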

Security Assumption
ECDL problem [34,35]: Given two elements $P, Q \in G$, the ECDL problem is to find an integer $x \in Z_q^*$ such that $Q = xP$.
ECDL assumption [34,35]: It is hard for any probabilistic polynomial-time algorithm to solve the ECDL problem with non-negligible probability.

The Proposed Scheme
The proposed EP2DS scheme includes the system initialization, registration, data collection, and data query phases. The data flows in the data collection and data query phases are shown in Figure 2.

System Initialization
TA produces all system parameters through executing the following steps.
(1) TA randomly chooses a large prime number $p$ and selects a non-singular elliptic curve $E$ defined by $y^2 = x^3 + ax + b \bmod p$, where $a, b \in F_p$.
(2) TA picks a group $G$ of $E$ with the prime order $q$ and a generator $P$.
(3) TA randomly chooses $s \in Z_q^*$ as its master key and computes its public key $P_{pub} = sP$.
(4) TA chooses eight one-way hash functions.
(5) TA chooses a super-increasing sequence $a_1, a_2, \cdots, a_m$, where $a_1, a_2, \cdots, a_m$ are large prime numbers and $d$ is the maximum value of the data. Then, TA assigns the prime number $a_k$ to segment $k$.
(6) TA publishes the system parameters {$p$, $q$, $G$, $P$, $P_{pub}$,

Registration
All vehicles, fog nodes, and cloud centers register with TA.

V i Registers with TA
(1) $V_i$ sends its identity $ID_i$ to TA over a secure channel.
(2) After confirming the identity $ID_i$, TA randomly chooses $w_i \in Z_q^*$, computes $PID_{i,1} = w_i P$ and $PID_{i,2} = ID_i \oplus H(w_i P_{pub}, t_i)$, and sets $PID_i = \{PID_{i,1}, PID_{i,2}, t_i\}$, where $t_i$ represents the valid period of $PID_i$.
(3) TA randomly chooses $r_i \in Z_q^*$ and computes the private key $(x_i, R_i)$.
(4) TA randomly chooses a sharing key $\phi \in \{0, 1\}^{|d|-1}$ and transmits the pseudo identity $PID_i$, the private key $(x_i, R_i)$, and the sharing key $\phi$ to $V_i$ over a secure channel.

FN j Registers with TA
(1) $FN_j$ sends its identity $ID_{FN_j}$ to TA over a secure channel.
(2) TA randomly chooses $r_{FN_j} \in Z_q^*$ and computes the private key $(x_{FN_j}, R_{FN_j})$.
(3) TA sends the private key $(x_{FN_j}, R_{FN_j})$ to $FN_j$ over a secure channel.

CC Registers with TA
(1) TA randomly chooses $x \in Z_q^*$ and computes $P_{cc} = xP$.
(2) TA sends the private key $x$ and the public key $P_{cc}$ to CC over a secure channel.

Data Collection
The data collection phase includes three processes: data gathering, data aggregation, and data reading.

Data Gathering
V i gathers sensory data in a short period of time, e.g., every five minutes: (i) if there is a sensory data obtained at road segment k under FN j , i.e., d j i,k > 0, then e j i,k =1; (ii) if there is no sensory data obtained at road segment k under FN j , i.e., d j i,k = 0, then e j i,k = 0. V i produces a data report through executing the following steps: (3) V i randomly picks l j i ∈ Z * q and calculates Figure 2 ( 1 ).

Data Aggregation
where w ≤ δ. FN j can aggregate data reports through executing the following steps: (1) FN j checks whether t i is valid and T j i is fresh for each i = 1, 2, · · ·, w. If t i is not valid or T j i is not fresh, DR j i will be rejected. Otherwise, FN j performs the batch verification using small exponent test [36]. FN j randomly selects a set of small numbers θ j 1 , θ j 2 , · · ·, θ j w ∈ [1, 2 w ] and checks whether the following equation holds If it does hold, FN j computes where T j is current timestamp.
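The small exponent test used for batch verification can be illustrated generically. The sketch below does not reproduce the scheme's own verification equation; it assumes, hypothetically, that each report $i$ reduces to a linear check of the form $\sigma_i P = V_i$, and shows how random small exponents $\theta_i$ collapse $w$ checks into one. The secp256k1 parameters are only a stand-in curve, and `batch_verify`, `sigma`, and `V` are illustrative names.

```python
import secrets

# secp256k1 domain parameters, used only as a stand-in curve (an assumption).
FIELD_P = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
P = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def add(p1, p2):
    """Point addition on y^2 = x^3 + 7 (None is the point at infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)

def mul(k, pt):
    """Scalar multiplication by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def batch_verify(items, small_bits=16):
    """Check sigma_i * P == V_i for all items with one full scalar multiplication."""
    scalar_sum, point_sum = 0, None
    for sigma, v in items:
        theta = secrets.randbelow(2**small_bits) + 1       # small exponent in [1, 2^small_bits]
        scalar_sum = (scalar_sum + theta * sigma) % ORDER
        point_sum = add(point_sum, mul(theta, v))           # cheap: theta is short
    return mul(scalar_sum, P) == point_sum

sigmas = [secrets.randbelow(ORDER - 1) + 1 for _ in range(5)]
reports = [(s, mul(s, P)) for s in sigmas]
print(batch_verify(reports))                                # True

forged = list(reports)
forged[2] = (sigmas[2], add(reports[2][1], P))              # tamper with one report
print(batch_verify(forged))                                 # False
```

The saving comes from replacing $w$ full-length scalar multiplications with one full-length multiplication plus $w$ multiplications by short exponents.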
Otherwise, CC randomly chooses a set of small numbers θ 1 , θ 2 , · · ·, θ n ∈ [1, 2 n ] and performs the batch verification using small exponent test [36]. CC verifies whether the following equation holds If it does hold, CC calculates By solving the discrete log of Φ and ∆ with the base P, utilizing the Pollard's lambda algorithm [37], CC can obtain (2) CC distributes µ and ν to all fog nodes {FN 1 , FN 2 , · · ·, FN n } for further sharing with vehicles.
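Recovering an aggregate from a group element is feasible only because the exponent is known to lie in a small range. The sketch below illustrates this with baby-step giant-step, which is swapped in here for Pollard's lambda algorithm because it is deterministic and shorter to write; both need roughly the square root of the range in group operations. The secp256k1 parameters, the bound, and the name `bounded_dlog` are assumptions for illustration.

```python
import math

# secp256k1 field prime and base point, used only as a stand-in curve (an assumption).
FIELD_P = 2**256 - 2**32 - 977
P = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def add(p1, p2):
    """Point addition on y^2 = x^3 + 7 (None is the point at infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)

def mul(k, pt):
    """Scalar multiplication by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def neg(pt):
    """Additive inverse of a point."""
    return None if pt is None else (pt[0], -pt[1] % FIELD_P)

def bounded_dlog(target, base, bound):
    """Return x in [0, bound) with x * base == target, or None if there is none."""
    m = math.isqrt(bound - 1) + 1            # m * m >= bound
    baby, cur = {}, None
    for j in range(m):                       # baby steps: j * base for j in [0, m)
        baby[cur] = j
        cur = add(cur, base)
    step = neg(mul(m, base))                 # giant step: subtract m * base
    gamma = target
    for i in range(m):
        if gamma in baby:
            x = i * m + baby[gamma]
            return x if x < bound else None
        gamma = add(gamma, step)
    return None

aggregate = 123456                           # hypothetical aggregated value
print(bounded_dlog(mul(aggregate, P), P, 10**6))   # 123456
```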

Data Query
The data query vehicle V q intends to query the data captured at segment c with the identifier (u c , v c ) at the FN j . The phase includes three processes: query generation, data response, and response reading.

Query Generation
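(1) $V_q$ selects two random numbers $r_q^j, s_q^j \in Z_q^*$ and calculates $E_q^j = r_q^j P$, $F_q^j = u_c P + x_q E_q^j$, $G_q^j = s_q^j P$, and $H_q^j = v_c P + x_q G_q^j$.
(2) $V_q$ randomly picks $l_q^j \in Z_q^*$ and calculates the remaining fields of the query report $QR_q^j$, where $T_q^j$ is the current timestamp.
(3) $V_q$ sends the query report $QR_q^j$ to $FN_j$, as shown in Figure 2 (3).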
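The way the query location is hidden can be sketched from the four values listed in the security analysis, $E_q^j = r_q^j P$, $F_q^j = u_c P + x_q E_q^j$, $G_q^j = s_q^j P$, and $H_q^j = v_c P + x_q G_q^j$. The example below only illustrates this ElGamal-style blinding and the fact that the holder of $x_q$ can remove it; it does not cover the fog node's response. The secp256k1 parameters and the names `blind_location` and `unblind` are assumptions.

```python
import secrets

# secp256k1 domain parameters, used only as a stand-in curve (an assumption).
FIELD_P = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
P = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def add(p1, p2):
    """Point addition on y^2 = x^3 + 7 (None is the point at infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)

def mul(k, pt):
    """Scalar multiplication by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def neg(pt):
    """Additive inverse of a point."""
    return None if pt is None else (pt[0], -pt[1] % FIELD_P)

def blind_location(u_c, v_c, x_q):
    """Build (E, F, G_q, H) for the segment identifier (u_c, v_c) under private key x_q."""
    r = secrets.randbelow(ORDER - 1) + 1
    s = secrets.randbelow(ORDER - 1) + 1
    E = mul(r, P)
    F = add(mul(u_c, P), mul(x_q, E))
    G_q = mul(s, P)
    H = add(mul(v_c, P), mul(x_q, G_q))
    return E, F, G_q, H

def unblind(query, x_q):
    """Remove the masks x_q * E and x_q * G_q, leaving (u_c * P, v_c * P)."""
    E, F, G_q, H = query
    return add(F, neg(mul(x_q, E))), add(H, neg(mul(x_q, G_q)))

x_q = secrets.randbelow(ORDER - 1) + 1       # hypothetical private key of the query vehicle
query = blind_location(12, 34, x_q)
print(unblind(query, x_q) == (mul(12, P), mul(34, P)))   # True
```

Because $r_q^j$ and $s_q^j$ are fresh for every query, two queries for the same segment produce different tuples, so an observer without $x_q$ cannot tell whether they target the same location.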

Data Response
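(1) After receiving $QR_q^j$, $FN_j$ checks whether $t_q$ is valid and $T_q^j$ is fresh. If $t_q$ is not valid or $T_q^j$ is not fresh, $QR_q^j$ will be rejected. Otherwise, $FN_j$ checks the verification equation of $QR_q^j$. If it holds, $FN_j$ selects two random numbers $t_q^j, \phi_q^j \in Z_q^*$ and calculates the response values.
(2) $FN_j$ randomly picks $\hat{l}_q^j \in Z_q^*$ and calculates the remaining fields of the response report $RR_q^j$, where $\hat{T}_q^j$ is the current timestamp.
(3) $FN_j$ returns the response report $RR_q^j$ to $V_q$, as shown in Figure 2 (4).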

Response Reading
(1) After receiving $RR_q^j$, $V_q$ checks whether $\hat{T}_q^j$ is fresh. If $\hat{T}_q^j$ is not fresh, $RR_q^j$ will be rejected. Otherwise, $V_q$ checks the verification equation of $RR_q^j$. If it holds, $V_q$ calculates $\Lambda$. By solving the discrete logarithm of $\Lambda$ with the base $P$ using Pollard's lambda algorithm [37], $V_q$ can obtain $\beta_c = H_8(t_q^j u_c + \phi_q^j v_c)$.
(2) By calling Algorithm 1, $V_q$ can obtain the average sensing data $d_c$ captured at segment $c$.
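Algorithm 1 is not reproduced here, but the idea behind it can be sketched: a super-increasing sequence lets the per-segment sums and the per-segment report counts each travel as a single integer, and the average for a segment is the recovered sum divided by the recovered count. The sketch assumes a sequence with $a_k > \delta \cdot d \cdot \sum_{j<k} a_j$ and uses plain integers rather than the large primes chosen by TA; `make_sequence`, `pack`, and `unpack` are illustrative names.

```python
def make_sequence(m, delta, d_max):
    """Build a_1..a_m with a_k > delta * d_max * (a_1 + ... + a_{k-1})."""
    seq, total = [], 0
    for _ in range(m):
        a = total * delta * d_max + 1
        seq.append(a)
        total += a
    return seq

def pack(values, seq):
    """Format one value per segment into a single integer."""
    return sum(a * v for a, v in zip(seq, values))

def unpack(packed, seq):
    """Recover the per-segment values, starting from the largest weight."""
    out = [0] * len(seq)
    for k in reversed(range(len(seq))):
        out[k], packed = divmod(packed, seq[k])
    return out

m, delta, d_max = 3, 4, 100                  # segments, max vehicles, max sensory value
seq = make_sequence(m, delta, d_max)

# Hypothetical readings of three vehicles; 0 means "no data at this segment".
readings = [[30, 0, 50], [40, 60, 0], [0, 80, 70]]

# Each vehicle packs its data d and its indicators e (1 if d > 0) into one integer each;
# aggregation is then plain addition of the packed reports.
packed_sum = sum(pack(r, seq) for r in readings)
packed_count = sum(pack([1 if d > 0 else 0 for d in r], seq) for r in readings)

sums = unpack(packed_sum, seq)
counts = unpack(packed_count, seq)
averages = [s / c if c else None for s, c in zip(sums, counts)]
print(sums, counts, averages)                # [70, 140, 120] [2, 2, 2] [35.0, 70.0, 60.0]
```

Unpacking is unambiguous because the contribution of all lower segments, at most $\delta \cdot d \cdot \sum_{j<k} a_j$, is always smaller than $a_k$.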

Security
This section presents the security proof of the proposed EP2DS scheme in the random oracle model. Additionally, a security evaluation and comparison of the proposed EP2DS scheme and the schemes of [17,19,23,25,26] is conducted.

Security Model
The security model of the proposed EP2DS scheme can be found in Appendix A.

Security Proof
The security proof of the proposed EP2DS scheme can be found in Appendix B.

Analysis and Comparison of Security Requirement
Authentication and data integrity: Based on Theorem 2, no polynomial-time attacker is able to forge a valid data report owing to the ECDL assumption. Therefore, authentication and data integrity can be ensured in the proposed EP2DS scheme.
Confidentiality: Based on Theorem 1, without the cloud center's private key $x$, no attacker is able to compute the sensing data, and thus confidentiality can be ensured in the proposed EP2DS scheme.
Location privacy preservation: Based on Theorem 1, without the data query vehicle's private key $x_q$, no attacker can obtain the query location $(u_c, v_c)$ from $\{E_q^j = r_q^j P, F_q^j = u_c P + x_q E_q^j, G_q^j = s_q^j P, H_q^j = v_c P + x_q G_q^j\}$, and hence location privacy is guaranteed in the proposed EP2DS scheme.
Identity privacy preservation: In the proposed EP2DS scheme, the identity $ID_i$ of $V_i$ is contained only in the pseudo identity $PID_i = \{PID_{i,1}, PID_{i,2}, t_i\}$, where $PID_{i,1} = w_i P$, $PID_{i,2} = ID_i \oplus H(w_i P_{pub}, t_i)$, and $P_{pub} = sP$. To extract $ID_i$, the attacker has to compute $ID_i = PID_{i,2} \oplus H(s \cdot PID_{i,1}, t_i)$, since $s \cdot PID_{i,1} = w_i \cdot s \cdot P = w_i P_{pub}$. However, it is infeasible for any attacker to compute $w_i \cdot s \cdot P$ without knowing $w_i$ or $s$. Therefore, identity privacy is guaranteed in the proposed EP2DS scheme.
Traceability: In the proposed EP2DS scheme, TA can use its own master key $s$ to calculate $ID_i = PID_{i,2} \oplus H(s \cdot PID_{i,1}, t_i)$ and thereby recover the identity $ID_i$ of $V_i$ from the pseudo identity $PID_i$ contained in the data report. Hence, the proposed EP2DS scheme satisfies traceability.
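The relation between pseudonym generation and tracing can be checked with a short sketch. It relies on the identity $s \cdot PID_{i,1} = s \cdot w_i P = w_i P_{pub}$, so TA can rebuild the same hash mask from $PID_{i,1}$ and its master key. The hash $H$ is instantiated with SHA-256 and the curve with secp256k1 purely as assumptions; `issue_pseudonym` and `trace` are illustrative names, and identities are limited to 32 bytes by the chosen hash.

```python
import hashlib
import secrets

# secp256k1 domain parameters, used only as a stand-in curve (an assumption).
FIELD_P = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
P = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def add(p1, p2):
    """Point addition on y^2 = x^3 + 7 (None is the point at infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)

def mul(k, pt):
    """Scalar multiplication by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def mask(point, t, length):
    """Hypothetical instantiation of H(point, t): SHA-256, truncated to `length` bytes."""
    data = f"{point[0]:x}|{point[1]:x}|{t}".encode()
    return hashlib.sha256(data).digest()[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def issue_pseudonym(real_id, p_pub, t):
    """TA side: PID_1 = w * P, PID_2 = ID xor H(w * P_pub, t)."""
    w = secrets.randbelow(ORDER - 1) + 1
    return mul(w, P), xor(real_id, mask(mul(w, p_pub), t, len(real_id))), t

def trace(s, pid):
    """TA side: ID = PID_2 xor H(s * PID_1, t)."""
    pid1, pid2, t = pid
    return xor(pid2, mask(mul(s, pid1), t, len(pid2)))

s = secrets.randbelow(ORDER - 1) + 1         # TA's master key
p_pub = mul(s, P)
pid = issue_pseudonym(b"VEHICLE-0001", p_pub, "2020-01")
print(trace(s, pid))                         # b'VEHICLE-0001'
```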
Unlinkability: In the proposed EP2DS scheme, the data reports generated by any vehicle are randomized, so no attacker can link two data reports sent by the same vehicle. Hence, the proposed EP2DS scheme achieves unlinkability.
Resistance to attacks: The proposed EP2DS scheme is able to withstand the following network attacks:
• Modification attack: Based on Theorem 2, no polynomial-time attacker is able to forge a valid data report, so any modification of a data report can be detected.
• Replay attack: In the proposed EP2DS scheme, a timestamp is contained in the data report. By checking the freshness of the timestamp, the verifier is able to resist replay attacks.
• Impersonation attack: From Theorem 2, no attacker can fabricate a legal data report without the vehicle's private key.
• Man-in-the-middle attack: The analysis of the modification attack shows that any modification of the data reports in transit can be detected.
Security comparisons of schemes [17,19,23,25,26] and the proposed EP2DS scheme are displayed in Table 2. S1, S2, S3, S4, S5, S6, S7, S8, S9, and S10 are used to represent authentication and data integrity, confidentiality, location privacy preservation, identity privacy preservation, traceability, unlinkability, the modification attack, the replay attack, the impersonation attack, and the man-in-the-middle attack, respectively.
In accordance with Table 2, Rabieh et al.'s scheme [17] is able to provide location privacy preservation, identity privacy preservation, and traceability. Sun et al.'s scheme [19] cannot achieve location privacy preservation. Kong et al.'s scheme [23] cannot achieve identity privacy preservation or traceability, and cannot resist the replay attack or the man-in-the-middle attack. Paulet et al.'s scheme [25] cannot achieve authentication and data integrity, identity privacy preservation, or traceability, and cannot resist the modification attack, the replay attack, the impersonation attack, or the man-in-the-middle attack. Zhu et al.'s scheme [26] cannot achieve identity privacy preservation or traceability, and cannot resist the replay attack or the man-in-the-middle attack. In contrast, all security requirements are satisfied by the proposed EP2DS scheme.

Performance Evaluation
We analyze the computation and communication costs of these schemes [17,19,23,25,26] and the proposed EP 2 DS scheme, and evaluate their performance.
To realize a fair comparison, we compare these schemes [17,19,23,25,26] with the proposed EP2DS scheme under the 80-bit security level [38]. Regarding the pairing-based schemes [17,19,23,25,26], we choose a bilinear pairing $e : G_1 \times G_1 \to G_2$, where $G_1$ is an additive group defined by the generator $P$ with order $q$ on the supersingular elliptic curve $E : y^2 = x^3 + x \bmod p$ with the embedding degree 2, $q$ is a 160-bit Solinas prime number, and $p$ is a 512-bit prime number satisfying $q \cdot 12 \cdot r = p + 1$. With regard to the proposed EP2DS scheme, we pick a group $G$ produced by the generator $P$ with the prime order $q$ on an elliptic curve $E : y^2 = x^3 + ax + b \bmod p$, where $q$ and $p$ are 160-bit prime numbers, $a = -3$, and $b$ is a random 160-bit prime number.
The running time of the operations is obtained using the MIRACL Crypto SDK [39]. We run the experiment on a 64-bit Windows 10 operating system with a 2.53 GHz i7 CPU and 4 GB of memory. Table 3 lists the average running time for these operations.

Computation Costs
The computation costs of the proposed EP2DS scheme and the schemes [17,19,23,25,26] are displayed in Table 4. In the data collection phase, for Rabieh et al.'s scheme [17], $V_i$ requires running two multiplication operations in $G_1$ and two exponentiation operations in $G_1$, thus the total time is $2T_m + 2T_e = 6.9164$ ms. FN requires executing one multiplication operation in $G_1$, one exponentiation operation in $G_1$, and $w + 1$ bilinear pairing operations in $G_1$, and thus the total time is $T_m + T_e + (w + 1)T_p = 10.3092w + 13.7674$ ms. CC requires executing one exponentiation operation in $G_1$ and $n + 1$ bilinear pairing operations in $G_1$, and hence the total time is $T_e + (n + 1)T_p = 10.3092n + 2.0289$ ms.
For Sun et al.'s scheme [19], $V_i$ requires running two multiplication operations in $G_1$, one exponentiation operation in $G_1$, and one map-to-point hash function operation, thus the total time is $2T_m + T_e + T_h = 15.1967$ ms. FN requires executing $w + 3$ multiplication operations in $G_1$ and four bilinear pairing operations in $G_1$, so the total time is $(w + 3)T_m + 4T_p = 1.4293w + 45.5247$ ms. CC requires executing one multiplication operation in $G_1$, $n$ exponentiation operations in $G_1$, and two bilinear pairing operations in $G_1$, and hence the total time is $T_m + nT_e + 2T_p = 2.0289n + 11.7385$ ms.
For Kong et al.'s scheme [23], $V_i$ requires running four multiplication operations in $Z_{n^2}$ and four exponentiation operations in $Z_{n^2}$, thus the total time is $4T_m + 4T_e = 13.8328$ ms. FN requires executing $2w$ multiplication operations in $G_1$, so the total time is $2wT_m = 2.8586w$ ms. CC requires executing $6n$ multiplication operations in $G_1$ and $4n$ exponentiation operations in $G_1$, and hence the total time is $6nT_m + 4nT_e = 16.6914n$ ms.
For the proposed EP2DS scheme, $V_i$ needs to run five scalar multiplication operations in $G$, and therefore the total time is $5T_{sm} = 1.9255$ ms. FN requires executing $w + 3$ scalar multiplication operations in $G$; accordingly, the total time is $(w + 3)T_{sm} = 0.3851w + 1.1553$ ms. CC requires executing $n + 3$ scalar multiplication operations in $G$ and two discrete logarithm solving operations; therefore, the total time is $(n + 3)T_{sm} + 2T_{log} = 0.3851n + 2.4429$ ms.
In the data query phase, for Kong et al.'s scheme [23], V q requires running ten multiplication operations in G 1 and seven exponentiation operations in G 1 , so the total time is 10T m + 7T e = 28.4953 ms. FN needs to run nine multiplication operations in G 1 and seven exponentiation operations in G 1 , the total time is thus 9T m + 7T e = 27.0660 ms. For Paulet et al.'s scheme [25], V q requires running five multiplication operations in G 1 and nine exponentiation operations in G 1 , the total time is thus 5T m + 9T e = 25.4066 ms. FN needs to run 6m multiplication operations in G 1 and 8m + 3 exponentiation operations in G 1 , the total time is thus 6mT m + (8m + 3)T e = 24.8070m +6.0867 ms.
For Zhu et al.'s scheme [26], $V_q$ requires running five exponentiation operations in $G_1$ and two bilinear pairing operations in $G_1$, so the total time is $5T_e + 2T_p = 30.7629$ ms. FN needs to run four multiplication operations in $G_1$ and four bilinear pairing operations in $G_1$, so the total time is $4T_m + 4T_p = 46.9540$ ms.
For the proposed EP2DS scheme, $V_q$ needs to run eleven scalar multiplication operations in $G$ and two discrete logarithm solving operations, and hence the total time is $11T_{sm} + 2T_{log} = 5.5237$ ms. FN needs to run eight scalar multiplication operations in $G$, thus the total time is $8T_{sm} = 3.0808$ ms.
Figure 3 shows the comparison of computation costs in the data collection phase. Figure 3a shows that the computation cost of $V_i$ is 1.9255 ms, a decrease of 72.2%, 87.3%, and 86.1% compared with Rabieh et al.'s scheme [17], Sun et al.'s scheme [19], and Kong et al.'s scheme [23], respectively. As shown in Figure 3b, the computation cost of FN increases linearly with the number of vehicles, with the proposed EP2DS scheme having a lower slope than Rabieh et al.'s scheme [17], Sun et al.'s scheme [19], and Kong et al.'s scheme [23]. From Figure 3c, we can see that the computation cost of CC grows linearly with the number of fog nodes, and the proposed EP2DS scheme has a lower slope than Rabieh et al.'s scheme [17], Sun et al.'s scheme [19], and Kong et al.'s scheme [23].
Figure 4 shows the comparison of computation costs in the data query phase. From Figure 4a, the computation cost of $V_q$ in the proposed EP2DS scheme is 5.5237 ms, a decrease of 80.6%, 78.3%, and 82.0% compared with Kong et al.'s scheme [23], Paulet et al.'s scheme [25], and Zhu et al.'s scheme [26], respectively. Figure 4b shows the correlation between the computation cost of FN and the number of segments $m$; the computation cost of FN in the EP2DS scheme is the smallest compared with Kong et al.'s scheme [23], Paulet et al.'s scheme [25], and Zhu et al.'s scheme [26].
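The reported percentages can be re-derived from the operation counts. The unit times below are not copied from Table 3; they are inferred from the totals quoted in the text (for example, $5T_{sm} = 1.9255$ ms gives $T_{sm} = 0.3851$ ms), so they should be read as assumptions. The sketch recomputes the vehicle-side costs and the relative reductions.

```python
# Unit running times in ms, inferred from the totals quoted in the text (assumed values).
T_SM, T_M, T_E, T_P, T_H, T_LOG = 0.3851, 1.4293, 2.0289, 10.3092, 10.3092, 0.6438

def reduction(ours, other):
    """Percentage decrease of `ours` relative to `other`, rounded to one decimal place."""
    return round((1 - ours / other) * 100, 1)

# Data collection phase: cost of the data collection vehicle V_i.
vi_ours = 5 * T_SM
vi_others = {
    "Rabieh et al. [17]": 2 * T_M + 2 * T_E,
    "Sun et al. [19]": 2 * T_M + T_E + T_H,
    "Kong et al. [23]": 4 * T_M + 4 * T_E,
}

# Data query phase: cost of the data query vehicle V_q.
vq_ours = 11 * T_SM + 2 * T_LOG
vq_others = {
    "Kong et al. [23]": 10 * T_M + 7 * T_E,
    "Paulet et al. [25]": 5 * T_M + 9 * T_E,
    "Zhu et al. [26]": 5 * T_E + 2 * T_P,
}

def fn_collection_cost(w):
    """Fog node cost in the proposed scheme for w data reports: (w + 3) * T_sm."""
    return (w + 3) * T_SM

for name, cost in vi_others.items():
    print(f"V_i vs {name}: {cost:.4f} ms, reduction {reduction(vi_ours, cost)}%")
for name, cost in vq_others.items():
    print(f"V_q vs {name}: {cost:.4f} ms, reduction {reduction(vq_ours, cost)}%")
print(f"FN cost for 100 reports: {fn_collection_cost(100):.4f} ms")
```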

Communication Costs
The communication costs of the proposed EP2DS scheme and the schemes [17,19,23,25,26] are evaluated in this subsection. We mainly consider the data report size, query report size, and response report size. As mentioned above, the lengths of the elements in $G$, $Z_q^*$, $Z_n$, and $Z_{n^2}$ are 160 bits (20 bytes), 160 bits (20 bytes), 1024 bits (128 bytes), and 2048 bits (256 bytes), respectively, and the lengths of the timestamp and identity are assumed to be 32 bits (4 bytes). The comparison results of communication costs are illustrated in Table 5. In the data collection phase, the data report size is 260 bytes for Rabieh et al.'s scheme [17], 516 bytes for Sun et al.'s scheme [19], and 1152 bytes for Kong et al.'s scheme [23]. For the proposed EP2DS scheme, the data report size is 172 bytes, as $|DR_i^j|$ = 28 + 20 + 20 + 20 + 20 + 20 + 20 + 20 + 4 + 4 = 172 bytes.
For the proposed EP2DS scheme, the query report size is 172 bytes, and the response report size is 148 bytes.
The results from the comparison of communication costs in the data collection phase are illustrated in Figure 5. In terms of the data report size, the proposed EP2DS scheme requires 172 bytes, a decrease of 33.8%, 66.7%, and 85.1% compared with Rabieh et al.'s scheme [17], Sun et al.'s scheme [19], and Kong et al.'s scheme [23], respectively.
The results from the comparison of communication costs in the data query phase are shown in Figure 6. Regarding the query report size, Figure 6a shows that the proposed EP2DS scheme requires 172 bytes, a decrease of 85.1%, 32.8%, and 46.9% compared with Kong et al.'s scheme [23], Paulet et al.'s scheme [25], and Zhu et al.'s scheme [26], respectively. Figure 6b shows the correlation between the response report size and the number of segments $m$; the response report size in the EP2DS scheme is the smallest compared with Kong et al.'s scheme [23], Paulet et al.'s scheme [25], and Zhu et al.'s scheme [26]. Furthermore, unlike Paulet et al.'s scheme [25], the response report size in the EP2DS scheme does not increase with the number of segments $m$.

Conclusion
This paper proposes an efficient privacy-preserving data sharing scheme for fog-assisted vehicular sensor networks. Based on the super-increasing sequence, the proposed EP2DS scheme is able to format the data reports captured at different road segments into one report, while calculating the average sensory data in each road segment, greatly saving communication and computation resources. Furthermore, by exploiting the modified oblivious transfer technology, the proposed EP2DS scheme can query the road conditions of the potential moving route in the data query phase without disclosing the query location. Finally, the security analysis shows that the proposed EP2DS scheme can satisfy all the requirements for security and privacy, and the performance evaluation suggests that the proposed EP2DS scheme is more efficient in computation and communication costs than the existing schemes of [17,19,23,25,26]. Accordingly, the proposed EP2DS scheme is more appropriate for achieving data sharing in fog-assisted vehicular sensor networks. In future work, we will consider using blockchain technology to achieve decentralization and privacy protection.

Security Model
The proposed EP2DS scheme should satisfy confidentiality and unforgeability. Security is defined by the following two interactive games played between a challenger $\mathcal{C}$ and an attacker $\mathcal{A}$. $\mathcal{A}$ can make the following queries.
• Hash queries: Upon receiving the query, $\mathcal{C}$ returns a random value to $\mathcal{A}$.
• Extract queries: Upon receiving a query on the pseudo identity $PID_i$, $\mathcal{C}$ returns a private key to $\mathcal{A}$.

The IND-CPA game is defined as follows.
Setup: $\mathcal{C}$ generates the system parameters and returns them to $\mathcal{A}$.
Phase 1: $\mathcal{A}$ adaptively makes hash, extract, and signcryption queries a polynomially bounded number of times.
Challenge: $\mathcal{A}$ chooses a challenge identity $PID_i^*$, picks two messages $m_0^*$ and $m_1^*$, and sends them to $\mathcal{C}$. $\mathcal{C}$ randomly picks $b \in \{0, 1\}$ and produces the ciphertext of the message $m_b^*$ under $PID_i^*$. Finally, $\mathcal{C}$ returns the ciphertext to $\mathcal{A}$.
Phase 2: $\mathcal{A}$ can adaptively repeat the queries of Phase 1, except that it cannot make extract queries on $PID_i^*$.
Guess: $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$. The advantage of $\mathcal{A}$ in the game is measured by how far $\Pr[b' = b]$ deviates from $1/2$.

Definition A2 (Unforgeability). The proposed scheme achieves existential unforgeability against adaptive chosen message attacks (EUF-CMA) if no probabilistic polynomial-time attacker can win the game below with a non-negligible advantage.
The EUF-CMA game is defined as follows.
Initialization: $\mathcal{A}$ selects a challenge pseudo identity $PID_i^*$ and transmits it to $\mathcal{C}$.
Setup: $\mathcal{C}$ generates the system parameters and returns them to $\mathcal{A}$.
Queries: $\mathcal{A}$ adaptively makes hash, extract, and signcryption queries.

Suppose there is an attacker $\mathcal{A}$ that can win the game defined in Definition A1 with a non-negligible probability $\varepsilon$. Then we can construct an algorithm $\mathcal{B}$ that breaks the IND-CPA security of ElGamal encryption with probability $\varepsilon'$.
Initialization: The simulator $\mathcal{S}$ for ElGamal encryption generates the parameters $\{p, q, P, G, P_{pub}\}$ and transmits them to $\mathcal{B}$.
To respond quickly and consistently, $\mathcal{B}$ maintains the following lists:
• $L_{H_2}$: It consists of tuples $(PID_i, R_i, P_{pub}, h_i)$.
• $L_{H_4}$: It consists of tuples $(PID_i, R_i, C_{i,1}, C_{i,2}, L_i, T_i, \tau_i)$.
• $L_{V_i}$: It consists of tuples $(PID_i, x_i, R_i)$.

Phase 1:
$\mathcal{A}$ can adaptively make the following queries a polynomially bounded number of times.
$H_2$ queries: $\mathcal{A}$ performs a query on $(PID_i, R_i, P_{pub})$, and $\mathcal{B}$ answers consistently with the list $L_{H_2}$.
$H_4$ queries: $\mathcal{A}$ performs a query on $(PID_i, R_i, C_{i,1}, C_{i,2}, L_i, T_i)$, and $\mathcal{B}$ answers consistently with the list $L_{H_4}$.
Extract queries: $\mathcal{A}$ performs a query on $PID_i$. $\mathcal{B}$ randomly chooses $x_i, h_i \in \mathbb{Z}_q^*$ and sets $R_i = x_i P - h_i P_{pub}$. If $h_i$ already appears in $L_{H_2}$, $\mathcal{B}$ chooses another $x_i \in \mathbb{Z}_q^*$ and tries again. $\mathcal{B}$ inserts $(PID_i, x_i, R_i)$ and $(PID_i, R_i, P_{pub}, h_i)$ into $L_{V_i}$ and $L_{H_2}$, respectively. Finally, $\mathcal{B}$ returns $(x_i, R_i)$ to $\mathcal{A}$.
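The extract-query simulation works because the simulator fixes the hash value first and then solves for $R_i$, so the returned key satisfies $x_i P = R_i + h_i P_{pub}$ without the master secret ever being used. The following is a minimal sketch of that idea under loudly stated assumptions: a small prime-order subgroup of $\mathbb{Z}_p^*$ (toy parameters 2039, 1019, 4) stands in for the elliptic-curve group, so scalar multiplication becomes modular exponentiation and point subtraction becomes multiplication by an inverse. The function names are hypothetical and the parameters are far too small for real use.

```python
import secrets
from typing import Tuple

# Toy parameters (assumptions for illustration only): G generates the subgroup
# of prime order Q_ORDER in Z_P^*, standing in for the elliptic-curve group.
P_MOD, Q_ORDER, G = 2039, 1019, 4


def simulate_extract(p_pub: int) -> Tuple[int, int, int]:
    """Return (x_i, R_i, h_i) satisfying g^x_i = R_i * p_pub^h_i.

    Multiplicative analogue of R_i = x_i P - h_i P_pub; only p_pub is used,
    never the master secret.
    """
    x_i = secrets.randbelow(Q_ORDER - 1) + 1  # x_i in Z_q^*
    h_i = secrets.randbelow(Q_ORDER - 1) + 1  # programmed H_2 output
    r_i = pow(G, x_i, P_MOD) * pow(p_pub, -h_i, P_MOD) % P_MOD
    return x_i, r_i, h_i


def key_is_valid(x_i: int, r_i: int, h_i: int, p_pub: int) -> bool:
    """Check g^x_i == R_i * p_pub^h_i, the analogue of x_i P = R_i + h_i P_pub."""
    return pow(G, x_i, P_MOD) == r_i * pow(p_pub, h_i, P_MOD) % P_MOD


# The master secret exists only to derive p_pub; the simulator never sees it.
master_secret = secrets.randbelow(Q_ORDER - 1) + 1
p_pub = pow(G, master_secret, P_MOD)

x_i, r_i, h_i = simulate_extract(p_pub)
print(key_is_valid(x_i, r_i, h_i, p_pub))
```

The check succeeds for every choice of $x_i$ and $h_i$, which is why the simulated keys are indistinguishable in form from honestly generated ones as long as the programmed hash values stay consistent with the list.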
Signcryption queries: $\mathcal{A}$ makes a query on the message $m_i$ under $PID_i$, and $\mathcal{B}$ forwards $m_i$ to $\mathcal{S}$. $\mathcal{S}$ randomly chooses $t_i \in \mathbb{Z}_q^*$, computes $C_{i,1} = t_i P$ and $C_{i,2} = t_i P_{cc} + m_i P$, and returns them to $\mathcal{B}$. $\mathcal{B}$ produces a ciphertext $\{PID_i, R_i, C_{i,1}, C_{i,2}, L_i, \sigma_i, T_i\}$ in accordance with the proposed scheme. Finally, $\mathcal{B}$ returns the ciphertext to $\mathcal{A}$.
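The two ciphertext components computed in the signcryption query have the form of ElGamal encryption with the message placed in the exponent. The following is a minimal sketch of that encryption step, assuming a toy prime-order subgroup of $\mathbb{Z}_p^*$ in place of the elliptic-curve group, so that $tP$ becomes $g^t$ and point addition becomes multiplication. The decryption routine, with its brute-force recovery of a small message, is an assumption added for illustration and is not described in this section; all function names are hypothetical.

```python
import secrets
from typing import Tuple

# Toy parameters (assumptions for illustration only): G generates the subgroup
# of prime order Q_ORDER in Z_P^*, standing in for the elliptic-curve group.
P_MOD, Q_ORDER, G = 2039, 1019, 4


def keygen() -> Tuple[int, int]:
    """Return (secret key, public key); the public key plays the role of P_cc."""
    sk = secrets.randbelow(Q_ORDER - 1) + 1
    return sk, pow(G, sk, P_MOD)


def encrypt(m: int, pk: int) -> Tuple[int, int]:
    """Encrypt a small integer m with fresh randomness t."""
    t = secrets.randbelow(Q_ORDER - 1) + 1
    c1 = pow(G, t, P_MOD)                              # analogue of C_1 = t P
    c2 = pow(pk, t, P_MOD) * pow(G, m, P_MOD) % P_MOD  # analogue of C_2 = t P_cc + m P
    return c1, c2


def decrypt(c1: int, c2: int, sk: int, max_m: int = Q_ORDER) -> int:
    """Remove the mask to obtain g^m, then search for m by brute force."""
    g_m = c2 * pow(c1, -sk, P_MOD) % P_MOD
    acc = 1  # acc equals g^m at the start of iteration m
    for m in range(max_m):
        if acc == g_m:
            return m
        acc = acc * G % P_MOD
    raise ValueError("message outside the searched range")


sk, pk = keygen()
c1, c2 = encrypt(42, pk)
print(decrypt(c1, c2, sk))
```

Because the message sits in the exponent, decryption requires a discrete-logarithm search over the message range, so this form is only practical when the encoded values are small.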
Challenge: $\mathcal{A}$ selects a challenge identity $PID_i^*$, picks two messages $m_0^*$ and $m_1^*$ of the same length, and sends them to $\mathcal{B}$. $\mathcal{B}$ then transmits them to $\mathcal{S}$. $\mathcal{S}$ randomly chooses $b \in \{0, 1\}$ and $t_i^* \in \mathbb{Z}_q^*$, computes $C_{i,1}^* = t_i^* P$ and $C_{i,2}^* = t_i^* P_{cc} + m_b^* P$, and returns them to $\mathcal{B}$. $\mathcal{B}$ produces a ciphertext $\{PID_i^*, R_i^*, C_{i,1}^*, C_{i,2}^*, L_i^*, \sigma_i^*, T_i^*\}$ in accordance with the proposed scheme. Finally, $\mathcal{B}$ returns the ciphertext to $\mathcal{A}$.
Phase 2: $\mathcal{A}$ can adaptively repeat the queries of Phase 1, except that it cannot make an extract query on $PID_i^*$.
Guess: $\mathcal{B}$ outputs the guess $b' \in \{0, 1\}$ produced by $\mathcal{A}$ as its own guess against the IND-CPA security of ElGamal encryption.
Probability analysis: Suppose that $\mathcal{A}$ makes at most $q_{H_2}$ $H_2$ queries, $q_{H_4}$ $H_4$ queries, $q_e$ extract queries, and $q_s$ signcryption queries. We define two events as follows:
• $E_1$: $\mathcal{B}$ does not abort the above game in the extract queries.
• $E_2$: $\mathcal{B}$ correctly outputs the value of $b$.
According to the above simulation, we obtain $\Pr[E_1] \geq (\ldots)^{q_e}$ and $\Pr[E_2 \mid E_1] \geq \varepsilon$, and hence the advantage with which $\mathcal{B}$ breaks the IND-CPA security of ElGamal encryption is $\varepsilon' \geq \Pr[E_1 \wedge E_2] = \Pr[E_1] \Pr[E_2 \mid E_1] \geq \Pr[E_1] \cdot \varepsilon$. From this analysis, we conclude that $\mathcal{B}$ can break the IND-CPA security of ElGamal encryption with a non-negligible probability, which contradicts the security of ElGamal encryption. Therefore, the proposed EP2DS scheme provides confidentiality.
Theorem A2. The proposed EP2DS scheme provides unforgeability if the ECDL problem is hard.
Assume that there is an attacker $\mathcal{A}$ that can break the unforgeability of the proposed EP2DS scheme with a non-negligible advantage $\varepsilon$. Then we can construct an algorithm $\mathcal{B}$ that solves the ECDL problem with probability $\varepsilon'$.
Initialization: $\mathcal{A}$ picks a challenge identity $PID_i^*$ and sends it to $\mathcal{B}$.
Setup: Given an instance $(P, Q = aP)$ of the ECDL problem, $\mathcal{B}$ sets $P_{pub} = Q$ and returns $\{p, q, P, G, P_{pub}, P_{sp}, H_1, H_2, H_3, H_4, H_5, H_6, H_7, H_8, a\}$ to $\mathcal{A}$.
$H_2$ queries: The same as in the proof of Theorem A1.
$H_4$ queries: The same as in the proof of Theorem A1.
Extract queries: The same as in the proof of Theorem A1.
Signcryption queries: $\mathcal{A}$ makes a query on the message $m_i$ under $PID_i$, and $\mathcal{B}$ proceeds as follows:
• If $PID_i = PID_i^*$, $\mathcal{B}$ randomly selects $t_i, l_i, \sigma_i, h_i, \tau_i \in \mathbb{Z}_q^*$ and calculates $C_{i,1} = t_i P$, $C_{i,2} = t_i P_{cc} + m_i P$, $L_i = l_i P$, and $R_i = \sigma_i P - (h_i P_{pub} + \tau_i L_i)$. If $h_i$ already appears in $L_{H_2}$ or $\tau_i$ already appears in $L_{H_4}$, $\mathcal{B}$ chooses another $\sigma_i \in \mathbb{Z}_q^*$ and tries again. Then, $\mathcal{B}$ returns the ciphertext $\{PID_i, R_i, C_{i,1}, C_{i,2}, L_i, \sigma_i, T_i\}$ to $\mathcal{A}$ and inserts $(PID_i, R_i, P_{pub}, h_i)$ and $(PID_i, R_i, C_{i,1}, C_{i,2}, L_i, T_i, \tau_i)$ into $L_{H_2}$ and $L_{H_4}$, respectively.
• If $PID_i \neq PID_i^*$, $\mathcal{B}$ generates a ciphertext $\{PID_i, R_i, C_{i,1}, C_{i,2}, L_i, \sigma_i, T_i\}$ in accordance with the proposed scheme. Then, $\mathcal{B}$ returns the ciphertext to $\mathcal{A}$.
Forgery: $\mathcal{A}$ outputs a forged ciphertext $\{PID_i^*, R_i^*, C_{i,1}^*, C_{i,2}^*, L_i^*, \sigma_i^*, T_i^*\}$ on $m_i^*$ under $PID_i^*$. By the forking lemma [40,41], $\mathcal{B}$ can obtain another valid ciphertext $\{PID_i^*, R_i^*, C_{i,1}^*, C_{i,2}^*, L_i^*, \sigma_i^{*\prime}, T_i^*\}$ on $m_i^*$ under $PID_i^*$ by choosing a different $H_2$, with output $h_i^{*\prime} \neq h_i^*$. Since both ciphertexts are valid, they satisfy the verification relation implied by the simulation above, which gives the two equations
$$\sigma_i^* P = R_i^* + h_i^* P_{pub} + \tau_i^* L_i^*, \qquad \sigma_i^{*\prime} P = R_i^* + h_i^{*\prime} P_{pub} + \tau_i^* L_i^*.$$
Subtracting them and using $P_{pub} = aP$ gives
$$(\sigma_i^* - \sigma_i^{*\prime}) P = (h_i^* - h_i^{*\prime})\, a P, \qquad a = (\sigma_i^* - \sigma_i^{*\prime})(h_i^* - h_i^{*\prime})^{-1} \bmod q,$$
which $\mathcal{B}$ outputs as the solution of the ECDL problem.
Probability analysis: Suppose that $\mathcal{A}$ makes at most $q_{H_2}$ $H_2$ queries, $q_{H_4}$ $H_4$ queries, $q_e$ extract queries, and $q_s$ signcryption queries. We define three events as follows:
• $E_1$: $\mathcal{B}$ never aborts the above game in the extract and signcryption queries.
• $E_2$: $\mathcal{B}$ is able to output a valid ciphertext.
• $E_3$: $PID_i = PID_i^*$.
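According to the above simulation, $\mathcal{B}$ solves the ECDL problem whenever all three events occur, so the probability that $\mathcal{B}$ solves the ECDL problem satisfies $\varepsilon' \geq \Pr[E_1 \wedge E_2 \wedge E_3] = \Pr[E_1] \Pr[E_2 \mid E_1] \Pr[E_3 \mid E_1 \wedge E_2]$.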
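As a worked illustration of the forgery step in the proof of Theorem A2, the sketch below shows how two valid signature values that share the same $R_i^*$, $L_i^*$, and $\tau_i^*$ but use different $H_2$ outputs reveal the secret $a$. It works purely with scalars modulo a toy prime order and assumes the signing form $\sigma = (r + h a) + \tau l \bmod q$, which is inferred from the relation $R_i = \sigma_i P - (h_i P_{pub} + \tau_i L_i)$ used in the simulation rather than quoted from the scheme. The function names and the group order 1019 are hypothetical.

```python
import secrets

Q_ORDER = 1019  # toy prime group order (assumption for illustration only)


def rand_scalar() -> int:
    """Uniform element of Z_q^*."""
    return secrets.randbelow(Q_ORDER - 1) + 1


def sign_scalar(a: int, r: int, l: int, h: int, tau: int) -> int:
    """Assumed signing form: sigma = (r + h*a) + tau*l mod q.

    It is chosen so that sigma*P = R + h*P_pub + tau*L with R = rP, L = lP, P_pub = aP.
    """
    return (r + h * a + tau * l) % Q_ORDER


def extract_secret(sigma1: int, h1: int, sigma2: int, h2: int) -> int:
    """Solve (sigma1 - sigma2) = (h1 - h2) * a mod q for a."""
    inv = pow((h1 - h2) % Q_ORDER, -1, Q_ORDER)
    return (sigma1 - sigma2) * inv % Q_ORDER


a = rand_scalar()                        # ECDL solution, unknown to the extractor
r, l, tau = rand_scalar(), rand_scalar(), rand_scalar()
h1 = rand_scalar()
h2 = h1 % (Q_ORDER - 1) + 1              # a different nonzero H_2 output

sigma1 = sign_scalar(a, r, l, h1, tau)   # first forgery
sigma2 = sign_scalar(a, r, l, h2, tau)   # forgery after replay with a new H_2
print(extract_secret(sigma1, h1, sigma2, h2) == a)
```

The terms $r$ and $\tau l$ cancel in the difference because the replay keeps $R_i^*$, $L_i^*$, and the $H_4$ output fixed, which is why only the $H_2$ output needs to change.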