Montclair State University Digital Commons

ESPADE: An Efficient and Semantically Secure Shortest Path Discovery for Outsourced Location-Based Services

Abstract: With the rapid growth of smart devices and technological advancements in tracking geospatial data, the demand for Location-Based Services (LBS) is rising steadily in several domains, including military, healthcare and transportation. It is a natural step to migrate LBS to a cloud environment to achieve on-demand scalability and increased resiliency. Nonetheless, outsourcing sensitive location data to a third-party cloud provider raises a host of privacy concerns as the data owners have reduced visibility and control over the outsourced data. In this paper, we consider outsourced LBS where users want to retrieve map directions without disclosing their location information. Specifically, our paper aims to address the following problem: Given a user's location s, a target destination t, and a graph G stored in a cloud, can users retrieve the shortest path route from s to t in a privacy-preserving manner? Although there exist a few solutions to this problem, they are either inefficient or insecure. For example, existing solutions either leak intermediate results to untrusted cloud providers or incur significant costs on the end-user. To address this gap, we propose an efficient and secure solution based on homomorphic encryption properties combined with a novel data aggregation technique. We formally show that our solution achieves semantic security guarantees under the semi-honest model. Additionally, we provide complexity analysis and experimental results to demonstrate that the proposed protocol is significantly more efficient than the current state-of-the-art techniques.


Introduction
Due to the advent of the Internet of Things (IoT), a wide range of smart devices (e.g., tablets and smartphones) are being used to boost businesses across several industries, including healthcare [1,2], manufacturing [3], and military [4]. The number of applications that provide services using geo-locations has also increased in recent years [5]. Specifically, the proliferation of smart devices is the driving force for the growth of Location-Based Services (LBS). LBS applications and platforms allow users to access relevant and updated information about their surroundings based on their real-time geo-location information. For example, navigation, gaming, advertising, and tracking are some domains that effectively leverage LBS [6].
Although the LBS feature renders dynamic user experience and enhances the way businesses can operate and interact with their customers, many such applications sacrifice user security for increased availability and performance [7]. For example, addressing the inherent privacy issues in LBS remains a critical challenge for many service providers [8,9]. In general, a user should disclose his/her location data for the LBS provider to make accurate recommendations. This enables LBS providers to continuously track the user's personal information, such as their home address, travel plans, lifestyle, etc. Due to growing privacy concerns, users may not want to disclose their location information to LBS providers, but they may still want to get benefits from such applications. Additionally, the evolution of real-time LBS queries emphasizes the need for LBS providers to provide on-demand services. That is, LBS providers often require huge computational and storage resources to manage graph data and process location-aware queries. As a result, it is costly and challenging for LBS providers to manage on-premise infrastructure for delivering services. A natural solution to this problem is to adopt cloud computing technology [10] for efficient processing of LBS queries with on-demand scalability and increased resiliency [11]. The LBS provider can delegate their computational operations in addition to their data to the cloud. Nonetheless, the privacy issues mentioned above become even more challenging under an outsourced environment as the cloud service providers are remote and not trusted servers. A common approach to address the confidentiality of the location data is to encrypt it before being outsourced to a cloud. However, processing over encrypted data is not straightforward as the cloud cannot see the underlying data, thus affecting the quality of LBS. 
Therefore, we emphasize that there is a strong need to develop privacy-preserving technologies (e.g., [12,13]) that can balance privacy against the quality of LBS applications in a cloud environment.
In this paper, we focus on the Single-Source Single-Destination (SSSD) shortest path query, which is one of the commonly used LBS features, under an outsourced cloud environment. For example, consider finding the shortest path to the military base in a mission-critical search-and-rescue operation. To achieve data confidentiality, we assume that the geospatial data are represented as a graph and are encrypted before being outsourced to a cloud. Specifically, given an encrypted graph G stored in a cloud, the goal is to find the shortest path from the source to a destination in a privacy-preserving manner. In the literature, this problem is commonly referred to as Privacy-Preserving Shortest Path over Encrypted Graph (PSPEG) [14]. We emphasize that any solution to the PSPEG problem needs to address the following three privacy objectives:

1. Privacy Objective 1 (PO1): User's input information (i.e., source and destination locations) should not be revealed to the cloud service providers and any other users.
2. Privacy Objective 2 (PO2): The contents of graph G should never be revealed to the cloud service providers and unauthorized users.
3. Privacy Objective 3 (PO3): The shortest path information should be revealed only to the query issuer.
Existing PSPEG solutions (e.g., [14][15][16]) either do not meet all the above privacy objectives or are not very efficient. Therefore, the primary goal of this paper is to develop a solution that is both secure and efficient. Specifically, we address two research questions: (i) Investigate ways for the data owner to securely and efficiently outsource his/her graph data as well as the shortest path query processing task to a cloud. (ii) Study methods for the query issuer to efficiently retrieve the shortest path results from the cloud without compromising his/her location privacy. To address the above two questions, we adopt the Paillier cryptosystem [17] due to its efficiency and inherent additive homomorphic properties over encrypted data. The cloud server can directly operate on encrypted data to execute the steps involved in the shortest path-discovery process. Along this direction, we propose an Efficient and Semantically Secure Shortest Path Discovery over Encrypted Graph Data (ESPADE) protocol under a cloud environment. The proposed ESPADE protocol meets all of the three privacy objectives mentioned above and provides semantic security [18] under the semi-honest model of Secure Multi-party Computation (SMC) [19]. The main idea behind our protocol is to split the graph data into grids, apply homomorphic encryption operations using a novel data aggregation technique, and execute the shortest path discovery process in an iterative process. The underlying steps involved in ESPADE are based on the well-known Dijkstra's algorithm [20]; thus, ensuring the correctness. Our performance analysis shows that ESPADE is more secure and efficient compared to the existing PSPEG protocols. It is worth noting that our solution can be incorporated into critical outsourced LBS applications (e.g., military rescue operations) where the secure computation of single-source single-destination shortest path queries is a fundamental task in the graph mining process.

Main Contributions
The existing solutions to the PSPEG problem are either insecure (e.g., leak intermediate results to the cloud providers) or not very efficient. Our proposed ESPADE protocol is both more secure and more efficient than existing solutions. The main contributions of this paper are summarized below:

1. Semantic Security: ESPADE meets all three privacy objectives mentioned earlier. That is, the contents of the outsourced graph G and the user's input query are never revealed to the cloud service providers and any unauthorized users. This is because our solution is designed to achieve semantic security under the semi-honest model of SMC. We refer the reader to Section 6.1 for more details.
2. Efficiency: Our protocol is significantly more efficient, in terms of both computation and communication, than existing solutions. It is worth noting that the majority of expensive computations in ESPADE are performed by cloud providers, thus minimizing the costs on the end-user.
3. Correctness: The steps involved in ESPADE are similar to the ones in the standard Dijkstra's algorithm. The only difference is that the underlying operations are performed either over encrypted or randomized data. For any given G and shortest path query Q, the shortest path returned by our protocol is the same as the one that would be returned by executing Dijkstra's algorithm on {G, Q}.
4. Flexibility: Upon outsourcing the graph data to the cloud, the data owner does not have to participate in any other operations. Specifically, the end-users can issue SSSD queries directly to the cloud and the majority of the query processing task is done by the cloud providers, which fulfills the main purpose of outsourcing in the first place.
The remainder of this paper is organized as follows: Section 2 discusses our system model. In Section 3, we review the work most closely related to ours. Section 4 presents the background information. Section 5 discusses the proposed ESPADE protocol in detail along with a running example. We present the security proofs and the comparative performance analysis of ESPADE with experimental results in Section 6, and conclude the paper with future work in Section 7.

Problem Model
In our problem setting, we consider three types of entities: (i) Alice (ii) Federated Cloud and (iii) Bob. The proposed system model along with the information flow among different entities is shown in Figure 1. Next, we discuss the role of each of these entities.

• Alice: We assume that Alice (data owner) holds sensitive geospatial data represented as a graph $G = \{V, E, W\}$, where $V = \{v_1, \ldots, v_n\}$ denotes the set of vertices and $E$ denotes the edges connecting those vertices, with associated weights $W$. In our model, we consider that $G$ is an undirected weighted graph represented as an $\alpha \times \alpha$ grid matrix (refer to Section 5 for more details). For example, consider a graph used by Google Maps where $G$ represents a road network. Here each vertex in $V$ will correspond to a given point on the road network, and each edge in $E$ will correspond to the road segment that connects any two points on the road network. In this case, the weight can be the distance between junctions or the traffic flow.

• Federated Cloud: Without loss of generality, let Alice outsource $G$ (in encrypted format) to a Federated Cloud (FC) environment consisting of two cloud service providers $C_1$ and $C_2$. We assume that $C_1$ and $C_2$ are semi-honest and do not collude. This is a realistic model, as it is used in several existing works (e.g., in [21][22][23][24]). Most cloud service providers are well-known IT organizations, and it is highly unlikely that any two of them will collude, as doing so may damage their reputation and adversely affect their revenues.

• Bob: The end-user Bob issues SSSD shortest path queries to FC. $C_1$ and $C_2$ will collaboratively compute the shortest path from source $s$ to $t$, and return the path to Bob.
Under the above problem setting, we formally define the proposed ESPADE protocol as follows:

$$\mathrm{ESPADE}(G, \langle s, t \rangle) \rightarrow SP(s, t)$$

where $G$ is known only to Alice and is outsourced (after proper randomization) to FC, and $s$, $t$ denote the source and destination locations, respectively, known only to Bob. At the end of the ESPADE protocol, the shortest path route, denoted by $SP(s, t)$, should be revealed only to Bob. Briefly, our model involves the following main steps: (1) $C_2$ initially generates a public-private key pair $(pk, sk)$ based on the Paillier cryptosystem [17] and distributes the public key $pk$ to Alice, $C_1$ and Bob, (2) Alice randomly splits her graph data $G$ in a systematic way and outsources them to $C_1$ and $C_2$, (3) Bob randomly splits his shortest path query information and forwards them to FC, (4) $C_1$ and $C_2$ engage in secure computations to collaboratively extract the shortest path information, and (5) FC sends the randomized sub-graph data to Bob, who then aggregates and extracts the shortest path information based on Dijkstra's algorithm.

Related Work
In this section, we discuss the existing methods for privacy-preserving shortest path (PPSP) computation. We also demonstrate the limitations of the existing solutions.

Privacy-Preserving Shortest Path over Plaintext Data
In this sub-section, we briefly touch upon existing privacy-preserving shortest path solutions in which the graph data stored on the server-side are in a non-encrypted format.

Obfuscation Methods
In obfuscation methods [25], Bob does not send his query directly to the LBS provider. For example, in [26], Bob computes a set of dummy locations and obfuscates his source location s with the fake locations. Similarly, he obfuscates the destination location t with many fake locations. Suppose S and T denote the obfuscated source and destination locations, respectively. Bob forwards (S, T) to the LBS provider. Upon receiving, the LBS provider computes the shortest path from every location in S to every location in T, based on G resulting in |S| * |T| paths which are forwarded to Bob. Finally, Bob retains the shortest path that corresponds to the original source-destination location pair. It is worth noting that any PPSP solution based on the obfuscation method is not secure as it will reveal substantial information about the query to the LBS provider. For example, LBS knows that the original source (or destination) location is one among the many locations in the set S (or T). Additionally, obfuscation-based methods assume that outsourced data (i.e., G in our case) are known to the LBS provider, whereas in our problem setting the contents of G are hidden from both cloud service providers.
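The obfuscation flow described above can be illustrated with a small sketch. It is not taken from [26]: the graph `road`, the helper names `obfuscate` and `lbs_answer`, and the choice of two dummies per endpoint are all assumptions made for illustration, and the provider returns distances rather than full paths to keep the example short.

```python
import heapq
import random

def shortest_dist(graph, s, t):
    """Plain Dijkstra distance on a plaintext dict-of-dicts graph."""
    dist = {s: 0}
    heap = [(0, s)]
    done = set()
    while heap:
        d, cv = heapq.heappop(heap)
        if cv in done:
            continue
        done.add(cv)
        if cv == t:
            return d
        for u, w in graph[cv].items():
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return float("inf")

def obfuscate(real, all_locations, k):
    """Bob hides the real location among k randomly chosen dummies."""
    dummies = random.sample([v for v in all_locations if v != real], k)
    mixed = dummies + [real]
    random.shuffle(mixed)
    return mixed

def lbs_answer(graph, S, T):
    """The LBS provider answers every (source, destination) combination."""
    return {(s, t): shortest_dist(graph, s, t) for s in S for t in T}

# Hypothetical plaintext road network known to the LBS provider.
road = {
    "a": {"b": 4, "c": 1},
    "b": {"a": 4, "c": 2, "d": 5},
    "c": {"a": 1, "b": 2, "d": 8},
    "d": {"b": 5, "c": 8},
}
S = obfuscate("a", list(road), 2)
T = obfuscate("d", list(road), 2)
answers = lbs_answer(road, S, T)   # |S| * |T| results are returned
real_answer = answers[("a", "d")]  # Bob keeps only his own pair
```

The sketch makes the leakage visible: the provider sees `S` and `T` in the clear, so it learns that the real source is one of three candidates, and it must hold the graph in plaintext.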

Private Information Retrieval (PIR) Methods
In PIR [27], the server is assumed to hold a database of items and the end-user wants to retrieve the ith item from the database in an oblivious manner. Computational PIR [28] is one kind of PIR which can work with a single server and utilizes cryptographic techniques to ensure the privacy of the query from a computationally bounded server. Existing PIR-based solutions to PPSP assume that the outsourced graph database is not privacy-sensitive and is known to the LBS provider. For example, Mouratidis and Yiu [29] proposed a hardware-aided PIR-based solution to PPSP that relies on a tamper-resistant secure co-processor installed at the server-side. However, such schemes require decrypting the data in a secure area at the cloud and perform the computation on decrypted data; thus, they do not protect data access patterns from the server [30].
In this paper, we assume that the graph data are sensitive and thus the above schemes are not applicable to our problem domain, especially for outsourced environments where the outsourced data are in an encrypted format and has to be protected even from the cloud service providers.

PPSP over Encrypted Graph Data
In the past decade, various techniques for location-based query processing over outsourced encrypted data have been proposed [14-16,31,32]. However, there exist only a few solutions to the PPSP problem over encrypted graph data. Blanton et al. [15] proposed a data-oblivious algorithm for the SSSD shortest path problem on protected information. First, their algorithm is well-suited only for dense graphs. Second, their framework is based on the (k, n) threshold linear secret sharing scheme that requires at least three parties whereas our framework utilizes a more practical two-cloud model.
Zhang et al. [16] proposed a shortest path computing framework based on homomorphic encryption and secure multi-party computation. However, their protocols assume that only the edge-weight information was encrypted; and thus, their solution reveals the vertex information to the cloud server. Additionally, they utilized 1-out-of-n oblivious transfer cryptographic primitive, which can be prohibitively expensive and their solution requires the participation of the data owner during the shortest path computation, whereas in our protocol the data owner does not participate in any operations after the data outsourcing step.
Samanthula et al. [14] proposed two PPSP solutions over encrypted graph data, referred to as PSPEG$_1$ and PSPEG$_2$, under different cloud settings. PSPEG$_1$ utilizes a single cloud service provider whereas PSPEG$_2$ uses a two-cloud model similar to ours. However, both of these protocols incur significant computation and communication costs on Alice and Bob. Furthermore, there is no provable way to quantify the security guarantees of these two protocols as they did not provide any security proofs. In Section 6, we formally show that our proposed ESPADE protocol is more efficient and secure compared to both PSPEG$_1$ and PSPEG$_2$. Specifically, the proposed ESPADE protocol utilizes a data aggregation technique under encryption to boost the performance of each iteration in the shortest path discovery process whereas PSPEG$_1$ and PSPEG$_2$ incur significant costs (computation, communication, and rounds) on the cloud providers and the end-user. We refer the reader to Section 6 for a detailed analysis of the security guarantees of ESPADE and its comparison with existing work.

Preliminaries
In this section, we discuss basic concepts and core algorithms that are utilized in ESPADE as background knowledge. First, we present the main steps involved in Dijkstra's algorithm. Then, we discuss the importance of homomorphic encryption in outsourced environments along with the properties of the Paillier cryptosystem, which are crucial to perform certain operations over encrypted data in our proposed protocol. Finally, we discuss the two-party secure multiplication (SM) protocol over encrypted data, which is used as a building block in ESPADE. Some common notations utilized in this paper are shown in Table 1.

Table 1. Common Notations.

| Notation | Description |
|---|---|
| SSSD-SP | Single-source single-destination shortest path |
| ESPADE | Efficient and semantically secure shortest path discovery over encrypted graph data |
| FC | A federated cloud environment consisting of two cloud service providers $C_1$ and $C_2$ |
| $w^k_{i,j}$ | Weight between the two vertices $v_{i,j}$ and $v^k_{i,j}$ |
| $(s, t)$ | Source and destination locations |
| $SP(s, t)$ | Shortest path from $s$ to $t$ based on $G$ |
| $(pk, sk)$ | A public-private key pair generated based on the Paillier cryptosystem |
| $E_{pk}$ | Paillier's encryption function with public key $pk$ |
| $D_{sk}$ | Paillier's decryption function with private key $sk$ |
| $r \in_R \mathbb{Z}_N$ | A random number chosen uniformly in the group $\mathbb{Z}_N$ |

The Dijkstra's Algorithm
There exist several algorithms to find the single-source shortest path for a weighted graph $G = \{V, E, W\}$, such as Dijkstra's and Bellman-Ford algorithms. In this paper, we focus on Dijkstra's algorithm as it is one of the most commonly used algorithms for non-negative weighted graphs. In general, most of the location-based services, such as map directions, deal with non-negative weighted graphs. For brevity, in this paper, we restrict our discussion to non-negative weighted graphs. Given a weighted graph $G$, Dijkstra's algorithm can be used to compute the shortest path from a given source ($s$) to the destination ($t$), denoted by $SP(s, t)$. Let us consider a graph $G$ with the following elements:

• Vertices denoted by $u$ or $v$;
• Each edge that connects two vertices $(u, v)$ has a weight associated with it, denoted by $w_{u,v}$.
For a given (s, t) pair, the algorithm will traverse neighboring vertices in G, finding the shortest path by adding edge weights and replacing the current shortest path if a lower total distance is found. The main steps involved in Dijkstra's algorithm are discussed below.

A. Initialization

• The current vertex $cv$ is marked as source $s$;
• Each vertex in $G$ is initially marked as unvisited;
• All vertices are assigned $\infty$ as the distance from $s$, while for $s$ itself the distance is assigned 0.

B. Iterative Process

• Step 1: For the current vertex $cv$, consider all unvisited vertices that are directly connected to $cv$. Let us denote this set by $L$.
• Step 2: For each vertex $u \in L$, calculate its new distance from the source as $dist(u) = \min(dist(u),\ dist(cv) + w_{cv,u})$.
• Step 3: Mark $cv$ as visited and update $SP(s, t)$ as $SP(s, t) \leftarrow SP(s, t) \cup cv$. If $cv = t$, then terminate and return $SP(s, t)$ as the output. Otherwise, select the vertex $m \in L$ which has the smallest distance, set $cv$ to $m$, and proceed to Step 1.
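As a plaintext reference point for the steps above, the following is a minimal sketch of Dijkstra's algorithm for a single source-destination pair. It uses the standard priority-queue formulation, in which the next current vertex is the closest unvisited vertex overall, so it differs slightly in bookkeeping from the textual steps. The function name `dijkstra_sssd` and the graph `example_graph` are illustrative assumptions.

```python
import heapq
from math import inf

def dijkstra_sssd(graph, s, t):
    """Return (distance, path) for the shortest path from s to t.

    graph: dict mapping each vertex to {neighbor: non-negative weight}.
    """
    dist = {v: inf for v in graph}   # initialization: all distances are infinity
    dist[s] = 0                      # ... except the source
    prev = {}                        # predecessor on the best known path
    visited = set()
    heap = [(0, s)]
    while heap:
        d, cv = heapq.heappop(heap)  # closest unvisited vertex becomes cv
        if cv in visited:
            continue
        visited.add(cv)
        if cv == t:                  # destination reached: stop early
            break
        for u, w in graph[cv].items():
            # relaxation: dist(u) = min(dist(u), dist(cv) + w_{cv,u})
            if u not in visited and d + w < dist[u]:
                dist[u] = d + w
                prev[u] = cv
                heapq.heappush(heap, (dist[u], u))
    if dist[t] == inf:
        return inf, []
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[t], path[::-1]

# Hypothetical undirected road network with non-negative weights.
example_graph = {
    "a": {"b": 4, "c": 1},
    "b": {"a": 4, "c": 2, "d": 5},
    "c": {"a": 1, "b": 2, "d": 8},
    "d": {"b": 5, "c": 8},
}
result = dijkstra_sssd(example_graph, "a", "d")
```

Here `result` is the pair of total distance and vertex sequence for the query from `"a"` to `"d"`.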

Homomorphic Encryption and Paillier Cryptosystem
Homomorphic Encryption (HE) provides a way to process the information in an encrypted format and produce the same results as if the operations were performed over plaintext data [33]. In outsourced environments, HE enables untrusted servers to directly perform operations over encrypted data while maintaining the confidentiality of the data. Existing HE schemes are broadly categorized into Fully Homomorphic Encryption (FHE) and Partially Homomorphic Encryption (PHE) schemes. On one hand, FHE supports multiple types of operations (i.e., addition and multiplication) over encrypted data, but the existing FHE schemes are very expensive and not practical. On the other hand, PHE schemes support only one type of operation, either addition or multiplication, over encrypted data.
The Paillier cryptosystem [17] is an asymmetric PHE scheme that can support an arbitrary number of addition operations over encrypted data and whose security relies on the hardness of integer factorization. Let $E_{pk}$ denote Paillier's encryption function, where the public key $pk$ consists of $(N, g)$. Here $N$ denotes the RSA modulus and $g$ denotes the generator from group $\mathbb{Z}^*_{N^2}$. The encryption of a message $m$ under Paillier's scheme is given by $E_{pk}(m) = (g^m * r^N) \bmod N^2$, where $r \in_R \mathbb{Z}_N$. For any two given messages $m_1, m_2 \in \mathbb{Z}_N$, the following properties of Paillier's encryption function always hold:

• Additive Homomorphism: The output of multiplying the ciphertexts of $m_1$ and $m_2$ is equivalent to the encryption of $m_1 + m_2 \bmod N$. That is, $E_{pk}(m_1 + m_2) = E_{pk}(m_1) * E_{pk}(m_2) \bmod N^2$.

• Partial Multiplication: Given a constant $b \in \mathbb{Z}_N$, raising the ciphertext of $m_1$ to the power of $b$ is equivalent to the encryption of $b * m_1$. That is, $E_{pk}(b * m_1) = E_{pk}(m_1)^b \bmod N^2$.

• Semantic Security: Paillier's encryption function is a probabilistic scheme, meaning that encryptions of the same message will result in different ciphertexts. Therefore, given a set of ciphertexts, an adversary cannot deduce any information about the underlying plaintexts. That is, ciphertexts are indistinguishable from one another; thus, the scheme exhibits the semantic security property [18].
In this paper, we adopt Paillier's scheme as the underlying encryption scheme due to its security guarantees, homomorphic properties, and possible optimizations (refer to Section 6.2.2 for more details).
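The two homomorphic properties can be checked numerically with a toy implementation. The sketch below is for illustration only: the primes 1009 and 1013 are far too small to be secure, the choice $g = N + 1$ is one common simplification rather than something stated in the paper, and the function names `keygen`, `encrypt`, and `decrypt` are assumptions.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation (insecure sizes, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # assumed simple generator choice
    mu = pow(lam, -1, n)      # valid decryption constant when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """E_pk(m) = g^m * r^N mod N^2 with fresh randomness r."""
    n, g = pk
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pk, sk, c):
    """D_sk(c) = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = keygen()
n2 = pk[0] ** 2
c1, c2 = encrypt(pk, 20), encrypt(pk, 22)
sum_plain = decrypt(pk, sk, c1 * c2 % n2)        # additive homomorphism: 20 + 22
scaled_plain = decrypt(pk, sk, pow(c1, 5, n2))   # partial multiplication: 5 * 20
```

Both results are obtained without ever decrypting `c1` or `c2` individually, which is what allows an untrusted server to operate on ciphertexts.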

Secure Multiplication (SM)
As mentioned above, Paillier's encryption scheme allows parties to locally perform addition operations over encrypted data. However, multiplication over encrypted data cannot be done locally and requires the help of the party holding the corresponding private key. Without loss of generality, suppose $C_1$ holds private data $E_{pk}(a)$, $E_{pk}(b)$ and $C_2$ holds the corresponding Paillier private key $sk$. We assume that $a$ and $b$ are not known to $C_1$ and $C_2$. Under this setting, the goal of Secure Multiplication (SM) [34] is to enable $C_1$ and $C_2$ to jointly compute $E_{pk}(a * b)$. The output of the SM protocol, i.e., $E_{pk}(a * b)$, should be known only to $C_1$. During the execution of SM, the contents of $a$, $b$, and $(a * b)$ should not be revealed to $C_1$ and $C_2$.
The main steps involved in the SM protocol [34] are highlighted in Algorithm 1. Initially, $C_1$ generates two random numbers $r_a, r_b \in_R \mathbb{Z}_N$, and uses them to randomize the encrypted inputs. That is, $C_1$ computes $A = E_{pk}(a + r_a)$ and $B = E_{pk}(b + r_b)$ using the additive homomorphic properties of Paillier's scheme and forwards these values to $C_2$. Upon receiving them, $C_2$ decrypts $(A, B)$ to get $(a', b')$, respectively. Then, it multiplies $a'$ and $b'$, which expands as follows:

$$a' * b' = (a + r_a) * (b + r_b) = a * b + a * r_b + b * r_a + r_a * r_b$$

At this point, $C_2$ knows $p = a' * b'$, which it encrypts under $pk$ and sends the resulting encrypted value $P$ to $C_1$. Finally, $C_1$ performs homomorphic operations on $P$ locally and removes the randomized factors (i.e., $a * r_b$, $b * r_a$ and $r_a * r_b$) under encryption to get $E_{pk}(a * b)$ as the final output.
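Algorithm 1. Secure Multiplication (SM)

Require: $C_1$ holds $E_{pk}(a)$, $E_{pk}(b)$ and $C_2$ holds $sk$. $a$ and $b$ are not known to $C_1$ and $C_2$
1: $C_1$:
   (a). Pick two random numbers $r_a, r_b \in_R \mathbb{Z}_N$
   (b). Compute $A = E_{pk}(a) * E_{pk}(r_a)$ and $B = E_{pk}(b) * E_{pk}(r_b)$, and send $A, B$ to $C_2$
2: $C_2$:
   (a). Decrypt $A$ and $B$ to get $a' = D_{sk}(A)$ and $b' = D_{sk}(B)$
   (b). Compute $p = a' * b' \bmod N$ and $P = E_{pk}(p)$, and send $P$ to $C_1$
3: $C_1$:
   (a). Compute $y = P * E_{pk}(-a * r_b)$
   (b). Compute $y = y * E_{pk}(-b * r_a)$
   (c). Compute $E_{pk}(a * b) = y * E_{pk}(-r_a * r_b)$

Example 1. Suppose $a = 59$ and $b = 58$. For simplicity, let $r_a = 1$ and $r_b = 3$. Assume that $C_1$ holds $E_{pk}(59)$, $E_{pk}(58)$. Various intermediate results computed during the execution of the SM protocol are as follows. Initially, $C_1$ computes $A = E_{pk}(a) * E_{pk}(r_a) = E_{pk}(60)$, $B = E_{pk}(b) * E_{pk}(r_b) = E_{pk}(61)$ and sends them to $C_2$. Then, $C_2$ decrypts and multiplies them to get $p = 3660$. After this, $C_2$ encrypts $p$ to get $P = E_{pk}(3660)$ and sends it to $C_1$. Upon receiving $P$, $C_1$ computes $y = E_{pk}(3660 - a * r_b) = E_{pk}(3483)$, and $y = y * E_{pk}(-b * r_a) = E_{pk}(3425)$. Finally, $C_1$ computes $E_{pk}(a * b) = y * E_{pk}(-r_a * r_b) = E_{pk}(3422)$.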
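The SM steps and Example 1 can be reproduced with a short sketch. This is an illustration under stated assumptions rather than the authors' implementation: it uses a toy Paillier instance with insecure primes and $g = N + 1$, both parties run inside one function, and the negated terms are obtained by exponentiation, e.g., $E_{pk}(-a * r_b) = E_{pk}(a)^{N - r_b}$. All function names are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys (insecure sizes); g = n + 1 is an assumed choice."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return (n, n + 1), (lam, pow(lam, -1, n))

def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def secure_multiply(pk, sk, enc_a, enc_b):
    """Return E_pk(a*b) from E_pk(a), E_pk(b); sk is used only in C2's step."""
    n, _ = pk
    n2 = n * n
    # C1: randomize both inputs under encryption.
    r_a, r_b = random.randrange(1, n), random.randrange(1, n)
    A = enc_a * encrypt(pk, r_a) % n2          # E(a + r_a)
    B = enc_b * encrypt(pk, r_b) % n2          # E(b + r_b)
    # C2: decrypt the masked values, multiply, re-encrypt.
    p = decrypt(pk, sk, A) * decrypt(pk, sk, B) % n
    P = encrypt(pk, p)
    # C1: strip a*r_b, b*r_a and r_a*r_b homomorphically.
    y = P * pow(enc_a, n - r_b, n2) % n2       # subtract a * r_b
    y = y * pow(enc_b, n - r_a, n2) % n2       # subtract b * r_a
    return y * pow(encrypt(pk, r_a * r_b % n), n - 1, n2) % n2

pk, sk = keygen()
enc_product = secure_multiply(pk, sk, encrypt(pk, 59), encrypt(pk, 58))
product = decrypt(pk, sk, enc_product)  # decrypted here only to check the sketch
```

In the sketch, $C_2$ only ever sees the masked values $a + r_a$ and $b + r_b$, and $C_1$ only ever sees ciphertexts, which mirrors the privacy requirement stated for SM.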

The Proposed ESPADE Protocol
In this section, we present our ESPADE protocol in detail. As mentioned in Section 2, our proposed protocol consists of three parties: Alice, Federated Cloud, and Bob. ESPADE is constructed based on the Paillier cryptosystem and by utilizing the SM primitive as a building block. For the rest of this paper, we explicitly make the following assumptions:

1. Alice's geospatial data are represented as a weighted graph G with V vertices and W weights. The contents of G are sensitive and thus need to be kept confidential from cloud service providers and unauthorized parties. We assume that each vertex in G is associated with a unique identification number, for example, denoting the combination of latitude and longitude information.
2. $C_2$ generates the public-private key pair $(pk, sk)$ based on Paillier's scheme and securely distributes $pk$ to Alice, $C_1$, and Bob. We assume that there exist secure communication channels (e.g., SSL) between each pair of parties participating in our protocol.
3. Similar to existing work [14][15][16], we assume that all the participating parties in our protocol are semi-honest [19]. The semi-honest model is a practical security model for the following reasons. First, building protocols under the semi-honest model is an important first step for constructing protocols under stronger security models (e.g., against covert and malicious adversaries). Second, protocols under the semi-honest model are typically considered to be quite efficient, which may not be the case for other adversarial models. Third, protocols that are proven to be secure under the semi-honest model can prevent inadvertent leakage of information among participating parties. Finally, it is highly unlikely that the well-established cloud service providers (e.g., Amazon and Microsoft) would deviate from the prescribed protocol and collude, as this would damage their reputation and consumer trust. Therefore, we believe that the semi-honest model is a practical security model for our problem domain.
The goal of ESPADE is to securely outsource $G$ to FC and execute Bob's shortest path query $(s, t)$ in a privacy-preserving manner. The proposed ESPADE protocol consists of the following two stages:

• Stage 1-Secure Outsourcing of Graph G (SOG): In this stage, Alice transforms her graph data $G$ into a proper $\alpha \times \alpha$ grid matrix. During this process, Alice relies on our data aggregation technique to capture the information in each grid compactly. After this transformation, Alice outsources the aggregated graph information to the federated cloud environment using the randomization approach. At the end of this stage, only $C_1$ knows the encrypted graph data.

• Stage 2-Secure Retrieval of Shortest Path (SRSP): In this stage, Bob securely sends his shortest path query $Q = \langle s, t \rangle$ to FC. Then, $C_1$ and $C_2$ jointly engage in secure computations to retrieve the shortest path in an iterative process, based on Dijkstra's algorithm. At the end of this stage, only Bob knows the shortest path from $s$ to $t$.
The main steps involved in the proposed ESPADE protocol are given in Algorithm 2. Next, we explain each of the two stages of ESPADE in detail.

Secure Outsourcing of Graph G (SOG)
During Stage 1, Alice first divides her graph $G$ into $\alpha \times \alpha$ square grids (for example, 1 mile by 1 mile). Let $n$ denote the total number of grids in $G$ and $g_v$ denote the ID of the grid in which a vertex $v$ resides. Each grid is represented by the set of vertices that reside inside the grid, their neighbors, and the associated weights. Alice represents each piece of grid information as a matrix where each row corresponds to a vertex in that grid. Suppose $\ell$ denotes the maximum number of unique vertices in each grid and $m$ denotes the maximum number of 1-hop neighbors a vertex in $G$ can have. Upon dividing the graph $G$ into $n$ grids, Alice constructs the matrix $M_i$ for each grid, for $1 \le i \le n$. Each row in $M_i$ corresponds to a particular vertex in grid $i$ and its $m$ neighboring vertex information. Specifically, for grid $i$, the unique vertices are denoted by $v_{i,1}, \ldots, v_{i,\ell}$. For each vertex $v_{i,j}$ in grid $i$, where $1 \le j \le \ell$, Alice stores $3m$ entries: for each of its $m$ neighbors, the neighboring vertex, its associated edge weight, and the grid ID of the neighboring vertex. As an example, for vertex $v_{i,j}$, the corresponding row in $M_i$ is $\langle v_{i,j}, v^1_{i,j}, w^1_{i,j}, g_{v^1_{i,j}}, \ldots, v^m_{i,j}, w^m_{i,j}, g_{v^m_{i,j}} \rangle$. In this case, $v^1_{i,j}, \ldots, v^m_{i,j}$ and $w^1_{i,j}, \ldots, w^m_{i,j}$ denote the neighboring vertices of $v_{i,j}$ and the corresponding edge weights, respectively. Additionally, $g_{v^1_{i,j}}$ denotes the grid ID of vertex $v^1_{i,j}$. A sample snapshot of the grid information captured in $M_i$ is shown below.

$$M_i = \begin{bmatrix} v_{i,1} & v^1_{i,1} & w^1_{i,1} & g_{v^1_{i,1}} & \cdots & v^m_{i,1} & w^m_{i,1} & g_{v^m_{i,1}} \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ v_{i,\ell} & v^1_{i,\ell} & w^1_{i,\ell} & g_{v^1_{i,\ell}} & \cdots & v^m_{i,\ell} & w^m_{i,\ell} & g_{v^m_{i,\ell}} \end{bmatrix}$$

For security reasons, we assume that all the matrices constructed from $G$ are of the same size; that is, $\ell \times m$. If a grid has fewer than $\ell$ vertices, Alice can add dummy entries. Similarly, if a vertex has fewer than $m$ neighbors, she can insert dummy values.
Upon creating all the matrices, we propose that Alice adopts the following data aggregation technique to reduce her outsourcing costs, as well as later query processing costs for the end-users. Alice transforms matrix $M_i$ into a vector $T_i$ of size $3m + 1$ by concatenating the column-wise entries in $M_i$. That is, the first entry of $T_i$ is computed as $v_{i,1} \| v_{i,2} \| \cdots \| v_{i,\ell}$, the second entry as $v^1_{i,1} \| v^1_{i,2} \| \cdots \| v^1_{i,\ell}$, and so on. After transforming all the matrices into vectors, Alice needs to outsource the $T_i$'s to FC in an encrypted format. However, for large values of $n$, $m$ and $\ell$, it would be expensive for Alice to encrypt $T_i$, for $1 \le i \le n$. Therefore, we propose the following approach for Alice to outsource $T_{i,j}$. Alice selects a random number $r_{i,j} \in_R \mathbb{Z}_N$ and splits $T_{i,j}$ into two random shares:

$$T^1_{i,j} = T_{i,j} + r_{i,j} \bmod N, \qquad T^2_{i,j} = N - r_{i,j}$$

It is worth noting that $T_{i,j} = T^1_{i,j} + T^2_{i,j} \bmod N$ always holds, for $1 \le i \le n$ and $1 \le j \le 3m + 1$. Then, Alice outsources $T^1_{i,j}$ and $T^2_{i,j}$ to $C_1$ and $C_2$, respectively. Upon receiving $T^2_{i,j}$, $C_2$ computes $E_{pk}(T^2_{i,j})$ and forwards it to $C_1$. Finally, $C_1$ computes $E_{pk}(T^1_{i,j}) * E_{pk}(T^2_{i,j}) \bmod N^2$, which is equivalent to $E_{pk}(T^1_{i,j} + T^2_{i,j}) = E_{pk}(T_{i,j})$. We denote the final encrypted dataset by $G'$, which is known only to $C_1$.
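The aggregation and share-splitting steps can be sketched as follows. Several details here are assumptions made for illustration and are not specified in the text above: concatenation is modeled as fixed-width bit packing with `BITS` bits per value, `N` is a stand-in modulus rather than a real Paillier modulus, and the sample grid matrix `M` (two vertices, two neighbors each, hence $3m + 1 = 7$ columns) is invented.

```python
import random

N = 2**127 - 1   # stand-in modulus (assumption); in the protocol N is the Paillier modulus
BITS = 16        # assumed fixed width of every vertex ID, weight, and grid ID

def aggregate(M):
    """Concatenate each column of grid matrix M into one integer entry of T."""
    T = []
    for column in zip(*M):
        packed = 0
        for value in column:
            packed = (packed << BITS) | value   # append value to the right
        T.append(packed)
    return T

def unpack(entry, rows):
    """Recover the column values from one aggregated entry."""
    mask = (1 << BITS) - 1
    return [(entry >> (BITS * (rows - 1 - k))) & mask for k in range(rows)]

def split(value):
    """Split a value into two additive shares modulo N."""
    r = random.randrange(N)
    return (value + r) % N, N - r

# Row layout: vertex, then (neighbor, weight, grid ID) for each of m = 2 neighbors.
M = [
    [1, 2, 5, 1, 3, 7, 2],
    [2, 1, 5, 1, 4, 6, 2],
]
T = aggregate(M)
shares = [split(entry) for entry in T]                   # first share to C1, second to C2
recombined = [(t1 + t2) % N for t1, t2 in shares]
```

With this packing, Alice handles $3m + 1$ values per grid instead of $\ell \times (3m + 1)$, and neither share alone reveals the packed entry because each is masked by a fresh random $r$. The packing only works when $\ell$ values of `BITS` bits fit below $N$.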

Secure Retrieval of Shortest Path (SRSP)
Following Stage 1, $C_1$ holds the encrypted graph dataset $G'$. During Stage 2, Bob with private input $\langle s, t \rangle$, $C_1$ with private input $G'$, and $C_2$ with private key $sk$ want to jointly find the shortest path from $s$ to $t$ in a privacy-preserving manner. At the end of Stage 2, $SP(s, t)$ should be known only to Bob.
The main steps involved in Stage 2 of ESPADE are shown in Algorithm 3. We discuss these steps in detail below:

• To start with, Bob creates a graph $G_s$ that initially contains no values except his starting point $s$. The goal of Bob is to expand $G_s$ by retrieving graph data from FC iteratively until he has sufficient graph data to construct $SP(s, t)$. First, he computes the grid (e.g., using his GPS) in which his source location $s$ resides, denoted by $cg$. He also sets the current vertex $cv$ to $s$. Now, Bob wants to request $cg$'s grid data from FC so that he can expand $G_s$ without revealing any information about $cg$ to $C_1$ and $C_2$. A trivial approach is for Bob to encrypt $cg$ and forward it to $C_1$, but this would require exponentiation operations modulo $N^2$, which are expensive. To avoid this, Bob splits $cg$ into two random shares $cg_1$ and $cg_2$, such that $cg_1 = cg + r \bmod N$ and $cg_2 = N - r$, where $r \in_R \mathbb{Z}_N$. Bob sends $cg_1$ and $cg_2$ to $C_1$ and $C_2$, respectively.

• Upon receiving $cg_2$, $C_2$ encrypts it under $pk$ and forwards $E_{pk}(cg_2)$ to $C_1$.

• After receiving $cg_1$ from Bob and $E_{pk}(cg_2)$ from $C_2$, $C_1$ computes $E_{pk}(cg)$ homomorphically as $E_{pk}(cg_1) * E_{pk}(cg_2) \bmod N^2$. Then, it obliviously checks which grid's information Bob is requesting. To achieve this, $C_1$ computes $\Delta_i = E_{pk}(cg) * E_{pk}(i)^{N-1} \bmod N^2$, for $1 \le i \le n$. The idea behind this operation is to subtract each grid ID $i$ from Bob's requested grid ID $cg$ under encryption. The observation here is that exactly one of the values in $\Delta$ is an encryption of 0. $C_1$ randomizes $\Delta$ by computing $X_i = \Delta_i^{r_i} \bmod N^2$, where $r_i \in_R \mathbb{Z}_N$. Then, $C_1$ randomly permutes $X$ as $Y \leftarrow \pi(X)$ and sends $Y$ to $C_2$. Here $\pi$ is a random permutation function known only to $C_1$.

• Upon receiving $Y$, $C_2$ decrypts it component-wise using the private key $sk$, resulting in a new vector $Z$. It is worth noting that only one of the entries in $Z$ is 0. Now, $C_2$ generates a new encrypted vector $P$ based on whether the value of $Z_i$ is 0 or not. Specifically, it generates $P$ as follows and sends it to $C_1$:

$$P_i = \begin{cases} E_{pk}(1) & \text{if } Z_i = 0, \\ E_{pk}(0) & \text{otherwise.} \end{cases}$$

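Algorithm 3 SRSP($G'$, $\langle s, t \rangle$)
Require: Bob holds the SSSD query $\langle s, t \rangle$; $C_1$ holds $G'$ and $C_2$ holds the private key $sk$
1: Bob:
   (a). $G_s \leftarrow s$ and $cv \leftarrow s$
   (b). Compute the current grid ID $cg$ of $cv$
   (c). Compute two random shares of $cg$ as $cg_1 \leftarrow cg + r \bmod N$ and $cg_2 \leftarrow N - r$, where $r \in_R \mathbb{Z}_N$
   (d). Send $cg_1$ to $C_1$ and $cg_2$ to $C_2$
2: $C_2$:
   (a). Receive $cg_2$ from Bob
   (b). Compute and send $E_{pk}(cg_2)$ to $C_1$
3: $C_1$:
   (a). Receive $cg_1$ from Bob and $E_{pk}(cg_2)$ from $C_2$
   (b). $E_{pk}(cg) \leftarrow E_{pk}(cg_1) * E_{pk}(cg_2) \bmod N^2$
   (c). for $1 \le i \le n$ do: $\Delta_i \leftarrow E_{pk}(cg) * E_{pk}(i)^{N-1} \bmod N^2$ and $X_i \leftarrow \Delta_i^{r_i} \bmod N^2$, where $r_i \in_R \mathbb{Z}_N$
   (d). $Y \leftarrow \pi(X)$; send $Y$ to $C_2$
4: $C_2$:
   (a). Receive $Y$ from $C_1$
   (b). $Z_i \leftarrow D_{sk}(Y_i)$, for $1 \le i \le n$
   (c). $P_i \leftarrow E_{pk}(1)$ if $Z_i = 0$, and $P_i \leftarrow E_{pk}(0)$ otherwise
   (d). Send $P$ to $C_1$
5: $C_1$:
   (a). $Q \leftarrow \pi^{-1}(P)$
   (b). for $1 \le i \le n$ do: $\Phi_{i,j} \leftarrow SM(Q_i, G'_{i,j})$, for $1 \le j \le 3m + 1$
   (c). $\Lambda_j \leftarrow \prod_{i=1}^{n} \Phi_{i,j} \bmod N^2$, $\Lambda'_j \leftarrow \Lambda_j * E_{pk}(r_j) \bmod N^2$ and $\lambda_{1,j} \leftarrow N - r_j$, where $r_j \in_R \mathbb{Z}_N$, for $1 \le j \le 3m + 1$
   (d). Send $\lambda_1$ to Bob and $\Lambda'$ to $C_2$
6: $C_2$:
   (a). $\lambda_{2,j} \leftarrow D_{sk}(\Lambda'_j)$, for $1 \le j \le 3m + 1$
   (b). Send $\lambda_2$ to Bob
7: Bob:
   (a). Receive $\lambda_1$ from $C_1$ and $\lambda_2$ from $C_2$
   (b). $\lambda_j \leftarrow \lambda_{1,j} + \lambda_{2,j} \bmod N$, for $1 \le j \le 3m + 1$
   (c). Update $G_s$ based on $\lambda_j$ and execute Dijkstra's algorithm
   (d). if $t$ is marked as visited then return $SP(s, t)$; else identify the new neighboring vertex $cv$ and proceed to step 1(b)

• $C_1$ performs the inverse permutation on $P$ to get $Q = \pi^{-1}(P)$. It is worth noting that $Q_i$ equals $E_{pk}(1)$ if $i = cg$, and $Q_i = E_{pk}(0)$ otherwise. After this, $C_1$ with private input $(Q, G')$ and $C_2$ with private key $sk$ engage in a set of secure multiplication operations. Specifically, $C_1$ with input $(Q_i, G'_{i,j})$ and $C_2$ jointly execute $SM(Q_i, G'_{i,j})$, for $1 \le i \le n$ and $1 \le j \le 3m + 1$. Suppose $\Phi$ denotes the output of SM. Since $Q$ contains $E_{pk}(1)$ only at Bob's current grid $cg$, the secure multiplications leave the aggregated grid information of $cg$ in $\Phi_{cg}$. For $i \ne cg$, every entry in $G'_i$ is multiplied by $E_{pk}(0)$; thus, the results are encryptions of 0 for all other grids. Note that the output of SM, that is $\Phi_{i,j}$, is known only to $C_1$, for $1 \le i \le n$ and $1 \le j \le 3m + 1$. Next, $C_1$ aggregates all the SM results column-wise locally. That is, $\Lambda_j = \prod_{i=1}^{n} \Phi_{i,j} \bmod N^2$, for $1 \le j \le 3m + 1$. The important observation here is that $\Lambda$ contains the entire data of the grid $cg$. At this point, $C_1$ needs to send the current grid data to Bob. To reduce the load on Bob, $C_1$ utilizes the randomization approach. That is, $C_1$ selects random numbers $r_j \in_R \mathbb{Z}_N$ and adds them to $\Lambda$ using additive homomorphic properties by computing $\Lambda'_j \leftarrow \Lambda_j * E_{pk}(r_j) \bmod N^2$. Additionally, $C_1$ computes $\lambda_{1,j} = N - r_j$. Now, $C_1$ sends $\lambda_{1,j}$ to Bob and $\Lambda'_j$ to $C_2$, for $1 \le j \le 3m + 1$.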

• After receiving the encrypted randomized vector $\Lambda'$, $C_2$ decrypts it component-wise using $sk$ to get $\lambda_{2,j} = D_{sk}(\Lambda'_j)$, for $1 \le j \le 3m + 1$. Then, $C_2$ sends $\lambda_2$ to Bob. Due to the randomization by $C_1$, the decrypted values in this step do not reveal any information to $C_2$.

• Finally, upon receiving $\lambda_1$ and $\lambda_2$ from $C_1$ and $C_2$, Bob adds them component-wise to get $\lambda_j \leftarrow \lambda_{1,j} + \lambda_{2,j} \bmod N$, for $1 \le j \le 3m + 1$, which consists of $cg$'s grid data. Bob then updates $G_s$ based on this new grid information and executes Dijkstra's algorithm, marking each vertex whose minimum distance has been determined as visited. If Bob's destination $t$ is in this subgraph and marked as visited, Bob can calculate the shortest distance from $s$ to $t$ locally, and thus terminates the protocol by returning $SP(s, t)$. Otherwise, Bob identifies the new vertex $nv$ for which the grid information is missing and sets it as the new current vertex $cv$. Then, the algorithm is repeated (i.e., go to step 1(b) of Algorithm 3) with the updated $cv$ as input to the next iteration.
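One iteration of the retrieval described in the steps above can be sketched as follows. This is a single-process simulation under stated assumptions: a toy Paillier instance with tiny primes and $g = N + 1$; `secure_multiply` is a hypothetical stand-in that simply decrypts and re-encrypts, since the actual SM sub-protocol between $C_1$ and $C_2$ is not specified in this section; and `random.shuffle` stands in for the permutation $\pi$. It only illustrates the data flow, not a secure deployment.

```python
import math
import random
import secrets

P, Q = 10007, 10009                 # toy primes, illustration only
N, PHI = P * Q, (P - 1) * (Q - 1)
N2 = N * N


def encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * pow(PHI, -1, N) % N


def secure_multiply(ca, cb):
    # STAND-IN (assumption): the real SM protocol yields E(a*b) without
    # either cloud seeing a or b; here we decrypt only to simulate its output.
    return encrypt(decrypt(ca) * decrypt(cb) % N)


def retrieve_grid(enc_grids, cg):
    """Return the plaintext vector of grid `cg` (1-based), as Bob would see it."""
    n, width = len(enc_grids), len(enc_grids[0])
    # Step 1 (Bob): split cg into two additive shares.
    r = secrets.randbelow(N)
    cg1, cg2 = (cg + r) % N, (N - r) % N
    # Step 2 (C2): encrypt its share.
    e_cg2 = encrypt(cg2)
    # Step 3 (C1): recombine, subtract every grid ID, randomize, permute.
    e_cg = encrypt(cg1) * e_cg2 % N2
    x = []
    for i in range(1, n + 1):
        delta = e_cg * pow(encrypt(i), N - 1, N2) % N2      # E(cg - i)
        x.append(pow(delta, secrets.randbelow(N - 1) + 1, N2))
    perm = list(range(n))
    random.shuffle(perm)            # not cryptographically secure; sketch only
    y = [x[k] for k in perm]
    # Step 4 (C2): decrypt and build the encrypted selector P.
    p_vec = [encrypt(1 if decrypt(c) == 0 else 0) for c in y]
    # Step 5 (C1): undo the permutation, multiply, aggregate, mask.
    q = [None] * n
    for pos, k in enumerate(perm):
        q[k] = p_vec[pos]
    lam_enc = [1] * width
    for i in range(n):
        for j in range(width):
            lam_enc[j] = lam_enc[j] * secure_multiply(q[i], enc_grids[i][j]) % N2
    masks = [secrets.randbelow(N) for _ in range(width)]
    masked = [lam_enc[j] * encrypt(masks[j]) % N2 for j in range(width)]
    lam1 = [(N - rj) % N for rj in masks]
    # Step 6 (C2): decrypt the masked vector.
    lam2 = [decrypt(c) for c in masked]
    # Step 7 (Bob): remove the masks.
    return [(a + b) % N for a, b in zip(lam1, lam2)]
```

For example, with three encrypted grid vectors, `retrieve_grid(enc_grids, 2)` returns the plaintext entries of the second vector while the selector seen by $C_2$ is randomized and permuted.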

Example 2.
In this example, we consider a sample weighted graph G (refer to Figure 2) and illustrate the intermediate steps during the execution of ESPADE. Here G consists of 16 vertices, denoted from A to P, and it is split into four evenly distributed square grids. The four grid cells are denoted by M 1 , . . . , M 4 . Without loss of generality, suppose Bob wants to retrieve the shortest path from A to P using ESPADE. The intermediate results, along with the shortest path discovery process in each iteration, are shown in Figure 3. For brevity, we only show the steps involved during Stage 2 of ESPADE.
• Iteration 1: Initially, Bob sets his current vertex cv to A. In this case, the current grid ID of cv is 1 since cv resides in M 1 . That is, Bob sets cg = 1. Bob randomly splits his cg information and sends them to C 1 and C 2 , separately. At the end of the first iteration, Bob receives all the vertex and associated edge weight information of M 1 . He updates his sub-graph G s (refer to Figure 3a) and executes Dijkstra's algorithm. After marking C as visited, Bob makes I the current vertex.
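Bob's side of this iterative process can be sketched in plaintext as below. The graph in `GRIDS` is hypothetical toy data (it is not the graph of Figure 2), `fetch_grid` is an assumed callback standing in for one full Stage 2 round trip with FC, and the sketch fetches a grid lazily inside a single Dijkstra run rather than re-executing Dijkstra after every retrieval, which yields the same shortest path.

```python
import heapq

# Hypothetical toy data: grid ID -> {vertex: [(neighbor, weight, neighbor's grid ID)]}.
GRIDS = {
    1: {"A": [("B", 2, 1), ("C", 5, 1)],
        "B": [("A", 2, 1), ("C", 1, 1), ("D", 7, 2)],
        "C": [("A", 5, 1), ("B", 1, 1), ("D", 3, 2)]},
    2: {"D": [("B", 7, 1), ("C", 3, 1), ("E", 1, 2)],
        "E": [("D", 1, 2)]},
}


def shortest_path(s, s_grid, t, fetch_grid):
    """Return (distance, path, number of grid fetches), or None if t is unreachable."""
    known = {}                  # Bob's local subgraph G_s
    grid_of = {s: s_grid}       # grid ID of every vertex Bob has heard of
    dist, prev = {s: 0}, {}
    visited, fetches = set(), 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        if u == t:              # destination settled: distance is final
            break
        if u not in known:      # grid data missing: one more Stage 2 iteration
            known.update(fetch_grid(grid_of[u]))
            fetches += 1
        visited.add(u)
        for v, w, g in known[u]:
            grid_of.setdefault(v, g)
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[t], path[::-1], fetches
```

Calling `shortest_path("A", 1, "E", GRIDS.get)` on the toy data retrieves both grids and returns the route A, B, C, D, E with total weight 7.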

Performance Analysis of ESPADE
In this section, we present the security and performance analysis of the ESPADE protocol. First, we formally show that ESPADE is secure under the semi-honest model of SMC. Then, we discuss the complexity costs of our protocol and provide a comparative analysis with two existing solutions including experimental results.

Security Analysis under the Semi-Honest Model
As mentioned earlier, we assume that all the parties participating in our protocol are semi-honest, as this is the commonly used adversarial model. Under the semi-honest model, parties follow the prescribed steps of the protocol, but they are free to deduce any meaningful information about the inputs based on the messages they see during the execution of the protocol. In this paper, we adopt the following security definition that is commonly used for the semi-honest model [19,35]:

Definition 1. Let $P$ be a party participating in a protocol $\pi$ with input $a$ and output $b$. Let $\Pi_{P,\mathrm{real}}(\pi)$ denote $P$'s execution image of $\pi$. In general, an execution image of $P$ consists of all the inputs, outputs, and intermediate messages it sees during the execution of the protocol. Let $\Pi_{P,\mathrm{sim}}(\pi)$ denote $P$'s simulated image based on $\pi$ and $\langle a, b \rangle$. Then, $\pi$ is secure if the (real) execution image of $P$ is computationally indistinguishable from its simulated image.

Proof of Security for Stage 1
During Stage 1, the execution image of $C_2$ is given by $\Pi_{C_2,\mathrm{real}}(\mathrm{ESPADE}) = \{\langle T^2_{i,j}, F_{i,j} \rangle\}$, where $T^2_{i,j}$ is the encrypted value received from Alice at step 2(a) of Algorithm 2. Additionally, $F_{i,j}$ is a random number derived upon decrypting $T^2_{i,j}$. Without loss of generality, let the simulated image of $C_2$ be given as $\Pi_{C_2,\mathrm{sim}}(\mathrm{ESPADE}) = \{\langle a_{i,j}, a'_{i,j} \rangle\}$, where $a_{i,j}$ and $a'_{i,j}$ are randomly chosen from $\mathbb{Z}_{N^2}$ and $\mathbb{Z}_N$, respectively. Since the encryption function of Paillier's scheme is semantically secure, it is guaranteed that $T^2_{i,j}$ will be a random number in $\mathbb{Z}_{N^2}$. Thus, $T^2_{i,j}$ is indistinguishable from $a_{i,j}$. Similarly, $F_{i,j}$ is indistinguishable from $a'_{i,j}$. As a result, $\Pi_{C_2,\mathrm{sim}}(\mathrm{ESPADE})$ is computationally indistinguishable from $\Pi_{C_2,\mathrm{real}}(\mathrm{ESPADE})$, which implies that $C_2$ does not learn anything during the execution of Stage 1.
On the other hand, the execution image of $C_1$ in Stage 1 is given by $\Pi_{C_1,\mathrm{real}}(\mathrm{ESPADE}) = \{T^2\}$, where $T^2$ is an encrypted matrix sent by $C_2$ (at step 2(c) of Algorithm 2). Let the simulated image of $C_1$ be $\Pi_{C_1,\mathrm{sim}}(\mathrm{ESPADE}) = \{B\}$, where the values of $B_{i,j}$ are randomly selected from $\mathbb{Z}_{N^2}$. Since $E_{pk}$ is a semantically secure encryption function, it guarantees that the ciphertexts $T^2_{i,j}$ are randomly distributed in $\mathbb{Z}_{N^2}$. This shows that $T^2_{i,j}$ is computationally indistinguishable from $B_{i,j}$; thus, $\Pi_{C_1,\mathrm{sim}}(\mathrm{ESPADE})$ is computationally indistinguishable from $\Pi_{C_1,\mathrm{real}}(\mathrm{ESPADE})$. Therefore, $C_1$ cannot deduce any information about $G$ in Stage 1.

Proof of Security for Stage 2
In this sub-section, we formally prove that Stage 2 of ESPADE is secure as per Definition 1. Without loss of generality, we consider the messages exchanged between $C_1$ and $C_2$ in a single iteration. We emphasize that similar analyses can be carried out for other iterations.
During Stage 2, the execution image of $C_2$ is given by $\Pi_{C_2,\mathrm{real}}(\mathrm{ESPADE}) = \{cg_2, \langle Y, Z \rangle, \langle \Lambda', \lambda_2 \rangle\}$, where $cg_2$ is a randomized value sent by Bob (at step 1(d) of Algorithm 3). Additionally, $Y$ is an encrypted vector sent by $C_1$ at step 3(d) and $Z$ is the resulting decrypted vector, such that exactly one of the entries is 0 and all other values are random numbers in $\mathbb{Z}_N$. Similarly, $\Lambda'$ is the encrypted randomized vector sent by $C_1$ at step 5(d) and $\lambda_2$ is the resulting decrypted vector. Without loss of generality, let the simulated image of $C_2$ be given by $\Pi_{C_2,\mathrm{sim}}(\mathrm{ESPADE}) = \{h, \langle I, J \rangle, \langle K, L \rangle\}$. Here $h$ denotes a random number generated from $\mathbb{Z}_N$. $I$ denotes a vector of size $n$ whose elements are randomly selected from $\mathbb{Z}_{N^2}$, whereas vector $J$ is randomly generated such that only one of the entries is 0 and the remaining entries are random numbers in $\mathbb{Z}_N$. Similarly, $K$ and $L$ denote vectors of size $3m + 1$ whose elements are randomly selected from $\mathbb{Z}_{N^2}$ and $\mathbb{Z}_N$, respectively. Since $cg_2$ and $h$ are both random numbers from $\mathbb{Z}_N$, it is evident that they are computationally indistinguishable. Since $E_{pk}$ generates ciphertexts that are uniformly random in $\mathbb{Z}_{N^2}$, we conclude that $I$ and $K$ are computationally indistinguishable from $Y$ and $\Lambda'$, respectively. In addition, $J$ is computationally indistinguishable from $Z$, as both have exactly one entry equal to 0 and the remaining entries are random numbers in $\mathbb{Z}_N$. Furthermore, $L$ and $\lambda_2$ are vectors consisting of random numbers chosen from $\mathbb{Z}_N$; thus, they are computationally indistinguishable. Based on the above results, it is implied that $C_2$ cannot deduce any information about $G$ and Bob's query during Stage 2.
Similarly, according to Algorithm 3, we can show that C 1 's execution image can be simulated from random numbers. The important observation here is that all the messages received by C 1 (from Bob and C 2 ) are either in an encrypted format or are randomized numbers distributed in Z N . Therefore, no information is revealed to C 1 .
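As an informal illustration of why a single randomized share carries no information (it does not replace the simulation argument above), the sketch below enumerates every possible random value $r$ for a small, hypothetical modulus and counts the resulting shares. The function names are introduced only for this illustration.

```python
from collections import Counter


def first_share_distribution(value, modulus):
    """Distribution of (value + r) mod modulus over all r in Z_modulus."""
    return Counter((value + r) % modulus for r in range(modulus))


def second_share_distribution(modulus):
    """Distribution of (modulus - r) mod modulus over all r in Z_modulus."""
    return Counter((modulus - r) % modulus for r in range(modulus))
```

For every secret value the first share takes each residue exactly once, so its distribution is uniform and identical across secrets; the second share does not depend on the secret at all.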
Following from Algorithm 2, we emphasize that ESPADE is constructed by sequentially combining the two stages. As shown above, Stages 1 and 2 are secure under the semi-honest model. Additionally, it is worth noting that the output of Stage 1 (which is in encrypted format) is passed as an input to Stage 2. Therefore, by the Composition Theorem [35], we can conclude that ESPADE is secure under the semi-honest adversary model. That is, ESPADE ensures that neither the contents of G nor the shortest path query is revealed to C 1 and C 2 . Therefore, our proposed protocol meets all three privacy objectives (i.e., PO1, PO2, and PO3) described in Section 1.

Complexity Analysis
In this sub-section, we analyze the computation, communication and round complexities of the proposed ESPADE protocol. Table 2 shows the complexity costs incurred for various participating parties during the execution of our protocol.

Computation Costs
In our problem setting, we explicitly assume that the value of the generator $g$ is set to $N + 1$, which helps us optimize the Paillier encryption function without affecting the underlying security guarantees. Specifically, when $g = N + 1$, Paillier's encryption function can be reduced to $E_{pk}(m) = (1 + m \cdot N) \cdot r^N \bmod N^2$, for any message $m \in \mathbb{Z}_N$. Additionally, it is worth noting that the computation of $r^N \bmod N^2$ can be done offline, since it is independent of the message to be encrypted. As a result, the actual online computation cost of Paillier encryption is two multiplications (modulo $N^2$). We refer the reader to [36] for a detailed security analysis of this setting. Following from the above optimizations, we only consider the online computation costs during our complexity analyses of ESPADE.
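The following minimal sketch checks this simplification numerically on a toy modulus. The primes and the helper names (`random_unit`, `precompute_mask`, `encrypt_generic`, `encrypt_online`) are assumptions made for illustration; a real deployment would use a 1024-bit or larger $N$.

```python
import math
import secrets

P, Q = 10007, 10009   # toy primes, illustration only
N = P * Q
N2 = N * N


def random_unit():
    """Pick r in Z_N with gcd(r, N) = 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def encrypt_generic(m, r):
    """Textbook form: g^m * r^N mod N^2 with g = N + 1."""
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def precompute_mask(r):
    """Offline phase: r^N mod N^2 does not depend on the message."""
    return pow(r, N, N2)


def encrypt_online(m, mask):
    """Online phase: two multiplications, (m * N) and (... * mask)."""
    return (1 + m * N) % N2 * mask % N2
```

Both functions produce the same ciphertext for the same $r$, because $(N + 1)^m \equiv 1 + mN \pmod{N^2}$ by the binomial expansion.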

Party | Online Computation | Communication (in bits) | Round
Alice (one-time) | (6m + 2) * n additions | 2n * (3m + 1) * log N | -

First, during step 1(a) of Stage 1, the computation cost of Alice is bounded by (6m + 2) * n addition operations. Note that Alice does not participate in any other operations after outsourcing the data to FC. It is evident that Alice's computation costs are negligible as it is a one-time cost. Similarly, the computation cost of Bob mainly depends on steps 1 and 7 in Algorithm 3. In each iteration, Bob needs to perform O(m) additions. Since the number of iterations is bounded by O(n), the total computation cost of Bob in ESPADE is bounded by O(nm) additions (which is low compared to the overall computation cost of the protocol as shown below).
Next, we discuss the computation costs incurred on the federated cloud. During Stage 1, at step 2(b) of Algorithm 2, C 2 is involved in n * (3m + 1) encryption operations, which is equivalent to 2n * (3m + 1) multiplication operations. Here n denotes the total number of grids and m denotes the maximum number of 1-hop neighbors that a vertex can have in G. Additionally, at step 3(b), C 1 performs n * (3m + 1) multiplications. Combining these results, the total computation cost of FC in Stage 1 is bounded by O(mn 2 ) multiplications.
During Stage 2, the computation cost of FC mainly depends on steps 3, 4, and 5 of Algorithm 3. For steps 3 and 5, C 1 needs to perform (n log N) and (mn log N) multiplications, respectively. For step 4, C 2 is involved in log N multiplications. Therefore, the total computation cost of FC in Stage 2 is bounded by O(mn log N) multiplications. Putting everything together, the total computation cost of FC (i.e., the combined computation costs of C 1 and C 2 ) in ESPADE is bounded by O(mn 2 log N) multiplications.

Communication and Round Complexity
For Alice, the communication cost depends on step 1(c) of Algorithm 2, where she sends out two matrices T 1 and T 2 to FC. The size of each of these matrices is n × (3m + 1). Since each entry in these matrices is a random number chosen from Z N , the total size of each matrix is n * (3m + 1) * log N bits. Therefore, the total communication cost of Alice is (2n * (3m + 1) * log N) bits. For Bob, in each iteration of Stage 2, he splits the cg value into two random shares and forwards them to FC. This results in 2 log N bits of communication. Since the total number of iterations is bounded by O(n), the total communication cost of Bob is bounded by O(n log N) bits. Finally, the total communication cost of FC is bounded by O(mn 2 log N) bits.
The number of communication rounds between Bob and FC is bounded by O(n). Furthermore, the number of communication rounds between C 1 and C 2 is bounded by O(n). Therefore, the round complexity of ESPADE is bounded by O(n).
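To make the closed-form costs for Alice and Bob concrete, the short sketch below evaluates them for the parameter values used in the experimental section (n = 100, m = 20, and a 1024-bit N). The function names are hypothetical; the formulas are the ones stated above.

```python
def alice_costs(n, m, key_bits):
    """One-time costs for Alice: (6m + 2) * n additions, 2n(3m + 1) log N bits."""
    entries = n * (3 * m + 1)          # number of T_{i,j} values to split
    return {
        "additions": 2 * entries,      # equals (6m + 2) * n
        "communication_bits": 2 * entries * key_bits,
    }


def bob_bits_per_iteration(key_bits):
    """Bob uploads two shares of cg, each an element of Z_N."""
    return 2 * key_bits


costs = alice_costs(n=100, m=20, key_bits=1024)
```

With these values Alice performs 12,200 additions and sends 12,492,800 bits (about 1.56 MB) in total, while Bob uploads 2048 bits per iteration.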

Performance Comparison with Existing Work
In this sub-section, we compare the performance of ESPADE with two closely related works, namely PSPEG$_1$ and PSPEG$_2$ [14]. The performance comparison results are shown in Table 3. PSPEG$_1$ is based on a single-cloud architecture, whereas PSPEG$_2$ and ESPADE adopt a two-cloud federated model. Additionally, we observe that PSPEG$_1$, PSPEG$_2$ and ESPADE always produce correct results, as the underlying operations in all three protocols are constructed based on Dijkstra's algorithm. However, the security guarantees of PSPEG$_1$ and PSPEG$_2$ are unclear, as no formal proofs were provided to demonstrate the confidentiality of the outsourced data, the privacy of the user's shortest path query, and the protection of access patterns. In our case, as we formally showed in Section 6.1, ESPADE is semantically secure under the semi-honest model; thus, it meets all three privacy criteria. Additionally, it is worth noting that ESPADE hides data access patterns due to the underlying random permutation operations.
We observe that PSPEG$_1$ is very inefficient in terms of computation and communication complexities. Specifically, since PSPEG$_1$ utilizes a single-cloud model, it incurs significant costs on the data owner Alice and the end-user Bob. For the data outsourcing step, the computation costs of Alice in PSPEG$_1$ and PSPEG$_2$ are bounded by $O(\ell m n \log N)$ and $O(m|V| \log N)$ multiplications, respectively, where $n$ denotes the number of grids and $|V|$ denotes the number of vertices in $G$. Unlike PSPEG$_1$ and PSPEG$_2$, our proposed ESPADE protocol utilizes a random splitting approach during the data outsourcing step, which incurs low costs on Alice. In particular, the computation cost of Alice is bounded by $O(mn)$ additions. It is clear that the computation cost of Alice is significantly lower in ESPADE than in PSPEG$_1$ and PSPEG$_2$.
For PSPEG$_1$, the computation cost of Bob is bounded by $O(n^2 \log N')$ multiplications, where $N'$ denotes the RSA modulus, such that $N^2 < N'$. In PSPEG$_2$, this computation burden is alleviated to a certain extent by pushing some expensive computations from Bob's side to the second cloud. Nonetheless, the computation cost incurred on Bob is bounded by $O(|V| \log |V|)$, which is still high even for moderate-sized graphs ($|V| \approx 10{,}000$). In ESPADE, the computation cost of Bob is bounded by $O(mn)$ additions. It is worth noting that, with the effective combination of homomorphic encryption properties and our secure data aggregation technique, ESPADE significantly improves the computation cost of Bob. As shown in Table 3, ESPADE incurs lower overall computation and communication costs than PSPEG$_1$ and PSPEG$_2$, especially for $n < |V|$. Furthermore, the round complexities of PSPEG$_1$ and ESPADE are bounded by $O(n)$, whereas for PSPEG$_2$ it is bounded by $O(|V|)$.

Experimental Results
In this sub-section, we demonstrate the superiority of ESPADE over PSPEG$_1$ and PSPEG$_2$ through empirical analysis. All three protocols were implemented in Java using the BigInteger class to handle arbitrary-precision arithmetic operations, and experiments were conducted on an Intel Core i7 3.1 GHz PC running macOS 10.13.6 High Sierra with 16 GB memory. The Paillier encryption key size is set to 1024 bits (i.e., the size of $N$ in bits is 1024) and all the results presented are average values over five executions.
In our experiments, randomly generated datasets were used. For $\ell = 20$, $m = 20$, $n = 100$, $|V| = 5000$, the running times for Alice in PSPEG$_1$ and PSPEG$_2$ are 4.11 and 6.81 min, respectively, whereas the running time for Alice in ESPADE is 21 milliseconds. Note that Alice needs to perform expensive exponentiation operations to encrypt the matrix data in PSPEG$_1$ and PSPEG$_2$, whereas in ESPADE Alice performs only modular additions. Additionally, the running times for Bob in PSPEG$_1$ and PSPEG$_2$ are 1.02 min and 121 milliseconds, respectively. The running time for Bob in ESPADE is 7 milliseconds. The above results support our performance analysis in Section 6.3 and show that ESPADE is significantly more efficient than PSPEG$_1$ and PSPEG$_2$: roughly four orders of magnitude faster for Alice, and about 8700 and 17 times faster for Bob, respectively.
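The speedup factors implied by these measurements can be reproduced with a few lines of arithmetic. The sketch below only restates the running times reported above in milliseconds; the dictionary keys and the `speedups` helper are hypothetical names.

```python
MS_PER_MIN = 60_000

# Reported running times, converted to milliseconds.
alice_ms = {"PSPEG1": 4.11 * MS_PER_MIN, "PSPEG2": 6.81 * MS_PER_MIN, "ESPADE": 21}
bob_ms = {"PSPEG1": 1.02 * MS_PER_MIN, "PSPEG2": 121, "ESPADE": 7}


def speedups(times_ms):
    """Ratio of each baseline's running time to ESPADE's running time."""
    return {
        name: t / times_ms["ESPADE"]
        for name, t in times_ms.items()
        if name != "ESPADE"
    }


alice_speedup = speedups(alice_ms)
bob_speedup = speedups(bob_ms)
```

The ratios come out to about 11,700 and 19,500 for Alice, and about 8700 and 17 for Bob, relative to PSPEG$_1$ and PSPEG$_2$ respectively.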
In summary, by using the proposed data aggregation approach in ESPADE, Alice can securely outsource her graph data and effectively delegate the shortest path query processing task to the Federated Cloud (FC). Furthermore, the costs incurred on Bob during Stage 2 of ESPADE are minimal. Based on the above discussions, we conclude that ESPADE offers significantly improved performance in computation and communication load over PSPEG$_1$ and PSPEG$_2$, while offering a higher level of security protection.

Conclusions
Existing research shows that location-based services can violate users' privacy, and the issue becomes even more challenging when such applications are pushed to remote and untrusted cloud servers. In this paper, we addressed the single-source single-destination shortest path query processing problem in outsourced LBS. Specifically, we proposed an efficient and semantically secure shortest path discovery protocol for encrypted graph data outsourced to a federated cloud environment. At the core of our proposed ESPADE protocol, we utilized homomorphic encryption combined with a novel data aggregation technique to enable the cloud service providers to operate over encrypted aggregated data in a privacy-preserving manner. We formally showed that our protocol is secure under the semi-honest model and also hides access patterns. Additionally, we discussed the complexity analysis of ESPADE and demonstrated that it is more efficient and secure compared to PSPEG$_1$ and PSPEG$_2$. Our experimental results show that ESPADE is significantly faster than the existing solutions.
Improving the performance of ESPADE further largely depends on minimizing the amount of data sent from the federated cloud to Bob. Additionally, improving the performance of the secure multiplication protocol is another important step to improve the overall efficiency of ESPADE. For future work, we will investigate better data aggregation and pruning techniques to enhance the secure retrieval of the shortest path (Stage 2) process in ESPADE. We will also extend our research to other graph mining tasks, such as minimum spanning tree and breadth-first search, over encrypted graph data. Another direction for future work is to extend the ESPADE protocol into a secure protocol under other adversarial models (e.g., covert and malicious models).