An Agent-Based System for Location Privacy Protection in Location-Based Services

Aloufi, Omar F.; Alfakeeh, Ahmed S.; Alotaibi, Fahad M.

doi:10.3390/ijgi14110433

by Omar F. Aloufi ¹,²,*, Ahmed S. Alfakeeh ¹ and Fahad M. Alotaibi ¹

¹ Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

² Department of Information Systems, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(11), 433; https://doi.org/10.3390/ijgi14110433

Submission received: 24 August 2025 / Revised: 27 October 2025 / Accepted: 1 November 2025 / Published: 3 November 2025

Abstract

Location-based services (LBSs) are a crucial element of the Internet of Things (IoT) and have garnered significant attention from both researchers and users, driven by the rise of wireless devices and a growing user base. However, the use of LBS-enabled applications carries several risks, as users must provide their real locations with each query. This can expose them to potential attacks from the LBS server, leading to serious issues like the theft of personal information. Consequently, protecting location privacy is a vital concern. To address this, location dummy-based methods are employed to safeguard the location privacy of LBS users. However, location dummy-based approaches also suffer from problems such as low resistance against inference attacks and the difficulty of generating strong dummy locations, which is considered an open problem. Moreover, generating many location dummies to achieve a high privacy protection level leads to high network overhead and requires high computational capabilities on the mobile devices of the LBS users, and such devices have limited resources. In this paper, we introduce the Caching-Aware Double-Dummy Selection (CaDDSL) algorithm to protect the location privacy of LBS users against homogeneity location and semantic location inference attacks, which may be applied by the LBS server as a malicious party. Then, we enhance the CaDDSL algorithm via encapsulation with agents to solve the tradeoff between generating many dummies and large network overhead by proposing the Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL) algorithm. Compared to three well-known approaches, namely GridDummy, CirDummy, and Dest-Ex, our approach showed better performance in terms of communication cost, cache hit ratio, resistance against inference attacks, and network overhead.

Keywords:

agent; cache; inference attacks; dummies; privacy protection

1. Introduction

With the rapid development of mobile computing and network technology, smartphones have become a necessity in people’s lives. In addition to satisfying daily communication needs, smartphones provide many services that make our lives easier and more enjoyable. Among these services, location-based services (LBSs) are popular and attractive and constitute a basic component of the Internet of Things (IoT) [1,2,3,4,5,6]. One important feature provided by LBSs is that they enable LBS users to search for points of interest (POIs), such as nearby restaurants, metro stations, and medical centers. Despite the various benefits provided by LBSs, the intrinsic privacy leakage problem cannot be ignored due to the openness of wireless networks [7].

The main reason for the privacy leakage problem in LBSs is that LBS users are forced to reveal their real locations when asking for POIs. The real location of an LBS user is included in the query sent to the LBS provider in the form (<X, Y>, POI, Range, ID). Table 1 gives a description of the LBS user’s query.

Figure 1 presents the reference scenario of LBS application use from an abstraction perspective, where the queries are built/issued based on the form described in Table 1.

Tracking the real location of an LBS user enables an attacker to infer some sensitive information, which, in turn, harms the privacy of the LBS user. This is because it gives a deep look at aspects of his/her personal life, such as customs, habits, and religious or political beliefs. Moreover, the attacker can use different methods to track the real location of the LBS users; for example, Ref. [2] provides a wide spectrum of advanced tracking methods. As a result, protecting the location privacy of LBS users has become a pressing need.

One of the most important proposed approaches to protect the privacy of LBS users is to use dummies. In the context of LBS privacy protection, “dummy” is a term that refers to a query built based on a false location. If the LBS user masks the real location with fabricated locations, location privacy is protected [8,9,10]. Therefore, the current query (i.e., real query) is mixed with multiple other queries built based on dummy locations in a way that minimizes the attacker’s capability to identify the real location among the dummies. The main purpose of the mixing process is to achieve k-anonymity, which guarantees that the real query cannot be distinguished from the (k − 1) dummies [11,12]. Figure 2 illustrates the general conceptual framework of dummy-based location privacy protection approaches.
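As a minimal sketch of the mixing step, the following Python snippet builds a batch of k queries in the (<X, Y>, POI, Range, ID) form from one real location and k − 1 dummy locations, then shuffles the batch. The names `Query` and `mix_queries` are hypothetical and not taken from the paper, and the dummy locations are assumed to have been chosen already; how strong dummies are chosen is the subject of the proposed algorithms.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One LBS query in the (<X, Y>, POI, Range, ID) form (field names are illustrative)."""
    x: float
    y: float
    poi: str
    search_range: float
    user_id: str


def mix_queries(real_xy, dummy_xys, poi, search_range, user_id, rng=None):
    """Return the real query mixed with one query per dummy location."""
    rng = rng or random.Random()
    locations = [real_xy, *dummy_xys]
    queries = [Query(x, y, poi, search_range, user_id) for x, y in locations]
    rng.shuffle(queries)  # hide the position of the real query inside the batch
    return queries


# Hypothetical example: one real location and k - 1 = 2 dummies, so k = 3.
batch = mix_queries((3.0, 4.0), [(1.0, 9.0), (7.0, 2.0)], "hotel", 2.0, "u42",
                    rng=random.Random(0))
print(len(batch))  # 3
```

Every query in the batch asks for the same POI, so the batch differs only in the location field, which is what the k-anonymity guarantee relies on.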

1.1. Motivation

Because all information related to the activities of the LBS users is stored in an LBS server, the LBS server or its maintainer can act as an attacker (i.e., a malicious party). This, in turn, exacerbates the problem of privacy protection because all previous information will be accessible. The best way to protect the location privacy of an LBS user is to increase the k-anonymity level by increasing the number of dummies that are sent. However, increasing the k-anonymity level requires more computation, and the mobile devices of the LBS users suffer from low computational power, limited storage, and short battery lifetime. In addition, weak dummy locations can be filtered by the attacker to identify the real location of the LBS user. Weak dummy generation is considered an open problem according to the survey provided in [13]. Beyond filtering weak dummies, the attacker can apply inference attacks such as a location homogeneity attack [14] or semantic location attack [15] to identify the real location of the LBS user. Thus, a robust location privacy protection approach is needed. In addition to the above, LBS users use different mobile devices that run different operating systems, such as Android and iOS. Therefore, from a technical point of view, the installed privacy protection method should be platform-independent; however, to the best of our knowledge, this aspect has not been addressed previously.

The best way to safeguard the privacy of the LBS users against potential LBS provider-side breaches (as a malicious party) is to decrease the number of connections of each individual LBS user. This can be achieved by caching the responses to previous queries to answer future queries. To avoid performing a large computation on the LBS user’s mobile device while simultaneously achieving a high k-anonymity level, we can allow a mobile code (a software agent) to migrate to the LBS server to generate numerous dummies there by taking advantage of the LBS server’s high computational power. Thus, instead of generating dummies on the LBS user side and sending them via the network, the mobile agent generates the dummies on the LBS server side. This, in turn, reduces the network overhead.

Table 2 compares our approach with well-known dummy-based privacy protection approaches (such as CirDummy, GridDummy [9], and the Destination Exchange (Dest-Ex) method [10]) and highlights the properties of our proposed approach.

1.2. Contributions

In this paper, we employ agent software technology to build an architecture that ensures location privacy protection for LBS users. Our proposed privacy protection architecture relies on generating strong dummies to ensure location privacy protection. We exploit the mobility feature of the agent to find the optimal tradeoff between ensuring a high k-anonymity level and limiting the network overhead. The main contributions of our work are as follows:

We introduce a novel agent-based LBS privacy protection architecture that is compatible with heterogeneous platforms and has good performance quality attributes.
We propose the Caching-Aware Double-Dummy Selection (CaDDSL) algorithm to generate/select strong location dummies for location privacy protection purposes. Our proposed CaDDSL algorithm employs normalized distance to achieve two objectives: generating/selecting the dummies that have the highest contributions to the cache for answering the future queries, and decreasing the number of connections with the LBS server (a malicious party).
Considering resistance against inference attacks, we enhance the CaDDSL algorithm to obtain the Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL) algorithm. Our CaOaDSL algorithm ensures resistance against homogeneity location attack and semantic location attack based on generating/selecting the dummies using non-normalized distance. In addition, it guarantees a high privacy protection level with minimal network overhead.
We introduce a novel privacy metric to estimate the compromised privacy of the LBS user. Our privacy metric mainly depends on the entropy metric, and it is used to alert LBS users to situations in which they are vulnerable to privacy breaches, a feature that has not been offered previously, to the best of our knowledge.

1.3. Structure of Paper

The remainder of this paper is organized as follows: Section 2 discusses related work. Our proposed agent-based architecture is provided in Section 3. Section 4 discusses the security analysis based on a defined threat model. Section 5 describes the metrics used. Section 6 presents the results of the evaluations, and the conclusion is provided in Section 7.

2. Related Works

Location-based services have become a vital part of our daily lives, and privacy issues have become a major concern of LBS users. Researchers have responded by building defenses against attackers. Refs. [16,17] provide taxonomies of privacy protection approaches. The basic classes are described below.

2.1. Server-Based Approaches

The most important features of the approaches that belong to the server-based category are as follows: (i) the mission of the LBS user is to formulate and send his/her query to ask for the POIs; and (ii) the protection method is executed at the LBS server side, taking into account that the LBS server is a trusted third party (TTP). Different techniques are used to ensure location privacy protection, such as policy-based, noise-based, and cloaking techniques.

In Ref. [18], the researchers address the problem of latency in providing and benefiting from location-based services. The focus is on the quality of services, an essential aspect of which is privacy protection. To ensure this protection, the location (LOC) lookup method is proposed. The LOC lookup method involves grouping the LBS-enabled applications according to the kind of the LBS application offered by the servers. The LOC lookup method is an indexing technique used to speed up the response of the LBS-enabled applications, where the privacy of LBS users is ensured based on the policy provided by each LBS server individually. The proposed method adopts the 6G standard. The results show that the average response time is about 3.35 ms, and it provides a significant level of privacy protection.

The authors of [19] address the tradeoff between the quality of services provided by LBS-enabled applications and the disclosure of precise location coordinates of LBS users to others (possibly attackers). To solve this problem, the authors propose a differential privacy-based location privacy protection algorithm (DPLPA). A clustering component groups the locations of LBS users (based on the density of connections between locations) into clusters. The location information of the LBS users within a certain cluster is represented by the centroid of the cluster to protect privacy. Laplace noise is used to perturb the centroid, consequently blurring the personal information of the LBS users. To maintain the quality-of-service aspect, a privacy budget is employed with the purpose of generating the responses to the LBS queries with a high utility. The results show that the level of privacy protection achieved is about 80% when implementing the proposed system on two different datasets (Geolife and Gowalla). However, the limitation of this work is that the LBS server is assumed to be a trusted third party.

Ref. [20] addresses the problem of generating cloaked regions of an appropriate size and within a reasonable time to ensure quality of service and privacy protection simultaneously. As a solution to this issue, the authors propose the ICRR (Inverted Cached Result Registry) approach. The objective of ICRR is to limit the search space of the LBS queries by searching for the answers within the most frequent queries posed by the LBS user. The search space is limited by using the enhanced cloaking algorithm (ECA). The ECA works by utilizing the map reduction technique to partition the overall graph (which models the geographical area in which the LBS user is located when issuing LBS queries) into sub-graphs to increase the uncertainty about the accurate locations of the retrieved PoIs, as well as minimizing the response time. The results show that by increasing the size of the cloaked regions, better privacy protection is achieved, with an improvement of about 11%. In addition, the cloaking time decreases by 2.5%, which is a significant improvement in response time. The limitation of this work is that it ignores the time required to process the LBS queries on the LBS server side.

2.2. User-Based Approaches

The main features of the approaches classified under the user-based category are as follows: (i) the LBS user has full control of his/her privacy protection level, and the protection method is implemented on the LBS user’s mobile device; and (ii) the LBS server is considered a malicious party (i.e., an attacker). The most common techniques used in this category are dummies and pseudonyms.

In Ref. [9], dummy-based techniques are employed to safeguard the location privacy of LBS users, utilizing two distinct dummy generation approaches. The first, CirDummy, creates dummies by forming a virtual circle around the user’s real location, while the second, GridDummy, generates dummies using a virtual grid encompassing the user’s position. Meanwhile, [10] introduces the Destination Exchange (Dest-Ex) method, which leverages historical motion trajectories of LBS users to produce dummies. To enhance robustness, Dest-Ex specifically selects past trajectories that intersect with the user’s current path. It is worth mentioning that the methods used in both Refs. [9,10] will be involved in a comparison with the proposed methods of this study.

Ref. [21] proposes an Ijk-anonymity-based scheme to protect the query intent (i.e., PoI) the LBS user wants to retrieve. The I parameter in the Ijk-anonymity refers to the number of PoIs, which indicate the query intent. PoIs are replaced by (I − 1) dummy PoIs to prevent the attacker from extracting the real intent. The j parameter in the Ijk-anonymity refers to the number of segments into which the real location of the LBS user is divided for the purpose of ensuring location privacy. The k parameter in the Ijk-anonymity refers to the number of queries the attacker gathers. We must take into account that in the environment where the Ijk-anonymity-based scheme works (which is a crowd of mobile users), the number of gathered queries is greater than the number of dummy queries that a single LBS user constructs and sends. This, in turn, strengthens the defense against the attacker attempting to infer personal information. Thus, the Ijk-anonymity-based scheme provides both query and location privacy protection for the LBS-enabled applications. The workspace within which the scheme is implemented is the range of K-NN LBS queries, defining a region inside which the LBS user seeks a PoI (e.g., within 2 km of the actual location). The results showed that increasing the Ijk-anonymity level decreases the attacker’s ability to distinguish the real intent or real location among dummies. The limitation of this approach is that it ignores the context used to select the (I − 1) dummy intents, thus making it weak against homogeneity and semantic attacks.

By integrating pseudonym and dummy approaches, Ref. [3] proposes the Dummy ID pipeline to prevent the tracking of the LBS users in the military domain. The authors address the problem of attackers (the enemy) being able to gather sensitive information about army personnel and the movements of military vehicles and other equipment, thereby posing a significant privacy and security issue. The researchers stated that privacy protection in the military domain is critical since it may lead to losses of military personnel and equipment if their exact locations are revealed. With the Dummy ID approach, the real trajectory of the movements of soldiers and vehicles is hidden by using a pool of IDs, where the real ID is distributed among the Dummy IDs to minimize the enemy’s ability to track the actual trajectory. The Dummy ID approach is supported by a pseudonym to increase the level of privacy protection. The pseudonym approach plays an important role as it confuses the attacker and prevents the discovery of the location of the military entities, referred to in this work as the “silence period”. To evaluate the system, the authors provided the anonymity level (a new metric) obtained from the probability of linking between the Dummy ID and the silence period. It is used to measure the ability of enemies to identify the real IDs. The results showed that the average level of anonymity is 13.3 when the number of nodes (military entities) is 30. However, the Dummy ID approach does not consider the possibility that the enemy may have reliable information about the geographical area within which there is active warfare.

However, the drawbacks of the approaches classified under this category are tightly coupled with the user’s mobile device and can be summarized as follows: (i) limited storage capacity, (ii) low computational capability, and (iii) short battery lifetime. In addition, the generation of weak dummies is considered an open problem because it enables the attacker to filter the dummies to determine the real position of the LBS user.

3. Proposed Privacy Protection Architecture

This section explains our proposed agent-based privacy protection architecture, which achieves the quality attributes mentioned above and addresses the issues summarized in Table 2. Four main subsections are included. The first presents our system model. The second presents our agent-based architecture. The third explains the roles of the agents that are used, including our proposed algorithms. The last illustrates the architecture details through UML diagrams. Note that the POIs are assumed to be static; we do not address moving POIs or manipulating K-Nearest Neighbor (K-NN) queries.

3.1. System Model

For a network of $n \times n$ cells that shapes an area, let $g$ denote the number of users in each cell, with each cell containing $p_{intrest}$ POIs. The overall scenario, which our system follows to minimize the connections to the untrusted LBS provider, is depicted in Figure 3.

As shown in Figure 3, to retrieve the nearest hotels, the naive approach is to send real queries with real positions to the LBS server, constituting a direct privacy threat. To protect location privacy, the LBS user deliberately issues many queries with dummy locations that ask for the same POIs; in Figure 3, these queries are denoted by the continuous lines. The LBS server’s responses are cached in the access point so that the user can benefit from previous queries that have been answered over time. Therefore, the LBS user first tries to retrieve POIs from the cache, and when they exist, the task is completed. Otherwise, the connection to the LBS server is mandatory.
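The cache-first behavior described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function `answer_query`, the dictionary-based cache keyed by (cell, POI), and the `fake_server` stand-in are all assumptions, and the sketch omits the dummy queries that accompany a real server request.

```python
def answer_query(cell, poi, cache, query_server):
    """Serve a POI query from the cache and contact the server only on a miss."""
    key = (cell, poi)
    if key in cache:
        return cache[key], True        # cache hit: no connection to the LBS server
    result = query_server(cell, poi)   # cache miss: the connection is mandatory
    cache[key] = result                # keep the response for future queries
    return result, False


calls = []


def fake_server(cell, poi):
    """Hypothetical stand-in for the untrusted LBS server; records each connection."""
    calls.append((cell, poi))
    return [f"{poi}@{cell}"]


cache = {}
first, hit1 = answer_query((2, 3), "hotel", cache, fake_server)
second, hit2 = answer_query((2, 3), "hotel", cache, fake_server)
print(hit1, hit2, len(calls))  # False True 1
```

The second call is answered entirely from the cache, so the server observes only one connection for two identical queries.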

3.2. Agent-Based Location Privacy Protection Architecture

The framework consists of the malicious LBS provider, the cache (access point), and a set of mobile devices. The system is controlled via two agents (M-A-user and S-A-cache), as shown in Figure 4.

Table 3 presents the agents, their type, and the location of installation within the proposed architecture.

3.3. Roles of the Agents

M-A-user: This mobile agent calculates the privacy level of the LBS user based on the privacy metric (provided in Section 5) and alerts the LBS user when he or she has reached a dangerous situation. To the best of our knowledge, no such feature has been provided in previous work. In addition, according to the general scenario of our system illustrated in Figure 3, this mobile agent first migrates to the cache to search for the query answer with the assistance of the S-A-cache agent. If no answer is found in the cache, it migrates to the LBS server and generates strong dummies there to protect the location privacy of the LBS user. To complete this task, the M-A-user agent executes an algorithm called the Caching-Aware Overhead-Aware Dummy Selection algorithm (CaOaDSL), which is described below.

Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL) Algorithm: The primary goal of the CaOaDSL algorithm is to provide robust dummy locations to safeguard the location privacy. During the generation stage, appropriate locations are chosen that are indistinguishable from the user’s actual location. The selection process is not arbitrary; the process selects the dummy locations that contribute most to the cache. The process of selecting cells (as dummy locations) depends on the probability that each cell was queried in the past (query probability). Moreover, to ensure indistinguishability, cells with matching query probabilities to the user’s real location are selected, making it computationally infeasible for an attacker to identify the real location among $(k - 1)$ dummies. This approach generates strong dummy locations and achieves a higher privacy protection degree. Figure 5 illustrates the dummy location selection process.

For the previously given region $R$, which includes $n \times n$ cells, let $qp$ refer to the query probability of a cell. Then, we have $\sum_{i=1}^{n^{2}} qp_{i} = 1$. For the $k$ locations (i.e., cells) contained in a query, which consists of one real location and $(k - 1)$ dummies, each location has a conditional probability of being the real location. Let $\acute{p}_{i}$ $(i = 1, 2, \dots, k)$ denote the probability that the $i^{th}$ location is the real location. Then, $\acute{p}_{i} = \frac{qp_{i}}{\sum_{j=1}^{k} qp_{j}}$.

The entropy $E$ of identifying the real location out of the dummy set is defined as follows:

$$E = -\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i}) \tag{1}$$
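To make Formula (1) concrete, the following Python sketch normalizes the query probabilities of the k cells in a query into conditional probabilities and computes the entropy. The function names and the sample query probabilities are illustrative assumptions, not values from the paper.

```python
import math


def conditional_probabilities(qps):
    """Normalize the query probabilities of the k cells so that they sum to 1."""
    total = sum(qps)
    return [qp / total for qp in qps]


def entropy(qps):
    """Entropy (in bits) of identifying the real location among the k cells."""
    probs = conditional_probabilities(qps)
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical query probabilities for k = 4 cells.
uniform = entropy([0.125, 0.125, 0.125, 0.125])
skewed = entropy([0.25, 0.0625, 0.0625, 0.125])
print(uniform, skewed)  # 2.0 1.75
```

Cells with equal query probabilities reach the maximum of $\log_{2} k$ bits (2 bits for k = 4), whereas unequal probabilities lower the entropy and make the real location easier to single out.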

Our first objective is to achieve the maximum entropy value in the dummy selection process:

$$\max\left(-\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i})\right) \tag{2}$$

To consider the impact of dummies on the cache performance, we select realistic dummy locations, prioritizing selections that optimize contributions to the cache hit ratio. We depend on the following property of the query probability: “If the query probability of a location is high, the data for this location is more likely to serve future queries and can achieve a higher cache hit ratio”. Let $\varphi$ refer to the contribution of a dummy. The contribution is defined as follows:

$$\varphi = qp \times \omega \tag{3}$$

where $\omega = 0$ if the position was previously cached, and $\omega = 1$ if not.
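Formula (3) can be illustrated with a few lines of Python. The cell names, query probabilities, and cached flags below are hypothetical values chosen only to show how a cached cell loses its contribution.

```python
def contribution(qp, cached):
    """phi = qp * omega, with omega = 0 for a cached cell and omega = 1 otherwise."""
    omega = 0 if cached else 1
    return qp * omega


# Hypothetical cells: name -> (query probability, already cached?).
cells = {"a": (0.04, False), "b": (0.06, True), "c": (0.03, False)}
phi = {name: contribution(qp, cached) for name, (qp, cached) in cells.items()}
best = max(phi, key=phi.get)
print(best)  # a
```

Cell `b` has the highest query probability, but because its answer is already cached it contributes nothing new, so the uncached cell `a` ranks first.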

Here, we have two objectives, which are represented by Formulas (2) and (3). Taking into consideration the previous two objectives, the dummy selection problem can be formulated as a Multi-Objective Optimization Problem (MOP), which can be represented as follows:

$$L_{dummy} = \arg\max\left\{-\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i}),\ \sum_{i=1}^{k} \varphi_{i}\right\} \tag{4}$$

To solve this MOP, each objective can be handled individually. For the first objective, let $L_{candidate}$ denote the set of candidate dummy locations. The set $L_{candidate}$ is selected so as to achieve high entropy for the current query:

$$L_{candidate} = \arg\max\left(-\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i})\right) \tag{5}$$

Regarding the second objective, from the candidate dummy locations, the $(k - 1)$ dummy locations that contribute the most to the cache are chosen. The maximum sum of contribution of $(k - 1)$ dummies is given by the following:

$$L_{dummy} = \arg\max\left\{\sum_{i=1}^{k-1} \varphi_{i}\right\} \tag{6}$$

In practice, an LBS user usually queries for POIs that are in his or her vicinity. Therefore, it is not very useful to cache cells that are distant from the real location. Here, a normalized distance $(nd)$ can be used to ensure that the selected dummy locations are close to the real location of the LBS user. This, in turn, enhances the cache hit ratio, because the answers to the queries that are built based on the generated dummies will be cached. The normalized distance is represented as follows:

$$nd_{i} = nd(l_{r}, l_{i}) \times \frac{1}{\sqrt{2\pi}} e^{-\frac{(\overline{nd} - nd(l_{r}, l_{i}))^{2}}{2}} \tag{7}$$

where $nd(l_{r}, l_{i})$ refers to the physical distance between $l_{r}$ and $l_{i}$ (the $i^{th}$ dummy location), and $\overline{nd} = \frac{\sum_{i=1}^{k} nd(l_{r}, l_{i})}{k}$ is the mean of these distances.

To represent the impact of $(k - 1)$ dummies on the caching performance, the total normalized distance $(TND)$ is used, where $TND \in [0, 1]$. It is defined as follows:

$$TND = \prod_{i=1}^{k-1} \sqrt{2\pi}\, \frac{nd_{i}}{nd(l_{r}, l_{i})} \tag{8}$$

Then, the total contributions of the dummy locations $(\sigma)$ can be represented as

$$\sigma = \left(\sum_{i=1}^{k} \varphi_{i}\right) \times (1 - TND) \tag{9}$$

Based on Formulas (9) and (6), which are related to the second objective, the contributions are updated to

$$L_{dummy} = \arg\max\left\{\left(\sum_{i=1}^{k} \varphi_{i}\right) \times (1 - TND)\right\} \tag{10}$$
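The following Python sketch transcribes Formulas (7) to (9) literally so that the quantities can be computed for a given dummy set. It rests on several assumptions that are not stated in the text: the distances passed in are those from the real location to each dummy and are strictly positive, the mean is taken over those dummy distances only, and the contributions are summed over the dummies supplied. The function names and sample values are hypothetical.

```python
import math


def normalized_distances(dists):
    """Formula (7): scale each distance by a Gaussian term centered on the mean."""
    mean = sum(dists) / len(dists)  # assumption: mean over the dummy distances only
    return [
        d * math.exp(-((mean - d) ** 2) / 2) / math.sqrt(2 * math.pi)
        for d in dists
    ]


def total_normalized_distance(dists):
    """Formula (8): product of sqrt(2*pi) * nd_i / nd(l_r, l_i) over the dummies."""
    nds = normalized_distances(dists)
    return math.prod(math.sqrt(2 * math.pi) * nd / d for nd, d in zip(nds, dists))


def total_contribution(phis, dists):
    """Formula (9): summed contributions scaled by (1 - TND)."""
    return sum(phis) * (1 - total_normalized_distance(dists))


dists = [1.0, 3.0]   # hypothetical distances from l_r to two dummies
phis = [0.04, 0.03]  # hypothetical contributions of the same dummies
tnd = total_normalized_distance(dists)
sigma = total_contribution(phis, dists)
print(round(tnd, 4), round(sigma, 4))
```

Each factor of the product in Formula (8) reduces to the Gaussian term of Formula (7), so the computed TND always lies in $(0, 1]$; for the sample distances it equals $e^{-1}$, roughly 0.368.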

Figure 6 illustrates the process of selecting the dummy locations under the normalized distance term.

Danger of inference attacks. Using the described normalized distance to ensure that the selected dummy locations are not far away from the real location of the LBS user is risky because the attacker can employ a location homogeneity attack [14] or a semantic location attack [15] to infer personal information about the LBS user. When the attacker performs a location homogeneity attack, he/she exploits information about the locations from which the LBS user queries are sent. Therefore, if all the queries that were built based on both the real location of the LBS user and the selected $k - 1$ dummy locations are sent from locations that are close to one another, the attacker can infer some sensitive information. For instance, consider an LBS user in an athletic area that includes many different sports clubs as POIs. If all $k - 1$ dummy locations were selected from the immediate area, the attacker could directly infer that the LBS user is fond of sports. In a semantic location attack, the attacker exploits the duration of the LBS user’s presence in a certain place (or POI) to attack privacy. For instance, consider three areas, namely athletic, recreational, and business areas, where the stay durations of the user are 1, 2, and 8 h, respectively. Under the normalized distance term, all $k - 1$ dummy locations would correspond to cells that contain sports clubs, playing clubs, and commercial enterprises, respectively, as POIs. During the 8 h period, all the selected $k - 1$ dummy locations would be close to one another due to the normalized distance. Thus, the attacker could infer that the query issuer (i.e., the LBS user) is a businessperson. Moreover, because 8 h is, in general, the length of time that a person spends at work, the attacker could determine the time during which the LBS user would be out of the house. All such cases threaten the privacy of the LBS user without the need to determine his or her real location accurately.

The best way to solve this problem is to select cells (i.e., dummies) that meet the following conditions: (i) the cells have the same query probability, which results in a high entropy value; and (ii) the cells are far away from one another. Figure 7 shows the solution to the normalized distance problem.

The first condition is expressed by Equation (5). For the second condition, let $CR$ denote the cloaking region. The objective is to ensure that the dummy locations are spread over a large area (i.e., a larger $CR$).

An important question arises here: how should the $CR$ be measured? If the measurement relies on the sum of distances between pairs of dummy locations, a problem will occur. We explain this problem by considering the data shown in Figure 8.

In Figure 8, $W$ is the actual location. $X$, $Y$, and $Z$ are the dummy locations, where the $qp$ of the cells that contain $X$, $Y$, and $Z$ is equal to the $qp$ of the cell that contains $W$. $X$ is the highest-ranked dummy that can be directly chosen, as it is the farthest from $W$. To meet the $k = 3$ anonymity level, we can choose $Y$ or $Z$. Based on the sum of the distances between pairs of dummy locations, we can choose either $Y$ or $Z$ because $YW + YX = ZW + ZX$. However, from a privacy point of view, $Y$ is preferred to $Z$ because it spreads the dummy locations over a large area. Thus, instead of using the sum of the distances between pairs of dummy locations, we can use their product. Note that $YW \times YX > ZW \times ZX$. This, in turn, leads to the selection of $Y$ as the dummy location.
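The argument for the product can be checked numerically. The coordinates below are hypothetical (they are not read from Figure 8) and were picked so that $Y$ and $Z$ lie on the same ellipse with foci $W$ and $X$, which makes the two distance sums equal while the products differ.

```python
import math

# Hypothetical coordinates, not taken from Figure 8.
W, X = (0.0, 0.0), (10.0, 0.0)   # real location and the first chosen dummy
Y, Z = (5.0, 12.0), (18.0, 0.0)  # two remaining candidates


def dist(a, b):
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


sum_y = dist(Y, W) + dist(Y, X)   # 13 + 13
sum_z = dist(Z, W) + dist(Z, X)   # 18 + 8
prod_y = dist(Y, W) * dist(Y, X)  # 13 * 13
prod_z = dist(Z, W) * dist(Z, X)  # 18 * 8
print(sum_y, sum_z)    # 26.0 26.0
print(prod_y, prod_z)  # 169.0 144.0
```

The sum cannot separate the two candidates, whereas the product favors $Y$, which is off the line through $W$ and $X$ and therefore spreads the set over a larger area.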

Based on the previous discussion, a second MOP can be defined in which the two conditions presented above form the objectives. Let $C = [c_{1}, c_{2}, c_{3}, \dots, c_{k}]$ refer to the set of real and dummy locations. Then, the sum of the distances between pairs of cells $c_{i}$ and $c_{j}$ is given by $\sum_{i \neq j} d(c_{i}, c_{j})$. The second MOP is as follows, where $FL_{dummy}$ refers to the most distant dummy locations:

$$FL_{dummy} = \arg\max\left\{-\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i}),\ \prod_{i \neq j} \mathrm{non\text{-}nd}(c_{i}, c_{j})\right\} \tag{11}$$

where $c_{i}, c_{j} \in C$, and $\mathrm{non\text{-}nd}(c_{i}, c_{j})$ refers to the non-normalized distance between $c_{i}$ and $c_{j}$.

Similar to the first MOP, we can enhance the second objective of the second MOP by searching for the optimal $(k - 1)$ dummy locations that are far from the actual location and also far away from one another:

$$FL_{dummy} = \arg\max\left\{\prod_{i \neq j} \mathrm{non\text{-}nd}(c_{i}, c_{j})\right\} \tag{12}$$

Computing $\prod_{i \neq j} \mathrm{non\text{-}nd}(c_{i}, c_{j})$ is based on the probability represented as follows:

$$\frac{\prod_{L_{r} \in C} d(c_{j}, L_{r})}{\sum_{c_{j} \in C} \prod_{L_{r} \in C} d(c_{j}, L_{r})}$$

where $d(c_{j}, L_{r})$ is the distance between the current location of the LBS user $L_{r}$ and the candidate cell $c_{j}$.
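One possible reading of this expression is sketched below in Python: each candidate cell receives a weight equal to the product of its distances to the locations already chosen (starting with the real location), and the weights are normalized into selection probabilities. This interpretation, the function name `selection_probabilities`, and the sample coordinates are assumptions made for illustration; candidates are assumed to be distinct from the locations already chosen.

```python
import math


def selection_probabilities(candidates, chosen):
    """Weight each candidate by the product of its distances to the chosen locations."""
    weights = [
        math.prod(math.dist(c, loc) for loc in chosen)
        for c in candidates
    ]
    total = sum(weights)
    return [w / total for w in weights]


chosen = [(0.0, 0.0)]  # hypothetical real location of the LBS user
candidates = [(3.0, 4.0), (6.0, 8.0), (0.0, 5.0)]
probs = selection_probabilities(candidates, chosen)
print(probs)  # [0.25, 0.5, 0.25]
```

Under this reading, the candidate at distance 10 is twice as likely to be picked as either candidate at distance 5, so distant cells are favored without excluding nearer ones.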

The union of the first MOP ($L_{dummy}$) and the second MOP ($FL_{dummy}$) leads to a third and final MOP ($EC_{dummy}$) with three main optimization objectives:

$$EC_{dummy} = \arg\max\left\{-\sum_{i=1}^{k} \acute{p}_{i} \times \log_{2}(\acute{p}_{i}),\ \sum_{i=1}^{k} \varphi_{i},\ \prod_{i \neq j} \mathrm{non\text{-}nd}(c_{i}, c_{j})\right\} \tag{13}$$

The first objective is to select the candidate cells

(L_{candidate})

that have the same query probability, which results in high entropy

(E)

. Among the candidate cells, the second objective is to select the

k - 1

dummy locations

(L_{dummy})

that make the highest contribution to the cache (based on the normalized distance). Among the same candidate cells, the third objective is to select other

k - 1

dummy locations

(FL_{dummy})

that are far away from the real location of the LBS user and far away from one another (based on the non-normalized distance to defend against inference attacks). Finally, among the

2 \times (k - 1)

dummy locations that are included in both

L_{candidate}

and

FL_{dummy}

, the

k - 1

final dummy locations

(EC_{dummy})

are selected to protect the location privacy of the LBS user.

The system models user locations within an

n \times n

grid, generating dummy locations along the four cardinal directions (up, down, left/before, and right/after). This directional space serves as the generation seed, which can be expanded to include diagonal orientations when needed. However, diagonal orientations are ignored to avoid computational complexity. In detail, we first sort the cells according to their query probabilities. Second, we select

4 k

cells that have similar query probabilities to the real location of the LBS user

l_{r}

(

2 k

cells/locations are before

l_{r}

and

2 k

cells/locations are after

l_{r}

) and randomly select

2 k

cells to be the candidate cells. Then, out of these candidates, we select a subset of

k - 1

dummies that have the highest contributions to the cache. Third,

4 k

additional cells (

2 k

cells are above

l_{r}

and

2 k

cells are below

l_{r}

) are selected under the same query probability condition, and

2 k

additional cells are randomly selected to form another set of new candidate cells. Then, out of these new

2 k

cells, we select a new subset of

k - 1

dummies that are the furthest from

l_{r}

. Finally, the resulting

k - 1

dummy locations are selected to be used in practice. When the

k

value is high, the number of subsets

(\binom{2 k}{k - 1})

, which are related to the second and third objectives, is too large. Therefore, we consider only

S_{p r}

random subsets and select one of them. Here,

S_{p r}

is a configuration-based system parameter (

S_{p r} = 1000

by default) that can be increased to ensure a higher privacy protection level. Algorithm 1 illustrates the previous steps in detail. This algorithm is called the Caching-Aware Double-Dummy Selection (CaDDSL) algorithm.
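The random-subset step described above can be sketched as follows for the distance objective. This is only an illustrative sketch, not Algorithm 1 itself: the function names, the Euclidean distance, and the sample grid cells are assumptions, and each unordered pair is counted once in the product (which does not change the arg max).

```python
import itertools
import math
import random


def pairwise_distance_product(cells):
    """Product of the distances over all unordered pairs of cells."""
    return math.prod(math.dist(a, b) for a, b in itertools.combinations(cells, 2))


def pick_far_dummies(real, candidates, k, s_pr=1000, rng=None):
    """Sample s_pr random (k-1)-subsets of the candidates and keep the subset whose
    pairwise distance product (real location included) is the largest."""
    rng = rng or random.Random()
    best_subset, best_score = None, -1.0
    for _ in range(s_pr):
        subset = rng.sample(candidates, k - 1)
        score = pairwise_distance_product([real] + subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset


real = (5, 5)                                                    # hypothetical l_r
candidates = [(0, 0), (0, 9), (9, 0), (9, 9), (5, 6), (4, 5)]    # hypothetical 2k cells, k = 3
dummies = pick_far_dummies(real, candidates, k=3, s_pr=200, rng=random.Random(7))
```

Sampling a fixed number of subsets keeps the cost bounded by the system parameter rather than by the number of all possible subsets, at the price of possibly missing the best subset.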

Algorithm 1: Caching-Aware Double-Dummy Selection (CaDDSL) Algorithm

Conflict and tradeoffs. Two critical issues arise in relation to the CaDDSL algorithm. The first is related to the goals of the eventual MOP

(EC_{dummy})

, and the second is related to the

k

value. For the eventual MOP

(EC_{dummy})

, there is a conflict between the second goal

(\sum_{i = 1}^{k} φ_{i})

and the third goal

(\prod_{i \neq j} \text{non-nd} (c_{i}, c_{j}))

. This conflict occurs because the second objective relies on the normalized distance

(nd)

to select cells that are close to one another to ensure a large contribution to the cache, while the third objective relies on the non-normalized distance

(\text{non-nd})

to select cells that are far away from one another to guard against inference attacks. The crux of this issue is that the cells selected under the

\text{non-nd}

term (to be used as a part of the final dummy locations in the eventual set

EC_{dummy}

) negatively affect the contribution to the cache. Therefore, there is a clear tradeoff between the second and third objectives. The best approach for resolving this tradeoff is to increase the number of actual dummy locations that are included in the final set

(EC_{dummy})

. This, in turn, means increasing the

k

value, thus leading to the second issue. Increasing the

k

value results in a higher privacy protection level; however, it also leads to a higher network overhead. In detail, when

k

is large, the number of subsets

(\binom{2 k}{k - 1})

is too large, which, in turn, requires an increase in the

S_{p r}

value (by a factor of 2000, for example). Therefore, there is a clear tradeoff between increasing the

k

value and limiting the network overhead. As a result, increasing the size of

{E C}_{d u m m y}

requires increasing the

k

value, which, in turn, leads to high network overhead.

For the first tradeoff (i.e., between the second and third objectives), because the answers to the queries (which were built based on the final selected dummy locations included in the

{E C}_{d u m m y}

set) are stored in the cache, this tradeoff is resolved by one of the tasks of the S-A-cache agent.

To resolve the second tradeoff, the M-A-user agent migrates to the LBS server, executes the CaDDSL algorithm there, and returns with the results. Therefore, instead of generating 99 dummy locations (i.e.,

k - 1 = 99

, for example) and sending them via the network, only the M-A-user agent is sent via the network. In this way, we simultaneously ensure a high privacy protection level and low network overhead. Notice that executing the CaDDSL algorithm on the LBS server side allows the LBS user to take advantage of the high computational capabilities of the LBS server. This speeds up the response time of the whole system. Figure 9 illustrates the migration of the M-A-user agent.

For the M-A-user agent’s migration, an itinerary is defined. More specifically, the M-A-user agent migrates to a destination machine (LBS server) and executes a task there (CaDDSL algorithm). Then, it migrates back from the destination machine to the home machine, where a method called ReportResults is executed. The process of encapsulating the CaDDSL algorithm with the M-A-user agent constitutes our enhancement algorithm, which is called the Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL) algorithm. Algorithm 2 presents the details of the CaOaDSL algorithm.

Algorithm 2: Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL)

Input: qp (query probability of each cell), l_{r} (the real location of the LBS user), S_{pr} (a system parameter).

Output: EC_{dummy}.

1: agent = new M-A-user; (create the M-A-user agent)

2: itinerary = new itinerary(); (create an itinerary)

3: itinerary.addDestination(“DB LBS-server”, “execute CaDDSL algorithm”), where lines 1–18 of Algorithm 1 are performed.

4: itinerary.addDestination(“LBS mobile device”, “execute ReportResults”), where ReportResults is a method that contains the retrieved POIs.

5: Output

EC_{dummy} = \arg\max_{\hat{L}_{candidate} \subset \acute{L}_{candidate}} \{\sum_{i = 1}^{k} φ_{i}, \prod_{i \neq j} \text{non-nd} (c_{i}, c_{j})\}

, where

|EC_{dummy}| = \frac{|{L\_set}_{dummies}|}{2}

.
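The itinerary pattern of Algorithm 2 can be mimicked in a single process as sketched below. This is only a simulation under stated assumptions: the classes `Itinerary` and `MobileAgent` are hypothetical, no real agent platform or network migration is involved, and `execute_caddsl` is a placeholder that does not implement the dummy selection.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Itinerary:
    stops: List[Tuple[str, Callable[[dict], None]]] = field(default_factory=list)

    def add_destination(self, host: str, task: Callable[[dict], None]) -> None:
        """Register a host to visit and the task to execute there."""
        self.stops.append((host, task))


@dataclass
class MobileAgent:
    state: dict = field(default_factory=dict)
    visited: List[str] = field(default_factory=list)

    def run(self, itinerary: Itinerary) -> dict:
        for host, task in itinerary.stops:
            self.visited.append(host)  # stands in for the migration to `host`
            task(self.state)           # execute the task "at" that host
        return self.state


def execute_caddsl(state: dict) -> None:
    # Placeholder for the server-side dummy selection (not the real algorithm).
    state["dummies"] = [f"dummy-{i}" for i in range(state["k"] - 1)]


def report_results(state: dict) -> None:
    # Back on the home machine: mark the carried results as reported.
    state["reported"] = True


itinerary = Itinerary()
itinerary.add_destination("LBS-server", execute_caddsl)
itinerary.add_destination("LBS mobile device", report_results)
agent = MobileAgent(state={"k": 4})
result = agent.run(itinerary)
```

Only the agent's state travels between the two stops in this sketch, which mirrors the idea that the dummies are generated on the server side rather than sent over the network.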

To summarize the CaOaDSL algorithm in a logical problem–solution manner, the following matching flowchart, Figure 10, illustrates each problem that arises and its corresponding solution.

S-A-cache: This agent uses a Bloom filter to check whether the query answer is in the cache. If the Bloom filter indicates that the answer is cached, the S-A-cache agent processes the query and gives the answer to the M-A-user agent. Otherwise, there is no need to search for the query answer, and the S-A-cache agent instructs the M-A-user agent to begin its migration to the LBS server. Notice that the Bloom filter can provide an immediate response regarding the existence (or not) of a query answer in the cache; a negative response is always correct, whereas a positive response may occasionally be a false positive. Therefore, there is no need to waste time searching the cache if no answer exists. This enhances the response time of the system, resulting in increased trust in LBS-enabled applications, as LBS users are concerned about the speed of the answer. The Bloom filter acts as a compact hash-based membership index and can greatly accelerate the answering process. Specifically, we employ the key idea that was proposed in Ref. [22] and apply it in the process of searching within the cache.
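The cache check can be illustrated with a minimal Bloom filter sketch. The bit-array size, the number of hash functions, the use of SHA-256, and the query-key format are all assumptions made for illustration; the design of Ref. [22] is not reproduced here.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        """Derive num_hashes bit positions from seeded SHA-256 digests."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


cache = {}
bloom = BloomFilter()


def store(query_key, pois):
    cache[query_key] = pois
    bloom.add(query_key)


def lookup(query_key):
    """Return cached POIs, or None to signal that the agent should migrate."""
    if not bloom.might_contain(query_key):
        return None              # definitely not cached: skip the cache search
    return cache.get(query_key)  # still None if the filter gave a false positive


store("cell(3,4):restaurant", ["POI-A", "POI-B"])
```

A miss in the filter is answered without touching the cache at all, which is where the response-time benefit comes from.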

To solve the first tradeoff (i.e., the tradeoff between the second and third objectives in the MOP represented by Formula (13)), we assign a task to the S-A-cache agent, called data freshness. The key idea of this task is to manage the life cycle of the data stored in the cache. Data life-cycle management is the deletion of expired or idle data. In this context, three terms are defined: idle data, expired data, and valid data. Let

(answ_{FLdummy})

denote the set of stored answers (i.e., POIs) that are retrieved by the queries built based on the dummy locations in the

FL_{dummy}

set. These answers are referred to as idle data because they make no contribution to the cache, since the dummy locations that are included in the

FL_{dummy}

set are chosen based on the non-normalized distance. Let

(answ_{Ldummy})

denote the set of stored answers (i.e., POIs) retrieved by the queries built based on the dummy locations included in the

L_{dummy}

set. The content of the cache is therefore

(answ_{FLdummy} \cup answ_{Ldummy})

. The answers included in the

answ_{Ldummy}

set contribute to the cache because the dummy locations included in the

L_{dummy}

set are chosen based on the normalized distance. Some of the answers that are included in the

answ_{Ldummy}

set have a low probability of being queried, while others have a high probability of being queried. That is because this probability depends on the

qp

of the cells selected to be the actual and final dummy locations. The expired data

(\underline{answ}_{Fdummy})

are answers that have a low probability of being queried. The valid data

(\overline{answ}_{Fdummy})

are answers that have a high probability of being queried. It is obvious that

(answ_{Ldummy} = \underline{answ}_{Fdummy} \cup \overline{answ}_{Fdummy})

. Both expired data and idle data are deleted because they make little and no contribution to the cache, respectively. As a result, only the stored valid data are kept. This ensures the highest contribution to the cache for answering future queries.

From a modeling perspective, let

T_{cache}

refer to the lifetime of the cached data and

t_{already-cache}

refer to the time that a cell’s data have already been cached. The freshness,

D_{fresh}

, is mathematically modeled as a function of the previously defined variables, expressed as follows:

D_{fresh} = \sqrt{1 - \frac{(t_{already-cache})^{2}}{(T_{cache})^{2}}}

(14)

Because both the expired data and valid data are defined according to the query probability (i.e., low and high, respectively), we need to filter the expired data based on a threshold. Let

thr_{qp}

denote this threshold. The cached data are considered to be expired data if the

qp

of the original cell is less than

thr_{qp}

; otherwise, they are considered to be valid data. Algorithm 3 describes the data freshness task.

Algorithm 3: Data freshness task

Input: qp (query probability of each cell), cache content, ∆ (specific period of time that can be updated).

Output: D_{fresh}.

1: While (t_{already-cache} \leq T_{cache}), do

2: Begin

3: If (answ \in answ_{FLdummy}), then

4: Delete it from the cache;

5: Else

6: If (answ \in answ_{Ldummy}) and (qp < thr_{qp}), then

7: Delete it from the cache;

8: Else

9: D_{fresh} = D_{fresh} + ∆;

10: End while

11: Output: D_{fresh}.
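The deletion rules of the data freshness task and the freshness value of Formula (14) can be sketched as follows. The `CachedAnswer` record, the function names, and the sample values are hypothetical; the sketch only filters idle and expired entries and evaluates Formula (14) and does not reproduce the ∆ update loop of Algorithm 3.

```python
import math
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    source: str  # "FL" if built from FL_dummy, "L" if built from L_dummy
    qp: float    # query probability of the originating cell
    age: float   # t_already-cache, in the same time unit as the lifetime


def freshness(age, lifetime):
    """D_fresh from Formula (14): 1 when just cached, 0 at the end of the lifetime."""
    ratio = min(age / lifetime, 1.0)
    return math.sqrt(1.0 - ratio ** 2)


def purge(cache, thr_qp):
    """Keep only valid data: drop idle answers (from FL_dummy) and expired answers
    (from L_dummy with a query probability below the threshold)."""
    kept = []
    for entry in cache:
        if entry.source == "FL":
            continue  # idle data
        if entry.qp < thr_qp:
            continue  # expired data
        kept.append(entry)
    return kept


cache = [
    CachedAnswer("FL", 0.9, 10.0),
    CachedAnswer("L", 0.1, 20.0),
    CachedAnswer("L", 0.6, 60.0),
]
valid = purge(cache, thr_qp=0.5)
```

With a lifetime of 120 time units, the surviving entry of age 60 has a freshness of about 0.87, since the square root of 0.75 is roughly 0.866.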

3.4. Proposed Architecture Details

From a software-engineering perspective, sequence diagrams are used to illustrate the details of the architecture. Figure 11 shows the sequential process of answering a query with the cache.

Notice that since we consider the LBS server to be an attacker (a malicious party), answering the queries with the cache will guarantee complete location privacy for the LBS user under any location privacy metric. That is because the LBS user does not connect to the LBS server.

Figure 12 shows the sequential process of answering a query with the LBS server, where the CaOaDSL algorithm preserves the location privacy of the LBS user.

4. Security Analysis

Two main parts are addressed as described below.

4.1. Security of Agents

An M-A-user agent can be attacked by the LBS server, which may block or modify the data that the agent carries. Because our major concern is privacy protection, the security of agents is out of scope. Thus, we assume that all agents used are secure, following the approach proposed in [23]. Security of agents will be considered in future work.

4.2. Security Against Inference Attacks

In this subsection, we present the threat model and prove that the proposed CaOaDSL approach is robust against both location homogeneity attacks and semantic location attacks.

The threat model

The goal of the malicious party is to gather sensitive information about a certain LBS user, including geographic position, POI, and range of query. Table 4 lists the capabilities of the LBS server as a malicious party.

Definition 1.

An approach is considered resistant to location homogeneity attacks if the likelihood of accurately guessing the real location of an LBS user is extremely low.

Theorem 1.

The proposed CaOaDSL approach is location homogeneity attack-resistant.

Proof. We assume that the attacker knows the mix of real and dummy locations.

In addition, the attacker knows the

(qp)

of each individual cell and all the issued

(k)

locations,

l_{1}, l_{2}, \dots, l_{k}

. Let

PS_{(event)}

denote the probability that the attacker can successfully guess whether the

(event)

is true. The CaOaDSL approach is resistant against location homogeneity attacks if the following conditions are met:

(1) PS_{(l_{i})} = PS_{(l_{j})} \forall (0 < i \neq j \leq k)

(15)

and

(2) distance(l_{i}, l_{j}) is long

First, because the dummy locations are chosen based on the

(qp)

of cells that resemble the

(qp)

of the LBS user’s cell (i.e., their actual location), the attacker gains no advantage from using these query probabilities to identify the real location of the LBS user.

Second, since the CaOaDSL approach guarantees that the

k

value is high, the probability of successful guessing is

(\frac{1}{k})

, which is very low. Among the

k

locations,

\frac{k}{2}

locations are selected based on non-normalized distance

(\text{non-nd})

, thus ensuring that these locations are far away from one another. Since the

k

value is high, the

\frac{k}{2}

value is also high. Consequently, the probability of successfully guessing is

(\frac{1}{\frac{k}{2}} = \frac{2}{k})

, which is still low. This probability value is the same for all

k

locations in the

EC_{dummy}

set. As a result, the attacker can only randomly guess the real location of the LBS user.

Definition 2.

An approach is considered resistant to semantic location attacks if the likelihood of accurately guessing the real location of an LBS user is extremely low under a certain period of time.

Theorem 2.

The proposed CaOaDSL approach is semantic location attack-resistant.

Proof. We assume that the attacker knows the mix of real and dummy locations.

Since semantic location attacks are tightly coupled with time, additional information is given to the attacker. In addition to the information that the attacker holds in the proof of Theorem 1, the attacker knows the frequency of the queries that are sent from a certain location during a specific time period. Let

freq(Q_{l_{i}}^{tp})

and

freq(Q_{l_{j}}^{tp})

denote the frequencies, or numbers of queries, that are sent from locations

l_{i}

and

l_{j}

, respectively, during time period

tp

. To represent the robustness against semantic location attacks, a third condition must be satisfied, in addition to the two conditions in the proof of Theorem 1:

(3) freq(Q_{l_{i}}^{tp}) = freq(Q_{l_{j}}^{tp}) \forall (0 < i \neq j \leq k)

(16)

For the first two conditions, the same justification as was provided in Proof 1 is used here to prove that the attacker can only randomly guess the real location of the LBS user among the dummies. Under the third condition, all submitted locations (including the real location of the LBS user) have the same query frequency. That is because the protection originally depended on the dummy generation process. In other words, the real location of the LBS user (his/her real cell) is associated with the dummies (dummy cells). Consequently, each dummy cell sends a query frequency value, which will be the same for all the dummy cells and the real cell. That is, the attacker will revert to randomly guessing the real location of the LBS user since all locations have the same query frequency.

Since the active attacker (LBS server) knows the proposed algorithm (CaOaDSL), he/she may try to reverse the algorithm, but this will fail. That is because the CaOaDSL algorithm randomly chooses two candidate sets of cells: (i)

2 k

cells (right after and right before the real location of the LBS user) and (ii)

2 k

cells (above and below the real location of the LBS user). From the previous two candidate sets, a third set (final set) is also randomly formed, containing the actual dummies. Note that we use a randomization process to construct the three sets. This randomization guarantees the uncertainty of the selection, which, in turn, leads to uncertain dummy selection results. As a result, even if the attacker runs our proposed CaOaDSL algorithm several times, he/she cannot infer the real location of the LBS user due to the original randomization processes.

5. Used Metrics

Two types of evaluation metrics are used in this work, as described below.

5.1. Privacy Metrics

We use two privacy metrics, according to the way in which the queries are answered. For the queries that are answered by the cache, we use the cache hit ratio, which represents the ratio of the number of queries answered by the cache to the total number of queries in the system. It is defined as follows:

CHR = \frac{|Q_{answered-cache}|}{|Q_{answered-cache}| + |Q_{answered-server}|}

(17)

For the queries that are answered by the LBS server, we propose a new privacy metric. This privacy metric mainly depends on the achieved entropy value

(E)

, which is defined by Formula (1). Suppose an LBS user sends a query containing

k - 1

dummies to safeguard location privacy. The maximum entropy value achievable is

\log_{2} (k)

, which occurs when all

k

submitted locations have an equal probability of being considered the real/actual location of the query issuer. Consequently, if the LBS user attains an entropy value

(E)

that is less than

\log_{2} (k)

, the amount of privacy compromised by the attacker will be

(\log_{2} (k) - E)

. As time goes on, the LBS user submits multiple queries, allowing the attacker to achieve incremental (i.e., small) success with each query. The sum of these small successes is related to the moments at which the LBS user issues the queries and determines to what degree the LBS user’s privacy has been broken over a certain period of time.

More formally, let

Ʈ = (ƫ_{1}, ƫ_{2}, ƫ_{3}, \dots, ƫ_{n})

refer to the moments at which the LBS user issues queries, where each query is protected by

k - 1

dummy locations. Then, our new privacy metric, which is called amount of compromised privacy, is defined as follows:

ACP = \sum_{i = 1}^{n} (\log_{2} (k) - E (ƫ_{i})), \text{ where } ƫ_{i} \in Ʈ

(18)

Because one of the tasks that is assigned to the M-A-user agent is to alert the LBS user if he or she reaches a dangerous state, the

ACP

can be used to represent the level of danger that threatens the user’s privacy. Based on the danger level that has been reached, we can recommend that the LBS user stop sending queries for a period of time, a feature that has not been offered previously, to the best of our knowledge.
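A short sketch of how the amount of compromised privacy could be accumulated is given below. It assumes that the entropy is the Shannon entropy of the normalized query probabilities of the k submitted locations (in line with the entropy term of the MOPs); the function names and the sample probabilities are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in bits) of the normalized probabilities."""
    total = sum(probabilities)
    normalized = [p / total for p in probabilities]
    return -sum(p * math.log2(p) for p in normalized if p > 0)


def amount_of_compromised_privacy(queries):
    """Sum of (log2(k) - E) over the issued queries, as in Formula (18).
    `queries` holds, for each query, the k query probabilities of its locations."""
    acp = 0.0
    for probs in queries:
        k = len(probs)
        acp += math.log2(k) - entropy(probs)
    return acp


uniform = [[0.25, 0.25, 0.25, 0.25]] * 3  # ideal dummies: nothing is leaked
skewed = [[0.7, 0.1, 0.1, 0.1]] * 3       # poor dummies: some leakage per query
acp_uniform = amount_of_compromised_privacy(uniform)
acp_skewed = amount_of_compromised_privacy(skewed)
```

Comparing the accumulated value with a chosen danger threshold is then a single comparison, which is how an alert to the LBS user could be triggered.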

5.2. Performance Metrics

Since there is a clear tradeoff between achieving a high privacy protection level (i.e., high

k

value) and limiting the network overhead, we present a performance metric that is used to measure the load of the network. Let

(Z_{q})

denote the size of a sent query in bits, let

(Z_{q-dl})

denote the size of a sent query that was built based on the selected dummy location, let

(Z_{res})

denote the size of the response, and let

(W = constant)

denote the bandwidth of the wireless network.

An LBS user who is not concerned about privacy sends a query to the LBS server with size

Z_{q}

, and a response is sent back with size

Z_{res}

. Therefore, the total bit size is

Z_{q} + Z_{res}

. To protect privacy, the LBS user sends a query of size

Z_{q}

, along with

k - 1

dummy queries (each dummy query has a size of

Z_{q-dl}

). Thus, the total size of all the queries sent is as follows:

TZ_{q-sent} = Z_{q} + (k - 1) \times Z_{q-dl}

(19)

The corresponding total size of the responses to the LBS user is defined as

TZ_{res} = Z_{res} + (k - 1) \times Z_{res-dl}

(20)

where

Z_{res-dl}

is the size in bits of the response to each dummy query.

Consequently, when there are

(U_{lsb})

LBS users, the network overhead (in terms of sending and responding) can be defined as

OH_{network} = \frac{(TZ_{q-sent} + TZ_{res})}{W} \times U_{lsb}

(21)

However, when using a mobile agent, additional size must be included in the network overhead. Suppose that the LBS user sends an agent to the LBS server. The agent consists of code of size

(Z_{A-code})

and state information of size

(Z_{S-info})

. The agent carries the request (i.e., the query), which has a size of

Z_{req}

. Here,

Z_{req} = Z_{q}

. On the LBS server side, the agent communicates locally, which does not produce any network overhead. The agent has code that compresses the LBS server results; therefore, only

(1 - δ) \times Z_{res}

must be transmitted back to the agent’s home machine (i.e., the smartphone). Here,

δ

denotes the compression factor. To represent the total bandwidth of the mobile agent in terms of sending and responding, Formulas (19) and (20) can be combined as follows:

TB_{MA} = Z_{A-code} + (2 \times Z_{S-info}) + TZ_{q-sent} + [(1 - δ) \times TZ_{res}]

(22)

Consequently, Formula (21) is updated to represent the network overhead when using mobile agents as follows:

OH_{network} = \frac{TB_{MA}}{W} \times U_{lsb}

(23)

It is worth mentioning that when not using a mobile agent,

Z_{A-code} = Z_{S-info} = δ = 0

.
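Formulas (19)–(23) can be evaluated directly, as in the sketch below. All sizes are hypothetical values in bits chosen only for illustration, and the function names are assumptions; the final check confirms that the agent formula reduces to the client–server formula when the agent-specific terms are zero.

```python
def total_sizes(z_q, z_q_dl, z_res, z_res_dl, k):
    """Formulas (19) and (20): total bits sent and total bits received."""
    tz_sent = z_q + (k - 1) * z_q_dl
    tz_res = z_res + (k - 1) * z_res_dl
    return tz_sent, tz_res


def overhead_client_server(z_q, z_q_dl, z_res, z_res_dl, k, bandwidth, users):
    """Formula (21): network overhead without a mobile agent."""
    tz_sent, tz_res = total_sizes(z_q, z_q_dl, z_res, z_res_dl, k)
    return (tz_sent + tz_res) / bandwidth * users


def overhead_mobile_agent(z_q, z_q_dl, z_res, z_res_dl, k, bandwidth, users,
                          z_a_code, z_s_info, delta):
    """Formulas (22) and (23): network overhead with a mobile agent."""
    tz_sent, tz_res = total_sizes(z_q, z_q_dl, z_res, z_res_dl, k)
    tb_ma = z_a_code + 2 * z_s_info + tz_sent + (1 - delta) * tz_res
    return tb_ma / bandwidth * users


# Hypothetical sizes in bits; bandwidth in bits per second.
common = dict(z_q=512, z_q_dl=512, z_res=4096, z_res_dl=4096, k=6,
              bandwidth=11e6, users=1000)
oh_cs = overhead_client_server(**common)
oh_ma = overhead_mobile_agent(**common, z_a_code=8192, z_s_info=1024, delta=0.7)
oh_ma_zero = overhead_mobile_agent(**common, z_a_code=0, z_s_info=0, delta=0)
```

For these particular hypothetical sizes the agent overhead is lower than the client–server overhead, but the outcome depends on how the agent's code and state sizes compare with the compressed responses.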

6. Experimental Results and Evaluations

6.1. Simulation Setup

The following simulation parameters are used: The area is divided into a

(160 \times 160)

grid, accommodating a total of 10,000 users. The cache is represented by only one table to store information on POIs. Timestamps are attached to information within the cache, as well as to queries; these timestamps are used to obtain the data freshness term. The query probability is generated randomly with the help of the Google Maps API, which allows us to query different POI types. In detail, the POI types that are considered are healthcare (hospitals, clinics, and pharmacies), transportation (metro stations, bus stops, and parking), recreation (sports clubs, gyms, and parks), and commercial (restaurants, malls, and gas stations). To generate random query probabilities, the following steps are performed:

Assigning weights to POIs based on expected popularity (e.g., hospitals = 0.3, restaurants = 0.5, and metro stations = 0.2).
Using a normal probability distribution to model query rates.
Randomizing frequencies per POI.
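One possible reading of these three steps is sketched below. Only the three example weights come from the text; the mapping of cells to POI types, the standard deviation (`spread`), the clamping of negative draws, and the final normalization are assumptions, and no call to the Google Maps API is made.

```python
import random

# Example weights from the text (expected popularity per POI type).
POI_WEIGHTS = {"hospital": 0.3, "restaurant": 0.5, "metro_station": 0.2}


def random_query_probabilities(cell_pois, rng, spread=0.25):
    """cell_pois maps a cell to its POI type; returns normalized query probabilities."""
    rates = {}
    for cell, poi in cell_pois.items():
        mean = POI_WEIGHTS[poi]
        rate = rng.gauss(mean, spread * mean)  # normal model of the query rate
        rates[cell] = max(rate, 1e-9)          # keep every rate strictly positive
    total = sum(rates.values())
    return {cell: rate / total for cell, rate in rates.items()}


# Hypothetical assignment of POI types to four grid cells.
cell_pois = {
    (0, 0): "hospital",
    (0, 1): "restaurant",
    (1, 0): "metro_station",
    (1, 1): "restaurant",
}
qp = random_query_probabilities(cell_pois, random.Random(42))
```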

We selected three previous approaches for the comparison with the proposed approach: CirDummy [10], GridDummy [10], and Dest-Ex [11].

6.2. Evaluation of Cache Hit Ratio Results

Since the cache hit ratio is tightly coupled with the communication cost (i.e., the number of queries sent to the LBS server), we first evaluate the approaches that are involved in the comparison in terms of the communication cost.

Figure 13 shows a 120 min snapshot of the communication cost, where

(k = 3)

. All the queries in the CirDummy, GridDummy, and Dest-Ex approaches are sent to the LBS server because these approaches do not use a cache. For the CaOaDSL approach, the number of queries that are sent to the LBS server decreases since the answers to many queries are found in the cache. As a result, we decrease the number of connections to the LBS server, compared to the unbounded number of connections made by each LBS user in the CirDummy, GridDummy, and Dest-Ex approaches.

Since the CirDummy, GridDummy, and Dest-Ex approaches do not use query response caching, we evaluated our proposed approach in terms of (i) the effect of

k

on the queries sent to the LBS server and (ii) the effect of

k

on the cache hit ratio.

Figure 14 shows the impact of the

k

value on the number of queries that are sent to the LBS server. We obtained different snapshots in an incremental way. At

(t = 30)

, we can see that when

k = 3, 6, \text{ and } 9

, the number of queries that are sent to the LBS server increases linearly as

k

increases. This result occurs because, at the beginning, there are no responses in the cache. Therefore, during the first snapshot, the cache is filled. When

k = 13 \text{ to } 30

, the number of queries sent decreases dramatically due to the caching of the queries’ responses. As time progresses, the rest of the snapshots show that the answers to many queries are found in the cache; thus, the number of queries sent decreases gradually.

To show the impact of the

k

value on the cache hit ratio, we took a 120 min snapshot of freshness time in an incremental way (i.e., the

∆

parameter). Figure 15 illustrates the results.

Because the CirDummy, GridDummy, and Dest-Ex approaches do not use a cache, their cache hit ratios are always zero. For the proposed approach, CaOaDSL, we use three values of the freshness time,

∆

. When

∆ = 0

, the CaOaDSL approach relies only on the cached responses of the queries that were built based on dummies that were selected under normalized distance (without freshness time). Figure 15 shows that many LBS users find their answers to future queries in the cache. That is because the normalized distance ensures that the search for POIs will be in the vicinity of the user. We refresh the cached responses that have the highest probability of being queried in the future (i.e.,

∆ = 120, 150)

, and the cache hit ratio increases as

∆

increases. Therefore, over time, the cache is filled with valuable query responses that can provide answers to future queries. Thus, the cache hit ratio may reach 1, as illustrated in Figure 15.

6.3. Evaluation of Results on Resistance Against Inference Attacks

LBS users seek to achieve a higher

k-anonymity

level since it leads to a higher degree of privacy protection. However, this

k-anonymity

level is determined by the number of generated dummies that surround the issued query. Therefore, we first evaluate the privacy level that is achieved vs. the

k

value. Then, we calculate the number of LBS users that reach dangerous states based on our proposed privacy metric,

ACP.

Based on the entropy, which is defined by Formula (1), Figure 16 shows the privacy levels that are achieved by the different approaches in terms of entropy.

In general, the entropy increases with

k

. Among the approaches, the GridDummy approach performs the worst. That is because GridDummy selects dummy locations based on the vertices of a grid

(\sqrt{k} \times \sqrt{k})

, and these vertices are fixed when the map is defined. Therefore, the entropy of the dummy locations (i.e., the vertices) depends on the current query probabilities of the map. The CirDummy approach overcomes this shortcoming of the GridDummy approach and slightly outperforms it. The reason behind this is that the chosen dummy locations are confined to a virtual circle. Since the variation in query probabilities remains relatively stable within this circle, which encompasses only a few cells (a small area), the resulting entropy value is only marginally higher. The Dest-Ex approach performs better than both the GridDummy and CirDummy approaches. That is because the generated dummies are based on the actual trajectories of the LBS user’s motion. Because the motion trajectory passes through many cells, it runs across a region that is bigger than a virtual circle, resulting in higher query probabilities. This, in turn, leads to higher entropy values. Another factor in the Dest-Ex approach that contributes to the enhancement of the entropy values is the direction, which may be changed to pass through more cells with the same query probability. Compared to the GridDummy, CirDummy, and Dest-Ex approaches, the CaOaDSL approach achieves the best performance. The underlying reason is that the dummy locations are selected from cells with similar query probabilities. This ensures significantly higher entropy values and leads to higher privacy degrees.

Because the proposed privacy metric,

ACP

, relates individually to each LBS user who is involved in our system, we evaluated the degree to which each user’s privacy is compromised by the attacker (the LBS server or its maintainer) over time. In this context, we assumed that

k = 6

(i.e., the LBS user sends five dummy queries and the real query to the LBS server). Under the threat of a mixture of inference attacks (i.e., location homogeneity attack and location semantic attack), we took a snapshot at

(t = 120)

. Every 3 min, only one type of inference attack is implemented. Twenty LBS users’ situations are evaluated, as shown in Figure 17, using a threshold

(thr = 0.8)

at which the LBS user is considered to be vulnerable to attack by the LBS server. For fair comparison, 20 LBS users are randomly selected for evaluation under identical threshold conditions. The comparison of the danger statuses of the LBS users is summarized in Table 5.

Table 5 shows that all LBS users in the GridDummy approach exceeded the threshold. That is because the selected dummy locations are close to one another since they are formed by the vertices of the grid. Therefore, the dummy locations of the LBS user can be easily determined using inference attacks by the LBS server. Three-quarters and more than half of the LBS users exceeded the threshold in CirDummy and Dest-Ex, respectively. Compared to GridDummy, CirDummy is more robust against inference attacks since the radius of the circle may be enlarged to include dummies that are farther away from the real location of the LBS user. Compared to CirDummy, Dest-Ex achieved higher robustness against the inference attacks. The reason is that the generated location dummies can change their directions and move to cells that are farther away than those that are selected by the CirDummy approach. The CaOaDSL approach has the minimum number of LBS users that reached a dangerous state (18 LBS users out of 20 are secure against the LBS server). That is because non-normalized distance is used to select half of the actual dummy locations. This provides more robustness against inference attacks since the selected dummy locations are far away from one another. Moreover, due to the good cache design/management by the S-A-cache agent, many of the LBS users find answers to their queries in the cache, without the need to connect to the LBS server.

The results in Table 6 are consistent with those in Table 5. For Table 6, the threshold was set to different values, and the simulation was re-run at various snapshots with different LBS users selected each time.

6.4. Scalability and Network Overhead Results Evaluation

In this evaluation, the wireless bandwidth is $W = \text{constant} = 11\ \text{Mbps}$, and the compression factor of the mobile agent is $\delta = 0.7$. In addition, the GridDummy, CirDummy, and Dest-Ex approaches are implemented under a client–server architecture for comparison with our proposed agent-based architecture.

Here, we evaluate the scalability and the network overhead as functions of the $k$ value and the number of LBS users. Figure 18 shows the impact of increasing the $k$ value on the network overhead when the number of LBS users is 1000.

In general, the network overhead increases as the $k$ value increases. When $k = 6$, for example, each LBS user in the GridDummy, CirDummy, and Dest-Ex approaches will create five queries (which are based on five dummy locations) and then send them with the real query (which is based on the real location) to the LBS server for processing. In the CaOaDSL approach, each LBS user will create a mobile agent (M-A-user), which migrates to the LBS server, generates five dummy locations, and builds five dummy queries to achieve the $k$-anonymity concept. Figure 18 shows that the network overhead percentage increases semi-linearly as the $k$ value increases when using the GridDummy, CirDummy, and Dest-Ex approaches. The corresponding network overhead values are similar across these three approaches; the small differences stem from variation in the queries sent to the LBS server, whose responses differ slightly in size because the size of a response depends on both the type and the number of retrieved POIs. For the CaOaDSL approach, when $k = 3$, the network overhead percentage is higher than that of the other approaches because the mobile agent is larger than the three queries sent together to the LBS server. This larger size is due to the $Z_{A\text{-}code}$ and $2 \times Z_{S\text{-}info}$ terms of the mobile agent. The network overhead percentage then decreases at $k = 6$ and, for $k > 6$, increases only by a very small amount in a semi-linear manner. That is because only the mobile agent is transferred via the wireless network, compared to the many queries sent in the GridDummy, CirDummy, and Dest-Ex approaches.
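The crossover described above (the agent costs more at $k = 3$ but less at $k = 6$) can be sketched with a toy size model. This is not the paper's overhead formula: it ignores responses and the compression factor, it treats the agent size as the sum of one code term and two information terms, and all byte counts and function names are hypothetical values chosen only to reproduce the qualitative shape.

```python
def client_server_bytes(k, query_bytes):
    """Bytes sent when the user transmits k separate queries (toy model)."""
    return k * query_bytes


def agent_bytes(code_bytes, info_bytes):
    """Bytes sent when one mobile agent migrates: code plus two info blocks."""
    return code_bytes + 2 * info_bytes


def crossover_k(query_bytes, code_bytes, info_bytes):
    """Smallest k at which k queries outweigh a single agent migration."""
    return agent_bytes(code_bytes, info_bytes) // query_bytes + 1


# Hypothetical sizes: 200-byte queries, 500-byte agent code, 150-byte info block.
k_star = crossover_k(query_bytes=200, code_bytes=500, info_bytes=150)
print(k_star)
```

Under these assumed sizes the agent (800 bytes) outweighs three queries (600 bytes) but is smaller than six queries (1200 bytes), which mirrors the behaviour reported for Figure 18.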

Figure 19 shows the results of the scalability evaluation as the number of LBS users increases.

Under $k = 12$ and $φ = 0.7$, we gradually increased the number of LBS users from 1000 to 10,000 in steps of 1000. Figure 19 supports the results shown in Figure 18. In the CaOaDSL approach, all the corresponding network overhead percentage values are lower than those of the GridDummy, CirDummy, and Dest-Ex approaches. That is because the $k$ value is fixed at 12, at which the original size of the mobile agent is smaller than the total size of the queries (sent together via the wireless network to the LBS server) in the other approaches. Naturally, increasing the number of LBS users raises the network overhead percentage for every approach, but under the CaOaDSL approach the values increase only by small amounts and remain significantly lower than those of the other approaches.

7. Conclusions

In this technological age, privacy is one of the major concerns of mobile-device users. To protect the location privacy of users of location-based services, we proposed the Caching-Aware Overhead-Aware Dummy Selection (CaOaDSL) algorithm, which has five main objectives: (1) achieving a high entropy value by selecting dummy locations that have the same query probability as the real location of the LBS user; (2) decreasing the number of connections to the LBS server (a malicious party) by caching the responses of the queries built on the selected dummy locations to answer future queries; (3) enhancing the contribution of the query responses to the cache by using the normalized distance, which ensures that the first half of the selected dummy locations are located near where the real query was issued; (4) ensuring robustness against homogeneity location and semantic location inference attacks by selecting the second half of the dummy locations based on the non-normalized distance; and (5) ensuring both high privacy protection (a high $k$-anonymity level) and low network overhead at the same time via encapsulation with mobile agents. Compared to previous approaches, the CaOaDSL approach provides better performance in terms of communication cost and cache hit ratio. Moreover, according to the new privacy metric, the CaOaDSL approach has the highest robustness against inference attacks. Furthermore, the migration of the mobile agent to the LBS server to perform the CaOaDSL approach there significantly decreases the network overhead compared to previous approaches.
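Objective (1) can be illustrated with a small calculation. The sketch assumes that the entropy in question is the Shannon entropy of the normalized query probabilities of the $k$ submitted locations, which is the usual definition in dummy-based $k$-anonymity work; the function name and the skewed probability values are hypothetical, while 0.0213 is the query probability quoted in the caption of Figure 5.

```python
import math


def location_entropy(query_probs):
    """Shannon entropy (bits) of the normalized query probabilities."""
    total = sum(query_probs)
    normalized = [q / total for q in query_probs]
    return -sum(p * math.log2(p) for p in normalized if p > 0)


# Real location plus five dummies that share the same query probability.
equal = location_entropy([0.0213] * 6)

# Hypothetical dummies with mismatched query probabilities.
skewed = location_entropy([0.0213, 0.002, 0.004, 0.05, 0.001, 0.03])

print(equal, skewed)
```

When all six locations share one query probability, the entropy reaches its maximum of $\log_2 6$ bits, so the server gains no statistical hint about which location is real; mismatched probabilities lower the entropy.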

A key limitation of this work is its focus on static points of interest (POIs), without addressing dynamic POIs such as locating nearby Uber cars that may enter the user’s vicinity. Additionally, while the employed agents are assumed to be secure against tampering by the LBS server, this study only guarantees location privacy. To achieve complete privacy protection, query privacy must also be addressed.

In future work, we will consider handling moving POIs, taking into consideration different geographic densities and varied POI distributions, and integrating privacy protection with agent security. Query privacy must be protected against query analysis attacks, which could be applied to collect personal information about LBS users. Future enhancements may also incorporate advanced methods, including deep learning and federated learning, to strengthen the study. An additional improvement area involves analyzing how dummy location generation and agent migration affect perceived latency and battery drain for end users. Another future enhancement will involve comparison with non-dummy privacy protection approaches, such as differential privacy variants, after they are adapted for use with a dummy-based approach.

Author Contributions

Conceptualization, Omar F. Aloufi, Ahmed S. Alfakeeh and Fahad M. Alotaibi; Methodology, Omar F. Aloufi and Ahmed S. Alfakeeh; Software, Omar F. Aloufi; Formal Analysis, Omar F. Aloufi; Writing—Review and Editing, Omar F. Aloufi, Ahmed S. Alfakeeh and Fahad M. Alotaibi. All authors have read and agreed to the published version of the manuscript.

Funding

This Project was funded by KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, under Grant Number (WAQF: 218-830-2024). The authors, therefore, acknowledge with thanks WAQF and the Deanship of Scientific Research (DSR) for technical and financial support.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Albouq, S.S.; Sen, A.A.A.; Namoun, A.; Bahbouh, N.M.; Alkhodre, A.B.; Alshanqiti, A. A double obfuscation approach for protecting the privacy of IoT location based applications. IEEE Access 2020, 8, 129415–129431.
2. Giovanni, P.; Pilloni, V.; Martalò, M. Trustworthy Localization in IoT Networks: A Survey of Localization Techniques, Threats, and Mitigation. Sensors 2024, 24, 2214.
3. Basmh, A.; Mahgoub, I. Location privacy-preserving scheme in iobt networks using deception-based techniques. Sensors 2023, 23, 3142.
4. Liu, T.; Liu, J.; Wang, J.; Zhang, H.; Zhang, B.; Ma, Y.; Sun, M.; Lv, Z.; Xu, G. Pseudolites to support location services in smart cities: Review and prospects. Smart Cities 2023, 6, 2081–2105.
5. Qi, L.; Liu, Y.; Yu, Y.; Chen, L.; Chen, R. Current Status and Future Trends of Meter-Level Indoor Positioning Technology: A Review. Remote Sens. 2024, 16, 398.
6. Jo, H.G.; Son, T.Y.; Jeong, S.Y.; Kang, S.J. Proximity-based asynchronous messaging platform for location-based Internet of Things service. ISPRS Int. J. Geo Inf. 2016, 5, 116.
7. Liu, Q.; Ma, Y.; Alhussein, M.; Zhang, Y.; Peng, L. Green data center with IoT sensing and cloud-assisted smart temperature control system. Comput. Netw. 2016, 101, 104–112.
8. Hidetoshi, K.; Yanagisawa, Y.; Satoh, T. An anonymous communication technique using dummies for location-based services. In Proceedings of the ICPS’05 International Conference on Pervasive Services, Santorini, Greece, 11–14 July 2005.
9. Lu, H.; Jensen, C.S.; Yiu, M.L. Pad: Privacy-area aware, dummy-based location privacy in mobile services. In Proceedings of the Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Vancouver, BC, Canada, 13 June 2008.
10. Hara, T.; Suzuki, A.; Iwata, M.; Arase, Y.; Xie, X. Dummy-Based User Location Anonymization Under Real-World Constraints. IEEE Access 2016, 4, 673–687.
11. Yin, C.; Xi, J.; Sun, R. Location Privacy Protection Based on Improved-Value Method in Augmented Reality on Mobile Devices. Mob. Inf. Syst. 2017, 2017, 7251395.
12. Chen, M.; Li, W.; Chen, X.; Li, Z.; Lu, S.; Chen, D. LPPS: A Distributed Cache Pushing Based K-Anonymity Location Privacy Preserving Scheme. Mob. Inf. Syst. 2016, 2016, 7164126.
13. Zhang, S.; Li, M.; Liang, W.; Sandor, V.K.A.; Li, X. A survey of dummy-based location privacy protection techniques for location-based services. Sensors 2022, 22, 6141.
14. Alrahhal, H.; Alrahhal, M.S.; Jamous, R.; Jambi, K. A symbiotic relationship based leader approach for privacy protection in location based services. ISPRS Int. J. Geo Inf. 2020, 9, 408.
15. Shady, M.; Khemakhem, M.; Jambi, K. Agent-based system for efficient kNN Query processing with comprehensive privacy protection. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 52–66.
16. Alrahhal, M.S.; Khemakhem, M.; Jambi, K. A survey on privacy of location-based services: Classification, inference attacks, and challenges. J. Theor. Appl. Inf. Technol. 2017, 95, 6719–6740.
17. Gupta, A.K.; Shanker, U. Location Privacy Preservation for Location Based Service Applications: Taxonomies, Issues and Future Research Directions. Wirel. Pers. Commun. 2024, 134, 1617–1639.
18. Horvath, K.; Kimovski, D. Efficient Location-Based Service Discovery for IoT and Edge Computing in the 6G Era. arXiv 2025, arXiv:2504.00743.
19. Wang, B.; Li, H.; Ren, X.; Guo, Y. An efficient differential privacy-based method for location privacy protection in location-based services. Sensors 2023, 23, 5219.
20. Saravanan, P.; Ramani, S.; Reddy, V.R.; Farhaoui, Y. A novel approach of privacy protection of mobile users while using location-based services applications. Ad Hoc Netw. 2023, 149, 103253.
21. Siddiqie, S.; Reddy, P.K.; Annappalli, S.R. Location and Intent Privacy Preservation for Spatial Range Queries in a Mobile Network. IEEE Access 2025, 13, 45998–46013.
22. Sun, S.; Qian, Y.; Zhang, R.; Wang, Y.; Li, X. An improved chinese string comparator for Bloom filter based privacy-preserving record linkage. Entropy 2021, 23, 1091.
23. Alluhaybi, B.; Alrahhal, M.S.; Alzahrani, A.; Thayananthan, V. Dummy-based approach for protecting mobile agents against malicious destination machines. IEEE Access 2020, 8, 129320–129337.

Figure 1. Reference scenario of LBS application use. LBS users send queries from their actual positions to the LBS server, which processes these requests to retrieve results (POIs) and then returns them to the users.

Figure 2. The conceptual framework of dummy-based approaches. An LBS user requests the nearest hospitals from their actual location. To protect privacy, dummy-based methods fetch results using both the real location and dummy locations.

Figure 3. The overall scenario of our system.

Figure 4. Agent-based architecture.

Figure 5. Dummy location selection in the CaOaDSL algorithm. The real user’s location falls within a cell that has a query probability of (0.0213). To prevent attackers from identifying the real location, the algorithm selects (13) candidate cells, marked by √, with the same query probability as the real one. These candidate cells serve as dummy locations, ensuring strong privacy protection by making it difficult to distinguish the real location among them. However, not all candidate dummies optimize cache utility. From this set, the algorithm ultimately selects the dummies that provide the greatest cache contribution.

Figure 6. Dummy location selection according to the normalized distance term.

Figure 7. Solution to the normalized distance problem.

Figure 8. Measurement of the cloaking region.

Figure 9. Migration of the M-A-user agent.

Figure 10. Problem–solution matching flowchart of the CaOaDSL algorithm.

Figure 11. Sequence diagram for answering a query with the cache.

Figure 12. Sequence diagram for answering a query with the LBS server.

Figure 13. Communication cost vs. time progress ($k = 3$).

Figure 14. Communication cost vs. $k$.

Figure 15. Cache hit ratio vs. $k$, $t = 120$.

Figure 16. Entropy vs. $k$, $t = 120$.

Figure 17. $ACP$ value for 20 LBS users, $k = 6$, $t = 120$.

Figure 18. Network overhead vs. $k$, $φ = 0.7$, number of LBS users = 1000.

Figure 19. Network overhead vs. number of LBS users, $k = 12$, $φ = 0.7$.

Table 1. General form of an LBS query.

Symbol	<X, Y>	POI	R	ID
Description	The coordinates of the exact location of the LBS user.	The queried interest.	The queried range.	The identity of the LBS user.

Table 2. Properties of our approach (√ means satisfied property while × means opposite).

	Technical Problems		Privacy-Level Degree	Resistance Against Inference Attacks
Approach	Heterogeneous Platforms	Network Overhead	K-Anonymity Level	Homogeneity Attack	Semantic Location Attack
CirDummy [9]	×	×	Low	×	×
GridDummy [9]	×	×	Low	×	×
Dest-Ex [10]	×	×	Low	×	×
Ours	√	√	High	√	√

Table 3. Agents.

Agent Name	Type	Location
M-A-user	Mobile	Each mobile device
S-A-cache	Stationary	Access point

Table 4. Capabilities of the attacker (LBS server).

Cap. No.	Description
1	Can monitor the current queries that are sent by the LBS users and access/obtain all stored information.
2	Knows the CaOaDSL algorithm.

Table 5. Comparison of danger statuses of LBS users.

Settings: $k = 6$, $t = 120$, $thr = 0.8$.
Approach	Number of LBS Users that Exceeded the Threshold	Percentage of Encroachment
CaOaDSL	2	10%
Dest-Ex	12	60%
CirDummy	15	75%
GridDummy	20	100%

Table 6. Percentage of encroachment for the predefined thresholds.

				Percentage of Encroachment
Try No.	No. of LBS Users	$T$	$Thr$	CaOaDSL	Dest-Ex	CirDummy	GridDummy
1	40	130	0.75	12%	51%	64%	100%
2	60	140	0.7	14%	53%	80%	100%
3	80	150	0.65	21%	40%	68%	100%
4	100	160	0.6	18%	43%	56%	100%
5	120	170	0.55	13%	38%	54%	100%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
