A Combined Approach of Heat Map Confusion and Local Differential Privacy for the Anonymization of Mobility Data

Dürr, Christian; Gühring, Gabriele S.

doi:10.3390/app15148065

by Christian Dürr and Gabriele S. Gühring *

Faculty Computer Sciences and Engineering, Esslingen University of Applied Sciences, Kanalstr. 33, 73728 Esslingen, Germany

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(14), 8065; https://doi.org/10.3390/app15148065

Submission received: 9 June 2025 / Revised: 13 July 2025 / Accepted: 14 July 2025 / Published: 20 July 2025

(This article belongs to the Special Issue Recent Advances in Data Privacy, Transparency and Cybersecurity: Theory and Applications)

Abstract

Mobility data plays a crucial role in modern location-based services (LBSs), yet it poses significant privacy risks, as it can reveal highly sensitive information such as home locations and behavioral patterns. This paper focuses on the anonymization of mobility data by obfuscating mobility heat maps and combining this with a local differential privacy method, which generates synthetic mobility traces. Using the San Francisco Cabspotting dataset, we compare the effectiveness of the combined approach against reidentification attacks. Our results show that mobility traces treated with both a heat map obfuscation and local differential privacy are less likely to be reidentified than those anonymized solely with Heat Map Confusion. This two-tiered anonymization process balances the trade-off between privacy and data utility, providing a robust defense against reidentification while preserving data accuracy for practical applications. The findings suggest that the integration of synthetic trace generation with heat map-based obfuscation can significantly enhance the protection of mobility data, offering a stronger solution for privacy-preserving data sharing.

Keywords:

mobility data anonymization; Heat Map Confusion; Location Privacy Protection Mechanism; reidentification attacks; synthetic mobility traces

1. Introduction

With the rapid advancement of mobile technology, the collection of mobility data has become an integral part of urban planning, transportation systems, and various Location-Based Services (LBSs). This data, detailing users’ precise movements and locations, offers deep insights but simultaneously poses significant privacy risks [1,2]. Personal information, such as home addresses, workplaces, or even behavioral patterns, can be gathered from mobility data, raising substantial concerns about user privacy [3].

To address these concerns, Location Privacy Protection Mechanisms (LPPMs) have been developed, with the aim of anonymizing mobility data while preserving its utility for analysis. Standard LPPMs such as differential privacy [4] and k-anonymity focus on masking individual locations or clustering data points to obfuscate users’ identities [2]. However, these techniques often focus on only protecting specific mobility features like the removal of Points of Interest (POIs) to prevent POI-based attacks or the obfuscation of transition probabilities between locations in Markov chain-based mobility patterns, leaving users vulnerable to in-depth reidentification attacks that leverage comprehensive movement patterns [5].

Heat Map Confusion (HMC) [2,6], which was first suggested by Maouche et al. [2], is an LPPM that obfuscates mobility traces by transforming them into generalized heat maps, capturing both frequently visited locations and overall movement patterns. HMC protects against reidentification attacks by modifying user profiles to resemble, but not replicate, the profiles of other users. Using these altered heat maps, HMC ensures a balance between maintaining data utility and enhancing privacy protection. The integration of local differential privacy (LDP) [7,8,9] through LDPTrace, as defined in the trajectory synthesis framework [4,8], enhances the ability of HMC to protect against reidentification attacks.

Heat maps have been used for many years [10] and are currently used [11] to visualize geospatial data in the best possible way and improve the performance of algorithms. The use of heat maps to anonymize individual geospatial mobility data and at the same time maintain its usefulness has increased [1,2,12] with increasing regulatory requirements, e.g., the EU General Data Protection Regulation [13].

This paper compares the effectiveness of HMC alone and in combination with LDPTrace (HMC plus LDPTrace) against an Aggregate Privacy Attack (AP-Attack) scenario, as defined in Section 5. An AP-Attack tries to uncover personal information by analyzing patterns in combined mobility data, like heat maps.

While HMC focuses on obfuscating mobility patterns through heat map alterations, LDP adds a layer of differential privacy by introducing controlled randomness, ensuring that individual data points remain unidentifiable even in aggregated datasets. There are other local differential privacy methods, e.g., NGRAM, proposed in [14]. However, LDPTrace proves to be more efficient and faster (see [4]), which is critical when performing LDP on a user’s edge device.

Using the San Francisco Cabspotting dataset, which includes detailed taxi movement records, we evaluate both HMC and HMC plus LDPTrace, assessing their ability to protect user privacy without sacrificing the quality and utility of the data.

This paper provides new insights into how combined anonymization methods can address the increasing challenges of privacy protection in mobility data:

It utilizes a combined approach using HMC and LDP to address vulnerabilities in mobility data anonymization, a combination that has not been explored before.
It integrates LDP through LDPTrace to enhance HMC’s resistance against Aggregate Privacy Attacks (AP-Attacks), a novel improvement over using HMC alone.
The approach emphasizes preserving both individual privacy and the utility of aggregate mobility data, achieving a balance not previously demonstrated in comparable methods.

2. Heat Map Confusion

Heat Map Confusion (HMC) [1,2,6] is used to anonymize mobility traces. HMC is an LPPM designed to protect users’ mobility data from reidentification attacks using detailed heat map representations of movement patterns. Unlike standard LPPMs, which mainly focus on small-scale mobility data, such as single location points, HMC works with broader mobility features. It converts user data into heat maps that show frequently visited places and overall movement trends (see Figure 1), as described in [2].

Main Principles of HMC

Heat Map Creation: The first step in the HMC process is to turn a user’s mobility trace $T$ into a heat map $H$. A mobility trace $T$ of length $n$ is a sequence of records consisting of latitude, longitude, and timestamp: $T = (\langle {lat}_{1}, {long}_{1}, t_{1} \rangle, \langle {lat}_{2}, {long}_{2}, t_{2} \rangle, \langle {lat}_{3}, {long}_{3}, t_{3} \rangle, \dots, \langle {lat}_{n}, {long}_{n}, t_{n} \rangle)$. The heat map creation step involves dividing a specific geographical area into a grid of squares, where each cell represents a location. The intensity of each cell shows how often a specific user has visited that spot (see Figure 1). This approach makes raw GPS data more general, creating a wider view of movement patterns and hiding precise location details.
Heat Map Alteration: To anonymize the data, HMC changes each user’s heat map by comparing and merging it with the heat map of a similar user from a reference set $P$ . This step ensures that the final heat map does not show unique patterns of the original user. To calculate the similarity of two different heat maps, the Topsøe Divergence is used.

The Topsøe Divergence is a metric used to measure the dissimilarity between two heat maps $P$ and $Q$ [2,15]:

$$d_{Topsoe} (P, Q) = \sum_{i, j} \left( P_{i, j} \ln \left( \frac{2 P_{i, j}}{P_{i, j} + Q_{i, j}} \right) + Q_{i, j} \ln \left( \frac{2 Q_{i, j}}{P_{i, j} + Q_{i, j}} \right) \right)$$

(1)

Here, $P_{i, j}$ and $Q_{i, j}$ represent the probabilities at cell $(i, j)$ in the respective heat map. The metric combines two terms: the first measures how $P$ diverges from the average of $P$ and $Q$, while the second measures how $Q$ diverges from that same average. A value close to zero indicates high similarity between the distributions, whereas larger values mean greater dissimilarity. The Topsøe Divergence is a symmetrized version of the Kullback–Leibler Divergence and is therefore bounded: with the natural logarithm used in Equation (1), it ranges between $0$ and $2 \ln 2$ (equivalently, between $0$ and $2$ when the logarithm is taken to base 2); see Table 1, Table 2, Table 3 and Table 4.
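Equation (1) can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `topsoe_divergence` and the toy heat maps are assumptions, the inputs are assumed to be non-empty visit-count grids, and cells with probability zero are treated as contributing zero to their term.

```python
import numpy as np

def topsoe_divergence(p, q):
    """Topsøe Divergence between two heat maps, following Equation (1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise visit counts so that each heat map is a probability distribution.
    # ASSUMPTION: both heat maps contain at least one visit.
    p = p / p.sum()
    q = q / q.sum()
    m = p + q
    # A cell with probability 0 contributes 0 to its own term
    # (limit of x * ln x as x -> 0), so only positive cells are evaluated.
    term_p = np.zeros_like(p)
    term_q = np.zeros_like(q)
    mask_p = p > 0
    mask_q = q > 0
    term_p[mask_p] = p[mask_p] * np.log(2 * p[mask_p] / m[mask_p])
    term_q[mask_q] = q[mask_q] * np.log(2 * q[mask_q] / m[mask_q])
    return float(term_p.sum() + term_q.sum())

# Hypothetical 2 x 2 heat maps given as visit counts.
same = np.array([[3, 1], [0, 0]])
disjoint = np.array([[0, 0], [2, 2]])
print(topsoe_divergence(same, same))      # identical heat maps
print(topsoe_divergence(same, disjoint))  # heat maps without any common cell
```

Identical heat maps give a divergence of zero, and heat maps without a common visited cell give the maximum value, which is $2 \ln 2$ for the natural logarithm.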

The merging process works step by step and starts with an original heat map $H$ that needs to be obfuscated. For this process, two additional heat maps are selected from the entirety of known user profiles $R$: $U$, the heat map that is the most similar to $H$, determined by selecting the profile with the lowest Topsøe Divergence, and $V$, the heat map that has the highest utility, determined by the most overlapping visited cells, which is defined as Area Coverage [2].

Area Coverage (AC) is a metric used to compare two mobility traces, $T$ and $T^{'}$, by checking how similar the areas they cover are. This metric is important because it helps determine how well an anonymized mobility trace keeps the useful patterns of the original one. AC includes three components: precision, recall, and F-score.

The Area Coverage Precision $AC_{Precision} (T, T^{'})$ measures how much of the reconstructed trace $T^{'}$ overlaps with the original trace $T$.

$$AC_{Precision} (T, T^{'}) = \frac{|C (T) \cap C (T^{'})|}{|C (T^{'})|}$$

(2)

Here, $C (T)$ represents the set of all cells visited in $T$, and $C (T^{'})$ is the set of all cells visited in $T^{'}$.

The Area Coverage Recall $AC_{Recall} (T, T^{'})$ measures the proportion of cells in the original trace $T$ that are preserved in $T^{'}$.

$$AC_{Recall} (T, T^{'}) = \frac{|C (T) \cap C (T^{'})|}{|C (T)|}$$

(3)

Finally, the Area Coverage F-Score $AC (T, T^{'})$ combines precision and recall into a single metric. It is computed as the harmonic mean of precision and recall, as depicted in Equation (4):

$$AC (T, T^{'}) = \frac{2 \cdot AC_{Precision} (T, T^{'}) \cdot AC_{Recall} (T, T^{'})}{AC_{Precision} (T, T^{'}) + AC_{Recall} (T, T^{'})}$$

(4)

A high value of $AC (T, T^{'})$ indicates that the reconstructed mobility trace achieves both high precision and high recall, reflecting a balance between retaining relevant details from the original trace and minimizing unnecessary noise.
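Equations (2) to (4) can be illustrated with a short Python sketch. The function name `area_coverage`, the representation of visited cells as `(row, column)` tuples, and the example cell sets are assumptions made for illustration only.

```python
def area_coverage(cells_t, cells_t_prime):
    """Return (precision, recall, f_score) following Equations (2) to (4)."""
    original = set(cells_t)             # C(T)
    reconstructed = set(cells_t_prime)  # C(T')
    overlap = len(original & reconstructed)
    # Guard against empty traces, which the equations leave undefined.
    precision = overlap / len(reconstructed) if reconstructed else 0.0
    recall = overlap / len(original) if original else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical example: visited cells given as (row, column) grid indices.
c_t = {(0, 0), (0, 1), (1, 1), (2, 1)}
c_t_prime = {(0, 1), (1, 1), (3, 3)}
precision, recall, f_score = area_coverage(c_t, c_t_prime)
print(precision, recall, f_score)
```

In this example two of the three reconstructed cells also appear in the original trace (precision 2/3), and two of the four original cells are preserved (recall 1/2), so the F-score is 4/7.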

The process of modifying the heat map $H$ begins by identifying $U$, the most similar heat map according to the Topsøe Divergence, and $V$, the heat map with the best overlap in visited areas, i.e., the best Area Coverage F-Score [2]. An area is defined as a single cell $X_{i, j}$ of a given heat map $X$, where $X \in \{H, U, V\}$. $X_{i, j}$ is considered to be visited when there is at least one record $(x, y)$ in the mobility trace $T$ such that $(x, y) \in X_{i, j}$. The overlap between two heat maps consists of all cells $X_{i, j}$ that are visited in both heat maps.

If $H$ and $U$ belong to different users, $H$ is returned unchanged. Otherwise, $H$ is iteratively altered by blending it with $V$. In each iteration step, the values in overlapping cells between $H$ and $V$ are increased, while values in non-overlapping cells are reduced, creating an altered heat map $H^{'}$ [2]. The strength of this alteration is determined by an obfuscation factor $α$. This process continues until $H^{'}$ becomes more similar to $V$ than to $U$ or until the maximum number of iterations is reached. If no suitable solution is found, $V$ is used as the final heat map $H^{'}$ [2].

For example, consider a scenario where $H$ has high activity in cells $A$ and $B$, heat map $U$ has activity in cells $B$ and $C$, and heat map $V$ shows activity in cells $C$ and $D$. The algorithm will iteratively reduce activity in cell $A$ and increase it in cell $D$. At the same time, it keeps some overlap in cell $B$ to maintain general movement trends. The result is a new heat map $H^{'}$ that protects privacy but still keeps the overall movement patterns useful for analysis.
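The stopping criterion of the alteration loop can be sketched in Python for the four-cell example above. This is only an illustration under strong assumptions: the exact update rule of HMC is specified in [2] and is not reproduced here. Instead, the sketch swaps in a plain convex blend of the current heat map with $V$, controlled by a hypothetical parameter `alpha`. The names `topsoe`, `alter_heat_map`, and `max_iter` are likewise assumptions, and the check whether $H$ and $U$ belong to different users is omitted.

```python
import numpy as np

def topsoe(p, q):
    """Topsøe Divergence of two normalised heat maps (zero cells contribute 0)."""
    m = p + q
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(p > 0, p * np.log(2 * p / m), 0.0)
        term_q = np.where(q > 0, q * np.log(2 * q / m), 0.0)
    return float(term_p.sum() + term_q.sum())

def alter_heat_map(h, u, v, alpha=0.1, max_iter=100):
    """Blend h towards v until the result is closer to v than to u.

    ASSUMPTION: the convex blend h' <- (1 - alpha) * h' + alpha * v is a
    stand-in for the update rule of [2], which is not reproduced here.
    """
    h_alt = h.copy()
    for _ in range(max_iter):
        if topsoe(h_alt, v) < topsoe(h_alt, u):
            return h_alt
        h_alt = (1 - alpha) * h_alt + alpha * v
    return v.copy()  # fallback: use V if no suitable heat map was found

# Cells A, B, C, D of the example, expressed as probabilities.
H = np.array([0.5, 0.5, 0.0, 0.0])
U = np.array([0.0, 0.5, 0.5, 0.0])
V = np.array([0.0, 0.0, 0.5, 0.5])
H_alt = alter_heat_map(H, U, V)
print(H_alt.round(3))
```

With this stand-in rule, the activity in cell $A$ shrinks, the activity in cell $D$ grows, and cell $B$ keeps part of its mass, which mirrors the qualitative behavior described in the example.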

Mobility Trace Reconstruction: After the heat map $H$ is modified into the obfuscated heat map $H^{'}$, the next step reconstructs an anonymized mobility trace $T^{'}$ that corresponds to $H^{'}$. In [2], this process is described in detail.

Mobility Trace Reconstruction results in the reconstructed mobility trace $T^{'}$ following the general patterns of movement seen in the original trace $T$, such as frequent visits to a certain area and realistic travel routes [2]. However, the exact details of the original movements are altered to prevent reidentification.

The entire process of HMC is displayed in Figure 2.

HMC is especially effective in protecting against reidentification attacks, such as the AP-Attack, as defined in Section 5, which uses combined heat map data to identify users. By changing heat maps to hide specific user patterns, HMC reduces the uniqueness of mobility traces. Thus, it protects up to 87% of users against reidentification, with minimal loss of data utility [2].

3. Local Differential Privacy

To ensure robust privacy protection for mobility data, the concept of ε-differential privacy is frequently used. It guarantees that the inclusion or exclusion of any single data entry in a database does not significantly alter the results of queries, thereby protecting individual user information [9]. Formally (see Equation (5)), an algorithm $A$ satisfies ε-differential privacy if for all databases $D_{1}$ and $D_{2}$, differing by at most one element, and any possible outcome $S$ of $A$, the following condition holds:

$$P [A (D_{1}) \in S] \leq e^{ε} P [A (D_{2}) \in S]$$

(5)

In this formula, $ε$ represents the privacy budget, which controls the balance between privacy and data utility [9]. A smaller $ε$ value means stronger privacy guarantees, because more noise is added to the data, making it harder to infer individual user details. However, this also reduces the usefulness of the data for analytical purposes. Conversely, a larger $ε$ offers higher data accuracy but weaker privacy [9]. When applied to mobility data, this balance is particularly critical as even slight inaccuracies can disrupt transportation models or infrastructure planning. Thus, selecting an appropriate $ε$ value depends on the specific application and the sensitivity of the data [7,8,9].

In [16], Dwork et al. point out that there is no clear consensus on how to choose $ε$ in an $ε$-differential privacy framework. The authors in [16] therefore suggest the creation of a publicly available communal body of knowledge about differential privacy implementations. Here, we use a similar $ε$ as in [4] since our work builds on this approach.

Unlike traditional differential privacy that requires a trusted data curator, LDP ensures privacy by allowing users to perturb their own data before sharing it. This protects individual trajectory data by guaranteeing that even if data is intercepted or analyzed, the original patterns remain private.

In [4], LDPTrace, a concept for $ε$-local differential privacy for mobility data, is implemented. Like HMC, LDPTrace works on a grid of rectangular cells covering the examined area. In order to implement LDPTrace, the following steps are performed:

1. Feature Extraction: LDPTrace extracts three main features from individual user trajectories and obfuscates them with a privacy budget $ε$:

Intra-trajectory Transitions: The movement between consecutive cells in a grid of a trajectory, capturing local movement behavior.
Start and End Points: Virtual markers indicate where trajectories begin and end, which helps in preserving trajectory structures.
Trajectory Length: A probability distribution is determined by a central data curator using the obfuscated trajectory lengths of each trajectory.

2. Frequency Estimation: To estimate frequencies for trajectory synthesis, Optimized Unary Encoding (OUE) is used for each of the features extracted. OUE represents each feature as a binary vector $V$ whose length equals the maximal value of the feature, where $V [i] = 1$ at the index that equals the feature value and $V [i] = 0$ otherwise. OUE adds noise to binary data to protect privacy before combining the data [3]. As depicted in Equation (6), the probability of a perturbed vector $\hat{V}$ being 1 at index $i$ is defined as

$$P (\hat{V} [i] = 1) = \begin{cases} \frac{1}{2}, & \text{if } V [i] = 1 \\ \frac{1}{e^{ε} + 1}, & \text{if } V [i] = 0 \end{cases}$$

(6)

Here, $ε$ represents the privacy budget, with smaller values ensuring greater privacy at the cost of higher noise. This mechanism ensures that reconstructed trajectories retain key statistical properties, such as region transitions and visiting frequencies, while safeguarding individual user data.

3. Adaptive Synthesis Process: The framework builds a probabilistic model using these extracted features, allowing it to generate synthetic trajectories that mimic real movement patterns [17]. The synthesis process is adaptive, meaning it selects transitions and trajectory lengths based on learned distributions without needing exact user data, thus enhancing privacy.
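The frequency estimation step can be sketched in Python as follows. The perturbation follows Equation (6); the debiasing formula in `oue_estimate` is the standard unbiased estimator for unary encoding and is an assumption here, since it is not spelled out in the text. The function names, the seed, the zero-based indexing of feature values, and the toy distribution over four values are hypothetical.

```python
import numpy as np

def oue_perturb(value, domain_size, epsilon, rng):
    """One-hot encode `value` and perturb each bit following Equation (6)."""
    p = 0.5                          # P(report 1 | true bit is 1)
    q = 1.0 / (np.exp(epsilon) + 1)  # P(report 1 | true bit is 0)
    one_hot = np.zeros(domain_size, dtype=bool)
    one_hot[value] = True
    report_prob = np.where(one_hot, p, q)
    return rng.random(domain_size) < report_prob

def oue_estimate(reports, epsilon):
    """Debias the summed reports to estimate the frequency of each value.

    ASSUMPTION: standard estimator (count - n*q) / (n*(p - q)).
    """
    n = reports.shape[0]
    p = 0.5
    q = 1.0 / (np.exp(epsilon) + 1)
    counts = reports.sum(axis=0)
    return (counts - n * q) / (n * (p - q))

rng = np.random.default_rng(0)  # hypothetical seed
true_freq = [0.5, 0.3, 0.15, 0.05]
true_values = rng.choice(4, size=20000, p=true_freq)
# Each user perturbs the own value locally; only the noisy vectors are shared.
reports = np.array([oue_perturb(v, 4, 1.0, rng) for v in true_values])
print(oue_estimate(reports, epsilon=1.0).round(3))
```

Each individual report is noisy, yet the aggregated estimate approaches the true frequencies as the number of users grows, which is the property the synthesis step relies on.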

Trajectory synthesis under local differential privacy ensures that user mobility patterns can be anonymized without sacrificing their statistical utility. In [3], a Markov chain-based model is used to probabilistically synthesize trajectories. In [9], a graph-based model offers the possibility of using more flexible grids. LDPTrace models transitions between regions, as well as the distribution of start and end points, to replicate realistic mobility behavior. By utilizing these probabilistic distributions, LDPTrace preserves the overall structure of the mobility data while effectively anonymizing individual details.
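A Markov-style synthesis from aggregate distributions can be sketched as follows. This is a simplified illustration, not the LDPTrace algorithm of [4]: it assumes that a start distribution, a cell-to-cell transition matrix, and a length distribution have already been estimated, and it ignores the virtual end point. All names and the three-cell toy distributions are hypothetical.

```python
import numpy as np

def synthesize_trajectory(start_probs, transition_probs, length_probs, rng):
    """Sample one synthetic sequence of grid cells from aggregate distributions."""
    n_cells = len(start_probs)
    # length_probs[k] is the probability of a trajectory with k + 1 cells.
    length = rng.choice(len(length_probs), p=length_probs) + 1
    cell = rng.choice(n_cells, p=start_probs)
    trajectory = [int(cell)]
    while len(trajectory) < length:
        # The next cell depends only on the current cell (first-order Markov chain).
        cell = rng.choice(n_cells, p=transition_probs[cell])
        trajectory.append(int(cell))
    return trajectory

# Hypothetical estimated distributions for a grid with three cells.
start = np.array([0.6, 0.3, 0.1])
trans = np.array([
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
lengths = np.array([0.0, 0.2, 0.5, 0.3])  # P(length = 1, 2, 3, 4)

rng = np.random.default_rng(42)  # hypothetical seed
traj = synthesize_trajectory(start, trans, lengths, rng)
print(traj)
```

Because the sampler only sees aggregate distributions, a synthetic trajectory is not derived from any single original trajectory.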

LDPTrace offers several benefits over traditional trajectory anonymization methods:

Enhanced Privacy Protection: By employing local differential privacy, LDPTrace reduces the risk associated with data aggregation and central storage, ensuring that users’ real mobility data is not exposed [4].
Improved Utility: The framework maintains high data utility, as synthetic trajectories generated by LDPTrace closely match real-world movement patterns [4]. This is particularly beneficial for analyses that depend on aggregated mobility trends rather than specific individual behaviors.
Low Computational Cost: Unlike older methods that rely on intensive computations, such as linear programming or external data integration, LDPTrace simplifies the synthesis process, making it feasible for use on devices with limited resources [4].
Resistance to Attacks: The method is designed to withstand common location-based attacks, such as reidentification and outlier analysis. By generating trajectories that do not closely mirror any specific user data, LDPTrace effectively minimizes vulnerabilities [4].

In LDPTrace, most of the computations on individual trajectories are performed by the individual user’s edge device. The computational load on the aggregation center, where the trajectory synthesis is performed, is therefore reduced. The total computational complexity on each edge device for a trajectory $T$ is $O (|T| |C|)$, which consists of $O (|T|)$ for perturbing the length vector and start/end points and $O (|T| |C|)$ for perturbing the grid region transitions, where $|C|$ is the number of grid regions. Performing LDPTrace on an edge device with limited resources should therefore be possible. Further runtime experiments for LDPTrace are published in [4], showing that LDPTrace performs better than other approaches of LDP.

4. Combination of HMC and LDPTrace

HMC and LDPTrace focus on different aspects of privacy and data utility in mobility data. HMC is designed to protect large-scale mobility patterns by creating and modifying heat maps that represent user traces. It changes these heat maps to look similar to those of other users, which helps protect overall mobility trends and frequently visited areas; see Figure 2. This makes HMC effective against attacks that try to identify users based on their general movement behavior. However, it does not focus on protecting individual data points.

On the other hand, LDPTrace uses ε-differential privacy to anonymize data at a more detailed level. It adds noise to individual data features and uses probabilistic models to preserve patterns like region transitions and trajectory statistics. This makes LDPTrace better for applications that need detailed data, as it balances privacy and accuracy for each point. While HMC works well for protecting general mobility trends, LDPTrace provides stronger protection for specific data points.

While LDPTrace protects the individual trajectories of a person, the trajectories are only useful in an aggregated form, e.g., in the form of heat maps. Because the central data curator generates synthesized trajectories using aggregated probability distributions of individual features of a trajectory, there is no direct link between an original trajectory and a synthesized trajectory. However, an aggregated heat map can also change during the execution of LDPTrace, as Figure 3 shows. Here, LDPTrace was applied to all Cabspotting data from San Francisco [18]. We measure these central changes in the heat map with the Topsøe Divergence; see Section 2.

In this paper, we combine HMC with LDP and, in particular, with LDPTrace. We thus utilize the strength of each method. This is performed according to the following sequential approach:

Synthetic Trajectory Generation with LDPTrace: First, LDPTrace creates synthetic mobility traces based on real mobility data from the San Francisco taxi dataset. These synthetic trajectories simulate movement patterns without directly revealing actual user locations. By introducing randomness in location data, LDPTrace makes it difficult to link specific movements to real individuals while maintaining patterns that resemble real-world data.
Heat Map Confusion on Synthetic Data: Once the synthetic trajectories are generated, HMC further anonymizes the data. HMC creates heat maps from these synthetic traces and then alters them by merging similar patterns. This step reduces the likelihood of reidentification based on movement patterns, as individual traces are grouped into less specific patterns. By transforming synthetic data into altered heat maps, this combined approach provides additional protection for individual locations and routes.

However, while the computation of a heat map of a synthesized trajectory preserves ε-differential privacy because of the post-processing immunity (see [19]), the heat map alteration algorithm according to [2] does not. This is why, in the following sections, we assess privacy through reidentification attacks instead of a privacy budget, and utility through a utility metric, i.e., the Topsøe Divergence.

5. Aggregate Privacy Attack

An AP-Attack (Aggregate Privacy Attack) [2,6] is a reidentification method that aims to identify users by analyzing their movement patterns as shown in aggregated heat maps. This attack uses frequent locations and movement trends from aggregated heat maps to match individual known mobility profiles [6]. By examining how often a user visits different areas, the AP-Attack compares anonymized data to known patterns. This approach is especially effective on datasets where general movement trends are still visible, as it focuses on commonly visited areas rather than specific locations [2,6].

Figure 4 shows the process of reidentifying users based on their mobility data by carrying out an AP-Attack. The process is divided into two main parts: the training phase and the reidentification phase.

In the training phase [6], mobility data from a set of known users $U$ is collected. This data contains their movement traces $KD$, which show where and how these users moved over time. For each user $U_{i}$, a mobility profile $p (KD_{i})$ is created. Each profile abstracts the individual movement patterns of the mobility traces into a heat map.

In the reidentification phase [6], a separate set of anonymized data $UD$ is used. This anonymized data contains movement traces $UD_{j}$ from users whose identities are unknown. For each trace, a profile $p (UD_{j})$ is created in the same way as in the training phase. These anonymized profiles are then compared to the known user profiles $p (KD_{i})$ using a distance metric. This metric calculates how similar the two profiles are. In the context of this paper, the Topsøe Divergence is used as a distance metric. The anonymized profile $p (UD_{j})$ is then matched to the known profile $p (KD_{i})$ that is most similar.
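The matching step of the reidentification phase can be sketched in Python as a nearest-profile search. The function names, the flattened four-cell profiles, and the ground-truth labels are hypothetical and serve only to illustrate the procedure described above.

```python
import numpy as np

def topsoe(p, q):
    """Topsøe Divergence of two normalised heat maps (zero cells contribute 0)."""
    m = p + q
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(p > 0, p * np.log(2 * p / m), 0.0)
        term_q = np.where(q > 0, q * np.log(2 * q / m), 0.0)
    return float(term_p.sum() + term_q.sum())

def ap_attack(known_profiles, anonymous_profiles):
    """For each anonymous profile, return the index of the closest known profile."""
    matches = []
    for anon in anonymous_profiles:
        distances = [topsoe(anon, known) for known in known_profiles]
        matches.append(int(np.argmin(distances)))
    return matches

def reidentification_rate(matches, true_ids):
    """Share of anonymous profiles that were matched to their true user."""
    hits = sum(m == t for m, t in zip(matches, true_ids))
    return hits / len(true_ids)

# Hypothetical training profiles p(KD_i) for three known users.
known = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
# Hypothetical anonymized profiles p(UD_j) with their hidden true users.
anonymous = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.3],
    [0.1, 0.2, 0.3, 0.4],
])
true_ids = [0, 2, 1]

matches = ap_attack(known, anonymous)
rate = reidentification_rate(matches, true_ids)
print(matches, rate)
```

In this toy setting the third profile is matched to the wrong user, so it counts as not reidentified; the reidentification rate is the share of correct matches.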

6. Experimental Evaluation

The Heat Map Confusion (HMC) method and the evaluation of the reidentification rate using AP-Attack are implemented using the Python 3.12.3 programming language with Visual Studio Code version 1.84. The following Python libraries are used to conduct the experiments:

NumPy (version 1.26.4): This library is used for numerical operations and handling large datasets, especially for creating and modifying the heat maps.

Pandas (version 2.2.2): Pandas is used for data manipulation, especially for processing the taxi mobility traces and managing time-stamped data.

Matplotlib (version 3.8.4): This library is used to create and visualize heat maps and to plot the results of the experiments.

Scikit-learn (version 1.5.0): This is used for clustering algorithms like KMeans to preprocess the mobility data during the Heat Map Alteration step.

6.1. Dataset

The publicly available San Francisco Cabspotting dataset is used in this study. It includes the GPS traces of 536 taxis collected over a period of 23 days. Each trace includes the latitude, longitude, timestamp, and occupancy status of a taxi. The dataset is commonly used and therefore serves as a benchmark for several anonymization algorithms; see [1,2,4,6,8]. The dataset is processed to convert the raw GPS data into heat maps. All data used in this study is publicly available for replication [18].

To simulate the training and reidentification phases effectively, it is necessary to partition the mobility datasets. To ensure variability and avoid bias, the trajectories of each taxi are divided randomly into a 50% training dataset and a 50% reidentification dataset.
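A seeded random 50/50 split of one taxi's trajectories can be sketched as follows. The function name `split_trajectories`, the placeholder trajectory labels, and the use of NumPy's `default_rng` generator are assumptions made for illustration; the exact seeding calls used in the experiments are not shown in the text.

```python
import numpy as np

def split_trajectories(trajectories, seed):
    """Randomly split a list of trajectories into two halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(trajectories))
    half = len(trajectories) // 2
    training = [trajectories[i] for i in order[:half]]
    reidentification = [trajectories[i] for i in order[half:]]
    return training, reidentification

# Hypothetical placeholder trajectories of a single taxi.
trips = [f"trip_{i}" for i in range(10)]
train, reident = split_trajectories(trips, seed=1)
print(len(train), len(reident))
```

Fixing the seed makes the partition reproducible, so repeated trials with the same seed operate on the same training and reidentification sets.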

As in [2], we divide the metro area of San Francisco into an 8 × 8 grid of squares, each cell roughly 800 m in height and length. Each cell in the heat map shows how often a single taxi visits a specific location; see Figure 1.

The area is bounded by latitudes 37.60 to 37.85 and longitudes −122.50 to −122.35.
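The conversion of GPS records into a heat map over this bounding box can be sketched with NumPy's `histogram2d`. The function name `build_heat_map`, the normalisation to probabilities, and the sample coordinates are assumptions for illustration; records outside the bounding box are simply dropped here.

```python
import numpy as np

# Bounding box and grid size as stated in the text.
LAT_MIN, LAT_MAX = 37.60, 37.85
LON_MIN, LON_MAX = -122.50, -122.35
GRID = 8

def build_heat_map(lats, lons):
    """Count visits per grid cell and normalise the counts to probabilities."""
    counts, _, _ = np.histogram2d(
        lats,
        lons,
        bins=GRID,
        range=[[LAT_MIN, LAT_MAX], [LON_MIN, LON_MAX]],
    )
    total = counts.sum()
    return counts / total if total > 0 else counts

# Hypothetical records; the last one lies outside the area and is ignored.
lats = [37.61, 37.61, 37.84, 37.90]
lons = [-122.49, -122.49, -122.36, -122.40]
heat_map = build_heat_map(lats, lons)
print(heat_map.shape, heat_map[0, 0], heat_map[7, 7])
```

Row 0 of the resulting array corresponds to the southernmost latitude band and column 0 to the westernmost longitude band.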

6.2. Evaluation with Aggregate Privacy Attack

To test how well HMC and LDPTrace protect data, we apply an AP-Attack in two steps:

AP-Attack on unaltered Heat Maps: The AP-Attack is first applied to the original heat maps, which have not been anonymized with HMC. This provides a baseline reidentification rate, showing how easily a taxi can be reidentified without any further anonymization.
AP-Attack on obfuscated Heat Maps: The AP-Attack is applied to the heat maps after the HMC anonymization process. Comparing the reidentification rates from the original and altered maps shows whether the HMC process is effective. A lower reidentification rate indicates successful anonymization.

In both cases, tests are conducted with synthesized trajectories (created with LDPTrace, based on the San Francisco Cabspotting dataset) and non-synthesized trajectories (the original trajectories from the San Francisco Cabspotting dataset). This allows us to compare how well the different approaches perform under varying circumstances.

The results of the AP-Attacks performed on the San Francisco Cabspotting dataset are presented in Table 1 and Table 2 for the non-obfuscated baseline data and the data processed with HMC. In contrast, Table 3 and Table 4 display the outcomes when using synthesized trajectories generated by LDPTrace. While Table 1, Table 2, Table 3 and Table 4 summarize the average values and standard deviations across all trials, Table A1, Table A2, Table A3 and Table A4 in Appendix A provide a detailed breakdown of nine individual trials.

Table 1. Non-obfuscated normal dataset—overview.

Metrics	Mean	Standard Deviation
Reidentified Total	103	32.784
Non-Reidentified Total	433	32.784
Reidentified (%)	19.20	6.116
Non-Reidentified (%)	80.80	6.116
Average Distance (Topsøe Divergence)	0.034	0.014

Table 2. HMC-obfuscated normal dataset—overview.

Metrics	Mean	Standard Deviation
Reidentified Total	9	3.035
Non-Reidentified Total	527	3.035
Reidentified (%)	1.76	0.659
Non-Reidentified (%)	98.24	0.659
Average Distance (Topsøe Divergence)	0.070	0.019

Table 3. HMC-obfuscated synthetic dataset—overview.

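Metrics	ε	Mean	Standard Deviation
Reidentified Total	1	1	0.5
Reidentified Total	1.5	1.78	0.44
Reidentified Total	2	0.78	0.44
Non-Reidentified Total	1	535	0.5
Non-Reidentified Total	1.5	534.22	0.44
Non-Reidentified Total	2	535.22	0.44
Reidentified (%)	1	0.19	0.09
Reidentified (%)	1.5	0.33	0.08
Reidentified (%)	2	0.15	0.08
Non-Reidentified (%)	1	99.81	0.09
Non-Reidentified (%)	1.5	99.67	0.08
Non-Reidentified (%)	2	99.85	0.08
Average Distance (Topsøe Divergence)	1	0.162	0.016
Average Distance (Topsøe Divergence)	1.5	0.144	0.015
Average Distance (Topsøe Divergence)	2	0.139	0.016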

Table 4. Non-obfuscated synthetic dataset—overview.

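Metrics	ε	Mean	Standard Deviation
Reidentified Total	1	1.67	0.71
Reidentified Total	1.5	2.56	0.73
Reidentified Total	2	1.33	0.5
Non-Reidentified Total	1	534.33	0.71
Non-Reidentified Total	1.5	533.33	0.73
Non-Reidentified Total	2	534.67	0.5
Reidentified (%)	1	0.31	0.13
Reidentified (%)	1.5	0.48	0.14
Reidentified (%)	2	0.24	0.1
Non-Reidentified (%)	1	99.69	0.13
Non-Reidentified (%)	1.5	99.52	0.14
Non-Reidentified (%)	2	99.75	0.1
Average Distance (Topsøe Divergence)	1	0.161	0.016
Average Distance (Topsøe Divergence)	1.5	0.142	0.016
Average Distance (Topsøe Divergence)	2	0.138	0.017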

To maintain consistency, randomness is controlled by setting nine pre-defined NumPy np.random seeds, which ensure that the same random trajectories are used across trials. Each scenario is conducted over nine separate trials, ensuring consistent and reliable results by reducing the influence of random variations in the data partitioning process.

The following terms explain the metrics used in the results tables:

(Non-)Reidentified Total: This refers to the total number of (un-)successful reidentifications.
(Non-)Reidentified (%): This refers to the percentage of (un-)successful reidentifications.
Seed: NumPy seed set with np.random. This influences how the datasets are partitioned into train and test datasets.
Average Distance: The average minimum distance determined by calculating the Topsøe Divergence (see Equation (1)) over all 536 datasets.

Table 1 provides an overview of the non-obfuscated dataset, summarizing key metrics across all trials. It includes the average reidentification rate (19.2%) and its standard deviation of 6.12, along with the corresponding non-reidentification rate (80.8%). The mean Topsøe Divergence is 0.034, and its standard deviation is 0.014. These values function as the baseline for comparing the effectiveness of the obfuscation process.

The detailed results as shown in Table A1 exhibit a Reidentified (%) rate ranging from 8.40% to 28.36%. This highlights the significant risk associated with sharing raw mobility data, as nearly one in five mobility traces can be successfully reidentified without any privacy protection measures. The Average Distance measured as a Topsøe Divergence shows some variance with values ranging from 0.02 to 0.07.

In contrast, the HMC-obfuscated dataset, as shown in Table 2 and Table A2, achieves a substantial reduction in reidentification rates. Compared to Table 1, the average reidentification rate decreases from 19.2% to 1.76%, demonstrating the effectiveness of HMC in enhancing privacy protection. The corresponding non-reidentification rate increases from 80.8% in the non-obfuscated dataset to 98.24% in the HMC-obfuscated dataset.

The Reidentified (%) rate in HMC (Table A2) ranges from 0.56% to 2.5%, with an average far lower than in the non-obfuscated dataset. The lowest reidentification rate appears in Trial 4 (0.56%), while the highest (2.5%) occurs in Trial 1 and Trial 6. The Average Distance (Topsøe Divergence) values for the HMC dataset range from 0.05 to 0.10, slightly above the range of 0.02 to 0.07 observed for the non-obfuscated dataset. This suggests that HMC alone effectively increases anonymity and reduces the likelihood of reidentification.

The HMC-obfuscated synthetic dataset, shown in Table 3 and Table A3, achieves an even greater reduction in reidentification rates. Compared to Table 1, the average reidentification rate drops from 19.2% in the non-obfuscated dataset to between 0.15% and 0.33%, depending on the ε used. The non-reidentification rate increases correspondingly from 80.8% to between 99.67% and 99.85%, indicating that the combination of HMC and LDPTrace provides strong privacy protection.

The Reidentified (%) rates in these trials range from 0% in multiple trials (e.g., Trials 2, 3, and 6) to a maximum of 0.37% in several trials, depending on the ε value used.

The parameter ε is called the privacy budget; it controls the trade-off between privacy and data utility. A smaller ε means stronger privacy protection. However, a lower ε may also result in a higher value of the utility metric, i.e., a larger Topsøe Divergence and therefore lower data utility. This is because a lower ε introduces more random noise to the data, making individual traces harder to distinguish but also increasing the overall distortion of mobility patterns [4]. Conversely, larger ε values allow for better data utility but may increase the risk of reidentification.
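The effect of ε on the amount of noise can be made concrete with a generic ε-LDP mechanism. The sketch below uses k-ary (generalized) randomized response, a textbook mechanism chosen here for illustration and not necessarily the one used inside LDPTrace [4]. The domain size of 36 cells is an assumption taken from the 6 × 6 grid of Figure 3, and the function names are hypothetical.

```python
import math
import random

def truth_probability(epsilon, k):
    """Probability that k-ary randomized response reports the true cell."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)

def randomized_response(true_cell, epsilon, k, rng):
    """Report true_cell with probability p, otherwise a uniformly chosen other cell."""
    if rng.random() < truth_probability(epsilon, k):
        return true_cell
    other = rng.randrange(k - 1)      # pick among the k - 1 remaining cells
    return other if other < true_cell else other + 1

K = 36                                # assumed: cells of a 6 x 6 grid
for eps in (1, 1.5, 2):               # privacy budgets used in the tables
    print(f"epsilon = {eps}: P(report true cell) = {truth_probability(eps, K):.3f}")

rng = random.Random(0)
reports = [randomized_response(5, 1, K, rng) for _ in range(1000)]
```

Under these assumptions, the probability of reporting the true cell grows with ε (roughly 0.07, 0.11, and 0.17 for ε = 1, 1.5, and 2), which is the sense in which a smaller privacy budget means more noise and stronger protection.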

In this case, the combined approach results in higher Topsøe Divergence values, particularly for ε = 1, where the average distance reaches 0.16 with a standard deviation of 0.016. This demonstrates a strong level of data protection, significantly enhancing privacy. In comparison, higher values of ε reduce the Topsøe Divergence.

In comparison, the reidentification rate and average distance of mobility traces obfuscated only with LDPTrace (Table 4) are 0.31% and 0.16 for ε = 1, respectively. The reidentification rate is thus far lower than the 19.20% observed in the non-obfuscated dataset (Table 1), while the average distance is considerably higher than the baseline of 0.034. However, when both HMC and LDPTrace are applied (Table 3), for ε = 1, the reidentification rate drops further to 0.19%, and the average distance increases to 0.162, highlighting the enhanced privacy protection achieved by combining both anonymization techniques.

7. Conclusions

The results indicate that, in general, the higher the level of privacy, the greater the average distance measured with the Topsøe Divergence. However, the comparison between using HMC alone and combining it with LDPTrace gives several important insights:

HMC alone is effective at lowering Reidentified (%) rates, with average distances from 0.05 to 0.10, as shown in Table A2. This indicates that HMC can offer privacy protection while keeping data reasonably useful.
HMC combined with LDPTrace provides stronger privacy, as shown by the lower Reidentified (%) rates between 0% and 0.37% in Table A3 and larger average distances up to 0.20. This demonstrates that the additional application of HMC further strengthens anonymization.
While LDPTrace alone increases the Topsøe Divergence nearly fivefold (from 0.034 to 0.16 for ε = 1), reflecting its strong anonymization effect, adding HMC does not significantly increase this distance further. Instead, it provides an additional drop in reidentification rates (from 0.31% to 0.19% for ε = 1), implying that HMC complements LDPTrace effectively by improving anonymization without further distorting the data.

Overall, we show that the synthesized trajectories created with the LDPTrace algorithm can be further anonymized and made more resistant to reidentification attacks by adding HMC anonymization.

To enhance the balance between privacy and data utility, future research should focus on the following:

Application-specific studies: Analyzing how different data modification levels impact specific applications or use cases, e.g., environmental pollution or traffic simulations.
Including other utility metrics: Creating or utilizing metrics beyond average distance, such as time-based consistency and the accuracy of Points of Interest (POIs), can offer a clearer view of data usability.
Experiments with other attacks: To strengthen the robustness of privacy protection methods and ensure a comprehensive evaluation, future work should test additional attack scenarios and analyze their impact on reidentification rates and data utility. This could include attacks based on outlier detection, POI-based attacks, and PIT (Point-in-Time)-based attacks; see [2].

Achieving strong privacy protection while maintaining data utility is crucial for practical applications. The fact that HMC enhances LDPTrace’s anonymization without introducing additional distortion makes this combined approach especially promising.

Author Contributions

Conceptualization and writing—review and editing, G.S.G.; software and writing—original draft preparation, C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Federal Ministry of Research, Technology and Space (BMFTR), grant number 16KISA046K.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in CRAWDAD Epfl/Mobility; IEEE Dataport, https://doi.org/10.15783/C7J010.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Non-obfuscated normal dataset—complete.

Metrics	Trial 1	Trial 2	Trial 3	Trial 4	Trial 5	Trial 6	Trial 7	Trial 8	Trial 9
# Taxi ID	536	536	536	536	536	536	536	536	536
Reidentified Total	152	82	141	45	140	83	90	96	97
Non-Reidentified Total	384	454	395	491	396	453	446	440	439
Reidentified (%)	28.36	15.30	26.31	8.40	26.12	15.49	16.79	17.91	18.10
Non-Reidentified (%)	71.64	84.70	73.69	91.60	73.88	84.51	83.21	82.09	81.90
Seed	845	286	742	301	87	123	581	445	4
⌀ Topsøe Divergence	0.02	0.03	0.03	0.07	0.03	0.03	0.04	0.03	0.03

Table A2. HMC-obfuscated normal dataset—complete.

Metrics	Trial 1	Trial 2	Trial 3	Trial 4	Trial 5	Trial 6	Trial 7	Trial 8	Trial 9
# Taxi ID	536	536	536	536	536	536	536	536	536
Reidentified Total	11	12	10	3	9	11	7	12	5
Non-Reidentified Total	525	524	526	533	527	525	529	524	531
Reidentified (%)	2.50	2.24	1.87	0.56	1.68	2.50	1.31	2.24	0.93
Non-Reidentified (%)	97.50	97.76	98.13	99.44	98.32	97.50	98.69	97.76	99.07
Seed	845	286	742	301	87	123	581	445	4
⌀ Topsøe Divergence	0.08	0.05	0.10	0.09	0.08	0.06	0.07	0.05	0.05

Table A3. HMC-obfuscated synthetic dataset—complete.

Metrics	ε	Trial 1	Trial 2	Trial 3	Trial 4	Trial 5	Trial 6	Trial 7	Trial 8	Trial 9
# Taxi ID		536	536	536	536	536	536	536	536	536
Reidentified Total	1	1	1	1	2	1	0	1	1	1
	1.5	2	2	1	1	2	2	2	2	2
	2	1	0	0	1	1	1	1	1	1
Non-Reidentified Total	1	535	535	535	534	535	536	535	535	535
	1.5	534	534	535	535	534	534	534	534	534
	2	535	536	536	535	535	535	535	535	535
Reidentified (%)	1	0.19	0.19	0.19	0.37	0.19	0.00	0.19	0.19	0.19
	1.5	0.37	0.37	0.19	0.19	0.37	0.37	0.37	0.37	0.37
	2	0.19	0.00	0.00	0.19	0.19	0.19	0.19	0.19	0.19
Non-Reidentified (%)	1	99.81	99.81	99.81	99.63	99.81	100.00	99.81	99.81	99.81
	1.5	99.63	99.63	99.81	99.81	99.63	99.63	99.63	99.63	99.63
	2	99.81	100.00	100.00	99.81	99.81	99.81	99.81	99.81	99.81
Seed		845	286	742	301	87	123	581	445	4
⌀ Topsøe Divergence	1	0.15	0.15	0.16	0.20	0.16	0.15	0.17	0.16	0.16
	1.5	0.13	0.14	0.15	0.18	0.14	0.13	0.15	0.14	0.14
	2	0.13	0.13	0.14	0.18	0.13	0.13	0.14	0.13	0.14

Table A4. Non-obfuscated synthetic dataset—complete.

Metrics	ε	Trial 1	Trial 2	Trial 3	Trial 4	Trial 5	Trial 6	Trial 7	Trial 8	Trial 9
# Taxi ID		536	536	536	536	536	536	536	536	536
Reidentified Total	1	2	2	1	2	1	2	1	3	1
	1.5	2	3	2	2	2	3	3	4	2
	2	1	1	2	1	2	2	1	1	1
Non-Reidentified Total	1	534	534	535	534	535	534	535	533	535
	1.5	534	533	534	534	534	533	533	532	534
	2	535	535	534	535	534	534	535	535	535
Reidentified (%)	1	0.37	0.37	0.19	0.37	0.19	0.37	0.19	0.56	0.19
	1.5	0.37	0.56	0.37	0.37	0.37	0.56	0.56	0.75	0.37
	2	0.19	0.19	0.37	0.19	0.37	0.37	0.19	0.19	0.19
Non-Reidentified (%)	1	99.63	99.63	99.81	99.63	99.81	99.63	99.81	99.44	99.81
	1.5	99.63	99.44	99.63	99.63	99.63	99.44	99.44	99.25	99.63
	2	99.81	99.81	99.63	99.81	99.63	99.63	99.81	99.81	99.81
Seed		845	286	742	301	87	123	581	445	4
⌀ Topsøe Divergence	1	0.15	0.15	0.16	0.20	0.16	0.15	0.17	0.15	0.16
	1.5	0.13	0.13	0.14	0.18	0.14	0.13	0.15	0.14	0.14
	2	0.12	0.13	0.14	0.18	0.13	0.13	0.14	0.13	0.14

References

1. Khalfoun, B.; Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. MooD: MObility Data Privacy as Orphan Disease: Experimentation and Deployment Paper. In Proceedings of the 20th International Middleware Conference, Davis, CA, USA, 9–13 December 2019.
2. Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. HMC: Robust Privacy Protection of Mobility Data against Multiple Re-Identification Attacks. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 1–25.
3. Gatzert, N.; Knorre, S.; Müller-Peters, H.; Wagner, F.; Jost, T. Big Data in der Mobilität: Akteure, Geschäftsmodelle und Nutzenpotenziale für die Welt von Morgen; Springer Gabler: Wiesbaden, Germany, 2023.
4. Du, Y.; Hu, Y.; Zhang, Z.; Fang, Z.; Chen, L.; Zheng, B.; Gao, Y. LDPTrace: Locally Differentially Private Trajectory Synthesis. Proc. VLDB Endow. 2023, 16, 1897–1909.
5. Xu, F.; Tu, Z.; Li, Y.; Zhang, P.; Fu, X.; Jin, D. Trajectory Recovery from Ash: User Privacy is NOT Preserved in Aggregated Mobility Data. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017.
6. Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. AP-Attack: A Novel User Re-identification Attack on Mobility Datasets. In Proceedings of the 14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous 2017), Melbourne, Australia, 7–10 November 2017.
7. Buchholz, E.; Abuadbba, A.; Wang, S.; Nepal, S.; Kanhere, S.S. SoK: Can Trajectory Generation Combine Privacy and Utility? Proc. Priv. Enhancing Technol. Symp. 2024, 3, 75–93.
8. Primault, V.; Ben Mokhtar, S.; Lauradoux, C.; Brunie, L. Differentially Private Location Privacy in Practice. arXiv 2014.
9. Walter, P.; Efremidis, A.; Gühring, G. Anonymization of Mobility Data and its Meta Information using Local Differential Privacy in Combination with Bidirectional Graphs. 2025; preprint.
10. Netek, R.; Pour, T.; Slezakova, R. Implementation of Heat Maps in Geographical Information System – Exploratory Study on Traffic Accident Data. Open Geosci. 2018, 10, 367–384.
11. Jiang, D.; Wang, H.; Li, T.; Gouda, M.A.; Zhou, B. Real-time tracker of chicken for poultry based on attention mechanism-enhanced YOLO-Chicken algorithm. Comput. Electron. Agric. 2025, 237, 110640.
12. Kapp, A.; Nuñez von Voigt, S.; Mihaljević, H.; Tschorsch, F. Towards mobility reports with user-level privacy. J. Locat. Based Serv. 2023, 17, 95–121.
13. European Union. General Data Protection Regulation (GDPR). 2018. Available online: https://gdpr-info.eu/ (accessed on 9 June 2025).
14. Cunningham, T.; Cormode, G.; Ferhatosmanoglu, H.; Srivastava, D. Real-world trajectory sharing with local differential privacy. Proc. VLDB Endow. 2021, 14, 2283–2295.
15. Topsøe, F. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inf. Theory 2000, 46, 1602–1609.
16. Dwork, C.; Kohli, N.; Mulligan, D. Differential Privacy in Practice: Expose your Epsilons! J. Priv. Confidentiality 2019, 2, 1–22.
17. Abul, O.; Bonchi, F.; Nanni, M. Anonymization of Moving Objects Databases by Clustering and Perturbation. Inf. Syst. 2010, 35, 884–910.
18. Piorkowski, M.; Sarafijanovic-Djukic, N.; Grossglauser, M. CRAWDAD Epfl/Mobility; IEEE Dataport: Piscataway, NJ, USA, 2022.
19. Ghazi, B.; He, J.; Kohlhoff, K.; Kumar, R.; Manurangsi, P.; Navalpakkam, V.; Valliappan, N. Differentially Private Heatmaps. Proc. AAAI Conf. Artif. Intell. 2023, 37, 7696–7704.

Figure 1. Single trajectory of the San Francisco Cabspotting data transformed into a heat map with rectangular grids, where the colors stand for the number of data points in a rectangle.

Figure 2. Heat Map Confusion—overview [2].

Figure 3. Visualization of the San Francisco Cabspotting dataset (left), together with a $6 \times 6$ heat map of the original dataset (middle) and a corresponding heat map of the synthesized trajectories (right).

Figure 4. AP-Attack overview, where the data set of users $U$ is divided into known user data $KD$ and unknown user data $UD$ [6].

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
