MAC Address Anonymization for Crowd Counting

Research has shown that counting WiFi packets called probe requests (PRs) implicitly provides a proxy for the number of people in an area. In this paper, we discuss a crowd counting system in which WiFi sensors detect PRs over the air, then extract and anonymize their media access control (MAC) addresses using a hash-based approach. We describe the anonymization procedure and show that time-synchronization inaccuracies among sensors and hash collision rates are both low enough to prevent anonymization from interfering with counting algorithms. In particular, we derive an approximation of the collision rate of uniformly distributed identifiers, with analytical error bounds.


Introduction
Event organizers routinely deal with crowd monitoring and management [1]. Recently, several teams proposed crowd counting systems using WiFi signals [2][3][4]. These works describe counting systems that detect special control packets of the WiFi protocol: probe requests (PRs). Such packets are periodically transmitted by WiFi user terminals to detect nearby access points. A PR-based counting system therefore needs neither user cooperation nor an active WiFi connection between terminals and access points within range.
Typically, several WiFi sensors are deployed over the monitored area to detect PRs, and then extract and anonymize their media access control (MAC) addresses. Sensors finally timestamp anonymized PRs and transmit them to a central server, which processes them jointly. The number of distinct PRs acquired during a time frame of T seconds (with T = 60 s in this paper) implicitly provides a rate of PR transmission, which is proportional (on average) to the number of attendees (as shown experimentally in [3] and theoretically in [5]). The proportionality between what is measured (the rate of PR transmission) and what interests event organizers (the number of attendees) is referred to as the extrapolation factor in our previous works and is determined experimentally.
Our system is less accurate only when occupancy varies significantly within 60 seconds, because it averages probe requests over one-minute time frames, thereby smoothing any occupancy change occurring at that scale. However, both indoor and outdoor measurements in [3,5] indicate that such rapid variations are uncommon for events or buildings hosting at least a few hundred individuals, probably because attendees enter and leave monitored areas at different times and because entry and exit points have limited flow capacity.

Figure 1: Scheme of the PR sensing procedure. Three WiFi sensors with overlapping ranges detect WiFi probe requests emitted by the smartphones of individuals. The shaded ellipses and the associated cones depict sensor detection ranges. Each sensor uses HTTPS links to periodically retrieve server peppers from the central server and uses another HTTPS link to upload anonymized PRs. Time synchronization is achieved by calibration with NTP servers. Communication links are depicted for only one sensor, to avoid clutter.
1.1 A short description of the monitoring system architecture
Figure 1 depicts the experimentally validated counting scheme in [3,4]. Sensors (three in Figure 1) monitor an area and, because their effective detection range is not known precisely (it depends on the propagation environment and decreases as the density of people increases because of body-induced attenuation), they are usually installed densely enough to make detection ranges overlap. Data transfers between sensors and the central server are secured using hypertext transfer protocol secure (HTTPS) connections (with transport layer security (TLS)) so that the traffic is encrypted and the identity of the central server is verified, the latter preventing man-in-the-middle attacks. Sensors synchronize their clocks using network time protocol (NTP) servers.

Collected data and the anonymization procedure
As depicted in Figure 2, sensors extract three key data from each PR: i) a timestamp (with a precision of one second), ii) a received signal strength indicator (RSSI) in dBm, and iii) a source address (SA), i.e., a MAC address. Although some smartphones randomize the SAs embedded in PRs, this is not guaranteed, and we want user tracking to remain impossible even without terminal-side SA randomization. Thus, we transform the original SA into an SA identifier, its anonymous counterpart.
To generate an SA identifier from an SA, we use a SHA-256 hash function in conjunction with a pepper and truncate its output to 64 bits. With {0,1}^γ denoting the set of all binary sequences of γ bits, our anonymization function is h : X → {0,1}^64, a truncated SHA-256 hash function whose inputs are 48-bit SAs (X = {0,1}^48). Note that generating 64-bit SA identifiers is advantageous, as such binary sequences can easily be stored as long integers in most databases (e.g., using the standard SQL BIGINT data type).
We prepend a time-varying pepper to every MAC address before hashing it. With || denoting concatenation, and mac_address and global_pepper representing respectively the MAC address (i.e., the SA) to be anonymized and the prepended pepper, h(global_pepper||mac_address) generates the SA identifier. The pepper consists of a concatenation of a fixed 128-bit sensor pepper and a time-varying 128-bit server pepper. The central server maintains an up-to-date array of 20 server peppers covering a duration of 20 minutes, which sensors periodically fetch over HTTPS (with TLS). Sensors use each server pepper for a specific one-minute time frame. Server peppers are generated using a pseudorandom number generator (PRNG) (e.g., /dev/urandom or /dev/random on Linux). If this PRNG is deemed insecure (see [6]), hardware random number generators are alternatives [7,8].
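A minimal sketch of the sensor-side anonymization in Python follows. The order in which the sensor and server peppers are concatenated, and the choice of keeping the first 64 bits of the digest, are our assumptions; the text only fixes the overall construction h(global_pepper||mac_address) with two 128-bit peppers and a 64-bit output.

```python
import hashlib
import struct

def sa_identifier(mac_address: bytes, sensor_pepper: bytes, server_pepper: bytes) -> int:
    """Anonymize a 48-bit source address into a 64-bit SA identifier."""
    assert len(mac_address) == 6       # 48-bit SA
    assert len(sensor_pepper) == 16    # fixed 128-bit sensor pepper
    assert len(server_pepper) == 16    # time-varying 128-bit server pepper
    # Assumed concatenation order within the global pepper: sensor pepper
    # first, then the current server pepper.
    global_pepper = sensor_pepper + server_pepper
    digest = hashlib.sha256(global_pepper + mac_address).digest()
    # Truncate to 64 bits (here: the first 8 bytes), returned as a signed
    # integer so the identifier fits a standard SQL BIGINT column.
    return struct.unpack(">q", digest[:8])[0]
```

Changing either pepper changes the identifier completely, which is the avalanche behavior that Requirement 2 relies on.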
The server and the sensors delete server peppers once they become outdated. In particular, the sensors erase the volatile memory chunk storing server peppers before updating it with new peppers periodically retrieved from the server.
The fixed sensor pepper forms a last line of defense in case the server peppers get compromised. It is written in a file or in the codebase of the sniffer, and it is never stored on the server. We propose a fixed sensor pepper, but storing pregenerated one-minute sensor peppers is also possible; this would represent about 42 MB of data for five years.
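The 42 MB figure follows from straightforward arithmetic, as a quick check shows (the 16 bytes per pepper correspond to the 128-bit pepper size):

```python
# One 128-bit (16-byte) pepper per one-minute time frame, for five years.
minutes_in_five_years = 5 * 365.25 * 24 * 60
storage_mb = minutes_in_five_years * 16 / 1e6
# storage_mb is about 42, matching the figure quoted in the text.
```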
Loosely speaking, the time-varying and eventually forgotten pepper has high entropy, and breaking the anonymization scheme amounts to finding its value for all one-minute time frames of interest. As we show thereafter, this is not computationally tractable. We also explain why SA identifiers generated using different peppers cannot be compared against one another, thereby precluding user tracking. Moreover, despite the data distortion that our anonymization procedure entails, we also demonstrate that it does not affect in any significant way the output of our counting method (a procedure explained in more detail in [3,5]). Intuitively, anonymization cannot affect counting if the SA identifier of any SA is identical across all sensors at (almost) all time instants.
It is also possible for h : X → {0,1}^64 to output (random) tokens, instead of being a truncated cryptographic hash function. In this case, the output tokens are truly uniformly distributed in the space {0,1}^64. The tokens (as well as the corresponding inputs) should be kept in volatile memory for a given anonymization window but can be wiped once a new anonymization window begins. This random-token approach is typically well suited to a central and final anonymization round. It would not be practical to carry it out on a distributed network, because all nodes would then have to agree on a mapping from input SAs to tokens in real time.
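A sketch of this token-generator variant is given below. The class and method names are ours; the text only specifies that the SA-to-token table is kept in volatile memory for one anonymization window and then wiped.

```python
import secrets

class TokenAnonymizer:
    """Maps SAs to uniformly random 64-bit tokens within one window."""

    def __init__(self) -> None:
        self._table: dict[bytes, int] = {}  # volatile SA -> token mapping

    def token(self, sa: bytes) -> int:
        # Draw a fresh, truly uniform 64-bit token for unseen SAs.
        if sa not in self._table:
            self._table[sa] = secrets.randbits(64)
        return self._table[sa]

    def new_window(self) -> None:
        # Forget the mapping: tokens from different windows can no longer
        # be compared, which is what precludes tracking.
        self._table = {}
```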

Contributions
Previous sections reviewed the crowd monitoring system used in [3,4] for forecasting purposes and presented in [5] (with fewer details about anonymization than in this manuscript). This paper discusses the strength of our anonymization procedure and the effect of time-synchronization inaccuracies on it. Besides the proposal of the anonymization process, our contributions also include the demonstration that our system satisfies the following four requirements:
1. It is computationally intractable to recover the original MAC addresses from the anonymous identifiers our system generates.
2. Anonymous identifiers from two distinct one-minute time frames cannot be compared against one another, which makes it impossible to track individuals over time.
3. The proportion of time instants during which two sensors of our system could generate distinct anonymous identifiers for the same MAC address is negligible.
4. Assuming WiFi devices generate 10^7 distinct MAC addresses within one minute in a monitored area, the collision rate of our anonymization procedure is lower than 10^−9. The value of 10^7 distinct MAC addresses corresponds roughly to an event of a few million people, which is comparable to or higher than the attendance of the vast majority of public events in the world.
Requirements 1) and 2) guarantee privacy: the original MAC addresses of devices cannot be recovered, and tracking individuals is impossible. Requirements 3) and 4) enable the central server to compute accurate attendee counts. Should Requirement 3) not be met, sensors would too often return different SA identifiers for identical devices simultaneously detected (because of overlapping detection ranges), thereby inducing a positive counting bias. Requirement 4) ensures a negligible probability of two devices being identified as a single one (which would imply a negative counting bias).
Proving that our system meets Requirement 4) is overwhelmingly a mathematical effort based on approximations of the collision rate of hash functions. This is the most complex result to derive in this paper and, due to its general nature, the theorem approximating the collision rate could be of interest to researchers pursuing endeavors other than the design of a crowd counting system.

Comparison with the state of the art
The authors of [9, Sec. 5] succinctly mentioned appending random binary sequences to MAC addresses prior to hashing (or replacing MAC addresses with tokens, more specifically, universally unique identifiers (UUIDs) [10]). Our anonymization scheme uses a similar idea, except that we prepend random sequences that a central server partially generates and then shares with time-synchronized sensors. Each sequence is used simultaneously by all our sensors during one minute, after which the server and the sensors erase it. Thus, brute force attacks consist of recovering a pepper of high entropy instead of hashed MAC addresses, whose entropy is too low to withstand such attacks [9,11,12]. We also split peppers into two parts (which [9] does not propose), with one part unknown to the server.
In [13], the authors develop a system similar to ours but for road traffic monitoring. Their anonymization scheme [13, Sec. VI] relies on truncating the MAC address prior to hashing, whereas we rely on time-varying peppers of sufficiently high entropy to ensure anonymity and prevent brute-force attacks. Based on their experiments, it is unclear whether an anonymization scheme based on MAC address truncation would yield unacceptably high collision rates for large-scale crowds.
The very recent work [14] presents research similar to ours. It derives the collision rate we present in Theorem 1 [14, Sec. 4.2] and justifies the interest of such a derivation within the framework of WiFi and Bluetooth signal detection for crowd counting. It also validates Theorem 1 numerically for up to 2 · 10^5 MAC addresses and up to 24 output bits after hashing [14, Sec. 1]. Thanks to our precise approximation of the collision rate (see Theorem 2), we can handle vastly higher values (e.g., 64 output bits and 10^7 MAC addresses). Moreover, our method is based on secret peppers that are forgotten and that are split into two parts, one stored on sensors and the other on a central server, so that anonymity still holds if either the sensors or the central server are compromised (see Section 2.1). The time-varying nature of our peppers also makes it impossible to track individuals (see Section 2.2). We also discuss the impact of typical time synchronization errors on modern networks and find them to have no significant impact on the counting process we used in [3,4] (see Section 2.3). Finally, we point out that our (novel) approximation of the collision rate (and its analytical error bounds) is a non-trivial mathematical result to derive (see Section 2.3 and the Appendix).
Other related works on crowd counting using WiFi probe requests are [15] and [16]. In particular, [15] discusses smartphone-executed MAC address randomization and its impact on crowd counting algorithms. Its authors also propose a method for generating fingerprints that allow them to track individuals whose smartphones emit PRs (a possibility that our system precludes on purpose, for privacy reasons). The work [16] deals with user positioning, especially in indoor environments and for non-dense crowds, and notably improves positioning accuracy by leveraging signal strength indicators.

Outline
Section 1 has detailed the way our system works, with Section 1.4 comparing our results against the state of the art. Then, Section 2 shows that our four requirements are met. Finally, Section 3 is the conclusion. The Appendix contains mathematical proofs.

Results
We now turn to our contribution: proving that the already existing crowd counting system presented in [3][4][5] meets our four requirements. We insist again that these results are new and not detailed in [3][4][5].
The data collection process is a means to an end: making it possible to count the number of people visiting an area while ensuring their privacy. In other words, satisfying the four requirements amounts to ensuring two properties: privacy and accurate counting. The first two requirements address the former: how to ensure that the privacy of users is preserved and that tracking them (even anonymously) is impossible? The last two requirements address the latter: how to ensure that our privacy-enhancing data distortion does not affect counting accuracy? The next subsections detail our four requirements and show how our system satisfies them.
2.1 Requirement 1: impossibility to recover the original SA from SA identifiers
Cryptographic hash functions like SHA-256 cannot be directly reversed; in practice, reversing consists of trying inputs until finding one whose hash equals the output to be reversed. It is possible for an attacker to know the input MAC address of a particular entry in the list of anonymized PRs; for example, an attacker may go near sensors and send fake PRs with precise timing patterns that make them easy to identify. In this case, brute forcing the pepper entails testing many of the existing 256-bit sequences (on average, half of them should be tested). Attackers usually perform this operation using graphics processing units (GPUs), field-programmable gate arrays (FPGAs) or, if they have large resources, application-specific integrated circuits (ASICs). Let us examine whether this attack is feasible with GPUs.
For example, 1 million Nvidia RTX 2080 SUPER Founders Edition graphics cards can compute roughly 5700 SHA-256 terahashes per second [17]. This implies that testing all 256-bit peppers (approximately 1.16 · 10^65 terahashes) takes 2.04 · 10^61 seconds, i.e., 6.47 · 10^53 years. Should one of the two 128-bit peppers be known to an attacker, testing all 128-bit sequences still takes roughly 1.90 · 10^15 years. We point out that relying on a regular SHA-256 hash function without peppers is not safe (see [9,12] and [11, Sec. VI]), as the entropy of MAC addresses is too low to resist brute force attacks. We also highlight that using computationally intensive hashes like bcrypt [18] and Argon2 [19] would imply unreasonable computational requirements for sensors (see also [9, Sec. 5]).
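These orders of magnitude can be recomputed directly, as the following sketch shows (the aggregate hash rate and the one-million-card fleet are the figures assumed in the text):

```python
# Aggregate hash rate assumed in the text: ~1e6 RTX 2080 SUPER cards at
# roughly 5.7 GH/s each, i.e., about 5700 TH/s = 5.7e15 hashes/s in total.
hashes_per_second = 5.7e15
seconds_per_year = 365.25 * 24 * 3600

# Exhausting all 256-bit peppers (2**256 hashes):
full_pepper_years = 2.0**256 / hashes_per_second / seconds_per_year

# Exhausting the remaining 128-bit pepper if the other half leaks:
half_pepper_years = 2.0**128 / hashes_per_second / seconds_per_year
```

Both values land in the ranges quoted above (about 6.5 · 10^53 and 1.9 · 10^15 years respectively), i.e., far beyond any practical attack horizon.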

Requirement 2: preventing tracking for more than one minute
This requirement is met because server peppers are updated between consecutive one-minute time frames. In particular, the avalanche effect of the SHA-256 hash function makes hashing with different peppers return incomparable SA identifiers for any fixed MAC address. (The avalanche effect of cryptographic hash functions is the fact that minor changes in the input change the hash significantly.)

Requirement 3: peppers are identical across all sensors at a given time instant
This requirement depends on the accuracy of time synchronization. We propose to use NTP, which provides accurate time synchronization on low-latency networks (e.g., 4G networks, with timing errors lower than 10 ms [20]). There could be synchronization-related mismatches at the boundaries of consecutive one-minute time frames, but only for 20 ms/60 000 ms ≈ 0.033 % of their duration (two sensors may each deviate by up to 10 ms in opposite directions, hence mismatch windows of up to 20 ms). Assuming probe request transmission times are uniformly distributed in time, this figure translates into having on average 0.033 % of all PRs anonymized with different peppers on different sensors.
2.4 Requirement 4: a collision rate of less than 10^−9 for 10^7 MAC addresses
We now derive estimates of the collision rate of truncated hash functions. The first part of this section is mathematical, while the second leverages the results of the first to show that the collision rate achieved by our system is negligible for up to 10 million SAs.

Mathematical foundations
Variable m denotes a number of possible outputs, such that log2(m) ∈ N, and {0,1}^γ denotes the set of all binary sequences of γ bits. We consider a function h : X → {0,1}^log2(m) (with n := card(X)). Hereafter, h is a hash function whose output is approximately uniformly distributed in {0,1}^log2(m) [21, Sec. 9.7.1]. It could also be a token generator, in which case the uniform distribution assumption is exactly satisfied. We follow the standard terminology in the study of hash tables and refer to m and n as the number of buckets and the number of inserts, respectively. Similarly, α := n/m is called the load factor. Finally, Y^(n,m) denotes the (random) number of collisions when inserting n values into m buckets (under the uniform distribution assumption). Theorem 1 provides an exact, yet numerically unstable, formula for E[Y^(n,m)].

Theorem 1. For n inserts into m buckets, the collision rate is

E[Y^(n,m)]/n = 1 − (m/n)(1 − ((m − 1)/m)^n),    (1)

where the uniform distribution assumption has been used.
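The closed form of Theorem 1 (equivalently, n minus the expected number of occupied buckets, divided by n, as derived in Appendix A.1) can be cross-checked against a direct simulation of uniform inserts. A sketch, with bucket counts chosen small enough for collisions to be frequent:

```python
import random

def collision_rate_exact(n: int, m: int) -> float:
    """Collision rate E[Y^(n,m)]/n from Theorem 1 (uniform inserts)."""
    return 1.0 - (m / n) * (1.0 - ((m - 1) / m) ** n)

def collision_rate_simulated(n: int, m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate: insert n uniform values into m buckets and
    count collisions as n minus the number of occupied buckets."""
    rng = random.Random(seed)
    total_collisions = 0
    for _ in range(trials):
        occupied = len({rng.randrange(m) for _ in range(n)})
        total_collisions += n - occupied
    return total_collisions / (trials * n)
```

For instance, with n = 1000 inserts into m = 4096 buckets, the exact rate is roughly 0.113 and the simulated value agrees closely. For the tiny load factors of our system, however, the subtraction in (1) cancels catastrophically in floating point, which is precisely what motivates the approximations of Theorem 2.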
Proof. See the Appendix.
As shown in Figure 3, (1) suffers from numerical instabilities for sufficiently low values of the load factor. Therefore, for systems whose load factors are too low for (1) to provide accurate estimates, approximations are needed. In particular, to ensure such approximations are accurate enough, they should have proven analytical error bounds. Theorem 2 proposes three approximations of E[Y^(n,m)]/n, with proven error bounds. Only the penultimate and last inequalities of Theorem 2 are numerically stable.

Theorem 2. For a degree of approximation K ≥ 2, a number of inserts n ≥ 2, and a load factor α ≤ 1, there exist error terms δ(α, n) and

The interpretation of Theorem 2
Theorem 2 approximates the exact value of the collision rate that Theorem 1 provides. Equation (2) yields a first approximation that is not numerically stable for sufficiently low values of α (a figure similar to Figure 3 can easily be generated for (2) but has been omitted for the sake of brevity). Equation (3) provides a numerically stable approximation whose precision is controlled through K, hence the name "degree of approximation".
We point out that approximation errors are negligible for our choice of parameters, for which the load factor satisfies α ≈ 10^−12. The conclusion is that our estimate of the expected collision rate is approximately equal to 10^−12.5, with an error upper bounded by 5 · 10^−20 + 10^−24 ≈ 5 · 10^−20, so that Requirement 4 is met.
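A numerically robust evaluation can switch between the exact formula and a series expansion. The sketch below is our own implementation choice: the cutoff and the second-order term α/2 − α²/6 come from the Maclaurin expansion of the closed form, consistent with the leading term α/2 of Theorem 2 and its α²/6 error bound.

```python
from math import expm1, log1p

def collision_rate(n: int, m: int) -> float:
    """Numerically robust estimate of the collision rate E[Y^(n,m)]/n."""
    alpha = n / m
    if alpha < 1e-8:
        # Low-load regime: the closed form of Theorem 1 suffers catastrophic
        # cancellation, so use the series alpha/2 - alpha**2/6 + O(alpha**3).
        return alpha / 2 - alpha**2 / 6
    # Otherwise evaluate Theorem 1's formula, 1 - (1/alpha)*(1 - (1 - 1/m)**n),
    # using expm1/log1p to limit round-off error.
    return 1.0 + expm1(n * log1p(-1.0 / m)) / alpha
```

With n = 10^7 and m = 2^64, this returns about 2.7 · 10^−13, comfortably below the 10^−9 target of Requirement 4.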

Concentration inequality for the collision rate
While it is interesting to upper bound the expectation of the collision rate Y^(n,m)/n, upper bounding the probability that it exceeds some threshold is also a worthy endeavor. We propose such a (coarse) inequality. Because Y^(n,m)/n ≥ 0, we can apply Markov's inequality: for any threshold t > 0,

P(Y^(n,m)/n ≥ t) ≤ E[Y^(n,m)/n] / t.

Using Theorem 2 with K = 2, we only know that E[Y^(n,m)/n] = α/2 + δ(α, n) + R_1(α), where δ(α, n) ≤ 0 and R_1(α) ≤ α²/6. Therefore, we can only use the slightly more pessimistic concentration inequality

P(Y^(n,m)/n ≥ t) ≤ (α/2 + α²/6) / t,

where the term α²/6 is negligible in comparison to α/2 for α sufficiently low (e.g., α ≤ 10^−3). For example, let us consider again the previous calculation of Section 2.4.3 (with n = 10^7 MAC addresses, m = 2^64, and α ≈ 5.4 · 10^−13), which shows that, with probability 99.968 %, the collision rate of our counting system does not exceed 10^−9. Markov's inequality is coarse (and it may be possible to improve our result using a more sophisticated inequality) but, within the context of upper bounding the collision rate of our crowd counting system, it is sufficient to prove that the collision rate does not exceed 10^−9 with high probability for large crowds (10^7 MAC addresses per minute).
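The pessimistic Markov bound can be evaluated directly at our operating point, as this sketch shows:

```python
n = 10**7          # distinct MAC addresses per one-minute time frame
m = 2**64          # number of buckets (64-bit SA identifiers)
alpha = n / m      # load factor, about 5.4e-13

t = 1e-9           # collision-rate threshold of Requirement 4
# Pessimistic Markov bound: P(Y/n >= t) <= (alpha/2 + alpha**2/6) / t.
p_exceed = (alpha / 2 + alpha**2 / 6) / t
# p_exceed is below 3e-4, so the collision rate stays under 1e-9
# with probability greater than 99.9 %.
```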

Validating requirement 4 experimentally
An interesting direction for future work would be to validate Requirement 4 experimentally and to evaluate how sharp the inequalities we obtained are. In particular, an interesting question is to determine to what extent truncated SHA-256 hashes are close to being uniformly distributed and how a discrepancy from uniformity translates into higher collision rates in our particular application. A conceptually simple analysis of this question could be carried out by generating a statistically significant number of random peppers and, for each pepper, generating at least 10^14 random SAs to evaluate the empirical collision rate (which we know should be around 10^−12.54 according to Figure 4, which explains why generating at least 10^14 SAs is statistically sound). Recent simulation results related to this approach are available in [14, Sec. 5].
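A scaled-down version of this experiment fits in a few lines: we shrink the output space (24 bits instead of 64) so that collisions occur at observable rates with a modest number of inputs, and compare the empirical rate to the α/2 prediction. The randomly drawn SAs below stand in for the real PR traces that a full-scale validation would require.

```python
import hashlib
import secrets

def empirical_collision_rate(n_sas: int, out_bits: int, pepper: bytes) -> float:
    """Collision rate of pepper-prefixed SHA-256 truncated to out_bits,
    over n_sas randomly drawn 48-bit source addresses."""
    mask = (1 << out_bits) - 1
    seen = set()
    collisions = 0
    for _ in range(n_sas):
        sa = secrets.token_bytes(6)  # random 48-bit SA
        digest = hashlib.sha256(pepper + sa).digest()
        identifier = int.from_bytes(digest[:8], "big") & mask
        if identifier in seen:
            collisions += 1
        seen.add(identifier)
    return collisions / n_sas

# With 5e4 SAs and 24 output bits, alpha/2 predicts a rate near 1.5e-3;
# the empirical rate should land close to that if the truncated hash
# behaves like a uniform map.
rate = empirical_collision_rate(50_000, 24, secrets.token_bytes(32))
```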
Unfortunately, rigorously validating the collision rate experimentally using datasets of true SAs would require monitoring events gathering millions of individuals. Moreover, it would be impossible to know exactly how many people carry smartphones and when each smartphone sends PRs. As a result, we propose a slightly weaker variant (that still requires significant effort). First, one needs to identify randomization and PR emission patterns of modern smartphones in a controlled laboratory environment (or use existing results on typical PR generation processes in the literature, see [15, Fig. 1]). This is equivalent to building a statistical distribution that accurately depicts the random process by which modern smartphones generate PRs. Then, the methodology of the previous paragraph can be applied with this distribution instead of a uniform one for SAs. The main difficulties here are identifying PR transmission patterns for an extensive set of modern smartphones and evaluating the market share of each tested smartphone.

Conclusion
Within the framework of WiFi-based crowd counting, this paper proposes an anonymization scheme for collected MAC addresses. This anonymization scheme is endowed with four desirable properties. First, it makes the recovery of original MAC addresses computationally intractable. Second, it precludes tracking capabilities. Third, it works properly as long as time synchronization errors between nodes collecting MAC addresses are of the order of 10 ms, which is typically easy to attain on modern cellular networks. Fourth, it achieves a negligible collision rate between MAC addresses. This last point is supported by ample theoretical evidence. Although this paper is motivated by crowd counting applications, the methods and mathematical results could be of interest in other domains.

A Proofs
In what follows, ∥x∥_2 denotes the ℓ2-norm of vector x. The notation (a_k)_{1≤k≤K} is equivalent to the vector (a_1, a_2, . . . , a_K) of size K.

A.1 Proof of Theorem 1
Let p_j denote the probability that the jth (1 ≤ j ≤ m) bucket is empty after n inserts. All inserts have equal probabilities of falling within each bucket, and whether an insert ends up in one bucket is independent of which buckets are already occupied. As a result, we have p_j = ((m − 1)/m)^n. Indeed, for the jth bucket to be unoccupied, all n inserts should end up in any of the other m − 1 buckets and, for each insert, there is a probability (m − 1)/m that it ends up in any bucket except the jth one. The expectation of the number of empty buckets after n inserts is thus equal to

E[Σ_{j=1}^m A_j] = Σ_{j=1}^m p_j = m((m − 1)/m)^n,

where A_j = 1 if the jth bucket is empty and A_j = 0 otherwise. Hence, the expectation of the number of occupied buckets is m − m((m − 1)/m)^n. Without any collision after n inserts, there are exactly n distinct occupied buckets. However, with n_l < n distinct occupied buckets, there are n − n_l collisions. As the number of collisions equals n minus the number of occupied buckets, the average number of collisions is n − m(1 − ((m − 1)/m)^n) and the proof is complete.

A.2 Lemmas for Theorem 2
To prove Theorem 2, we shall first derive two lemmas. Lemma A1 quantifies to what extent (1 − α/n)^n is a good approximation of exp(−α).
Lemma A1. For n ≥ 1 and α < n, where

Proof. For 0 ≤ α/n < 1, using the Maclaurin series of log(1 where we have used The sum in f^(K)(α, n) is the inner product between the vectors ((α/n)^k)_{1≤k≤K} and (1/(k + 1))_{1≤k≤K}. The Cauchy–Schwarz inequality yields: We have, using an asymptotic expression for geometric series, Moreover, where ζ(2) is the Riemann zeta function evaluated at 2, which equals π²/6. Therefore, we may use the upper bound It is also easy to notice that Σ_{k=1}^∞ (α/n)^k/(k + 1) ≥ 0, given that all the terms of the sum are positive.
We now turn to a lemma focusing on the accuracy of a polynomial approximation of α −1 (1− exp(−α)).

Figure 3: Numerically computed value of log10(E[Y^(n,m)]/n) (using (1)) in Matlab R2019a, as a function of the number of inserts n and the number of buckets m. With log10(n) ≥ 3, numerical instabilities appear for values of log10(m) as low as 9.

Figure 4: Level sets of the approximation (3) of the collision rate, as a function of the number of inserts n and the number of buckets m.