Preserving Geo-Indistinguishability of the Emergency Scene to Predict Ambulance Response Time

Emergency medical services (EMS) provide crucial emergency assistance and ambulatory services. One key measurement of an EMS’s quality of service is its ambulance response time (ART), which generally refers to the period between EMS notification and the moment an ambulance arrives on the scene. Because many victims require care within an adequate time (e.g., cardiac arrest), improving ARTs is vital. This paper proposes to predict ARTs using machine learning (ML) techniques, which could be used as a decision-support system by EMS to allow a dynamic selection of ambulance dispatch centers. However, one well-known predictor of ART is the location of the emergency (e.g., whether it is in an urban or rural area), which is sensitive data because it can reveal who received care and for which reason. Thus, we considered the ‘input perturbation’ setting in the privacy-preserving ML literature, which allows EMS to sanitize each location independently so that ML models are trained only with sanitized data. In this paper, geo-indistinguishability, a state-of-the-art formal notion based on differential privacy, was applied to sanitize each emergency location. To validate our proposals, we used retrospective data of an EMS in France, namely, the Departmental Fire and Rescue Service of Doubs, and publicly available data (e.g., weather and traffic data). As shown in the results, the sanitization of location data and the perturbation of its associated features (e.g., city, distance) had no considerable impact on predicting ARTs. With these findings, EMSs may prefer using and/or sharing sanitized datasets to avoid possible data leakages, membership inference attacks, or data reconstructions, for example.

and/or share sanitized data with trusted third parties to train and develop ML-based decision support systems.

To summarize, this paper proposes the following contributions:

• Recognize the most influential variables when building accurate ML-based models to predict ART. This would allow other EMS to collect these variables and recreate our methodology or develop their own considering their policies.

• Evaluate the effectiveness of several values of ε (i.e., the privacy budget) to sanitize emergency location data with GI and train ML-based models to predict ART. To the authors' knowledge, this is the first work to assess the impact of geo-indistinguishability on sanitizing the location of emergency scenes when training ML models for such an important task. While predicting ART is a means to allow EMS to save more lives, we note that it is also possible to do so while preserving the victims' privacy.

Outline: The remainder of this paper is organized as follows. In Section 2, we describe the material and methods used in this work, i.e., the geo-indistinguishability privacy notion that we are considering, the data presentation (context, collection, and analysis), the sanitization of emergency scenes with GI, the ML models, and the experimental setup. In Section 3, we present the results of our experiments and our discussion. Lastly, in Section 4, we present the concluding remarks and future directions.

In this section, we review the notion of privacy considered in this paper, namely, geo-indistinguishability.

Differential privacy (DP) [15] has been accepted as the de facto standard for data privacy. DP was developed in the area of statistical databases but it is now applied to several fields. Furthermore, DP has also been extended to a local model (a.k.a. LDP [23]) in which users sanitize their data before sending it to the server. While DP is well-suited to the case of trusted curators, with LDP, users do not need to trust the curator.

Geo-indistinguishability (GI) [14] is based on a generalization of DP developed in [26] and has been proposed for preserving location privacy without the need for a trusted curator (e.g., when the location-based service, LBS, may be malicious). A mechanism satisfies ε-GI if, for any two locations x_1 and x_2 within a radius r, the output y is (ε, r)-geo-indistinguishable, i.e., if we have:

$$\frac{\Pr(y \mid x_1)}{\Pr(y \mid x_2)} \le e^{\epsilon r}, \quad \forall r > 0,\ \forall y,\ \forall x_1, x_2 : d(x_1, x_2) \le r.$$
Intuitively, this means that for any point x_2 within a radius r from x_1, GI forces the corresponding distributions to be at most l = εr distant. In other words, the level of [...]

[...] pseudocode of the polar Laplace mechanism in the continuous plane. More specifically, the noise is drawn by first transforming the true location x to polar coordinates. Then, the angle θ is drawn uniformly at random from [0, 2π) (line 3), and the distance r is drawn from [...]

The process of an intervention is briefly described in the following. First, an emergency call is received and treated by an operator. Next, the adequate crew/engine is notified (t_1). Once sufficient armament is gathered, the ambulance goes to the emergency scene (t_2). Upon arriving on-scene, the crew uses a mechanical system to report their arrival (t_3). We focus on the ART period, which is calculated as ART = t_3 − t_1.
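As a small illustration of the ART definition above, the sketch below computes ART = t_3 − t_1 from two timestamps. The function name, the timestamps, and the choice of minutes as the unit are assumptions made for this example only; they are not taken from the dataset.

```python
from datetime import datetime


def ambulance_response_time(t1: datetime, t3: datetime) -> float:
    """Return ART = t3 - t1 in minutes (unit chosen for illustration)."""
    if t3 < t1:
        raise ValueError("arrival (t3) precedes crew notification (t1)")
    return (t3 - t1).total_seconds() / 60.0


# Hypothetical timestamps for a single intervention.
t1 = datetime(2020, 1, 15, 14, 2, 30)  # crew/engine notified
t3 = datetime(2020, 1, 15, 14, 13, 0)  # arrival reported on-scene
art_minutes = ambulance_response_time(t1, t3)
```

The departure time t_2 is deliberately not used, since ART as defined here spans the whole period from notification to arrival.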

The operational process for deciding the adequate SDIS 25 center to attend the intervention depends on the exact location of the intervention. As stated previously, there is a city, a district, and a zone that jointly define a list of priority centers, which are responsible [...]

[...] in an emergency dataset with approximate locations, this may indicate an urgency for someone who drowned in the river, for example.

We used five different levels for the privacy budget ε = l/r, where l is the privacy level we want within a radius r. [...] Therefore, the center attribute was not 'perturbed'.
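To make the privacy budget ε = l/r concrete, the following sketch sanitizes a single location with a planar (polar) Laplace mechanism. It is a minimal illustration under stated assumptions, not the exact procedure used in the experiments: the radius is sampled from the standard planar Laplace radial density ε² r e^(−εr), which is a Gamma distribution with shape 2 and scale 1/ε (equivalent in distribution to inverting the radial CDF); the metre-to-degree conversion is a simple local approximation; and the values of l, r, and the coordinates are hypothetical.

```python
import math
import random
from typing import Tuple

# Assumption: mean Earth radius, used only for the metre-to-degree conversion.
EARTH_RADIUS_M = 6_371_000.0


def privacy_budget(l: float, r_m: float) -> float:
    """epsilon = l / r: privacy level l within a radius r (in metres)."""
    if l <= 0 or r_m <= 0:
        raise ValueError("l and r must be positive")
    return l / r_m


def sample_polar_laplace_offset(epsilon: float, rng: random.Random) -> Tuple[float, float]:
    """Draw a planar Laplace noise vector (dx, dy) in metres."""
    theta = 2.0 * math.pi * rng.random()            # angle uniform in [0, 2*pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)   # density eps^2 * r * exp(-eps * r)
    return radius * math.cos(theta), radius * math.sin(theta)


def sanitize_location(lat: float, lon: float, epsilon: float,
                      rng: random.Random) -> Tuple[float, float]:
    """Add planar Laplace noise to (lat, lon), given in decimal degrees."""
    dx, dy = sample_polar_laplace_offset(epsilon, rng)
    # Local flat-Earth approximation; reasonable for offsets of a few kilometres.
    new_lat = lat + math.degrees(dy / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(dx / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lon


rng = random.Random(42)
eps = privacy_budget(l=math.log(2), r_m=500.0)  # hypothetical l and r
noisy_lat, noisy_lon = sanitize_location(47.25, 6.03, eps, rng)  # arbitrary point
```

Under these assumptions the expected displacement is 2/ε metres, so a smaller ε (stronger privacy) moves the reported location farther from the true one on average.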

To show the impact of the noise added to the Location attribute, [...]

From Table 3, one can notice that many features are perturbed due to the sanitization of the emergency's location with GI. With high levels of ε (i.e., less private), the city and [...]

[...] LGBM, MLP, and LASSO) were built to predict ART on each month of 2020 considering the sanitized datasets with different levels of ε-GI location data (cf. Table 2). In addition, [...]

In this section, we present the results of our experimental validation (Subsection 3.1) and a general discussion (Subsection 3.2) including related work and limitations.

[...] predicting ARTs for the San Francisco fire department, which closely relates to this paper.

The authors processed about 4.5 million EMS calls considering original raw location data to predict ART using four ML models, namely linear regression, linear regression with elastic net regularization, decision tree regression, and random forest. However, no privacy-preserving experiment was performed because the main objective of their paper was proposing a scalable, ML-based, and real-time system for predicting ART. Besides, we also included weather data that the authors in [40] did not consider in their system, which could help to recognize high ARTs due to bad weather conditions, for example.

[...] operator may acquire some personal data about the victim, this is not an operational requirement and, hence, we did not use this information either. This way, we focused our [...]