Multi-Slot BLE Raw Database for Accurate Positioning in Mixed Indoor/Outdoor Environments

The technologies and sensors embedded in smartphones have contributed to the spread of disruptive applications built on top of Location Based Services (LBSs). Among them, Bluetooth Low Energy (BLE) has been widely adopted for proximity and localization, as it is a simple but efficient positioning technology. This article presents a database of received signal strength measurements (RSSIs) on BLE signals in a real positioning system. The system was deployed on two buildings belonging to the campus of the University of Extremadura in Badajoz. The database is divided into three different deployments, changing in each of them the number of measurement points and the configuration of the BLE beacons. The beacons used in this work can broadcast up to six emission slots simultaneously. Fingerprinting positioning experiments are presented in this work using multiple slots, improving positioning accuracy when compared with the traditional single slot approach.


Introduction
The quick growth of wireless communication networks has fostered new uses beyond simple information exchange. Among others, wireless networks can support applications that serve a certain task according to the user's position. This kind of development, known as Location-Based Services (LBSs), has grown exponentially in recent years because of its integration with smartphones [1]. The most widespread technology for LBSs is the Global Navigation Satellite System (GNSS), which can provide position accuracy on the order of meters [2]; higher accuracy may be obtained with, for instance, Precise Point Positioning and augmentation services [3]. Despite its worldwide coverage, GNSS loses accuracy in complex environments, such as indoor scenarios, because of the physical barriers that the signal must go through [4]. However, 87% of daily time is spent in these environments [5] and, therefore, an alternative is needed to provide position estimations for LBSs in indoor areas. Many technologies have been tested to build Local Positioning Systems.
The remainder of this work is organized as follows. In Section 2, related BLE datasets are described. Section 3 explains the collection procedure and the test environment of the database introduced in this paper. Section 4 presents the database organization. Section 5 shows some examples of how to use the supplementary materials and the implementation of a basic positioning algorithm. Finally, a discussion and the main conclusions are presented in Section 6.

Related Works
There are currently some databases for indoor positioning purposes, mainly for Wi-Fi fingerprinting. For example, the database of Laoudias et al. [21] provides an extensive measurement campaign at the KIOS Research Center with five devices for 2D positioning, whereas the database of Mendoza et al. [22] provides measurements from various devices and several collection campaigns, repeated over fifteen months on two floors of a library building. Regarding BLE, systematic search criteria have been followed to include works with BLE RSSI positioning applications that have real and publicly accessible data. Some works are data descriptors, where the dataset itself is the main subject, while others pursue specific applications and make their experimental data publicly available.
A summary of the information of each dataset is shown in Table 1. Some considerations must be taken into account regarding the information provided about each set:
• Active developments are defined as those in which the user carries the receiver. On the contrary, in passive systems the user carries a transmitter that is detected by static receivers.
• There are basically two ways of storing the data: raw and fingerprinting. Raw measurements correspond to the individual measurements of each transmitter-receiver pair at a specific time. Fingerprint vectors contain the measurements of all available transmitters in a time window. The fingerprint representation is less valuable because it loses information: if the same transmitter has been detected multiple times in the window, only one processed value (average, median, maximum or latest) is stored, and just one timestamp is assigned to the full fingerprint vector.
• Different works use different labels for the positions: x-y coordinates on a local reference system, room or zone ID, and cell ID. As an exception, in one work the position information is presented as the distance between the transmitter and the receiver. For those databases where no deployment area is given, the area has been estimated using the plans and figures attached to the database itself or to the related academic paper.
• Some works repeat the same experiment with different transmission configurations. In Table 1 these are presented as different subsets of the same database. This consideration is not applied to those works that use more than one receiver, because training and test can be done with different devices.
• The number of Reference Points (RP) is the number of sample points with a position label and an RSSI associated with it. The number of samples is the amount of independent units of data presented by a database, regardless of the format.
• BLE beacon parameters are the transmission power, expressed in decibel-milliwatts (dBm), which is the ratio in decibels (dB) with reference to one milliwatt (mW), and the transmission frequency, expressed in hertz (Hz). Some values can be found in the dataset descriptor or the related academic paper, while others are calculated directly from the databases themselves.
• The collection procedure can be done statically, going through all the RP and taking measurements only when the user is over them, or dynamically, while moving along a path. This is indicated in the collection procedure column of the table. Databases that include both are indicated as partial in the table. Usually, for fingerprinting applications the training set is collected statically and the test set dynamically.
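Since the transmission powers in Table 1 are given in dBm, a minimal Python sketch of the conversion between dBm and milliwatts may help when comparing configurations. The function names are illustrative only, and the sample levels are the three transmission powers quoted later in this section plus the 0 dBm reference.

```python
import math

def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts: P_mW = 10^(P_dBm / 10)."""
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw: float) -> float:
    """Convert a power in milliwatts to dBm (p_mw must be positive)."""
    return 10.0 * math.log10(p_mw)

# Print a few transmission levels in both units.
for level in (3.0, 0.0, -6.0, -18.0):
    print(f"{level:+.0f} dBm = {dbm_to_mw(level):.4f} mW")
```

Because the scale is logarithmic, every reduction of 10 dB divides the radiated power by ten.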
Missing values in the table mean that, to the best of the authors' knowledge, that information was given neither in the datasets nor in the data descriptors. In some works, the relevant information is not explicitly provided. In those cases the database must be loaded to determine the missing information; thus, researchers must use their own resources to load and work with the database to identify the missing descriptive data and the technical issues of each system.
Mendoza et al. [23] give a multi-scenario BLE database for fingerprinting positioning. The data were collected in two different scenarios with several set-ups in each one. The first set was acquired in a laboratory, in which the collection was repeated for different transmission powers. In the second scenario, a university library, only one transmission power was used but with three different smartphones. Data are given as RSSI fingerprinting vectors, with local coordinates as position labels. Apart from the database itself, a set of Matlab scripts is provided. These scripts give an easy way to handle the database, with different loading and filtering options. The environmental information, which includes walls and obstacles, is attached in the form of Matlab plots. Moreover, some basic positioning algorithms and usage examples are implemented. This database presents the data as a fingerprinting vector for each measurement in each reference point. Different measurements from the same beacon within a time window are averaged, thus losing information about the experimental RSSI distribution. The database and the supporting materials can be downloaded from the Zenodo repository at [23].
Miskolc IIS Hybrid IPS [25] is a hybrid indoor positioning dataset with data from custom hardware with magnetometer, Wi-Fi and Bluetooth modules. Measurements were taken in a multifloor university building using the ILONA system [26]. Only one device was part of the building infrastructure. On the other hand, all the detected Wi-Fi access points were part of the building itself or nearby facilities. Results are reported as a single CSV file in which each row is a fingerprinting vector, with an entry represented as null if the access point was not detected in a particular location. Each datum is labeled with both an absolute position in a local coordinate system and a symbolic location. The beacons and Wi-Fi access points used are not specified, and the data descriptor does not state whether the Bluetooth measurements come from classic Bluetooth devices or BLE.
In [27], an indoor positioning and tracking system based on convolutional neural networks and image recognition is deployed and evaluated in a hospital. Each participant wore a Gimbal Series 10 beacon [38] that was detected by a set of Raspberry Pi 3 boards installed in the ceiling. RSSI measurements for each beacon are transformed into a B&W image, which is the input of their algorithm. The dataset is divided into two groups: (i) raw BLE RSSI measurements with timestamps, beacon IDs and Raspberry Pi IDs, and (ii) the B&W images. Training and test data are presented in different files. The beacons were configured to emit with a power of +3 dBm at a rate of 10 Hz. The database is available in [28].
Byrne et al. [29] performed a series of experiments in several houses using different kinds of sensors. All the environments were marked with tags on the floor that were recognized by a camera embedded in a custom wearable device worn by the user. This custom device also integrated a tri-axial accelerometer and a BLE antenna. BLE advertising signals were used to send the accelerometer measurements from the wearable to a set of Raspberry Pi 3 boards distributed in each house. At each Raspberry Pi, the BLE RSSI of each incoming packet was recorded and transmitted to a server via Wi-Fi. The recorded video and its timestamps are used as the ground truth of the system. In total, the experiment was repeated in four houses with different configurations (rooms and floors), one user, and between 8 and 11 Raspberry Pi boards per house. Supporting materials are also provided with the database, such as the layout of each floor and a Python script to load and work with the dataset. The database can be downloaded from [30].
Sikeridis et al. [31] describe a dataset of BLE measurements in a multifloor indoor environment. Several Raspberry Pi 3 boards were distributed inside a university building to detect the signals of the Gimbal Series 10 beacons worn by the users. In total, 46 participants carried a beacon during the month that the experiment lasted. The beacons were configured to emit at 1 Hz and with +1 dBm. The exact positions of the Raspberry Pi boards are not given and no position label is attached to the measurements. Each processed BLE packet is individually stored as a vector containing the timestamp, RSSI, beacon ID and receiver ID. Apart from the metadata, no further material is provided, neither in the original work nor in the repository [32] where the database can be downloaded.
Sadowski et al. [15] evaluate the use of different wireless technologies, including BLE, for indoor positioning. Two sets of experiments are presented in two different indoor scenarios. In each of them, two configurations are repeated with three beacons and three measurement points. The first set focuses on establishing the relationship between RSSI and distance, whereas the second aims to test each technology using trilateration as an indoor positioning algorithm. Three different Gimbal beacons were used as emitters and a Raspberry Pi 3 board was used as the receiver. The RSSI values, stored in a set of text files, and the transmitter-receiver distances are given in the original work. The database itself can be found in [33].
Mohammadi et al. [34] present as a database the experimental tests carried out for their system, which is based on a semi-supervised machine learning algorithm. A set of beacons was distributed following a regular grid in a university hall. A user with a smartphone and a custom application detected and stored the signals from each beacon. The environment was divided into cells following a regular grid, so that each cell could be identified by its row and column indices. The data are given as fingerprint vectors with the missing values replaced by a constant value of −200 dBm. In addition, another, much larger set of unlabeled measurements is provided. These data are used for the development of the semi-supervised learning algorithm. Whether labeled or not, each fingerprint includes its timestamp in addition to the RSSI of each beacon. The iBeacon protocol was used, although the specific hardware of the BLE beacons is not specified in the work.
Baronti et al. [36] present a complete BLE database that takes into account different passive and active fingerprinting developments on one floor of a university building. The database is divided into six sections: a training set with static measurements; a test set with dynamic measurements, in which a user followed a path; and four different social scenarios in which different users interact in different ways. In each experiment the users wore two devices at the same time: a RadBeacon Dot by Radius Networks [39] and an Honor 8 smartphone with a custom Android application. On the other side, a set of Raspberry Pi boards with the BLED112 Bluetooth USB dongle [40] was used as receivers (for the RadBeacon tags) and as transmitters (for the Android device). All the experiments were repeated for three transmission power configurations: −18 dBm, −6 dBm and 3 dBm, for both the RadBeacon tags and the Raspberry Pi boards. The ground truth is given separately using the cell ID of the measurement and the timestamp. The BLE files contain the raw RSSI, a timestamp, and the transmitter and receiver IDs.

Setup and Measurement Procedure
The beacons used in this work were the IBKs Plus from Accent System [41], with a Nordic Semiconductor nRF51822 chipset controller [42]. We chose them because they can simultaneously broadcast up to six different signals, with two individually configurable slots following the iBeacon® protocol and four slots following the Eddystone® protocol. The configurable parameters are the advertising period, the identification code, the emission power, and a calibration factor. The identification code is implemented differently in the two protocols. iBeacon® has three distinct identification elements: the Universally Unique Identifier (UUID), a 128-bit code usually written in hexadecimal, and two unsigned 16-bit integer values known as the major and minor numbers. Usually, the same UUID is used for all the beacons in a system, and the major and minor numbers are used to differentiate between elements. Eddystone® has different versions according to the kind of information being broadcast. The simplest version uses a Unique Identifier (UID), which is a unique 16-byte hexadecimal code, to distinguish between beacons. Other versions of the protocol are designed to transmit specific data, such as a Uniform Resource Locator (URL) or sensor and telemetry information, but all of them use the same UIDs for source identification. In addition, all slots within a beacon share the same media access control (MAC) address, which is defined by the BLE protocol. In this work, the UUID and the UID were used to characterize each slot following a custom-defined format. The emission period was set to 200 ms for all the slots, and the calibration factor was left at its default value. The transmission power was configured differently for each slot, as shown in Table 2, but following the same pattern in all the beacons. Measurements were performed using three different smartphones with a custom-developed application. This app uses the standard Android BLE application program interface (API) and does not incorporate any third-party libraries.
The most aggressive scanning mode, called Low Latency, was used for all the experiments. Android allows two ways to handle BLE scanning results. In the first option, all the results received in a time window are delivered to the application layer (the top of BLE layer stack) at the same time, while in the second option each result is reported individually when detected. In this work, the second mode was used since the first one did not update measurements on the same device within the same time window, causing a loss of information.
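To make the iBeacon identification fields described above concrete, the following Python sketch decodes the manufacturer-specific data of an iBeacon advertisement (Apple company identifier 0x004C, type 0x02, length 0x15, followed by the UUID, major, minor and a signed calibration byte). It is only an illustration under stated assumptions: the names `IBeaconFrame` and `parse_ibeacon` and the example UUID are hypothetical, and this is not the custom identifier format or the Android application used in this work.

```python
import struct
import uuid
from typing import NamedTuple, Optional

class IBeaconFrame(NamedTuple):
    proximity_uuid: uuid.UUID  # 128-bit identifier shared by a deployment
    major: int                 # unsigned 16-bit group number
    minor: int                 # unsigned 16-bit element number
    measured_power: int        # signed 8-bit calibration value, in dBm

# Apple company ID (little-endian), iBeacon type, payload length (21 bytes).
IBEACON_PREFIX = bytes([0x4C, 0x00, 0x02, 0x15])

def parse_ibeacon(mfg_data: bytes) -> Optional[IBeaconFrame]:
    """Decode iBeacon manufacturer data; return None for other payloads."""
    if len(mfg_data) < 25 or not mfg_data.startswith(IBEACON_PREFIX):
        return None
    # Big-endian: 16-byte UUID, two unsigned shorts, one signed byte.
    raw_uuid, major, minor, power = struct.unpack(">16sHHb", mfg_data[4:25])
    return IBeaconFrame(uuid.UUID(bytes=raw_uuid), major, minor, power)

# Hypothetical payload built only to exercise the parser.
example_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
payload = IBEACON_PREFIX + example_uuid.bytes + struct.pack(">HHb", 1, 42, -59)
frame = parse_ibeacon(payload)
print(frame)
```

The 16-bit width of the major and minor fields is what limits each of them to the range 0 to 65535.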
All the experiments were performed in the buildings of the Physics and Mathematics departments of the Faculty of Science at the University of Extremadura, on its campus in Badajoz, Spain. Three different deployments were made at these locations, varying in each one the position and number of reference points, and the area covered by the system, as shown in Table 3. The first deployment focused on the Physics department buildings. These consist of two cubic structures joined by a walkway on the first floor. The second deployment was an extension that added new RP outside the buildings and new levels in specific locations inside. The last deployment extended the scheme to cover the buildings of both the Physics and Mathematics departments, as well as the space in between. A scheme of this scenario is depicted in Figure 1, where subfigure (a) shows a 3D view of the environment used in the first two deployments, with the beacon locations in a different color for each floor. Subfigures (b)-(c) show the plan of each floor, with the beacon locations in green and the sample points in blue. Beacon positions were the same for the first two deployments, but the sample points depicted are the ones used in the second deployment. Figure 2 shows a scheme of the environment used in the last deployment, following the same criteria as Figure 1. In Figure 3 the region of each deployment is shown. Examples of the beacons in the indoor and outdoor sections are shown in Figures 4 and 5, respectively. For each deployment, a user moved through all the RP with a smartphone, starting the measurement once the user was on the reference point selected in the application. Measurements lasted 30 seconds, and the user's orientation was changed to match four orientations (north, south, west, and east) during the collection procedure. The collection procedure could have been done with the user in motion instead of statically.
This would have added a new dimension to the positioning system, taking into account tracking and navigation features that are out of the scope of this work. Three different smartphones were used: a BQ Aquaris, a Samsung Galaxy S6, and a Xiaomi M8, hereinafter BQ, S6 and M8, respectively. A different user held each phone during the collection procedure. Measurements were taken without following a particular order and during holiday time to ensure a low presence of people in the environment.

The BLE RSSI Database
The database and the supplementary material are available at [43]. The different deployments presented in the previous section are treated as independent databases, following the same structure for all of them. Each dataset is presented as a set of eight CSV files: RSSI values, positions, timestamps, identification codes, and training and test indexes. A scheme of each data set structure is shown in Figure 6. These files have the same number of rows, which is determined by the number of different measurements in each deployment.
The set of RSSI values is presented as a list of all the individual measurements as they were reported by the smartphone. These values are presented in the seti_rss.csv file, where i denotes the deployment. For some points and beacons there are many measurements, while for others there are only a few or even none at all. Raw RSSI values are stored exactly as they were reported by the application developed for these experiments. When no signal was detected for a given beacon, the value −110 dBm was assigned, which is three units lower than the minimum RSSI detected.
The seti_tms.csv file contains the timestamp of each measurement. The timestamp is measured in milliseconds since the beginning of the first of January of 1970 in Greenwich Mean Time (GMT).
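As an illustration of the timestamp convention just described, the sketch below converts a millisecond epoch value into a timezone-aware date and computes the time elapsed between two measurements. The function names and the sample timestamps are made up for the example; they are not taken from the database.

```python
from datetime import datetime, timedelta, timezone

# Reference instant of the timestamps: 1 January 1970, 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ms_to_utc(ts_ms: int) -> datetime:
    """Convert milliseconds since the epoch into an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ts_ms)

def elapsed_seconds(first_ms: int, last_ms: int) -> float:
    """Time elapsed between two timestamps, in seconds."""
    return (last_ms - first_ms) / 1000.0

# Made-up timestamps, 30 s apart (the duration of one collection at an RP).
start_ms = 1546300800000
end_ms = start_ms + 30_000
print(epoch_ms_to_utc(start_ms).isoformat(), elapsed_seconds(start_ms, end_ms))
```

Integer millisecond arithmetic is used on purpose, so no precision is lost by passing through floating-point seconds.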
The positions, presented in the seti_crd.csv file, are the local coordinates of the measurement position. The origin of these coordinates was chosen to work always with positive values inside the building. These origins were the same for the first two deployments, but different for the last one.
The seti_ZoneId.csv file contains the identification label of each measurement for symbolic positioning applications. There are 13 of these labels: six for each building (three floors, each with left and right sections) and one label for the points in the outdoor scenario. Each of the indoor labels corresponds to a department in the buildings; for example, the first label is associated with the modern physics department, the second one with the electronics department, and so on.
Two files with boolean values indicate whether a measurement is part of the training or the test subset. These files are seti_idxRadioMap.csv and seti_idxEvaluation.csv. An auxiliary code identifier is added to uniquely identify each measurement and to distinguish measurements in the same set taken with different phones. The code for each measurement has five numbers: the first one indicates the deployment, the second the phone, the third the RP, and the last two the beacon and the emission slot. The BQ, S6 and M8 phones are represented as 1, 2 and 3, respectively. Finally, all the coordinates of the measurement points are contained in the seti_RpCrd.csv file.
Figure 6. CSV files in each data set.
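The row-aligned boolean index files described above can be used to split any of the other files into training and test subsets. The following Python sketch shows the idea on small in-memory stand-ins for the files. The cell encoding of the booleans (1/0 or true/false), the three-column coordinate layout and all the values are assumptions made for the example, not a description of the actual files.

```python
import csv
import io

def parse_flag(cell: str) -> bool:
    """Interpret a CSV cell as a boolean (assumed encoding: 1/0 or true/false)."""
    return cell.strip().lower() in {"1", "true"}

def read_rows(text: str) -> list:
    """Read CSV text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

def select(rows: list, flags: list) -> list:
    """Keep the rows whose flag is set; both lists must be row-aligned."""
    if len(rows) != len(flags):
        raise ValueError("data and index files must have the same number of rows")
    return [row for row, keep in zip(rows, flags) if keep]

# Tiny stand-ins for the coordinate and index files (made-up values).
crd_text = "1.0,2.0,0.0\n3.5,2.0,0.0\n3.5,6.0,3.2\n"
radio_map_text = "1\n1\n0\n"
evaluation_text = "0\n0\n1\n"

coords = [[float(v) for v in row] for row in read_rows(crd_text)]
train_flags = [parse_flag(r[0]) for r in read_rows(radio_map_text)]
test_flags = [parse_flag(r[0]) for r in read_rows(evaluation_text)]
train_coords = select(coords, train_flags)
test_coords = select(coords, test_flags)
print(len(train_coords), len(test_coords))
```

With the real files, the in-memory strings would be replaced by the contents of the corresponding CSV files of one deployment.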
The set of beacon configurations used in the experiment is presented in the beacons_config_.csv file. Each row of every file defines the configuration of each beacon. Within each beacon there are up to six emission slots whose configuration is described following the structure presented in Table 2. The transmission frequency was the same for all the slots and set to 5 Hz. Finally, the zoneId_def.csv file contains the definition of the zone tag used for symbolic positioning.

Analysis on the Datasets and Baselines
In addition to the database, which is the basis of this work, some examples of use are presented in this section. These are provided as Matlab scripts and contain all the functions needed to load, filter and manipulate the data. This section is intended to be a starting point to facilitate the use of this database for other researchers.
This database is devoted to the evaluation of indoor positioning algorithms. For this reason, the implementation of a simple but widely adopted positioning algorithm is provided for the baseline analysis of the provided databases. Mathematically modeling signal propagation in indoor environments, especially for applications that require high-resolution positioning, is a very difficult task. Fingerprinting methods offer an alternative that avoids this problem. These methods can be sorted into three groups: deterministic, probabilistic and machine learning-based approaches. Deterministic methods assume a distance function to compare and sort the elements of the training set according to the test measurements. Probabilistic algorithms use the distribution of the RSSI over the measurement points to assess the position [11,44,45]. This distribution can be modeled by assuming a known statistical distribution, such as a Gaussian, or by using the experimental distribution as a normalized histogram. Machine learning-based methods use a mathematical structure whose parameters are automatically determined from the data in a training procedure. These methods include Artificial Neural Networks [46,47], Support Vector Machines [48][49][50] and Decision Trees [51,52]. In the supporting material, the implementation of the deterministic One Nearest Neighbor (NN) algorithm is provided as a baseline to start using the database. The NN algorithm assigns a distance to every RSSI fingerprint vector in the radio map using a metric function. Usually, the Euclidean distance is used, but other alternatives can be implemented [16,53]. The coordinates or position label associated with the element with the smallest distance are the output of the algorithm. More complex versions of the algorithm take the average of more than one element (k-NN) or a weighted average using some kind of weight function (Wk-NN).
In classification tasks, such as Tag ID identification, the k-NN algorithm implements a voting scheme instead of averaging coordinates. To apply the NN algorithm, the raw data must be processed in some way. Our approach to generate the fingerprint vectors from the raw RSSI data is included in the supporting material and consists of averaging all the RSSI values of a beacon in a single RP, thus creating one fingerprinting RSSI vector for each reference point.
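The two steps just described, averaging the raw RSSI values per reference point and applying a nearest-neighbor search, can be sketched in Python as follows. This is not the Matlab code of the supporting material: the tuple layout `(rp_id, signal_idx, rssi)`, the function names, the inverse-distance weights and the sample values are assumptions made for illustration. Only the −110 dBm fill value for undetected signals comes from the database description.

```python
import math
from collections import defaultdict

MISSING_RSSI = -110.0  # value assigned in the database to undetected signals

def build_radio_map(raw, n_signals):
    """Average raw (rp_id, signal_idx, rssi) samples into one vector per RP."""
    sums = defaultdict(lambda: [0.0] * n_signals)
    counts = defaultdict(lambda: [0] * n_signals)
    for rp_id, signal_idx, rssi in raw:
        sums[rp_id][signal_idx] += rssi
        counts[rp_id][signal_idx] += 1
    return {
        rp: [s / c if c else MISSING_RSSI for s, c in zip(sums[rp], counts[rp])]
        for rp in sums
    }

def knn_position(query, radio_map, rp_coords, k=1, weighted=False):
    """Estimate coordinates from the k closest fingerprints (Euclidean metric)."""
    ranked = sorted((math.dist(query, fp), rp) for rp, fp in radio_map.items())[:k]
    if weighted:
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (d + 1e-6) for d, _ in ranked]
    else:
        weights = [1.0] * len(ranked)
    total = sum(weights)
    dims = len(next(iter(rp_coords.values())))
    return tuple(
        sum(w * rp_coords[rp][i] for w, (_, rp) in zip(weights, ranked)) / total
        for i in range(dims)
    )

# Made-up raw samples: two reference points, two signals.
raw = [(0, 0, -60.0), (0, 0, -64.0), (0, 1, -80.0), (1, 1, -55.0)]
radio_map = build_radio_map(raw, n_signals=2)
rp_coords = {0: (0.0, 0.0), 1: (4.0, 0.0)}
query = [-63.0, -82.0]
print(knn_position(query, radio_map, rp_coords))        # 1-NN estimate
print(knn_position(query, radio_map, rp_coords, k=2))   # 2-NN average
```

With `k=1` the estimate is simply the position of the closest reference point; for symbolic labels, the averaging step would be replaced by a majority vote among the k neighbors.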
One of the features of this database is the use of more than one emission slot for each beacon, i.e., each hardware device broadcasts different signals at the same time. The 1-NN algorithm is used as the positioning algorithm, considering all the different slots as different beacons; therefore, the fingerprinting vector has a length of 180 elements. For this indoor positioning application, the data have been divided into training and test subsets, using 80 % of the data for training and 20 % for test. The Cumulative Distribution Function (CDF) of the error in the test samples is shown in Figure 7 for each phone (BQ, S6 and M8) and each deployment set. Some statistical parameters of the error distribution, such as the mean value, the median (50th percentile), the 95th percentile and the floor detection rate, are shown in Table 4, whose last column also shows the Tag ID recognition rate.
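The error statistics mentioned above (mean, median, 95th percentile and the empirical CDF) can be computed from a list of estimated and true positions as in the following sketch. The function names, the linear-interpolation percentile and the sample coordinates are choices made for this example and may differ from the conventions used to produce the reported tables.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("at least one value is required")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def error_summary(estimates, truths):
    """Positioning error statistics and empirical CDF points."""
    errors = sorted(math.dist(e, t) for e, t in zip(estimates, truths))
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "median": percentile(errors, 50),
        "p95": percentile(errors, 95),
        # Each pair is (error, fraction of test samples with error <= it).
        "cdf": [(err, (i + 1) / n) for i, err in enumerate(errors)],
    }

# Made-up estimates and ground truth, in metres.
estimates = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]
truths = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
summary = error_summary(estimates, truths)
print(summary["mean"], summary["median"], summary["p95"])
```

Plotting the `cdf` pairs as a step function gives a curve of the kind used to compare phones and deployments.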
The experiment has been repeated using the same algorithm but considering only one slot per beacon. This reduces the number of signals that contribute to the positioning to just 30. In this case, the first slot, the one with the highest power, is selected from each beacon. Results are shown in Figure 8 and Table 5. It can be seen that using more slots does not reduce the error. Using the code attached to this database, the same experiment can be repeated for slots with different emission powers. Besides, different positioning algorithms can be implemented using the supporting materials. Finally, the same experiment is repeated using the average of all the signals coming from the same beacon. In this average, missing values contribute with the special value assigned to these situations, which is three units lower than the minimum RSSI detected. These results are shown in Figure 9 and Table 6.
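The three feature constructions compared in these experiments (all slots as independent signals, only the first slot of each beacon, and the per-beacon average) can be derived from the same fingerprint vector. The sketch below assumes that the vector is ordered beacon by beacon, with six consecutive entries per beacon and the highest-power slot first. This ordering, the function names and the sample values are assumptions for illustration, not the layout of the supporting material.

```python
SLOTS_PER_BEACON = 6
MISSING_RSSI = -110.0  # fill value for undetected slots

def _check(vector, slots):
    """Ensure the vector holds a whole number of beacons."""
    if len(vector) % slots != 0:
        raise ValueError("vector length must be a multiple of the slot count")

def first_slot_only(vector, slots=SLOTS_PER_BEACON):
    """Keep one value per beacon: its first (assumed highest-power) slot."""
    _check(vector, slots)
    return vector[0::slots]

def beacon_average(vector, slots=SLOTS_PER_BEACON):
    """Average all slots of each beacon; missing slots count as MISSING_RSSI."""
    _check(vector, slots)
    return [sum(vector[i:i + slots]) / slots for i in range(0, len(vector), slots)]

# Made-up fingerprint for two beacons with six slots each.
fingerprint = [-60.0, -65.0, -70.0, -75.0, -80.0, MISSING_RSSI,
               -90.0] + [MISSING_RSSI] * 5
print(first_slot_only(fingerprint))
print(beacon_average(fingerprint))
```

Applied to a 180-element vector, both reductions return 30 values, one per beacon, which can be fed to the same nearest-neighbor search.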

Discussion and Conclusions
This paper presents a new BLE RSSI database for the research community. To establish its format, other works with databases based on the same technology have been studied. Two kinds of works have been found in this review: those in which the database is the main subject, and those that provide the experimental data as supplementary material. The databases are scattered across different types of repositories, but the persistence and immutability of the data are not guaranteed in most of them. In some cases, platforms such as Zenodo are used, which assure the continuity of the data without changes. The information presented in each database is often incomplete: key elements for the use of the databases, such as beacon locations or environment information, are sometimes omitted. In other works, some relevant information is not explicitly provided, so the database must be loaded to determine the missing information; thus, researchers must use their own resources to load and work with the database. All the works explored in this review present experiments in indoor scenarios.
The database presented in this paper differs from the previous ones in that it uses more than one transmission slot for each BLE beacon. A total of six slots per beacon have been used, four following the Eddystone protocol and two following the iBeacon one. All the beacons have the same transmission configuration, where the transmission frequency is the same but the transmission power is different for each slot (from +4 to −30 dBm). Besides, this work incorporates an indoor-outdoor scenario and a complete model of the environment. This model can be used to study signal propagation in the environment and the contribution of each point in the fingerprinting algorithm. The database is divided into three subsets with different characteristics. Along with the database, a set of Matlab files are provided to load, filter, and work with the database. The data is presented using raw RSSI values, with the coordinates and time of each measurement.
Different emission slots for each beacon can be exploited in a fingerprinting system. In this work, a simple Nearest Neighbor (NN) algorithm has been implemented using different combinations: all the slots as independent beacons, only the slot with the highest transmission power, and the average of all the slots from the same beacon, despite each slot having a different transmission power. Results show that, for the nearest neighbor algorithm, using all the slots or just one per beacon gives similar results for accuracy, floor detection, and Tag ID recognition. Using the average value increases accuracy by 10 %, thus showing that the extra information provided by additional slots can be used to improve fingerprinting localization.
For the study and development of indoor positioning methods with this database, a baseline has been introduced. This baseline uses the NN algorithm with different combinations, showing that using the average value of all the slots within a beacon increases positioning accuracy. This baseline will provide other researchers with a starting point to compare new methods. Future work will focus on two directions. In the first one, we will work on how to combine information from different slots to improve positioning results. In the second one, we will contribute to the standardization of the evaluation procedure with new databases in order to create a framework, like those in machine learning, that can be used to compare different environments and contexts.
Funding: This work was supported in part by the Spanish Ministry of Economy and Competitiveness through projects MICROCEBUS (RTI2018-095168-B-C54), REPNIN+ (TEC2017-90808-REDT) and INSIGNIA (Programa Torres-Quevedo, PTQ2018-009981) and in part by the Regional Government of Extremadura and the European Regional Development Fund through project GR18038.