BLE-GSpeed: A New BLE-Based Dataset to Estimate User Gait Speed

Estimating a user's gait speed is valuable in many domains, notably health care, since walking difficulty is a core indicator of health and function in aging and disease. Methods for non-invasive, continuous assessment of gait speed may be key to enabling early detection of cognitive diseases such as dementia or Alzheimer’s disease. Wearable technologies can provide innovative solutions for healthcare problems. Bluetooth Low Energy (BLE) technology is well suited to wearables because it is energy efficient, secure, and inexpensive. In this paper, the BLE-GSpeed database is presented. The dataset is composed of BLE RSSI measurements obtained while users walked at a constant speed along a corridor. In addition, a set of experiments using a baseline algorithm to estimate gait speed is presented to provide baseline results to the research community.


Introduction
Over the next 40 years, the percentage of people aged 60 and older is expected to rise from 10% to 22% of the total population [1]. This shift will pose a challenge for health care systems, especially considering that older people have more health-related issues and long-term care needs than the rest of society. In this context, the number of cases of cognitive decline and dementia is predicted to double every 20 years globally. Health systems have not been oriented toward these needs and may have difficulties responding to the new demographic reality and the associated changes in population health. The presence of walking difficulties is a core indicator of health and function in aging and disease [2]. Gait control is complex since it integrates multiple systems, including motor, perceptual, and cognitive processes. As any dysfunction in these systems leads to gait slowing, gait speed (GS) is a commonly used parameter in health care research. Because walking speed is a quick, reliable, sensitive, and easy measurement to perform [3][4][5][6], it is often included in clinical and epidemiological research studies [7][8][9] as a consistent risk factor for adverse outcomes in community-dwelling older people. Together with sex and age, it is used to monitor older adults' functional capacity and forecast their age-related decline rate. The accuracy of predictions based on these three factors is generally similar to that of more elaborate models requiring many other health-related factors [10]. Several studies have confirmed an association between gait speed and many significant health-related outcomes, including hospitalization, falls, nursing home placement, mobility disability, and cognitive diseases.

Related Work
The publication of databases has become very common in some research fields, especially when researchers cannot otherwise reproduce the experiments. Alongside the results, detailed information on the experiment characteristics and the test environment is vital for the databases' reproducibility and validation. In another context, where data are gathered from observation of real activities, databases can help understand and predict people's behavior. For example, in the post-COVID-19 era, databases can be used to develop tracking and location-aware systems to control the epidemic [22]. These databases are accompanied by the development of standardized formats and specialized platforms such as Zenodo and Kaggle [23], where the data are maintained for public access. Databases similar to the one presented in this work are described next.
People's behavior in real, complex environments has been studied using wearable BLE receivers carried by the users. BLE is a radio-frequency communication protocol, similar to Wi-Fi, developed for short-range communication in the context of the Internet of Things (IoT). For example, Sikeridis et al. published the measurements from a set of Raspberry Pi boards that recorded the RSSI from 46 students' wearables, smartwatches, and smartphones in a multi-floor university building during their daily activities over one month [24]. Similarly, Tóth and Jamas [25] provide databases with measurements taken in comparable environments from different radio-frequency wireless networks, such as BLE, Wi-Fi, and traditional Bluetooth. The same kind of experiment has been repeated in controlled environments where the data, or at least part of it, can be labeled for further analysis. Byrne et al. [26] present a database with data from multiple houses; in each one, a participant wore a custom wearable device with an accelerometer, BLE, and a camera. A set of image codes distributed over the houses and recorded by the camera is used to label the measurements. In [27], Iqba et al. propose a BLE passive fingerprinting system that can locate persons inside a medical building. The training and test data were published in a database [28]. This is becoming common practice in fingerprinting positioning, where the environment, methodology, and methods can change the results of the same positioning algorithm. There are several fingerprinting databases using Wi-Fi RSSI measurements. The first database of this kind is UJIIndoorLoc [29], which presents RSSI measurements in a university environment taken by more than 20 users and 25 devices. Similar databases have been published for conference competitions on the development of indoor positioning systems [30,31].
Since the biggest smartphone software developers have limited the scanning times, BLE fingerprinting positioning has become a suitable alternative. Mendoza et al. [32] present an RSS BLE database with measurements from two different scenarios: a laboratory where multiple beacon configurations were tested, and a university library where data from various users and devices were collected. The database is accompanied by the necessary tools to load and work with it. Similarly, Aranda et al. [33] provide a BLE RSS database with multi-slot beacons deployed in three different indoor/outdoor environments. The database can be found in the Zenodo repository [34], with a complete description of the experimental environment. Baronti et al. [35] repeat the same fingerprinting experiment with different configurations for active and passive fingerprinting, using the same real point in both cases. Moreover, they add data from experiments that explore tracking and social interaction between users in the facility.
Some researchers have gathered gait and kinematic information from healthy people. Health researchers can use this information as a baseline against which to compare their results when investigating specific illnesses. For example, Fukuchi et al. [36] developed a database with detailed kinematic information of 42 healthy volunteers, and Schreiber et al. [37] performed a similar experiment measuring the GS of 50 injury-free participants. Custom high-precision inertial sensors have also been used in [38], focusing on age differences, and in [39], focusing on detecting changes in gait mechanics. All these databases focus on the health-related relationship between GS and other parameters. To the best of the authors' knowledge, only one publicly available database focuses on evaluating technology for GS measurement. Chapron et al. [40] combined a set of PIR sensors with a BLE indoor positioning system to detect the user closest to the PIR sensors.
With this solution, the system can automatically label the GS measurements to the correct user in a multi-resident environment. Similarly, the database presented in this work is also designed for GS evaluation, but using a completely new approach in this field.
Finally, more databases for evaluating different technologies in different environments are needed to find a standard solution for the GS measurement in the in-home environment. A summary with some of the most relevant available databases is presented in Table 1, where the references to the databases and their associated works can be found.

Data
The data collection process was performed in a hallway at the Universitat Jaume I in Castellón, Spain. Figure 1 shows a view of the deployment. A total of 20 beacons were mounted on the hallway's ceiling, with a separation of 30 cm between them. We alternated two different models, the iBKS 105 [54] and the iBKS plus [55], so the separation between two beacons of the same model was 60 cm. To determine the user's actual speed, we used five HC-SR04 ultrasonic sensors attached to the wall at a height of 0.7 m and with a separation of 3.5 m between them, covering a total distance of 14 m. The sensors were connected to an Arduino UNO board that recorded the readings and associated timestamps for each sensor during the experiments. This information was stored in real time on a laptop connected to the Arduino board through the serial port and synchronized through an NTP server to an absolute time frame of reference. The smartwatches used to capture the RSSI signals also used this server as a reference, so that their scan results could be synchronized with the user's position captured by the ultrasonic sensors.
For a particular walk, the wall-mounted sensors provide the timestamps at which the user passed in front of them. For each pair of sensors (i and j), the speed of the user can be obtained as follows:
$$v_{i,j} = \frac{d_{ij}}{\left| t_j - t_i \right|}$$
where $v_{i,j}$ is the speed at which the user moved from sensor i to sensor j, $d_{ij}$ is the distance between the sensors, and $t_i$ and $t_j$ are the timestamps at which the user passed in front of sensors i and j, respectively. Considering only pairs of consecutive sensors, we obtain four speed values for each walk. The resultant speed is calculated as the average of all four measurements, but only when their discrepancies are less than 5 cm/s, since we only want to keep those walks executed at a constant speed.
A total of 13 subjects, 11 males and 2 females, aged between 18 and 55, performed several walks in both directions along the hallway. The subjects were instructed to keep their walking speed constant during the process. Each user completed several walks at different speeds, from very slow to fast. To increase the volume of data recorded, users wore four smartwatches during the acquisition process, two on each wrist. The model used in the experiments was the Sony SmartWatch 3, which runs Android as its operating system. An application installed on the smartwatches performed continuous Bluetooth scans at the maximum frequency allowed by the operating system and stored the results for later post-processing. Once all the data have been processed, each record of the resulting data set represents a scan result and is composed of the following fields:
Overall, the final data set contains a total of 382 walks. Figure 4a compares the distributions of the scanned RSSI values for each device, as well as the average number of RSSI values acquired per second. For all devices, the scanning rate oscillates between 40 and 50 results per second. During the acquisition process, one of the beacons started to malfunction and finally stopped working, so its data have not been considered.
Taking into account that there were 19 functioning beacons, this means that the advertising signal of a particular beacon is detected roughly 2.4 times per second by each device. Since the beacons were programmed to advertise at an interval of 100 ms (+0/−10 ms), the expected rate should be around 9 values per second. This discrepancy arises because the smartwatches' scanning rate cannot be set directly; it is controlled by the operating system, which is designed to preserve battery life. With regard to the distribution of signals detected by each device, there are no significant differences. Median values for all devices lie in the interval −70 dBm to −75 dBm. Some devices, especially a650 but also 14df, seem to have a broader range of detection, capturing RSSI values in the range −30 dBm to −105 dBm, while the remaining two devices only report RSSI values in the range −45 dBm to −105 dBm. With respect to the beacons' characteristics, Figure 4b shows the distribution of RSSI values segmented by beacon model, and the number of beacons of each model detected per second. Beacons of model iBKS plus are detected with a distinctly stronger RSSI value than beacons of model iBKS 105. Besides their disparate emitting power, the shape of the distribution of the RSSI values is also significantly different. Both models are based on the same chipset (Nordic Semiconductor nRF51822), so the differences in performance might be due to their different batteries (a 1000 mAh CR2477 coin cell battery for the 105 model, four AA alkaline batteries with a total capacity of 5000 mAh for the plus model) and/or their different casing and shape. BLE chipsets can demand a peak of 20 mA when transmitting. Coin cell batteries are greatly affected by large current draws [56], while alkaline batteries can handle larger currents.
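As a quick check of the per-beacon detection rate quoted at the start of the preceding paragraph, the aggregate scanning rate can be divided by the number of working beacons. The 45 results per second used here is simply the midpoint of the reported 40 to 50 range, taken as an assumption for illustration:

```latex
\frac{45\ \text{results/s}}{19\ \text{beacons}} \approx 2.4\ \text{detections per beacon per second},
\qquad
\frac{40}{19} \approx 2.1, \quad \frac{50}{19} \approx 2.6 .
```

Under this reading, each smartwatch sees any given beacon only two to three times per second, which is the sparse sampling that the speed estimation has to cope with.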
Figure 4c compares the distribution of RSSI values for each beacon, as well as their effective rate of advertisement, that is, for a given smartwatch, the number of scans per second in which the beacon is present. As shown in Figure 4b, the differences between advertising RSSI values coming from distinct models are clear and consistent. With respect to their advertising rates, even though all models were set up with an advertising interval of 100 ms, only five out of ten model 105 beacons reach a detection rate above 2.5 scans per second. In contrast, all model plus beacons except one are detected at a rate greater than 2.5 scans per second. The detection rate segmented by smartwatch and beacon is consistent with Figure 4a, with 14df being the smartwatch with the highest scanning rate for all beacons.
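The reference-speed computation described earlier in this section (consecutive sensor pairs, averaged only when they agree to within 5 cm/s) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function and constant names are hypothetical, absolute time differences are used so that walks in either direction yield positive speeds, and "discrepancies" is read here as the spread between the fastest and slowest pair.

```python
from statistics import mean

SENSOR_SPACING_M = 3.5   # separation between consecutive ultrasonic sensors
MAX_SPREAD_M_S = 0.05    # 5 cm/s tolerance (interpreted as max - min spread)


def reference_speed(timestamps, spacing=SENSOR_SPACING_M, max_spread=MAX_SPREAD_M_S):
    """Return the mean speed (m/s) of a walk, or None if it is not usable.

    timestamps: times (s) at which the user passed each sensor, in sensor order.
    """
    # Time taken to cover each stretch between consecutive sensors.
    gaps = [abs(b - a) for a, b in zip(timestamps, timestamps[1:])]
    if not gaps or any(g == 0 for g in gaps):
        return None
    speeds = [spacing / g for g in gaps]
    # Keep only walks executed at an (approximately) constant speed.
    if max(speeds) - min(speeds) >= max_spread:
        return None
    return mean(speeds)
```

With five sensors this yields four pairwise speeds per walk; a walk whose pairwise speeds differ by 5 cm/s or more returns `None` and would be discarded.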

Experiments
This section describes a method to determine the user's gait speed using the presented dataset. It illustrates a simple use case that can be easily replicated, so that the results may serve as a future baseline for researchers working on this topic. The methods used to achieve these results are straightforward, and the implementation has been made publicly available.

Gait Speed Determination
The RSSI value provided by radio-frequency modules indicates the strength of the transmitted signal as perceived by the receiver node. The RSSI value received at a particular location can be modeled as a function of the logarithmic distance between the receiver and the emitter, plus some parameters related to the environment properties and the devices' characteristics [57]. This analytical model allows estimating the position of a device, the scanning node, from the received RSSI values and the emitting node's position. The path loss model reflects the relationship between the signal strength and the distance to the emitter:
$$RSSI = RSSI_0 - 10\,\gamma \log_{10}\left(\frac{d}{d_0}\right) + X_g$$
where:
• $RSSI$ is the received signal strength at a distance $d$ from the beacon.
• $RSSI_0$ is the received signal strength at the reference distance (1 m) from the beacon.
• $d$ is the distance between the receiver and the beacon.
• $d_0$ is the reference distance (1 m).
• $X_g$ is a random variable with zero mean, reflecting the attenuation (in decibels) caused by fading, multipath effects, etc.
• $\gamma$ is the path loss exponent, whose value is normally in the range of 2 to 6. The actual value depends on environmental characteristics.
Figure 5a shows the theoretical evolution of the RSSI signal when a user wearing a receiver device walks at a constant speed along a straight line that passes below a BLE beacon. We consider the receiver moving at a constant speed v in the presence of a ceiling-mounted emitting device (see Figure 5b), located at a distance d. Figure 5c shows a real example of RSSI data received from the emitter when the user follows the path at a constant speed. It also shows a path loss curve fitted to the data.
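To make the shape of the theoretical curve concrete, the sketch below evaluates the standard log-distance path loss model, RSSI = RSSI_0 − 10·γ·log10(d/d_0), without the noise term, for a receiver walking in a straight line beneath a ceiling-mounted beacon. The function names and all numeric defaults (a reference RSSI of −60 dBm, γ = 2, a 2 m vertical distance) are hypothetical values chosen for illustration, not parameters taken from the dataset.

```python
import math


def path_loss_rssi(d, rssi_0=-60.0, gamma=2.0, d_0=1.0):
    """Expected RSSI (dBm) at distance d (m); the noise term X_g is omitted."""
    return rssi_0 - 10.0 * gamma * math.log10(d / d_0)


def rssi_along_walk(t, v, t_pass, height=2.0, **model):
    """Expected RSSI at time t (s) for a receiver moving at constant speed v (m/s).

    t_pass is the instant at which the receiver is directly beneath the beacon,
    and height is the vertical receiver-to-beacon distance (assumed value).
    """
    horizontal = v * (t - t_pass)           # signed offset along the walking line
    d = math.hypot(horizontal, height)      # straight-line distance to the beacon
    return path_loss_rssi(d, **model)


# Sample the curve once per second for a 1 m/s walk passing the beacon at t = 5 s.
curve = [rssi_along_walk(t, v=1.0, t_pass=5.0) for t in range(11)]
```

In this noise-free setting the curve is symmetric around `t_pass` and peaks there, because the distance to the beacon is smallest when the receiver is directly beneath it.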
Even though perceived RSSI signals are subject to noise due to interference, multipath effects, signal fluctuations, overlapping channels, and other environmental characteristics, the maximum value received during the walk is likely to occur when the receiver is at the point nearest to the emitter, that is, when the user is passing beneath the beacon. To estimate the gait speed of the user, we performed the following process:

1. For each walk, smartwatch, and beacon (identified by its MAC address), we find the timestamp t at which the maximum RSSI value was detected. We do so in two different ways: by taking the maximum of the raw data, and by applying a 13-point moving average and taking the maximum of the smoothed RSSI data. We tried different window lengths, in the range between 3 and 25, and obtained the best results with a window length of 13 measurements.

2. For two given beacons i and j, separated by a distance $d_{ij}$, and with $t_i$ and $t_j$ being the estimated timestamps at which the user walked below them, the speed of the receiver $v_{ij}$ can be estimated as follows:
$$v_{ij} = \frac{d_{ij}}{\left| t_j - t_i \right|}$$

3. In the general case, when more than two beacons are installed, the speed can be estimated as the average of the values obtained for each pair. Given a set of k beacons, the speed of the device is calculated as follows:
$$v = \frac{1}{n} \sum_{j=1}^{k-1} \sum_{i=j+1}^{k} v_{ij}$$
where $j \in [1, 2, \ldots, k-1]$, $i \in [j+1, j+2, \ldots, k]$, and $n = \binom{k}{2}$.

4. The speed estimate obtained for each pair is only taken into account when it lies in the interval 0.2 < v < 1.8 (in m/s). This is not only because we want to consider results that correspond to a feasible user speed, but also because the low scanning rate of the smartwatches may produce insufficient data for a good estimate and can generate artifacts that do not properly approximate the actual speed of the user.
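The four steps above can be sketched in Python as follows. This is a hedged illustration rather than the published implementation (which is available in the authors' repository): the function names are hypothetical, the moving average is centered and shrinks at the edges of the series, beacon positions are given as distances along the hallway, and the final average is taken over the pairs that survive the 0.2 to 1.8 m/s filter, which is one possible reading of how steps 3 and 4 combine.

```python
from itertools import combinations
from statistics import mean


def moving_average(values, window=13):
    """Centered moving average; near the edges it uses the samples available."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def peak_time(times, rssi, window=None):
    """Step 1: timestamp of the maximum RSSI, on raw data or after smoothing."""
    series = moving_average(rssi, window) if window else list(rssi)
    return times[max(range(len(series)), key=series.__getitem__)]


def estimate_speed(peaks, v_min=0.2, v_max=1.8):
    """Steps 2 to 4: mean of the feasible pairwise speeds, or None if there are none.

    peaks: list of (beacon_position_m, peak_timestamp_s), one entry per beacon.
    """
    speeds = []
    for (x_i, t_i), (x_j, t_j) in combinations(peaks, 2):
        if t_i == t_j:
            continue  # identical peak times give no usable speed
        v = abs(x_j - x_i) / abs(t_j - t_i)
        if v_min < v < v_max:  # discard infeasible or artifact estimates
            speeds.append(v)
    return mean(speeds) if speeds else None
```

For example, three same-model beacons 0.6 m apart whose RSSI peaks are detected 0.5 s apart, `estimate_speed([(0.0, 0.0), (0.6, 0.5), (1.2, 1.0)])`, give a speed of about 1.2 m/s. Passing `window=13` to `peak_time` selects the smoothed variant of step 1.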

Results
The results obtained for each smartwatch and beacon model are summarized in Figure 6. The plot shows the average error in m/s of the gait speed estimation for each smartwatch and each beacon model, for a number of beacons ranging from 2 to 10.
The first conclusion we draw from the results is that using raw data to estimate the gait speed provides a very uniform average error (see the blue continuous line in Figure 6), which is independent of the number of beacons considered. There is a slight improvement in accuracy when using three beacons instead of two, but adding more beacons does not significantly reduce the error, which remains around 0.2 m/s regardless of the beacon model or the smartwatch considered. On average, the results obtained when using beacons of model iBKS plus are slightly better than those obtained when using beacons of model iBKS 105, but in general, results are very similar. On the other hand, results look very different when using smoothed data to estimate the timestamp of the maximum RSSI value received during the walks. The green and discontinuous lines in Figure 6 show a distinctive behavior from the previous case. Now the error in the estimation is strongly dependent on the number of beacons considered. When using only two beacons, the error is always greater than when using raw data, but it decreases as the number of beacons increases. This is especially evident when considering only beacons of model iBKS 105. In this case, the error starts at around 0.3 m/s for 2 beacons and goes below 0.1 m/s for 10 beacons. The same behavior occurs for model iBKS plus, but with a less pronounced decrease of the error, which arrives at a value of around 0.15 m/s when we consider 9 beacons.
Mixing different models of beacons (Figure 7) does not seem to influence accuracy when using only raw data. When using smoothed data, results are consistently worse, and they do not improve substantially as the number of beacons increases. The worse results obtained when mixing beacon models may be due to the differences in their RSSI distributions (Figure 4b). Tackling this would probably require a model-specific approach to data smoothing. Better results are obtained when we aggregate the data from all the smartwatches and consider beacons of model iBKS plus (Figure 8). In this case, the average error is around 0.17 m/s. Again, results obtained with smoothed data are generally worse than those obtained using raw data, except when the number of beacons of the same model is large. Table 2 shows a summary of the results obtained considering all the smartwatches. The best results are obtained when using 10 beacons of model iBKS 105 and applying a 13-point moving average to the raw data. In general, smoothing provides better results when using a large number of same-model beacons. Conversely, when the number of beacons is limited, estimating the gait speed from the raw data is the best approach.

Conclusions
Given the rising success of data-driven approaches in fields such as computer science, engineering, and healthcare, the availability of specialized datasets has become decisive. Although the volume of available data grows exponentially from an expanding diversity of data sources, getting curated data ready for use is a costly and time-consuming process.
In this work, we present a new Bluetooth Low Energy-based dataset for gait speed estimation. To the best of our knowledge, this is the first publicly available BLE-based dataset for gait speed estimation. The dataset was created using a large number of devices and participants, aiming to represent a wide variety of gait speeds and walking styles. We explore the dataset by analyzing its properties and showcasing some relevant results, in the hope that it will be useful to researchers and practitioners who wish to experiment with and test their algorithms.
Our current and future work aims to develop more automated and general ways of acquiring users' speed data, with an eye on more familiar environments such as homes.

Reproducibility
The code to reproduce all the plots and experiments described in this work is publicly available at https://github.com/esansano/ble-gspeed.
Funding: This work was supported in part by the Spanish Ministry of Economy and Competitiveness through projects MICROCEBUS (RTI2018-095168-B-C53/C54) and REPNIN+ (TEC2017-90808-REDT), in part by the Regional Government of Extremadura and the European Regional Development Fund through project GR18038, and by the Regional Government of the Valencian Community through project AICO/2020/046.