1. Introduction
The collection and analysis of historical and real-time data has become the cornerstone of the smart city [1]. A common information feature that concerns many applications is the number of people in a specified location, since large crowds can cause congestion or even safety risks. In addition, from a service perspective, we can optimize the allocation of staff and resources based on forecasted and real-time occupancy.
In recent years, the Mobility as a Service (MaaS) paradigm has received increased attention as a key strategy for achieving sustainable urban mobility. In [2], Jittrapirom et al. present an overview of existing MaaS implementations and identify three core characteristics: demand modeling, supply-side analysis, and business model.
In such a MaaS framework, real-time crowd monitoring can be used in both demand modeling and supply-side analysis. The demand model can use real-time crowd data to adjust itself dynamically to real-world demand or unexpected conditions. The supply side can leverage the same data to detect congestion and adjust supply dynamically, improving efficiency.
In [3], the author identifies the supply-side challenges to implementing Sustainable MaaS (S-MaaS). The implementation is divided into four categories, one of them being the immaterial components. These include real-time data streams that are essential for effective decision-making and detecting unexpected conditions. Information on real-time system status, including platform occupancy, can be used to monitor supply–demand interactions to optimize system efficiency [4].
In this work, we present a dataset for crowd size estimation on public transportation platforms in two light rail environments.
The dataset shared in this work is collected using a Device-Free Wireless Sensing (DFWS) technique. This technique is based on a Wireless Sensor Network (WSN) of battery-powered sensor nodes placed inside and around the environment. Each sensor node in the network periodically transmits a message that is received by the other sensor nodes in the network. The Received Signal Strength Indicator (RSSI) values corresponding to the received messages are stored until the next transmission to be included as the data payload. An internet-connected gateway will receive all the messages in the network as well and will forward the data included in the payload to a back-end server for processing [5]. A high-level representation of the three operations executed by the measurement cycle is depicted in Figure 1.
The measured RSSI values of the various links in the network contain valuable information about the state of the environment. When people stand between the sensor nodes, they obstruct the Line of Sight (LoS) between these sensor nodes, affecting the RSSI values. As the human body has a large water content, it will absorb part of the radio waves that pass through it, reducing the signal strength of the remaining signal, which will be received on the other side of the environment by the receiving sensor node.
As we are dealing with public transportation environments, people are not the only entities obstructing the radio links; rail vehicles and road vehicles inside or near the environment can also have a significant impact on the measured signal strength. As these objects are made out of metal, they not only heavily reduce the RSSI value of links passing through the object, but also cause significant constructive and destructive multipath interference for links to and from sensor nodes near the object.
By gathering data from multiple links and combining it with ground truth data, we can leverage machine learning techniques to filter out unwanted effects and estimate the number of people in the environment.
This DFWS technique has its origin in the field of Device-Free Localization (DFL) using Radio Tomographic Imaging (RTI), where the attenuation of the links in the WSN is used to determine the location of people or objects [6].
Coluccia et al. [7] introduced a novel framework for RTI that operates effectively with extremely sparse and location-uncertain transmitters. They propose both an optimal and a suboptimal algorithm, achieving near-optimal performance with significantly reduced computational cost.
In [8], the use of mmWave radio is proposed for DFL. This method uses multipath reflections as virtual anchor nodes to reduce hardware requirements and improve accuracy. The technique is based on compressed sensing, clustering, and ray-tracing-assisted processing.
As the number of people in the environment increases, it can become increasingly difficult to accurately determine their individual locations, but estimating higher-level metrics such as the approximate number of people present in the environment is still possible, as demonstrated by Denis et al. in multiple large-scale festival environments [9]. This formed the basis for DFWS crowd counting applications.
Different techniques and methods exist to count or estimate the number of people present in an environment. These different techniques have their own strengths and weaknesses when it comes to the accuracy and modalities offered.
The most established technique for people counting and density estimation uses image data collected by a camera system. Image data can be used in two different ways to count people. The first way is a direct approach using feature extraction on an image to estimate the number of people in the image. In [10], a Convolutional Neural Network (CNN)-based approach is used to count people standing on metro platforms.
The second way camera data can be used is to count people walking in and out of a specific area. This technique can be referred to as line counting or flow counting, as it counts the flow of people crossing a virtually defined line on the image. In [11], security cameras are used in combination with a CNN and a tracking algorithm to demonstrate the ability to count people moving between the platform and a rail vehicle.
An alternative to camera-based counting can be the use of a radar-based sensor. Radar-based sensors can be used similarly to camera-based counters to count people within the field of view (FoV) of the radar [12] or to count people walking in and out of the environment when placed above the entrance [13].
Both camera- and radar-based counting techniques suffer from limitations that affect their usability in real-world scenarios. For example, both techniques have a limited FoV and range, which limits the area that the counting system can cover. Another major factor is occlusion, for example by people holding umbrellas or by the dense crowd itself, which can negatively impact the system’s performance. DFWS offers a crowd size estimation solution suitable for wide-area scenarios with limited reliance on visual LoS.
The use of radio waves in DFWS makes it nearly impossible to identify individuals. The information captured by the attenuation of the radio waves is low enough in spatial and temporal resolution that identifiable information cannot be recovered from the data. Even broader characteristics such as age, sex, or height cannot be recovered. This makes DFWS a privacy-by-design technology. This contrasts with cameras [14] and, to a limited extent, radar systems [15], which are capable of collecting personal characteristics and potentially identifying individuals.
The number of open datasets on DFWS systems for people counting is very limited. One of the few datasets in existence was published by Kaya et al. in 2020 [16]. This dataset contains data on three environments at music festivals. This dataset offers a solid baseline for DFWS applications. However, the applicability of this dataset is limited to environments with only people. Public transport environments have much more dynamic behavior, with rail vehicles being introduced into the environment and faster changes in crowd densities.
In this work, we introduce a DFWS dataset consisting of two environments. The first environment is an underground metro station. We refer to this environment as the indoor environment. The second environment is an outdoor above-ground platform. We refer to this environment as the outdoor environment. For both environments, ground truth data were collected on three different days for periods of approximately one hour. This data includes the presence of rail vehicles and the number of people on the platform counted at regular intervals.
The data of the indoor environment have been used in a prior publication [5]. In that publication, a WSN was used to estimate the number of people standing on the subway platform with and without a rail vehicle present. To achieve this, the WSN data were first used to classify whether a rail vehicle is present, and the result was then used to switch between separately trained linear regression models.
To the best of our knowledge, this is the first DFWS dataset dedicated to crowd size estimation in public transportation environments. In this data descriptor paper, we provide two datasets collected in two distinct public transportation environments. By making these datasets publicly available, we aim to provide researchers in the field of DFWS and crowd counting the opportunity to develop and test new data processing methodologies on real-world data.
This paper is structured as follows: In Section 2, we discuss other relevant works within the field of DFWS. In Section 3, we discuss the content of the dataset and how to interpret the different files. Section 4 explains how the data was collected. In Section 5, we demonstrate some example use cases for the dataset. Finally, in Section 6, we conclude this paper.
4. Materials and Methods
In this section, we discuss how this dataset was gathered. We discuss the hardware used, the methodology for ground truth collection, and the experimental setup in both environments.
4.1. Wireless Sensor Network
The WSN is constructed of two main components: a gateway and a set of wireless sensor nodes. All communication between the nodes and between the nodes and the gateway uses the DASH7 Alliance Protocol [23, 24] in the 868 MHz Short-Range Devices (SRD) band. The gateway we used is a WizziLab (Montrouge, France) WizziGate Pro. The sensor nodes are WizziLab Wolt-XL nodes containing custom firmware. Both networks communicated on a single Hi-Rate channel with a bandwidth of 150 kHz and a baud rate of 166.667 kbit/s. The nodes transmit using an Effective Radiated Power (ERP) of 14 dBm.
The gateway has two main tasks in the network: The first is sending a coordination “start message”. The start message has the purpose of performing the ad hoc synchronization in the DASH7 network. The nodes will listen in a low-power background mode for the start message. As all nodes receive this message and know their own node ID, they can calculate in which time slots they have to listen for messages of other nodes and in which time slot they have to transmit. The cycle is visually shown in Figure 5.
The start message also clears the nodes’ RSSI buffer storing the received signal strength values. This results in only half of an RSSI matrix for the first cycle, as only one direction of the links has been measured when the message is transmitted from the node to the gateway. The other half of the matrix is included in the subsequent cycle.
Resetting of the RSSI data buffer is performed for practical reasons. Some applications do not run the system continuously. Since the nodes lack an internal real-time clock and cannot determine the time elapsed since the previous cycle, we clear the buffer to prevent outdated data from remaining.
For power efficiency, the network will only perform a resynchronization every ten cycles. This allows the nodes to save power in between cycles, as they do not have to listen for the gateway in background scan state [24].
The second task the gateway performs is to receive all the messages transmitted by the sensor nodes and forward them to the back-end server via an LTE cellular backhaul. The payload of each message contains the last state of the transmitting node's RSSI receive buffer. This means that, within a payload, values for nodes with a lower node ID than the transmitting node are from the current cycle, while values for nodes with a higher node ID are from the previous cycle, as the transmitting node has not yet received a message from those nodes during the current cycle.
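The mixing of current-cycle and previous-cycle values in a payload can be made explicit with a small helper. The following is a minimal sketch that assumes nodes transmit in ascending node-ID order within a cycle, as the description above implies; the function name `payload_cycle_offsets` is hypothetical and not part of the dataset tooling.

```python
def payload_cycle_offsets(rx_node_id, num_nodes):
    """Return, per TX node ID, the cycle a payload value stems from.

    0    -> measured in the current cycle
    -1   -> measured in the previous cycle
    None -> the node's own entry (a node cannot receive itself)

    Assumption: nodes transmit in ascending node-ID order within a cycle.
    """
    offsets = []
    for tx_node_id in range(num_nodes):
        if tx_node_id == rx_node_id:
            offsets.append(None)
        elif tx_node_id < rx_node_id:
            offsets.append(0)
        else:
            offsets.append(-1)
    return offsets


# Payload of node 2 in a hypothetical five-node network.
example = payload_cycle_offsets(2, 5)
print(example)  # [0, 0, None, -1, -1]
```

Such a mapping can help when the two halves of an RSSI matrix need to be attributed to the correct cycle, for example for the first cycle after a buffer reset.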
4.2. Manual Ground Truth Collection
Ground truth data for both the number of people present and for rail vehicles arriving and departing the station were collected using a smartphone application.
To record the number of people present in the environment, the person collecting training data on site periodically performed manual counting of people within the defined area of the environment. The application allows the input of an integer value representing the number of people present. By pressing the save button, the inserted value, together with the current timestamp, will be stored.
For recording rail vehicle arrivals and departures, the application interface has two buttons, “in” and “out”. When pressing one of these buttons, the value −1 for “in” or −2 for “out” is recorded together with the current timestamp. Because people counts are always greater than or equal to 0, negative values are free to encode other events, such as rail vehicle movements. This keeps the field a single signed integer type, which makes the data easy to parse.
Since the ground truth data are collected independently from the measurement samples, the timestamps in both sets will not align with each other. This is a problem we have to address when assigning labels to the measurement data. Due to the nature of the data, we perform this alignment differently for the people counts than for the vehicle ground truth data.
Since each collected people count represents a single point in time, we suggest combining it with the nearest measurement cycle, assigning the people count value as the label to this measurement cycle. For safety, a maximum time tolerance can be assigned to prevent combining values over large time gaps (e.g., maximum 5 min), after which the manual count value might be considered invalid. This process is visually depicted in Figure 6.
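The nearest-cycle assignment can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes timestamps are given as seconds (e.g., Unix time) and that the cycle timestamps are sorted; the function name and the example values are hypothetical.

```python
import numpy as np


def label_nearest_cycles(cycle_times, count_times, counts, max_gap_s=300.0):
    """Assign each manual people count to the nearest measurement cycle.

    cycle_times: sorted 1-D sequence of cycle timestamps in seconds.
    count_times, counts: timestamps and values of the manual counts.
    Returns one float label per cycle; NaN where no count was assigned.
    """
    cycle_times = np.asarray(cycle_times, dtype=float)
    labels = np.full(cycle_times.shape, np.nan)
    for t, n in zip(count_times, counts):
        # First cycle at or after t; the nearest cycle is this one or its predecessor.
        right = int(np.searchsorted(cycle_times, t))
        candidates = [i for i in (right - 1, right) if 0 <= i < len(cycle_times)]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda i: abs(cycle_times[i] - t))
        # Drop counts that are further away than the tolerance (default 5 min).
        if abs(cycle_times[nearest] - t) <= max_gap_s:
            labels[nearest] = n
    return labels


# Made-up example: the count at t=1000 s is too far from any cycle and is dropped.
labels = label_nearest_cycles([0, 60, 120, 180], [50, 1000], [12, 30])
```

Cycles that remain NaN are simply unlabeled and can be excluded when training a regression model.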
In contrast, vehicle ground truth data can be considered as a time span given by a start and end time. We want to label measurement samples with whether or not a rail vehicle was present for use later to train and evaluate a binary classification model. Thus, we can label more than only the nearest sample. We label all measurement samples between the first and last ground truth value. After a vehicle enters the station, we label all subsequent measurements as “present” or 1 until the vehicle leaves the station. Similarly, when a vehicle leaves the station, we label all subsequent measurements as “not present” or 0 until another vehicle enters the station. This process is visually depicted in Figure 7.
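The span-based labeling of rail vehicle presence can be sketched as follows. The snippet uses the −1 ("in") and −2 ("out") event codes described above; the function name, the use of seconds for timestamps, and the example values are assumptions made for illustration.

```python
import numpy as np

VEHICLE_IN, VEHICLE_OUT = -1, -2  # event codes used in the ground truth data


def label_vehicle_presence(cycle_times, event_times, event_values):
    """Label each cycle 1 (vehicle present), 0 (not present) or NaN (unknown).

    Only cycles between the first and the last vehicle event are labeled;
    ground truth entries >= 0 (people counts) are ignored.
    """
    cycle_times = np.asarray(cycle_times, dtype=float)
    events = sorted(
        (t, v)
        for t, v in zip(event_times, event_values)
        if v in (VEHICLE_IN, VEHICLE_OUT)
    )
    labels = np.full(cycle_times.shape, np.nan)
    # Each event defines the state of all cycles until the next event.
    for (t_start, value), (t_end, _) in zip(events, events[1:]):
        in_span = (cycle_times >= t_start) & (cycle_times < t_end)
        labels[in_span] = 1.0 if value == VEHICLE_IN else 0.0
    return labels


# Made-up example: vehicle in at 12 s, out at 28 s, next vehicle in at 45 s.
# The entry (15 s, 7) is a people count and is ignored here.
presence = label_vehicle_presence(
    [0, 10, 20, 30, 40, 50], [12, 28, 45, 15], [-1, -2, -1, 7]
)
```

Cycles before the first and after the last event stay NaN, because the vehicle state is not known outside the observed ground truth period.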
4.3. Experimental Setup
This work presents data from two distinct public transportation environments, an indoor and an outdoor platform environment. The first environment, the indoor environment, is part of the “Groenplaats” subway station located in the city of Antwerp, Belgium. The environment is an underground subway platform. This environment contains a single platform and rail track. The second environment, the outdoor environment, is the platform of the “Hospital Sant Joan Despí|TV3” tram stop located in Barcelona, Spain. This environment features a central platform enclosed by two rail tracks offering a bidirectional tram service.
These two environments feature distinctive properties: “Groenplaats”, being underground, acts as an indoor environment, experiencing more multipath effects on the Radio Frequency (RF) links than outdoor environments. Outdoor environments such as “Hospital Sant Joan Despí” are susceptible to weather conditions and other external effects. One such external effect is dynamic reflections occurring outside the observed environment, for example, from cars passing on the adjacent road. In the remainder of this paper, we will refer to these environments as the “indoor” and “outdoor” environments.
Figure 8 shows a top-view and side-view representation of the indoor environment. Figure 9 shows a picture of the indoor environment. Similarly, Figure 10 shows a top-view and side-view representation of the outdoor environment. Figure 11 shows a picture of the outdoor environment. The top view shows the node locations and their node ID. The side view shows the mounting height. The different (colored) shapes group the nodes for later reference. The hip height sensors are mounted at a height between 1 m and 1.2 m above the platform level. The ceiling nodes are mounted at a height of 2.7 m in the indoor environment and 3 m in the outdoor environment. More detailed node locations are included in the dataset, as discussed in Section 3.3.
The node groups can be combined to optimize for different use cases, such as people counting or rail vehicle detection. There is no physical difference between the node groups or how the data is handled. The gateway location is depicted as the crossed square.
Looking at Figure 8 and Figure 10, it becomes clear that the outdoor environment contains significantly fewer sensor nodes than the indoor environment due to the limitations of the outdoor environment in terms of potential mounting options. For the indoor environment, we opted for maximizing the number of nodes beyond what is common for similar setups to allow analyzing the impact of fewer nodes and node locations in post-analysis.
Note that, for the indoor environment, nodes 0 and 2 are missing on the last day in the dataset (17 July 2023): technical staff removed these nodes after they had detached from the wall.
4.4. Data Processing
For each measurement cycle, we can compile an RSSI matrix of size $N \times N$, with $N$ being the number of nodes in the network. For simplicity, we use a fixed value of $N$ during processing to allow for future expansion without changing the index mapping; additionally, this also aligns with the size of the “rssi_values” field, preventing the need to map when certain node IDs are not used.
Figure 12 shows an example of an RSSI matrix for the first cycle of the indoor environment on 17 May 2023. A row in the table corresponds to a row in the dataset. “RX Node” corresponds to the “node_id” field; the values correspond to the “rssi_values” in order.
Notice that the values on the diagonal are always 0, as the node cannot receive a message from itself. The RSSI values corresponding to the transmitting (TX) nodes from which the receiving (RX) node did not receive a message will all be 0 in the “rssi_values” list in the dataset; e.g., the N−1 node did not exist here. Similarly, the dataset does not include lines for receiving nodes that did not exist, since they did not send anything to be received by the gateway. For consistency we also fill these rows with 0. In processing, it might be easier to replace all zero values with a Not a Number (NaN) value to prevent including the zero values in calculations as they are not real measurement values.
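Compiling the RSSI matrix of one cycle and replacing the zero placeholders with NaN can be sketched as follows. The field names "node_id" and "rssi_values" are taken from the dataset description; representing each row as a dictionary, the function name, and the example values are assumptions for illustration.

```python
import numpy as np


def build_rssi_matrix(rows, num_nodes):
    """Build a (num_nodes x num_nodes) RSSI matrix for one measurement cycle.

    rows: iterable of dicts holding the "node_id" (RX node) and "rssi_values"
    (one value per TX node ID) fields of the same cycle.
    Zeros (no measurement) are replaced by NaN.
    """
    matrix = np.zeros((num_nodes, num_nodes), dtype=float)
    for row in rows:
        values = np.asarray(row["rssi_values"], dtype=float)[:num_nodes]
        matrix[row["node_id"], : len(values)] = values
    # Zero means "no message received", not a real measurement.
    matrix[matrix == 0] = np.nan
    return matrix


# Made-up cycle with three node IDs, of which node 2 did not report.
rows = [
    {"node_id": 0, "rssi_values": [0, -60, -72]},
    {"node_id": 1, "rssi_values": [-61, 0, -80]},
]
rssi = build_rssi_matrix(rows, num_nodes=3)
```

With NaN in place of zero, functions such as `np.nanmean` skip the missing links automatically instead of biasing averages toward 0.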
To work with the measurement data, we need to align the data to a reference point. For most applications, we want to align the empty environment to 0 dB, which will give us the attenuation relative to the empty environment. We want to be able to recalibrate this reference periodically to compensate for changes in the environment that otherwise could induce an unwanted offset in the calibrated data. For this reason, we perform a calibration each night when such environments are usually empty. To compensate for external impacts and noise in the measurement, we average the collected RSSI values for each (directional) link in the RSSI matrix over a set interval. For this dataset we suggest using an interval between 3:00 and 3:15 (a.m.). In more dynamic environments a more dynamic approach can be used, e.g., looking for an interval with the lowest values and standard deviation.
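A per-link calibration matrix for the suggested night interval can be sketched as follows. The snippet assumes that cycle timestamps are available as local-time `datetime` objects and that missing values are already NaN; both function names and the example values are hypothetical.

```python
from datetime import datetime, time

import numpy as np


def in_calibration_window(timestamp, start=time(3, 0), end=time(3, 15)):
    """True when the (local-time) timestamp lies in the calibration interval."""
    return start <= timestamp.time() < end


def calibration_matrix(rssi_matrices):
    """Average the RSSI matrices of the interval per directional link.

    Links without any valid sample in the interval stay NaN.
    """
    stacked = np.stack(rssi_matrices)
    counts = np.sum(~np.isnan(stacked), axis=0)
    sums = np.nansum(stacked, axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)


# Made-up cycles for a two-node network; the 08:00 cycle is outside the window.
cycles = [
    (datetime(2023, 5, 18, 3, 5), np.array([[np.nan, -60.0], [-62.0, np.nan]])),
    (datetime(2023, 5, 18, 3, 10), np.array([[np.nan, -64.0], [np.nan, np.nan]])),
    (datetime(2023, 5, 18, 8, 0), np.array([[np.nan, -75.0], [-77.0, np.nan]])),
]
night = [matrix for ts, matrix in cycles if in_calibration_window(ts)]
cal = calibration_matrix(night)
```

Averaging each direction separately keeps the calibration matrix directional, so it can be subtracted element-wise from any RSSI matrix of the same day.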
To perform the calibration, we subtract the calibration matrix from each RSSI matrix. In other words, each measurement cycle gets calibrated using the calibration matrix for that day. When no calibration matrix for the current day can be created, we use the calibration matrix of the next day. This can happen due to incomplete data at the time of calibration. The calibration step can be described as a formula, shown in Equation (1), with $\mathbf{A}_c$ being the attenuation matrix, $\mathbf{R}_c$ being the RSSI matrix, and $\mathbf{C}$ being the calibration matrix. This happens for each measurement cycle $c$.
$$\mathbf{A}_c = \mathbf{R}_c - \mathbf{C} \qquad (1)$$
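The calibration step, including the fallback to a later day, can be sketched as follows. The snippet assumes calibration matrices are stored in a dictionary keyed by ISO date strings; the function names are hypothetical, and falling back to the next available day (rather than strictly the following day) is a small generalization made for this example.

```python
import numpy as np


def pick_calibration(day, calibrations):
    """Return the calibration matrix of `day`, else of the next available day.

    calibrations: dict mapping ISO date strings ("YYYY-MM-DD") to matrices.
    """
    for candidate in sorted(calibrations):
        if candidate >= day:
            return calibrations[candidate]
    raise KeyError(f"no calibration matrix on or after {day}")


def attenuation(rssi_matrix, calibration):
    """Attenuation relative to the empty environment: RSSI minus calibration."""
    return rssi_matrix - calibration


# Made-up example: no calibration exists for 17 May, so 18 May is used.
calibrations = {"2023-05-18": np.array([[np.nan, -60.0], [-61.0, np.nan]])}
rssi_cycle = np.array([[np.nan, -66.0], [-61.0, np.nan]])
att = attenuation(rssi_cycle, pick_calibration("2023-05-17", calibrations))
```

With this sign convention, a link that is obstructed (lower RSSI than in the empty environment) yields a negative value, here −6 dB for the link received by node 0.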
To calibrate the first day in both datasets, we use data from the following day. For calibrating the indoor environment on 17 May 2023, we advise using data from the night of 18 May 2023. On the 17th there was no data in the morning for nodes 7, 8, 9, and 10. The batteries were replaced during that day before the ground truth data was collected. Because of this, these links cannot be calibrated using data from the night before. For the outdoor environment, there was no data available in the morning of 11 June 2024 due to the installation of the sensors that day. We therefore recommend using the data from the following night (12 June 2024) to perform the calibration.
The format of the attenuation matrix still contains a lot of redundant information for feeding into a model. The use of 1-dimensional data is preferred for use in most linear models. We transform the 2-D attenuation matrices into 1-D attenuation vectors. Instead of simply flattening the entire matrix, we reduce the amount of data by implementing two measures. The first measure is to discard the zero values on the diagonal. The second measure is to average the two directions for the same link. This reduces the attenuation vector from a length of $N^2$ to a length of $N(N-1)/2$, more than halving the size of the input vector. This reduces computational complexity, as well as reducing the measurement noise remaining in the input data by averaging two measurements of the same link, assuming the measurement noise is Gaussian-distributed. This leaves us with an attenuation vector $\mathbf{a}_c$, which contains a single value for each link in the network.
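The reduction from an attenuation matrix to a per-link attenuation vector can be sketched as follows. This is a minimal example, not the authors' implementation: it assumes missing values are NaN and, as an additional assumption, falls back to the single available direction when only one direction of a link was measured.

```python
import numpy as np


def attenuation_vector(att_matrix):
    """Average both directions of each link; return N(N-1)/2 link values.

    Links are ordered as the upper triangle of the matrix, row by row:
    (0,1), (0,2), ..., (1,2), ...
    """
    n = att_matrix.shape[0]
    pairs = np.stack([att_matrix, att_matrix.T])
    counts = np.sum(~np.isnan(pairs), axis=0)
    sums = np.nansum(pairs, axis=0)
    # Mean over the available directions; NaN when neither direction exists.
    symmetric = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    rows, cols = np.triu_indices(n, k=1)
    return symmetric[rows, cols]


# Made-up 3-node attenuation matrix; link 1->2 is missing in one direction.
att_matrix = np.array(
    [
        [np.nan, -2.0, -4.0],
        [-4.0, np.nan, np.nan],
        [-6.0, -1.0, np.nan],
    ]
)
vector = attenuation_vector(att_matrix)
print(vector.tolist())  # [-3.0, -5.0, -1.0]
```

For three nodes, the nine matrix entries reduce to three link values, matching the $N(N-1)/2$ length.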
We can use these attenuation vectors, containing data on individual links, to train and validate models. However, even smaller and simpler models can be of value. For the simplest models, which are the focus of the remainder of this section, we use a single attenuation value for the entire network as input. We obtain this single value by calculating the mean of the attenuation vector. However, including every link in the network in this mean is not always optimal. For this reason, we use a binary mask to select the links and nodes that align with the intended application when calculating the mean attenuation for the cycle.
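The masked mean attenuation can be sketched as follows. The snippet assumes that the attenuation vector is ordered as the upper triangle of the attenuation matrix, row by row; the function names, the choice of selecting links whose two endpoints both belong to a node group, and the example values are assumptions for illustration.

```python
import numpy as np


def link_mask_for_nodes(num_nodes, node_ids):
    """Binary mask selecting links whose two endpoints are both in node_ids."""
    rows, cols = np.triu_indices(num_nodes, k=1)
    selected = list(node_ids)
    return np.isin(rows, selected) & np.isin(cols, selected)


def mean_attenuation(att_vector, link_mask):
    """Mean attenuation over the selected links, ignoring missing (NaN) links."""
    selected = np.asarray(att_vector, dtype=float)[np.asarray(link_mask, dtype=bool)]
    valid = selected[~np.isnan(selected)]
    return float(valid.mean()) if valid.size else float("nan")


# Made-up 4-node network; links are ordered (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
mask = link_mask_for_nodes(4, {0, 1, 3})
value = mean_attenuation([-3.0, -5.0, -1.0, -2.0, float("nan"), -4.0], mask)
print(mask.tolist())  # [True, False, True, False, True, False]
print(value)  # -2.0
```

Different masks, for example one built from the hip-height nodes and one from the ceiling nodes, can then be compared for people counting and rail vehicle detection without changing the rest of the pipeline.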
6. Conclusions
In this work, we presented a novel dataset on DFWS in public transportation environments. The dataset contains periodically collected RSSI values of radio links within a WSN deployed at two different light rail platforms. We demonstrated the potential of using RSSI values for applications such as rail vehicle detection through classification and estimating the number of people standing on a platform using regression analysis.
The dataset introduced in this work provides a baseline for further research into DFWS applications in public transportation environments. It broadens the scope of the technology beyond the traditional applications, such as large-scale events.
We believe that the dataset, in combination with this data descriptor, enables researchers to develop novel methods for processing DFWS data.
Future work may explore more sophisticated models and data processing techniques to improve accuracy and applicability in more challenging environments.
In this work, similarly to previous studies, we aggregated the measurements of individual links into a single value for each cycle. This step reduces the dimensionality of the input data, but this may also eliminate valuable information about the spatial distribution of the attenuations and reflections. Future work could attempt to utilize data from the different links independently to improve the system.
Expanding the dataset to additional environments might provide overarching insights and pave the way for developing generalized estimation models, thereby reducing or eliminating the need to train models for each environment separately.
Future work may examine how node locations impact the generalizability of the trained models across environments and which low- and high-level features are transferable between environments and thus could enable environment-agnostic models.