Figure 1.
A map of the town center. (left) Locations of the 20 visual sensors. It is expected that the live data produced from those locations will help urban planners update the city’s mobility plan. (right) Co-location of two CCTV cameras and an air quality sensor (in purple) on a pole.
Figure 2.
The smart visual sensor, outside and inside.
Figure 3.
Simplified diagram of the sensor. The core of the sensor is the Jetson TX2, which handles the video analytics and the USB and Ethernet connectivity, and powers the LoPy 4 and the OLED screen. The Jetson TX2 is in turn powered by the PSU. The LoPy is used for LoRaWAN-based transmission, and the screen displays basic information about the sensor’s status.
Figure 4.
Activity flow chart of the sensor.
Figure 5.
The 3U server version of the visual sensor hosting 15 NVIDIA Jetson TX2 units; each of these computing modules is able to process one live CCTV video feed in real time.
Figure 6.
Architecture of YOLO V3 for object detection. A picture is passed through a fully convolutional neural network of 106 hidden layers in order to predict box coordinates and class probabilities at three different scales (large, medium, small). A non-maximum suppression algorithm is then applied to retain only the category and coordinates with the highest score.
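The non-maximum suppression step mentioned in the caption above can be summarised with a short sketch. The following is a minimal, illustrative Python implementation of greedy NMS; the array names and the IoU threshold of 0.5 are assumptions for the example, not the authors' exact code:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones.

    boxes  -- array of shape (N, 4) with (x1, y1, x2, y2) corners
    scores -- array of shape (N,) with the class confidence of each box
    """
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # Intersection of the best box with the remaining ones
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + area_rest - inter)
        # Keep only the boxes that overlap the best one by less than the threshold
        order = order[1:][iou < iou_threshold]
    return keep
```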
Figure 7.
The general architecture of the project. The Agnosticity software stack relies on well-established open-source software. Data collection and access are ensured by OM2M, the open-source implementation of the oneM2M standard.
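As an illustration of how a sensor could push a measurement into the OM2M data layer, the sketch below creates a oneM2M contentInstance over the standard HTTP binding. The CSE address, resource path and credentials are placeholders based on Eclipse OM2M's defaults, not the project's actual configuration:

```python
import json
import requests

# Placeholder endpoint and credentials (Eclipse OM2M defaults), not the
# project's real deployment details.
CSE_URL = "http://127.0.0.1:8080/~/in-cse/in-name/visual-sensor-01/counts"
HEADERS = {
    "X-M2M-Origin": "admin:admin",            # originator (default credentials)
    "X-M2M-RI": "visual-sensor-01-req",       # request identifier
    "Content-Type": "application/json;ty=4",  # ty=4 -> contentInstance
}

def push_count(pedestrians, vehicles, bicycles):
    """Send one minute-aggregated count as a oneM2M contentInstance."""
    payload = {
        "m2m:cin": {
            "con": json.dumps({
                "pedestrians": pedestrians,
                "vehicles": vehicles,
                "bicycles": bicycles,
            })
        }
    }
    response = requests.post(CSE_URL, headers=HEADERS, json=payload, timeout=5)
    response.raise_for_status()

push_count(pedestrians=12, vehicles=3, bicycles=1)
```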
Figure 8.
The web-based interactive dashboard used to visualize the data collected from the different sensors deployed for the Liverpool project. The interface is responsive and can be used both on desktop and mobile browsers.
Figure 9.
Kernel density estimation of the accuracy (left) and percentage deviation (right) computed across the 4500 frames of the Oxford dataset.
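The distributions shown in Figure 9 can be reproduced with a standard kernel density estimate over the per-frame counts. A minimal sketch, assuming the detected and ground-truth counts are available as arrays (the file names, variable names and use of SciPy are illustrative assumptions):

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Per-frame counts: one entry per annotated frame of the Oxford dataset
# (hypothetical file names).
detected = np.load("detected_counts.npy")
truth = np.load("ground_truth_counts.npy")

accuracy = detected / truth                        # ideal value is 1.0
deviation = 100.0 * (detected - truth) / truth     # percentage deviation

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, values, label in [(axes[0], accuracy, "Accuracy"),
                          (axes[1], deviation, "Percentage deviation")]:
    grid = np.linspace(values.min(), values.max(), 200)
    ax.plot(grid, gaussian_kde(values)(grid))      # smooth density estimate
    ax.set_xlabel(label)
    ax.set_ylabel("Density")
plt.tight_layout()
plt.show()
```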
Figure 10.
Evolution of the number of frames per second processed (FPS) by the sensor (blue) and the number of detected objects (red) over time. It can be seen that the FPS is higher when the number of detections is lower. The drop in FPS is mainly due to the SORT algorithm when more objects have to be tracked, a task that does not take advantage of the CUDA cores available on the Jetson TX2.
Figure 11.
Evolution of the accuracy (blue lines; ideal is 1.0) alongside the ground truth (orange lines) over time. Accuracy is better with small groups and decreases with large crowds.
Figure 12.
Number of ground-truth detections plotted against the number of detections. The relationship is linear, meaning that the algorithm manages to capture trends.
Figure 13.
A 15 min monitoring of the CPU, GPU, memory, and disk usage, average temperature (top), and network utilization (bottom) during a real-world deployment of the sensor. Over this period, 280 unique objects were detected.
Figure 14.
Plot of the number of people detected inside a building over one hour. The two peaks correspond to the start and end of the fire alarm event.
Figure 15.
Trajectories followed by the individuals detected and tracked by the sensor. Each of the 631 lines represents one individual.
Figure 16.
Heat map of the maximum number of individuals detected in the field of view of the sensor.
Figure 17.
The sensor is located at the center of the city, next to a pedestrian street (highlighted in red). The field of view of the camera is represented in blue.
Figure 18.
Number of pedestrians, vehicles, and bicycles detected by the sensor between 20 February 2019 and 27 February 2019. Each data point represents the number of objects of a specific type detected over the preceding minute.
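Each point in Figure 18 is a one-minute count per object class. A minimal pandas sketch of this aggregation (the log file and column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical detection log: one row per unique tracked object, with a
# timestamp and a class label ("pedestrian", "vehicle" or "bicycle").
detections = pd.read_csv("detections.csv", parse_dates=["timestamp"])

per_minute = (
    detections
    .set_index("timestamp")
    .groupby("label")
    .resample("1min")
    .size()                                 # objects of each class per minute
    .unstack("label", fill_value=0)
)
print(per_minute.head())
```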
Figure 19.
Number of pedestrians (green), bicycles (red), and vehicles (yellow) detected hourly on 23 February 2019.
Figure 20.
Plot of the pixel coordinates (X, Y) of the pedestrians (blue) and bicycles (orange) detected in the frame on 23 February 2019. Each dot represents the centroid of the bounding box of an item detected at those coordinates.
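The dots in Figure 20 are bounding-box centroids, i.e., the midpoints of the detected boxes. A one-line sketch (the corner-based box convention is an assumption):

```python
def centroid(x1, y1, x2, y2):
    """Midpoint of a bounding box given by its top-left and bottom-right corners."""
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

# Example: a detection spanning pixels (100, 200) to (140, 300)
print(centroid(100, 200, 140, 300))   # -> (120.0, 250.0)
```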
Figure 21.
Trajectories of pedestrians and bicycles within the frame on 23 February 2019. Two flows of pedestrians are visible.
Table 1.
YOLO V3 parameters for the detection task.
| Parameter | Value |
|---|---|
| Input size | 416 × 416 pixels |
| Small scale detection grid | 52 × 52 cells |
| Medium scale detection grid | 26 × 26 cells |
| Large scale detection grid | 13 × 13 cells |
| Number of bounding boxes per cell (K) | 3 |
| Confidence threshold | 0.9 |
| NMS threshold | 0.5 |
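For illustration, the Table 1 parameters map directly onto the post-processing of the network output: predictions below the confidence threshold are discarded, and the remainder is passed to non-maximum suppression. A hedged sketch (the dictionary layout and function are assumptions, not the authors' configuration file):

```python
YOLO_V3_CONFIG = {
    "input_size": (416, 416),          # pixels
    "detection_grids": {
        "small": (52, 52),             # finest grid, detects small objects
        "medium": (26, 26),
        "large": (13, 13),             # coarsest grid, detects large objects
    },
    "boxes_per_cell": 3,               # K anchor boxes per grid cell
    "confidence_threshold": 0.9,       # discard low-confidence predictions
    "nms_threshold": 0.5,              # IoU threshold for non-maximum suppression
}

def filter_predictions(predictions, config=YOLO_V3_CONFIG):
    """Keep only predictions above the confidence threshold.

    `predictions` is assumed to be a list of (box, score, class_id) tuples
    produced by the network across the three detection scales; the surviving
    boxes are then passed to NMS with the configured IoU threshold.
    """
    return [p for p in predictions if p[1] >= config["confidence_threshold"]]
```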
Table 2.
SORT parameters for the tracking task.
| Parameter | Value |
|---|---|
| Minimum hits | 3 |
| Maximum age | 40 frames |
| IoU threshold | 0.3 |
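The Table 2 values correspond to the usual SORT parameters: a track is only reported after the minimum number of consecutive associations, it is dropped after the maximum age without a match, and detections are associated to tracks when their IoU exceeds the threshold. A minimal usage sketch, assuming the open-source reference implementation of SORT (Bewley et al.) with its current parameter names:

```python
import numpy as np
from sort import Sort   # reference SORT implementation (assumed available)

# Parameter names follow the reference implementation; values from Table 2.
tracker = Sort(max_age=40, min_hits=3, iou_threshold=0.3)

# One update per frame: detections are rows of [x1, y1, x2, y2, score].
detections = np.array([
    [100, 200, 140, 300, 0.95],
    [400, 180, 460, 320, 0.92],
])
tracks = tracker.update(detections)   # rows of [x1, y1, x2, y2, track_id]
print(tracks)
```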
Table 3.
Summary of the performance results with basic statistics computed over the 4500 frames. These results show that the algorithm underestimates the number of detections but achieves a good processing speed (fps).
|  | Detection | True | Error | Relative Error | Accuracy | fps |
|---|---|---|---|---|---|---|
| mean | 10.52 | 15.87 | −5.34 | 0.31 | 0.69 | 19.57 |
| standard deviation | 2.80 | 4.69 | 3.35 | 0.15 | 0.15 | 3.49 |
| minimum | 2.00 | 6.00 | −17.00 | 0.00 | 0.22 | 4.63 |
| 25th percentile | 8.00 | 13.00 | −8.00 | 0.21 | 0.57 | 17.28 |
| median | 11.00 | 16.00 | −5.00 | 0.33 | 0.66 | 19.77 |
| 75th percentile | 13.00 | 19.00 | −3.00 | 0.42 | 0.78 | 22.22 |
| maximum | 20.00 | 28.00 | 2.00 | 0.77 | 1.33 | 22.99 |
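The statistics in Table 3 follow directly from the per-frame counts; a short sketch of how the columns could be recomputed (the array and file names are assumptions, and the error, relative error and accuracy definitions are the natural per-frame ones consistent with the reported values):

```python
import numpy as np
import pandas as pd

# Per-frame counts over the 4500 evaluated frames (hypothetical file names).
detected = np.load("detected_counts.npy")
truth = np.load("ground_truth_counts.npy")
fps = np.load("frames_per_second.npy")

metrics = pd.DataFrame({
    "Detection": detected,
    "True": truth,
    "Error": detected - truth,                      # negative -> under-detection
    "Relative Error": np.abs(detected - truth) / truth,
    "Accuracy": detected / truth,                   # ideal value is 1.0
    "fps": fps,
})
# mean, standard deviation, minimum, quartiles, median and maximum,
# as reported column-wise in Table 3
print(metrics.describe())
```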