Prototype to Increase Crosswalk Safety by Integrating Computer Vision with ITS-G5 Technologies

Abstract: Human errors are probably the main cause of car accidents, and the automobile is one of the most dangerous forms of transport for people. The danger comes from the fact that on public roads there are simultaneously different types of actors (drivers, pedestrians or cyclists) and many objects that change their position over time, making it difficult to predict their immediate movements. The intelligent transport system (ITS-G5) standard specifies the European communication technologies and protocols to assist public road users, providing them with relevant information. The scientific community is developing ITS-G5 applications for various purposes, among them increasing pedestrian safety. This paper describes the work developed to implement an ITS-G5 prototype that aims to increase pedestrian and driver safety in the vicinity of a pedestrian crosswalk by sending ITS-G5 decentralized environmental notification messages (DENM) to the vehicles. These messages are analyzed and, if relevant, presented to the driver through the car's onboard infotainment system. This alert allows the driver to take safety precautions to prevent accidents. The implemented prototype was tested at a pedestrian crosswalk in a controlled environment. The results showed the capacity of the prototype to detect pedestrians, send suitable messages, receive and process them on a vehicle onboard unit (OBU) module and present them on the car's onboard infotainment system.


Introduction
Automotive vehicles are one of the most dangerous people transportation options [1], and human errors are the main cause of accidents. There are different types of users and different types of obstacles on public roads that change their position over time, making it difficult to predict their immediate location. The intelligent transport system (ITS-G5) is the European standard that uses vehicular networks to share messages between vehicles, people and infrastructure, to make public roads safer and traffic more efficient [2]. The use of vehicular networks by the automotive industry and by the consortia that implement smart cities has grown steadily, and these mobile networks ensure digital communications between all the users of these ITS services.
IEEE 802.11p is the standard chosen by ITS for wireless communication on public roads in vehicular ad hoc networks (VANET) [3], due to its features, namely low latency and reduced loss of information [4], and it allows vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), vehicle-to-everything (V2X) and vehicle-to-pedestrian (V2P) communication [5].
CAM messages contain information about the presence, position and status of vehicles [11]; they are sent periodically and received by other cars and fixed stations that are a single hop away. Upon receiving a CAM message, an ITS-G5 station gets to know all cars in its vicinity: their position, speed, basic attributes and other data. The CAM message manager, inside the Facilities layer, generates messages from information collected from the clock, the vehicle monitoring system and other sensors. The CAM message is then sent to the lower layers and relayed to the physical network. At the receiving side, CAM messages are validated by the Local Dynamic Map (LDM) module, which analyzes each message and updates the LDM, a georeferenced database that contains information about the neighbouring vehicles. The CAM periodic transmission rate is between a minimum limit (Tmin = 100 ms) and a maximum limit (Tmax = 1 s). A CAM message is also triggered when a vehicle exceeds its predefined dynamic thresholds for heading, movement and acceleration. When the wireless channel load is high, the minimum period can be increased; if required by a safety situation, the maximum period can be decreased [12].
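The CAM generation rules above can be illustrated with a small sketch. The period limits come from the text; the dynamic threshold values (4° heading, 4 m position, 0.5 m/s speed) are the commonly cited ETSI defaults and should be treated as assumptions here, not as part of the prototype:

```python
# Simplified sketch of the CAM generation rules (illustrative only; the
# normative rules are specified in ETSI EN 302 637-2).

T_MIN = 0.1   # minimum CAM period, seconds (100 ms)
T_MAX = 1.0   # maximum CAM period, seconds (1 s)

def cam_due(elapsed, d_heading_deg, d_position_m, d_speed_ms):
    """Return True when a new CAM should be generated.

    elapsed        -- seconds since the last CAM was sent
    d_heading_deg  -- heading change since the last CAM (degrees)
    d_position_m   -- position change since the last CAM (metres)
    d_speed_ms     -- speed change since the last CAM (m/s)
    """
    if elapsed < T_MIN:        # never exceed the minimum period
        return False
    if elapsed >= T_MAX:       # never fall below the maximum period
        return True
    # Dynamic triggers (assumed threshold values, for illustration).
    return (abs(d_heading_deg) > 4.0 or
            abs(d_position_m) > 4.0 or
            abs(d_speed_ms) > 0.5)
```

Under channel congestion, a real implementation would raise `T_MIN`; in a safety situation it would lower `T_MAX`, as described above.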
DENM messages contain information related to hazard warnings or anomalous road conditions [13] and are disseminated to the ITS-G5 stations in the neighbourhood. The DENM message manager is responsible for the creation, management and processing of these messages; their creation is triggered by an ITS application. When a DENM message is received, the Facilities layer validates it and sends the information to the application layer, and, when considered relevant, it is presented to the driver.
IVIM is an infrastructure-to-vehicle (I2V) message type with information about the traffic infrastructure. The purpose of these messages is to provide drivers access to traffic sign information in any environmental conditions [11].
SPATEM is another I2V message type used for communications at road intersections and contains dynamic information about the intersection status in a given instant, such as traffic lights, predictions of future states and speed advice [11].
MAPEM messages work together with SPATEM and are used to spread information about any intersection layout change. These messages are broadcast when the road layout changes: for example, a change in the crossing priority at a traffic light intersection [11]. Figure 1 shows the ITS application classification, which groups applications into Infotainment and Comfort, Traffic Management, Road Safety and Autonomous Driving [14].
The ITS-G5 protocol stack also includes a management layer that handles all protocol stack layers and a security layer that provides security and privacy services to all the other layers.

There are several scientific works on the development of ITS-G5 applications related to traffic safety which contribute to increasing road infrastructure safety. In [15], the authors propose a system that collects regular weather forecasts for specific road stretches to deliver and display road weather data to drivers, thus increasing traffic fluency and safety. In [16], the authors present the TRUST project, whose main goal is to develop a weather monitoring system for road infrastructures, to identify potential driving hazards and generate alerts for cars and traffic management centres. In [17], the authors propose a study to assess and optimise the performance of ITS-G5 for time-critical safety conflict scenarios between vehicles and pedestrians.

Computer Vision for Pedestrian Detection
Visual data analysis has been a research topic for more than six decades, providing information for analysis and decision-making. Human visual perception starts from the identification of reference points of an object, which is then identified from the comparative points of a similar object learned by the brain [18]. Research in image and video analysis, in most cases, tries to mimic these bio-inspired processes. In our research on pedestrian detection, we combine computer vision and RGB cameras with ITS-G5 technologies to provide a road safety context [19–21].
In our context of pedestrian detection, it is important to implement robust object recognition that accurately detects the object (person) and estimates its pose, which are the two main tasks of our computer vision module. Computer vision algorithms extract features from images to perform object detection and recognition, including local and global features. Local features describe the local vicinity around selected key points, and in general these algorithms handle scale changes, rotation and occlusion in a more proficient way; examples include Features from Accelerated Segment Test (FAST), the minimum eigenvalue algorithm, corner detectors and Speeded Up Robust Features (SURF) [22,23]. Global features play an important role as general descriptors of images; examples include shape distributions, the viewpoint feature histogram and geometric scale space [24,25]. Recently, new types of algorithms based on artificial intelligence (AI) have emerged in the field of object detection and recognition, image understanding and description. For example, deep learning-based methods have been used to identify objects. These methods have shown great success in robust object classification, but their training procedure requires large amounts of data (labelled dataset images) and considerable computational resources.
Convolutional neural networks (CNN) are the most successful deep learning approach for interpreting images (two-dimensional data) and video (three-dimensional data). CNNs provide an understanding of the semantic context in images and videos and give state-of-the-art results in image recognition, detection, segmentation, scene reconstruction, etc. In its typical topology, a CNN consists of the following main layers and sub-layers:
1. Input layer: receives two-dimensional data (images);
2. Hidden perceptron layers and their sub-layers:
• Convolution layer: the core of a CNN, with several filters learnable from data;
• Pooling layer: used to reduce the number of parameters and the number of complex computations in training;
• Fully connected layer: takes the output of the previous layers and turns it into a single vector that can be an input for the next stage.
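As a concrete illustration of how the convolution and pooling layers transform the data, the sketch below computes the spatial output size of a layer stack using the standard formula; the 416 × 416 input and the layer parameters are example values (416 × 416 is the usual YOLO v3 input size), not taken from the prototype:

```python
# Spatial output size of a convolution or pooling layer, using the
# standard formula: out = floor((in + 2*pad - kernel) / stride) + 1.

def layer_out(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

def stack_out(size, layers):
    """Apply a list of (kernel, stride, pad) layer specs to an input size."""
    for kernel, stride, pad in layers:
        size = layer_out(size, kernel, stride, pad)
    return size

# Example: a 416x416 input through a 3x3 convolution (stride 1, padding 1),
# which preserves the size, then a 2x2 pooling (stride 2), which halves it.
assert layer_out(416, 3, 1, 1) == 416
assert layer_out(416, 2, 2, 0) == 208
```

This is how the pooling layer "reduces the number of parameters": each pooling step shrinks the spatial grid that later layers must process.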
The literature [26] presents a technical review of the concepts and topologies of the different types of deep neural networks. In our work, we use the YOLO software implementation for its accuracy and real-time capabilities. A global overview of the YOLO organization blocks is illustrated in Figure 2.

In general, computer vision tasks related to object detection and recognition deal with objects under different viewpoints, illumination, intraclass variation, object rotation and scale change, dense and occluded object detection, and the speed/accuracy trade-off of detection. It has also been stated [28] that deep models are potentially more capable than shallow models in handling such complex tasks.
There are several scientific works related to ours that present the major contributions to the state of the art in deep neural networks for pedestrian detection. In [29], the authors use YOLO and an image enhancement method; in [30], the authors use multiple CNNs at different scales to learn multiscale features; in [31], the authors use a lightweight neural network (a modified YOLO) to achieve real-time processing; in [32], the authors use region-based convolutional neural networks (R-CNNs); and in [33], the authors use infrared images and Faster R-CNN.
The object recognition methods presented above combine traditional computer vision techniques with deep learning algorithms. They tend to yield better results in general but need very large image datasets and high-performance processing, whereas traditional computer vision algorithms have more restricted and specific applications.
In [34], the authors propose a new model to analyze and characterize the potential risk to pedestrians when crossing a crosswalk. The authors use an R-CNN model to analyze video footage and automatically detect pedestrian and vehicle trajectories.

Prototype Specification
The implemented prototype aims to provide better road information to its users and, therefore, to make the public road network safer for pedestrians and drivers, based on pedestrian pattern detection technologies and the ITS-G5 protocols.
The prototype has two stations: one installed in the road network infrastructure, the roadside unit (RSU), and another inside a vehicle, the onboard unit (OBU), connected to an infotainment system. The RSU incorporates a pedestrian detection (PD) module to detect and recognize pedestrians on a crosswalk and an ITS-G5 module for the creation and sending of DENM messages. The PD module contains a motion sensor, a video camera and a deep learning pedestrian detection and recognition algorithm (YOLO v3 and OpenCV) running on a personal computer (PC). Figure 3 shows the RSU station operating process. The motion sensor is pointed towards the crosswalk and covers its entire area. Any object on the crosswalk causes the motion sensor to activate the PD module. The video acquired by the camera is analyzed by the YOLO v3 CNN algorithm, which scans image by image in search of pedestrian patterns.
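The operating process above can be sketched as a single polling step (a hypothetical outline: the four callables stand in for the motion sensor, camera, YOLO v3 detector and DENM trigger, and none of them names a real API from the prototype):

```python
def rsu_step(read_motion_sensor, grab_frame, detect_pedestrians, send_denm):
    """One iteration of the RSU operating process sketched in Figure 3.

    The camera and the YOLO v3 detector only run after the motion sensor
    fires, which keeps the average energy consumption low while the
    crosswalk is empty. Returns True when a DENM broadcast was triggered.
    """
    if not read_motion_sensor():
        return False                   # crosswalk empty: detector stays off
    frame = grab_frame()               # camera activated on motion
    if detect_pedestrians(frame):      # YOLO v3 scans for person patterns
        send_denm()                    # trigger the DENM broadcast (Figure 4)
        return True
    return False
```

In a deployment this step would run in a loop, with a short sleep between iterations while the sensor is quiet.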

ITS-G5 Prototype Architecture
Whenever the object detection software identifies a pedestrian on the crosswalk, it triggers the RSU-ITS-G5 module, depicted in Figure 4, which broadcasts the predefined DENM messages, and, in this way, car drivers are notified about pedestrians on the crosswalk.
The motion sensor avoids unnecessary processing whenever the crosswalk is empty, in this way reducing the average energy consumption.
The RSU-ITS-G5 module, shown in Figure 4, implements the ITS-G5 protocol stack to manage the creation of DENM messages with information about pedestrians on the crosswalk, implemented in the ITS Facilities layer. All the mandatory headers of the BTP and GeoNetworking protocols are also filled with the specific scenario parameters. The pedestrian detection (PD) module manages a trigger that activates the broadcast of DENM messages while pedestrians are on the crosswalk.
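The PD-module trigger described above can be sketched as a small state machine (illustrative only; the class and method names are hypothetical, and the real trigger is implemented inside the prototype's Facilities-layer integration):

```python
class DenmTrigger:
    """Sketch of the PD-module trigger: DENM broadcasting is active only
    while pedestrians remain on the crosswalk."""

    def __init__(self):
        self.broadcasting = False

    def update(self, pedestrians_on_crosswalk):
        """Report the transition: 'start', 'stop' or None (no change)."""
        if pedestrians_on_crosswalk and not self.broadcasting:
            self.broadcasting = True
            return "start"          # begin periodic DENM broadcast
        if not pedestrians_on_crosswalk and self.broadcasting:
            self.broadcasting = False
            return "stop"           # cancel the broadcast
        return None                 # steady state, nothing to do
```

Modelling the trigger as edge transitions (start/stop) rather than a level avoids re-creating the DENM on every detected frame while a pedestrian is still crossing.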
The OBU station is inside the car, connected to the infotainment system, and has ITS-G5 support to receive DENM messages, as shown in Figure 5. The OBU station uses 802.11p hardware and supports the ITS-G5 protocol stack. DENM messages are received by the physical layer and delivered to the upper layers to be decoded. At the Facilities layer, the DENM message data are analyzed and made available to the application layer, which interprets the information and shows the presence of pedestrians on the crosswalk on the car's infotainment system. Figure 6 exhibits the typical prototype operating scenario.
The RSU's motion sensor detects movement on the crosswalk; the video camera is started; the recorded images are analyzed; and, if pedestrians are detected on the crosswalk, the RSU's ITS-G5 module creates the DENM message, which is broadcast using the 802.11p wireless protocol. The messages are received by the vehicle's OBU device and, after decoding, the information is shown on the car's infotainment system. In this way, the driver is notified about pedestrians on the crosswalk and can take the necessary safety precautions.
The prototype uses a VMware virtual machine running Ubuntu 16.04, with a kernel modification to support the Atheros ath9k wireless driver, which supports the IEEE 802.11p standard. DENM messages are sent by a cooperative ITS-G5 structure that works through a vehicular ad hoc network (VANET) to carry out I2V communication.
CSS Labs' OpenC2X [35] software was used to implement the ITS-G5 protocol stack and to create the DENM messages. OpenC2X is an experimental, open-source platform for creating ETSI ITS-G5 services.

Computer Vision Solution
The main computer vision task aims to detect pedestrian instances and locate their positions in images and their corresponding positions on the road. The detected pedestrian positions are then analyzed to verify whether they are on the crosswalk. The pedestrian pattern recognition task uses the OpenCV library for the global image processing functions and to interact with YOLO v3, a deep learning object detector designed for real-time detection. The OpenCV library has image and video processing modules and data structures, and supports various computer vision algorithms for real-time object recognition. The global functional blocks of the pedestrian detection (PD) module are shown in Figure 7.
The first block uses OpenCV functions for image acquisition and image resizing to minimise the computing demands of the YOLO v3 object detection.
The second block uses YOLO v3 with the pre-trained model generated from the Common Objects in Context (COCO) dataset, with settings adjusted to output only the person class. The output of this block is the set of YOLO v3 bounding box coordinates in the form centerX, centerY, width and height.
The third block uses OpenCV functions to implement the region of interest (ROI) classification and is responsible for analyzing the bounding box coordinates to check whether they correspond to the crosswalk ROI. If pedestrians are inside the ROI, the DENM messages are triggered.
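The ROI classification step can be sketched as follows. This assumes a rectangular crosswalk ROI and uses the bottom-centre of each bounding box as the reference point, since it approximates the pedestrian's feet on the road plane (a common heuristic; the paper does not state which point of the box is tested):

```python
def inside_roi(box, roi):
    """Check whether a detected pedestrian falls inside the crosswalk ROI.

    box -- YOLO v3 detection as (centerX, centerY, width, height), pixels
    roi -- crosswalk region as (x_min, y_min, x_max, y_max), pixels
    """
    cx, cy, w, h = box
    foot_x, foot_y = cx, cy + h / 2.0       # bottom-centre of the box
    x_min, y_min, x_max, y_max = roi
    return x_min <= foot_x <= x_max and y_min <= foot_y <= y_max

def any_pedestrian_on_crosswalk(boxes, roi):
    """True when at least one detection lies inside the ROI
    (the condition that triggers the DENM broadcast)."""
    return any(inside_roi(b, roi) for b in boxes)
```

A real crosswalk seen at an angle would be a quadrilateral rather than an axis-aligned rectangle; OpenCV's `cv2.pointPolygonTest` is the usual tool for that more general case.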
Several tests were conducted with different typical crosswalk scenarios to assess their geometric variations and pedestrian sizes, and to verify whether the YOLO v3 accuracy was suitable for this prototype.

Building DENM Messages
The DENM messages are created in the Facilities layer with the data to fill the specific headers, some of which are mandatory, such as stationType, relevanceTrafficDirection and relevanceDistance, and others optional, such as Situation and Location, as shown in Table 1.
The second block uses the YOLO v3 with the pre-trained model generated from the common objects in context (COCO) dataset with adjusted settings to output only the person class. The output of this block is the YOLO v3 return bounding box coordinates in the form of centerX, centerY, width and height.
The third block uses OpenCV functions to implement the region of interest (ROI) classification and is responsible for analyzing the bounding box coordinates to check if they correspond to the crosswalk ROI. If the pedestrians are inside the ROI, the DENM messages are trigged.
Several tests have been conducted with different typical crosswalk scenarios to access their geometric variations, the pedestrian's sizes and if the YOLO v3 accuracy was suitable for this prototype.

Building DENM Messages
The DENM messages are created in the Facilities layer with the data to fill the specific headers, some of which are mandatory, such as stationType, relevanceTrafficDirection, and relevanceDistance, while others are optional, such as Situation and Location, as shown in Table 1. Figure 8 shows the layers involved in the creation, sending, and receiving of the DENM messages. At the RSU end, after identifying pedestrians at the crosswalk, the application software generates a trigger for the creation of DENM messages at the Facilities layer. The BTP-B and GeoNetworking protocol headers are then filled with the specific parameters for each scenario, as specified in Table 1 (eventType humanPresenceOnRoad, causeCode 12, subCauseCode 0). After that, the packets are sent to the OBU end over the 802.11p wireless channel. At the OBU end, the packets are decoded by all protocol layers until the information arrives at the application layer, where it is presented on the vehicle infotainment system.
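The Facilities-layer fields from Table 1 can be sketched as a simple record. This is a simplified illustration only: a real DENM is ASN.1 UPER-encoded per ETSI EN 302 637-3, and the flat structure below (field names mirroring the paper's parameters) is an assumption for readability, not the standard encoding.

```python
from dataclasses import dataclass, asdict

# Simplified sketch of the Facilities-layer DENM fields from Table 1.
# Field names follow the parameters discussed in the text; the flat
# dataclass layout is an illustrative assumption, not ASN.1 UPER.

@dataclass
class Denm:
    detectionTime: int              # event system timestamp (ms)
    latitude: int                   # event position (ETSI integer units)
    longitude: int
    stationType: str = "roadSideUnit"
    relevanceDistance: str = "lessThan50m"
    relevanceTrafficDirection: str = "allTrafficDirections"
    validityDuration: int = 1       # seconds after detection
    informationQuality: str = "highest"
    causeCode: int = 12             # humanPresenceOnRoad
    subCauseCode: int = 0           # unavailable

def build_denm(timestamp_ms, lat, lon):
    """Called when the detection module finds a pedestrian on the ROI."""
    return asdict(Denm(detectionTime=timestamp_ms, latitude=lat, longitude=lon))
```

The returned dictionary would then be handed down to the BTP-B/GeoNetworking layers for transmission.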

Evaluation and Results
This section describes the performed tests. The adopted test methodology was to install the prototype near a crosswalk in a controlled environment, have pedestrians move along the sidewalk and cross the crosswalk, and have private cars equipped with OBU stations pass through the crosswalk, within a limited time window. The tests were carried out with pedestrians inside and outside the crosswalk, to evaluate the ability of the motion sensor to trigger the video camera, the pedestrian detection module that identifies only pedestrians on the crosswalk, the creation and sending of DENM messages and, inside the vehicle, the OBU's ability to receive and display the corresponding information.


RSU Pedestrian Detection
This section presents the tested scenarios for pedestrians near and on the crosswalk that may correspond to a message-triggering event. The experimental layout is explained and the results of the computer vision processing are depicted in Figures 9 and 10. Figure 9 represents the layout with a single crosswalk.
The test scenario includes the following outcomes: Figure 10a, in which all pedestrians are outside the crosswalk ROI; Figure 10b, in which there are pedestrians both inside and outside the crosswalk ROI; Figure 10c, in which there are non-pedestrian objects in the crosswalk ROI; and Figure 10d, in which there are pedestrians outside and on both sides of the crosswalk ROI. Pedestrians detected outside the crosswalk ROI are marked with a green bounding box, while those detected on the crosswalk ROI are highlighted with a red bounding box.
When a pedestrian is detected on the crosswalk (within a red bounding box), it triggers the RSU-ITS-G5 module to send the DENM messages that alert nearby drivers via the in-vehicle OBU station.
The pedestrian detection test, combining the motion sensor with the YOLO software, performed well in typical geometric configurations, i.e., one or two cameras pointing at a crosswalk, using the standard YOLO pre-trained dataset.

RSU DENM Messages Creation
This section presents the DENM message generated by the RSU station. In Figure 10a,d, the pedestrians were outside the crosswalk and the RSU-ITS-G5 module was not triggered. In Figure 10c, the pedestrian detection module was activated but it did not identify a pedestrian and the RSU-ITS-G5 module was also not triggered. In Figure 10b, the pedestrians were detected and the RSU-ITS-G5 module was triggered, creating and sending DENM messages. Table 1 exhibits the DENM message data used in the Facilities Layer and broadcasted to the OBU station.
The header DENMV1 shows the event system timestamp (2,961,833,238,866) at which the pedestrian was detected on the crosswalk, together with the crosswalk GPS coordinates (latitude 401,994,420; longitude −84,407,560). The OBU should consider the event relevant since it is closer than the relevanceDistance parameter (lessThan50m). All traffic directions are affected through the relevanceTrafficDirection parameter (allTrafficDirections). The validity duration of the message is defined by the validityDuration parameter (oneSecondAfterDetection) and the type of the emitting device is defined by the stationType parameter (roadSideUnit). The informationQuality parameter defines the level of reliability of the received information (Highest), and the cause of the alert is defined by the eventType parameter (humanPresenceOnTheRoad) and the sub-cause (unavailable).
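The integer coordinates in the header follow the usual ETSI ITS convention of tenths of microdegrees (an assumption here, per ETSI TS 102 894-2, not stated explicitly in the text), so decimal degrees can be recovered with a simple scaling:

```python
# Decode ETSI ITS position integers (assumed units of 0.1 microdegree,
# per ETSI TS 102 894-2) into decimal degrees. The sample values are the
# ones reported in Table 1.

def decode_coordinate(value):
    return value / 1e7

lat = decode_coordinate(401994420)   # -> 40.199442
lon = decode_coordinate(-84407560)   # -> -8.440756
```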
At the Network layer, the GeoNetworking header contains the message timestamp and the crosswalk GPS coordinates, latitude and longitude. The BTP-B header carries the destination port 2002 because the packet is transported in a non-interactive way, with only the destination port included, since the communication is unidirectional.
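The BTP-B header described above is only four bytes. As a sketch (following ETSI EN 302 636-5-1, where BTP-B consists of a 16-bit destination port followed by a 16-bit destination-port-info field, and 2002 is the well-known port for DENM), it can be packed as:

```python
import struct

# Sketch of the 4-byte BTP-B header (per ETSI EN 302 636-5-1): a 16-bit
# destination port followed by a 16-bit destination port info field.
# Port 2002 is the well-known BTP destination port for DENM.

DENM_PORT = 2002

def btp_b_header(dest_port=DENM_PORT, port_info=0):
    """Pack the BTP-B header in network byte order."""
    return struct.pack("!HH", dest_port, port_info)
```

For a DENM, `btp_b_header()` yields the bytes `07 d2 00 00` (2002 = 0x07D2), matching the non-interactive, destination-port-only transport described above.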

OBU DENM Messages Reception and Presentation
The test scenario in Figure 10b triggered the RSU-ITS-G5 module to send DENM messages. The messages were received by the OBU station in-vehicle and, after validation, the alert was displayed on a laptop screen (emulating the infotainment system). The message validation corresponds to the action of checking whether the DENM message is of interest to the receiver (OBU) based on the car's current georeferenced position. Figure 11 shows the multimedia message presented to the driver inside the vehicle. The test scenario has shown that the application prototype works according to the specifications.
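The OBU-side validation step can be sketched as a distance check between the vehicle's georeferenced position and the event position advertised in the DENM. The haversine formula and the literal 50 m threshold (from the lessThan50m relevanceDistance value) are illustrative assumptions, not the paper's exact validation logic.

```python
import math

# Sketch of the OBU-side relevance check: show the DENM to the driver
# only if the vehicle is within the advertised relevanceDistance of the
# event position. Haversine distance and the 50 m threshold (from
# "lessThan50m") are illustrative assumptions.

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (deg, deg) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def denm_is_relevant(vehicle_pos, event_pos, max_distance_m=50.0):
    """True if the DENM event is within range of the vehicle."""
    return haversine_m(*vehicle_pos, *event_pos) <= max_distance_m
```

A vehicle a few tens of metres from the crosswalk would pass this check and have the alert rendered on the infotainment display, while one a kilometre away would silently discard the message.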


Conclusions
This paper describes a functional ITS-G5 showcase application, within the ITS-G5 road-safety application class, designed to increase pedestrian and driver safety at pedestrian crosswalks. After a pedestrian is detected on a crosswalk, the application triggers the broadcast of DENM messages to all ITS-G5 receivers in the vicinity using 802.11p technology. The DENM messages are then received by all OBU units and are displayed on the vehicle's infotainment system.
The tests carried out showed that the application can detect pedestrians on a crosswalk and create and send the DENM messages, which are correctly displayed on the vehicle infotainment system. The prototype showed that its usage might effectively increase the safety of pedestrians and drivers alike.
As future work, the prototype will be tested on public roads and with more complex road scenarios.

Funding: This research received no external funding.