A Forward-Collision Warning System for Electric Vehicles: Experimental Validation in Virtual and Real Environment

Abstract: Driver behaviour and distraction have been identified as the main causes of rear-end collisions. However, a promptly issued warning can reduce the severity of crashes, if not prevent them completely. This paper proposes a Forward Collision Warning (FCW) system for low-end electric vehicles, based on information coming from a low-cost forward monocular camera. The system resorts to a Convolutional Neural Network (CNN) and does not require the reconstruction of a complete 3D model of the surrounding environment. Moreover, a closed-loop simulation platform is proposed, which enables the fast development and testing of the FCW and other Advanced Driver Assistance Systems (ADAS). The system is then deployed on embedded hardware and experimentally validated on a test track.


Introduction
The rapid population and economic growth of recent years has led to an increasing number of vehicles on the road, causing traffic congestion, road accidents and pollution. The main cause of road accidents is related to driver behaviour, distraction or altered state (e.g., see recent studies by health and transportation organizations [1-3] and references therein). The need to improve driver and pedestrian safety has led to the development of on-board active safety systems, which extend the functionalities of traditional passive systems such as seat belts and airbags. Namely, active systems aim to predict the occurrence of an accident, while passive systems are engaged only to mitigate its consequences. In this scenario, Advanced Driver Assistance Systems (ADASs) are recognized as the key enabling technology for actively reducing the main road transport issues in the near future [4-7].
From this perspective, safe and green transport will be possible by embedding ADAS in full-electric vehicles which, thanks to their simplified high-efficiency powertrains and zero direct emissions, can greatly improve urban air quality by reducing $\mathrm{CO}_x$, $\mathrm{NO}_x$ and $\mathrm{C}_x\mathrm{H}_y$ emissions.
Along this line, the aim of this work is to introduce a Forward Collision Warning (FCW) system based on a single monocular forward-facing camera. The use of an affordable sensor makes the system suitable for a wide range of vehicles. Moreover, it will be shown how a co-simulation platform can greatly ease the development and testing of the algorithms. A single camera can be exploited to estimate the time-to-collision (TTC) precisely enough for an FCW application (e.g., see [8] and references therein). Slightly better performance could be achieved with a stereo system, which however involves additional costs in terms of sensors, software and computing hardware. Note that FCW has been shown to be an effective tool for reducing rear-end crashes by promptly warning the driver with acoustic or visual signals. In order to target lower-end vehicles, the proposed design relies on visual cues only, i.e., the information is retrieved in the 2D image space and the three-dimensional transformation is ignored. In so doing, the algorithm has the great advantage of being independent of the particular mounting angle of the camera, and it works even when lane markings, typically used for online calibration or to filter out-of-path obstacles, are absent. This makes it possible to deploy the system on a wider range of vehicles and enhances the overall robustness of the algorithm. Moreover, a purposely designed co-simulation platform is introduced, which is particularly useful for virtual testing of the approach before its deployment on the actual in-vehicle hardware. Finally, experiments are carried out on an electric vehicle, and the experimental results are analyzed to validate the overall design.
Given the potential of FCW, different design approaches can be found in the technical literature. In particular, established systems rely on geometric or feature-based methods to detect vehicles, e.g., the Sobel edge detector [9,10] or Haar-like features [8,11]. More recent attempts rely on machine-learning techniques, such as support vector machine classifiers [12], Hough forests [13], and deep learning detectors such as the Single Shot MultiBox Detector (SSD) [14,15] or You Only Look Once (YOLO) [16-18].
It is worth noting that only a few recent works in the field rely on two-dimensional camera information, since most solutions are tailored for high-end cars equipped with LIDAR or RADAR sensors. Conversely, in our work, the employment of a deep Convolutional Neural Network (CNN) generalizes and expands the system capabilities, thus improving on very recent results in the technical literature and enabling affordable and reliable FCW on small low-end cars.
The rest of the paper is organized as follows. Section 2 describes each component of the FCW algorithm, while Section 3 presents the numerical analysis based on the purposely designed co-simulation platform. The experimental results, which confirm the theoretical derivation and demonstrate the effectiveness of the proposed approach, are described in Section 4, where the hardware architecture used for the tests is also illustrated. Conclusions are provided in Section 5.

Forward Collision Warning
The aim of FCW is to measure the collision risk and warn the driver if it grows above a predefined threshold. The warning must be raised early enough for the driver to react and avoid the collision, or at least reduce its severity.
Some FCW systems leverage radar for sensing [19], which results in high sensing costs. Since our main aim is to target lower-end vehicles, a single monocular forward-facing camera is used here to sense the obstacles ahead.
To measure the collision risk, the following Time-To-Collision (TTC) index is considered:
$$\mathrm{TTC} = \frac{D}{V} \qquad (1)$$
where D [m] is the distance from the obstacle and V [m/s] is the relative (closing) velocity. While relative distance and velocity are not directly measurable from vision information, the following sections show how to derive the TTC from the scale change of objects in the image frame. The FCW consists of three main components: detection, tracking and warning logic. Each of them is briefly described in the following sections.
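As a reference point, Equation (1) can be evaluated directly whenever D and V are measured, as with the radar used later for validation. The minimal Python sketch below assumes V is taken positive when the gap is closing and returns infinity for a non-closing obstacle; both conventions, and the function name, are assumptions of the sketch rather than details stated in the design.

```python
import math


def ttc_from_range(distance_m: float, closing_speed_mps: float) -> float:
    """Equation (1): TTC = D / V, with V > 0 when the gap is shrinking (assumed convention).

    A non-positive closing speed means the obstacle is not getting closer,
    so no collision is predicted and infinity is returned.
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return distance_m / closing_speed_mps


# CCRM-like example: 30 m gap, ego at 50 km/h, lead vehicle at 20 km/h
closing = (50.0 - 20.0) / 3.6  # km/h -> m/s
print(round(ttc_from_range(30.0, closing), 2))
```

With a 30 km/h closing speed the example yields a TTC of 3.6 s, just above the 2 to 3 s warning band discussed later.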

Object Detection
Computer vision algorithms for object detection fall into two main kinds: feature-based and learning-based. Traditional techniques, such as the Sobel edge detector or Haar-like features, belong to the first kind and, despite their simpler implementation and lower computational cost, suffer in terms of accuracy and generalization capabilities. Learning-based approaches, instead, require high-end hardware due to their computational cost, but result in higher accuracy [20]. In this context, Convolutional Neural Networks (CNNs) are the state of the art for object detection. Common approaches leverage two-stage detectors, where the neural network first generates region candidates and then classifies them [21]. Alternatively, a single-stage detector can be exploited to directly predict the location and the class of an object in one step, thus resulting in faster inference (e.g., see YOLO [22] and SSD [23]).
Given the strict real-time constraints of the application, single-stage detectors have been investigated. In particular, since YOLO has a faster computation time than SSD and a comparable mean Average Precision (mAP) [22,24], the former has been chosen for our application design.
The third improved version, YOLOv3, is a multiscale one-stage object detector which uses Darknet-53 as backbone to extract features and localize possible objects in the input image. Despite its depth, Darknet-53 achieves state-of-the-art classification performance and the highest measured floating-point operations per second among the backbones compared by its authors, i.e., it makes efficient use of the GPU. On top of the base feature extractor, several convolutional layers are added, which predict the bounding box, objectness and class. To select the anchor box priors, the K-means algorithm is run on the bounding boxes of the training dataset before training, and the final K value is the one with the best recall/complexity trade-off. To address the multiscale problem, the network predicts boxes at three different scales, using a Feature Pyramid Network (FPN)-like architecture. FPN makes predictions at each layer (scale) and uses multiscale features from different layers, combining low-resolution (semantically strong) features with high-resolution (semantically weak) features through top-down pathways and lateral connections. The network architecture is shown in Figure 1.
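The anchor selection step can be sketched as K-means on the (width, height) pairs of the training boxes, using the distance d = 1 − IoU instead of the Euclidean distance, as done in the YOLO papers. The sketch below is illustrative only: the function names, the random initialization and the toy data are hypothetical, and it is not the training code used in this work.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means on (w, h) pairs with distance d = 1 - IoU; returns k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialise from k distinct training boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # smallest 1 - IoU  <=>  largest IoU
            best = max(range(k), key=lambda c: iou_wh(box, centroids[c]))
            clusters[best].append(box)
        updated = [
            (sum(w for w, _ in m) / len(m), sum(h for _, h in m) / len(m)) if m else centroids[c]
            for c, m in enumerate(clusters)
        ]
        if updated == centroids:  # centroids have converged
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best anchor, a proxy for recall."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Toy (w, h) data in pixels: distant cars and nearby cars (hypothetical values)
boxes = [(10, 10), (11, 11), (12, 12), (100, 50), (102, 52), (98, 48)]
for k in (1, 2, 3):
    anchors = kmeans_anchors(boxes, k)
    print(k, [tuple(round(x, 1) for x in a) for a in anchors], round(mean_best_iou(boxes, anchors), 3))
```

Printing the mean best IoU for increasing K exposes the kind of recall/complexity trade-off used to pick K: each extra anchor raises the IoU but adds prediction outputs.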
The described network is used to detect vehicles, pedestrians, bicycles and motorcycles. For each detected obstacle, the output of the network is the so-called bounding box, defined as:
$$\mathbf{b} = [b_x,\ b_y,\ b_w,\ b_h] \qquad (2)$$
where $b_x$ and $b_y$ are the pixel coordinates of the top-left corner of the bounding box, while $b_w$ and $b_h$ are its width and height, respectively.

Multi-Target-Tracking
The tracking component is essential to build a history of each detected object [25]. Since our scenario involves more than one detection per frame, a Multi-Target Tracker (MTT) is employed. An MTT must assign each new incoming detection to the existing tracks before using the new measurements to update them. The assignment problem can be challenging due to the number of targets to track and to the imperfect detection probability of the sensor, which can lead to both false positives and false negatives.
The Global Nearest Neighbour (GNN) algorithm [26] is chosen, together with a bank of linear Kalman filters. The GNN is a single-hypothesis tracker whose goal is to assign the globally nearest measurement to each track. Since conflicts can occur, a cost function must be defined and an optimization problem must be solved at each time step. The complement to one of the Intersection-Over-Union (IOU) between detection and track pairs is chosen as cost function:
$$c_{ij} = 1 - \mathrm{IOU}(d_i, t_j) = 1 - \frac{|d_i \cap t_j|}{|d_i \cup t_j|} \qquad (3)$$
where $d_i$ is the i-th detection and $t_j$ is the j-th track. The optimization problem is solved with the Munkres algorithm [27,28], which finds the globally optimal assignment in polynomial time. Given the small number of tracks and detections (typically below 20), the assignment can be solved in real time on the chosen deployment hardware. Moreover, to reduce the complexity of the problem, a gating step is applied beforehand, during which a high cost is assigned to unlikely pairings.
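The association step can be sketched in pure Python as follows: the IOU-based cost described above, a gating step that replaces costs above a threshold with a large value, and a Kuhn-Munkres (Hungarian) solver based on dual potentials, which is one standard way of implementing the Munkres algorithm. The gate value of 0.7, the large gating cost and all function names are hypothetical choices for illustration; in practice an optimized routine such as SciPy's `scipy.optimize.linear_sum_assignment` could replace the hand-written solver.

```python
def iou(a, b):
    """IoU of two boxes (b_x, b_y, b_w, b_h) with (b_x, b_y) the top-left corner, in pixels."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment (Kuhn-Munkres with dual potentials) for an n x m matrix, n <= m.

    Returns row_to_col, where row_to_col[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j, 0 = free
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            row_to_col[p[j] - 1] = j - 1
    return row_to_col


def associate(detections, tracks, gate=0.7, big=1e6):
    """GNN association: returns (matches, unmatched_detections, unmatched_tracks).

    matches holds sorted (detection_index, track_index) pairs. Costs 1 - IoU above
    `gate` are replaced by `big` (gating), and such pairs are dropped after solving.
    """
    if not detections or not tracks:
        return [], list(range(len(detections))), list(range(len(tracks)))
    raw = [[1.0 - iou(d, t) for t in tracks] for d in detections]
    gated = [[c if c <= gate else big for c in row] for row in raw]
    if len(detections) <= len(tracks):
        matches = list(enumerate(hungarian(gated)))
    else:  # the solver needs rows <= columns, so let tracks be the rows
        by_track = hungarian([list(col) for col in zip(*gated)])
        matches = [(d, t) for t, d in enumerate(by_track)]
    pairs = [(d, t) for d, t in matches if raw[d][t] <= gate]  # drop gated-out pairs
    matched_d = {d for d, _ in pairs}
    matched_t = {t for _, t in pairs}
    unmatched_d = [i for i in range(len(detections)) if i not in matched_d]
    unmatched_t = [j for j in range(len(tracks)) if j not in matched_t]
    return sorted(pairs), unmatched_d, unmatched_t


tracks = [(100, 100, 50, 40), (300, 120, 60, 50)]
detections = [(305, 118, 60, 52), (102, 101, 50, 40), (600, 50, 30, 30)]
print(associate(detections, tracks))
```

In the example, detection 2 does not overlap any track, so it is left unmatched and would seed a new tentative track under the track management rules described below.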
Once the association problem is solved, the measurements are used to update the bank of filters. A constant-velocity linear Kalman filter [29] is used for each track, with the state defined as:
$$\mathbf{x} = [b_x,\ b_y,\ b_w,\ b_h,\ b_{x,v},\ b_{y,v},\ b_{w,v},\ b_{h,v}]^T \qquad (4)$$
where $b_x$ and $b_y$ are the abscissa and the ordinate of the top-left corner of the bounding box, $b_w$ and $b_h$ are its width and height, and the subscript $v$ denotes the respective velocities in the image frame. Note that the state is defined in image-frame coordinates, which sidesteps the problem of measuring three-dimensional coordinates with a monocular camera. Finally, track management also involves creating tentative tracks from non-associated detections and deleting old tracks that are no longer associated.
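Because the constant-velocity model described above does not couple the four box coordinates, the 8-state filter splits into four independent 2-state filters whenever the process noise, measurement noise and initial covariance are also chosen block-diagonal per coordinate. The sketch below assumes exactly that; the noise values, the initial covariance and the piecewise-constant white-acceleration process model are illustrative choices, not the tuning used in this work, and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CVFilter1D:
    """Constant-velocity Kalman filter for one box coordinate, state [position, velocity]."""
    pos: float
    vel: float = 0.0
    # Assumed initial covariance: position roughly known, velocity almost unknown
    P: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 1e4]])
    q: float = 50.0  # assumed white-acceleration intensity, (px/s^2)^2
    r: float = 4.0   # assumed measurement-noise variance, px^2

    def predict(self, dt: float) -> None:
        # x <- F x with F = [[1, dt], [0, 1]]
        self.pos += self.vel * dt
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q, with Q from a piecewise-constant white-acceleration model
        q00, q01, q11 = self.q * dt**4 / 4, self.q * dt**3 / 2, self.q * dt**2
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + q00, p01 + dt * p11 + q01],
                  [p10 + dt * p11 + q01, p11 + q11]]

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r           # innovation variance
        k0, k1 = p00 / s, p10 / s  # Kalman gain
        y = z - self.pos           # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


class BoxTrack:
    """One track: four decoupled CV filters for b_x, b_y, b_w, b_h (pixels)."""

    def __init__(self, box):
        self.filters = [CVFilter1D(pos=float(c)) for c in box]

    def predict(self, dt: float) -> None:
        for f in self.filters:
            f.predict(dt)

    def update(self, box) -> None:
        for f, z in zip(self.filters, box):
            f.update(float(z))

    def state(self):
        """[b_x, b_y, b_w, b_h, b_x_v, b_y_v, b_w_v, b_h_v]."""
        return [f.pos for f in self.filters] + [f.vel for f in self.filters]


# Noise-free example: a box drifting right and growing, sampled at 30 fps
dt = 1.0 / 30.0
track = BoxTrack([100, 200, 50, 40])
for k in range(1, 91):
    t = k * dt
    track.predict(dt)
    track.update([100 + 30 * t, 200, 50 + 6 * t, 40 + 4 * t])
print([round(x, 2) for x in track.state()])
```

After three seconds of synthetic measurements the velocity entries settle near the true image-plane rates (30, 0, 6 and 4 px/s); the growth rate of b_w is the quantity that later drives the scale-change TTC.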
More complex solutions, e.g., multiple-hypothesis trackers combined with extended Kalman filters, would require more information about the target position and relative angle with respect to the camera in three-dimensional space, which is not directly provided by the chosen sensor architecture. The proposed solution, instead, has proven simple enough to run in real time, yet effective for the purpose of a forward collision warning system.

Collision Risk Evaluation
Once the tracks are updated at each time step, the collision risk of each of them can be checked. We now show how the TTC in Equation (1) can be linked to the scale change of the bounding box between consecutive frames.
The width of an obstacle in three-dimensional space is projected onto the i-th image frame through the pinhole camera model:
$$w_i = f\,\frac{W}{D_i}$$
where $w_i$ is the obstacle width in the image frame, $W$ is the obstacle width in three-dimensional space, $f$ is the camera focal length and $D_i$ is the distance from the obstacle at frame $i$. By tracking the objects between two frames $i$ and $i+1$, the scale change can be defined as:
$$s = \frac{w_{i+1}}{w_i} = \frac{D_i}{D_{i+1}} \qquad (5)$$
Since the time interval $\Delta t$ between the two frames is small (≈1/30 s), constant relative velocity is assumed, and hence:
$$D_i = D_{i+1} + V\,\Delta t \qquad (6)$$
By substituting Equation (6) into Equation (5), the following is obtained:
$$s = 1 + \frac{V\,\Delta t}{D_{i+1}} \qquad (7)$$
and hence, from Equation (1) evaluated at the most recent frame, the TTC can be written as:
$$\mathrm{TTC} = \frac{D_{i+1}}{V} = \frac{\Delta t}{s - 1} \qquad (8)$$
Note that this formulation of the TTC does not require the actual distance between the camera and the obstacle, and that $f$ and $W$ cancel out; this makes it possible to ignore camera calibration and assumptions on road properties, e.g., flatness, bank and slope angles. The accuracy of Equation (8) mainly depends on the choice of $\Delta t$ and on the accuracy of the detection and tracking system. In particular, increasing $\Delta t$ attenuates the noise coming from the detection system, but fewer measurements are obtained for each obstacle. Theoretical bounds on $\Delta t$ are discussed in [30].
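The scale-change relation can be checked numerically, assuming it takes the form TTC = Δt/(s − 1) with s = w_{i+1}/w_i, by generating the image widths with the pinhole model for a known distance and closing speed and comparing the result with D/V at the latest frame. The focal lengths, obstacle widths, distance and speed below are arbitrary example values, and the function names are hypothetical.

```python
def ttc_from_scale(w_prev: float, w_curr: float, dt: float) -> float:
    """TTC = dt / (s - 1) with s = w_curr / w_prev (widths in pixels, dt in seconds).

    A non-growing box (s <= 1) means the obstacle is not approaching: return inf.
    """
    s = w_curr / w_prev
    if s <= 1.0:
        return float("inf")
    return dt / (s - 1.0)


def pinhole_width(f_px: float, width_m: float, distance_m: float) -> float:
    """Projected width w = f * W / D of an obstacle of width W at distance D."""
    return f_px * width_m / distance_m


# Synthetic check: obstacle 20 m ahead, closing at 10 m/s, frames 1/30 s apart
dt, closing_speed, d_prev = 1.0 / 30.0, 10.0, 20.0
d_curr = d_prev - closing_speed * dt  # constant relative velocity
for f_px, width_m in [(800.0, 1.8), (1200.0, 2.5)]:  # arbitrary intrinsics and sizes
    w_prev = pinhole_width(f_px, width_m, d_prev)
    w_curr = pinhole_width(f_px, width_m, d_curr)
    print(ttc_from_scale(w_prev, w_curr, dt), d_curr / closing_speed)
```

Both printed pairs agree up to floating-point rounding for either choice of f and W. Passing widths several frames apart together with the matching dt (e.g., 3/30 s) is how the Δt noise-versus-measurement-count trade-off could be explored.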
If, for any track, the TTC in Equation (8) falls below a chosen threshold, typically between 2 and 3 s, a collision might occur. The warning should be raised if and only if the examined track with a TTC below the threshold is in the ego vehicle's path. To check the latter, the state of the track's Kalman filter can be used, since it contains the velocity of the obstacle in the image frame. In particular, the position of the bounding box in the image frame can be predicted over the TTC horizon as:
$$b_x^{pr} = b_x + b_{x,v}\,\mathrm{TTC} \qquad (9)$$
where $b_x^{pr}$ is the predicted abscissa of the top-left corner of the bounding box. With the same reasoning, the right corner can be predicted by also using the width of the box and its rate of change. If the predicted box is inside a precalibrated region of the image frame, the warning is issued. Equation (9) relies on the constant-velocity assumption, which is a good approximation in the scenarios of interest, especially since the TTC, and hence the prediction horizon, is small in these cases.
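The warning logic can be sketched as follows: when the TTC is below the threshold, the horizontal extent of the box is propagated over the TTC horizon with the constant-velocity model, assuming Equation (9) takes the form b_x^pr = b_x + b_{x,v}·TTC, and the warning is raised if the result falls within a precalibrated column range. The region bounds, the 2.1 s default threshold (the value used in the MIL tests), the state ordering [b_x, b_y, b_w, b_h, b_{x,v}, b_{y,v}, b_{w,v}, b_{h,v}], and the reading of "inside" as "overlapping" are all assumptions of this sketch.

```python
def predict_extent(b_x: float, b_w: float, b_x_v: float, b_w_v: float, horizon_s: float):
    """Left/right pixel columns of the box after horizon_s seconds at constant image velocity."""
    left = b_x + b_x_v * horizon_s            # assumed form of Equation (9)
    right = left + b_w + b_w_v * horizon_s    # right corner from the predicted width
    return left, right


def fcw_warning(state, ttc: float, roi_cols, threshold_s: float = 2.1) -> bool:
    """Raise a warning if TTC < threshold and the predicted box overlaps the in-path region.

    state: [b_x, b_y, b_w, b_h, b_x_v, b_y_v, b_w_v, b_h_v] (pixels, pixels/s).
    roi_cols: (left, right) columns of a precalibrated in-path region (hypothetical values).
    """
    if ttc >= threshold_s:
        return False
    b_x, _, b_w, _, b_x_v, _, b_w_v, _ = state
    left, right = predict_extent(b_x, b_w, b_x_v, b_w_v, ttc)
    roi_left, roi_right = roi_cols
    return left < roi_right and right > roi_left  # 'inside' read as 'overlapping'


roi = (540.0, 740.0)                       # hypothetical in-path columns of a 1280-px-wide image
lead = [600, 300, 80, 60, -10, 0, 20, 15]  # car ahead, slowly growing
side = [900, 300, 60, 45, 40, 0, 5, 4]     # car in the next lane, drifting away
print(fcw_warning(lead, 1.5, roi), fcw_warning(side, 1.5, roi), fcw_warning(lead, 3.0, roi))
```

Only the first call warns: the side vehicle is predicted to stay out of the in-path region, and the third call is above the TTC threshold. Reading "inside" as "overlapping" keeps the check meaningful for close obstacles whose boxes grow wider than the region.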

Testing and Deployment
The effectiveness of the approach is first investigated through model-in-the-loop simulations on a purposely designed virtual test platform, and then confirmed by experiments with an electric vehicle on the Kineton test track located in Naples, Italy.

Model-in-the-Loop Testing
The design of safety-related features is greatly eased by the use of appropriate simulation platforms. Here, a co-simulation platform for Model-In-the-Loop (MIL) testing is proposed, where an autonomous vehicle can be safely tested while moving within a potentially dangerous, realistic traffic scenario.
This co-simulation environment has been built on the following two components:
• MATLAB/Simulink, used to develop the algorithm and later to auto-generate C code through the Embedded Coder toolbox;
• the open-source urban simulator CARLA (CAR Learning to Act) [31], used to design traffic scenarios and generate synthetic sensor measurements.
CARLA is built on Unreal Engine, with embedded physics simulation, and exposes a Python API; it is capable of generating realistic sensor measurements. To retrieve reliable sensor data, the simulation is run synchronously between the two environments; in particular, Simulink acts as a client by sending simulation commands to CARLA, which acts as a server and replies with the newly generated measurements. To link the two environments, a set of APIs has been implemented to enable communication between MATLAB/Simulink and CARLA's Python interface. Figure 2 shows a screen capture of the proposed platform during a use case. The FCW feature requires raw RGB frames, which are generated at 30 fps by a camera mounted behind the windshield of a moving vehicle. The raw frames are the input to the algorithm introduced in Section 2.
The driving scenarios designed in CARLA are those defined in the Euro NCAP safety assist assessment protocol [32], including the Car-to-Car Rear stationary (CCRS) and Car-to-Car Rear moving (CCRM) scenarios. All scenarios are repeated with different vehicle velocities and with lateral overlaps ranging from −50% to +50%, as defined by the protocol procedures. As a proof of concept, two exemplary scenarios are shown, namely a CCRS with the ego vehicle traveling at v = 50 km/h from a starting distance of around d = 67 m, and a CCRM with the ego vehicle traveling at v = 50 km/h and the leading vehicle traveling at v = 20 km/h, from a starting distance of around d = 30 m. Moreover, a quantitative comparison is performed with respect to the latest literature results, see [18], in which the authors proposed a similar CNN-based solution. However, in [18] the risk estimation index takes into account the bounding box of a single frame, which carries no information about the relative velocity. Figure 3 shows the numerical results in the first driving scenario (CCRS). Namely, the estimated and the real TTC are compared in Figure 3a in order to assess the accuracy of the algorithm.
Due to the constant relative velocity between the two vehicles, both TTCs decrease linearly with time. Although the estimated TTC shows some oscillations, no false positives are reported in any of the CCRS scenarios. Furthermore, only a small, essentially constant bias can be appreciated, mainly due to the fixed distance between the forward-facing camera, mounted behind the windshield, and the front bumper of the car, where the actual TTC is evaluated. Note that the relative magnitude of this bias varies with the distance to the forward obstacle, so it could be compensated by estimating that distance; this is not currently embedded in our design and is the object of future research. Figure 3b shows the warning activation, which occurs as soon as the estimated TTC goes below the threshold, chosen as 2.1 s. The comparison in Figure 3b shows that taking multiple frames into account allows a more accurate warning to be issued. Indeed, a warning issued at a TTC of around 1 s might not be enough to avoid a collision. Figure 4 shows the numerical results for the CCRM scenario and, as expected, the TTC trend is similar to the previous case. Note that the inaccuracies at large distances do not degrade the performance of the system: no false positives are reported. Finally, Figure 4b shows the FCW activation signal for the CCRM case, along with the comparison with [18]. The outcome is very similar to the CCRS case.
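The camera-to-bumper bias discussed above can be quantified with a simple geometric argument: if the camera sits a distance L behind the bumper, it sees a range D + L, so TTC_cam − TTC_bumper = L/V, which is constant at constant closing speed, while the relative error L/D grows as the obstacle gets closer. The Python sketch below evaluates both terms; the 2 m offset is a hypothetical value, not a measurement of the test vehicle, and the function name is illustrative.

```python
def camera_ttc_bias(offset_m: float, closing_speed_mps: float, bumper_distance_m: float):
    """Return (absolute bias [s], relative bias [-]) of a camera-based TTC.

    The camera sits offset_m behind the front bumper, so it measures
    TTC_cam = (D + L) / V while the true TTC is D / V (D measured at the bumper).
    """
    ttc_true = bumper_distance_m / closing_speed_mps
    ttc_cam = (bumper_distance_m + offset_m) / closing_speed_mps
    return ttc_cam - ttc_true, (ttc_cam - ttc_true) / ttc_true


speed = 50.0 / 3.6  # CCRS example: ego at 50 km/h towards a stationary target
for d in (67.0, 30.0, 10.0):
    abs_bias, rel_bias = camera_ttc_bias(2.0, speed, d)  # hypothetical 2 m offset
    print(f"D = {d:5.1f} m: bias = {abs_bias:.3f} s ({100 * rel_bias:.1f} %)")
```

With these assumed values the absolute bias stays at 0.144 s while the relative bias grows from about 3% at 67 m to 20% at 10 m. Removing it requires V (or D), which a purely monocular scale-change estimate does not provide; this is why distance estimation is needed for compensation.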

Experimental Validation
In-vehicle experiments were carried out to validate the whole design. The camera-based algorithm was deployed on an NVIDIA Jetson AGX Xavier developer kit, equipped with 8 CPU cores, 512 GPU cores and 32 GB of RAM. The hardware platform sustains the 30 fps required for real-time operation. The object detection component was developed using the open-source YOLOv3 implementation available at [33]. This implementation is particularly convenient for embedded deployment because it uses CUDA and cuDNN for fast CNN inference on the GPU cores. The remaining steps were first implemented in MATLAB/Simulink for rapid prototyping, and C code was then auto-generated through the Embedded Coder toolbox. The C++ OpenCV library [34] was used for visualization purposes during the tests. The camera is an HDR 2MP Starlight camera with an OmniVision sensor. The combination of high dynamic range, up to 120 dB, and ultra-low-light technology allows the camera to capture images in difficult lighting conditions, thus enabling the FCW logic even at night or inside tunnels [35]. It uses an electronic rolling shutter and a 58° field-of-view lens with fixed focus, and it is connected to the NVIDIA board via USB. Clearly, more robust solutions can be deployed by fusing the camera signals with additional data from more accurate, yet more expensive, radar or lidar measurements.
During MIL tests, the ground-truth quantities given by the simulator can be used to assess performance, whereas during experimental validation a second reliable source of information is required to make the same kind of assessment. To this end, an automotive radar, the Continental ARS 404-21, was used, which directly and accurately measures obstacle distances and relative velocities; thus, Equation (1) can be used to evaluate the TTC and compare it with the camera-based estimate of Equation (8). Radar messages and warning activation signals, along with other vehicle data of interest, were collected from the vehicle CAN bus using a PEAK-System PCAN-USB interface. Figure 5 shows the chosen hardware architecture and Figure 6 shows the electric platform employed during the in-vehicle experiments.
The driving scenario is taken from the well-known Euro NCAP protocol, one of the standard references for the validation of such systems. In the CCRS scenario, the following (ego) vehicle accelerates until it reaches around v = 30 km/h, moving towards a stationary vehicle. The FCW system emits an audible alarm when the TTC is lower than a threshold, which was set to 2.45 s during the tests. A test is considered successful if the FCW algorithm generates an alert early enough for the driver to brake the ego vehicle and avoid the impact with the forward obstacle (see Figure 7, where an exemplary frame extracted from a recorded video of the CCRS scenario is shown). Experimental results are reported in Figure 8, where the TTC estimated from camera information is compared with the one based on the expensive, high-accuracy on-board radar (see Figure 8a).
The results confirm that the performance obtained with the camera is comparable to that achievable with the radar, thus confirming the effectiveness of the approach for low-end commercial cars; moreover, no false positives or false negatives were observed during the experimental tests, as shown in Figure 8c.

Conclusions
In this paper, a forward collision warning system is presented which leverages a deep convolutional neural network fed with sensing data from an on-board forward camera. Moreover, it is shown that, by resorting to the scale change between consecutive frames, the error coming from camera calibration can be ruled out, making the system more robust with respect to the camera mounting angle.
A general model-based virtual-testing platform has been designed to perform model-in-the-loop tests in a safe manner, which can also be exploited for more complex active safety systems. The numerical and experimental analyses show that the system is capable of promptly warning the driver if a collision is about to occur, replicating the Euro NCAP safety test assessments. Despite the use of a low-cost monocular camera for sensing, the overall architecture is accurate enough, at least in the TTC range of interest, i.e., below 3 s. As prescribed by the Euro NCAP protocol, we have extensively tested our design, not only experimentally but also numerically, by randomly varying the scenario conditions, thus verifying that the typical false/true warning rate requirements [36] are fulfilled. Results showed no false positives in any of the evaluated scenarios. Moreover, the employment of a state-of-the-art deep CNN improves on the performance of the latest literature results. Future work will investigate the estimation of the obstacle distance with a mono or stereo camera, along with the implementation of more complex traffic and driving scenarios.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: