Feasibility of Using a MEMS Microphone Array for Pedestrian Detection in an Autonomous Emergency Braking System

Pedestrian detection by a car is typically performed using camera, LIDAR, or RADAR-based systems. The first two systems, based on the propagation of light, do not work in foggy or poor visibility environments, and the latter are expensive and the probability associated with their ability to detect people is low. It is necessary to develop systems that are not based on light propagation, with reduced cost and with a high detection probability for pedestrians. This work presents a new sensor that satisfies these three requirements. An active sound system, with a sensor based on a 2D array of MEMS microphones, working in the 14 kHz to 21 kHz band, has been developed. The architecture of the system is based on an FPGA and a multicore processor that allow the system to operate in real time. The algorithms developed are based on a beamformer, range and lane filters, and a CFAR (Constant False Alarm Rate) detector. In this work, tests have been carried out with different people and in different ranges, calculating, in each case and globally, the Detection Probability and the False Alarm Probability of the system. The results obtained verify that the developed system allows the detection and estimation of the position of pedestrians, ensuring that a vehicle travelling at up to 50 km/h can stop and avoid a collision.


Introduction
Nowadays, road vehicles are the most widespread transportation system. Current estimations set the mortality due to traffic accidents and collisions at around 1.35 million people per year. Furthermore, 23% of this mortality is associated with pedestrian fatalities, totaling some 310,000 pedestrian deaths per year [1]. To try to reduce this fatal number, the automotive and transport sectors are actively working on different solutions, even more so since the development of autonomous cars began, where their use in urban environments is still an unsolved problem due to the high presence of pedestrians.
On the one hand, among the different actions to reduce the number of pedestrian accidents that are being carried out, some of these actions focus on studies related to the analysis of pedestrian behavior at crosswalks [2][3][4][5], to try to anticipate their movements, and take them into account when making decisions about braking the vehicle or slowing down its speed in areas likely to be at risk, due to the presence of pedestrians, who could become potential victims. This information is very valuable in terms of paying attention to pedestrian detection and tracking [4] and also in terms of designing routes to avoid potential risk areas as much as possible [5].
On the other hand, further actions to reduce the number of pedestrian accidents are focused on the development and improvement of intelligent transport systems. Many of these systems pay special attention to the interaction between vehicles and pedestrians, warning vehicle drivers that pedestrians are in the vicinity [6][7][8]; other systems are centered on information sharing between vehicles and the transport infrastructure itself [9,10].

•
The designed system will have to be onboard a vehicle, specifically at its front, in order to be able to detect pedestrians in its path.

•
The system will be based on the principle of RADAR (RAdio Detection and Ranging) but using sound waves instead of radio waves, i.e., the system will be based on a SODAR (SOund Detection and Ranging) system. This type of system is an active one, based on the generation of an acoustic signal. This signal is reflected on a pedestrian that could be in the vehicle's path. The reflected signal is then received by the MEMS microphone array, for its subsequent analysis and processing. • The acoustic pedestrian detection system shall be able to communicate with the vehicle's central vehicle control system to alert the AEB system, so that the car can act accordingly. The reason of this requirement is that the purpose of this system is to be integrated into a vehicle, together with other ADAS systems with which the vehicle may be equipped, to assist the AEB system in making a more reliable braking decision in a pedestrian detection in the vehicle's path.

•
The system should provide reliable results taking into account that the vehicle may travel at different speeds. In this respect, since AEB systems are often used in urban environments, it has been defined that the system should be able to detect pedestrians for vehicle speeds below 50 km/h, which is the usual speed limit in urban environments (although in some cities, this limit is already being reduced to 30 km/h).

•
Another requirement that the system must comply with is that it must have a good resolution in the horizontal coordinate (azimuth). We assume that the pedestrian will be standing on the road, at the vertical height of the vehicle, so the resolution of the system should be focused on the horizontal coordinate. So, the position of the pedestrian will be represented in this azimuth coordinate. The selection of the working frequency range of the system responds to multiple factors: the physical size of the array, the sensor spacing, the frequency response of the acoustic MEMS sensors and the emitter, the acoustic reflectivity of people, and finally the required angular resolution of the system. As a result of all these considerations, it is necessary to reach a compromise relationship, since many of these parameters are opposite. Thus, a frequency band between 14 and 21 kHz has been selected. These high frequency values have been selected also to avoid the ambient noises, of much lower frequency, that could interfere with the behavior of the system. Once the working frequency range was defined, the acoustic array was designed in such a way that the sensor spacing provided good resolution at low frequencies, and it also avoided the appearance of grating lobes in the Field of View (FoV) for high frequencies.
SPH0641LU4H-1 digital MEMS microphones of Knowles were chosen for the implementation of the array. These microphones have a PDM (Pulse Density Modulation) interface and a one-bit digital output [49]. The main features of these microphones are: high performance, low-power, omnidirectional response, 64.3 dB SNR, high sensitivity (−25 dBFS), and an almost flat frequency response (±2 dB in the range of 10 kHz to 24 kHz).

Processing System
The hardware used to implement the system was selected taking into account the previous requirements defined in Section 2.1. A search for a commercial solution was done due to the high cost and time of a specific hardware design.
The base unit of the system is an sbRIO 9629 platform [50]. This platform belongs to National Instruments, particularly to the Reconfigurable Input-Output (RIO) family of devices. Specifically, this sbRIO platform is an embedded single-board controller, with an FPGA Artix-7 200T and a Quad-Core Intel Atom processor. The FPGA has 96 digital inputs/outputs, of which 75 are used as the connection interface with the 150 MEMS microphones of the array, so that in each I/O line, two microphones are multiplexed, while the other lines are used to generate the clock and synchronize. The Atom processor is equipped with 2 GB of DDR3 RAM, 4 GB of built-in storage space, USB Host port, and Giga Ethernet port. Finally, it has sixteen 16-bit analog inputs and four 16-bit analog outputs, which are used to generate the transmitted signal. All this hardware is mounted on a single PCB board (155 mm × 102 mm × 35 mm) with a weight of 330 g. These sbRIO devices are oriented to sensors with nonstandard acquisition procedures, allowing lowlevel programming of the acquisition routines. The selection of the working frequency range of the system responds to multiple factors: the physical size of the array, the sensor spacing, the frequency response of the acoustic MEMS sensors and the emitter, the acoustic reflectivity of people, and finally the required angular resolution of the system. As a result of all these considerations, it is necessary to reach a compromise relationship, since many of these parameters are opposite. Thus, a frequency band between 14 and 21 kHz has been selected. These high frequency values have been selected also to avoid the ambient noises, of much lower frequency, that could interfere with the behavior of the system. Once the working frequency range was defined, the acoustic array was designed in such a way that the sensor spacing provided good resolution at low frequencies, and it also avoided the appearance of grating lobes in the Field of View (FoV) for high frequencies.
SPH0641LU4H-1 digital MEMS microphones of Knowles were chosen for the implementation of the array. These microphones have a PDM (Pulse Density Modulation) interface and a one-bit digital output [49]. The main features of these microphones are: high performance, low-power, omnidirectional response, 64.3 dB SNR, high sensitivity (−25 dBFS), and an almost flat frequency response (±2 dB in the range of 10 kHz to 24 kHz).

Processing System
The hardware used to implement the system was selected taking into account the previous requirements defined in Section 2.1. A search for a commercial solution was done due to the high cost and time of a specific hardware design.
The base unit of the system is an sbRIO 9629 platform [50]. This platform belongs to National Instruments, particularly to the Reconfigurable Input-Output (RIO) family of devices. Specifically, this sbRIO platform is an embedded single-board controller, with an FPGA Artix-7 200T and a Quad-Core Intel Atom processor. The FPGA has 96 digital inputs/outputs, of which 75 are used as the connection interface with the 150 MEMS microphones of the array, so that in each I/O line, two microphones are multiplexed, while the other lines are used to generate the clock and synchronize. The Atom processor is equipped with 2 GB of DDR3 RAM, 4 GB of built-in storage space, USB Host port, and Giga Ethernet port. Finally, it has sixteen 16-bit analog inputs and four 16-bit analog outputs, which are used to generate the transmitted signal. All this hardware is mounted on a single PCB board (155 mm × 102 mm × 35 mm) with a weight of 330 g. These sbRIO devices are oriented to sensors with nonstandard acquisition procedures, allowing low-level programming of the acquisition routines. The embedded processor included in sbRIO is capable of running all the software algorithms to detect targets, so it can be used as a standalone array module formed by a sbRIO connected to a MEMS array board as shown in Figure 2. The position of the detected target will be sent to the AEB system of a vehicle. The embedded processor included in sbRIO is capable of running all the software algorithms to detect targets, so it can be used as a standalone array module formed by a sbRIO connected to a MEMS array board as shown in Figure 2. The position of the detected target will be sent to the AEB system of a vehicle.

Software Algorithms
Based on being active, this system must consist of both a receiving subsystem, based on an acoustic array, and a transmitting subsystem, in charge of generating the signal(s) that will be reflected on the possible pedestrians to be detected. Thus, the algorithms implemented in the system, shown in Figure 3, can be divided into four blocks: Transmission Generation, MEMS Acquisition, Signal Processing, and Detection. The Transmission Generation block synthesizes a pulsed multitone signal to be sent through the DA converter to the signal amplifier and from there to the loudspeaker which will output the transmitted signal.
In the Acquisition block, implemented in the FPGA, each MEMS microphone acquired the reflected signal with a PDM interface. This interface internally incorporates a one-bit sigma-delta converter with a sampling frequency of 2 MHz. In this block, a common clock signal is generated for all 150 MEMS to read signals simultaneously via the digital inputs of the FPGA.
In the Signal Processing block, three routines are implemented: • Delay and Sum beamforming: a discrete set of beams that cover the surveillance space in azimuth is generated for a fixed elevation angle. Each of these beams is implemented using a Delay and Sum beamformer, which essentially delays specifically each of the 150 microphone signals and sums them together, so that the array points

Software Algorithms
Based on being active, this system must consist of both a receiving subsystem, based on an acoustic array, and a transmitting subsystem, in charge of generating the signal(s) that will be reflected on the possible pedestrians to be detected. Thus, the algorithms implemented in the system, shown in Figure 3, can be divided into four blocks: Transmission Generation, MEMS Acquisition, Signal Processing, and Detection. The embedded processor included in sbRIO is capable of running all the software algorithms to detect targets, so it can be used as a standalone array module formed by a sbRIO connected to a MEMS array board as shown in Figure 2. The position of the detected target will be sent to the AEB system of a vehicle.

Software Algorithms
Based on being active, this system must consist of both a receiving subsystem, based on an acoustic array, and a transmitting subsystem, in charge of generating the signal(s) that will be reflected on the possible pedestrians to be detected. Thus, the algorithms implemented in the system, shown in Figure 3, can be divided into four blocks: Transmission Generation, MEMS Acquisition, Signal Processing, and Detection. The Transmission Generation block synthesizes a pulsed multitone signal to be sent through the DA converter to the signal amplifier and from there to the loudspeaker which will output the transmitted signal.
In the Acquisition block, implemented in the FPGA, each MEMS microphone acquired the reflected signal with a PDM interface. This interface internally incorporates a one-bit sigma-delta converter with a sampling frequency of 2 MHz. In this block, a common clock signal is generated for all 150 MEMS to read signals simultaneously via the digital inputs of the FPGA.
In the Signal Processing block, three routines are implemented: • Delay and Sum beamforming: a discrete set of beams that cover the surveillance space in azimuth is generated for a fixed elevation angle. Each of these beams is implemented using a Delay and Sum beamformer, which essentially delays specifically each of the 150 microphone signals and sums them together, so that the array points The Transmission Generation block synthesizes a pulsed multitone signal to be sent through the DA converter to the signal amplifier and from there to the loudspeaker which will output the transmitted signal.
In the Acquisition block, implemented in the FPGA, each MEMS microphone acquired the reflected signal with a PDM interface. This interface internally incorporates a one-bit sigma-delta converter with a sampling frequency of 2 MHz. In this block, a common clock signal is generated for all 150 MEMS to read signals simultaneously via the digital inputs of the FPGA.
In the Signal Processing block, three routines are implemented: • Delay and Sum beamforming: a discrete set of beams that cover the surveillance space in azimuth is generated for a fixed elevation angle. Each of these beams is implemented using a Delay and Sum beamformer, which essentially delays specifically each of the 150 microphone signals and sums them together, so that the array points in a specific direction in the surveillance space. A diagram of a beamformer can be observed in Figure 4a.
• Decimation and Filtering: applying downsampling techniques, based on decimation and filtering [51], 150 independent signals are obtained and the sampling frequency is reduced from 2 MHz to 50 kHz.

•
Matched filter: A matched filter is then applied to the decimated signal to maximize the SNR at the input of the detection block.
FOR PEER REVIEW 6 of 15 in a specific direction in the surveillance space. A diagram of a beamformer can be observed in Figure 4a. • Decimation and Filtering: applying downsampling techniques, based on decimation and filtering [51], 150 independent signals are obtained and the sampling frequency is reduced from 2 MHz to 50 kHz.

•
Matched filter: A matched filter is then applied to the decimated signal to maximize the SNR at the input of the detection block. The Detection block implements three processes: • First, the relative maxima for each of the beams are identified and a list of potential targets is generated.

•
Then, all targets that are outside the detection lane are eliminated, while those whose distance is within the surveillance range of the system [Rmin, Rmax] are selected, as shown in Figure 4b.

•
Finally, the selected targets are processed by a CFAR (Constant False Alarm Rate) detector [52]. The CFAR detector compares the energy of the potential target with a dynamic threshold that is proportional to the average of the energy received in a set of cells close to the evaluated one, where a number of cells contiguous to the evaluated one, called guard cells, have been excluded. Its scheme can be observed in Figure  4c. The CFAR threshold has been obtained by varying the CFAR gain according to the equation: −n The Detection block implements three processes: • First, the relative maxima for each of the beams are identified and a list of potential targets is generated.

•
Then, all targets that are outside the detection lane are eliminated, while those whose distance is within the surveillance range of the system [Rmin, Rmax] are selected, as shown in Figure 4b.

•
Finally, the selected targets are processed by a CFAR (Constant False Alarm Rate) detector [52]. The CFAR detector compares the energy of the potential target with a dynamic threshold that is proportional to the average of the energy received in a set of cells close to the evaluated one, where a number of cells contiguous to the evaluated one, called guard cells, have been excluded. Its scheme can be observed in Figure 4c.
The CFAR threshold has been obtained by varying the CFAR gain according to the equation: −n where n1 and n2 determine the indices of the cells where the threshold is evaluated. Specifically, the threshold is evaluated in 2 × (n2 − n1) cells that are n1 cells away from the cell under test. On the other hand, k is the parameter that allows weighting the relationship between the Detection Probability and the False Alarm Probability. The values of n1 and n2 are calculated as a function of the transmitted pulse width.

Test Setup
For the analysis of the system, a compact vehicle has been assumed on a normal road surface and an ABS braking system with a maximum deceleration of 0.8 g ms −2 . A 4 m wide road has been used, with street lamps and trees along its edges, as shown in Figure 5a.
where n1 and n2 determine the indices of the cells where the threshold is evaluated. Specifically, the threshold is evaluated in 2*(n2 − n1) cells that are n1 cells away from the cell under test. On the other hand, k is the parameter that allows weighting the relationship between the Detection Probability and the False Alarm Probability. The values of n1 and n2 are calculated as a function of the transmitted pulse width.

Test Setup
For the analysis of the system, a compact vehicle has been assumed on a normal road surface and an ABS braking system with a maximum deceleration of 0.8 g ms −2 . A 4 m wide road has been used, with street lamps and trees along its edges, as shown in Figure  5a.
(a) Taking into account that the typical vehicle speed limits in urban environments are between 30 km/h and 50 km/h, the minimum braking distance of a vehicle travelling at these speeds was calculated, as shown in Table 1. On the basis of these braking distances, a set of experiments has been designed, placing a person at six distances from the system: 5 m, 7.5 m, 10 m, 12.5 m, 15 m, and 20 m, defining the six detection distances to be considered on the analysis of the system. It can be observed that for the speed of 30 km/h, all the defined test distances would be valid, i.e., if a pedestrian is detected at these distances, the AEB system would be able to brake the vehicle so that no collision would occur. For the speed of 40 km/h, only the defined test distances of 10 m and above would be adequate to avoid a collision between the vehicle and the pedestrian. For the speed of 50 km/h, only test distances defined from 15 m would be adequate. In the cases for detection distances shorter than the braking distance, it has been confirmed by simulations with CarSim [53], that although the vehicle would collide with the pedestrian, the collision would occur at a sufficiently low speed so that the impact would not cause serious injuries to the pedestrian.
The time required to detect a pedestrian is the time required for the transmitted signal to propagate to the pedestrian and be reflected back to the sensor. In addition, the time needed for signal processing (beamformer, filters, and detector) has to be added. The scenario configuration has a maximum range of 25 m, which, assuming a propagation speed of 343 m/s, takes 145 ms for the acoustic signal to travel. Signal processing is performed on the FPGA in real time simultaneously with the acquisition. At the end of the acquisition, the processed signal is already formed for all 11 beams. Additionally, 25 ms are required for the detection process to be implemented in the embedded multicore processor.
Sensors 2021, 21, 4162 9 of 15 Therefore, the system is capable of performing five detections per second, as the total time required for each detection is less than 200 ms.
For the tests carried out, a transmission signal was generated consisting of a 3 ms pulse composed of eight discrete frequencies between 14 kHz and 21 kHz, with a spacing of 1 kHz. Several frequencies have been used to improve the probability of detection, since depending on the physical characteristics and clothing of pedestrians, reflectivity varies with frequency [54]. Frequencies below 14 kHz have not been used because this would result in beam broadening, which would reduce the spatial resolution of the system. The pulse width of 3 ms was chosen as a compromise between the range resolution, which is inversely proportional to the pulse width, and the transmitted energy. The signal was generated with a tweeter loudspeaker with a flat frequency response (±1.5 dB) between 10 kHz and 24 kHz.

Scenario Analysis
For a person located at 10 m (Figure 5a), an acoustic image was obtained at the operating frequency of 20 kHz, generating 120 beams of 4 • of beamwidth with steering azimuth angles from −60 • to 60 • , with 1 • resolution and for a surveillance range between 4 and 25 m. In this case, Figure 5b represents the obtained acoustic image in azimuth/range space where the red crosses represent the detected targets. In the image, it is clear that the system detects both the person, placed at around 10 m, and other objects in front of it, such as lampposts and trees on the roadside. Table 2 shows the targets detected, with their position, expressed in range and azimuth. Comparing Figure 5a,b, it can be observed that in the acoustic image (Figure 5b) two detections appear that are not represented in the test scenario (Figure 5a). These targets actually exist in the scenario but do not appear in Figure 5a because the picture is not taken with a wide-angle camera.
In Figure 5b, it can also be visualized the lane boundaries, represented by a dashed green line. As shown in Figure 5b, most of the detected targets are outside the road lane and should therefore not be taken into consideration by our system. It is therefore justified to perform a selection of the targets that are inside the road in the preset surveillance range, from 5 to 25 m. This operation is performed by the detection algorithm (Figure 4c). After applying the lane filter, only the pedestrian remains as a detected target.

Detector Performance Analysis
In a real system, it is not necessary to know with such a high resolution the position of a pedestrian and on the other hand fast update rates between 5 and 10 frames per second are required, and it is advisable to work with a smaller number of beams.
First, the angles of excursion for the detection system must be determined. Assuming that the road has a width of d meters and that the car drives in the center, at a distance R, the angle of maximum excursion will be a function of distance R according to the equation: In turn, the array used has a beamwidth (∆θ) of 4 • at 3 dB for 20 kHz; therefore, the number of beams required is: Combining Equations (2) and (3) gives the number of beams required for the distances defined in the test scenario. So, since the system has to detect any pedestrian in the range between 5 and 20 m, the minimum value defined for the distance R shall be used, giving a number of working beams of 11. In this case, the steering angles of the defined beams are equispaced between −20 • and 20 • , with a separation of 4 • . Figure 6 shows the radiation patterns for the 11 defined beams, where it can be observed that they cut off at 3 dB and cover the scan angles between −22 • and 22 • . First, the angles of excursion for the detection system must be determined. Assuming that the road has a width of d meters and that the car drives in the center, at a distance R, the angle of maximum excursion will be a function of distance R according to the equation: In turn, the array used has a beamwidth (Δθ) of 4° at 3 dB for 20 kHz; therefore, the number of beams required is: Combining Equations (2) and (3) gives the number of beams required for the distances defined in the test scenario. So, since the system has to detect any pedestrian in the range between 5 and 20 m, the minimum value defined for the distance R shall be used, giving a number of working beams of 11. In this case, the steering angles of the defined beams are equispaced between −20° and 20°, with a separation of 4°. Figure 6 shows the radiation patterns for the 11 defined beams, where it can be observed that they cut off at 3 dB and cover the scan angles between −22° and 22°. The analysis of the CFAR detector has been carried out as a function of the associated threshold gain k. This value has varied between 3 and 10, with increments of 0.01. Several experiments have been carried out to obtain the values of the guard and reference window cells of this CFAR detector, n1 and n2, respectively. The values of these cells' range have been determined based on the pulse width used. In this case, given that the signal pulse used lasts 3 ms, which corresponds to a value in range of 100 cm, both the guard cells and the reference window cells were calculated for an initial 100 cm value. After the study carried out to find suitable values for these cells, a guard cell value of 200 cm and a reference cell value of 300 cm have been selected.
Once the operating parameters of the different algorithms to be used in the system have been defined, 1000 experiments were carried out for each of the six defined test scenarios, placing the pedestrian on the center of the lane. Figure 7 below shows the overall Detection Probability (Pd) and False Alarm Probability (Pfa) obtained for the system. In Figure 8, the Detection Probability (Figure 8a) and the False Alarm Probability (Figure 8b) are displayed in detail for each scenario, as a function of the k value that controls the threshold of the CFAR detector. The analysis of the CFAR detector has been carried out as a function of the associated threshold gain k. This value has varied between 3 and 10, with increments of 0.01. Several experiments have been carried out to obtain the values of the guard and reference window cells of this CFAR detector, n1 and n2, respectively. The values of these cells' range have been determined based on the pulse width used. In this case, given that the signal pulse used lasts 3 ms, which corresponds to a value in range of 100 cm, both the guard cells and the reference window cells were calculated for an initial 100 cm value. After the study carried out to find suitable values for these cells, a guard cell value of 200 cm and a reference cell value of 300 cm have been selected.
Once the operating parameters of the different algorithms to be used in the system have been defined, 1000 experiments were carried out for each of the six defined test scenarios, placing the pedestrian on the center of the lane. Figure 7 below shows the overall Detection Probability (Pd) and False Alarm Probability (Pfa) obtained for the system. In Figure 8, the Detection Probability (Figure 8a) and the False Alarm Probability (Figure 8b Figure 7 shows that both the Detection Probability and the False Alarm Probability are monotonically decreasing functions with k. Based on RADAR theory, according to the Neyman-Pearson Observer theorem [52], the optimal detection threshold, and hence the value of the Detection Probability of the system, is obtained from a preset False Alarm Probability value. In this case, a False Alarm Probability value of 10 −2 has been set, as this is a typical value used for in-vehicle detection systems. Based on this value of the False Alarm Probability, the threshold value obtained corresponds to k = 4.91, which in turn defines a value for the Detection Probability of 0.995. For this type of application, where a person's life is at risk, the obtained Detection Probability is adequate. Furthermore, it should be taken into account that in these systems a False Alarm confirmation procedure is usually established, which allows the false detections obtained to be discarded, and therefore the value of the False Alarm Probability defined, which may seem high, is adequate in practice.
In addition to the overall behavior of the system, the Detection Probability and the False Alarm Probability have also been analyzed for each of the scenarios defined, shown in In principle, this probability should decrease as the distance increases, because the SNR decreases. However, this behavior is not observed in the system detector, due to the existence of reflections from the multiple targets located at the road boundaries, which interfere with the reflections from the pedestrian itself. Figure 8b shows that False Alarm Probabilities are different depending on the distance of the pedestrian when the detection threshold is low. These false detections are due to the sidelobes of the array beampattern, through which reflections from objects close to the lane are received. Depending on the relative position between the pedestrian and the nearby objects, destructive/constructive interference arises and influences the average energy estimation estimated by the CFAR in the detection environment. By increasing the detection threshold, this uneven behavior disappears as the CFAR detector eliminates these false detections. In this work, the experiments have been evaluated independently, and it has not been taken into account that in a dynamic environment, the detections of each experiment can be validated in subsequent experiments by confirming them as detections or discarding them as False Alarms, reducing the False Alarm Probability significantly.
With the value of the threshold obtained (k = 4.91), the values of the Probability of Detection and the Probability of False Alarm for each of the scenarios have been calculated individually and are shown in Table 3. In view of these, it can be seen that, for certain scenarios, the value of the Probability of False Alarm exceeds the established limit value of 10 −2 , although for other scenarios, it is lower. To ensure that the limit value for the Probability of False Alarm is met in all scenarios, it has been calculated that the value of k associated with the CFAR threshold to be considered should be 5.07. For this new value of k, it can be seen in Table 3 that all the values of the False Alarm Probability are below it. At the same time, it can also be observed that the average value of the Probability of Detection has decreased by only 0.3%, while the average value of the Probability of False Alarm has decreased by 50%. In summary, a characterization of the detector has been made, both jointly for the six test scenarios and for each one independently, defining two possible detection thresholds: the first one based on the joint characterization and the second one based on considering the worst case of all the scenarios.

Conclusions
In this work, an active acoustic system has been presented that, by means of a 2D array of MEMS microphones and a processing architecture based on an FPGA/microprocessor, can estimate the 2D position of a person up to a distance of 20 m and prevent a car from colliding with a pedestrian up to speeds of 50 km/h. For higher speeds, the vehicle's AEB system, which will interact with the presented system, can mitigate the impact with the pedestrian, avoiding fatal injuries. By estimating the 2D position of the pedestrian, the system could even warn the vehicle's steering system to avoid the pedestrian.
The analysis of the system for a set of scenarios with people at different distances has shown the feasibility of the system. A processing scheme based on beamforming, range and lane filters, and finally a powerful CFAR detection algorithm has been developed, where the operating parameters of a False Alarm Probability lower than 0.01 and a Probability of Detection higher than 0.99, acquired by more than 6000 experiments, have been obtained.
This system can be fused with existing camera, LIDAR, and RADAR based pedestrian detection systems, decreasing the False Alarm Probability and increasing the joint Detection Probability. Finally, this system can work in low visibility situations: rain, fog, smoke, night, etc., where cameras/LIDAR are not operational; as well as improve the detection of a person by a RADAR system, since the radar section of people is usually very low compared to their acoustic section.  Data Availability Statement: The data used in this work may be requested by sending an e-mail to one of the authors of the article.