Performance Evaluation of a Biometric System Based on Acoustic Images

An acoustic electronic scanning array for acquiring images from a person using a biometric application is developed. Based on pulse-echo techniques, multifrequency acoustic images are obtained for a set of positions of a person (front, front with arms outstretched, back and side). Two Uniform Linear Arrays (ULA) with 15 λ/2-equispaced sensors have been employed, using different spatial apertures in order to reduce sidelobe levels. Working frequencies have been designed on the basis of the main lobe width, the grating lobe levels and the frequency responses of people and sensors. For a case-study with 10 people, the acoustic profiles, formed by all images acquired, are evaluated and compared in a mean square error sense. Finally, system performance, using False Match Rate (FMR)/False Non-Match Rate (FNMR) parameters and the Receiver Operating Characteristic (ROC) curve, is evaluated. On the basis of the obtained results, this system could be used for biometric applications.


Introduction
Biometric identification [1][2][3] is a subject of active research, where new algorithms and sensors are being developed. The most widely used identification systems are based on fingerprints, hand geometry, retina, face, voice, vein, signature, etc. The fusion of information from multiple biometric systems is also improving the performance of identification and verification systems [4].

Radar-based systems require expensive hardware and can be unreliable due to the very low reflection intensity from humans. Acoustic imaging provides a simple and cheap sensor alternative that allows for very precise range and angular information. Specifically in the acoustic field, there are two accurate and reliable classification systems for targets: (1) Animal echolocation, performed by mammals such as bats, whales and dolphins, where Nature has developed specific waveforms for each type of task [5,6], such as the classification of different types of flowers [7]. (2) Acoustic signatures used in passive sonar systems [8,9], which analyze the signal received from a target in the time-frequency domain.
Few papers address acoustic imaging in air for the detection of human beings. Moebus and Zoubir [10,11] worked in the ultrasonic band (50 kHz) using a 2D array and beamforming in reception. They analyzed solid objects (poles and a cuboid on a pedestal) in their first work and human images more recently. They showed that humans have a distinct acoustic signature and proposed modeling the echoes from the reflecting parts of objects in the scene with a Gaussian mixture model. Based on the parameters of this model, a detector could be designed to discriminate between persons and non-person objects.
In previous works, the authors of this paper have developed multisensor surveillance and tracking systems based on acoustic arrays and image sensors [12,13]. After an exhaustive search in the literature, we have not found any papers on acoustic imaging in air for biometric verification of humans. Consequently, we launched a new line of research to develop a novel biometric system, based on acoustic images acquired with electronic scanning arrays. Humans are acoustically scanned by an active system working from 6 to 12 kHz (audio band), that registers acoustic images. Based on these images, the system can identify people using a previously acquired database of acoustic images.
Assuming a plane wave x(t) with a direction of arrival θ, and an array with N sensors separated by a distance d, the signal received at each sensor, x_n, is a phase-shifted replica of x(t). A beamformer linearly combines the signals x_n, each previously multiplied by a complex weight w_n, to obtain an output signal y(t). Figure 1 shows the structure of a beamformer. By selecting the weights, it is possible to generate a narrow beam steered to a given direction, called the steering angle, and therefore to implement an electronic scanning array [14,15]. The spatial response of a beamformer is called the array factor, and its graphical representation is the beampattern. Figure 2 shows a beampattern of an array with 8 λ/2-equispaced sensors, for a steering angle of 0°. The proposed system uses beamforming in both transmission and reception, with a linear tweeter array and a linear microphone array, respectively.
In this paper, Section 2 describes the system, including the hardware architecture and the functional description. Section 3 designs the system parameters and characterizes the acoustic array sensor for these parameters. Section 4 describes the definition and extraction of acoustic profiles. Section 5 tests these images for biometric applications, defining a metric based on mean square error, and presents the obtained FMR/FNMR parameters and ROC curve. Finally, Section 6 presents our conclusions.
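As an illustration of the beamforming principle described in the introduction, the following minimal sketch evaluates the normalized array factor of a uniform linear array. It is not the authors' implementation: the function name `array_factor`, the 0.1° angular grid and the uniform (unit-magnitude) weights are assumptions made for this example, while the 8 sensors, λ/2 spacing and 0° steering angle match the configuration of Figure 2.

```python
import cmath
import math


def array_factor(theta_deg, steer_deg, n_sensors=8, spacing_wl=0.5):
    """Normalized magnitude of the array factor of a uniform linear array.

    spacing_wl is the sensor spacing in wavelengths (0.5 means lambda/2).
    Uniform-magnitude weights are an assumption of this sketch.
    """
    u = math.sin(math.radians(theta_deg))
    u0 = math.sin(math.radians(steer_deg))
    # The weights w_n = exp(-j*2*pi*n*d*sin(steer)/lambda) cancel the phase
    # shift of a plane wave arriving from the steering angle, so the sum
    # below peaks when theta equals the steering angle.
    total = sum(
        cmath.exp(2j * math.pi * n * spacing_wl * (u - u0))
        for n in range(n_sensors)
    )
    return abs(total) / n_sensors


# Beampattern on a grid from -90.0 to 90.0 degrees in 0.1 degree steps.
angles = [a / 10 for a in range(-900, 901)]
pattern = [array_factor(a, steer_deg=0.0) for a in angles]
peak_angle = angles[pattern.index(max(pattern))]
```

Changing `steer_deg` moves the main lobe without moving any hardware, which is what makes electronic scanning possible.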

Functional Description
Based on basic Radar/Sonar principles [17,18], an acoustic sound detection and ranging system for biometric identification is proposed, according to the block diagram in Figure 3.
The manager controls all subsystems, performing three main tasks: (i) person scanning and detection, (ii) acoustic image acquisition and (iii) person identification based on a database of acoustic images.


Hardwa
The biom

Spatial Aperture Selection
In the design process of the spatial aperture (length) of the transmission and reception ULAs, the following parameters must be considered: angular resolution, frequency band, angular excursion and transducer diameter. If the array spatial aperture is increased, the angular resolution improves; however, grating lobes appear [16].
Two ULAs with 15 λ/2-equispaced sensors have been employed. These arrays have different spatial apertures in order to reduce sidelobe levels on the final beampattern (Tx + Rx). Note that sidelobe positions on each beampattern are different, while the mainlobe keeps its position.
A transmission array with a 50 cm spatial aperture and a reception array with a 40 cm spatial aperture have been used. On the transmission array, the tweeters are placed so as to occupy the minimum space.
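The effect of using two different apertures can be explored with a short sketch of the two-way (Tx + Rx) beampattern, taken here as the product of the transmission and reception patterns. Several points are assumptions of this example rather than facts from the paper: the sensors are assumed to be spread uniformly over each aperture (spacing = aperture / 14), the weights are uniform, and the names `ula_pattern` and `combined_pattern` are hypothetical.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, the value used elsewhere in this paper


def ula_pattern(theta_deg, steer_deg, aperture_m, freq_hz, n_sensors=15):
    """Normalized one-way beampattern of a ULA with uniform weights."""
    # Assumption: sensors are uniformly distributed over the aperture.
    spacing = aperture_m / (n_sensors - 1)
    wavelength = SOUND_SPEED / freq_hz
    du = math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg))
    total = sum(
        cmath.exp(2j * math.pi * n * spacing * du / wavelength)
        for n in range(n_sensors)
    )
    return abs(total) / n_sensors


def combined_pattern(theta_deg, steer_deg, freq_hz):
    """Two-way pattern: product of the 50 cm Tx and 40 cm Rx patterns."""
    tx = ula_pattern(theta_deg, steer_deg, 0.50, freq_hz)
    rx = ula_pattern(theta_deg, steer_deg, 0.40, freq_hz)
    return tx * rx
```

Because both one-way patterns equal 1 at the steering angle and are at most 1 elsewhere, the product keeps the mainlobe in place and can never exceed either one-way pattern; sidelobes that fall at different angles in the two arrays are attenuated by each other.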

Frequency Band Selection
After defining the array spatial apertures, it is necessary to evaluate the range of frequencies where the array works properly. This evaluation is based on:
• The angular resolution: 3-dB beamwidth of the mainlobe.
• Non-appearance of grating lobes.
• Frequency response of the microphone-tweeter pair.
• Frequency response of a person.
Working with low frequencies increases the main beam width and therefore degrades the angular resolution. Working with high frequencies decreases the main beam width, but grating lobes appear, which degrades the beampattern. In addition, the main lobe width and the grating lobe level increase as the steering angle rises. Therefore, the maximum steering angle should be determined by the size of the person and his/her distance from the array.
Based on these considerations, the following parameters have been selected:
• The positioning area is located 3 m from the array.
• The maximum width of a person with outstretched arms is 2 m.
• The range is 2.5 m.
For these parameters, the angle excursion is ±15°, as shown in Figure 14.
Finally, Figure 16 shows the beampattern of the proposed array for θ = 0° and θ = 15° at the highest working frequency, i.e., 14 kHz. It can be observed that there are no grating lobes and that the main lobe width ranges from 1.80° at broadside to 1.85° at the maximum steering angle. For frequencies below 4 kHz or above 14 kHz, the beampattern degrades significantly and cannot be used.
Analyzing the frequency response of the microphone-tweeter pair used (Figure 17), the following results have been obtained.
For frequencies below 6 kHz and for frequencies above 12 kHz, the system sensitivity is very low, due to the pass-band response of the tweeter. Therefore a frequency band between 6 kHz and 12 kHz has been selected.
At this point, the number of frequencies and their values must be determined. A large number of frequencies allows a richer characterization of people, but at the expense of increased acquisition and processing times. Moreover, beyond a certain point a higher number of frequencies does not improve the system performance, because the frequencies have to be closer together and the obtained images are no longer independent.
After several tests, four different frequencies that guarantee the independence of the obtained images have been selected. Determining the optimal values would be a very complex process, because it would depend on what people are wearing, and an exhaustive study would be required. Finally, the selected frequencies are 6 kHz, 8 kHz, 10 kHz and 12 kHz, where the frequency gap is the maximum in order to have independent images.

Angle Resolution Cells and Number of Beams
Given a ULA, Δu is defined as the 3-dB beamwidth of the mainlobe in the sin(θ) space. Beamwidth in the sin(θ) space does not depend on the steering angle; therefore, assuming that adjacent beams overlap at their 3-dB points, the number of beams necessary to cover the exploration zone is given by Expression (1) [16], where θ_max is the angular excursion.
After evaluating the final beampattern of the transmission and reception arrays, Δθ is obtained. Δθ is defined as the 3-dB beamwidth of the mainlobe in degrees. Then, the beamwidth in the sin(θ) space is obtained using Expression (2). Finally, the number of beams for each frequency, M_k, is calculated using Expressions (1) and (2).
These values are shown in Table 1. A value of θ max = 15° for the angular excursion has been assumed. Steering angles for each frequency are shown in Figure 18. Finally, the collection of beampatterns necessary to cover the exploration zone for f = 10 kHz is shown in Figure 19.
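Expressions (1) and (2) are not reproduced in this text, so the following sketch only illustrates the procedure under stated assumptions. It assumes that the broadside beamwidth maps to the sin(θ) space as Δu = 2·sin(Δθ/2), and that beams are spaced at most Δu apart with one beam on each edge of the exploration zone, giving M = ceil(2·sin(θ_max)/Δu) + 1. Both formulas, the function names and the 5° beamwidth in the usage note are hypothetical and may differ from the paper's expressions and from the values in Table 1.

```python
import math


def beamwidth_sin_space(delta_theta_deg):
    """Assumed mapping of a broadside 3-dB beamwidth to sin(theta) space."""
    return 2.0 * math.sin(math.radians(delta_theta_deg) / 2.0)


def steering_angles(delta_theta_deg, theta_max_deg=15.0):
    """Steering angles (degrees), equispaced in sin(theta) space.

    Assumption: neighbouring beams are at most one 3-dB beamwidth apart
    in sin(theta) space, and the outer beams sit at +/- theta_max.
    """
    du = beamwidth_sin_space(delta_theta_deg)
    u_max = math.sin(math.radians(theta_max_deg))
    n_beams = math.ceil(2.0 * u_max / du) + 1
    step = 2.0 * u_max / (n_beams - 1)
    return [math.degrees(math.asin(-u_max + m * step)) for m in range(n_beams)]
```

For example, `steering_angles(5.0)` yields 7 beams between -15° and 15°. Narrower beams at higher frequencies lead to more steering angles, which is why the number of beams depends on the frequency.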

Definition and Extraction of Acoustic Profiles
This section presents a collection of sample acoustic images, together with the procedures and parameter values used for image acquisition and identification.

Image Parameters
Following the design considerations of Section 3, the system retrieves the acoustic image associated with a rectangle of 2 m × 2.5 m (width × depth), where the person under analysis must be located 3 m away from the line array, as described in Figure 14. As justified in the previous section, four frequencies were selected: f_1 = 6 kHz, f_2 = 8 kHz, f_3 = 10 kHz and f_4 = 12 kHz. A pulse width of 2 ms has been selected. This value is a trade-off between range resolution, which degrades as the pulse lengthens, and received energy, which grows with the pulse width.
The acoustic images are collected from 2.0 m to 4.5 m in the range coordinate, and from -15° to 15° in the azimuth coordinate, using M_k steering angles. The acoustic images are stored in a matrix I with dimensions N and M_k. Assuming a sampling frequency f_s = 32 kHz and a sound velocity v = 340 m/s, the matrix dimension N will be:
N = (2.5 m × 32 kHz) / (340 m/s) ≈ 235 (4)
The matrix dimension M_k is the number of steering angles necessary to cover the exploration area for each frequency.
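The arithmetic behind these image parameters can be checked with a few lines of code. The sketch below reproduces Expression (4) and adds the usual pulse-echo estimate of range resolution for an unmodulated pulse, v·τ/2, which is a textbook relation and not a figure quoted by the paper. The constant names, the helper `empty_image` and the row/column layout (rows for range samples, columns for steering angles) are assumptions of this example.

```python
SOUND_SPEED = 340.0        # m/s
SAMPLING_FREQ = 32_000.0   # Hz
RANGE_SPAN = 2.5           # m, from 2.0 m to 4.5 m
PULSE_WIDTH = 2e-3         # s

# Expression (4): number of range samples per beam.
n_range_samples = round(RANGE_SPAN * SAMPLING_FREQ / SOUND_SPEED)

# Textbook range resolution of an unmodulated pulse (two-way propagation).
range_resolution = SOUND_SPEED * PULSE_WIDTH / 2.0


def empty_image(n_beams, n_samples=n_range_samples):
    """Zero-filled image; rows are range samples, columns are beams.

    The row/column orientation is an assumption of this sketch.
    """
    return [[0.0] * n_beams for _ in range(n_samples)]
```

With these values, `n_range_samples` is 235 and `range_resolution` is 0.34 m, which illustrates the trade-off: a shorter pulse would resolve finer range detail but return less energy.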

Positions
After analyzing multiple positions of people, it has been determined that the best results are obtained for the following positions: front view with arms folded on both sides (p_1), front view with arms outstretched (p_2), back view (p_3) and side view (p_4). Figure 20 shows the four positions using a test subject.

Acoustic
The acou positions (p 1

Image Normalization
Each acoustic image is normalized by its energy.
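The normalization expression itself is not reproduced in this text. As a minimal sketch, assuming that normalization means scaling each image so that the sum of its squared samples equals 1, the step could look as follows. The function name `normalize_energy` and the handling of an all-zero image are assumptions of this example.

```python
import math


def normalize_energy(image):
    """Scale an image (list of rows) to unit energy.

    Assumption: energy is the sum of squared samples, and the image is
    divided by the square root of that energy.
    """
    energy = sum(value * value for row in image for value in row)
    if energy == 0.0:
        # An all-zero image cannot be scaled; return an unchanged copy.
        return [row[:] for row in image]
    scale = 1.0 / math.sqrt(energy)
    return [[value * scale for value in row] for row in image]
```

Normalizing in this way removes overall level differences between captures, so that a later comparison depends on the shape of the image rather than on its absolute amplitude.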

Metric Based on Mean Square Error
An algorithm for biometric identification has been implemented based on the mean square error between acoustic images from the profile P_i and the profile P_j [20]. First, the function E_{f,p}[i,j] is defined as the mean square error between an acoustic image from profile P_i and an acoustic image from profile P_j, for a specific frequency f and a position p. In the corresponding expressions, NF is the number of acoustic profiles stored in the database.
Finally, the function E(i,j), called the global error, is defined as the sum of the multifrequency error over the positions p. If P_k is an unknown profile to be identified, the algorithm associates P_k with the person i in the database whose profile P_i has the minimum E(k,i) value. The normalized global error is the distance, or metric, used by the biometric system.
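The identification rule described in this section can be sketched in a few lines. The exact error expressions, including the normalization involving NF, are not reproduced in this text, so the sketch below uses an unnormalized global error: the plain sum of per-image mean square errors over all frequencies and positions. The data layout (a profile as a dictionary keyed by `(frequency, position)`) and the names `mse`, `global_error` and `identify` are assumptions of this example.

```python
def mse(image_a, image_b):
    """Mean square error between two images of identical shape."""
    n_values = sum(len(row) for row in image_a)
    squared = sum(
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )
    return squared / n_values


def global_error(profile_a, profile_b):
    """Sum of per-image MSE over every (frequency, position) key.

    Assumption: no NF-based normalization is applied in this sketch.
    """
    return sum(mse(profile_a[key], profile_b[key]) for key in profile_a)


def identify(unknown_profile, database):
    """Return the database identifier with the minimum global error."""
    return min(
        database,
        key=lambda person_id: global_error(unknown_profile, database[person_id]),
    )
```

For verification rather than identification, the same `global_error` value would be compared against a threshold instead of being minimized over the database.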

Scenario Definition
The case-study involved 10 people (5 men and 5 women) in order to analyze the behaviour of the system. Each selected person has distinct morphological features, as shown in Table 2. In this analysis, all people wore overalls as common reference clothing, in order to eliminate clothing as a distinctive factor. The biometric system uses a metric or distance based on the mean square error, according to Expression (6). To evaluate the system, acoustic profiles were captured 10 times for each of the 10 people under test. These captures were carried out over 10 days, one capture per person per day, so that people did not remember their position in the previous capture and there was no "memory effect". They were placed in the center of the measurement area (cross marked on the ground). Every 60 seconds, a multifrequency capture was done for each position, with the following sequence: front view with arms folded on both sides (p_1), front view with arms outstretched (p_2), back view (p_3) and side view (p_4).
One hundred profiles were acquired; each one with 16 acoustic images (four frequencies by four positions). These captures were stored with a unique identifier formed by sub identifiers associated with the person ID, the number of capture, the position and the frequency. Finally, the normalized global error between all acquired profiles was calculated.

False Match Rate (FMR) and False Non-Match Rate (FNMR)
Based on the methodology to characterize a biometric system [21], FNMR and FMR parameters have been calculated. It is assumed that there are no errors in the acquisition; therefore FAR/FMR and FRR/FNMR pairs are equivalent.
False match rate (FMR) is the probability that the system incorrectly matches the input acoustic profile to a non-matching template in the database, i.e., the percentage of impostors incorrectly matched to a valid user's biometric. It measures the percentage of invalid inputs that are incorrectly accepted. FMR is obtained by matching acoustic profiles of different people. The global error E(i,j) is calculated for all these cases; the FMR parameter is then calculated as the percentage of matches whose error value is less than or equal to the distance d, where d ranges over the possible values of the global error.
False non-match rate (FNMR) is the probability that the system fails to match the input acoustic profile to a matching template in the database, i.e., the percentage of incorrectly rejected valid users. It measures the percentage of valid inputs that are incorrectly rejected. FNMR is obtained by matching acoustic profiles of the same person.
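The two rates can be computed directly from lists of global errors. The sketch below follows the definition given for FMR (fraction of different-person comparisons with error less than or equal to d) and assumes the complementary rule for FNMR (fraction of same-person comparisons with error greater than d), which is the standard definition but is not spelled out in this text. The function names and the sample error values in the usage note are hypothetical.

```python
def fmr(impostor_errors, d):
    """Fraction of different-person comparisons accepted at threshold d."""
    return sum(error <= d for error in impostor_errors) / len(impostor_errors)


def fnmr(genuine_errors, d):
    """Fraction of same-person comparisons rejected at threshold d.

    Assumption: a comparison is rejected when its error exceeds d.
    """
    return sum(error > d for error in genuine_errors) / len(genuine_errors)


def roc_points(genuine_errors, impostor_errors, thresholds):
    """(FMR, 1 - FNMR) pairs, one per threshold, for plotting a ROC curve."""
    return [
        (fmr(impostor_errors, d), 1.0 - fnmr(genuine_errors, d))
        for d in thresholds
    ]
```

For instance, with made-up same-person errors `[0.1, 0.2, 0.3, 0.4]` and different-person errors `[0.35, 0.5, 0.6, 0.7]`, a threshold d = 0.35 gives FMR = 0.25 and FNMR = 0.25. Sweeping d over the observed error values traces the trade-off between the two rates.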

Conclusions
An acoustic biometric system based on an electronic scanning array using sound detection and ranging techniques has been developed. People are scanned with a narrow acoustic beam in an anechoic chamber, and then an acoustic image is created by collecting people's response to the transmitted signal.
This work is focused on analyzing the feasibility of employing acoustic images of a person as a biometric feature. Specifically, four pulsed tone signals with frequencies of 6 kHz, 8 kHz, 10 kHz and 12 kHz and four positions for the person (front, front with arms outstretched, back and side) have been used, yielding a representative set of acoustic images. FNMR, FMR and the ROC curve have been obtained, and they are comparable to those of commercial biometric systems. These facts confirm the feasibility of using acoustic images in biometric systems. Currently, work is being carried out on improving the algorithms and on extending the case-study with a broader set of users. The weights of the different acoustic images (frequency and position) in the error function are also being optimized.