Remote Binaural System (RBS) for Noise Acoustic Monitoring

The recent emergence of advanced information technologies such as cloud computing, artificial intelligence, and data science has improved and optimized various processes in acoustics with potential real-world applications. Noise monitoring tasks on large terrains can be carried out using an array of sound level meters. However, current monitoring systems rely only on a single measured value related to the acoustic energy of the captured signal, leaving aside spatial aspects that complement human perception of noise. This project presents a system that performs binaural measurements according to subjective human perception. The acoustic characterization in an anechoic chamber is presented, as well as acoustic indicators obtained in the field, initially for a short period of time. The main contribution of this work is the construction of a binaural prototype that resembles the human head and that transmits and processes acoustical data on the cloud. This allows noise level monitoring via binaural hearing rather than a single capturing device. Likewise, the system allows spatial acoustic indicators to be obtained based on the interaural cross-correlation function (IACF), as well as detecting the location of the source on the azimuthal plane.


Introduction
The urban population continues to grow, and with it, the challenges of managing the environmental quality of cities. According to the Department of Economic and Social Affairs (United Nations), it is projected that by 2030, most countries in Europe and the Americas will have more than 80% of their inhabitants living in cities [1]. In acoustic terms, this urban densification implies an increase in noise emission levels (related to traffic sources, commercial and industrial activities, etc.) that increases the risk to people's health and reduces their quality of life [2].
Concerning urban acoustic management processes, the most common approach is related to noise control, although, in recent years, the soundscape concept has also appeared as a complement to this paradigm. The noise control approach is based on the monitoring and control of noise emission levels. For this purpose, different legislations and mandatory regulations in each country establish the permissible sound pressure levels according to land use and time of day. In order to obtain acoustic data for this approach, traditional acoustic monitoring systems have been developed. They usually provide energetic information on the acoustic environments under study. The energetic acoustic descriptors used in these monitoring processes are generally the A-weighted equivalent continuous sound pressure level (LAeq), the maximum and minimum sound pressure levels, and the L10 and L90 percentile levels. In some related systems, sound is captured with several microphones, and a binaural signal is synthesized. Other projects [20] describe generic acoustic environments with binaural psychoacoustic parameters, as stated within ISO 12913. This led to the implementation of a Wireless Acoustic Sensor Network (WASN) to evaluate the spatial distribution and evolution of urban acoustic environments. The evaluated parameters in this study were sound pressure level, binaural loudness, and binaural sharpness. Cross-validation analysis confirmed the usefulness of the proposed system. Finally, there are currently no specific guidelines or standards governing the use of Remote Binaural Systems (RBS) in acoustic noise monitoring. However, the IEC 60318 and IEC 61672 series are used for human head/ear simulators and sound level meters, respectively.
This project aims to develop a binaural monitoring system that allows for obtaining energetic and spatial acoustic parameters. A binaural recording system was designed, implemented, and characterized for that purpose. The system's architecture supports twelve acoustic monitoring stations, each able to transmit, store, and process acoustic data for more than 288 sound events per day.

Materials and Methods
This section describes the criteria for the design and implementation of the monitoring system prototype, as well as its characterization through acoustic measurements. Two main parts are described: first, the binaural data sensing, involving the artificial head with microphones; second, the software and architecture to receive and process the sensor data in the cloud. Figure 1 presents both processes.

Design
When it comes to binaural audio recording, there are several options documented in the literature and commercial solutions [24][25][26]. As the general idea behind the project was to implement a network of binaural monitoring at several points, commercial heads were not considered due to the costs associated with their installation and operation.
The first approach considered was to use two microphones inside an artificial pinna, each covered using a circle of wood, as shown in Figure 2. Likewise, the small circles in the system were kept in order to hold the silicone artificial pinna. The square box under the microphones was designed to store the electrical parts of the prototype.
The final prototype design was made in CAD software (ANSYS SpaceClaim 2019 R2), seeking to maintain the proportions of an average adult human head. Additionally, the internal part was adjusted to house the two microphones. The space for the pinna was covered with two life-size silicone ears. The final design, with lateral and 3D views, can be seen in Figure 3.
The created model was 3D printed using 1.75 mm PLA (polylactic acid) filament at a temperature of 200 °C. Approximately 1200 g were used, with a fill percentage of 9%. This percentage was chosen to make the part lighter, thus facilitating transport and on-site implementation, as well as reducing costs for serial production. The printing time was approximately two days and, because the dimensions exceeded the capacity of the printer, the model was divided into eight sections (four on the front and four on the back). Figure 4b shows the printing result. Future versions may consider larger fill percentages and printing in only two pieces.

On the other hand, the capture of the acoustic signals was carried out using two omnidirectional condenser measurement microphones (their frequency response is shown later). These were connected to a Steinberg UR242 audio interface for digitization and transmission to a PC. Analog rather than digital microphones were selected because of the random lag produced by the latter during simultaneous capture, which affected the calculation of the spatial parameters.

Anechoic Chamber Measurements
The measurements were made in the anechoic chamber of the Polytechnic University of Madrid. Pink noise was used as the excitation signal, guaranteeing a signal-to-noise ratio of 25 dB in all frequency bands. The prototype was placed on a turntable controlled remotely from another room, allowing values to be obtained for different angles of analysis. Only azimuthal variations were considered. Figure 4a shows the measurement plane, in the counterclockwise direction, and the experimental setup. The position of the source never changed during the experiment.
The source used was a two-way Yamaha MSP5 Studio active loudspeaker with a frequency response from 50 Hz to 40 kHz. All optional settings included in this speaker were disabled. The microphones were calibrated at 1 kHz and 94 dB before starting the measurement. The excitation signal was a sine sweep, taking three repetitions per point and obtaining results per 1/3 octave band from 100 Hz to 16,000 Hz. This bandwidth was defined based on IEC 60318-7 [27]. Likewise, the frequency response function (FRF) of the microphones used in the prototype was adjusted by compensating for the response of the loudspeaker used as the acoustic source, so that only the effect of the head and pinna is reflected.

Field Measurements
For the field measurements, two devices were used: the monitoring station prototype and a class 1 sound level meter (SLM). The purpose of using the SLM was to compare the results of Leq, LAeq, and third-octave band levels obtained with both measurement systems. It is known that the effect of the head in binaural listening changes the frequency response of each ear depending on the angle of incidence of the sound source [28], so differences are expected between the measurement made with the flat-response omnidirectional microphone of the SLM and that made with the measurement microphones implemented in the binaural head of the station. Hence, the energetic values obtained are used only as a reference.
A 15-minute continuous measurement was performed as a "baseline" with both devices at a height of 1.5 m (Figure 5a). Likewise, both were calibrated using a class 1 pistonphone (94 dB at 1 kHz) before and after the procedure. A point located on a main street in Bogotá, Colombia, was taken as the measurement location. Figure 5b shows the point in the geodetic coordinate system EPSG:4686. This is an important avenue through which heavy vehicles, public transport, motorcycles, and private vehicles pass.

Likewise, through the bike path located in the middle of the road, there is traffic of bicycles and other motorized personal transport vehicles. Additionally, there is a traffic light on the east-west side of the road, although this did not generate a concentration of vehicles near the measurement position. As mentioned, during the measurement time, the predominant noise source was road traffic. The circulation speed was close to 45 km/h, so the contribution of the powertrain predominated and, to a lesser extent, road noise, resulting in a low-frequency noise source. The pavement was dry and vehicle circulation was fluid. Finally, the distance from the measurement point to the buildings was greater than 50 m, decreasing the influence of lateral reflections.

Acoustic Parameter Software
Besides the hardware, the prototype development includes the software dedicated to obtaining the energetic and spatial acoustic indicators, described in detail below.
To obtain the energetic acoustic indicators, a class was developed that receives the array of audio samples, the sampling frequency, and the type of weighting to be applied. Figure 6a shows the structure of the program. In this case, three instances of the time-parameters class are created, each with a different weighting.
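As an illustrative sketch of the core computation behind such a class (the names and structure here are hypothetical, not the project's actual code), the equivalent level of a calibrated pressure signal can be obtained as follows; a frequency weighting (A, C, or Z) would be applied to the samples beforehand with the corresponding IEC 61672 filter:

```python
import numpy as np

P_REF = 20e-6  # reference pressure: 20 µPa

def leq(pressure):
    """Equivalent continuous sound pressure level of a calibrated
    pressure signal given in Pa (Z-weighted if no filter is applied)."""
    p = np.asarray(pressure, dtype=float)
    return 10.0 * np.log10(np.mean(p ** 2) / P_REF ** 2)

# A 1 kHz tone with 1 Pa RMS reproduces the 94 dB pistonphone level
fs = 48000
t = np.arange(fs) / fs
tone = np.sqrt(2.0) * np.sin(2 * np.pi * 1000 * t)
print(round(leq(tone), 1))  # 94.0
```

The same routine applied to A-weighted samples would yield LAeq; percentile levels such as L10 and L90 follow from the distribution of short-term levels.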

On the other hand, the spatial acoustic indicators were obtained using an algorithm that computes the IACF by applying the autocorrelation and normalized cross-correlation equations. The spatial parameters calculated with the algorithm were compared with the results of an analysis using the DSSF3 software (version 5), validating the correct functioning of the developed algorithms. Figure 6b shows the scheme of the developed program.

• Autocorrelation Function
Correlation allows statistical analysis of the linear relationship between two variables. Using the autocorrelation function (ACF), it is possible to measure the correlation of a signal with time-shifted versions of itself; depending on the type of signal, the ACF may contain periodic and/or random components. The normalized autocorrelation function (NACF) has been proposed as a sound quality analysis tool [29]. This analysis establishes a window size, a running step to advance the analysis window, and a running time τ that defines the delay applied to the window for the correlation calculation. The ACF is normalized by dividing by the geometric mean of the energies of the original window and the time-shifted window.
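A minimal sketch of a running NACF along these lines, assuming the window, step, and maximum running time are given in seconds (parameter names and defaults are illustrative):

```python
import numpy as np

def nacf(x, fs, window=0.5, step=0.25, max_lag=0.05):
    """Running normalized autocorrelation function.

    Splits `x` into windows of `window` s advanced every `step` s and,
    for each window, correlates it with itself shifted by the running
    time tau (up to `max_lag` s), normalizing by the geometric mean of
    the energies of the original and shifted segments."""
    n = int(window * fs)
    hop = int(step * fs)
    lags = np.arange(int(max_lag * fs))
    rows = []
    for start in range(0, len(x) - n - lags[-1], hop):
        w = x[start:start + n]
        e0 = np.dot(w, w)
        row = []
        for lag in lags:
            ws = x[start + lag:start + lag + n]
            row.append(np.dot(w, ws) / np.sqrt(e0 * np.dot(ws, ws)))
        rows.append(row)
    return np.array(rows)
```

For a periodic signal, the NACF equals 1 at τ = 0 and peaks again at multiples of the period, which is the property exploited in sound quality analysis.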

• Interaural Cross-Correlation Function

Cross-correlation (CCF) correlates two different signals. In this case, the interaural cross-correlation function (IACF) analyzes the left and right channels of a binaural recording [30]. The displacement time τ of the right channel with respect to the left ranges from −1 ms to 1 ms, corresponding to the maximum interaural time difference between the two ears of a human head. The IACC of a window is the maximum value of the IACF, and τIACC is the value of τ at which the IACC occurs. WIACC is the time interval between the two points around the IACC at which the IACF has fallen 10% below its maximum.
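The computation described above can be sketched as follows; the function and the 10%-below-peak width criterion are an illustrative reading of these definitions, not the project's exact implementation:

```python
import numpy as np

def iacf_metrics(left, right, fs):
    """IACC, tau_IACC, and W_IACC of one binaural analysis window.

    tau is restricted to ±1 ms, the maximum interaural time
    difference of a human head."""
    max_lag = int(1e-3 * fs)
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    lags = np.arange(-max_lag, max_lag + 1)
    iacf = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = left[lag:], right[:len(right) - lag]
        else:
            a, b = left[:len(left) + lag], right[-lag:]
        iacf[i] = np.dot(a, b) / norm
    k = int(np.argmax(iacf))
    iacc = iacf[k]
    tau_iacc = lags[k] / fs
    # W_IACC: width of the contiguous region where the IACF stays
    # within 10% of the peak value
    lo = hi = k
    while lo > 0 and iacf[lo - 1] >= 0.9 * iacc:
        lo -= 1
    while hi < len(iacf) - 1 and iacf[hi + 1] >= 0.9 * iacc:
        hi += 1
    w_iacc = (hi - lo) / fs
    return iacc, tau_iacc, w_iacc
```

With identical channels the IACC is 1 at τ = 0; delaying one channel shifts τIACC by the corresponding number of samples, which is the basis for localizing the source on the azimuthal plane.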

Network Architecture for Cloud Storage and Processing
The main challenge of this project was to develop an architecture that could support more than 12 sound monitor stations, each transmitting approximately 288 sound events per day and processing each sound file with an average size of 80 MB. Since the algorithms could change in the future and the number of stations could increase, we opted to use an event-driven architecture as described by Michelson [31]. This type of architecture is inherently loosely coupled and highly distributed, which makes it easy to maintain and scale the solution in the long run.
While using a device with the capacity to perform sound processing is a viable approach, it may not always be the optimal solution. Sometimes, the algorithms used for sound processing can be complex and resource-intensive, and introducing new features could further strain the device's resources, ultimately increasing the cost of implementation. Considering the potential resource limitations of on-site devices, we decided to adopt an on-cloud processing approach while designing the system architecture. This approach allows us to leverage the scalability and flexibility of cloud resources, ensuring that we can handle the processing requirements of our system effectively, even as they evolve over time.
As illustrated in Figure 7, the architecture of our system is based on an event-driven design. We have deployed two applications in AWS Beanstalk as worker agents, which receive messages from the Simple Queue Service (SQS). The SQS, in turn, receives notifications from the Simple Notification Service (SNS) whenever an audio file (in WAV format) is uploaded to the S3 bucket named "soundmonitor-audiodata". We identified the following advantages of this type of design.


• One of the key benefits of using the event-driven architecture on AWS Beanstalk is the scalability it provides. We can easily increase the number of worker agents taking messages from the SQS service to handle additional loads as needed.
• If a new algorithm is developed in the future, it can be integrated into the system without major changes to the overall architecture by adding a new Beanstalk agent and a new subscriber to the SNS service, allowing for efficient and flexible updates.
• AWS Beanstalk allows for vertical scalability, meaning that if any current or future processing algorithm requires more computing resources, we can easily increase them at any time.
• The results from the algorithms are stored in JSON format in the "soundmonitor-NoiseLevel" and "soundmonitor-NoiseType" S3 buckets, enabling integration with other AWS services, such as AWS QuickSight, for visualization purposes.
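As an illustration of the S3 → SNS → SQS chain, a worker might unwrap a queued message as follows (the field names follow the standard AWS S3 and SNS notification formats; the helper function and the example object key are illustrative):

```python
import json

def parse_s3_event(sqs_body):
    """Return (bucket, key) pairs from an SQS message body that wraps
    an SNS notification of an S3 ObjectCreated event."""
    sns = json.loads(sqs_body)           # SNS envelope delivered to SQS
    event = json.loads(sns["Message"])   # original S3 event notification
    return [(rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
            for rec in event["Records"]]

# Example: a new WAV file uploaded by a station
body = json.dumps({"Message": json.dumps({"Records": [{
    "s3": {"bucket": {"name": "soundmonitor-audiodata"},
           "object": {"key": "station-01/20230501-103000.wav"}}}]})})
print(parse_s3_event(body))
```

The worker would then download the referenced WAV file, run the acoustic algorithms, and write the JSON results to the output buckets.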
We opted to use Raspberry Pi devices to collect audio from the recording devices in our sound monitoring stations. However, after an analysis of the required services and resources, it was found that these devices lacked the resources to run the spatio-temporal sound metric algorithms on-site. As a result, we decided to use the on-cloud processing approach discussed earlier. This means that the Raspberry Pi devices are only responsible for capturing sounds from the environment around the recording devices and uploading the audio to the cloud via the HTTPS protocol.
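A minimal sketch of such a capture client, assuming each station PUTs its WAV files to a presigned S3 URL over HTTPS (the key naming scheme and the use of presigned URLs are assumptions, not the project's documented mechanism):

```python
from datetime import datetime, timezone
from urllib import request

def object_key(station_id, when=None):
    """Object key for a captured event; naming scheme is illustrative."""
    when = when or datetime.now(timezone.utc)
    return f"{station_id}/{when:%Y%m%d-%H%M%S}.wav"

def upload_wav(wav_bytes, presigned_url):
    """PUT the WAV capture to the cloud over HTTPS."""
    req = request.Request(presigned_url, data=wav_bytes, method="PUT",
                          headers={"Content-Type": "audio/wav"})
    with request.urlopen(req) as resp:   # raises on HTTP errors
        return resp.status
```

Keeping the station code this thin is what allows all algorithmic changes to happen on the cloud side without touching the deployed devices.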

Results and Discussion
In this section, the results obtained are discussed.

Anechoic Chamber
Initially, the free-field frequency response graphs of the microphones used in the prototype were obtained. In the 160 Hz, 250 Hz, and 400 Hz bands, there are increases of approximately +1 dB. Then, from 5000 Hz, there is an ascending behavior up to 16 kHz, with a peak value of +6.8 dB. This information was used to apply an inverse filter during prototype data processing.
Data for the head-mounted microphones were then processed for a zero-degree angle to the speaker. Figure 8 initially shows an increase of up to 11 dB at 4 kHz and an attenuation of −13 dB at 8 kHz. This can be attributed to different aspects. Firstly, the microphones are located 2.1 cm inside the prototype, creating a semi-closed tube whose first quarter-wavelength (λ/4) resonance is at approximately 4 kHz. It is worth mentioning that this behavior extends a bit further, between 5 kHz and 6 kHz, which may be caused by the shell effect inside the pinnae.
On the other hand, for half a wavelength (λ/2), the semi-closed tube model predicts an attenuation, which would explain the strong drop at 8 kHz. Then, between 12.5 kHz and 16 kHz, another resonance occurs, mainly associated with the third harmonic of the ear canal. Finally, the differences between the right and left channels can be caused by small differences in the physical shape of the ears used.
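The semi-closed tube reasoning can be checked with a quick calculation, assuming a speed of sound of 343 m/s and the 2.1 cm microphone recess mentioned above:

```python
C = 343.0   # speed of sound in air, m/s (assumed)
L = 0.021   # recess depth of the microphones, m

f_quarter = C / (4 * L)      # first quarter-wave resonance
f_half = C / (2 * L)         # half-wave attenuation condition
f_third = 3 * C / (4 * L)    # third harmonic of the quarter-wave tube

print(f"{f_quarter:.0f} Hz, {f_half:.0f} Hz, {f_third:.0f} Hz")
```

The resulting values, roughly 4.1 kHz, 8.2 kHz, and 12.3 kHz, are consistent with the observed boost near 4 kHz, the drop at 8 kHz, and the resonance between 12.5 kHz and 16 kHz.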
In the case of 180°, in general terms, there is a similar frequency behavior up to 2.5 kHz. Then, at the first resonant frequency of the ear canal (4 kHz), there is a difference of approximately 3.5 dB for both channels; however, the trend is maintained up to 8 kHz. From then on, the differences between 0° and 180° are very marked, with values between 4 dB and 14 dB, the latter being evident for frequencies greater than 12 kHz. This could be explained because the incident wavelength begins to be comparable with the size of the pinna and its internal structure. In this way, in practical terms, frequencies from 8 kHz onwards play a fundamental role in discerning whether the acoustic source is in front of or behind the prototype.
In the same way, the analysis was carried out for 90° and 270°; the results are shown in Figure 9. It can be observed that the right and left channels follow a similar behavior, presenting higher amplitudes when the source faces the prototype's ears (R at 90° and L at 270°). For the opposite case (R at 270° and L at 90°), there are greater differences between the ears; however, the trend remains.
Figure 9. Comparison of the FRF of the free field prototype for 90° and 270° incidence.
In a complementary way, the frequency response data of the prototype were compared with the Head Acoustics HMS II.3 binaural head. Figure 10 shows the results of a single channel for four angles of incidence of the acoustic wavefront in the direct field. The values used for comparison were obtained in the same anechoic chamber under the same previously mentioned methodology.
In a complementary way, the frequency response data of the prototype were compared with the Head Acoustics HMS II.3 binaural head. Figure 10 shows the results of a single channel for four angles of incidence of the acoustic wavefront in the direct field. The values used for comparison were obtained in the same anechoic chamber under the same previously mentioned methodology. In all four incidence angles studied, the HMS II system presents greater amplitude at all frequencies, as well as a more linear behavior at low frequencies. On the other hand, the HMS II shows amplification between 2 kHz and 5 Hz for 0° and 180°, while in the case of the prototype, its response is narrow, focusing on 3 kHz. Likewise, the differences be- In all four incidence angles studied, the HMS II system presents greater amplitude at all frequencies, as well as a more linear behavior at low frequencies. On the other hand, the HMS II shows amplification between 2 kHz and 5 Hz for 0 • and 180 • , while in the case of the prototype, its response is narrow, focusing on 3 kHz. Likewise, the differences between the prototype and the commercial head can be explained from different perspectives. For example, with respect to resonant frequencies and frequency response, the commercial head must have ear simulators that comply with IEC 60318-4, and all together must comply with IEC 60318-7. This series of standards specifies the impedance and frequency response that must be met. All of the above is because the commercial head focuses mainly on standard measurements in closed spaces. In the case of the prototype, a design was made where the main objective is to obtain a binaural record without the costs of a commercial head and become a first step for binaural acoustic monitoring. In this way, the differences in the technical characteristics of the transducers and the use of normalized pinnas in the commercial head directly influence the variances in FRF. 
On the other hand, as mentioned above, the distance to the ear canal of the prototype and its materials also contribute to the differences between the FRF of the prototype and that of the commercial head.
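The FRF comparison described above can be illustrated with a short sketch. The following is a hypothetical example (not the measurement chain used in the study) of estimating an FRF with the standard H1 estimator, H1(f) = Sxy(f)/Sxx(f), where x is the broadband excitation and y is the signal captured at one ear microphone; the simulated transfer path (a plain delay and attenuation) stands in for the head/ear response.

```python
import numpy as np
from scipy.signal import csd, welch

# Illustrative parameters; names and values are assumptions, not from the paper.
fs = 48_000                                   # sampling rate in Hz
rng = np.random.default_rng(0)
excitation = rng.standard_normal(fs * 5)      # 5 s of broadband noise

# Stand-in "measured" ear channel: 0.5x attenuation and a 24-sample (~0.5 ms) delay.
recorded = 0.5 * np.roll(excitation, 24)

# H1 estimator: cross-spectrum over auto-spectrum, averaged over segments.
f, Sxx = welch(excitation, fs=fs, nperseg=4096)
_, Sxy = csd(excitation, recorded, fs=fs, nperseg=4096)
frf = Sxy / Sxx                               # complex FRF estimate

magnitude_db = 20 * np.log10(np.abs(frf) + 1e-12)
```

With a flat 0.5x path, the magnitude sits near −6 dB across the band; for real ear channels, the same estimate would show the pinna and ear-canal resonances discussed above.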
Finally, the results are presented after applying the correlation functions to obtain the IACF and the spatial acoustic indicators for 0°, 90°, 180°, and 270° (Figure 11). During this experiment, the source remained fixed while only the position of the prototype varied on the azimuthal axis, emitting a broadband noise for 30 s. The τIACC values showed consistency when comparing the position of the source with respect to the prototype: for 90° and 270°, τIACC was −0.79 ms and 0.78 ms, respectively. It is worth mentioning that this indicator is directly related to the ITD (interaural time difference) and, therefore, is meaningful for angles other than 0° and 180°. Similarly, the τIACC values for 90° and 270° are within the limits of an IACF function and are close to the maximum ITD of a typical human head (0.66 ms) [28].
In the cases of incidence at 0° and 180°, amplitudes close to 1 are presented in the IACC value, indicating high similarity between the two ears, as expected. Although the graphs of the IACF function are very similar, differences are found in the presence of some additional peaks at the extremes, possibly caused by differences between the FRFs of both ears. Finally, WIACC was 0.03 for all four source positions, with a sharp peak of the IACF. This means that by placing the source on the horizontal plane, there will be a well-defined localization [33].
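The spatial indicators discussed above can be sketched in a few lines. The function below is a simplified illustration of the usual normalized IACF definitions (IACC as the maximum of |IACF| within ±1 ms, τIACC as the lag of that maximum, and WIACC as the width of the peak above IACC − 0.1); it is not the code running on the station, and all names are assumptions.

```python
import numpy as np

def iacf_indicators(left, right, fs, max_lag_ms=1.0, delta=0.1):
    """Normalized interaural cross-correlation over |tau| <= max_lag_ms.

    Returns (IACC, tau_IACC in ms, W_IACC in ms), Ando-style definitions.
    """
    max_lag = int(fs * max_lag_ms / 1000)
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2))
    lags = np.arange(-max_lag, max_lag + 1)
    iacf = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            iacf[i] = np.dot(left[: left.size - lag], right[lag:]) / norm
        else:
            iacf[i] = np.dot(left[-lag:], right[: right.size + lag]) / norm
    k = int(np.argmax(np.abs(iacf)))
    iacc = float(np.abs(iacf)[k])
    tau_iacc = lags[k] / fs * 1000.0
    # W_IACC: width of the main IACF peak above (IACC - delta)
    lo = hi = k
    while lo > 0 and abs(iacf[lo - 1]) >= iacc - delta:
        lo -= 1
    while hi < iacf.size - 1 and abs(iacf[hi + 1]) >= iacc - delta:
        hi += 1
    w_iacc = (hi - lo) / fs * 1000.0
    return iacc, tau_iacc, w_iacc

# Demo: a broadband signal reaching the right ear ~0.78 ms after the left,
# mimicking lateral incidence as in the 270° case above.
fs = 48_000
rng = np.random.default_rng(1)
left = rng.standard_normal(fs)                    # 1 s of broadband noise
right = np.roll(left, int(round(0.78e-3 * fs)))   # delayed copy
iacc, tau_ms, w_ms = iacf_indicators(left, right, fs)
```

For this idealized delayed copy, IACC approaches 1 and τIACC recovers the imposed interaural delay, consistent with the values reported for lateral incidence.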

Field Measurements
These measurements sought to characterize the behavior of the prototype in a space close to its final application. Next, the results of the energetic and spatial acoustic indicators obtained with the prototype in open space are presented. Table 1 shows the results for the equivalent continuous level. It is observed that the head measured higher levels than those obtained using the sound level meter. The biggest differences are found in the A-weighting (3.2 dB in the right channel and 3.7 dB in the left channel), while the Z-weighting presents the smallest differences (1.1 dB in the right channel and 2.1 dB in the left channel). Figure 12 shows the levels obtained using third-octave bands. The smallest differences between the head and the sound level meter are found in the band range between 80 Hz and 500 Hz (less than ±1.0 dB). Starting at 2 kHz, the effect of the head begins to generate attenuation in the ear opposite to the direction of the sound source, or gains in the direct ear. Therefore, the head of the station behaves as expected of a real human head. However, there are differences at low frequency (20 Hz to 60 Hz), which could be due to the lack of a windscreen on the binaural head's microphones; the wind intermittently generates pressure on the microphone diaphragm, which registers as a low-frequency stimulus. A gain of the head with respect to the sound level meter is also observed between 2.5 kHz and 8 kHz. In this range there is the effect of the pinna of the external ear, as well as of the auditory canal, which, due to its physical properties, resonates at these frequencies. There is a similar effect between 10 kHz and 20 kHz. It is worth noting that an inverse filter was applied to compensate for the pinna effect of the head at 0°.
However, since the source is road traffic moving from right to left and vice versa, and the FRF of the RBS changes for every angle, the increase in energy at these frequencies is still visible, as previously mentioned. Future work will be performed to adapt a windscreen and to find a filter that better compensates for the differences between the SLM and the RBS.
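The equivalent continuous level compared in Table 1 follows a simple definition. Below is a minimal sketch, assuming a calibrated pressure signal in pascals; the A-weighting filter that would precede this step to yield LAeq (per IEC 61672) is omitted for brevity, so this computes the Z-weighted Leq.

```python
import numpy as np

P_REF = 20e-6                       # reference pressure, 20 µPa

def leq(pressure):
    """Equivalent continuous level: Leq = 10*log10(mean(p^2) / p_ref^2) in dB."""
    return 10 * np.log10(np.mean(pressure**2) / P_REF**2)

# Sanity check: a 1 kHz tone with 1 Pa RMS corresponds to ~94 dB,
# the usual acoustic calibrator level.
fs = 48_000
t = np.arange(fs) / fs
tone = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)    # peak sqrt(2) Pa -> 1 Pa RMS
level = leq(tone)
```

The per-channel differences reported above are then simply the Leq of each RBS ear signal minus the Leq of the sound level meter over the same interval.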

Spatial Parameters
The spatial parameters obtained from the A-weighted IACF are shown in Figure 13. The IACC is a measure of how similar the signals arriving at both ears of the binaural head are. The average value of the IACC was 0.2456, with a minimum of 0.0229 and a maximum of 0.8886. Analyzing this function using its percentiles, P10 is 0.1178 and P90 is 0.4247, with a difference of 0.3069 between them. The difference in these percentiles makes it possible to estimate the variation of the IACC over time. Therefore, there was little correlation between both channels of the recording, indicating a very diffuse environment.

Figure 13. Spatial parameters obtained from the IACF calculation with the monitoring station prototype.
The τIACC makes it possible to obtain the temporal difference between both ears over time. This function shows a P10 of −0.7708 and a P90 of −0.7917, with a high difference between them, which suggests that there was high variation among the different sound sources during the recording. This is due to the traffic passing along Calle 170 during the recording. The difference between these τIACC percentiles is inversely proportional to the overall perceived quality of the sound environment.
WIACC is an indicator of the apparent width of the sound source and is inversely dependent on frequency. The P10 had a value of 0.0854 and the P90 of 0.2511. There are peaks in the WIACC graph that can be associated with sound events of short duration and predominantly low-frequency content.
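The percentile summaries used for the three indicators above reduce to one operation. The sketch below shows how a time-varying indicator (one value per analysis frame) would be summarized with P10, P90, and their spread; the demo series is synthetic, chosen only so the result is easy to verify by hand.

```python
import numpy as np

def percentile_stats(series):
    """P10 and P90 of a frame-wise indicator, plus their spread (P90 - P10),
    used here as a simple measure of the indicator's variation over time."""
    p10, p90 = np.percentile(series, [10, 90])
    return float(p10), float(p90), float(p90 - p10)

# Demo on a known series: 101 evenly spaced values from 0.00 to 1.00,
# so P10 = 0.1 and P90 = 0.9 by construction.
p10, p90, spread = percentile_stats(np.arange(101) / 100)
```

Applied to the frame-wise IACC, τIACC, and WIACC streams, this yields exactly the P10/P90 pairs reported in this section.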

Future Work
Currently, work is being carried out on the development of the other stations in order to integrate them into the capture and processing architecture deployed in AWS. Likewise, the classification of sound sources using AI is being further developed with the aim of obtaining more information about the soundscape of each point where the stations are located. Another approach being explored is to equip stations with Edge Computing capabilities for those cases in which broadband internet access is not possible. Finally, the main objective of the project is to provide state entities and research groups with useful data to implement urban planning policies that improve the quality of life of citizens.

Conclusions
A prototype of a binaural acoustic monitoring system was implemented using the shape of a 3D-printed human head as a design basis. Measurements carried out in an anechoic chamber allowed obtaining the FRF at four different angles, showing variations in the medium and high frequencies caused by the pinna and ear canal effects. These variations, in general terms, follow the trend of a commercial head, albeit with shifts in frequency and amplitude around 4 kHz. This could be improved by increasing the distance between the microphones and the outer ear, which implies adjusting the physical design of the head. Regarding the spatial location of the source based on the IACF function, the difference between 90° and 270° was clear. However, for 0° and 180°, the similarity of τIACC made this task difficult.
Additionally, a field measurement was made to compare the results of the temporal acoustic parameters with a class 1 sound level meter. The prototype simulates the effect of a human head in binaural listening, generating interaural differences across frequency bands. The head effect causes certain frequency ranges to be boosted or attenuated depending on the angle of incidence of the source, as described in Figure 12. In general, for the measurements made, the station shows equivalent continuous level values greater than those of the sound level meter for each of the weightings measured. The main goal of the station is to obtain spatial indicators; however, a single class 1 microphone could be attached to obtain more accurate results, like those of a sound level meter. Regarding the spatial parameters, it was observed that the prototype can provide them once the data are transmitted and processed on the cloud. This is useful to obtain additional data that can help apply new approaches to soundscape evaluation in urban scenarios.