Urban Traffic Noise Analysis Using UAV-Based Array of Microphones

(1) Background: The transition to smart cities involves actions in many fields of activity, such as economy, environment, energy, government, education, living and health, safety and security, and mobility. Environment and mobility are particularly important for ensuring good living conditions in urban areas. Considering such arguments, this paper proposes 3D monitoring and mapping of traffic-generated urban noise emissions using a simple, UAV-based, low-cost solution. (2) Methods: The collection of relevant sound recordings is performed via a UAV-borne set of microphones designed in a specific array configuration. Post-measurement data processing is performed to filter unwanted sound and vibrations produced by the UAV rotors. Collected noise information is location- and altitude-labeled to ensure a relevant 3D profile of the data. (3) Results: Field measurements of sound levels in different directions and at different altitudes are presented in the paper. (4) Conclusions: Employing a UAV for environmental noise mapping proves to be minimally invasive, low-cost, and effective in rapidly producing environmental noise pollution maps for reports and future improvements in road infrastructure.


Introduction
Urban traffic is, at present, a major factor of urban environmental pollution. Therefore, most EU countries and countries worldwide are trying to alleviate the negative effects of traffic on the environment, seeking solutions both in infrastructure and on mobile platforms. Traffic-generated chemical compounds, particulate matter, and noise all produce negative effects on residents, traffic participants, and local natural life. Knowing how noise from daily road traffic is dispersed and behaves could help competent authorities and municipalities evaluate and propose countermeasures to reduce the impact on the environment and on citizens' lives. This paper presents research carried out to determine the main characteristics of urban noise produced by traffic, together with a solution for cost-effective and efficient collection of noise data. The data collection is carried out using a UAV-borne set of microphones and a data processing unit.
The research in this field is quite rich, and several solutions for collecting information on environmental noise pollution have been proposed by different scholars and industrial researchers.
Starting with modeling, the assessment of both polluting emissions and noise in urban areas represents a measure designed to find the most appropriate solutions for reducing their impact on citizens' lives, as shown in [1], where Hui Di et al. determined, under specific conditions, a traffic noise propagation model to predict instantaneous sound levels based on noise attenuation. The authors consider that noise is affected by air absorption, obstacle reflection, and ground absorption during its transmission, which causes attenuation. The research conducted by Luo L. et al. [18] is oriented, on the other hand, towards collecting noise information via wireless acoustic sensor networks (WASNs). They propose a new system that employs WASNs to monitor urban noise and recognize acoustic events with high performance; the system is composed of sensors capable of local signal processing, convolutional neural networks for classification, and a platform for noise map visualization. This direction of research is also pursued by other authors, as shown in papers [19][20][21][22][23][24][25], some also employing modeling of emissions based on Google Maps road profiles [24].
Based on noise measurements, evaluations, and mapping, noise reduction strategies can be designed for different urban or extra-urban scenarios. Hsiao Mun Lee et al. [26] extended the study of traffic-generated noise to the affected population, dividing it into group noise indicators (highly annoyed and sleep-disturbed people) and studying the effect of noise reduction measures on these segments of the population. The results of this study showed that installing noise absorbents and barriers is highly recommended because it significantly reduces the influence of noise on the nearby environment. However, this solution solves neither the problem of chemical pollutant emissions nor that of particulate matter (PM), also generated by traveling vehicles [27]. Of course, the measures taken affect these segments of the population differently. Similar research is given in works [28][29][30].
General model-based noise mapping in urban areas is the focus of paper [31], where Ye Liu et al. discuss the working principle of general model-based noise mapping using mobile crowdsensing and acoustic sensor networks (ASNs). The authors propose an interesting solution: collecting sound information via smart devices carried by citizens in an urban area (smart watches, smart bracelets, etc.) and using GPS-related information to send relevant data regarding the levels and features of noise pollution. However, this solution would probably require serious discussion about ensuring privacy, as personal mobile devices are used to collect audio files. Connected with this research, and with a view on possible future applications based on work [32], noise-relevant data and, to a certain extent, chemical pollutant information could also be collected by means of connected vehicles. An overall view of existing technologies for noise data collection specific to smart cities is given in [33].
Within the collection of sound information in cities, it is also possible to detect specific events in the collected data. Such a proposal is shown in work [34], where Soccoro J.C. et al. investigate the possibility of eliminating anomalous noise events (ANE) unrelated to road traffic.
The preoccupation towards the employment of mobile crowdsourcing for data collection is also growing in this domain, as the new developments for smart city applications have issued mobile crowdsensing (MCS), a service designed to enable users to collect and share sensor data in different urban areas. As shown in [35], "MCS services can produce detailed sensor readings and provide means to discover new phenomena in urban environments that otherwise cannot be measured by individuals, such as the occurrence of traffic congestion, or environmental noise pollution monitoring".
As can be observed from the above literature review, collecting information regarding urban noise pollution is a major preoccupation of many researchers, and it is also in focus for the development of new road infrastructure concepts, especially in smart city transformation processes. The purpose of the present research is to investigate a simple but efficient and effective method of noise data collection employing a UAV that can be used both for building 3D noise maps and for detecting ANEs on specific paths, contributing to the correct management of environmental protection and also to public security in large cities. As a future development, the solution of employing UAV-borne sensors could also be used to collect relevant data on other types of road traffic-generated pollutants, such as CO, CO2, NOx, SO2, and PM, along with other harmful byproducts such as hydrocarbons, benzene, and formaldehyde. The remainder of this paper is organized as follows: Section 2 is dedicated to Materials and Methods; Section 3, Results; Section 4, Discussion; and, finally, Conclusions.

General Information
Sound data collection is important in many smart city applications: environmental control, reduction of traffic-generated pollution, or even detection of security-related public events. Mapping sound levels and characteristics could also benefit smart city administrators, helping them to better manage traffic flow, sensitive areas such as hospitals, and people's mobility, and to take the most appropriate measures to ensure a healthy level of ambient noise in urban areas and provide the best environment for living.
Sound pressure level (SPL) represents the difference from the local ambient atmospheric pressure caused by the sound wave. Measured in pascals (Pa), the SPL can be captured in open air using a microphone. Usually, a microphone's sensitivity is estimated using a sine wave at 94 dB SPL (equivalent to 1 pascal of air pressure) at a frequency of 1 kHz, where human acoustic sensitivity is high. The microphone's sensitivity is measured as its response to the 94 dB SPL signal. A microphone's specification also states the maximum sound pressure (in dB SPL) at which the microphone can work correctly before producing audible levels of distortion. Close-passing road traffic usually produces noise ranging from 60 to 100 dB SPL (dB SPL = 20 log10(P1/P0), where P1 is the measured sound pressure [Pa] and P0 is the reference sound pressure of 20 [µPa]). The value of 0 dB SPL corresponds to the lowest threshold of human hearing. In normal conditions, pressure levels span quite a large range, from 0.00002 to 200 pascals. However, sensors and devices such as microphones do not have the same dependency on SPL as the human ear. Therefore, A-weighting represents a way to make sound loudness relative to SPL be perceived by sensors in a similar way as the human ear. While the theoretical sound frequency band perceived by the human ear is 20-20,000 Hz, in reality it depends quite a lot on individuals, and especially on their age; older people usually perceive a much narrower band. Within the 20 Hz-20 kHz range, a normal person perceives lower frequencies with lower sensitivity, and human hearing is generally logarithmic in nature, so decibels, as a measurement unit, match human perception and experience to a large extent. Usually, people perceive sounds with maximum accuracy in the range of 250-5000 Hz.
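The dB SPL relation above can be sketched in a few lines of Python (a minimal illustration; the function name is ours):

```python
import math

def spl_db(p_pa: float, p_ref: float = 20e-6) -> float:
    """Sound pressure level in dB SPL from an RMS pressure in pascals,
    relative to the 20 uPa reference."""
    return 20.0 * math.log10(p_pa / p_ref)

# 1 Pa is the calibration level commonly used for microphone sensitivity:
print(round(spl_db(1.0), 1))   # ~94.0 dB SPL
# The extremes of the 0.00002-200 Pa range quoted above:
print(round(spl_db(20e-6)))    # 0 dB SPL (threshold of hearing)
print(round(spl_db(200.0)))    # 140 dB SPL
```

This confirms why the 94 dB SPL / 1 kHz tone is a convenient calibration point: it corresponds to exactly 1 Pa of pressure on the logarithmic scale.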
Therefore, a so-called "A-weighted" curve is used to compensate an acoustic sensor's sensitivity and make it similar, or at least very close, to that of the human ear. The A-weighted curve is depicted in Figure 1a, and the conversion from [dB SPL] to [dBA] employing A-weighting adjustments is presented in Figure 1b.

A-weighting is the standard for establishing hearing damage caused by high sound pressure levels and is also used to determine noise pollution. International Electrotechnical Commission standard 61672:2003 presents the curves that are employed in sound meters. All modern sound meters are able to provide A-weighted decibel (dBA) and C-weighted decibel (dBC) measurements (C-weighting is a flatter sensitivity curve, used beyond 100 dB, where the human hearing characteristics also change). The standard also defines the "Z-weighted" curve, a characteristic where no compensation is applied, covering the band between 20 Hz and 20 kHz within ±1.5 dB.
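The IEC 61672 A-weighting correction can be evaluated directly from its standard pole frequencies; the sketch below (function names are ours) applies it to convert a narrow-band level from dB SPL to dBA, as illustrated in Figure 1b:

```python
import math

def a_weight_db(f: float) -> float:
    """IEC 61672 A-weighting correction, in dB, at frequency f [Hz]."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes A(1 kHz) to ~0

def spl_to_dba(spl: float, f: float) -> float:
    """Convert a narrow-band level in dB SPL at frequency f to dBA."""
    return spl + a_weight_db(f)

print(round(a_weight_db(1000.0), 2))  # ~0.0: no correction at 1 kHz
print(round(a_weight_db(100.0), 1))   # ~-19.1: low frequencies strongly attenuated
```

The large negative correction at 100 Hz reflects the reduced human sensitivity to low frequencies discussed above.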
SPL, in decibels, is described by Equation (1):

L_AT = 10 log10 [ (1/T) ∫_0^T (p_A^2(t) / p_0^2) dt ]   (1)

where: L_AT—equivalent continuous A-weighted sound pressure level; p_A(t)—the instantaneous A-weighted sound pressure [Pa]; p_0—the reference sound pressure (20 × 10^−6 Pa); T—specified time interval [s].

The attenuation produced by geometrical sound divergence may be computed according to Equation (2):

A_div = 20 log10(d/d_0) + 11   (2)

where: d—distance from source to receiver [m]; d_0—reference distance (1 m).

The atmospheric absorption is given by Equation (3):

A_atm = α·d/1000   (3)

where: α—the atmospheric attenuation coefficient [dB/km].
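Assuming the standard ISO 9613-2 forms of the divergence and atmospheric absorption terms, these relations can be sketched numerically as follows (a minimal illustration with sampled pressures; helper names are ours):

```python
import math

def leq_dba(p_a: list, p0: float = 20e-6) -> float:
    """Equation (1): equivalent continuous A-weighted level L_AT from
    uniformly sampled A-weighted sound pressures p_A(t) [Pa]."""
    mean_sq = sum(p * p for p in p_a) / len(p_a)
    return 10.0 * math.log10(mean_sq / (p0 * p0))

def a_div(d: float, d0: float = 1.0) -> float:
    """Equation (2): attenuation by geometrical divergence [dB]."""
    return 20.0 * math.log10(d / d0) + 11.0

def a_atm(d: float, alpha: float) -> float:
    """Equation (3): atmospheric absorption [dB] over d metres,
    with alpha given in dB/km."""
    return alpha * d / 1000.0

# A steady 1 Pa A-weighted pressure corresponds to ~94 dBA:
print(round(leq_dba([1.0] * 100), 1))        # ~94.0
# Extra divergence loss when moving the receiver from 5 m to 30 m:
print(round(a_div(30.0) - a_div(5.0), 1))    # ~15.6 dB
```

The 5 m to 30 m example anticipates the ground-level versus 30 m measurements reported later: geometrical divergence alone accounts for a drop of roughly 15-16 dB.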

Instruments and Initial Setup
In order to initially determine the main characteristics of the traffic-generated noise, some field measurements have been conducted using dedicated sound measurement instrumentation.

The tests were performed in a busy junction in Bucharest, Romania (Figure 2).

The location for initial measurements was chosen considering that the respective area is known as very busy, with high levels of traffic-generated noise. The main boulevard has four lanes per direction of travel, and there are also public transport lines. The location is situated near a very busy subway station and a large market. Ground-level initial measurements of traffic-generated noise had the purpose of determining the actual levels of noise pollution, the frequency range, and the main causes. The microphone was placed 1 m above ground, on a tripod, facing traffic, 5 m from the first lane (on a shoulder of the road). Post-processing of collected information consisted of recording levels at different frequencies and analyzing SPL.
However, there were specific weather conditions when the tests were performed: the carriageway was wet, causing increased traffic noise with a slightly different frequency range compared to noise generated by a normal (dry) roadway surface. Still, we consider that this is not a major downside, as, in this case, we managed to collect information in a peak noise environment, in a very busy area. Further field measurements will also be taken in different humidity conditions to compare results. The outside temperature was 3 °C. The area is surrounded to the N-E and S by relatively high buildings, favoring propagation of noise with many reflections. The NW part is occupied by a busy market and commercial areas.

Initial Findings
Based on the field-collected information, several diagrams have been drawn to determine the main components of traffic-generated noise and their amplitudes, knowing that the main health effects caused by traffic noise are increased risks of hypertension, myocardial infarction, cardiovascular disease, and even stroke mortality. High noise sensitivity has also recently been associated with lower ischemic heart disease mortality risk, and road traffic-generated noise was also associated with Phase 4 psychological ill-health, but only among those exposed to 56-60 dBA [36]. The area where the initial measurements were conducted contains many buildings with numerous flats, so it also represents an important residential area, hence the importance of determining such noise levels. Initial results are presented in the following diagrams.
It can be observed in Figure 3 that the main traffic noise components (with SPL greater than 50 dBA) extend from 100 Hz up to over 15 kHz, occupying most of the human hearing spectrum. The peak levels are reached at around 400-500 Hz. Interestingly, some medical reports and scientific papers claim that a 432-Hz tuning has better effects on the human body, although there are no rigorous scientific studies that support this hypothesis. The sound frequency of "432 Hz . . . was associated with a slight decrease of mean (systolic and diastolic) blood pressure values (although not significant), a marked decrease in the mean of heart rate (−4.79 bpm, p = 0.05) and a slight decrease of the mean respiratory rate values (1 r.a., p = 0.06), compared to 440 Hz" [37], but this is the case for music, not for incoherent sounds. Usually, traffic noise is responsible for increased stress.
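Locating the dominant spectral component, as done for the Figure 3 diagram, amounts to a peak search over the magnitude spectrum. The NumPy sketch below illustrates this on synthetic data (a tone near the observed 450 Hz peak buried in broadband noise; all parameters are illustrative):

```python
import numpy as np

fs = 44_100                      # sampling rate [Hz]
t = np.arange(fs) / fs           # 1 s of signal

# Synthetic "traffic" recording: a dominant component near 450 Hz
# (the peak region observed in Figure 3) plus broadband noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 450 * t) + 0.2 * rng.standard_normal(fs)

# Magnitude spectrum and the frequency of its strongest component.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)   # ~450.0 Hz
```

In practice the same search would be run over each recorded frame, with the spectrum first converted to dBA via the A-weighting correction so that the reported peak matches human-perceived loudness.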
Sensors 2023, 23, x FOR PEER REVIEW
To determine the sound components more precisely, separate diagrams have been produced for the main harmonic frequencies, as shown in Figures 4-6. When analyzing these diagrams, another observation that may be drawn is that the levels vary more in the lower frequency range than in the medium and higher frequency ranges (Figure 4 compared to Figures 5 and 6).
Levels appear to be more constant in midrange frequencies (Figure 5), exceeding 55 dBA on a regular basis. Still, considering that the initial field measurements were conducted with a wet roadway surface, it is possible that sound pressure levels at higher frequencies (above 2 kHz) are favored by the humid surface of the roads. Finally, sound components recorded at higher frequencies also show increased variation compared to the medium range (Figure 6 compared to Figure 5).
To determine changes in sound characteristics when picking up data at a different altitude, another set of field measurements has been conducted by placing the instrumentation in a flat in a nearby building, 30 m high. The results are presented in the following diagrams.
In Figure 7, recorded higher frequencies experienced a significant decrease, up to 40 dBA, while the peak also changed in frequency, being situated around 1900 Hz. In addition, while the peak sound pressure level remained around 60 dBA, this held only for a smaller band, from approx. 1 kHz up to 5 kHz, and the remaining levels experienced a decrease, most visibly in the lower domain.
One observation worth mentioning is that noise levels are more variable when recorded at a higher altitude; this is probably because of the influence of side activities (especially in the 50-Hz range, Figure 8).
This remark also holds for the other frequencies (medium and higher ranges, Figures 9 and 10).
From the above presented measurements, it follows that it is possible to collect relevant information on traffic-generated noise even from above, meaning that a more detailed image of sonic pollution could be obtained from an automated UAV flight over selected areas, at specific intervals of time.

Information Collection Process
Data collection regarding traffic-generated noise is important in the development of a friendly environment for the inhabitants of a large city. However, it is not economical to install sensors at every important point of a street network, as such an investment involves significant costs. Still, it is advisable that the next generation of this activity employ sensors installed onboard vehicles in a future connected vehicles network, using those as "floating cars", or moving sensors, in the traffic flow. Until then, though, a simpler solution is proposed in this research: to employ periodic measurements in different locations using a UAV. The vehicle is equipped with a specific configuration of microphones, able to create a very informative sound image and overall information regarding the sonic pollution that moving traffic produces in large urban agglomerations.
The proposed solution estimates traffic noise based on the use of a drone on which a directional microphone array is mounted. To reduce the background noise, and especially the noise produced by the propeller rotations, an adaptive noise cancellation (ANC) filter developed on the FPGA is employed. Two microphones are pointed upwards, collecting noise from UAV rotors, in order to reduce its level. Adaptive filtering is performed with an optimized Least-Mean-Square (LMS) algorithm. The general architecture of the system is presented in Figure 11, and, in the following subsections, the mathematical concepts and processing modules are described.
Figure 11. Architecture of the ultrasonic sound localization system using a drone.
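The adaptive noise cancellation step can be illustrated with a minimal NumPy sketch of the LMS update described above. The rotor reference and the traffic channel are synthetic stand-ins, and all parameter values (tap count, step size) are illustrative; the paper's implementation runs on the FPGA:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
traffic = 0.5 * rng.standard_normal(n)           # signal of interest (stand-in)
rotor = np.sin(2 * np.pi * 0.05 * np.arange(n))  # periodic rotor noise (upward mics)
primary = traffic + 0.8 * rotor                  # array channel: traffic + rotor bleed

# LMS adaptive noise canceller: shape the reference so it matches the
# rotor component in the primary channel, then subtract it.
taps, mu = 16, 0.01
w = np.zeros(taps)
out = np.zeros(n)
for i in range(taps, n):
    x = rotor[i - taps + 1:i + 1][::-1]  # reference window, newest sample first
    y = w @ x                            # estimated rotor contribution
    e = primary[i] - y                   # cancelled output = traffic estimate
    w += 2 * mu * e * x                  # LMS weight update
    out[i] = e

# Residual rotor energy should be far below the uncancelled level.
err_before = np.mean((primary - traffic) ** 2)
err_after = np.mean((out[taps:] - traffic[taps:]) ** 2)
print(err_after < err_before)   # True
```

With a strongly periodic reference such as rotor noise, even a short LMS filter converges quickly, which is one reason the LMS family is a common choice for FPGA implementations.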

Sound Processing
The directional microphone array is arranged on a Kintex 7 FPGA board with a square-shaped ultrasound sensor array containing 24 ME spaced at 11 mm. Figure 12 shows the simulation of the square microphone with the beamforming analysis using MUSIC DOA (Multiple Signal Class rection of Arrival) [38]. DOA indicates the direction from which a propaga ically arrives at a point where a set of sensors is placed [39]. The image on th 12 shows the energy detection of the acoustic signal generated by the drone' with the location position labeling (azimuth and elevation), for the two aco cies characteristic of drones (white color), represented on the frequency spe right). Figure 11. Architecture of the ultrasonic sound localization system using a drone.

Sound Processing
The directional microphone array is built around a Kintex-7 FPGA development board, with a square-shaped ultrasound sensor array containing 24 MEMS capsules spaced at 11 mm. Figure 12 shows the simulation of the square microphone array together with the beamforming analysis using MUSIC (Multiple Signal Classification) DOA (Direction of Arrival) estimation [38]. DOA indicates the direction from which a propagating wave typically arrives at a point where a set of sensors is placed [39]. The image on the left of Figure 12 shows the energy detection of the acoustic signal generated by the drone's rotors, along with the location labeling (azimuth and elevation), for the two acoustic frequencies characteristic of drones (white color), represented on the frequency spectrum (bottom right). The configuration of the acoustic processing system is presented in Figure 13: the acoustic signal picked up from the microphone sensor array is converted from pulse density modulation (PDM) to pulse code modulation (PCM) for each sensor. The conversion module consists of a suite of digital filters, and, finally, a multiplication block is inserted (as shown in Figure 14). PDM can be the ideal choice in challenging application environments, thanks to its inherent resilience to noise.
In the first stage, a low-pass comb filter for PDM to PCM conversion is employed. To achieve lossless conversion, we use a half-band compensation filter because the passband of the comb filters is not flat, and the last digital filter, of the finite impulse response (FIR) type, has the role of eliminating the noise remaining after conversion. Finally, the multiplier converts the signal to a frequency in the audible spectrum. The sampling frequency for PDM is 4.608 MHz for the MEMS sensors used in the construction of the microphone array.
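As an illustration of this conversion chain, the comb (moving-average) and FIR clean-up stages can be sketched in software. This is a simplified model: the number of stages, tap counts, and decimation factor below are assumptions chosen for the demo, not the exact FPGA parameters.

```python
import numpy as np

def pdm_to_pcm(pdm_bits, decimation=64, comb_stages=4):
    """Convert a 1-bit PDM stream to multi-bit PCM, mirroring the chain
    described above: cascaded comb (moving-average) low-pass filters,
    an FIR clean-up stage, then decimation to the PCM rate."""
    # Map 0/1 bits to -1/+1 before filtering.
    x = 2.0 * np.asarray(pdm_bits, dtype=float) - 1.0
    # Cascade of normalized moving-average (comb) filters acts as a low-pass.
    for _ in range(comb_stages):
        x = np.convolve(x, np.ones(decimation) / decimation, mode="same")
    # Short FIR low-pass (Hamming-windowed sinc) removes residual conversion noise.
    taps = np.hamming(63) * np.sinc((np.arange(63) - 31) / decimation)
    taps /= taps.sum()  # unity DC gain
    x = np.convolve(x, taps, mode="same")
    # Decimate down to the PCM sampling rate.
    return x[::decimation]

# Sanity check: a constant-density PDM stream (75% ones) should decode to ~0.5.
pcm = pdm_to_pcm(np.tile([1, 1, 1, 0], 4096))
```

Because every stage preserves DC gain, a constant-density bit stream decodes to its duty-cycle value, which is the basic correctness check for such a chain.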

Delay and Sum Beamforming
To achieve beamforming, we use the delay-and-sum (DAS) algorithm, i.e., the Bartlett beamformer. The DAS beamformer consists of applying a delay and a weight to the output of each sensor and, finally, summing the signals [40]. The delays have the role of maximizing the sensitivity of the array for the received acoustic signals. Moreover, by adjusting the delays, the directionality of the microphone array is oriented towards the direction of interest for the reception of the signals, thus achieving the coherent addition of the individual acoustic signals received from the sensors. This means that, at certain angles of incidence, constructive interference is obtained, while, for other angles of incidence, destructive interference is obtained. Figure 15 shows the way the DAS algorithm works.
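The DAS principle can be sketched as follows. This is a minimal NumPy model using frequency-domain fractional delays; the linear array geometry, sampling rate, and source used in the demo are illustrative, not the paper's 24-element square configuration.

```python
import numpy as np

def das_beamform(signals, fs, mic_positions, direction, c=343.0):
    """Delay-and-sum beamformer: delay each channel so that a plane wave
    arriving from `direction` adds coherently, then average the channels.

    signals: (n_mics, n_samples); mic_positions: (n_mics, 3) in metres;
    direction: vector pointing towards the source; c: speed of sound [m/s].
    """
    u = np.asarray(direction, float)
    u = u / np.linalg.norm(u)
    # A plane wave from direction u reaches microphone p earlier by p.u/c,
    # so delaying each channel by that amount aligns all channels.
    delays = mic_positions @ u / c
    delays -= delays.min()                    # keep all delays non-negative
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(signals, delays):
        # Fractional delay applied as a linear phase in the frequency domain.
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau), n)
    return out / signals.shape[0]

# Demo: a 500 Hz plane wave hitting a 4-mic linear array 30 degrees off broadside.
fs, f0 = 16000, 500.0
t = np.arange(1600) / fs                      # 0.1 s window (50 full cycles)
pos = np.array([[0.05 * i, 0.0, 0.0] for i in range(4)])
u = np.array([0.5, 0.0, np.sqrt(3) / 2])      # source direction (30 deg)
taus = pos @ u / 343.0
sigs = np.array([np.sin(2 * np.pi * f0 * (t + tau)) for tau in taus])
out = das_beamform(sigs, fs, pos, u)
rms = float(np.sqrt(np.mean(out ** 2)))       # coherent sum keeps unit amplitude
```

When steered at the true direction the channels add constructively, so the output keeps the unit amplitude of the source (RMS near 0.707); steering elsewhere produces partial cancellation.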


Directional Energy
To perform the audio energy analysis, some processing must be performed to identify the energy of the summed audio signals [39][40][41]. The energy of a continuous signal $x_c(t)$ is calculated as $E = \int_{-\infty}^{+\infty} x_c^2(t)\,dt$; for a discrete signal $x(n)$, the energy is calculated as $E = \sum_{n} x^2(n)$. The hardware implementation of the directional energy calculation is shown in Figure 16. An FPGA multiplier module is employed to square the DAS beamforming signal. The output of the accumulator module calculates the signal energy based on the acquired samples.
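A software counterpart of the multiplier-accumulator stage, computing $E = \sum_n x^2(n)$ over the acquired samples, can be written as:

```python
import numpy as np

def directional_energy(beamformed, n_samples=None):
    """Energy of the beamformed signal, E = sum_n x^2(n): the squaring mirrors
    the FPGA multiplier stage, the sum mirrors the accumulator stage."""
    x = np.asarray(beamformed, float)
    if n_samples is not None:
        x = x[:n_samples]              # accumulate over a fixed sample window
    return float(np.sum(x * x))

# Sanity check: a unit-amplitude sinusoid over N samples of whole periods
# has energy N/2 (here N = 1000, 10 full cycles).
tone = np.sin(2 * np.pi * np.arange(1000) / 100)
E = directional_energy(tone)
```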

Adaptive Filtering
The adaptive filter for identifying an unknown system has the configuration shown in Figure 17. In this configuration, x(n) represents the input signal at discrete time n, while y(n) is the output of the system to be determined (i.e., h) [41]. By filtering the input signal through both systems, we can write $\hat{y}(n) = \hat{h}^T(n-1)x(n)$ and $y(n) = h^T x(n)$, where x(n) is a vector containing the most recent L input samples [42,43].
Finally, the output of the adaptive system is subtracted from the desired signal, so that the error signal results as $e(n) = d(n) - \hat{y}(n)$.

Least-Mean-Square (LMS) Algorithm
The Wiener filter facilitated the development of one of the most popular adaptive algorithms, the LMS algorithm. The popularity of the LMS algorithm is due to its ease of understanding, implementation, and use. At first, the Wiener filter aimed to determine an optimal filter, $\hat{h}_W$, which minimizes the error signal in each iteration [43,44].
In this context, the optimal filter that minimizes the error signal is identified by minimizing a cost function, the mean squared error (MSE), defined as $J_{\hat{h}}(n) = E[e^2(n)]$, where $E[\cdot]$ represents the statistical averaging operator. After several calculations, the optimal Wiener filter is obtained as $\hat{h}_W = R_x^{-1} p_{xd}$. Calculation of the correlation vector $p_{xd}$ and the autocorrelation matrix $R_x$ involves the knowledge of some statistical estimates, making the steepest-descent algorithm unsuitable for practical use [45,46]. If we replace these statistics with their instantaneous estimates, $\hat{R}_x(n) = x(n)x^T(n)$ and $\hat{p}_{xd}(n) = x(n)d(n)$, the following equation results: $\hat{h}(n) = \hat{h}(n-1) + 2\mu x(n)e(n)$, which is the update equation of the LMS algorithm. The theoretical configuration of an adaptive filter for noise reduction/adaptive noise cancellation (ANC) is presented in Figure 18.
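A minimal sketch of the resulting ANC loop follows. The filter length, step size, and the two-tap rotor-noise leakage path are illustrative assumptions for the demo, not the FPGA implementation or measured data.

```python
import numpy as np

def lms_anc(reference, primary, n_taps=32, mu=0.005):
    """Adaptive noise cancellation with the LMS update
    h(n) = h(n-1) + 2*mu*x(n)*e(n).

    reference: noise-only signal (e.g., the upward-facing rotor microphones);
    primary:   desired signal corrupted by a filtered version of that noise.
    Returns the error signal e(n), which converges to the cleaned signal.
    """
    h = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)                 # most recent L reference samples
    e = np.zeros(len(primary))
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        y_hat = h @ x_buf                    # adaptive filter output
        e[n] = primary[n] - y_hat            # e(n) = d(n) - y_hat(n)
        h += 2 * mu * x_buf * e[n]           # LMS coefficient update
    return e

# Demo: rotor noise leaks into the primary channel through an unknown
# (hypothetical) 2-tap path; a slow sinusoid stands in for the traffic signal.
rng = np.random.default_rng(1)
n = 20000
noise = rng.standard_normal(n)
leak = 0.8 * noise + 0.4 * np.concatenate(([0.0], noise[:-1]))
signal = 0.3 * np.sin(2 * np.pi * 0.01 * np.arange(n))
cleaned = lms_anc(noise, signal + leak)
```

After convergence the error signal tracks the traffic component while the correlated rotor noise is cancelled.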


Optimized LMS Algorithm
To improve the performance of the LMS algorithm, the fixed step size used to update the coefficients is replaced with a control matrix Y(n). Y(n) is considered a scaled unitary control matrix, $Y(n) = \mu M_L$, where $M_L$ is the L×L identity matrix and $\mu$ is the fixed step size. The autocorrelation matrix of the coefficient error is then calculated, replacing the misalignments, in terms of $\sigma_h^2$, the system noise variance, and $R_x$, the autocorrelation matrix of the input signal. The ANC simulation result with the optimized LMS is presented in Figure 19. It is found that the system noise variance parameter plays an important role: if it has a high value, it allows the optimized LMS algorithm to determine the new system parameters in a reasonable computation time.
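The paper's exact control-matrix update is not reproduced here. As one common example of replacing the fixed step with a data-dependent control matrix, the sketch below uses the normalized-LMS choice $Y(n) = \frac{\mu}{x^T(n)x(n) + \delta} M_L$; this is an illustrative stand-in, not the authors' optimized algorithm.

```python
import numpy as np

def nlms(reference, primary, n_taps=32, mu=0.5, delta=1e-6):
    """Variable-step LMS sketch: the fixed step mu is rescaled each iteration
    by the inverse instantaneous input power, i.e. the control matrix becomes
    Y(n) = (mu / (x'x + delta)) * I_L (an illustrative choice)."""
    h = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)
    e = np.zeros(len(primary))
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        e[n] = primary[n] - h @ x_buf
        # Data-dependent step replaces the fixed mu of plain LMS.
        h += (mu / (x_buf @ x_buf + delta)) * x_buf * e[n]
    return e

# Demo: identify a hypothetical 2-tap system; the error should decay to ~0.
rng = np.random.default_rng(4)
x = rng.standard_normal(5000)
d = 0.5 * x - 0.3 * np.concatenate(([0.0], x[:-1]))
e = nlms(x, d)
```

The power normalization makes convergence speed largely independent of the input signal level, which is the practical benefit of moving from a fixed step to a control matrix.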

Neural Network for Road Traffic Noise Prediction
For the prediction of road traffic noise, a multilayer feed-forward neural network (MFFN) architecture is employed [46][47][48]. The classic structure of the MFFN is shown in Figure 20.
MFFNs are layered feed-forward networks, trained with backpropagation. Their main advantages are ease of use and implementation, and the fact that they can approximate any input/output mapping [49,50]. The input information of the MFFN network for traffic volume, average speed, and vehicle classes is obtained from a convolutional neural network (CNN), and the gradient and building density information is determined by Equations (9) and (10) presented in the following methodology subsection.

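A minimal MFFN of this kind, trained with backpropagation, can be sketched as follows. The features, target mapping, layer sizes, and learning rate are synthetic stand-ins used purely to exercise the network, not the paper's trained model or measured data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in features: [traffic volume, mean speed, road gradient,
# building density], scaled to [0, 1]; the target is an arbitrary smooth
# mapping, NOT real noise measurements.
X = rng.random((512, 4))
y = (0.6 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] + 0.1 * X[:, 3]).reshape(-1, 1)

# One hidden layer of tanh units with a linear output: a minimal MFFN.
W1 = rng.standard_normal((4, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.1

for epoch in range(3000):
    # Forward pass.
    hidden = np.tanh(X @ W1 + b1)
    pred = hidden @ W2 + b2
    err = pred - y
    # Backward pass: gradients of 0.5 * mean squared error.
    gW2 = hidden.T @ err / len(X)
    gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - hidden ** 2)   # backprop through tanh
    gW1 = X.T @ gh / len(X)
    gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```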

Methodology
The A-weighted equivalent continuous sound level is the constant sound level that, over a given period of time, contains the same acoustic energy as the actual time-varying sound level [50][51][52][53][54]. It is defined by $L_{Aeq} = 10 \log_{10}\left(\frac{1}{t_2 - t_1}\int_{t_1}^{t_2} \frac{p^2(t)}{p_0^2}\,dt\right)$, where p(t) is the A-weighted instantaneous sound pressure, $p_0$ is the reference sound pressure, equal to 2 × 10⁻⁵ N/m², and $t_1$, $t_2$ delimit the time period [50].
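The definition can be applied directly to sampled data, as sketched below; the sketch assumes the input samples are already A-weighted pressure values in pascals (the A-weighting filter itself is not shown).

```python
import numpy as np

P0 = 2e-5  # reference sound pressure: 20 micropascals

def l_aeq(p, fs, t1=0.0, t2=None):
    """Equivalent continuous sound level over [t1, t2]:
    L_Aeq = 10 log10( (1/(t2-t1)) * integral of p^2(t)/p0^2 dt ).
    `p` must already be A-weighted instantaneous pressure in Pa."""
    if t2 is None:
        t2 = len(p) / fs
    seg = np.asarray(p[int(t1 * fs):int(t2 * fs)], float)
    # The time integral divided by the duration reduces to the mean square.
    return 10.0 * np.log10(np.mean(seg ** 2) / P0 ** 2)

# Sanity check: a tone with 1 Pa RMS corresponds to ~94 dB.
fs = 48000
t = np.arange(fs) / fs
tone = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)   # RMS = 1 Pa
level = l_aeq(tone, fs)
```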
The hourly traffic volume is defined as the total number of cars that pass a road section (e.g., an intersection) in a specified interval of time. In addition, the total number of cars can be divided into categories (vehicle classes). The average speed of the cars is set at 50 km/h. The classification of vehicles and the calculation of average travel speeds were carried out by analyzing the video streams obtained from the video camera mounted on the drone, using a convolutional neural network (CNN).
The road gradient is determined by a relationship whose parameters a, b, and L are obtained via topographic measurements. The building density was calculated from the visibility angles $\theta_i$ of the facades of the buildings in the monitored area and the maximum visibility angle (opening) $\theta_t$ of the video camera for the monitored area.

The tests were carried out in five residential areas of Bucharest, Romania, using a UAV on which was mounted an array of microphones, and the analysis was carried out at heights between 80 m and 110 m, for 15-20 min. Figure 21 shows the result of testing the system using a sound source at a height of 100 m. Terms θ and φ represent the reception angles of the directional energy vectors for sound capture in relation to the horizontal and vertical movement of the drone on which the microphone array is mounted.

Collected Data Using the Artificial Neural Network
To carry out the data processing, a neural architecture with two analysis modules was used. The first module is a supervised convolutional neural network (CNN), and the second module is an MFFN architecture for determining the traffic noise level. The structure is presented in Figure 22.

Considering the large amount of data associated with video information, a fusion system is proposed that uses deep neural networks to process the prediction outputs associated with multiple end-to-end systems, combining all these outputs to create the set of predictions. Given the intrinsic ability of dense layers to detect patterns in the input data and to predict the analyzed samples with high accuracy, we theorize that, by using a set of dense layers, we can exploit the correlation between classifier biases, allowing different combinations of classifiers to strengthen or weaken the discriminative power of the pool of classifiers used in aggregation, based on the patterns learned by the dense network. In addition, because dense layers do not need special assumptions about the domain of the input data, we theorize that their use is also useful for creating a domain-independent aggregation system.
The network architecture (Figure 23) is determined by constructing a search system designed to vary a set of dense network parameters, such as the optimal layer width. First, there is no intrinsic spatial information associated with such an aggregate input ($y_i$); therefore, some data decoration techniques must be developed to generate spatial correlations that can then be exploited by convolutional layers. As such, we create these correlations and relationships through a process we call the input decorator.
We theorize that, by creating the decorated input vector for sample-level convolutional processing and applying convolutional filters to this new input, we can create a system where classifiers with similar discriminative powers are arranged in close spatial proximity, supporting or canceling group decisions based on their spatial relationships. Therefore, two problems need to be solved to introduce convolutional layers in aggregation networks: (i) finding a criterion for detecting similarity between classifiers and (ii) creating a similarity-based spatial organization system. Let $p_j = [y_{1j}, y_{2j}, \ldots, y_{Mj}]$ be the prediction vector representing the outputs of classifier j for all M samples; the similarity between two classifiers m and n can then be calculated as $r_{mn} = M(p_m, p_n)$. Finally, by ordering the vector of similarity scores between a classifier m and all other classifiers, we can create a list of the most similar classifiers for each of the N classifiers. The second problem involves using the similarity values calculated in the previous step to decorate the input, using the given values of r. With the introduction of convolutional layers in the aggregation network, a method was created that can process the similarities between classifiers. However, convolutional networks were proposed for image processing using the same filters for the entire image and would, therefore, share, in the case of ensemble systems, the same weights between different centroids. Although this is a step forward in classifier correlation processing, we theorize that this correlation between classifiers is different for each individual classifier, and, thus, weights should not be shared between centroids. With this assumption in mind, we propose to create a new type of DNN layer, which we call the "Cross-Space-Fusion", or CSF, layer.
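The similarity-ordering step can be sketched as follows, with Pearson correlation standing in for the similarity function M (an assumption; the text leaves M abstract).

```python
import numpy as np

def similarity_order(preds):
    """For each classifier, rank all other classifiers by the similarity of
    their prediction vectors, using Pearson correlation as the similarity
    measure M (an illustrative choice).

    preds: (M_samples, N_classifiers) matrix of outputs y_ij.
    Returns (order, scores): order[j] lists the other classifiers from most
    to least similar to classifier j; scores[j] holds the matching r values.
    """
    r = np.corrcoef(preds.T)              # (N, N) similarity matrix r_mn
    n = r.shape[0]
    order = np.zeros((n, n - 1), dtype=int)
    scores = np.zeros((n, n - 1))
    for j in range(n):
        others = [k for k in range(n) if k != j]
        # Most similar classifiers first, supporting the spatial arrangement.
        order[j] = sorted(others, key=lambda k: r[j, k], reverse=True)
        scores[j] = r[j, order[j]]
    return order, scores

# Toy check: classifiers 0 and 1 nearly agree; classifier 2 is anti-correlated.
rng = np.random.default_rng(3)
base = rng.random(100)
preds = np.column_stack([base, base + 0.01 * rng.random(100), 1.0 - base])
order, scores = similarity_order(preds)
```

The resulting per-classifier ordering is what allows similar classifiers to be placed in close spatial proximity before convolutional processing.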
The implementation of the CSF layer is based on creating a new input decoration method as a new layer in the aggregation architecture.
Therefore, we propose a new input decoration method that creates an additional third dimension, which separately stores the outputs of similar classifiers and their similarity scores. Thus, the CSF layer processes these details in the third dimension of the matrix, processing the outputs of the classifiers and the corresponding similarity scores together, using the same function M for computing the similarity scores. We also note that regular convolutional filters may not be optimal for learning these correlations, as they may differ from centroid to centroid. Thus, the CSF layer is designed with a larger number of parameters, and, although this adds depth to the neural network, the number of added parameters is still small, especially compared to the depth and width of the dense architecture. Considering these particularities, the operations performed by the new CSF layer involve $c_i$, the matrix of predicted outputs of the eight most similar classifiers for a classifier i, and $R_i$, the respective similarity scores, calculated using the function M. These two matrices create the third dimension of the decorated input. After the decoration stage, the input is fed into the CSF layer. For each group $(c_i, r_i)$ of centroids, the network needs to create and learn a set of weights that combine the prediction of the original classifier with the prediction results and clustered similarity scores in the centroids. Thus, the CSF layer contains a set of parameters α and β to be learned, where the parameters α control the prediction output of each classifier i and the parameters β control the prediction outputs and similarity scores of the classifiers similar to i. The output of the CSF layer is then passed through a subsampling layer using the averaging function, yielding a new set of input data of the same size as the original.
Two implementations were tested, one considering the four most similar classifiers and one considering the eight most similar classifiers.
To establish the performance values, it is necessary to calculate the confusion matrix. The confusion matrix has the real (true) labels along one dimension and the predicted labels along the other, with each class occupying one row and one column. The diagonal elements of the matrix represent the correctly classified results.
The values calculated from the confusion matrix are Precision, Recall, and the F1-score. Precision is defined as the fraction of the samples predicted to contain the drone that actually contain it. Recall is defined as the fraction of the samples that actually contain the drone that are correctly detected.
The F-measure (F1-score) is defined as the harmonic mean of precision and recall. F1-scores are calculated for each class, followed by a weighted average of the scores, where the weights are generated from the number of samples corresponding to each class.
The network is trained for 200 epochs with a batch size of 16. This experiment resulted in a classification accuracy of 91%, a Recall of 0.96, and a micro-average F1-score of 0.91. The confusion matrix is shown in Table 1 and the classification report is shown in Table 2.
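For reference, these metrics follow directly from the confusion matrix; the sketch below uses a hypothetical 2-class matrix, not the values of Table 1.

```python
import numpy as np

def classification_report(cm):
    """Per-class precision, recall, and F1 from a confusion matrix whose rows
    are true labels and whose columns are predicted labels, plus the
    support-weighted average F1 described above."""
    cm = np.asarray(cm, float)
    tp = np.diag(cm)                      # correctly classified samples
    precision = tp / cm.sum(axis=0)       # correct / all predicted as the class
    recall = tp / cm.sum(axis=1)          # correct / all true members of the class
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)              # number of true samples per class
    weighted_f1 = float(np.sum(f1 * support) / support.sum())
    return precision, recall, f1, weighted_f1

# Hypothetical 2-class confusion matrix (illustrative numbers only).
cm = [[45, 5],
      [3, 47]]
precision, recall, f1, weighted_f1 = classification_report(cm)
```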

Assessment of Noise Pollution in the Study Area
The evaluation of the noise levels at the measurement points, five residential areas in Bucharest, demonstrates that the maximum permissible noise level for commercial-residential areas legislated in Romania, 60 dB(A), is exceeded: the samples obtained showed sound levels exceeding 75 dB(A), as shown in Figure 24 and Table 3.

Analyzing the L_Aeq results obtained for the five residential areas in Bucharest, Romania, presented in Figure 24, it turns out that the permitted noise level is exceeded compared to the legislated level of 60 dB(A).
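For reference, L_Aeq is the energy average of the short-term A-weighted levels, L_Aeq = 10 · log10((1/N) · Σ 10^(L_i/10)). A minimal sketch of this standard computation (the function name is an illustrative assumption):

```python
import math

def l_aeq(levels_db):
    """Equivalent continuous sound level L_Aeq:
    the energy average of A-weighted short-term levels L_i, in dB(A)."""
    energies = [10 ** (level / 10) for level in levels_db]  # dB -> energy
    return 10 * math.log10(sum(energies) / len(energies))   # energy -> dB
```

Because the average is taken over energies rather than decibel values, the loudest samples dominate the result: samples of 60 and 70 dB(A) average to about 67.4 dB(A), not 65. This is why occasional heavy traffic can push the measured L_Aeq well above the 60 dB(A) limit.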
Noise pollution has increased to an alarming degree in all areas of the city (residential, commercial, industrial, and quiet) due to rapid urbanization, industrialization, and the growing connectivity of transport systems across the city. Identifying noise hotspots where immediate remedial action is required has always been a challenge. By providing noise propagation in the X and Z directions, 2D noise mapping is an essential method for identifying regions where noise levels may reach a dangerous level. Thus, the comparative examination of all residential areas within a city provides a clear picture of noise exposure. The measurement of noise on the ground is carried out with a sound level meter. In the horizontal direction, the orientation and distance of buildings from the road or flight path, as well as the shape of the buildings, are the main factors influencing the noise effect. The 2D maps produced from the analyses carried out at ground level and at a height of 30 m showed that the surrounding structures (high buildings, brick walls, fences with grills, trees, etc.) significantly influence noise propagation in all directions. That is why the next step was to make 3D noise maps from a height of 100 m (e.g., road traffic and aircraft noise propagate from different directions and produce different effects on the ground floor and top floor of buildings). In particular, the buildings with the highest noise level are not necessarily close to the road or railway, and the noisiest apartments are not necessarily located close to the ground or the top of the building. The composition of the noise and the existence of multiple 3D noise sources determine the presence or absence of high noise. The high noise level in a residential area may be due to heavy traffic volume and roads surrounded by tall buildings.
3D maps can also be used by decision makers in the process of formulating noise control strategies or implementing corrective measures.
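One way to turn the location- and altitude-labeled samples into such a 3D map is to bin them into voxels and energy-average the levels in each cell. The sketch below is an illustrative assumption (the grid resolution, tuple layout, and function name are ours, not the authors' pipeline):

```python
import math
from collections import defaultdict

def build_noise_map(samples, cell=10.0):
    """Aggregate (x, y, z, level_dB) samples into a 3D voxel grid,
    energy-averaging the levels that fall into each cell.

    samples : iterable of (x, y, z, level) tuples, coordinates in metres
    cell    : voxel edge length in metres
    Returns {(ix, iy, iz): L_Aeq of that voxel in dB}.
    """
    bins = defaultdict(list)
    for x, y, z, level in samples:
        key = (int(x // cell), int(y // cell), int(z // cell))
        bins[key].append(10 ** (level / 10))  # convert dB to energy
    return {k: 10 * math.log10(sum(v) / len(v)) for k, v in bins.items()}
```

A coarser cell size smooths the map for city-wide overviews; a finer one exposes local hotspots around individual buildings, which is what the decision makers mentioned above would inspect.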

Discussion
The novelties presented by the proposed solution are:
- Carrying out the analysis of traffic noise pollution with an array of microphones mounted on a UAV.
- Use of the best algorithms reported in the field (energy-based, neural networks) for analyzing the solution.
- Use of a combined neural network structure: a CNN module for detecting the number of vehicles in traffic, calculating the average travel speed, and identifying vehicles (cars, trucks, and motorcycles), and a second module, an MFFN (multilayer feed-forward network), for prediction.
- Proposal of a new type of CNN layer, named "Cross-Space-Fusion", or CSF layer. The implementation of the CSF layer is based on a new input decoration method, added as a new layer in the aggregation architecture.
- Proposal of a new input decoration method for the CSF network, which creates an additional third dimension that separately stores the outputs of similar classifiers and their similarity scores. Thus, the CSF layer processes these details in the third dimension of the matrix, handling the classifier outputs and the corresponding similarity scores together.
To determine the level of noise pollution, the specialized literature only mentions measurements at ground level, whereas the solution proposed in this article can determine the sound level at height (100 m) and can also separate the individual sound levels of vehicles in traffic. In the capital city of Romania, decibel limits are exceeded in many locations every day. Unfortunately, the action plan for reducing noise is advancing slowly, or not at all, while the residents of the capital are increasingly affected by noise pollution. The level of noise and vibration generated by the movement of the local transport company's vehicles also has an impact; among the causes of the phenomenon are worn carriageways and the operation of carriageways whose constructive solution does not correspond to the urban characteristics of the area (e.g., narrow streets with building fronts located near the roadway). There is a need for an urban noise monitoring plan and adequate solutions for decreasing the noise impact. The greatest impact is felt primarily by children and the elderly.
We believe that a solution to making noise reduction measures more efficient, while optimizing costs and efforts, could be the development of combined strategies to reduce traffic noise and air pollution.

Conclusions
Noise monitoring and prediction can raise various complex problems determined by non-linearity and by the very large volume of collected data. The present research has been carried out in two phases: (i) collecting relevant traffic information with traditional instruments, at ground level and at an altitude of 30 m, and analyzing the data to determine patterns in noise behavior, and (ii) designing and testing a UAV-borne array of microphones and collecting similar data to determine the feasibility of the solution. The solution also builds on the authors' previous research experience [55]. The proposed solution is intended to solve many of these problems through the use of neural network modules for the analysis of large volumes of data for noise prediction. Moreover, neural networks can be implemented in FPGA and VLSI, making them superior to statistical and deterministic models. As future improvements and developments, we would like to enhance the filtering methods to suppress non-informative noise coming from the UAV's own rotors [56][57][58] and from lateral noise sources, by modifying the sensitivity pattern in the horizontal plane according to the altitude of the UAV. Another concern is choosing the correct UAV type to keep the environmental impact in sensitive urban areas (hospitals, residential areas, etc.) as low as possible [59]. In addition, a second direction of improvement that we envisage is to increase the directivity of the microphone array, based on recent research [60][61][62][63][64][65][66], and to introduce the possibility of configuring it remotely, according to local geometrical and environmental conditions.