Evaluation of Acoustic Noise Level and Impulsiveness Inside Vehicles in Different Traffic Conditions

Recently, the issue of sound quality inside vehicles has attracted interest from both researchers and industry alike due to health concerns and also to increase the appeal of vehicles to consumers. This work extends the analysis of interior acoustic noise inside a vehicle under several conditions by comparing measured power levels and two different models for acoustic noise, namely the Gaussian and the alpha-stable distributions. Noise samples were collected in a scenario with real traffic patterns using a measurement setup composed of a Raspberry Pi Board and a microphone strategically positioned. The analysis of the acquired data shows that the observed noise levels are higher when traffic conditions are good. Additionally, the interior noise presented considerable impulsiveness, which tends to be more severe when traffic is slower. Finally, our results suggest that noise sources related to the vehicle itself and its movement are the most relevant ones in the composition of the interior acoustic noise.


Introduction
Acoustic noise has received much attention in the automotive industry due to the increasing demand for in-vehicle voice assistant systems [1]. Noise evaluation is an essential issue in this field, enabling the design of in-vehicle multimedia systems with better noise control and fewer disturbances that degrade acoustic communications performance in a vehicle interior. Such disturbances may have different sources, and identifying them in order to focus on the most dominant sources will result in more efficient noise controls and optimized systems for audio applications. The statistical characteristics of the noise are crucial to define and configure the active noise control techniques [2]. Therefore, this work presents an experimental evaluation of the characteristics of the acoustic noise inside a vehicle under the perspective of an in-vehicle voice reception system. We also capture sources related to the traffic that might impact a car's interior environment, providing insights concerning the acoustic noise and its source in the vehicle interior.
Our previous analysis [3] showed how some factors such as traffic are correlated to the average noise power inside the vehicle. For that evaluation, a total of 194 noise samples were collected using a measurement setup installed in a C4 Lounge from Citroën. In this paper, we extended our previous work on the subject in the following ways: • Acquiring an additional 254 noise samples, which were collected in a different vehicle and a different time of the year from our previous work [3]; • Validating our setup using a sound pressure level meter to verify the power levels measured; • Improving the statistical evaluation of the selected variables, examining which of them have more influence in the noise power levels inside, comparing the results between both measurement sets, and providing a more in-depth analysis of the effect of the car's windows; • Evaluating the degree of impulsiveness of the interior noise and comparing the AWGN and alpha-stable noise models; • Analyzing the window size for the estimation of alpha-stable distribution parameters.
Finally, we highlight that the complete collection of measurements, including information about the conditions and location, is freely available [4] and can be helpful for different purposes.
This paper is organized as follows. Section 2 presents a brief overview of the literature on vehicle interior noise. The measurement campaigns and setup are described in Section 3. Section 4 presents the statistical methods used to analyze the collected data, and Section 5 discusses the results. Finally, in Section 6, we present our final remarks.

Related Works
The topic of acoustic noise or sound quality in vehicles is a multidisciplinary subject that is related not only to the health and comfort of drivers and passengers but also to the appeal of vehicles as a product. Thus, since the beginning of the car industry, many studies have been developed exploring different aspects of sound quality. Many of these studies focus on the effects of traffic noise on human health. There is research on the impact of noise on sleep and mental health [5][6][7], on the development of cognitive processes in children [8], and on the increase of risk of heart diseases [9,10] and diabetes [11]. Recent studies present contributions to identifying and optimizing vehicle interior noise [12][13][14][15][16][17], approaching different noise sources. The sound quality of the vehicle cabin is also essential for various in-vehicle applications, such as multimedia [18][19][20], security [21][22][23], assistive [24][25][26], and autonomous vehicles [27,28]. Furthermore, several studies in psychoacoustics seek to establish objective metrics to assess the subjective sound quality perceived by vehicle passengers [29,30], which is an essential factor for consumer satisfaction and the marketability of a vehicle [31].
Given the importance of the subject, several studies were developed to characterize internal noise in vehicles. As the noise perceived inside the cabin is a composition of noise sources of different natures, such as wind, engine, and rolling, most works focus on describing the contribution of specific components. Knowledge of the most relevant sources can indicate the main challenges in acoustic systems and the best way to represent them mathematically. Therefore, its characterization is essential for anyone interested in vehicular acoustic systems.
According to the literature [32,33], the different contributions to in-vehicle noise can be classified according to their source. Noise can originate from the structural vibrations of the car and its components or from aerodynamic excitations [34] transmitted by the cabin of the car. For instance, the noise created by the tires/road interaction is usually separated into two components [33]: structural low-frequency noise (below 500 Hz) [35,36], and aerial noise, with medium and high-frequency contributions (above 500 Hz). Understanding the characteristics of this particular noise source is essential for the development of low-noise roads [37][38][39]. Other works investigate the sound quality of specific vehicle phenomena and components, such as closing car doors [40,41], engine noise [32,42], Heating, Ventilation, Air Conditioning (HVAC) systems [43], seat belts [44], and wind [45,46]. Some studies use linear regression modeling [47] to establish correlations between objective psychoacoustics metrics, such as pitch, roughness, volume, and others [48], and subjective sound quality metrics, which are often obtained from jury reviews [48]. With a similar aim, some works use machine learning techniques, such as clustering and neural networks, to model the contribution of one or multiple sources [49][50][51] or even the human auditory system [52]. However, these models usually use psychoacoustics metrics, focusing on predicting the assessment of the subjective perception of the sound quality of a passenger or driver. Many works contribute to noise prediction using artificial intelligence, providing a model for the traffic noise [53][54][55] and sound quality prediction [56][57][58] contexts.
Although there are a wide variety of works on the evaluation of interior vehicle noise, most of them focus on studying one or a few noise sources at a time [32]. The works on modeling and prediction are usually elaborated from the perspective of psychoacoustics, whose metrics may not be relevant for a voice processing system in a vehicular multimedia center. In addition, these studies are often conducted in laboratory or highly controlled environments [31,50,59]. Thus, relevant sources of acoustic noise present in a real driving scenario in an urban environment, and the composition of their effects are potentially disregarded.
Additionally, in our literature survey, studies that seek to consider the composition of multiple sources for in-vehicle noise with probability distribution modeling were not found. One of the most popular noise models in communication systems is the Additive White Gaussian Noise (AWGN) model [60], based on the Gaussian probability distribution. Although the AWGN model is of great importance, it is not always adequate [61]. Impulsive phenomena, in which the noise changes suddenly to a value far from the mean in a short period of time, can affect the performance of signal processing solutions based on traditional Gaussian modeling [62][63][64]. In the context of this work, impulsive noise is relevant to source location [64][65][66][67][68], voice processing [69,70], and noise comfort and pollution [71,72]. In several of the works cited above, impulsiveness is modeled by an alpha-stable distribution. Alpha-stable distributions are widely used to represent a range of phenomena for which non-Gaussian behavior is expected. The flexibility of its parameters, which allow for changes in the symmetry, dispersion, and tail mass of the distribution, as well as the Generalized Central Limit Theorem and empirical evidence [73] justify the use of stable models in applications such as econometrics, computing, meteorology, medicine, and image processing, among others [73,74].

Measurement Campaigns
The evaluated interior noise data were obtained in two separate measurement campaigns. Both campaigns took place in Natal, Brazil. Located in northeastern Brazil, the city has an area of 167 km 2 and a typical tropical climate with warm temperatures and high humidity throughout the year. The first campaign was carried out in June and July of 2019. The samples collected and the result of their analysis were presented in our previous work [3]. To extend the results of the previous work, we also conducted a second measurement campaign, which occurred between April and May of 2021. It is worth emphasizing that the second campaign was carried out during the SARS-CoV-2 pandemic. Due to social distancing measures such as the closing of schools, restaurants, and public spaces, the traffic patterns in the city were altered. In particular, during the weeks with high-level restrictions, traffic was less heavy than expected in some regions of the city.
Even though both campaigns use the same measurement setup and have the same objectives, differences between the results for each are expected. Factors such as the model of the cars, the driver, the months of the year, and different traffic patterns, among others, can influence the interior noise and the data collection process. In this work, we aim to evaluate if the results for both campaigns are compatible and exhibit the same trend, especially considering the higher amount of measurement points in the second campaign.
The sampling points were located in different streets and avenues, aiming to spread uncontrolled conditions such as crowd noise. Figures 1 and 2 show the sampling locations. The colors in the marking represent the traffic conditions associated with each sample. These conditions are defined following Google Maps' traffic conditions policy [75]. There are four possible traffic conditions, according to the mean speed of the cars in the street. Table 1 lists the traffic categories and the speed intervals used by the map application to classify the traffic in each street. Moreover, the car used in the data acquisition was always at the speed interval for that specific measurement. The exact car speed value can be found in our dataset [4]. It is also expected that on average, the nearby cars are in the same speed interval. The colors indicate the traffic condition at the time of the measurement, as described in Table 1. In addition to the traffic condition, each of the sampling points has different characteristics. Hence, we measured the noise in many different areas of the city to represent the different noise sources of each environment. We aimed to measure each traffic category in different streets and times of the day. For example, for the Green condition, we collected samples in the federal highway BR-101, which always presents fast but intense traffic flow with multiple lanes and also measured in the coastal highway, which comparatively has fewer cars and lanes but presents sources such as the wind and the ocean. In addition, the measurements were done at different times of the day for each location in order to represent the variations in traffic patterns throughout the day.
All measurements were obtained in asphalt with smooth road surface conditions with no potholes or unevenness. The HVAC systems were turned off. The participants were quiet during measurement, and all objects that could create noise during the car's movement were removed. Furthermore, to avoid bias in the impulsiveness analysis, we removed from the dataset some of the samples that could represent outliers. We checked each measured audio sample for highly impulsive events that are not part of the observed variables or that could not be adequately represented by the amount of collected samples: While one can argue that many of the events listed above are common in a typical traffic scenario, it should be noted that the main objective of this work is to analyze the contribution of the controlled variables to the noise level inside a car and how these variables affect the impulsiveness of this noise. For example, a car horn is an event that will be impulsive and contribute significantly to the noise observed inside the car regardless of the traffic conditions.  Table 1.

Controlled and Uncontrolled Variables
In addition to the traffic condition explained above, two other variables were controlled during the campaigns, as presented in Table 2. They are the position of the car windows and the maximum speed of the car at the moment during a measurement. These variables, along with traffic conditions, were chosen based on our literature review and due to being easily controllable. Care was taken to obtain noise samples for all combinations of these variables. The car speed was always compatible with the speed interval of the traffic conditions described in Table 1. Table 2. Description of the controlled environment variables.

Variable Possible Values Notes
Windows positions Open; Closed All four windows on the same position. Traffic Black; Red; Orange; Green Speed interval (see Table 1). Speed 0-80 km/h Maximum value during measurement interval.
The three variables in Table 2, along with time and location, are the variables we could control during our experiment. However, each measurement is affected by far more variables. The model of the car, the type of road, the weather, the driver, the number of people on the streets, and many other factors can affect the noise characteristic inside the vehicle. Some of these variables have fixed values (such as the car model or the absence of rain). For the other uncontrolled variables, care was taken to represent their effects in the sample data. For instance, we drove through many different streets and avenues to account for variations in the type of asphalt between roads. We expect that the selected controlled variables will significantly affect the interior noise levels and impulsiveness [32,34]. However, it is worth bearing in mind that other factors not accounted for in this experiment may also influence the noise.

Measurement Setup
The measurement setup used is an adaption of the one presented in [76], which was composed of an Analog-to-Digital Converter (ADC) AC108 embedded in an expansion board for Raspberry Pi, called ReSpeaker Core v1 (MT7688) board ( Figure 3 [77]). The instrument's specifications are described in Table 3. The recordings were stored using a Raspberry Pi 3 (Model B), which also controlled the setup. Each microphone records a five seconds-long audio sample, although only the samples from the first channel were used. The car selected for the first campaign was a C4 Lounge (Figure 4), and the one used in the second campaign was a C3 ( Figure 5), which are both from Citroën and with automatic transmission. The boards were positioned above the cars' panels, as pictured in Figure 6. This position was chosen to mimic that of microphones in vehicle multimedia systems.

Statistical Methods
Our goal is to understand how the selected variables affect the characteristics of the acoustic noise inside the vehicle, namely the noise levels and the impulsiveness. To evaluate the first, the average power of each noise sample was computed, and statistical analysis was performed to assess the most relevant variables. To evaluate impulsiveness, we fit the noise data to the AWGN and alpha-stable models and compare their performance and how each variable affects the distribution of the models' parameters, as described in Figure 7.

Average Power
The average power of the measurements is computed for each individual acquisition as follows: where N is the length of sampling, and x(n) is the voltage signal from the microphone. Usually, the power of acoustic signals and noise is measured using specific tools, such as a Sound Pressure Level (SPL) meter. We measured some sampling points with our setup and an MSL-1352C (Minipa) SPL meter to verify the measured power levels, whose specifications are described in Table 4. The meter was set to use A-weighting, slow response (1 s) and a range of 30 to 130 dB, as instructed by the meter's manual for measurement of the SPL of an oscillating noise. The SPL was positioned near the ReSpeaker setup, in the car's panel, and acquired data simultaneously as the microphone in the ReSpeaker. Unlike the ReSpeaker acquisition, the SPL measurements were manual. Figure 8 compares the power levels measured by the instruments. Despite the instrumentation errors present in this scheme, such as the positioning of the instruments and the reading of the SPL meter by a human, we can observe a linear relationship between the values measured by each instrument.

Regression Analysis
Each noise sample has four features: average noise power, windows positions, traffic conditions, and speed. Some of them are numeric in nature (power and speed), while the others are categorical. The Window variable was encoded with 0 s and 1 s due to being a binary variable. The Traffic variable is encoded in descending order of severeness, where Green corresponds to 3, and Black corresponds to 0.
We employ tools from descriptive and inferential statistics to analyze how the average power is related to the other variables. Boxplots are used to compare the power levels between different categories and histograms and density curves to visualize how it is distributed.
Next, we create linear regression models for the continuous and Traffic variables. The objective of the models is to highlight the relations between the selected variables and the power levels measured inside the car. The models obtained cannot be considered noise models for the acoustic noise in this scenario, as there are not enough samples for this type of characterization nor are sufficient measurements conditions being considered (such as multiple models of vehicle, for instance). Nevertheless, the use of linear regression models allows us to visualize the relations between the variables. Even if this relation is only approximately linear, the models can identify and quantify the effect of input on the output of a system. Consider a set of N pair of observations (x i , y i ) = (x 1 , y 1 ), (x 2 , y 2 ), ..., (x N , y N ); the simple linear regression model (with a single independent variable) for this set of observations is given by: with y as the dependent or response variable, and x as the independent, explanatory, or regression variable, and β 0 and β 1 as the regression or model coefficients [78]. The model coefficients are estimated using the Ordinary Least Squares (OLS) method [78]. Three Goodness of Fit (GoF) metrics [79] are used to compare the results of the models: Mean Squared Error (MSE), coefficient of determination (R 2 ), and F-statistic. We use the logistic regression model [79] for categorical data and the McFadden Pseudo-R 2 [80] coefficient as a GoF metric to compare the models.

Impulsive Noise and Alpha-Stable Model
One of the most ubiquitous noise models in communications systems is the AWGN model, which is based on the Gaussian distribution. The use of the Gaussian distribution is motivated by the Central Limit Theorem, which states that the distribution of the sample mean of N independent and identically distributed (i.d.d.) random variables with finite variance converges to a Gaussian distribution as N → ∞ [79]. Thus, the distribution is suited for modeling the cumulative effect of many independent noise sources.
Despite its importance, the Gaussian model is not always the best choice to represent the noise in a communication channel [61]. Impulsive phenomena, when the noise varies subtly and greatly from the mean in a short period, can jeopardize the performance of solutions and strategies based on the traditional Gaussian approach [62][63][64]. Impulsive noise is present in several scenarios of communication systems, such as powerline communications [81], OFDM in wireless networks [82], and sensor networks [83]. Unlike the Gaussian, the alpha-stable distribution can have infinite variance, so it better represents data with heavy tails [61,84].
The family of stable distributions, also known as Lévy's alpha-stable distribution, comprises a class of distributions that satisfy the stability property [73]: a random variable X is said to be stable if for two independent instances X 1 and X 2 of X and for any positive constants a and b, the variable aX 1 + bX 2 has the same distribution that the variable cX + d, for c > 0 and d ∈ R.
In other words, a linear combination of i.d.d. stable variables will have the same distribution, except possibly for the location and scale parameters. Another essential property of the stable distributions is that they generalize the Central Limit Theorem. Relaxing the constraint of finite variance, the limit of the sum of i.d.d. random variables tends to a stable distribution. The Gaussian distribution and the traditional Central Limit Theorem are special cases when the variances of the random variables are finite [73].
The alpha-stable distribution has several different parametrizations. As found in recent literature [73], the most common form is to describe the distribution by its characteristic function φ(t): and The four parameters in the alpha-stable distribution are as follows: • α, the characteristic exponent, satisfying 0 < α ≤ 2. It is the main shape parameter of the distribution, describing the tails of the distribution. Smaller values of α indicate a heavier tail, meaning a higher probability of extreme events. Conversely, values approaching 2 indicate a behavior closer to that of a Gaussian distribution. When α = 2, it is equivalent to a Gaussian distribution; • β, the skewness parameter, is limited to β ∈ [−1, 1]. It controls the skewness of the distribution. For β = 0, the distribution is symmetric. If β > 0, then the distribution is right-skewed. If β < 0, then the distribution is left-skewed; • γ, the scale parameter, which is always a positive number (γ > 0). This parameter behaves similarly to the variance in the Gaussian distribution. It determines the dispersion around the location parameter. It should be noted that the variance of an alpha-stable variable is only defined for α = 2; • δ, the location parameter, which shifts the distribution to the left or to the right by an amount δ ∈ R.
Lastly, we highlight that the distributions with β = 0 and δ = 0 form a particular family of symmetric stable distributions known as Symmetric α-Stable (SαS). These distributions share many characteristics with the Gaussian distribution. Both are continuous, unimodal, and bell-shaped distributions. The main difference is in the decay of the tails: the Gaussian curve has an exponential decay, while the SαS has an algebraic one [84]. These properties make the SαS model a common choice to model problems in signal processing where the distribution is similar to the Gaussian but with heavier tails [63,84,85].
To evaluate the degree of impulsiveness in the interior vehicle noise, as well as to compare the performance of the two models, we estimate the parameters of a Gaussian and a stable distribution fitted to all the collected samples. The fitting of the noise samples to the models is obtained with Maximum Likelihood Estimation (MLE) [86].
The application of MLE for the Gaussian case is straightforward. In the case of the stable distributions, for which no closed expression for the probability density function exists, the MLE must be found with numerical methods and optimizations routines [73]. In this work, we computed the MLE using MATLAB, which bases its implementation on the works of John P. Nolan [87,88]. To obtain a starting point to the optimization routine, MATLAB uses the method described in [89]. In this approach, the four parameters are derived in terms of five quantiles of the data. Although the accuracy of this method is inferior, its low computational cost makes it convenient to provide a starting point for other estimation techniques. Table 5 shows the number of samples collected in each measurement campaign as well as the encoding used for the variables Traffic and Window. There is a balanced amount of samples for the two window position situations in both campaigns and in Campaign 1 for the traffic categories. In the case of Campaign 2, there is a higher amount of Black category samples. The change in traffic patterns imposed by the COVID-19 pandemic sanitary measures made it difficult to obtain samples in this category, as traffic became lighter than usual for some of the roads where traffic jams usually happen. However, it was possible to obtain an equivalent number of samples for the Black category to that of Campaign 1.

Results and Discussions
In this section, we analyze the results using the methodology described in Section 4 and illustrated in Figure 7. The analyses were performed using acoustic signals measured with setup and constraints described in Section 3.3 for each condition described in Table 2. Finally, the evaluation metrics used are described in Sections 4.2 and 4.3.  Figure 9 shows the average power distributions of the collected noise samples for both campaigns. The negative density in the second histogram is merely a consequence of its mirroring for illustrative purposes. The range and distribution of values are similar for both campaigns, and a visual inspection indicates that most samples are concentrated in the center of the range.

Noise Power Level Analysis
The histograms show that the campaigns have measurements with compatible power values. In the following subsections, an analysis of the average power in relation to the other three variables of the study (traffic, window position, and speed) is carried out individually, which is followed by an analysis of the three variables together. Figure 9. Comparison of the distribution of the power levels measured in both campaigns. The histograms were normalized, the total area of each being unitary. The negative density in the second histogram is merely a consequence of its mirroring for illustrative purposes. Figure 10 presents the box diagrams of the power levels grouped by traffic conditions for Campaigns 1 and 2, respectively. For both cases, the boxes are ordered from Black to Green in ascending order of power, indicating that noise levels inside the car tend to increase as traffic becomes less severe. The main difference between the results of the campaigns is the greater variation in power in the first three traffic categories for the second campaign, which is illustrated in its taller boxes and lines. For example, there is a greater intersection between the power levels for the Yellow and Green categories in the second campaign, with samples from the Yellow category reaching higher power levels. Despite these differences, both results indicate the same trend toward higher noise levels associated with more fluid traffic conditions. These results suggest that there is some significant correlation between the two variables. To assess this relation, a linear regression model was built for each campaign in the form:

Traffic Analysis
where the intercept variable is a 0 and the coefficient of the explanatory variable (power level) is a 1 . We chose to fit a linear model due to the ordered nature of the traffic data and the trend implied in Figure 10.
The models obtained are shown in Figure 11. The circles represent the actual traffic condition associated with each sample, while the diamonds represent the traffic predicted by the model. Both models show that higher power levels imply a less severe traffic condition, which is in accordance with the behavior shown in the box diagrams. The range of predictions for each traffic category is centered around the correct value for the Traffic variable, although some variation causes overlap between the categories. For instance, this can be seen in the red diamonds centered around Traffic = 1.
A comparison between the two models in Figure 11 highlights the greater variability of the data in Campaign 2, which can also be seen in the box diagrams. This is also reflected in the GoF metrics listed in Table 6. The first model has a higher value for R 2 and for the F-Statistic and a lower value for MSE, confirming its better performance. In fact, the results indicate that 72% of the power variability is explained by the variation in traffic in data of the first campaign. This suggests a strong relationship between the variables and that a large part of the observed indoor noise power is associated with the traffic level. The high F value and its low p-value confirm the significance of this relation. Although the results of the second model are inferior due to the higher dispersion of power in each traffic condition, they too indicate a significant association between power and traffic, with 61% of the variation explained by the model. In both campaigns, the MSE has a low value. However, due to the categorical nature and scale of the traffic data, the MSE is not an adequate GoF metric for the Traffic models. Table 6. Regression coefficients (with 95% confidence interval) and GoF metrics for the power and traffic models.  Figure 12 presents the box diagrams for the power levels grouped by the position of the car windows for both campaigns. Comparing the campaigns, the power levels for the second are slightly higher than the first. Unlike the Traffic variable, the layout of the samples is visually very similar between the two categories. In both campaigns, power levels tend to be higher when the windows are open, which is expected, as there is more coupling of outside noise inside the car. However, there is a significant overlap in values between the two categories. For Campaign 1, only 8.42% of the measurements in the Open group have a power greater than the maximum power in the Closed group, while for Campaign 2, the percentage is 4.72%. This implies few distinctions in power values when the car windows are open or closed. This result goes against expectations, as the qualitative difference is significant when perceived by a passenger or when listening to the recordings of this experiment. However, this sensorily-perceived difference does not manifest itself in an expressive difference in the average power level received by a microphone located close to the vehicle's panel, which can be advantageous for voice command applications.

Window Analysis
To verify whether this observation has any bias in relation to the Traffic variable, Figures 13 and 14 present the box diagrams of the samples grouped by traffic and window position for both sets of measurements. Once more, the results for the two campaigns are in agreement. The most significant difference between the two windows positions occurs in the Black category when the vehicle is stopped in a traffic jam.
We speculate that in this case, the absence of movement of the car makes external noises predominate, and the position of the windows becomes more significant than in other scenarios. As it gains speed, the noise generated by the vehicle becomes more relevant, so that for Red or Yellow traffic, the difference between the power levels is small between both states of the windows. Finally, when the vehicle reaches a higher speed (category Green), there is again a noise trend of higher power values when the windows are open. It is speculated that at these speeds, the noise produced by the wind, thanks to the car's movement, has a more significant contribution.  Then, the overall effect observed is of slightly higher noise power when the windows are open. As there is little distinction between the power levels of the groups, it is expected that a model that takes into account only the average power of the samples will be unable to represent the data well. Given the categorical and binary nature of the variable in question, two logistic models are obtained in the following form: where b 0 and b 1 are the model coefficients. Table 7 presents the coefficients and GoF metrics, while Figure 15 shows the models obtained. Visually, it is clear that there is little differentiation between the categories. If the models were used for a classification task, the accuracy would be substandard. The Pseudo-R 2 values of both models are low and close to each other. These results suggest a weak influence of the window position on the measured internal power. It is important to emphasize that this result considers all traffic categories. Figures 13 and 14 show that the distinction between power levels is greater for extreme traffic categories. Finally, in contrast to the Traffic variable, the performances of the Window models are quite similar for the two measurement campaigns. Figure 15. Window data and predictions using the logistic model for both campaigns. Table 7. Regression coefficients (with 95% confidence interval) and GoF metrics for the power and window models.

Speed Analysis
The histogram in Figure 16 shows the speed distribution of the measurements from both campaigns. There is a larger number of measurements for zero velocity. These points correspond to the Black traffic category, when the car is stationary or at a very low speed due to traffic jams. The speed of the car during measurement is linked to the traffic condition at the time of measurement (Table 1). Therefore, there is a greater concentration of measurements in the ranges between 0 and 20 km/h (Red), and 20 and 40 km/h (Orange), when compared to the longer range of 40 to 80 km/h (Green). In both campaigns, we sought to measure at different speeds to take into account the entire speed range of each traffic level. Of the three variables analyzed, speed is the only numerical one in nature. Therefore, a linear regression model is obtained in the form: where c 0 and c 1 are the regression coefficients. Figure 17 shows the predictions of the models and the actual speed values, while Table 8 lists the GoF metrics. The figures indicate that a higher speed implies higher noise levels, which is expected, as higher speeds result in more engine noise and more vibrations in other parts of the vehicle. Figure 17. Speed data and predictions using the linear models. Table 8. Regression coefficients (with 95% confidence interval) and GoF metrics for the power and speed models. Visually, both models show a good fit to the data. This is also confirmed by the GoF metrics. The R 2 value indicates that approximately 68% and 57% of the variation in power is explained by the variation in speed in models 1 and 2, respectively. The high F-value and the low p-value confirm that the relation between the variation in power and the variation in speed expressed by the models is unlikely to result from chance. Similar to what was discussed for the Traffic models, the dispersion of power levels is greater in the Campaign 2 samples. The circles in Figure 17 illustrate the greater variability of power levels in the second campaign in the speed range from 0 to 40 km/h. This interval is in accordance with the greater variability in power values observed when comparing the box diagrams shown in Figure 10 for the Black, Red, and Yellow conditions. The models obtained for the variables Traffic and Speed are similar in appearance and GoF metrics. On the other hand, the difference between the two measurement groups is smaller for the Window variable. This set of results suggests that the average power level measured inside the vehicle is generally more influenced by the car's movement, which depends on its speed and traffic conditions, than by external noise sources. Furthermore, the similarity between the adjustments of the variables Traffic and Speed suggests a strong correlation between the two, arising from the way traffic levels are defined in Table 1.

Multiple Variable Analysis
The Window variable analysis demonstrates that the windows have a weak influence on the vehicle noise level. Nonetheless, the traffic and speed variables contribute significantly to the vehicle interior noise in the traffic and speed analyses. One way to check the strength of the relationship between variables and noise level is to calculate their cross-correlation. Figure 18 presents the correlation matrix for the two datasets. In both, noise power has a high correlation with traffic and speed and a low correlation with window position.  Figure 18 also shows a high correlation between Traffic and Speed. A high correlation is to be expected due to how the traffic conditions are defined by Google (Section 3.1). In the context of statistical modeling, the variables convey roughly the same information about the noise power inside the vehicle. To better illustrate this redundancy, two models of the average noise power are created using the other characteristics as independent variables ( Figure 19 and Table 9). The models are a linear regression with categorical and numerical variables, in the form where d 0 is the intercept; d 1 is the coefficient of Speed; d 2 , d 3 , and d 4 are the coefficients added when the traffic condition is red, yellow, or green, respectively; and d 5 is the coefficient added when windows are open. Table 9. Model coefficients and GoF metrics for the power model vs. other variables. Coefficients d 2 , d 3 , and d 4 from Equation (9) determine a base power level for each traffic category. This is illustrated in the graphs above: the predictions are grouped according to their traffic condition. Each group has two straight lines corresponding to the two possible states of the Window variable. These lines are close together for all traffic conditions, with a small difference between the two. This difference is compatible with the box diagrams discussed previously, which indicate that the power tends to be slightly higher when the windows are open. In fact, the variable Speed, related to the slope of the eight line functions, is what determines the power in each group of traffic conditions.

Regression Coefficients Goodness of Fit
The high values of R 2 in Table 9 indicate that most of the variation in noise power is accounted for by the models. However, a comparison with the GoF metrics in Sections 5.1.3 and 5.1.1 show that although the R 2 increases greatly in the more complex model, the F-value decreases with the addition of the other two variables. The Traffic variable contributes little to the model due to its redundancy with Speed, while Window has almost no relation to the response variable. We conclude that either the traffic categories or the speed can be used as an explanatory variable for the noise power inside vehicles due to the way traffic was defined in this work. Figure 19. Power data grouped by traffic conditions, and predictions using the linear models with Speed, Window, and Traffic as the explanatory variables. Colors represent the actual traffic conditions of each sample, which are in accordance with Table 1.

Impulsiveness Evaluation
For this section, the alpha-stable and Gaussian distributions were fitted to each noise sample using MLE. To calculate the fitting error for the two distributions, a histogram was created based on the empirical cumulative distribution of each measurement. Then, the Root Mean Squared Error (RMSE) was calculated between a probability density curve with the parameters estimated by the MLE and the data histogram.
As in the first section, this analysis is split between the variable Traffic and the variable Window. The variable Speed is omitted for clarity, since it would have redundant results with the variable Traffic. For the stable distributions, we assume an SαS model and estimate only the α and γ parameters. Figures 20 and 21 show, respectively, the distribution of the estimated parameters α and γ of the alpha-stable distribution, which are grouped by traffic conditions. Figure 22 shows the distribution of the parameter σ of the Gaussian distribution. The parameter µ is close to zero for all measurements, indicating no offset in the noise level.

Traffic Analysis
In Figure 20, the estimated values in the categories Green, Yellow, and Red are closer together and to α = 2 in Campaign 2 than in Campaign 1. The first campaign also has more significant outliers, which have metrics with α < 1.8 across all traffic categories. This makes the results of Campaign 1 more dispersed. The difference is even better seen by comparing the Kernel Density Estimation (KDE) curves, which are more spread out around α = 2 for the first campaign. Figure 20. Distribution of the estimated parameter α, from the alpha-stable distribution, grouped by the traffic conditions. The KDE curves were obtained using a Gaussian kernel.  However, the results of both campaigns are compatible with the behavior of α across the traffic categories. In general, the values are concentrated very close to α = 2. The positions of the box diagrams and the fact that the KDE curves are centered close to this value demonstrate this fact. This suggests that most of the measured samples present behavior that can be well-represented by a Gaussian distribution, and, therefore, it can be said that they present low impulsiveness. However, a significant number of noise samples deviate from α = 2 and can be said to present some degree of impulsiveness. As with the average power, this degree is ordered according to traffic categories: Green presents the least impulsive behavior, with a greater proportion of samples close to α = 2, while the Black category has the widest range of values. The Red and Yellow categories have a similar distribution, being placed between the other two.
Combining this result with the one from Section 5.1.1, we speculate that a lower degree of impulsiveness can be associated with a higher speed of the car. When in a Green traffic situation, the continuous sound produced by the vehicle and the rolling noise are more dominant in the composition of the internal acoustic noise, so that it tends to have a characteristic closer to that of a Gaussian noise. Conversely, when a car stops in a traffic jam, sources such as other vehicles passing in the adjacent lanes, vehicles braking beside and behind the car, among other external sources, are more prevalent. These tend to be more transitory in nature, contributing to an increase in the observed impulsiveness.
Therefore, assuming an AWGN model for the vehicle's internal noise can be detrimental to algorithms and applications that suffer performance degradation in the presence of non-Gaussian noise. It is important to remember that in this study, sources of noise such as potholes and horns, the noise produced by magazines and local businesses, sounds generated by passengers, and events such as rain, among others, were disregarded. Therefore, the impulsive nature described above is optimistic, justifying research on more complex models that consider non-Gaussian behavior for the vehicular scenario.
The second estimated parameter of the alpha-stable model is the scale parameter γ (Figure 21). An evaluation of the box diagrams shows that the distributions of values by traffic category are in ascending order. This result is very similar to what was discussed about the average power in Section 5.1.1. Another similarity between the behavior of the average power and γ is the difference between campaigns. Again, the dispersion of values is greater in the second campaign, especially in the Yellow category, which is similar to the comparison made between the campaigns in Figure 10.
In turn, the result for the estimation σ, in turn, shows great similarity with the distributions of γ. In fact, these parameters have a similar nature, being related to the dispersion of their respective probability distributions. In fact, for α = 2, the alpha-stable distribution is equivalent to a Gaussian distribution with variance σ 2 = 2γ. Observing the distribution of the parameter α in Figure 20, the similarity in the shape of the distributions of γ and σ is justified, since most of the measured signals have an α value close to 2.
The relation between average power, γ, and σ is illustrated in Figure 23, which graphs power in relation to the other parameters. To obtain the curves, only data from the first campaign were used, since a similar result can be shown for the second campaign. Both curves follow a logarithm shape, with oscillations in the case of γ. In the AWGN model, the noise variance is an estimator for its power. In contrast, variance and power are not defined for the alpha-stable distribution (unless α = 2), meaning there is no direct association between the dispersion parameter and power such as in the Gaussian case. However, γ is still helpful to measure the noise level in a stable noise model. For instance, some authors define a generalized version of the signal-to-noise ratio (GSNR) that takes into account the power of a s(t) signal and the dispersion of an alpha-stable noise [63]: Consequently, the results obtained for γ and σ reaffirm the conclusion that the fluidity of traffic and the speed at which the vehicle can move due to this traffic are related to the noise power levels received inside the vehicle.  Figures 24-26 show, respectively, the distribution of the estimated parameters α, γ, and σ, which are categorized by the Window variable. In accordance with the previous section, most samples have an α value close to 2 in both campaigns and window positions, and the dispersion is greater for the first campaign. However, unlike the Traffic variable, the distribution of α is similar for both states of the windows.   Similarly, Figure 25 indicates little difference between the distribution of γ and σ for the two positions of the car windows. There is only a trend of slightly higher values for the open windows scenario, which is a result compatible with the average power analysis. The results for the Window and Traffic variables lead to the conclusion that the position of the car's four windows has little influence on the level and degree of impulsiveness of the observed internal noise. From the standpoint of an audio reception system, external factors have less relevance to the internal noise composition when the vehicle is at higher speeds. It is important to emphasize that these conclusions are valid for the scenario described in Section 3.1. Table 10 lists the Root-Mean-Squared Error (RMSE) fitting errors of the models in all samples, comparing the performance of alpha-stable and Gaussian distributions. The first line shows that in both campaigns, more than half of the samples had a lower RMSE when modeled by an alpha-stable variable. In addition, the biggest absolute difference of RMSE when the Gaussian model performs better is 0.0060. In contrast, this difference for when the alpha-stable model has a superior performance is 1.1606 for Campaign 1 and 0.6124 for Campaign 2. Therefore, even when they are inferior in terms of RMSE, the alpha-stable models approximate the performance of Gaussian models, while the opposite does not occur. This is due to the greater flexibility of the alpha-stable variable. By adjusting the tail of its distribution, it is able to represent different degrees of impulsiveness, including the Gaussian case with α = 2. Finally, the average RMSE of all samples is smaller for the alpha-stable option in both measurement groups. Hence, the alpha-stable distribution achieves a better fit in an overall assessment of the scenario.

Considerations about Window Size for Estimation
The window size is an important parameter for any signal processing system. In the context of impulsive noise, the choice of the number of samples used to estimate the parameters of an alpha-stable probability distribution must take into account the trade-off between latency and stability of the estimation. Larger windows lead to faster convergence of parameters, while with smaller windows, the estimation tends to vary more depending on whether or not impulsive events are included in the observed interval. However, larger windows can represent a signaling cost that can make delay-sensitive applications unfeasible.
The effect of window size on the estimation of the α parameter by MLE can be seen in Figure 27. Two of the measurements with 240,000 samples each were divided into fixed windows, ranging from 1000 to 21,000 samples with a step of 2000 samples. The figure shows the mean and variance of the estimated value of α for the computation of each window size. In the first curve, obtained from a measurement of the Green traffic category, the estimated value rapidly converges to its final value, which is close to α = 2, which is the value obtained when all samples are used. Likewise, the variation between the estimated values quickly becomes negligible. In the second curve, from a measurement of the Black category, the variation in the estimated value is greater than for the first curve.
However, in both cases, the variance of the estimator decreases as the number of samples included in the window increases. This is a desirable feature for an estimator [86]. The curves illustrate the trade-off between convergence to an optimal value and the processing time involved in the choice of window size. The convergence of the estimation depends on the degree of the impulsiveness of the measured signal and on the inclusion or not of impulsive events in the window [76].
Another important factor to consider for windows with few samples is the possibility of numerical errors due to an insufficient amount of samples. For some of the noise samples tested for the convergence curves of α, it was not possible to obtain results for a window size of 1000 samples. In these cases, the error is associated with the convergence of the parameter γ. As discussed in Section 4, the use of MLE in MATLAB for the alpha-stable distribution depends on the choice of initial values to start the optimization routine. As these values are chosen based on quantiles of the signal, the number of samples in the window has a great influence on the result of the quantile estimation algorithm [60]. In the case of the signals for which the error occurs, the optimizer is not able to find a valid value of γ within the number of iterations imposed by the optimization routine, returning γ = 0. Therefore, the estimation for very small windows may be unfeasible. In the tests performed in this work, the smallest size for which all estimators converged to a valid result was for a window of 3000 samples.

Final Remarks
Internal acoustic noise is an important factor in vehicle design. Interest in this topic grows due to concern with health and acoustic comfort issues and with the emergence of autonomous vehicles and new vehicular applications such as more advanced multimedia centers. Although there is extensive literature on the subject, most of the works focus on the study of the contribution to the noise of specific vehicle components in controlled environments such as test laboratories and under the perspective of psychoacoustics. In this work, an experimental evaluation of in-vehicle noise was presented, in which several noise samples were collected in two separate measurement campaigns in real traffic scenarios, with a setup elaborated from the perspective of multimedia systems and sound processing applications.
The results found in both measurement campaigns show a strong correlation between the traffic level and the internal noise level. More fluid traffic, or equivalently a vehicle moving at higher speeds, results in higher average power levels. In contrast, the position of the car's windows showed a weak influence on the power level measured inside the vehicle. It is important to emphasize that this result, although counter-intuitive, was obtained from the perspective of an audio capture system. It was also shown that from the point of view of statistical modeling, speed and traffic are redundant variables, as the latter is defined from the former.
Moreover, a comparison of the AWGN and alpha-stable models was made in the modeling of the collected noise data. The comparison shows that models based on the stable distribution have a superior fit. An evaluation of the parameter α of the alpha-stable models revealed that although the internal noise has a predominant Gaussian characteristic, there is a relevant degree of impulsiveness. The frequency of these impulsive phenomena tends to be higher for heavier traffic situations. Thus, the alpha-stable model, particularly the SαS model, is an appropriate option for representing this type of noise [63,76,84].
The results above lead to the conclusion that the most relevant factors for the characteristics of internal acoustic noise are internal to the vehicle. It is speculated that higher speeds lead to greater noise produced by the engine and other components of the car and by the wind, reducing the influence of other external factors, even when the vehicle's windows are open. In addition, the observed internal noise has significant impulsiveness, which again tends to have less relevance when the car is at higher speeds. Therefore, efforts to mitigate internal noise, such as studies related to active noise control and optimizing vehicle structure systems, should mind the noise produced by the vehicle itself, and the design of applications such as voice commands and source location should consider an impulsive noise model.
In future works, the authors consider evaluating the vehicular interior noise in the presence of speech signals, dealing with source separation systems. In addition, we intend to extend our experiment with changes to the setup regarding the microphone's position and directivity. Such studies may provide new insights to comprehend and mitigate the noise in a vehicle interior environment, which will contribute to the development of in-vehicle voice and audio applications.