Global and Continuous Pleasantness Estimation of the Soundscape Perceived during Walking Trips through Urban Environments

This paper investigates how the overall pleasantness of the sound environment of an urban walking trip can be estimated through acoustical measurements along the path. For this purpose, two laboratory experiments were carried out, during which controlled and natural 3-min audio and audiovisual sequences were presented. Participants were asked to continuously assess the pleasantness of the sound environment along the sequence, and globally at its end. The results reveal that the global sound pleasantness is principally explained by the average of the instantaneous sound pleasantness values. Accounting for recency or trend effects improved the estimates of the global sound pleasantness over controlled sound sequences, but their contribution is not significant for the second group of stimuli, which are based on natural audio sequences and include visual information. In addition, models for global and continuous pleasantness, as a function of the instantaneous sound pressure level Leq,1s, are proposed. The instantaneous sound pleasantness is found to be mainly impacted by the average sound level over the past 6 s. A logarithmic fading mechanism, extracted from psychological literature, is also proposed for this modelling, and slightly improves the estimations. Finally, the globally perceived sound pleasantness can be accurately estimated from the sound pressure level of the sound sequences, explaining about 60% of the variance in the global sound pleasantness ratings.


Introduction
The health benefits of practicing a physical activity on a daily basis, and walking in particular, are widely acknowledged [1]. Soft transportation modes are also known to ease traffic flows. Thus, municipalities are increasingly promoting walking and cycling among city dwellers for commuting, and investing in facilities that encourage these practices [2][3][4][5]. However, although soft transportation modes undoubtedly have a positive global environmental effect, choosing them in urban areas has a harmful counterpart: an increased exposure to road traffic pollutants, namely airborne pollutants, fine particles, and noise, amplified by the high correlation generally observed between these pollutants [6][7][8]. Moreover, the environmental quality at the neighborhood scale strongly influences the choice of walking as a transportation mode [9][10][11]. Therefore, being able to estimate the exposure associated with an urban walking trip has many potential uses, such as informing pedestrians about the potential health benefit of their intended walk, or optimizing the related route choice through specific algorithms [12,13].
However, estimating noise exposure is made difficult by the high spatial and temporal variability of sound pressure levels that is typical of urban environments [14,15]. Moreover, recent works have revealed the complex relations between perceptual assessments (e.g., pleasantness of the sound environment) [16][17][18] and physical measurements [19,20]. Several findings argue against relying on energetic indicators when producing sound pleasantness maps or assessing the sound pleasantness of urban walking trips: the importance of the temporal and spectral dimensions of sound [19,20]; the interest of explicitly introducing the contribution of different sound sources (e.g., vehicles, voices, birds) into the modeling [16][17][18]; the influence of non-acoustical parameters [21], such as the visual scene and the openness of the space [21,22]; and even non-physical factors, such as demographic, cultural, social, or contextual factors [23][24][25]. Recently proposed noise mapping alternatives, which include mobile measurements, fulfil the requirements for estimating the sound pleasantness of walking trips, as they account for all of the sound sources that make up urban sound environments, allowing one to estimate advanced indicators [26][27][28].
This new context makes it possible to estimate the sound pleasantness of an urban walking trip. However, this requires an understanding of how a pedestrian globally and retrospectively assesses a sound environment that varies with time. This paper investigates these relations through a three-step modeling framework, described in Figure 1. First, models are proposed to relate perceptual assessments of continuous and overall pleasantness of the presented sound sequences (Figure 1C). Then, models of the instantaneous and overall sound pleasantness appreciation, based on sound levels (Figure 1A,B), are proposed.
Previous research in the fields of psychology, psychoacoustics, and soundscape has shown that retrospective overall judgment is not a simple average of instantaneous judgments, but is significantly influenced by the following principal temporal effects (more details can be found in [29]):



•
The recency effect, by which initial and final momentary judgments of a sequence are better remembered at the moment when the retrospective assessment is given, has been observed for sound sequences by Västfjäll [30,31].

•
The peak-end rule, which states that the global judgment of an experience is influenced by its most intense point and its end (whether perceived negatively or positively), has been observed in [32,33].

•
The trend effect, which describes the fact that people often make predictions about the future based on trends that they have observed in the past, has been shown by Steffens & Guastavino on a corpus of various 1-min samples [29].
The main works that have dealt with the retrospective assessment of time-varying acoustical signals often focused on loudness perception, on very controlled stimuli (pure tones, white noise, specific sound sources, etc.), or on short sound sequences. Evaluating retrospective global judgments, such as the pleasantness of the sonic environment during urban walks, requires new experimental set-ups and stimuli, closer to the in situ experience. Recently, virtual reality and auralization tools have been proposed by some authors to fulfill this requirement [34,35], and more immersive experiments could help in highlighting these temporal effects over longer sound sequences. An in situ experiment by Aumond et al. revealed that recency or trend effects significantly influence the global judgment for very short paths (shorter than 1 min), but not for longer paths (>15 min) [36]. These results need to be compared with other experiments.
The relation between the continuous instantaneous judgment during a time-varying sound environment and the physical properties of the stimuli is also of particular interest. It enables one to estimate the integration, relaxation, and reaction times that link sound levels to momentary evaluations: a null reaction time and an integration time of about 2.5 s were, for example, found relevant in [37]. In addition, the links between overall pleasantness evaluations and sound level time series must be further investigated.
The present paper aims at investigating how the sound pleasantness of an urban walking trip can be estimated through measurements of the sound pressure level along walking paths in an urban environment. Two experiments were built. In both, the participants had to assess the continuous and overall sound pleasantness of sound sequences:

•
A first experiment is based on different arrangements of two audio files, and aims to determine how the global temporal structure of a sound sequence affects its continuous and overall sound pleasantness appreciation. The sound sequences are built with the goal of assessing the effect of the temporal structure of the "background" sound environment. Therefore, strong markers of the soundscape or peaks in the sound levels were specifically avoided.

•
A second experiment is based on the same principle, but with real sound sequences played together with video content, in order to investigate the same questions with natural sequences and a higher ecological validity.

Apparatus
The listening tests took place in a semi-anechoic room. Figure 2A presents the experiment set-up. Each participant performed the test individually, seated in a chair in front of a computer screen showing the test instructions. In the first experiment, a blurred image of an urban environment was projected onto a large screen located behind the computer, in order to have a realistic and comfortable luminosity in the room without providing too much visual information, which could influence the judgments; the stimuli themselves consisted only of audio files. In the second experiment, a video sequence was added to the sound, in order to enhance the sensation of immersion.
Appl. Sci. 2017, 7, 144
The sound sequences were reproduced transaurally, using a system composed of two loudspeakers (Tannoy) and a high quality sound card (RME Fireface 400, Audio AG, Haimhausen, Germany). The loudspeakers were positioned at ±30° relative to the listening position. The transaural listening technique has the advantage of minimizing front/back confusions, which are known to appear with headphone listening when individual HRTFs and head-tracking are not available, while preserving the perceptual characteristics of a diffuse sound field [38]. The fact that participants are not using headphones improves the realism of the simulation technique.
For both experiments, the audio files were recorded using two high quality microphones (DPA 4060, Alleroed, Denmark), inserted into the operator's ears using specific ear clips. Prior to each recording, a calibration tone (1 kHz/94 dB) was recorded by the microphones, so that the sound level reproduction in the laboratory experiments could be calibrated. For experiment 2, the video sequences were recorded simultaneously with the sound, using a small action camera held by the operator at eye level. During both experiments, participants had to rate the pleasantness of the soundscape on the computer screen, while an urban picture (experiment 1) or a motion picture (experiment 2) was projected onto a large screen behind it.
All of the statistical tests presented in this paper were performed with the Statistics and Machine Learning Toolbox™ from Matlab® (Natick, MA, USA).

Procedure
The sequences were played in a random order. Participants were first asked to continuously rate soundscape pleasantness on a semantic differential scale from unpleasant (coded 0) to pleasant (coded 10). The assessment was made while listening, by moving a marker along a large horizontal bar with the mouse. The following instructions were orally presented to the participants: "During this experiment, you will experience 10 virtual urban trips of 3 min. You will have to point continuously with the mouse at the sound pleasantness of the presently heard sound environment: the more the sound environment is pleasant to you, the more you will move the mouse to the right; the more it is unpleasant, the more you will move the mouse to the left." The assessed instantaneous sound pleasantness (P) ratings were collected with a time resolution of 125 ms (the same sampling rate as the sound level time series). In addition, at the end of each sound sequence, the participants had to assess the global sound pleasantness (GP) of the sequence, on the same scale, from unpleasant to pleasant.
Figure 2B presents the graphical interface for continuous assessment that has been developed in the laboratory.
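Both the ratings and the sound levels are sampled every 125 ms, while the analysis refers to 1 s values such as Leq,1s. The text does not detail how the 1 s values were obtained; the following minimal sketch assumes the standard energetic average for levels and non-overlapping windows of eight samples. The function names are illustrative only.

```python
import math

def leq(levels_db):
    """Equivalent (energetic) average of sound levels expressed in dB."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

def aggregate_leq(levels_125ms, samples_per_window=8):
    """Aggregate 125 ms levels into non-overlapping windows (8 samples = 1 s).

    Assumption: incomplete trailing windows are dropped.
    """
    n_windows = len(levels_125ms) // samples_per_window
    return [
        leq(levels_125ms[i * samples_per_window:(i + 1) * samples_per_window])
        for i in range(n_windows)
    ]
```

Because the average is taken on energies rather than on decibel values, a window mixing quiet and loud samples yields a level closer to the loud ones.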

Participants
Two groups of 30 participants were involved in the experiments. In the first experiment, 11 women and 19 men participated, with a mean age of 33 years (SD = 14). In the second experiment, 18 women and 12 men participated, with a mean age of 33 years (SD = 14).
For the first experiment, seven participants were eliminated from the analysis. Two of them presented hearing loss, detected by preliminary audiometry (>20 dB HL) [39]. Five of them gave very incoherent responses (very incomplete, constant, or random ratings). Thus, 23 participants were included in the analysis. In the second experiment, no hearing problems were detected among the participants ("normal or subnormal hearing") [39].
In both experiments, the participants were naive with regard to the test hypotheses, and received a small monetary compensation for participation. Each participant was involved in only one of the experiments. All of the participants gave their informed written consent prior to the experiments.

First Experiment
A total of 16 sound sequences were constructed from different combinations of two initial sound sequences, α and β, each with a duration of 90 s, in order to focus on the effect of the temporal structure of the sound sequence on the global assessment of sound pleasantness. The resulting sound sequences have a duration of 3 min, which represents the median duration of a pedestrian trip in the city of Paris [40]. The initial sound sequences α and β were recorded with the same binaural technique in the 13th district of Paris in April 2015: the sequence α in a small park (L50 = 55 dB, L10 − L90 = 25 dB), and the sequence β near a large boulevard (approximate flow: 1000 vehicles/hour, L50 = 76 dB, L10 − L90 = 24 dB). α and β were carefully chosen in order to avoid particular events, such as excessively loud two-wheelers, dog barks, voices with intelligible speech, or exceptionally strong sound level fluctuations. These events could become very salient markers of the sound environment, and could significantly drive the global and instantaneous sound pleasantness assessments. α and β were assessed by the participants before the beginning of the test, on a continuous pleasantness scale from unpleasant (coded 0) to pleasant (coded 10): the average perceived sound pleasantness values for α and β were 8.1 (σ = 2.2) and 2.4 (σ = 2.0), respectively. Practically, the 16 sound sequences were obtained by combining α and β with different appearance times. They were formed with slow or fast alternations between α and β, evolving from calmness to noisiness, or the inverse. The transitions between environments lasted at least 30 s ("Fast"), which was observed in situ as a minimum walking transition time. A "Slow" alternation corresponds to a 3 min transition, which is the length of the sound sequence. Each sequence was presented once to the participants.
This methodology included the repetition of two 1 min initial sound sequences, assessed by the participants at the beginning of the test. Thus, some memory, demand, or transfer effects could have perturbed the experiment. Nevertheless, two points mitigate this possible influence: (i) at the end of the test, the participants were orally asked to comment freely on the experiment. Some of them recognized that parts of the sequences came from the same initial sound recordings but, as no strong marker of the sound environment (voice, horn, etc.) was present, they only said something like, "I think two or three times I heard a part of the same sequence"; and (ii) all of the sequences were played in a random order, so that any effect due to the repetition of the initial sequences was not always carried by the same sound sequences.
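The initial sequences α and β are characterized above by percentile levels: L50 is the level exceeded during 50% of the time, and L10 − L90 is a common descriptor of the level fluctuations. As a minimal sketch, these indicators can be read from a level time series as follows; the nearest-rank rule used here is an assumption, since the text does not state how the percentiles were computed.

```python
def percentile_level(levels_db, exceeded_percent):
    """Level exceeded during `exceeded_percent` % of the time (e.g. L10, L50, L90).

    Assumption: nearest-rank selection on the sorted samples, no interpolation.
    """
    ordered = sorted(levels_db, reverse=True)  # loudest samples first
    index = round(exceeded_percent / 100 * (len(ordered) - 1))
    return ordered[index]

def level_spread(levels_db):
    """L10 - L90, the gap between emerging levels and the background level."""
    return percentile_level(levels_db, 10) - percentile_level(levels_db, 90)
```

Note that L10 is higher than L90, because it is reached only by the loudest 10% of the samples.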

Second Experiment
The second experiment was based on 10 audio-visual urban sequences of 3 min, recorded in the 13th district of Paris in April 2015. Similar sound environment conditions were chosen (recordings on Mondays, between 10 and 12 h, or between 14 and 16 h).
In order to obtain time-varying sound environments, the first six sequences consisted of a transition between two different sound environments, and the last four sequences consisted of a transition between three different sound environments. The 10 sequences corresponded to five trips, run in both directions. Table 1 presents a short description of the streets travelled on during the experiment. Table 4 and Figure 5 present the 10 sequences that alternate slowly or quickly between these described environments. Contrary to the first experiment, the presence of the video did not allow the sound sequences to be controlled; thus, particular events sometimes occurred in the sequences, such as loud two-wheelers or voices with intelligible speech. The sequences S9 and S10, which were 4 min long, were artificially shortened at their center, cutting out a part of the walking trip in the "Rue des 2 avenues", in order to be consistent in length with the other sequences. Special care was taken not to alter the realism of the resulting sequences, keeping the cut as discreet as possible. Each sequence was presented once to the participants.

From Continuous to Global Perceived Pleasantness Assessment
First Experiment
Figure 3 depicts, for each of the 16 sequences presented in Section 2.4.1, the evolution of the 1 s sound pleasantness (P) (mean values and standard deviation for the 23 participants) and the 1 s sound level. The combination of the initial sequences α and β resulted in 16 sequences with a large variety.
Table 2 shows, for each of the 16 sequences, the average pleasantness over both participants and time (average of the 125 ms mean pleasantness ratings over the 3 min), the average global sound pleasantness (GP) of the participants, and the difference between both values (ΔGP-Pmean).
There is a statistically significant difference in ΔGP-Pmean between the sequences, confirmed by a one-way ANOVA (F(15,351) = 2.84, p < 0.001). Table 2 shows that, for sequences that mainly consist of the boulevard interrupted by the park sequence α (for example, compare sequences A1, A2, A3 and A4), the closer α appears to the end of the sequence, the more ΔGP-Pmean increases, and this increase is significant (F(3,88) = 4.78, p < 0.01). Conversely, although less pronounced, for sequences that mainly consist of the park interrupted by the boulevard sequence (for example, compare sequences B1, B2, B3 and B4), the closer the unpleasant environment appears to the end of the sequence, the more ΔGP-Pmean decreases. This trend is not significant (F(3,88) = 1.75, p = 0.16), but the difference in ΔGP-Pmean between the sequences B1 and B4 is significant (F(1,44) = 4.52, p < 0.05). Finally, the Ci sequences are significantly different from the Di sequences (F(1,135) = 20.63, p < 0.01).
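The comparisons above rest on two simple computations: the per-participant difference between the global rating and the time-averaged continuous rating, and a one-way ANOVA across sequences. The sketch below illustrates both under simplifying assumptions (one group of values per sequence, no missing data); the study itself used the Matlab toolbox, and the function names here are hypothetical.

```python
def delta_gp_pmean(global_rating, continuous_ratings):
    """Difference between a global rating and the mean of the continuous ratings."""
    return global_rating - sum(continuous_ratings) / len(continuous_ratings)

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variability of the values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

Here each group would hold the ΔGP-Pmean values of all participants for one sequence, so the degrees of freedom follow the F(k − 1, N − k) pattern reported in the text.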
In order to investigate the apparent temporal effect when assessing the global sound pleasantness of a sound sequence, a multiple linear regression is constructed over all of the sound sequences. Four presumed factors are tested (mean value, trend effect, recency effect, and primacy effect), using the variables presented in Table 3. All of the variables are calculated over the averaged temporal curve. The best linear regression model is obtained through a stepwise procedure (bidirectional elimination), maximizing the explained variance, which reaches 95% (R² = 0.95, F(2,14) = 136.0, p < 0.001); in this paper, the explained variance corresponds to the adjusted R². The procedure selects the variables Pmean (b* = 0.81, t(14) = 13.6, p < 0.001) and the final sound pleasantness Pend (b* = 0.45, t(14) = 7.6, p < 0.001), which corresponds to the arithmetic average of the sound pleasantness collected during the last 30 s of the sequence (the sequences are constructed so that the last 30 s always have a stable sound environment). This model outperforms the one in which GP is estimated with the single predictor Pmean (b* = 0.87, t(14) = 6.5, p < 0.001), which explains only 74% of the variance (R² = 0.75, F(2,14) = 42.8, p < 0.001). The significant difference between both models (F(2,14) = 29.1, p < 0.001) highlights the influence of the end of the sequence on the assessment of the global sound pleasantness, over the constructed 3-min sequences.
The variable linked to the trend effect, Ptrend, can also replace Pend as a predictor in the regression (b* = 0.45, t(14) = 7.4, p < 0.001) alongside Pmean (b* = 0.92, t(14) = 15.09, p < 0.001), with an identical explained variance. It is worth noting that there is a theoretical overlap between the trend and the recency effects (as reported by Steffens & Guastavino [29]).
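To make the regression on Pmean and Pend concrete, the sketch below computes both predictors from a 125 ms rating curve and fits an ordinary least squares model with an intercept. It is only an illustration: the stepwise selection is not reproduced, Ptrend is omitted because its definition (Table 3) is not restated here, and the helper names are hypothetical. Only standardized coefficients are reported in the text, so the fitted values depend entirely on the data supplied.

```python
def sequence_predictors(ratings, end_seconds=30.0, dt=0.125):
    """Return (Pmean, Pend): overall mean and mean over the last `end_seconds`."""
    n_end = int(round(end_seconds / dt))
    p_mean = sum(ratings) / len(ratings)
    p_end = sum(ratings[-n_end:]) / n_end
    return p_mean, p_end

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(predictors, target):
    """Least squares fit; returns [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)
```

With one (Pmean, Pend) pair and one mean GP value per sequence, `fit_ols` returns the intercept and the two unstandardized slopes.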
The speed at which the sound environment switches has no observed influence on the global sound pleasantness. For example, if one compares sequences C1 and C3, which evolve from the park to the boulevard quickly or very slowly, GP is lower than Pmean for both sequences, in accordance with the demonstrated recency effect, but to a similar extent (ΔGP-Pmean = −0.58 and −0.70 for C1 and C3, respectively). One-way ANOVA tests confirm this observation, showing no significant differences between the Ci sequences (F(2,66) = 0.13, p = 0.87), nor between the Di sequences (F(2,66) = 0.26, p = 0.77). This suggests that the speed at which sound environments vary from one to the other has no influence on the global sound pleasantness assessment. Finally, a one-way ANOVA test reveals that the difference between the Ei sequences, where multiple changes were present in the sound environment, is not significant (F(1,44) = 1.32, p = 0.25).
The highlighted recency effect suggests turning to time series modelling in order to account for the effect of the temporal structure of the sound sequence, whereas in the previous section only the last 30 s were used. The multiscale model SIMPLE (Scale-Independent Memory, Perception and LEarning) has been proposed in the psychological literature to model human memory [41]. This model estimates the probability of remembering, at the end of a sequence, one specific event that occurred during the sequence. If the global sound pleasantness is considered as the sum of the 125 ms events that the participant remembers at the end of the sequence, then the global pleasantness (GP) can be expressed as the weighted average of all the instantaneous pleasantness (P) values collected during the sequence. The SIMPLE model relies on three parameters: c (temporal distinctiveness of memory representations), t (threshold), and s (slope). More details on the mathematical formulation and implementation can be found in [42].
These three parameters are optimized on the dataset (c = 40, t = 0.55, s = 11), using the ranges presented for each parameter in the literature [41]. Figure 4 presents the weighting coefficients extracted from the optimized SIMPLE model, together with those of the two previous models (average value Pmean, with and without taking Pend into account). A threshold is observed after 150 s, resulting from the monotonous sound environment after this time (the last 30 s). When applying the SIMPLE model, the explained variance reaches 96% (p < 0.001), which again highlights the advantage of a temporal response that is smoother and more realistic than an end over-weighting model. Although the difference with the previous model is not significant (F(2,14) = 1.62, p = 0.18), this approach makes it possible to integrate more complexity and realism into the function that models the recency effect.
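The weighted-average reading of SIMPLE can be sketched as follows. This is an assumption-laden illustration based on the general form of the model (similarity decaying exponentially with distance on a logarithmic time scale, then a logistic threshold on distinctiveness); the exact implementation used in the study is the one referenced in [42] and may differ, in particular in how the temporal distance of the last event is handled.

```python
import math

def simple_weights(n_events, dt=0.125, c=40.0, t=0.55, s=11.0):
    """Recall probability of each event at the time of the global judgment."""
    # Temporal distance between each event and the retrospective judgment.
    # Assumption: the last event lies one time step before the judgment.
    log_distances = [math.log((n_events - i) * dt) for i in range(n_events)]
    weights = []
    for log_d in log_distances:
        # Summed similarity to all events (itself included) on the log-time scale.
        total_similarity = sum(
            math.exp(-c * abs(log_d - other)) for other in log_distances
        )
        distinctiveness = 1.0 / total_similarity
        # Logistic function with threshold t and slope s.
        weights.append(1.0 / (1.0 + math.exp(-s * (distinctiveness - t))))
    return weights

def global_pleasantness(ratings, weights):
    """GP estimated as the weighted average of the instantaneous ratings."""
    return sum(w * p for w, p in zip(weights, ratings)) / sum(weights)
```

With these parameter values, recent events are well separated on the logarithmic time scale and receive weights close to 1, while distant events are crowded together and receive weights close to 0, which produces the recency-type weighting discussed above.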
There is a statistically significant difference in the ΔGP-Pmean values between the sequences, as determined by a one-way ANOVA (F(9,286) = 2.97, p < 0.01), which implies that the global assessment is not simply the average of the continuous ones. Table 4 shows Pmean and GP, along with their differences ΔGP-Pmean, for each sequence. A more detailed analysis of the pairs of sound sequences (the same trip run in opposite directions) leads to contrasting conclusions. For S3 & S4 and S5 & S6, which are composed of only two different sound environments, the expected tendencies are observed: sequences that evolve towards improved sound environments show higher ΔGP-Pmean values than sequences that evolve towards deteriorated sound environments. Nevertheless, these tendencies are not statistically significant (p > 0.05). They are also contradicted by the sequences S1 & S2, S7 & S8, and S9 & S10, which show ΔGP-Pmean values that are not consistent with any recency effect.
The best linear regression model is obtained through a stepwise procedure (bidirectional elimination). The variance of the global pleasantness GP explained by the unique predictor Pmean (b* = 0.87, t(8) = 4.9, p < 0.005) reaches 72% (R² = 0.75, F(2,8) = 24.1, p < 0.005). Interestingly, the determination coefficient and the Pmean standardized beta coefficient are very similar to those observed between GP and Pmean in the first experiment. In accordance with the previous observations, neither the variables Pend and Ptrend nor the SIMPLE modelling improve the GP estimates in this experiment.
As a first step, the instantaneous sound pleasantness P, assessed at time step t, is estimated from a constant aggregation of the noise levels measured in the recent past. The modelling calls for two parameters, namely the response time rt and the integration time it. The response time describes the delay between the noise event and its assessment by the participant. It corresponds to the time needed to detect the sound, then to understand and assess it in terms of pleasantness, and finally to move the cursor to the targeted point on the screen. The integration time describes the signal duration taken into account by the participant for the instantaneous pleasantness assessment. As a result, P(t) can be estimated using the following formula: P(t) = f(L[t − rt − it : t − rt]), where f is a function applied to the time series of noise levels L between t − rt − it and t − rt. The modelling then consists of finding the function f, and the rt and it values, that maximize the correlation between the estimated and the actual P(t) values.
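The search for the best {it; rt} couple can be sketched as a plain grid search. This is an illustrative reconstruction under stated assumptions, not the authors' code: `windowed_indicator` and `best_window` are hypothetical names, window lengths are expressed in samples (48 samples correspond to 6 s at a 125 ms step), and the energetic mean Leq is used as the default function f.

```python
import numpy as np

def leq(levels_db):
    """Energetic (equivalent) mean of sound levels given in dB."""
    return 10.0 * np.log10(np.mean(10.0 ** (np.asarray(levels_db) / 10.0)))

def windowed_indicator(levels_db, it_n, rt_n, indicator=leq):
    """Apply `indicator` to the samples in [k - rt_n - it_n, k - rt_n] for each k.

    Samples for which the window is incomplete are returned as NaN.
    """
    levels_db = np.asarray(levels_db, dtype=float)
    out = np.full(levels_db.shape, np.nan)
    for k in range(it_n + rt_n, len(levels_db)):
        out[k] = indicator(levels_db[k - rt_n - it_n : k - rt_n + 1])
    return out

def best_window(levels_db, pleasantness, it_grid, rt_grid, skip, indicator=leq):
    """Return (it_n, rt_n, r) maximising |Pearson r| between f(levels) and P.

    The first `skip` samples are ignored so that every window is complete.
    """
    if skip < max(it_grid) + max(rt_grid):
        raise ValueError("skip must cover the longest window plus delay")
    p = np.asarray(pleasantness, dtype=float)[skip:]
    best = (None, None, 0.0)
    for it_n in it_grid:
        for rt_n in rt_grid:
            est = windowed_indicator(levels_db, it_n, rt_n, indicator)[skip:]
            r = np.corrcoef(est, p)[0, 1]
            if abs(r) > abs(best[2]):
                best = (it_n, rt_n, r)
    return best

# Synthetic check: pleasantness built as minus the Leq over a known window.
rng = np.random.default_rng(0)
levels = 60.0 + 5.0 * rng.standard_normal(600)
ratings = -windowed_indicator(levels, it_n=6, rt_n=2)
found = best_window(levels, ratings, range(0, 11), range(0, 4), skip=20)
```

The absolute value of the correlation is maximised because pleasantness is expected to decrease when the sound level increases, so the correlation of interest is negative.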
Figure 6 presents the correlations, averaged over the 10 sequences, between all the instantaneous 125 ms pleasantness ratings and the calculated f function, for different rt and it values, taking as the function f the usual noise indicators L50, L90, L10, and Leq. The presented correlation is calculated over 1000 observations (from the 3-min sequences sampled at 125 ms, but subtracting the earliest 400 values for integration purposes). The four noise indicators result in similar correlations, although the correlations obtained with L90 are slightly weaker. The correlation curves simultaneously describe the influence of the two parameters, rt and it. The best couples {it; rt} range between 3 and 10 s for it, and between 0 and 2 s for rt. The maximum correlation found, 0.84, is obtained for the couple {6; 0} and the Leq function. Thus, the resulting integration time, also called the "psychological or perceptual present" in [37], is about 6 s. Nevertheless, it is not possible to dissociate the couple {it; rt}, as the integration time includes de facto a part of the reaction time. Using the methodology proposed by Kuwano and Namba [37], the reaction time is defined for a null integration time, which corresponds to approximately 2 s for this experiment (couple {0; 2}). These values are slightly higher than the durations found in the literature for the continuous assessment of sound levels. For example, the best couples {it; rt} found in [37] for perceived sound level assessment are {2.5; 0} and {0; 1}. This might be due to the higher complexity of an appreciation task.
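For readers less familiar with the statistical indicators compared here, the following short sketch shows how they are conventionally computed from a series of short-term levels: L_n is the level exceeded n % of the time, hence the (100 − n)th percentile of the series. The function name `percentile_level` is a hypothetical choice, and the linear interpolation used by NumPy's default percentile is an assumption about a detail the paper does not specify.

```python
import numpy as np

def percentile_level(levels_db, n):
    """L_n: level exceeded n % of the time, i.e. the (100 - n)th percentile."""
    return float(np.percentile(np.asarray(levels_db, dtype=float), 100 - n))

# Illustrative series of levels from 0 to 100 dB in 1 dB steps.
levels = np.arange(0, 101)
l10 = percentile_level(levels, 10)   # high levels, sensitive to noise events
l50 = percentile_level(levels, 50)   # median level
l90 = percentile_level(levels, 90)   # background level
```

L10 thus reflects the loudest events, L90 the background noise, and L50 the median level of the window.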
To develop this analysis, the SIMPLE model presented in the previous section is calibrated for determining the weighting of the sound level time series.The parameter rt was added to the original ones, in order to introduce the reaction time into the SIMPLE model, adding a delay to the original weighting function.The SIMPLE parameters are optimized over the last 30 s, in order to obtain the best estimation of the instantaneous pleasantness, according to the sound level.The best optimized SIMPLE function has been obtained using the following coefficients: c = 50, t = 0.55, s = 14, and rt = 0.
The optimized SIMPLE function shows a null reaction time (rt = 0). The weighting function is flat between t = 0 s and t = −3.25 s, which suggests that all of the events included in this time interval have the same impact on the continuous pleasantness appreciation. The function then decreases strongly between t = −3.25 s and t = −10 s, in accordance with the integration time previously found. However, the weight does not fall to 0, suggesting that the sound level between t = −30 s and t = −10 s has a limited, but existing, impact on the instantaneous sound pleasantness.
This more complete accounting of the noise level time series significantly increases the correlation, averaged over the 10 sequences, between the estimated instantaneous sound pleasantness using the SIMPLE model, and the observed instantaneous sound pleasantness.It reaches 0.93, compared to 0.84 in the previous analysis (t(9) = 2.8, p < 0.05).The SIMPLE model thus enables a more accurate estimation of the continuous sound pleasantness estimates.

Global Sound Pleasantness Estimation Based on Sound Level Time Series
In a practical case, the available data will more likely be a time series of sound levels, instead of instantaneous sound pleasantness values.Therefore, this section aims to estimate the global sound pleasantness of a walking trip, based on its sound level time series.The two proposed approaches consist of: (i) estimating GP from the instantaneous P values, which are themselves estimated in terms of noise level time series; (ii) directly estimating GP in terms of the noise level time series.
A model has been proposed in the previous section for estimating the instantaneous sound pleasantness from the sound level time series of the last 30 s, through SIMPLE modelling. Section 3.1.2 showed that the GP value can be estimated as the arithmetic average of these instantaneous estimated pleasantness values, Pestimated. Combining these two results enables one to estimate GP values from sound level measurements. Based on the 10 real sound sequences of the second experiment, the resulting model, built on the unique predictor Pmean,estimated (b* = 0.81, t(8) = 3.8, p < 0.005), explains 60% of the total variance (R² = 0.65, F(2,8) = 14.9, p < 0.005), with a Root Mean Square Error (RMSE) of 0.72. This model has the advantage of considering the short-term recency effect, but this, in return, makes the GP value dependent on the direction of the walking trip. Figure 7 presents the estimated global pleasantness obtained with this approach, versus the actual assessed global pleasantness averaged over the participants.
Table 5 presents the relations between the sound level time series (Leq,1s) and the GP of a 3-min sequence, through simple indicators that neglect the recency effects. These models have the advantage of simplifying the GP estimation by giving it the same value whatever the direction of the trip. The tested models rely on three different indicators, namely the median and the arithmetic average of the sound levels, and the Leq, which is often used for exposure assessment through the widely used Sound Exposure Level indicator (SEL). The two indicators Lmean and L50 enable good GP estimates. Conversely, there is no significant correlation between the Leq and the GP values calculated over the sequences. Leq, contrary to Lmean and L50, is impacted by noise peaks, which explains the poor correlation. This suggests again that there is no peak effect on the global assessment of the sound sequences in the present study, in accordance with [29].
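The difference between the three indicators can be made concrete with a small numerical sketch. The data below are synthetic and purely illustrative (they are not the study's measurements), and the helper names `l_mean`, `l_50`, `l_eq`, and `fit_linear` are hypothetical. The sketch shows how a short noise peak shifts Leq far more than Lmean or L50, and how a one-predictor linear model of GP could be fitted and scored with R² and RMSE.

```python
import numpy as np

def l_mean(leq_1s):
    """Arithmetic average of the 1 s levels (dB)."""
    return float(np.mean(leq_1s))

def l_50(leq_1s):
    """Median of the 1 s levels (dB)."""
    return float(np.median(leq_1s))

def l_eq(leq_1s):
    """Energetic average of the 1 s levels (dB), dominated by loud events."""
    return float(10.0 * np.log10(np.mean(10.0 ** (np.asarray(leq_1s) / 10.0))))

def fit_linear(x, y):
    """Least-squares fit y = a * x + b; returns (a, b, R squared, RMSE)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return float(a), float(b), 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean(residuals ** 2)))

# Synthetic 3-min sequence at 55 dB with a 3 s peak at 95 dB.
peaky = np.full(180, 55.0)
peaky[90:93] = 95.0
indicators = (l_mean(peaky), l_50(peaky), l_eq(peaky))

# Synthetic, perfectly linear relation between an indicator and GP.
a, b, r2, rmse = fit_linear([50.0, 60.0, 70.0], [8.0, 6.0, 4.0])
```

In this synthetic case the 3 s peak raises Lmean by less than 1 dB and leaves L50 unchanged, whereas Leq rises by more than 20 dB, which is consistent with the explanation given for the poor Leq correlation.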


Discussion
The conclusions of the first experiment are in accordance with other studies [29]: (i) the mean of the continuous pleasantness assessment is the most important predictor of the global pleasantness assessment; (ii) the recency effect and the trend effect both influence the retrospective global assessment of the pleasantness of a sound sequence. Nevertheless, the recency effect that is observed in the first experiment tends to disappear in the second experiment. Two hypotheses are formulated:

• The sound sequences of the second experiment are less contrasted and more complex than the controlled sound sequences used in the first experiment. This attenuates the conclusions concerning a recency effect for the sound pleasantness assessment of real sound sequences. Moreover, since the focus of the first experiment was to observe the influence of the temporal structure of an environment, sound markers and events had been removed from its sequences. Such events, through their semantic content, are suspected to significantly influence the overall rating of the sound environment [43]. These events and markers, present in the second experiment, might have masked the recency and trend effects that were observed in the first experiment.

• The video content might have helped participants to analyze the sequences of the second experiment as a whole, thus attenuating the recency effect.

Finally, contrary to the first experiment, GP values are globally higher than the Pmean values. This might be a consequence of the visual factor, brought by the video, on the global pleasantness appreciation. The positive effect of the video on the overall pleasantness rating has already been shown in [21,22,44]. However, the fact that GP values are higher than Pmean values would then suggest that the visual effect has no influence on the continuous assessment of the sound pleasantness, which needs to be investigated in future studies.
The instantaneous pleasantness is accurately estimated from the sound level measurements in Section 3.2.1, although some discrepancies remain unexplained. Attempts to take into account the spectral content of the signal or the typology of the sound sources did not improve the explained variance. The visual content might also explain the remaining discrepancies between the sound pleasantness estimates, as a closer look at the end of sequence five and the beginning of sequence six suggests. These sub-sequences both correspond to environments that are visually unpleasant, and precisely at these instants, the models, which do not account for the visual settings, over-estimate the sound pleasantness rating given by the participants. Another explanation for the remaining discrepancies relates to the high correlations observed between sound pleasantness and sound intensity: participants might have relied on noise intensity to assess the sound pleasantness during the continuous assessments. Including a better description of the sound environment, for example, with specific sound source descriptors, might improve the instantaneous sound pleasantness estimates.
Section 3.2.2 reveals that the mean or median sound level estimates the pleasantness of an urban path better than the equivalent sound level, which is commonly used to measure sound level exposures. If this result is confirmed by further studies, it will lead to two distinct models, one for measuring the global sound exposure of an urban walk, and one for measuring its global pleasantness.
In experiment 2, it has been shown that about 60% of the variance in the global sound pleasantness can be explained by the sound level of the stimuli. Further studies should determine what part of the remaining variance is due to acoustic factors other than sound level alone, but also to non-acoustic factors such as visual information [44], personal factors such as noise sensitivity [45], and individual variability.
A 3-min path was used in this study, since this corresponds to the average pedestrian trip duration in Paris, but it would be interesting to compare these results with other stimuli covering a larger variety of sequence durations. Although temporal effects have been demonstrated for shorter durations [29,36], they could disappear for sequences longer than 15 min [36]. Generalizing the test to different trip durations will help to cover wider trip characteristics.
Finally, extending the experiment to a wider variety of environments, including more parks and very noisy or lively locations, is now required in order to test the domain of validity of the models and to develop more universal ones. This might also highlight further psychological effects, other than recency.

Conclusions
This paper aimed to estimate both the instantaneous and the global pleasantness of the soundscape during 3-min urban walking trips. For this purpose, two laboratory experiments were conducted, in which controlled and natural sound sequences were presented, and during which participants were asked to assess the sound pleasantness continuously along the sequence, and globally at its end. The conclusions are:

• The modeling of the recency effect, through the state-of-the-art SIMPLE model, improves the estimation of the global sound pleasantness over the controlled sound sequences. This effect tends to decline or disappear when the sound sequences are more realistic, including, among other things, some visual information.

• The global sound pleasantness can be estimated by using the median or the arithmetic average of the instantaneous sound pleasantness values.

• The instantaneous sound pleasantness is mainly impacted by the sound level during the last few seconds. The continuous judgment of the pleasantness of the sound environment involves a reaction time and an integration time. The sound level time series can be more accurately taken into account with the SIMPLE model, which then highlights that the last 30 s also influence, although to a lesser extent, the instantaneous sound pleasantness assessments.

• Finally, the global sound pleasantness can be accurately estimated from the sound level time series of the 3-min sequences, either by relying on an intermediate estimation of the instantaneous sound pleasantness values, or directly from the sound level time series, through an arithmetic average or a median value of the Leq,1s values. Both approaches are relevant, explaining about 60% of the variance in the global sound pleasantness, with an error below 0.75 points on an 11-point scale.
The final proposed model enables one to estimate the sound pleasantness of a walking trip along a particular path, based on the sound level time series encountered along that path.The conclusions from this work could thus be helpful for constructing models that select urban walking routes with optimal sound pleasantness.

Figure 1. Modeling framework. (A) Relations between sound-level time series and perceived overall pleasantness; (B) relations between sound-level time series and perceived continuous pleasantness; (C) relations between perceived continuous and overall pleasantness.

Figure 2. (A) Experimental setup and (B) graphical interface for continuous assessments.

Figure 3. Continuous perceived pleasantness, mean values over participants Pmean (thick black line), standard deviations (light black lines), and sound level (Leq,1s, purple) over time for the 16 sequences.

Figure 5 depicts, for each of the 10 sequences presented in Section 2.4.2, the 1 s sound pleasantness evolution (mean values and standard deviation over the participants) and the 1 s sound level. A large variety of sequences is observed.

Figure 5. Continuous perceived pleasantness, mean values over participants Pmean (thick black line), standard deviations (light black lines), and sound level (Leq,1s, purple) over time for the 16 sequences.

3.2. From Measurements to Continuous and Retrospective Perceived Pleasantness
3.2.1. Continuous Sound Pleasantness Estimation Based on Noise Level Time Series
Section 3.1 demonstrated the possibility of relating the global sound pleasantness of a 3-min walking trip to its perceived pleasantness time series. Thus, relating the perceived continuous sound pleasantness values to physical noise indicators is a required intermediate step for proposing an estimate of the global sound pleasantness based on noise level time series. This section attempts to develop such relations, from the corpus of the 10 audiovisual sequences.


Figure 6. Pearson correlation coefficient between pleasantness and L50, L90, L10, and Leq, with varying reaction and integration times.

Figure 7. Estimated global pleasantness from Pestimated versus assessed global pleasantness for the 10 sequences.

Table 1. Short description and labels for the travelled sound environments.

Figure 3 depicts, for each of the 16 sequences presented in Section 2.4.1, the 1 s sound pleasantness (P) evolution (mean values and standard deviation for the 25 participants) and the 1 s sound level. The combination of the initial sequences α and β resulted in 16 sequences, with a large variety.

Table 2. Pleasantness averaged both over participants and over time (Pmean), the global sound pleasantness (GP) averaged over participants, and the difference between both (ΔGP-Pmean), for each of the 16 sequences.

Table 3. Test variables for the multilinear regression.

Table 4. Pleasantness averaged both over participants and over time (Pmean), the global sound pleasantness (GP) averaged over participants, and the difference between both (ΔGP-Pmean), for each of the 16 sequences.

Table 5. Different models to estimate global pleasantness from physical measurements.