Smartphone-Based Participatory Soundscape Mapping for a More Sustainable Acoustic Environment

Urban environmental planning, a fundamental dynamic process for cities’ sustainability, could benefit from the soundscape approach, which deals with the perception of the acoustic environment and considers sound as a resource rather than a waste (noise). Noise and soundscape maps are useful tools for planning mitigation actions and for communicating with citizens. Both kinds of mapping can benefit from crowdsourcing and participatory sound monitoring, made possible by the widespread use of internet connections and mobile devices with dedicated apps. This paper is a “scoping review” that provides an overview of the potential, benefits, and drawbacks of participatory noise monitoring in noise and soundscape mapping applications, while also referring to metrological aspects. Digital questionnaires will likely become more common than printed ones for gathering perceptual data on soundscapes; thus, the main differences between the experimental protocols concern the measurement of acoustic data. The authors propose to classify experimental protocols for in-field soundscape surveys into three types (GUIDE, MONITOR, and SMART), to be selected according to the survey’s objectives and the territorial extension. The main future developments are expected to be related to progress in smartphone hardware and software, to the growth of social network data analysis, and to the implementation of machine learning techniques.


Introduction
Urbanization is expected to continue, with the number of people living in cities reaching up to 6.5 billion by 2050 [1]. Because of this trend, urban environmental planning is a necessary and dynamic process to ensure that land and other resources are used sustainably, in order to meet the needs of present and future generations. Hence, urban environmental planning should be developed not only for the people but also with the people, engaging citizens to participate actively in all stages of the planning process.
The acoustic environment is an important topic in urban planning, even though until some decades ago it was undervalued compared to the visual environment [2]. Another fundamental aspect is the perception of the acoustic environment, which is the focus of "soundscape", a multi-criteria concept introduced in the late 1960s dealing with how acoustic environments affect the perceived quality of cities and how sounds can be used in urban planning and design [3,4]. There is clear evidence in the literature that soundscape is an increasingly important research topic attracting the interest of many disciplines, like music, physics, psychology, [...] data coming from multiple networks at different scales, in order to promote sustainability and improve people's quality of life [18]. This paper is a "scoping review" on the above topics with the aim of highlighting the potential, benefits, and drawbacks of participatory noise monitoring in noise and soundscape mapping applications. This type of review was deemed to be useful, especially for non-experts in acoustics, considering the multidisciplinary nature of the soundscape approach and the growing interest in it through research and applications, as well as the increasing spread of participatory sensing and, in particular, of smartphone applications for sound measurement. The experience of the authors on soundscape, especially on field surveys [19][20][21][22], helped them to browse and select references available on the web, which were retrieved through relevant keywords on search engines, by looking at the references reported in the collected papers, and at similar articles listed by the publishers.
As outlined in Figure 1, after the introduction, the paper is divided into sections to facilitate the understanding of such a broad and multidisciplinary topic, starting with a review of the participatory sensing paradigm and a focus on the environmental noise aspect (Section 2). The following three sections describe the main features of the smartphone applications dealing with sound measurement (Section 3), noise mapping (Section 4), and soundscape mapping (Section 5). All these sections provide some examples of applications. Moreover, Section 5 reports some methodologies to collect acoustic and perceptual data and a proposal for classifying experimental protocols for in-field soundscape surveys into three types (GUIDE, MONITOR, and SMART), to be selected according to the survey's objectives and the territorial extension. Finally, Section 6 summarizes the conclusions, addressing some fields for possible future developments.
Sustainability 2020, 12, x FOR PEER REVIEW

Participatory Sensing
The outstanding advances in mobile phone technology (processing power, embedded sensors, storage capacities, and network data rates), together with their wide diffusion (3.2 billion users worldwide estimated in 2019 [23]), enable large-scale sensing, known in the literature as "participatory sensing" [24,25]. The key idea behind this form of crowdsourcing is to empower citizens to collect and share sensed data from their surrounding environments using their mobile phones by means of dedicated applications. In the basic architecture of participatory sensing, citizens collect data using their mobile devices and upload them to an application server using existing communication infrastructure, e.g., the 4G data communication standard or WiFi access points. Then, the application server combines the data received from participants, computes the data statistics, and uses the results to build a representation of the phenomenon of interest.
Mobile phones, though not built specifically for sensing, can be readily used as sensors: the built-in camera provides video and images; the microphone, when not used for voice communications, can act as an acoustic sensor; the embedded GPS (Global Positioning System) receiver can provide location information. Other embedded sensors, such as a gyroscope, accelerometer, and proximity sensor, can collectively be used to estimate useful contextual information (e.g., whether the user is walking or cycling). Further, additional sensors, such as an external microphone enabling accurate calibration, can be easily interfaced with the phone via Bluetooth or wired connections [26].
Participatory sensing offers many advantages over traditional sensor networks, which require a large number of static wireless sensor devices, particularly in urban areas. For instance, relying on mobile phones and existing communication (cellular or WiFi) infrastructures keeps the cost of implementing participatory sensing low. The inherent mobility of the phone carriers can provide large spatio-temporal coverage and also makes it possible to observe unexpected events.
Several applications are already available for participatory sensing, including in the field of environmental noise monitoring. However, a participatory sensing system relies on the data collected by volunteers, and these data come only from the places and times where the volunteers are present and decide to collect and transfer them. Thus, sample data sent by mobile phones are usually randomly distributed in space and time, and may be incomplete. Recovering the original spatio-temporal profile of the monitored data from random and incomplete samples obtained via crowdsourcing is a challenge already under study, e.g., by using compressive sensing [27].
A requirement in participatory sensing is the protection of users' privacy, a topic surveyed in [28]. Users are aware of the possible consequences of privacy infringement, and may therefore be reluctant to contribute to the sensing campaigns, diminishing their impact and relevance at a large scale. To reduce the risk that a user's privacy might be compromised, mechanisms to preserve it are mandatory [29].
Unlike opportunistic sensing where individuals are passively involved, for example by pre-authorizing their sensing device to share information, as used for instance in the Ear-Phone app [30], and unlike data scavenging where the individual remains unaware of the data collection process, for example agreeing to the fact that their posts are in the public domain, participatory sensing is the only type of data collection that requires an explicit and active involvement of the person [31]. Thus, the participants usually determine how, when, what, and where to share their sensed data.
Regarding the appraisal of the acoustic environment, asking people in a place, most often outdoors, to describe their aural experience automatically triggers attentive and descriptive listening, discriminating between the sound sources in the complex mixture forming the acoustic environment. Such an analytical listening style would only be important in those cases when the intended activity includes a strong attention focus on the environment or when the sound is so prominent and salient (foreground sound) that listening to it cannot be avoided [8]. Analytic listening most likely requires more cognitive effort and is, as a consequence, also slower than holistic listening, which allows the person to more quickly create a mental image of the acoustic environment as a whole and to act correspondingly [8]. As stated in [32], soundscape assessment relies upon the identification of the sound sources, or at least their type [33,34], and their perceived prominence. Another important issue worth exploring is the possible influence of smartphone use on the person while listening to the environment.
The direct and active involvement of participants might introduce erroneous and/or malicious contributions. For instance, users may unwittingly position their devices such that incorrect measurements are recorded, e.g., storing their phone in a bag or a pocket while taking sound recordings, or may deliberately inflate or deflate noise data. Thus, it is necessary to train the participants in the correct measurement protocol to reduce such drawbacks. It would also be worthwhile for the application server to automatically evaluate the trustworthiness of collected data to detect unfaithful contributions [35].
It is also important that participatory sensing applications do not require significant energy consumption for users during sensing, processing, and data transmission. In particular, some sensors such as GPS consume significantly more energy than others. Thus, it is vital for participatory applications to make use of these sensors with conservative energy consumption [26].
Other issues to consider are that participatory projects often fail to attract enough participants, participants tend to be engaged with projects only briefly, and even successful projects rely on a small share of their contributors. Projects also often do not involve a representative sample of society: they tend to attract individuals with high levels of education and pre-existing interest, persons with above-average income, and so forth. Even though the ubiquity of mobile phones makes mass participation feasible, it remains questionable how the public can be motivated to participate voluntarily (opportunity, rewards, curiosity, etc.). In addition to explicitly and intentionally joining a sensing campaign, as mentioned above, people may simply share data on social media, treated as sensor networks, with no awareness of the sensing application. They may share data on their own, unaware that they are contributing to a public good, or may be motivated to do so in exchange for a benefit [31]. For instance, analyses of self-shared social network data linked to the perceived quality of the acoustic environment have provided interesting outcomes [36], like those of ChattyMaps [37].
Last but not least, participatory sensing enables the active involvement of citizens in environmental issues, which is an important topic that is also acknowledged by the European Directive 2003/35/EC on providing and improving public participation on environmental topics [38].

Smartphone-Based Sound Measurement
The growth and improvement of smartphone applications for sound measurement is so rapid that it is impossible, and pointless, to list all those available on the web. An updated review and a comparison of a selection of them are reported in [39][40][41]. Moreover, an extensive survey [42] shows evidence that there is still much heterogeneity in terms of technologies applied and deployment approaches, although modular designs at both client and server elements seem to be dominant. The Android mobile operating system is the most widespread client platform, while server platforms are typically web-based, and client-server communications mostly rely on XML or JSON over HTTP. The main pitfall concerns the performance evaluation of the different proposals, which typically lack a scalability analysis, despite this being a critical issue when targeting very large communities of users.
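To make the JSON-over-HTTP exchange mentioned above more concrete, the following minimal sketch shows how a client might serialize one measurement before uploading it. The function name and every field name (`timestamp_utc`, `laeq_1s_dba`, `location`, `tags`) are hypothetical and do not correspond to any of the surveyed applications; the sample values are made up for illustration.

```python
import json
from datetime import datetime, timezone


def build_measurement_payload(laeq_1s, latitude, longitude, tags=None):
    """Serialize one measurement as JSON (all field names are hypothetical)."""
    payload = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "laeq_1s_dba": [round(v, 1) for v in laeq_1s],  # 1 s levels in dB(A)
        "location": {"lat": latitude, "lon": longitude},
        "tags": tags or [],  # optional user annotations
    }
    return json.dumps(payload)


# Illustrative values only, not measured data.
body = build_measurement_payload([61.2, 63.8, 59.4], 45.07, 7.69, ["traffic"])
```

The resulting string could then be sent as the body of an HTTP POST request; the transport layer is deliberately left out of the sketch.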
There is still considerable debate in the scientific community on the use of these applications as an alternative to traditional measurement instrumentation, such as a sound level meter complying with standard requirements. Notwithstanding this, the use of these applications for environmental noise is increasing, as citizens can use them to detect the sound level in their living environment. Moreover, in the framework of participatory sensing, these data, when shared, can enlarge the monitored area and increase the spatio-temporal resolution.
The user has to consider whether spectral information (e.g., 1/3 octave bands and/or Fast Fourier Transform (FFT) spectra) or even more advanced features (e.g., the sonogram, in which the sound pressure level is plotted versus time and frequency) are needed, keeping in mind that these might require more power consumption and a higher bandwidth. If a 4G data connection is used, then the price of data transmission may become a significant factor in the deployment of the sensor network and, for this reason, the transmission is often limited to those instances where the smartphone can connect to the internet free of charge.
Most of these applications show the user a map of her/his location, and some of them allow the user to add tags to the audio signal, such as the sound sources recognized by the user, an assessment of the perceived intensity of each source, an appraisal of the perceived pleasantness of the acoustic environment, and so forth. These tags (including date, time and, in some apps, a picture of the site) are useful for retrieving the sound measurements and are essential to move from noise maps to soundscape maps. Some applications (e.g., NoiseTube [43]) additionally provide users with a noise exposure dosimeter that informs them of their daily "dose" of noise pollution.

Example of a Smartphone Application
OpeNoise is a free and open-source application developed by the Regional Agency for Environmental Protection of Piedmont in Italy, downloadable in Android and iOS versions from the Google Play Store and Apple App Store, respectively [44,45]. The real-time output includes the A-weighted continuous equivalent level over a 1 s time frame (LAeq,1s), the corresponding maximum and minimum values, and the overall LAeq,T over the measurement time T. Regarding the frequency domain, the FFT spectra (for frequencies above 200 Hz), the 1/3 octave band spectra (computed from the FFT spectrum), and the relevant sonogram are also available (Figure 2). Before the measurement, an offset can be set by the user to obtain calibrated readings, and the recorded data can be saved in a text file. The laboratory and field tests performed have shown a satisfactory accuracy (±1 dB) of the application compared to a calibrated sound level meter within the most typical ranges of environmental noise, namely 45-80 dB(A) and 200-5000 Hz [46]. For iPhone devices, more recent analysis has shown good linearity in the range from 35 to 110 dB(A).
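The relation between the 1 s levels and the overall level can be illustrated with the standard energy average, LAeq,T = 10 log10[(1/N) Σ 10^(LAeq,1s,i/10)], valid for N consecutive intervals of equal duration. The sketch below is not taken from the OpeNoise source code: the function name is hypothetical, and treating the user-set calibration offset as a constant added in dB is an assumption made for illustration.

```python
import math


def laeq_total(laeq_1s, offset_db=0.0):
    """Energy-average 1 s levels into LAeq,T.

    offset_db is a calibration offset, assumed here to be a constant
    added to every 1 s level in dB.
    """
    if not laeq_1s:
        raise ValueError("at least one 1 s level is required")
    # Convert each level to a relative energy, average, convert back to dB.
    energies = [10 ** ((level + offset_db) / 10) for level in laeq_1s]
    return 10 * math.log10(sum(energies) / len(energies))


levels = [50.0, 60.0, 70.0]  # illustrative values, not measured data
overall = laeq_total(levels)
```

Because the average is taken over energies rather than over decibel values, the result for this example (about 65.7 dB(A)) is dominated by the loudest second and exceeds the arithmetic mean of 60 dB(A).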

Metrological Issues
The quality of built-in smartphone microphones is constantly increasing, and their frequency response and amplitude linearity are often quite acceptable. Notwithstanding this, the use of smartphones in sound measurement raises several metrological questions. For instance, these microphones have a directivity pattern with maximum sensitivity at normal incidence (microphone pointed towards the sound source), which can lead to significant errors at oblique incidence. The main differences between the smartphones used for sound measurement and professional instrumentation, such as sound level meters, concern [47]:
1. The hardware of the microphone, based on Micro-Electro-Mechanical System (MEMS) technology in the smartphone and on the condenser principle (capacitor microphones) in the sound level meter.
2. The specific algorithms, filters, and sound application programming interfaces (API) that are used to process the sensed values.
Several studies are available in the literature dealing with the above metrological aspects, for instance [48][49][50][51][52], and some indications may be summarized from their outcomes, as listed in the following:
1. Apps seem to be more consistent and perform better on iOS than on Android-based systems [48], most likely because iOS smartphones share the common audio architecture "Core Audio", are produced by a single firm, and use high-quality hardware components in all models; in contrast, Android smartphones are produced by different firms, causing a wide variability of hardware quality and leading to a greater variability in accuracy.
2. One study showed that smartphones of different brands and models behave differently from each other, but smartphones of the same model behave in the same way [49]. This suggests that once a specific correction function is defined for a smartphone with respect to the sound level meter, it can be valid for all the smartphones of the same brand and model.
3. The use of external calibrated microphones greatly enhances the accuracy and precision of smartphone-based noise measurements, provided that in outdoor usage an appropriate windshield is mounted on the microphone to reduce the influence of wind and the effects of motion [50].
4. Concerning stability over time, in the study described in [51] several types of microphones were placed outdoors for a period of 6 months to investigate their responses under extreme temperatures and their aging in a humid environment. The best type of consumer microphone deviated less than 2 dB(A) from the reference equipment, and a limited meteorological dependence was observed.
5. The age of the smartphone seems to influence the measurement accuracy; on average, newer phones provide more accurate readings than older ones, but with greater unsteadiness. Whether this outcome is due to the deterioration of microphone hardware over time or to contemporary versions of noise apps, which are coded more accurately for microphones in newer smartphones, is unclear and requires more extensive testing [48].
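As an illustration of the per-model correction function mentioned in item 2, the sketch below fits a straight line between paired readings from a smartphone and a reference sound level meter, and then applies it to a new reading. The linear form is an assumption made here for simplicity; [49] is not claimed to use this particular model. The function names are hypothetical and the sample readings are made up.

```python
def fit_linear_correction(phone_levels, reference_levels):
    """Least-squares fit of reference = slope * phone + intercept."""
    n = len(phone_levels)
    if n < 2 or n != len(reference_levels):
        raise ValueError("need two or more paired readings")
    mean_x = sum(phone_levels) / n
    mean_y = sum(reference_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in phone_levels)
    if sxx == 0:
        raise ValueError("phone readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(phone_levels, reference_levels))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_correction(level, slope, intercept):
    """Map a raw smartphone reading to an estimated reference level."""
    return slope * level + intercept


# Made-up paired readings in dB(A): the phone reads 2 dB low throughout.
phone = [48.0, 58.0, 68.0, 78.0]
reference = [50.0, 60.0, 70.0, 80.0]
slope, intercept = fit_linear_correction(phone, reference)
corrected = apply_correction(65.0, slope, intercept)
```

In practice such a fit would only be meaningful within the level range covered by the paired readings, and a frequency-dependent correction may be needed where the microphone response is not flat.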
Another main limitation of mobile sound measurements concerns the geo-localization of the collected data (a typical precision of about 10-50 m) due to the GPS data deviation occurring sometimes outdoors in urban areas (e.g., satellite signals being shielded or lost). These errors can be much more critical than the errors on the noise levels themselves, as they can allocate high noise levels to quiet streets, and the other way around. This location inaccuracy is even more frequent and larger in indoor than in outdoor environments [41].
The calibration of smartphones used for noise monitoring is a critical issue. An external microphone should be preferred to the internal one because it makes calibration with a reference sound easier and more accurate. Recently, a calibration procedure to be run in the field has been proposed [53], based on comparing the FFT average spectra of environmental sound recorded simultaneously, as uncompressed WAV (Waveform Audio File Format) files, by the smartphone microphone and a class 1 microphone. Furthermore, for the most popular smartphones, pre-defined calibration values are available from some developers, such as in the app "Noise Exposure/Buller" published by the Swedish Work Environment Authority or in the system iCal [54]. However, it has to be pointed out that any inappropriate application of the sound measurement protocol may introduce errors much larger than those due to the smartphone hardware and the app, leading to unreliable sound pressure level values.
Smartphone apps are still unlikely to replace professional instruments or comply with relevant standards in the near future. However, due to the advancements made in app design and the availability of external microphones, the gap between professional instruments and smartphone-based apps is narrowing rapidly [55][56][57].

Smartphone-Based Noise Mapping
Among the actions required by Directive 2002/49/EC, noise mapping of the main sources (transport and industries) is one of the most commonly implemented, in part because its reporting to the European Commission is mandatory at least every five years. Noise maps represent in a graphical and immediate way, through colored scale intervals, the spatial distribution of a specific sound descriptor (usually the day-evening-night level Lden in dB(A)). They are mainly obtained by numerical models of outdoor sound propagation, and considerable effort has been made in recent years in Europe to harmonize these models [58]. The role of sound measurements is therefore rather limited, even if it is important to tune the model to a real environmental setting. Indeed, these measurements require appropriate instrumentation, are time consuming and expensive, and cannot be applied on a large scale, as needed in urban areas.
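For readers unfamiliar with the day-evening-night descriptor, the sketch below evaluates it from the three period levels using the default definition in Annex I of Directive 2002/49/EC: a 12 h day, a 4 h evening with a 5 dB penalty, and an 8 h night with a 10 dB penalty. The Directive allows Member States to shorten the evening period, which this sketch does not cover; the function name is hypothetical.

```python
import math


def lden(l_day, l_evening, l_night):
    """Day-evening-night level in dB(A), default 12 h / 4 h / 8 h periods."""
    total = (12 * 10 ** (l_day / 10)
             + 4 * 10 ** ((l_evening + 5) / 10)    # +5 dB evening penalty
             + 8 * 10 ** ((l_night + 10) / 10))    # +10 dB night penalty
    return 10 * math.log10(total / 24)


# A constant 60 dB(A) around the clock yields an Lden of about 66.4 dB(A).
constant_case = lden(60.0, 60.0, 60.0)
```

The constant-level case shows how heavily the penalties weigh: a source that never changes is rated about 6.4 dB above its actual level.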
Many improvements have been achieved on these drawbacks in recent decades, mainly due to the development of low-cost and reliable noise sensors (e.g., MEMS microphones realized with a Micro-Electro-Mechanical System component placed on a printed circuit board and protected with a mechanical cover). Such devices can be spatially distributed and connected in a network to allow the realization of dynamic noise mapping, that is, an automatic, frequent update of large-scale noise maps [59]. An extensive review of this approach based on wireless acoustic sensor networks is given in [39,60]. However, these maps, even if useful to inform citizens about a noise pollution situation, do not provide any information on the relevant harmful health effects, such as annoyance or sleep interference [61,62], which have a large impact in present-day urban areas, as also shown by the noise level reduction observed during the recent lockdown restrictions due to the COVID-19 outbreak [63].
Noise maps, at least those realized by numerical simulations according to the Directive 2002/49/EC requirements, show some limitations [41]:
1. The considered noise sources are limited to transport infrastructures or industrial sites and, therefore, noise maps do not reflect the real whole sound environment (e.g., cultural or festive activities, markets, etc.) to which people are exposed during their daily activity.
2. Noise emission models are simplified (for example, urban road traffic is considered to be constant on a road section, without considering traffic dynamics), leading to "frozen" noise maps, with the exception of dynamic noise maps.
3. Sound propagation models are based on approximations, which become larger as the environmental settings become more complex.
4. Noise maps require high computational capabilities and long calculation times at the scale of an agglomeration due to the large amount of the entailed data.
5. Numerical simulations at an agglomeration scale require a large amount of information concerning the investigated area (e.g., the buildings, the topography, the nature of soils and of the road pavements, the yearly probabilities of occurrence of meteorological conditions, etc.) and the noise sources (e.g., road, railway and air traffic, noisy industries and activities, etc.); unfortunately, some of these simulation inputs are sometimes missing, incomplete, or difficult to estimate.
The above limitations can be overcome in some way by using smartphone applications for noise monitoring on a large scale. These applications are one of the main components in a basic architecture of participatory noise mapping platforms. As shown in Figure 3, the data collected by the participants by means of the smartphone noise measurement applications are sent via the internet to the server in charge of many tasks, namely data collection, data processing, data geo-referencing, and data representation reporting the outcome on a web portal for data communication to the public.
For instance, "NoiseCapture" [64] is the application used in the Noise-Planet project [65], which aims to provide a global and generic framework dedicated to collecting environmental noise measurements, their numerical modeling, and their perceptual assessment. As the project has been developed with open-source components, in line with the concept of the INSPIRE Directive in Europe [66], all source code, methodologies, tools, and raw data are freely available and, if necessary, can be duplicated for any specific use. The Spatial Data Infrastructure (SDI) manages all the raw data collected by participants at three different levels, namely the measurement point in a path (1 s short LAeq value, black points in Figure 4), the measurement path (the LAeq value over the whole recording duration), and an aggregation area, shaped as a regular hexagon with a 15 m radius [67] (hexagon delimited by red lines in Figure 4). For each hexagon, the LA50 and overall LAeq values over the whole measurement time are given, as well as other information such as the occurrences of perceived sound sources in the acoustic environment, reported as a word cloud (the size of each word is proportional to how often it is mentioned by contributors). Only hexagons containing at least one measurement point in a path are displayed on the map provided by OpenStreetMap (free crowdsourced online maps).
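The three aggregation levels can be illustrated with a minimal Python sketch. It is not the NoiseCapture implementation: the function names and the `cell_id` labels are hypothetical, the assignment of points to hexagons is assumed to have been done already, and LA50 is approximated here as the median of the 1 s short LAeq values.

```python
import math
from statistics import median


def leq(levels_db):
    """Energy-average equal-duration short LAeq values (in dB)."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def l50(levels_db):
    """Level exceeded for 50% of the time, approximated by the median."""
    return median(levels_db)


def aggregate_by_cell(points):
    """Summarise (cell_id, level_db) pairs per aggregation cell.

    cell_id is a hypothetical hexagon label; how a point is assigned
    to a hexagon is outside the scope of this sketch.
    """
    cells = {}
    for cell_id, level in points:
        cells.setdefault(cell_id, []).append(level)
    return {
        cell_id: {"LAeq": leq(values), "LA50": l50(values), "n": len(values)}
        for cell_id, values in cells.items()
    }


# Level 1: measurement points (1 s short LAeq values along a path)
path = [58.2, 61.0, 59.4, 72.5, 60.1]
# Level 2: LAeq over the whole measurement path
path_laeq = leq(path)
# Level 3: aggregation per hexagon
summary = aggregate_by_cell([("hex_a", 58.2), ("hex_a", 61.0), ("hex_b", 72.5)])
```

Because the average is taken on energies rather than on decibel values, a single loud second (72.5 dB above) raises the path LAeq far more than it raises the LA50, which is why the two descriptors are reported together.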
As mentioned above, these noise maps are not alternative to the "standard" noise maps, but are complementary to these because they provide additional information on the acoustic environment. Indeed, they deal with sound sources often present in the daily citizen exposure (markets, leisure and recreational noise, etc.) but not often considered in the "standard" noise maps, mainly dealing with transport and industrial noise. The optional presence of some perceptual qualitative appraisal on the acoustic environment is an added value, linking these maps to those representing the perceived quality of soundscape.

Smartphone-Based Soundscape Assessment
Standard noise maps do not include the citizens' perception of the acoustic environment, which plays an important role in the use of the outdoor public places by citizens [68]. However, examples of soundscape mapping are available in the literature, some reporting the sound quality of the acoustic environment in terms of spatial distribution of psychoacoustic descriptors (loudness, sharpness, roughness, fluctuation strength) [69], others describing a procedure to process perceptual data and spatial information [70][71][72][73][74][75]. Two important aspects need to be addressed:

1. Soundscapes often vary considerably over time and space: an urban square can be a market in the morning, a quiet pedestrian area in the afternoon, and a gathering place for "movida" in the evening and at night. Thus, long-term average values have little or no meaning, and it is necessary to describe the situation for specific time periods and, if necessary, also for different listening locations.
2. The type of perceived sound sources influences the assessment of soundscape quality: sources that are appropriate or expected to be present in an environment are evaluated as being less unpleasant and more acceptable than others that are not expected or not consistent with the environment (e.g., road traffic noise heard in an urban park). Many surveys, e.g., [76], have indicated the validity of grouping sounds into the broad categories of "natural" (e.g., birdsongs, water fountain), "anthropic" (e.g., steps, shouting, voices), and "mechanical, technological or man-made" (e.g., transport, machines) [77].
Smartphones can be very useful for collecting perceptual data regarding soundscape appraisal. Indeed, the digital version of a self-completion structured questionnaire can be easily accessed on the respondents' smartphones, with many advantages in comparison with the traditional printed version. For instance, a digital questionnaire is inexpensive to distribute and is transmitted as soon as it is completed, together with a notification, allowing researchers to conduct preliminary data analyses while waiting for the desired number of responses to accumulate. Controls can be included in its design to automatically check that the respondent answers all the questions in the presentation order, to avoid multiple responses to single-response questions, and so forth. This gathering method can be used either for respondents participating in a traditional soundwalk guided by an experimenter [20,78] or for other persons autonomously selecting the place and time of the soundscape appraisal and sending their corresponding perceptual data via the internet. Furthermore, the digital version enables researchers to record the start and end time of each completed questionnaire, which is useful for identifying the relevant stretch of the sound recording and for evaluating the respondent's performance.
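As an illustration of how the questionnaire timestamps can be matched to the sound recording, the following Python sketch converts the start and end time of a completed questionnaire into sample indices of a recording. The function names and parameters are hypothetical, and the sketch assumes that the recording and the questionnaire are time-stamped by the same clock.

```python
from datetime import datetime


def recording_stretch(rec_start, sample_rate, n_samples, q_start, q_end):
    """Return (first, last) sample indices covering the questionnaire.

    Indices are clamped to the recording, so a questionnaire that ends
    after the recording stops yields a shorter stretch.
    """
    first = int((q_start - rec_start).total_seconds() * sample_rate)
    last = int((q_end - rec_start).total_seconds() * sample_rate)
    first = max(0, min(first, n_samples))
    last = max(first, min(last, n_samples))
    return first, last


def completion_seconds(q_start, q_end):
    """Time taken to fill in the questionnaire, in seconds."""
    return (q_end - q_start).total_seconds()


# A 60 s recording at 44,100 Hz; questionnaire filled from 10 s to 40 s
rec_start = datetime(2020, 1, 1, 10, 0, 0)
q_start = datetime(2020, 1, 1, 10, 0, 10)
q_end = datetime(2020, 1, 1, 10, 0, 40)
first, last = recording_stretch(rec_start, 44100, 44100 * 60, q_start, q_end)
```

The completion time can serve as one simple indicator of respondent performance, for instance to flag questionnaires filled in implausibly quickly.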
The Technical Specification ISO/TS 12913 Part 2 [79], aimed at harmonizing the collection of data by which relevant information on the soundscape key components (people, acoustic environment, and context) are obtained, measured, and reported, is a useful guide to define the questionnaire (questions, rating scales) to be implemented in the smartphone application.
The ISO/TS 12913 Part 2 in its Annex C proposes two alternative questionnaires for perceptual data collection (Method A and B) and a general protocol for conducting narrative interviews, which typically take place off-site and aim at gathering more qualitative data and deepen the experts' understanding of the context (Method C) [80]. According to the standard, the questionnaire should, at least, contain questions dealing with:

1. The extent to which the sound sources have been perceived (e.g., appraisal given on a 5-point ordinal category scale from "not at all perceived" to "predominant"); for instance, the sources can be divided into the broad categories of natural, anthropic, and technological sources.
2. Appraisal of some attributes of the soundscape, for instance pleasant, chaotic, vibrant, uneventful, calm, annoying, eventful, and monotonous [81]; translation from English to other languages should be carefully done to preserve the original meaning of the attribute.
3. The assessment of the perceived quality of the surrounding acoustic environment, given on a 5-point ordinal category scale, for instance from "very good" to "very bad" with a neutral point in the middle.
4. An appraisal of the perceived appropriateness of the surrounding acoustic environment, given on a 5-point ordinal category scale, for instance from "not at all appropriate" to "perfectly appropriate".
5. The assessment of the perceived quality of the surrounding environment as a whole, also including its visual aspect, again given on a 5-point ordinal category scale, for instance from "very good" to "very bad" with a neutral point in the middle.
6. The motivation of the participant to record and send her/his data, such as being a resident of the place or a frequent or occasional visitor, and whether she/he is naive or has experience in soundscape appraisal.
The questionnaire should be limited to the essential questions: long questionnaires place a greater burden on the respondent and may lead to lower response rates and lower quality of responses. Questions should be meaningful and interesting to the respondent, and their wording is very important, as it can have a significant impact on people's responses. Checks should be run during the completion of a questionnaire to verify that all the questions are answered properly and in the required order. Additional questions could deal, for instance, with characteristics of the soundscape worth preserving, such as its ecological, aesthetic, and affective values, comfort, identity, and uniqueness. An extensive review of the methods for soundscape data collection has been recently published, analyzing 52 peer-reviewed papers over the past 20 years [82].
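The completion checks mentioned above can be sketched in Python as follows. This is only an illustrative example: the question identifiers are hypothetical and are not taken from ISO/TS 12913 Part 2, and every item is assumed to be rated on a 5-point ordinal scale coded from 1 to 5.

```python
# Hypothetical question identifiers, listed in presentation order.
QUESTIONS = [
    "source_natural",
    "source_anthropic",
    "source_technological",
    "acoustic_quality",
    "appropriateness",
    "overall_quality",
]


def validate(answers):
    """Check a list of (question_id, value) pairs given in answering order.

    Returns a list of problems; an empty list means the questionnaire
    is complete, single-response, on scale, and answered in order.
    """
    problems = []
    seen = [question_id for question_id, _ in answers]
    for question_id in QUESTIONS:
        count = seen.count(question_id)
        if count == 0:
            problems.append(f"missing: {question_id}")
        elif count > 1:
            problems.append(f"multiple responses: {question_id}")
    for question_id, value in answers:
        if question_id not in QUESTIONS:
            problems.append(f"unknown question: {question_id}")
        elif value not in range(1, 6):
            problems.append(f"out of scale: {question_id}")
    known = [question_id for question_id in seen if question_id in QUESTIONS]
    if known != sorted(known, key=QUESTIONS.index):
        problems.append("answered out of order")
    return problems


complete = [(question_id, 3) for question_id in QUESTIONS]
problems = validate(complete[:-1])  # last question left unanswered
```

In an app, such a check would typically run before the questionnaire is transmitted, so that the respondent can correct the entry while still on site.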
In participatory soundscape mapping, drawbacks can occur, e.g., in terms of bias of crowdsourced perceptual assessments and because data are often collected in a fragmented and discrete way (punctual samples), making the data less relevant from the management and planning perspective [17]. To overcome these limitations, a protocol for predicting and mapping emotional dimensions and qualitative aspects of urban environments from recorded sounds has been proposed for a systematic characterization of soundscape in urban contexts [17].

Outline of Experimental Protocols
The digital questionnaire will likely be more commonly applied than its printed version for gathering perceptual data on soundscapes. It will always be more easily available on participants' smartphones either during traditional guided soundwalks with the presence of an experimenter or during sound crowdsourcing campaigns. Thus, the main differences among the experimental protocols concern the instrumentation used for the measurement of acoustic and other environmental factors. The latter factors (primarily the visual aspect) should not be neglected, as there is clear evidence in the literature that merely examining the acoustic level is not sufficient for predicting the assessed soundscape quality, and that a more holistic method of characterizing the environment is required [83]. Many different experimental protocols are described in the literature and the guidelines reported in the ISO/TS 12913 Part 2 are intended to harmonize the methods used to gather environmental and perceptual data. However, these methods are most often tuned to the objectives to be achieved.
As a tentative approach, the authors propose to classify experimental protocols for in-field surveys into three types (GUIDE, MONITOR, and SMART), diversified for the level of details, accuracy, and survey duration. The choice of the most suitable protocol depends at least on the survey's objectives, the territorial extension, the resources, and the time available. From the first, best performing experimental protocol down to the others, the tasks involving the experimenter are progressively less demanding and potentially have less influence on the participants' appraisal. The first two protocols do not necessarily require a digital questionnaire, as a traditional printed version can be used.

Traditional Guided Soundwalk (GUIDE): Very Detailed, High Accuracy, Short-Term
The acquisition of the environmental physical data is performed by an expert experimenter, who is usually in charge of undertaking the acoustic measurements by using appropriate instrumentation and according to specific experimental protocols. To take account of the multidimensional features of the environment, upgraded experimental protocols have been proposed, such as in [83,84], including not only sound level meter and binaural recordings, but also spatial audio (e.g., Ambisonics) and 360° video recordings. These recordings could be included in archives, also shared on the web, that could be used in laboratory experiments with fully immersive virtual reality trials.
In this protocol, the experimenter guides the participants throughout the study area, following a specific path and stopping in selected locations where, after a short listening time, the soundscape appraisals have to be given. This procedure cannot prevent the influence of the experimenter's instructions on the respondents' appraisals.
The acquisition of acoustic and perceptual data is normally limited to a few hours, on repeated days if needed.

Unattended Noise Monitoring System (MONITOR): Detailed, High/Medium Accuracy, Medium-Term
There is a growth in the demand for and application of unattended remote acoustic monitoring systems, most often used to check compliance with noise limits in specific places and situations. These systems, formed by one or more terminals connected to a server, measure the noise level continuously (on a 24/7 basis) by using either a class 1 sound level meter or a low-cost, less accurate device. They can support the soundscape assessment, as long as uncompressed WAVeform audio recordings are available, these recordings can be collected simultaneously with the subjective appraisals, and the microphone is not too far from the people filling in the questionnaire. These audio recordings enable post-processing to determine the acoustic and psychoacoustic parameters of interest. Unlike binaural recordings, the audio recordings captured by the monitoring microphone are monaural and, therefore, are not ecologically valid [85] for being played back in a laboratory for listening trials. Furthermore, the acoustic parameters determined by post-processing enable modeling only the acoustic features of the environment and not the soundscape as a whole, because of a lack of data on other environmental factors.
The site where the monitoring terminal is located should be indicated to participants in the campaign by a totem sign, which also reports written instructions for the perceptual appraisals. The site can be freely visited, even more than once, by participants in their preferred order and at their preferred time. Video surveillance cameras installed in the selected places for security purposes, and the relevant recordings, can be useful to get information on the use of a place during the monitoring.
In this protocol, the experimenter has an important role in involving people in the campaign, also through social media calls, and in explaining its objectives and instructions in a way that reduces her/his influence as much as possible.
Due to costs and logistic constraints, this protocol can be mainly applied for medium-term monitoring (up to 1 week) in order to have autonomous power capability and a limited number of units simultaneously operating.

Smartphone-Based Protocol (SMART): Limited Details, Low Accuracy, Medium/Long-Term
All the data (audio/video recordings and subjective appraisals) are collected autonomously by the participants with their smartphones, in the places they select. This approach is very simple and inexpensive, but the quality of the acoustic data is questionable, as widely discussed in Section 3. However, by using the right tools, i.e., up-market smartphones, and by stimulating the citizens' awareness, a participatory sensing paradigm to collect and share sensed data from the surrounding environment can be achieved. It is very unlikely that citizens on their own would provide a soundscape appraisal of a certain place (although they are used to sharing their reviews and opinions) unless there are strong awareness campaigns and/or they could somehow benefit from their cooperation.
The involvement of the experimenter is fundamental for the success of the experimental campaign and is similar to that described for the MONITOR protocol.
The sound measurement is normally limited to a short-time acquisition to save smartphone resources, but the campaign can be considered as a medium/long-term collection of objective data and subjective appraisals, as the participants are free to decide when to carry out sound measurements by using the application installed on their smartphone and sending the acoustic data and the filled questionnaire to the server.

Examples of Smartphone-Based Soundscape Assessment
An example of smartphone-based soundscape assessment is that developed within the CITI-SENSE project [86], aimed at studying the acoustic comfort of urban spaces. The system is based on a smartphone application, an external microphone, and an embedded questionnaire (filled in simultaneously with the sound measurement) through which the participant reports a perceptual analysis of the surrounding acoustic environment. Post-processing of acoustic and perceptual data is carried out to provide the result in the app as soon as the observation has been completed, enabling the observers to receive understandable feedback on their appraisal. The outcome of the post-processing is an indicator of the acoustic comfort, called ESEI (Environmental Sound Experience Indicator). This index considers not only noise levels, namely 1 s short LAeq, LAmax, and LAmin, but also the detection of noise events (based on a dynamic threshold principle), and the composition and perceived pleasantness of the sound sources in the area. When a sound event is detected during the observation, a pop-up message is displayed on the screen and the user is asked to evaluate her/his perception (i.e., pleasant or unpleasant) and the type of sound source producing the event. The flowchart of the experimental protocol is outlined in Figure 5. All the results of the observations have been uploaded and visualized on the web site of the CITI-SENSE project. The participants are involved on a voluntary basis and receive the smartphone and instructions to perform the observations, including both the acoustic measurement and the questionnaire to be completed. To ensure that the experimental protocol is properly applied, the participants have been accompanied by experts during their observations. Thus, participants do not have full autonomy to go to any place at any time.
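The detection rule behind the "dynamic threshold principle" is not detailed here, so the following Python sketch should be read only as one possible interpretation and not as the CITI-SENSE algorithm. It assumes that an event is a 1 s level exceeding a running background level (the median of the preceding seconds) by a fixed margin; the function name, the `window` length, and the `margin_db` value are all hypothetical.

```python
from statistics import median


def detect_events(levels_db, window=10, margin_db=10.0):
    """Return indices of 1 s levels that exceed the running background.

    The background is the median of up to `window` preceding values, so
    the threshold follows slow changes in the ambient level (assumed rule).
    """
    events = []
    for i, level in enumerate(levels_db):
        history = levels_db[max(0, i - window):i]
        if not history:
            continue  # no background estimate for the first sample
        if level > median(history) + margin_db:
            events.append(i)
    return events


# Steady 50 dB background with one loud second at index 10
levels = [50.0] * 10 + [70.0] + [50.0] * 5
events = detect_events(levels)
```

Using the median rather than the mean for the background keeps a single loud second from raising the threshold for the seconds that follow it.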
This approach is somewhat similar to that usually undertaken in soundwalks, where the participants are asked to walk in silence and listen to the acoustic environment. Afterwards, or at given locations along the walk, they are asked to fill in a questionnaire, most commonly on a paper sheet but sometimes available on a smartphone, or to participate in an interview about their impressions of the area they are in. For the data to be reliable, it is essential that all participants perform their soundscape assessments at the same time and in the same location(s). Audio recordings, or acoustic measurements, are obtained simultaneously with the participants' appraisal and later processed to determine the descriptors of the corresponding acoustic environment [87].
Figure 5. Flowchart of the experimental protocol developed in the CITI-SENSE project (adapted from [86]).
Another approach, more in line with the participatory sensing concept, is that proposed by the "Beyond the Noise: Open Source Soundscapes" project [88], aimed at identifying, assessing and planning "everyday quiet areas" in cities, by implementing the soundscape approach, the citizen science paradigm, and a smartphone application (the Hush City app). The Hush City app allows the collection at the same location and by the same user of a set of qualitative and quantitative data in a short time frame, approximately 3 min. The users must verify their email before signing in and using the app. The collected data consist of the audio recording (44,100 Hz sampling, 16 bit resolution, duration set at 30 s) and related sound pressure levels (LAeq, LAmax, and LAmin), a picture of the place (6 MP max resolution and 24 bit color), and user feedback given through a structured questionnaire. Questions deal with emotional responses, semantic descriptors, perceived quietness, positive and negative sounds, level of oral interaction and social communication, sense of the place, landscape quality, level of maintenance and cleanliness, sense of security, accessibility to the location, major sound sources, user status, weather conditions, number of people, and major activities performed in the area. By clicking on the button "Map the quietness around you", users are guided through data collection, while clicking on the button "Quiet Areas" leads to users being guided through an exploration of datasets shared by other users and stored by the host provider in Germany. Figure 6 shows the flowchart of the Hush City app.
Figure 6. Flowchart of the Hush City application (adapted from [89]).

Conclusions
This scoping review highlights the benefits and drawbacks of participatory noise monitoring. Empowering people to actively participate in improving their living environment, including its acoustic aspects, is a paradigm shift towards a more sustainable and healthy environment. In this context, participatory sensing is a remarkable example of progress and a powerful tool to encourage people not only to collect environmental data through their smartphones with dedicated applications, but also to provide their appraisal of their perceived living environment. This approach also increases people's awareness of the need for a better and healthier environment, helping to shift their behaviors towards more sustainable ones. Although the Citizen Science approach is very appealing and is growing thanks to the outstanding progress and diffusion of the Internet of Things (IoT), critical issues and challenges remain. Among these, voluntary participants are often not a representative sample of the population, there are difficulties attracting enough participants and convincing them to participate for a medium/long period of time, and there are concerns regarding privacy and data reliability.
On the other hand, due to the advancements in the quality of smartphone hardware and in app design, sound measurements obtained by using smartphones are rapidly improving in quality, thereby reducing the gap with professional instruments. Smartphones are easy and ready to use at any time during daily life and make it easier for large numbers of citizens to participate. This allows researchers to gather a wide range of information about more acoustic environments with broader spatio-temporal coverage, to obtain more dynamic noise and soundscape maps that improve the knowledge of the acoustic environment and its modeling, to plan noise mitigation actions, and to make communication with citizens easier.
Moreover, the implementation of questionnaires on smartphones to collect perceptual data can be fruitfully used either during traditional guided soundwalks, with the presence of the experimenter, or during crowdsourcing campaigns. Experimental protocols of field soundscape studies mainly differ in the way they measure the acoustic and other environmental indicators. In this context, the authors propose to classify experimental protocols for in-field surveys into three types (GUIDE, MONITOR, and SMART), which differ in the sound measurement devices used, from professional ones to smartphone apps, as well as in the level of detail, accuracy, and survey duration. The choice of the most suitable protocol, from the first and best performing one down to the others, depends at least on the survey's objectives, the territorial extension, the resources, and the time available.
The main future developments are expected to be related to the progress in smartphone hardware and software for sound measurements and to a wider and more systematic participation in the crowdsourcing of acoustic environment data, stimulated by appropriate web platforms. In this context, the concept of Participatory Experience Performances, by which humans can become more aware of their own perception of the environment, should be applied even more frequently [90]. This approach, based on a better understanding of multi-sensory and sensory-aesthetic aspects, is an opportunity to identify the conditions for a pleasant urban environment and to gain the necessary skills to develop appropriate actions in the planning process to consciously enhance the design of places in a noisy environment. Another interesting application could be the identification of quiet areas, green spaces, natural surroundings, and tranquility trails in urban areas [91,92]. These areas are important for health protection and promotion, as they are potentially well suited to the needs of citizens of metropolitan areas in relieving stress and improving feelings of well-being. Further interesting developments will deal with the analysis of tags used in social networks to describe everyday acoustic environments and/or pictures of places, and with machine learning techniques for sound categorization in terms of perception [93,94].
The acoustic and perceptual data and their distribution over the urban area can represent thematic layers which can be overlaid with many other layers, such as those dealing with other environmental pollutants (e.g., air, water), transport, and mobility. This information is very important for authorities and city planners in better assessing the status of urban contexts, thus fostering one of the core targets of Smart Cities, i.e., leveraging citizens' participation in order to make the city a better and healthier place to live in.
Author Contributions: G.B. and F.P. conceived the review, wrote, and revised the paper. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.