A Virtual-Reality and Soundscape-Based Approach for Assessment and Management of Cultural Ecosystem Services in Urban Forest

The work develops an analysis based on the integrated perception of landscape and soundscape in urban forests (UF) to classify recreational suitability at the spatial level. Scientific literature stresses a lack of decision support systems for the management of cultural ecosystem services in UF. An innovative approach grounded in individual perceptions of landscape and soundscape is implemented to address this need. Geographic Information System, virtual reality, and psychoacoustic parameters are merged to improve the elicitation of willingness to visit UF. Geostatistical methods and the Street View application enable the spatialization of output. The test area is located in an urban park of the city of Florence (Italy). Results stress the importance of logistics and tree variables (e.g., density) in assessing the cultural service in the case of visual perception. Natural and people-related sounds, as well as loudness, seem to be significant for integrated perception, in addition to visual parameters. The open-source approach applied in the research can simplify replication in other case studies and the updating of the output. Future improvements and integration of the work for UF recreational planning are suggested.


Introduction
Smart and sustainable planning of urban forest (UF) is promoted by European policies and guidelines to favor an inclusive economy and well-being at city level, as well as to endorse the delivery of Cultural Ecosystem Services (CES). The Millennium Ecosystem Assessment [1] defines the CES as "the non-material benefits that people obtain from ecosystems through spiritual enrichment, cognitive development, reflection, recreation and aesthetic experiences". In recent years, the CES concept has assumed importance in UF planning and management. A detailed review conducted by Dickinson et al. [2] has reported the significance of CES in UF and urban green spaces, as well as the methods to quantify CES from both biophysical and economic viewpoints [3]. A peculiarity of CES is the importance of spatial valuation due to a strong relationship with place [4]. Geographic information systems (GIS) could be a useful tool to facilitate visualization and disclosure of green spaces' management process. Among topics focused on the GIS application in UF, the multi-scale assessment of CES [5] and the participatory process in planning [6] can be mentioned.
Decisions and design concerning CES in UF should take into account the complexity of the environment, represented by variables such as the acoustic and visual parameters [7], the perception of people [8], the management of CES [9], as well as the relations among these factors. In the literature, several studies statistically relate visual landscape preferences to map-based indicators of landscape structure [10,11]. Dramstad et al. [10], among other conclusions, stress the relevance of people's preferences based on the spatial structure of indicators in rural environments. The authors also reveal that different groups of people prefer different types of landscape, underlining the need for care in output interpretation. Martìn et al. [11] developed a method to evaluate the complementarity between assessing landscape character through a series of map-based indicators and assessing the visual quality of the landscape through pictures. Other works introduced the evaluation of soundscape, stressing the importance of acoustic elements in perceived suitability for recreation in UF [12], or the influence of spatial contexts on soundscapes in urban spaces [13].
Recent investigations have focused on the evaluation of urban green spaces from the point of view of integrated perception, and the validity of the immersive Virtual Reality (VR) technique for multisensory evaluation of green spaces has been demonstrated in different studies [14]. Ruotolo et al. [15], through objective cognitive measures and subjective valuations, assessed the detrimental effects of a new facility (motorway) on people's well-being. The authors introduced an audio-visual approach grounded on VR technology and audio rendering techniques. The impact of noise on the perception of future renovation alternatives for urban public spaces has also been analyzed [16]. In that research, it was observed that the pleasantness of motorway barriers increases as the traffic noise level decreases, but the aesthetic aspect has a stronger impact than audio. The importance of integrating VR with natural sound has been demonstrated in physiological tests highlighting a link among nature visualization, the sounds of nature, and stress recovery [17]. The mutual effect and correlation between landscape and soundscape have also been evaluated [18,19]. Liu et al. [18] analyzed visual and functional landscape characteristics in relation to individual sounds, as well as overall soundscape preference, to evaluate appreciation of five city parks in China. Ren et al. [19] applied audio-visual experiments and eye-tracking tests to examine the effects of soundscapes on rural landscape evaluations in terms of visual quality assessment, landscape tranquility, and landscape preference indicators.
The literature review shows that, although CES evaluation is of great importance for the planning of UF, there is still a lack of case studies and practical tools able to consider the reciprocal influence of landscape and soundscape. This study aims to develop a spatial-based model to support the zoning of UF according to their ability to provide CES as perceived by users. As reported in Dickinson et al. [2], "aesthetics, recreation and sense of place are so interrelated that they often cannot be considered separately"; thus, the willingness to visit (WTV) a UF is considered here as a proxy of CES delivery [20], thanks to its capacity to evaluate CES provision in holistic terms. The methods involve an immersive VR technique for landscape and soundscape evaluation by respondents; a geostatistical approach for zoning UF was applied through GRASS GIS 7.4 software. The case study is a UF in the city of Florence (central Italy).

Study Area and Land Informative System
The study area is the Park of Cascine, in Florence (Central Italy; Figure 1). The park accounts for a total of about 7600 trees on an area of 130 hectares. This UF was selected because it is representative of the general characteristics and conditions of national UF and urban green spaces. In particular, the Park of Cascine is the largest green space in the urban and metropolitan area of Florence, the main site for sport and outdoor events in the city, and one of the most extensive among Italian UF [21,22]. The land informative system (LIS) includes geodata collected and supplied as open data [23]. The LIS is composed of the following vector data:
• localization of single trees with specific characteristics of plants (i.e., diameter and species);
• roads with car access;
• footpaths.
Among available open geodata, the three variables reported above were chosen because of their representativeness for the study area and recreational function. The choice follows a participative focus group involving researchers and academic experts of the sectors (GIS, urban CES).

Applied Techniques and Geostatistical Analysis
The general framework of the method is highlighted in Figure 2. The development of interviews based on immersive VR was possible thanks to the Street View functionality of Google Maps [24] and auditory stimuli presented through headphones integrated into the VR mask [25]. WTV was examined for a sample of points (18) randomly distributed in the park and included in the Street View coverage (roads with car access, footpaths in green open space or forest). The random distribution was obtained with the v.random command of GRASS GIS 7.4 software [26]. Every point was loaded on the VR headset by means of a smartphone. VR allows for a total immersion of respondents in a specific environment, facilitating the trade-off between development of interviews and realistic visualization and perception of the stimulus [27]. For different points, a stratified sample of respondents (24, in line with similar studies, e.g., Yu et al. [28]) was asked to state their WTV for the area on a 5-step Likert scale (1: Very low WTV; 2: Low WTV; 3: Medium WTV; 4: High WTV; 5: Very high WTV). Respondents were 50% male and 50% female, classified in three age classes (18-33; 34-48; 49-64). Visual and integrated (visual + audio) perceptions were surveyed for each point (2 interviews for visual analysis and 2 interviews for integrated evaluation). Audio recordings were made at each location by means of a handy recorder. Sounds were recorded for 3 min in the period of June-August 2017 (between 9:00 a.m. and 12:00 noon) in order to be uniform with both spherical images and meteorological conditions. Sound intensity was measured by a digital sound level meter equipped with a datalogger (model PCE-322A©). For the recording period, average, minimum, and maximum sound intensity were computed, in dBA of LAeq [29], through the software SoundLevelMeter© v.3.0 (https://www.pce-instruments.com/english/measuring-instruments-kat_40035.htm).
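The equivalent continuous level (LAeq) is, by its standard definition, an energy average of the readings rather than an arithmetic mean of decibel values. The following is a minimal sketch of that computation, assuming the datalogger exports equally spaced A-weighted readings as a plain list; the function name and the sample values are hypothetical and are not taken from the study or from the SoundLevelMeter software.

```python
import math

def summarize_levels(levels_dba):
    """Return (LAeq, minimum, maximum) for equally spaced dBA readings."""
    if not levels_dba:
        raise ValueError("at least one reading is required")
    # Convert each level to relative energy, average, then convert back to dB.
    mean_energy = sum(10 ** (level / 10) for level in levels_dba) / len(levels_dba)
    laeq = 10 * math.log10(mean_energy)
    return laeq, min(levels_dba), max(levels_dba)

# Hypothetical readings for one sampling point (not measured data).
readings = [40.0, 50.0, 60.0]
laeq, lowest, highest = summarize_levels(readings)
print(f"LAeq = {laeq:.1f} dBA, min = {lowest:.1f}, max = {highest:.1f}")
```

Because the average is taken on energies, the loudest readings dominate: the LAeq of the three sample values lies well above their arithmetic mean of 50 dBA.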
In the interviews involving soundscape analysis, additional audio characteristics of the single-point recording were quantified. In particular, sound dominance was elicited on the 5-step Likert scale described above for three acoustic typologies [30]:
• natural sounds (wind, birds, water, etc.);
• people (voices, steps, music, etc.);
• traffic and other artificial sounds (car, motorbikes, work, etc.).
The correlation between sound dominance (dependent variables) and territorial characteristics of the Park was verified through GIS-based multiple regression. The examined independent variables were the distance from roads with car access and footpaths, as well as the tree parameters (average diameter and tree density). Both mean diameter and tree density were computed with focal statistics in GRASS GIS 7.4 by means of a kernel of 35 × 35 m [26].
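The focal-statistics step can be illustrated with a moving window over a raster. This is a minimal sketch in plain Python that stands in for the GRASS GIS computation; the 5 m cell size (so that 7 × 7 cells cover 35 × 35 m), the grid dimensions, and the three sample trees are assumptions made for illustration only.

```python
def focal_sum(grid, size):
    """Sum each cell's size x size neighbourhood (window clipped at the edges)."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    total += grid[rr][cc]
            out[r][c] = total
    return out

CELL_M = 5.0   # assumed raster resolution in metres
WINDOW = 7     # 7 cells x 5 m = 35 m kernel side
window_ha = (WINDOW * CELL_M) ** 2 / 10_000   # kernel area in hectares

# Hypothetical rasters: tree count and summed diameter (cm) per cell.
tree_count = [[0] * 9 for _ in range(9)]
diameter_sum = [[0.0] * 9 for _ in range(9)]
for r, c, d in [(4, 4, 30.0), (4, 5, 50.0), (2, 3, 40.0)]:
    tree_count[r][c] += 1
    diameter_sum[r][c] += d

count_w = focal_sum(tree_count, WINDOW)
diam_w = focal_sum(diameter_sum, WINDOW)

# Density in trees/ha; cells near the border use a clipped window but the
# full kernel area, so their density is underestimated in this sketch.
density = [[n / window_ha for n in row] for row in count_w]
# Mean diameter in cm; None where the window contains no tree.
mean_diam = [[(s / n if n else None) for s, n in zip(srow, nrow)]
             for srow, nrow in zip(diam_w, count_w)]

print(round(density[4][4], 1), mean_diam[4][4])
```

Keeping counts and diameter sums as separate rasters lets both the density and the mean diameter come from the same window pass.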
The maps of WTV in visual (WTVv) or integrated (WTVi) evaluation were developed through interpolation (Inverse Distance Square Weighting algorithm; [26]) of sound dominance (for natural sounds, people, traffic, and other artificial sounds), distance from facilities as well as tree parameters (average diameter and tree density).
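Inverse distance squared weighting estimates a value at an unsampled location as a weighted mean of the sampled values, with weights proportional to 1/d². The sketch below assumes planar coordinates and uses every sample point rather than a nearest-neighbour subset; the coordinates and scores are hypothetical, and the function is an illustration rather than the GRASS GIS implementation.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (xs, ys, value) tuples in the same planar units.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xs, ys, value in samples:
        dist = math.hypot(x - xs, y - ys)
        if dist == 0.0:
            return value  # the target coincides with a sample point
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total

# Hypothetical sample points: (x in m, y in m, mean Likert score).
points = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0)]
midpoint = idw(5.0, 0.0, points)   # equidistant from both samples
near_first = idw(2.0, 0.0, points)  # pulled toward the closer sample
print(midpoint, round(near_first, 3))
```

With a power of 2, a sample four times closer receives sixteen times the weight, which is why the estimate near the first point stays close to its value.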
Final consideration focuses on specific variables (among the above mentioned) that could be relevant for the improvement or worsening of perception from visual to integrated assessment.

Results
The maps of LIS were elaborated to obtain the density of trees, the average diameter, and the distance from both roads with car access and footpaths (Figure 3). The maps for natural (soundn) and people-related (soundp) sounds were spatialized, taking into account the negative correlation with tree density (p < 0.01, R²: 0.89) and with the distance from footpaths (p < 0.05, R²: 0.63), respectively (Figure 4a,b).
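The R² values reported here measure the share of variance in the dependent variable explained by the fitted model. As a minimal sketch, assuming a single predictor instead of the multiple regression used in the study, ordinary least squares and R² can be computed as follows; the density and score values are invented for illustration and are not the study's data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, with R squared."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y need the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical data: tree density (trees/ha) vs. natural-sound dominance score.
density = [20.0, 40.0, 60.0, 80.0, 100.0]
dominance = [4.5, 4.0, 3.0, 2.5, 1.5]
slope, intercept, r2 = fit_line(density, dominance)
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}, R2 = {r2:.3f}")
```

A negative slope with a high R² corresponds to the kind of relation described for natural sounds and tree density; significance testing (the p-values) is outside the scope of this sketch.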
The WTVv has a negative correlation with average diameter (φ) (p < 0.05), and a positive one with the distance to roads (distr) (p < 0.05) (R²: 0.76; Figure 4c).
The psychoacoustic parameter (loudness) varies from a minimum of 24.10 to a maximum of 72.2 dBA (average of 47.2 dBA), in line with other research in UF [13]. Figure 5 highlights the improvement or worsening from WTVv to WTVi based on sound perception as well as sound intensity. Figure 5a shows how a greater perception of traffic and artificial sounds tends to reduce the WTV; conversely, a greater level of people-related and natural sounds increases WTV from visual to integrated evaluation. In terms of intensity, the output reveals that higher loudness seems to reduce the WTV for the UF (Figure 5b).

Discussion
Natural sounds seem to be negatively related to stand density [31] due to the presence of bird species associated with open spaces alternating with forest land. People-related sounds show an intuitive negative trend with distance from footpaths.
The WTVv increases as average tree diameter decreases or, in other terms, as forest density grows (see Figure 4a,b). Although several studies highlight that forests with lower stand density are preferred for recreational visits due to greater visual potential and sense of safety (e.g., Scarpa et al. [32]), a change in the attitudes of urban populations towards wilder landscapes and dense vegetation has also been demonstrated [33,34]. Some authors [35] suggested that 150-160 trees per hectare "was considered to be the ideal against an open background, but the number dropped if the background was dense". In other terms, recreational suitability can be related to tree density by an inverse U-shaped curve. The linear correlation that emerged in our study could be caused by the density distribution in the study area (generally low: 95th percentile of density: 106.1 trees/ha).
The introduction of aural stimuli gives significant importance to natural as well as people-related sounds in the elicitation of WTVi. The above results suggest that aural perception strengthens identification with the proposed context, with a consequent improvement in actual WTV. In fact, in contrast to WTVv, WTVi is positively related to average diameter (an inverse relation with stand density confirms the majority of evidence in the literature; [36]). The differences in WTV assessment from visual to integrated perception are also analyzed in terms of sound characteristics. Figure 5a confirms results reported by other studies in urban green spaces [30]. Recreational uses of UF are favored by a prevalence of natural as well as people-related sounds; traffic noise seems to mask these sounds (even if present) and to decrease WTVi. People reveal a preference for low-intensity sounds (Figure 5b). Independently of sound typology (artificial/traffic, natural, or people-related), the worsening from WTVv to WTVi is associated with an increase in loudness. Yang and Kang [37] affirmed that negative "subjective evaluation of the sound level correlate highly with objective sound measures, especially when the sound level is above [ . . . ] 73 dBA". This assertion seems to be confirmed in our study for sound intensity below the threshold of 73 dBA.

Conclusions
The suggested methodology can facilitate recreational planning in UF, allowing for an integrated evaluation of visual and aural peculiarities. New technologies (e.g., VR) as well as geo-referenced mapping by means of GIS can facilitate the depiction of specific recreational areas or thematic paths (e.g., bird sound trails, stress recovery areas, high WTV footpaths, etc.). Communication and participatory processes among policy-makers, planners, and citizens can also be favored by spatial representation of territorial peculiarities and results.
The outputs reveal that the WTV for the area is influenced not only by territorial variables (e.g., tree size), but also by sounds. In this study, both sound typology and intensity seem to have a role in the elicitation of perceived recreational characteristics.
Improvements to the work should be directed to an increase in the number of respondents as well as in the point sample. In this study, a multiple regression is applied due to its robustness, even though the sample size is limited; however, future integration could investigate the usefulness of additional (geo)statistical techniques such as non-parametric models (e.g., through the chi-squared test) or geographically weighted regression [38]. An emphasis on specific variables could also be pursued. For example, additional investigation of stand density can be carried out to verify its correlation with WTV. Other improvements should focus on integration among different techniques, e.g., neurophysiological parameters could be evaluated to demonstrate correlation between WTV and stress recovery in UF [17]. New technologies such as high-resolution spherical video-cameras can simplify the collection and presentation of stimuli.
As reported by Aletta et al. [39], international research on soundscape should agree on relevant soundscape descriptors. Finally, a permanent study area should be provided to examine the variation of perceptions over time due to seasonal and long-term variability of UF characteristics.