A Scoping Review Mapping Research on Green Space and Associated Mental Health Benefits

Background: There is growing interest in research investigating the association between green space (GS) and mental health and wellbeing (HWB), with the aim of understanding the underlying mechanisms. Accordingly, there is a need to map the literature and create an overview of the research. Methods: A scoping review approach was used to map literature on GS, including context and co-exposures (the GS exposome), and their associations with mental HWB. The review considers how mental HWB is defined and measured and how GS is characterized. Furthermore, the review aims to identify knowledge gaps and make recommendations for future research. Results: We identified great diversity in study designs, definitions, outcome measures, consideration of the totality of the GS exposome, and reporting of results. Around 70% of the 263 reviewed studies reported a positive association between some aspect of GS and HWB. However, there is limited research using randomized controlled crossover trials (RCTs) and mixed methods, and an abundance of qualitative subjective research. Conclusions: The discord between study designs, definitions, and the reporting of results makes it difficult to aggregate the evidence and identify any potential causal mechanisms. We propose key points to consider when defining and quantifying GS and make recommendations for reporting on research investigating GS and mental HWB. This review highlights a need for large, well-designed RCTs that reliably measure the GS exposome in relation to mental HWB.


Introduction
Several reviews have highlighted the positive association between green space (GS) and mental health and wellbeing (HWB). These reviews have generally focused on GS in a narrow sense, such as forest therapy [1,2], community GS [3,4], or urban GS [5–7], and a number of reviews have looked at GS in relation to urbanicity and urban planning [8,9]. Other reviews have focused on specific GS activities, such as community gardening [10], horticultural therapy [11,12], therapeutic gardening for the elderly [13], spending time in a forest [2,14], and GS in the living environment [15]. Reviews have also explored the connections between biodiversity, ecosystem services, and human health and wellbeing [16–18]. These reviews generally identify positive associations between the narrowly defined types of GS investigated and measures of mental HWB.
The design of and access to GS are particularly relevant in cities, where GS, among other social and environmental factors, is under pressure due to urbanization [19]. It is estimated that by 2050, more than two-thirds of the world's population will live in urban areas. This trend has prompted a large number of studies focusing on mental HWB and access to urban GS. Urbanization is associated with increased levels of mental illness, including anxiety and depression [20–22]. Access to urban GS has been positively associated with mental HWB [23,24], but the underlying reasons for this association are still not well understood.
GS has also been shown to be associated with mental HWB in rural areas [25–28]. When Gilbert, Colley and Roberts [29] investigated subjective wellbeing in rural areas of Scotland, they found that residents of remote rural areas reported higher levels of life satisfaction than residents of non-rural areas. Other studies investigating associations between mental HWB and GS in rural areas have found a significant relationship with rurality [30,31].
There is increasing interest in understanding the factors that may make GS beneficial for HWB [46]. However, most reviews do not consider contextual factors, such as culture and accessibility, or co-exposures, such as sound and light. The developing concept of the exposome [47] encompasses the totality of exposures we face as humans, from conception onwards, and the combined effect of these exposures on HWB. An exposome approach to investigating GS could help us understand exactly what is beneficial for mental HWB.
We have carried out a scoping study to map the available literature on different types of GS, including the context and co-exposures, and their associations with mental HWB, considering how mental HWB is defined and measured and how GS is characterized. Furthermore, the review aims to identify any current knowledge gaps and make recommendations for future research on the subject.

Materials and Methods
A five-step scoping review methodology was used to collect, evaluate, and present the analysed literature [48]:

• Identifying the research question(s);
• Identifying relevant studies;
• Study selection;
• Charting the data;
• Collating, summarizing, and reporting the results.
The following research questions (RQ) were used to underpin the search strategy:
1. How do different types of GS (recreational, residential, urban, rural) affect HWB, and how much green space is needed for health improvement?
2. How can we best define, measure, and quantify GS and mental HWB?
3. Do different co-exposures or contextual factors affect the mental HWB outcome?
4. Do different age groups and population subgroups benefit differently from exposure to GS?
Theoretical, empirical, and experimental studies were included, with a focus on links between GS of any description and mental HWB of any definition. To our knowledge, no review has attempted to map the totality of literature on GS and the associated effects on mental HWB. In this scoping review, we adopt a wide definition of GS and GS activities, including small urban pockets of GS, remote rural areas, horticultural therapy, allotment gardening, and virtual green space. This broad scope was intended to shed light on the contextual factors and co-exposures that may influence the effects of GS on mental HWB.
Studies with a main emphasis on biological diversity or physical activity, not including a detailed investigation of associated mental HWB outcomes, were excluded. Studies focusing on children under the age of 18 were excluded, as the mechanisms and contextual factors related to mental HWB may differ between children and adults. Where the participants' age range included people under the age of 18, the decision to include or exclude a paper was made case by case, based on the contribution its findings and conclusions would make to this review. Studies with an emphasis on GS in war or disaster zones were excluded, as these are extreme circumstances and not applicable to the general population. Studies with a focus on urban design, not investigating any associated mental HWB outcomes, were also excluded. Only peer-reviewed literature was included; grey literature and all conference proceedings, abstracts, and opinion pieces were excluded. Keywords for two main concepts were generated and used for the literature search (Table 1).
All papers were pooled and duplicates removed, resulting in a total of 7042 papers. The literature was initially screened by two members of the research team (CWN, JC), using a comparative and consensus-oriented method (Figure 1). After exclusion based on title and abstract, 417 papers remained for review. When the inclusion/exclusion criteria were applied, a further 173 papers were excluded. An additional 19 papers were identified from the reference lists of key papers, bringing the total number of papers for review to 263. The included literature was charted following the technique described by Arksey and O'Malley [47], to synthesize and interpret the studies by sorting them according to key issues and themes. Each study was analysed according to the type of GS investigated, health outcomes and measures, experimental design, and methods used. The quality of the included studies was not systematically assessed, so this review does not determine the robustness of findings from the included literature. The reviewed literature was then collated, summarized, and reported in four thematic groups (Table 2).

Table 2. The literature was divided into thematic groups based on the type of GS investigated (literature reviews are not included).

Type of Green Space
Group 1: Horticulture, garden, allotment (n = 43)
Group 2: Urban and mixed green space (n = 140)
Group 3: Wild, natural or rural green space (n = 34)
Group 4: Virtual or indoor green space (n = 24)

The literature was further divided according to 'type of study' (cross-sectional or longitudinal; controlled trial, randomized or non-randomized, with or without crossover); 'methods' (the methods used to measure mental HWB and GS, i.e., quantitative or qualitative data collection); 'health outcome' (the type of mental HWB assessed); and whether the study reached a positive or negative conclusion (i.e., whether the initial hypotheses were confirmed). Comprehensive lists were generated of all the mental HWB outcomes investigated and all the tools used to assess them. This was done to obtain an overview of the totality and complexity of the studies and to identify the most commonly used methods for assessing mental HWB.
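The screening flow and group totals reported in this section can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are our own, and it uses just the counts stated in the text (417 papers after title/abstract screening, 173 excluded on the inclusion/exclusion criteria, 19 added from reference lists, the four thematic groups, and the 22 literature reviews that are charted separately).

```python
# Screening flow as reported in the Methods (illustrative check only).
after_title_abstract = 417   # papers left after title/abstract screening
excluded_on_criteria = 173   # removed when the inclusion/exclusion criteria were applied
added_from_references = 19   # added from the reference lists of key papers

total_reviewed = after_title_abstract - excluded_on_criteria + added_from_references

# Thematic groups (Table 2); literature reviews are not part of any group.
groups = {
    "Group 1: Horticulture, garden, allotment": 43,
    "Group 2: Urban and mixed green space": 140,
    "Group 3: Wild, natural or rural green space": 34,
    "Group 4: Virtual or indoor green space": 24,
}
literature_reviews = 22

grouped_total = sum(groups.values()) + literature_reviews

print(total_reviewed)  # 263
print(grouped_total)   # 263
```

Both routes arrive at the same 263 papers, i.e. the four groups (241 studies) plus the 22 reviews account for every included paper.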

Numerical Analysis
This analysis highlights the dominant areas of research with respect to study design, type of participants, methods used, main conclusions, and the country where the study was conducted. The papers were divided into groups based on study design (Table 3). The majority of studies were cross-sectional (86.3%), with only 13 studies being longitudinal (4.9%). Nine studies used a Randomized Controlled Trial (RCT) design with a crossover element, and 21 used an RCT design without a crossover element. The majority of studies used only qualitative methods (212; 80.6%), with only 29 studies using a combination of qualitative and quantitative data collection methods (11%). Twenty-two of the publications were reviews (8.4%).

Table 3. The included studies were divided into groups based on their study design (some papers are represented in more than one group, e.g., a cross-sectional study with an RCT design).

Type of Study # of Studies
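The percentages in the numerical analysis are shares of all 263 included papers. A minimal sketch of that calculation, using only counts stated in the text (the helper name `share` is our own), reproduces the reported figures; the same helper gives the 11.4% share of RCTs reported in the analysis of study design.

```python
TOTAL_PAPERS = 263  # all papers included in the review


def share(count: int, total: int = TOTAL_PAPERS) -> float:
    """Percentage of `total`, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)


# Counts taken directly from the text; categories are not mutually exclusive
# across rows (a paper can be, e.g., both an RCT and qualitative-only).
reported_counts = {
    "longitudinal": 13,
    "qualitative methods only": 212,
    "qualitative and quantitative methods": 29,
    "literature reviews": 22,
    "RCTs (with or without crossover)": 9 + 21,
}

for label, count in reported_counts.items():
    print(f"{label}: {count} ({share(count)}%)")
```

The three method categories (212 qualitative-only, 29 mixed, 22 reviews) together account for all 263 papers, which is why their shares sum to 100% while the design categories overlap.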
Different countries face different co-exposures and contextual factors, which may affect HWB outcomes in different ways (RQ 3). To understand geographic representation, the literature was charted according to the continent where each study took place (Table 4). The largest share of studies was conducted in Europe (46.8%), followed by North America (24.3%), Asia (11%), and Australia (6.8%). Most of the European studies came from western and northern Europe: the UK (38%), followed by Sweden (15.4%) and the Netherlands (6.5%). Table 4 shows that the majority of studies have been carried out in the developed world, so the benefits of GS identified in developed countries may not apply to less developed countries. The same applies to temperate versus tropical areas, as most studies were carried out in the former.

Different population subgroups might benefit differently from exposure to GS (RQ 4). To investigate which population subgroups have typically been used to assess the effects of GS on mental HWB, the literature was sorted according to participant type (Table 5). For ease of overview, participant types were grouped together where there was reasonable overlap and similarity. The most common type of study participant was the general public (30% of all included studies), followed by university and college students (14.1%) and individuals with mental health issues and disorders (12.2%). A long list of studies have used more specific participant types, e.g., park users, allotment gardeners, adults with burnout syndrome, depression, or other mental health issues, female prisoners, woodland workers, and people building their own houses.
Therefore, although the general public was the most common participant type, a great variety of specific population subgroups has been investigated in relation to the health benefits of various types of GS exposure.

Studies were charted as 'positive' if the hypotheses were confirmed, 'negative' if the main hypotheses were not confirmed, and 'mixed' if the hypotheses were only partly confirmed. Note that a study charted as 'negative' did not necessarily find a negative effect of GS exposure on mental HWB. Only 4.6% of studies were charted as 'negative' (see e.g., [49,50]), 25.7% were charted as 'mixed', and 70.1% were charted as 'positive'. Some studies report a positive finding in the abstract, but on closer inspection of the results, we found that mixed or negative findings had been played down in the summary. The percentages presented here are based on the abstracts.
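The three-way charting rule above can be written as a small decision function. This is a sketch under our own assumptions: each study's main hypotheses can be counted and judged confirmed or not from its abstract, 'not confirmed' is read as no main hypothesis confirmed, and 'partly confirmed' as some but not all. The function name and the example studies are hypothetical and do not reproduce the review's data.

```python
from collections import Counter


def chart_conclusion(hypotheses_confirmed: int, hypotheses_tested: int) -> str:
    """Assign the 'positive' / 'negative' / 'mixed' label (hypothetical helper)."""
    if hypotheses_tested <= 0:
        raise ValueError("a study must test at least one hypothesis")
    if not 0 <= hypotheses_confirmed <= hypotheses_tested:
        raise ValueError("confirmed hypotheses must lie between 0 and the number tested")
    if hypotheses_confirmed == hypotheses_tested:
        return "positive"   # all main hypotheses confirmed
    if hypotheses_confirmed == 0:
        return "negative"   # no main hypothesis confirmed
    return "mixed"          # only partly confirmed


# Hypothetical studies: (hypotheses confirmed, hypotheses tested).
example_studies = [(2, 2), (1, 3), (0, 1), (3, 3)]
labels = Counter(chart_conclusion(c, t) for c, t in example_studies)
shares = {label: round(100 * n / len(example_studies), 1) for label, n in labels.items()}
print(shares)  # {'positive': 50.0, 'mixed': 25.0, 'negative': 25.0}
```

Because the label depends only on which hypotheses were confirmed, a 'negative' study may still report a null rather than a harmful effect of GS, which matches the caveat in the text.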

Thematic Analysis
The literature has been organized according to thematic groups to address research question 1. There were 22 literature reviews identified, which are not included in this thematic analysis. An in-depth evaluation of these is beyond the scope of this review.
Group 2 encompasses studies focusing on urban or mixed GS (140 studies). This group included any GS located in an urban setting, as well as studies that used a mixture of GS types, where it was not possible to assign the study to one of the other groups and where the main focus was on urbanicity. This group is large and very diverse, and for many of the studies it was difficult to determine exactly what type of GS was being investigated, because the space was described in little detail. It was therefore not practical to divide this group further into meaningful subgroups. Of the 140 studies focusing on urban GS, 130 were cross-sectional and 10 were longitudinal; 125 used qualitative methods and 15 used quantitative or mixed methods.
Group 3 encompasses wild, natural, and rural GS (n = 34). This group includes GS types such as care farms; adventure therapy; rural neighborhoods; and wild nature such as mountains, national parks, beaches, and large forests. Due to the diversity of the GS investigated, this group was further divided into eight subgroups: care farms (5.9%), forest GS (29.4%), natural green exercise (2.9%), nature connectedness and restorativeness (8.8%), nature interventions (17.6%), occupational (5.9%), rural communities (11.8%), and wild camping and adventures (17.6%). Of these 34 studies, 32 were cross-sectional, one was longitudinal, and one was a secondary narrative analysis. Qualitative data collection methods were used in 27 of the studies, with only seven using quantitative or mixed methods [1,57–62]. These studies used a range of objective quantitative data collection methods, such as cortisol measurements, cytokine serum levels, blood pressure, and heart rate variability.
The last group, Group 4 (n = 24), includes virtual and indoor GS, e.g., photos, images, videos, and any type of GS enclosed under a roof. This group can be further divided into four subgroups: assessment by questionnaire only, with no exposure to GS (50%); indoor GS exposure (8.3%); video of GS (4.2%); and images or photos of GS (37.5%). None of the studies in this group included a quantitative element; 22 studies relied on questionnaire data and two used interviews. Of the 24 studies, 23 were cross-sectional, with only one described as longitudinal [63]. Erikson, Westerberg and Jonsson [63] investigated a therapeutic gardening program taking place in a greenhouse; however, the longitudinal aspect of the study only spanned three months.
The type of health outcome investigated varied greatly between the included studies. The primary mental HWB outcomes observed, and the number of times each outcome was investigated, are summarized in Table 6. Mental health (37), wellbeing (35), and stress (34) were the most frequently used mental HWB outcomes. These were followed by restorativeness (22), depression (19), quality of life (13), psychological wellbeing (12), general health (11), and mental wellbeing (8). Some of these outcomes are likely intended to cover the same aspect of mental HWB. However, as a clear definition of the health outcome is rarely presented, it is not possible to combine these outcomes confidently and accurately into fewer groups.

Table 6. The studies were grouped according to the primary mental HWB outcome investigated in the study. Some studies investigate more than one primary outcome.
The tools used to measure mental HWB, and the number of times each tool was used, are summarized in Table 7. Despite the availability of a vast range of validated tools for investigating mental HWB, the most common approach was to develop a new questionnaire (DOQ; 15.8% of the studies). The most used validated questionnaire was the PRS (7.9%), closely followed by PANAS (7.1%), PSS (6.6%), GHQ (6.2%), PS (5.8%), WEMWBS (5.4%), and HS SF-36 (4.1%); the abbreviations are listed in Table 7.

Table 7. An overview of the tools used to measure mental HWB and the number of times each tool has been used (the primary reference for each tool, where available, is given in brackets).
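Because a single study can investigate several outcomes or use several tools, the tallies behind Tables 6 and 7 count each outcome or tool once per study, and the resulting shares need not sum to 100%. A minimal sketch of such a per-study tally follows; the study IDs and tool assignments are hypothetical placeholders, not the review's charting data.

```python
from collections import Counter

# Hypothetical charting records: study ID -> tools used to measure mental HWB.
study_tools = {
    "S1": {"PRS", "PANAS"},
    "S2": {"DOQ"},
    "S3": {"PSS", "GHQ"},
    "S4": {"DOQ", "WEMWBS"},
}


def tool_usage(records: dict[str, set[str]]) -> dict[str, tuple[int, float]]:
    """Count each tool once per study and express it as a share of all studies."""
    counts = Counter(tool for tools in records.values() for tool in tools)
    n_studies = len(records)
    # most_common() orders tools from most to least frequently used.
    return {tool: (n, round(100 * n / n_studies, 1)) for tool, n in counts.most_common()}


for tool, (n, pct) in tool_usage(study_tools).items():
    print(f"{tool}: used in {n} studies ({pct}%)")
```

Storing tools as a set per study keeps a tool from being counted twice for the same study, which is the property a 'share of studies' figure relies on.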

Analysis of Study Design
When testing a research hypothesis, an RCT is the most scientifically rigorous method available [359]. In an RCT, participants are randomly assigned to one of at least two groups. This design specifically reduces selection bias and is often considered the gold standard for evaluating the efficacy of a treatment compared with a control.
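To make the random assignment step concrete, here is a minimal sketch of balanced random allocation in Python: the arm labels are repeated to cover every participant and then shuffled. It is an illustration under our own assumptions (equal allocation, a fixed seed for reproducibility; the function name `randomize` and the participant IDs are hypothetical), not the procedure used by any of the reviewed trials.

```python
import random
from typing import Optional


def randomize(participants: list[str], arms: list[str], seed: Optional[int] = None) -> dict[str, str]:
    """Randomly allocate participants to arms in (near-)equal numbers.

    Repeats the arm labels to cover every participant, shuffles them,
    and pairs each participant with one label.
    """
    rng = random.Random(seed)  # separate generator so the allocation is reproducible
    labels = [arms[i % len(arms)] for i in range(len(participants))]
    rng.shuffle(labels)
    return dict(zip(participants, labels))


people = [f"P{i:02d}" for i in range(1, 13)]
allocation = randomize(people, ["intervention", "control"], seed=42)
for person, arm in allocation.items():
    print(person, arm)
```

With 12 participants and two arms, each arm receives exactly six people. For a 2-arm crossover, the same function could allocate treatment orders instead, by passing sequences such as "AB" and "BA" as the arm labels.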
There were 30 RCTs identified, 11.4% of the total number of papers selected for review. Nine (30%) of these included a crossover element: eight had a 2-arm design and one had a 4-arm design. Of the 21 RCTs without a crossover element, ten had a 2-arm design, seven had a 3-arm design, and four had a 4-arm design. Eight studies used a non-randomized controlled trial (CT) with a 2-arm design; two of these included a crossover element and six did not.
It is not always convenient or possible to introduce randomization. Sung and colleagues [61] evaluated the health effects of a forest therapy program using what they call 'convenient assignment' rather than true randomization, taking into account the subjects' preference for, and suitability to, the intervention or the control group. Bang et al. [1] investigated the effects of a forest-walking program on physical and psychological health using a quasi-experimental design; participants were assigned to the experimental or control group according to their preference, to boost motivation. Dewi et al. [52] also used a quasi-experimental design, investigating existing community garden activities. Beute and de Kort [201] investigated whether poorer mental health makes an individual more or less responsive to the positive health effects of GS. Accordingly, the participants were not randomized but split into groups based on their BDI-II scores, which was an appropriate design for their particular research question. Non-randomized study designs like these [1,52,61] may indicate the effect of an intervention or activity on people with a predisposition for the chosen environment, which may not be transferable to the general population.
Other practicalities may also prevent the use of randomization. Park et al. [54] used a quasi-experimental design with a non-equivalent control group, the groups being two senior community centres, one of which participated in a gardening intervention while the other did not. Wood and colleagues [321] investigated the health and wellbeing benefits of allotment gardening, using a case-control study to compare allotment gardeners with non-gardeners. In many real-life situations, such methods [54,321] will be the only way to evaluate an intervention, as randomization is not an option. Nevertheless, if the process, context, and delivery of the intervention are considered, this type of evaluation may produce meaningful results.
Another way to increase the rigor of a study design is to incorporate a crossover element [360]. In a crossover design, all participants receive both the intervention treatment and the control treatment. The treatments are given at different times, with a sufficient washout period in between to ensure there is no carryover effect from one treatment into the next, and the order of the treatments is randomized. Because each participant serves as their own control, the influence of between-subject variability is greatly reduced. This reduces the variation in factors not related to the treatment, which in turn allows smaller effect sizes to be detected with a smaller sample size [360]. However, crossover studies require careful planning to minimize potential bias.
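The sample-size advantage of a crossover can be illustrated with standard normal-approximation formulas. The sketch below is our own and is not taken from [360]: it assumes a continuous outcome, a two-sided test, equal allocation, no carryover or period-by-treatment interaction, and illustrative values for the effect size (`delta`) and the between- and within-subject standard deviations. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """z_{1 - alpha/2} + z_{power} for a two-sided test."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def parallel_n_per_arm(delta, sd_between, sd_within, alpha=0.05, power=0.8):
    """Two-arm parallel design: each person is measured once, so both
    between- and within-subject variance enter the treatment comparison."""
    var_total = sd_between**2 + sd_within**2
    return math.ceil(2 * _z_sum(alpha, power) ** 2 * var_total / delta**2)


def crossover_n_total(delta, sd_within, alpha=0.05, power=0.8):
    """2x2 (AB/BA) crossover: each person is their own control, so the
    between-subject variance cancels in the within-person difference."""
    var_difference = 2 * sd_within**2
    return math.ceil(_z_sum(alpha, power) ** 2 * var_difference / delta**2)


per_arm = parallel_n_per_arm(delta=0.5, sd_between=1.0, sd_within=1.0)
print(per_arm, 2 * per_arm)                         # 126 252
print(crossover_n_total(delta=0.5, sd_within=1.0))  # 63
```

With equal between- and within-subject variability, the crossover needs roughly a quarter of the total participants of the parallel design. The advantage falls towards a factor of two (each participant still contributes two measurements) as between-subject variability becomes small relative to within-subject noise.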
Barnicle and Midden [289] investigated the effects of a horticultural activity program on psychological wellbeing among older people in two care homes. As randomization was not practical at the individual level, it took place at the site level. The study design would, however, have been stronger had a crossover been introduced, with participants from both care homes exposed to both the intervention and the control treatment. The authors give no explanation of why they chose not to include a crossover; such decisions often come down to time, funding, and the likelihood of securing participation and retention over an extended period of data collection. A number of studies fall into this category: RCTs whose design would have benefitted significantly from a crossover element (see, for example, [202,306,319,320,324,335]).
Seven studies identified for this review used a 3-arm design, typically with two intervention treatments and one control treatment [304], or three different types of intervention treatment [282,336]. None of these 3-arm RCTs incorporated a crossover element. This is not unexpected, as adding more arms to a study design increases its complexity and puts strain on resources such as time, money, and, not least, the participants.
Five studies used a 4-arm RCT. Sonntag-Ostrom and colleagues [310] investigated the restorative effect of visits to one urban area and three different forest environments, with each participant visiting all four outdoor environments. The authors highlight the difficulties in carrying out a study with such a complex design, e.g., a long data collection period and difficulties in recruiting participants. These difficulties resulted in a 3-year project and only 20 participants [310].
Based on the studies included in this review, the strongest design appears to be an RCT with a crossover element, a finding also supported by other literature [359,360]. In addition, the results of this review highlight that, unless answers to very specific research questions are sought, increasing the complexity of the study design does not necessarily improve the quality of the data collected, as constraints and limitations increase with complexity.
There were eight studies using a 2-arm RCT with a crossover element (Table 10). Three of the studies focused on urban GS, four on natural GS, and one on virtual/indoor GS. Six of the studies were qualitative, and only two used both qualitative and quantitative methods. Seven of the studies predominantly used questionnaires as the main tool to assess changes in the investigated health outcome.
Berman et al. [209] used a 2-arm RCT with a crossover to show that participants exhibited a significantly increased memory span after a walk in the park compared with an urban walk. For positive affect, the PANAS revealed a significant effect of location (nature vs. urban) but not of time (pre-walk vs. post-walk); for negative affect, there was no significant effect of location, and negative affect did not decrease more after the park walk than after the urban walk. The authors were therefore not able to show conclusively that GS positively affects the mood of individuals with depression. Gatersleben and Andrews [50] found that exposure to GS with high levels of prospect (clear field of vision) and low levels of refuge (places to hide) generated a restorative effect. However, they also found that exposure to GS with low levels of prospect and high levels of refuge did not create a restorative effect; such a scenario was proposed to increase stress levels and reduce attention. Im et al. [57] found that the levels of somatic and depressive symptoms, and of stress responses, were significantly reduced after exposure to a forest environment compared with exposure to an urban environment. The authors also found a significant reduction in immunological inflammation and an increase in the antioxidant effect after forest exposure. However, due to the design of the study (no before-and-after measurements allowing for comparison), it is not clear whether the positive changes are related to a reduction in air pollution (or other harmful urban exposures) rather than to the presence of the forest environment. Lee et al. [59] found that salivary cortisol concentration, diastolic blood pressure, and pulse rate were all significantly lower in participants after exposure to a forest environment. Self-reported subjective measures revealed that participants felt more comfortable, soothed, and refreshed when viewing a forest landscape than when viewing an urban environment.
Morita and colleagues [326] investigated the psychological effects of exposure to a forest environment compared with exposure to a control environment. Co-exposures and contextual factors were considered, such as conditions during the forest visit and on the control day (weather, duration of visit, previous visits, accompanying people, activities undertaken, walking course and distance walked, degree of exercise, subjective feelings, objective activities undertaken). The authors found that exposure to a forest environment significantly decreased feelings of hostility and depression and increased the feeling of liveliness, compared with exposure to a control environment. The positive effect of exposure to a forest environment was also greater the higher the subject's stress level. Despite a high number of participants and a generally stringent study design, the study only used qualitative data and would have benefited from the inclusion of quantitative data. South et al. [361] found that when subjects were in view of a green vacant lot, their heart rate decreased significantly compared with being in view of a non-greened vacant lot or not in view of any vacant lot. The authors conclude that remediating neighborhood blight can reduce stress and improve health. Takayama et al. [233] investigated the emotional, restorative, and vitalizing effects of forest and urban exposures and concluded that exposure to a forest environment improved mood and positive affect and induced a feeling of subjective restoration and subjective vitality. Tenngart Ivarsson and Hagerhall [318] investigated the perceived restorativeness of gardens. Two gardens with differing levels of built and natural elements were photographed, and a set of 12 photos was selected to represent each garden. The PRS was used to examine the perceived restorativeness of the two gardens.
The study also aimed to evaluate the ability of the PRS to distinguish between two different gardens with a mix of built and natural elements, rather than to distinguish between contrasting built and natural scene types. The authors found that both gardens were perceived as restorative, and that the PRS can discriminate between two gardens of the same scene type. Hence, one garden can be perceived as more restorative than another even though both represent the same type of scene. This highlights the importance of considering the contribution of contextual factors and co-exposures to the overall health effect of a GS environment.
Of the eight studies with a 2-arm RCT crossover design included here, seven had a positive outcome. Only two of the studies included quantitative measures [57,59], and both had a low number of participants. All of the studies rely heavily on qualitative subjective data (Table 10), from which it is difficult to draw comparative, conclusive interpretations. However, these RCTs clearly agree on a positive association between GS exposure and mental HWB. Despite the lack of high-quality studies and the variable methodological rigor between studies, the accumulated strength of these findings highlights the importance of the positive associations between GS and mental HWB.

The study found a significant reduction in self-reported rumination, driven by decreased cerebral blood flow in the sgPFC, for the nature group but not for the urban group.
The small study found that both reading and gardening significantly reduced cortisol levels after stress. Cortisol levels were lower after gardening than after reading, but the difference was not significant. Positive mood was significantly higher after gardening than after reading. There were indications that gardening is more restorative after stress than reading.
Ostracised individuals exposed to urban or nature pictures / non-ostracised individuals exposed to urban or nature pictures. The qualitative study found that among participants with a strong feeling of ostracism, those who viewed nature pictures reported a significantly lower level of aggression than those who viewed urban pictures. The authors concluded that nature exposure can counteract the relationship between ostracism and aggression.
The small qualitative study found that a 10–15 min outdoor booster break during the work day results in a significantly greater reduction in stress than an indoor work break.
The small qualitative study found a significantly larger reduction in stress after horticultural therapy than after occupational art therapy. However, no significant differences were identified for anxiety or depression between the two treatments.
Qualitative. Depleted individuals exposed to a natural or urban video / non-depleted individuals exposed to a natural or urban video.

3-arm Randomised Controlled Design, No Crossover
The study found no clear conclusions about the effect of viewing a natural video to counteract aggression after depletion. The study suggests that watching a natural video helps to restore self-control after depletion.
x = data missing. ± = standard deviation around the mean. * This paper consists of three small studies; only one of which is presented in this table (study 2). ** This paper consists of two studies; only study 2 is presented in this table. *** Only three participants are described in the results; one for each treatment.

Discussion
The effects of GS on mental HWB are relevant to city planning and public health policy, and this relevance is growing as the world's urban population increases. The published research generally shows positive associations between GS and mental HWB. However, this review has identified great diversity in study designs, GS definitions, outcome measures, inclusion of co-exposures and contextual factors, and reporting of results. This makes it difficult to aggregate the evidence to identify the underlying mechanisms of this positive association, or to provide advice on how to create GS that is beneficial for mental HWB.
Given the diversity of research available on the subject, it is not possible to answer all four of the research questions we initially posed unequivocally. The analysed literature makes clear that there is no universally agreed definition of GS or mental HWB, and in many studies a definition and/or detailed description of the two has been omitted. Only a few studies have attempted to quantify the GS investigated and/or the amount of GS needed for health improvement (RQ 1 & 2). However, based on the weight of evidence from the research reviewed, it is possible to conclude the following with reasonable certainty:
RQ 1: How do different types of GS (recreational, residential, urban, rural) affect HWB and how much green space is needed for health improvement?
There are suggestions that different types of GS may affect mental HWB in different ways and that different age groups and population subgroups benefit differently from exposure to GS. There is also limited evidence that some threshold amount of GS is needed to generate positive health outcomes. However, the evidence is not coherent enough to generalize the results.
RQ 2: How can we best define, measure, and quantify GS?
Often, the description of the GS is limited to simple text descriptors, e.g., allotment garden, urban park, or private garden. There are some good examples of studies that have attempted to quantify the GS investigated and assess its quality. For example, Tilley et al. [362] included graphic Ordnance Survey maps clearly depicting the urban environments investigated, giving a clear overview of the settings and contexts. They also included a written overview and a typology, derived from a Geographic Information System (GIS), of quartiles of urban green and urban busy areas. In addition, the authors used photographs as visual evidence of the different environments, which would make it easy to replicate the study in other cities and countries. Our findings highlight the need to investigate further how best to define, measure, and quantify GS. A systematic review could explore in more detail which types of measurement quantify GS most effectively, as well as the accuracy and reproducibility of the different methods.
RQ 3: How can we best define, measure, and quantify mental HWB?
The World Health Organisation (1948) has defined health as "A state of complete physical, mental and social wellbeing and not merely the absence of disease or infirmity". However, wellbeing is difficult to define. Fleuret and Atkinson [363] reviewed the various ways in which wellbeing has been used in research and policy contexts. They note that the term 'wellbeing' mainly originates from Anglophone countries and that in many languages it is difficult to find appropriate comparable terms. Often, a number of different terms are used interchangeably to describe wellbeing, such as quality of life, happiness, welfare, pleasure, wealth, and subjective and objective wellbeing [363]. These terms are rarely defined, so it is impossible to know whether they are synonymous.
Additionally, different stakeholders in different countries interpret the wellbeing concept in various ways, and how the term is defined is largely determined by practice among stakeholders. As far as possible, it would be an advantage to harmonize definitions of HWB and at least to describe explicitly the definition used in a research study. The definition proposed by the UK Faculty of Public Health is perhaps a good starting point:
• 'Realise our abilities, live a life with purpose and meaning, and make a positive contribution to our communities;
• Form positive relationships with others, and feel connected and supported;
• Experience peace of mind, contentment, happiness and joy;
• Cope with life's ups and downs and be confident and resilient;
• Take responsibility for oneself and for others as appropriate.'
(Faculty of Public Health, 2010: https://www.fph.org.uk/policy-campaigns/special-interestgroups/special-interest-groups-list/public-mental-health-special-interest-group/better-mentalhealth-for-all/concepts-of-mental-and-social-wellbeing/).
This holistic definition of wellbeing incorporates a stronger social dimension, reflecting a shift in focus from physical health towards the realization of individuals' potential [364]. It is more inclusive and relevant to more diverse population subgroups, such as people with learning disabilities, many of whom experience chronic conditions on a daily basis [365]. Furthermore, we propose that the quality of the environment, i.e., built or natural, should also be taken into consideration when assessing wellbeing in such a holistic way, in line with the GS exposome. Very few studies included in this review took contextual factors and co-exposures into account, and where they did, these factors were generally poorly described, making the studies difficult to replicate.
The importance of this is highlighted by McMahan and Estes [46], who used meta-analysis to synthesize research on the effect of exposure to natural environments on positive and negative affect. The authors included only studies with an RCT design that had a comparison group and a self-report assessment of the current emotional state; 32 papers were identified. Study and design characteristics, such as year of publication, study location, mean age of the sample, percentage of female participants, and the instrument used to measure affective wellbeing, were examined to determine whether they moderated the investigated outcome. The type of exposure (real nature or laboratory simulations of nature) and the type of natural environment (manicured or wild nature) were also addressed. The review concluded that exposure to natural environments was associated with a moderate increase in positive affect and a small decrease in negative affect. The authors found that study location, the type of assessment used to measure emotion, and the type of exposure moderated the effect of nature on positive affect. This indicates that co-exposures and contextual factors may play a role in moderating positive as well as negative health effects associated with GS exposure. Our attempt in this review to examine context and co-exposures has highlighted a gap in the available literature: our knowledge of contextual factors and co-exposures in relation to the GS experience (the GS exposome) is insufficient, and research is needed to investigate the totality and combination of GS-related exposures that affect mental HWB.
RQ 4: Do different age groups and population subgroups benefit differently from exposure to GS?
Participant type varied greatly between studies, and in many cases the participants were very narrowly specified, e.g., park users, allotment gardeners, or active walkers. These groups may have an affinity for the GS being investigated.
This makes it difficult to compare study results and to judge whether a finding can be generalized to other groups within the population. However, based on the weight of evidence, it can be concluded with reasonable certainty that different population subgroups will benefit differently from a variety of GS exposures.
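Pooling results across studies, as in the meta-analysis by McMahan and Estes [46] discussed above, typically weights each study's standardized effect size by its precision. The sketch below shows one common approach, the DerSimonian-Laird random-effects estimator, which also estimates the between-study variance (tau squared) that moderators such as study location or exposure type would help explain. It is an illustration under stated assumptions, not necessarily the method McMahan and Estes used; the function name `random_effects_pool` and the example numbers are hypothetical and not taken from any reviewed study.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with the DerSimonian-Laird random-effects model.

    effects   -- standardized effect sizes (e.g. Hedges' g), one per study
    variances -- sampling variance of each effect size
    Returns (pooled_effect, standard_error, tau_squared).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies, each with a variance")
    k = len(effects)

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how much the studies disagree.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to every study's own variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_ws
    return pooled, math.sqrt(1.0 / sum_ws), tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and variances, for illustration only.
    g = [0.2, 0.8]
    var = [0.04, 0.04]
    est, se, tau2 = random_effects_pool(g, var)
    print(f"pooled = {est:.2f}, 95% CI = [{est - 1.96 * se:.2f}, {est + 1.96 * se:.2f}], tau^2 = {tau2:.2f}")
```

Under the same assumptions, a moderator such as real versus simulated nature could be explored by pooling each subgroup separately and comparing the pooled estimates.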
Based on the analysis in this review, we suggest a number of key points that should be assessed and reported when investigating GS exposures:
2. Type of vegetation (creating shade or not/natural daylight);
3. Whether the environment is natural or managed;
4. Quantity of built elements;
5. Traffic noise and air pollution levels;
7. Number of people present in the environment;
8. Setting and context.
The majority of studies rely on qualitative data collection methods, and there is limited methodological consistency between studies. There is a need for more robust quantitative data collection methods, e.g., using vegetation cover maps from airborne hyperspectral and light detection and ranging (LiDAR) data to derive measures of GS [253], or measuring stress hormones (cortisol) to quantify changes in stress levels after exposure to different urban and natural environments [58,59,61,62,334,335] or in relation to neighborhood GS and long-term exposure [208,329,332,366]. Ng et al. [367] recently published the findings of a waitlist-controlled RCT investigating the effects of horticultural therapy on Asian older adults. Qualitative measures (MoCA, Zung Self-Rating Depression and Anxiety Scales) were used to investigate cognitive functioning, depression, anxiety, psychological wellbeing, and positive relations with others. Quantitative measures were used to assess nine plasma biomarkers, ranging from interleukins and chemokines to hormones. Ng et al. [367] found no significant changes in conventional subjective psychological measures of health and wellbeing after 6 months of horticultural therapy. However, there was a significant reduction in pro-inflammatory cytokines after the intervention; high levels of these cytokines are associated with depression [368]. This highlights the importance of including objective quantitative methods to underpin and clarify any subjective findings.
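One widely used quantitative greenness measure that can be derived from remotely sensed reflectance data is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). The sketch below assumes that red and near-infrared reflectance bands have already been extracted (e.g., from hyperspectral imagery) as flat, pixel-aligned lists, and computes the share of pixels classified as vegetation. The 0.3 threshold and the function names are illustrative assumptions, not values from [253] or any reviewed study; a LiDAR-derived height layer could be added as a further mask to separate tree canopy from low vegetation.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # no reflectance in either band (e.g. masked pixel); treat as non-vegetated
    return (nir - red) / total


def green_cover_fraction(nir_band, red_band, threshold=0.3):
    """Share of pixels whose NDVI exceeds a vegetation threshold.

    nir_band, red_band -- pixel-aligned reflectance values, flattened row by row
    threshold          -- NDVI cut-off for 'vegetated' (illustrative value, not a standard)
    """
    if len(nir_band) != len(red_band) or not nir_band:
        raise ValueError("bands must be non-empty and the same size")
    vegetated = sum(ndvi(n, r) > threshold for n, r in zip(nir_band, red_band))
    return vegetated / len(nir_band)


if __name__ == "__main__":
    # Hypothetical 2 x 2 study site: two vegetated and two non-vegetated pixels.
    nir = [0.50, 0.30, 0.20, 0.60]
    red = [0.10, 0.25, 0.20, 0.05]
    print(green_cover_fraction(nir, red))  # prints 0.5
```

Reporting a continuous measure such as this alongside text descriptors (e.g., "urban park") would make GS exposures more comparable across studies.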

Recommendations
Overall, we suggest a number of key points that should be included when planning and reporting on findings from research investigating GS and mental HWB:

1. Description of aim and research question(s);
2. Description of the study design;
3. Description of participant type (incl. sex, mean age, min/max age, population subgroup characteristics and other relevant socioeconomic characteristics);
4. Description of recruitment process;
5. Careful description and quantification of the GS investigated (study sites);
6. Clear definition of the mental HWB endpoint(s);
7. Justification of the choice of tools to assess the health endpoint;
8. Measurement of contextual factors and co-exposures.
We advocate that future research should consider the entire GS exposome when investigating the impact on mental HWB. There is a need for large, well-designed randomized controlled crossover trials that reliably measure a range of environmental and personal exposures associated with GS. Future studies should include standardized quantitative data collection methods to describe and define the GS investigated and to quantify the changes in mental HWB. If standardized qualitative data collection methods are also included, meaningful comparison and pooling of data across studies would become possible. This will allow a better understanding of the underlying factors responsible for positive associations between GS and mental HWB.
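In a 2-arm crossover trial each participant experiences both conditions, so the primary comparison is within-participant. The sketch below computes the mean paired difference (nature minus urban) in an outcome such as a wellbeing or stress score and the corresponding paired t statistic. It is a simplified illustration under stated assumptions: it ignores period and carry-over effects, which a full crossover analysis would model, and the function name `paired_crossover_t` and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_crossover_t(nature, urban):
    """Within-participant comparison for a 2-arm crossover trial.

    nature, urban -- outcome scores for the same participants, in the same order
    Returns (mean_difference, t_statistic, degrees_of_freedom).
    Simplifying assumption: period and carry-over effects are ignored.
    """
    if len(nature) != len(urban) or len(nature) < 2:
        raise ValueError("need at least two participants measured under both conditions")
    diffs = [n - u for n, u in zip(nature, urban)]  # each participant is their own control
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return d_bar, d_bar / se, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical wellbeing scores for four participants (higher = better).
    nature_scores = [5, 6, 7, 8]
    urban_scores = [4, 4, 6, 6]
    d, t, df = paired_crossover_t(nature_scores, urban_scores)
    print(f"mean difference = {d:.2f}, t({df}) = {t:.2f}")
```

Because each participant serves as their own control, a crossover design can detect a given difference with fewer participants than a parallel-group trial, which is one reason it suits the small samples typical of the reviewed studies.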
Author Contributions: C.W.-N. and J.W.C. conceived the study and carried out the planning and initial screening. C.W.-N. undertook the detailed review of the literature, created the initial synthesis of the evidence, and prepared the first draft of the manuscript. S.K. and M.K. advised on the literature search strategy and data management and assisted in preparation of the manuscript. All authors read and approved the final version of the text.
Funding: This research received no external funding.