A General-Purpose Spatial Survey Design for Collaborative Science and Monitoring of Global Environmental Change: The Global Grid

Recent guidance on environmental modeling and global land-cover validation stresses the need for a probability-based design. Spatial balance has also been recommended because it yields more efficient sampling, which is particularly relevant for understanding land use change. In this paper I describe a global sample design and database called the Global Grid (GG) that has both of these statistical characteristics and is also flexible, multi-scale, and globally comprehensive. The GG is intended to facilitate collaborative science and monitoring of land change among local, regional, and national groups of scientists and citizens, and it is provided in a variety of open-source formats to promote collaborative and citizen science. Because the GG sample grid is provided at multiple scales and is globally comprehensive, it provides a universal, readily available sample. It also supports unequal-probability sample designs through filtering sample locations by user-defined strata. The GG is not appropriate for use at latitudes beyond ±85° because the shape and topological distortion of quadrants becomes extreme near the poles. Additionally, the GG datasets are very large at fine scales (resolution ~600 m × 600 m) and require a 64-bit integer representation.


Introduction
A number of comprehensive and long-term monitoring programs have been developed to provide a deeper understanding of the conditions of, and changes in, human and natural systems. For example, in the United States, the National Science Foundation created a network of Long-Term Ecological Research stations and, more recently, the National Ecological Observatory Network [1]. Progress has also been made in specifying rigorous methods to design monitoring programs. A monitoring design specifies the resource to be monitored, what will be measured, how it will be measured (i.e., the response design), where it will be monitored (i.e., the survey design), how frequently it will be monitored, and how measurements will be summarized [2]. In this paper I focus on survey design (often called sample design or spatial design [3]) and how best to generate a rigorous, useful, and flexible survey design that specifies where environmental data will be collected. Stehman [2] suggests that a good survey design should be probability-based; have a low, estimable variance; be spatially balanced, simple, and cost-effective; and be flexible, because of the real-world, practical challenges that environmental monitoring programs inevitably face [4]. Four aspects of developing a sample design for monitoring landscape change are discussed here: probability-based design, spatial balance, cartographic projections, and sampling intensity (i.e., frequency).
The need for a probability-based design was underscored in recent guidance on fundamental design principles for global land-cover validation [5,6], because it ensures rigorous statistical inference. Probability-based monitoring design is a key component of the US National Park Service's Inventory and Monitoring Program [7,8], the US Forest Service's Forest Inventory and Analysis program [9], the US Natural Resources Conservation Service's National Resources Inventory [10], and the US Geological Survey's Land Cover Trends program [11]. The US Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP [12,13]) was a seminal effort to develop a probability-based, comprehensive, multi-purpose sample design.
Building on experience gained from EMAP, Stevens and Olsen [14] developed a spatially balanced sampling (SBS) approach called Generalized Random-Tessellation Stratified (GRTS) sampling [15], and additional software programs have subsequently been developed [16,17]. SBS combines elements of simple random and systematic sampling: sample locations are random but are also guaranteed to be spread across space [3]. This leads to more efficient sampling, defined as providing more information per sample unit, because it ensures that the sample is distributed across the entire population (or domain) being sampled. This is a desirable characteristic of surveys and is particularly relevant for understanding land use change [17,18].
The basis for the EMAP design was a uniform tessellation of hexagons resulting in points that are roughly 27.1 km apart on the Earth's surface [19]. A uniform tessellation based on specialized global projection systems was needed to minimize differences in the areas of sampling units, so that resources had an equal probability of being sampled [13,20-23]. The Global Land Cover 2000 reference data also used a probability-based design and an equal-area projection [24]. However, I depart from this recent work, which has assumed the need for an equal-area projected coordinate system, by generating a global survey design that employs geographic coordinates (latitude/longitude). Although it is well established that moving poleward from the equator leads to increasing distortion in area and shape [23], spherical geometry algorithms can be used to account for non-planar situations. Moreover, the changing area of units can be accounted for by directly incorporating the area of the geographic units when calculating the probability of a location being selected (described in more detail below). Using grid cells defined by equal-angle latitude and longitude intervals (e.g., 1° × 1°) has numerous practical advantages. Geographic coordinates are ideally suited for global representation because they are easy to understand and simple to map, they avoid the conceptual and computational difficulties of map projections, and software to manipulate these datasets is readily available.
A central purpose of environmental monitoring is to sample features from continuous populations distributed over space [25], so that features are counted or observed within a specified area. This emphasis on areal sampling is consistent with Holmes' [26] distinction between location-based sampling, which uses points, and area-based sampling, which uses areas that completely tessellate a domain at some level of spatial precision. Because what is sampled is an area rather than a dimensionless location, the key requirement of a global sample design is the ability to account for the varying area of the sampling units. Ecologists have long used areal plots: 1 m² plots and the modified-Whittaker plot [27], the FIA plot [9], the NRI primary sampling area and points [10], etc. Even real-world features that are conceptualized as 1D features, such as a stream network, are sampled on the ground as areal features [28].
A final aspect of sample design to discuss is related to sampling intensity. Because environmental systems are dynamic, long-term monitoring designs must be robust to potential changes in the sampling frame. For example, many natural resources of interest are changing as a consequence of land use change and climate change impacts, such as sea level rise, shifting ecotones, and changing distributions of resources of interest. One approach to this challenge has been "over-sampling" SBS designs (also called a master sample [26,29]). This works by drawing additional samples (e.g., 10% or 20% additional points) to be used in case a site (or location) is rejected from the original sample because of physical inaccessibility or denial of access by the landowner [30]. A powerful property of SBS designs is that these extra points in the master sample remain spatially balanced. This method is robust to situations where the resource of interest is not found at a given location (error of commission) or was found originally but, over time, no longer occurs there (the sampling frame contracts). However, over-sampling is not robust to situations where the initial sampling frame was imperfect and omitted legitimate resources (error of omission), where the sampling frame expands into other areas over time, or where a stratum was incorrectly specified. Over-sampling remains reliant on the proper specification of a static sampling frame.
Relatedly, integrated environmental monitoring design requires stronger coordination both within and between natural systems and human institutions. For example, the need to integrate across natural systems or resource types (e.g., terrestrial, aquatic, and atmospheric [31]) has long been recognized. Because different resources often need to be characterized at different scales (or levels of precision), the sampling design must be hierarchical to allow nested designs at different scales [32]. As a result, a multi-resolution, hierarchical sample design has important benefits. This suggests that a more general sample design is needed, rather than one that is tied to an individual resource type (e.g., forests rather than soil erosion). Perhaps an even more practical and immediate challenge is to integrate across institutional and administrative boundaries. Typically, a survey design is generated for a specific geographic area that usually corresponds to an institutional boundary (e.g., a national monitoring design covers only a given country, or a monitoring program covers only certain landowner/manager types). If an adjacent institution or agency wishes to conduct complementary monitoring, it typically must start from scratch. Occasionally there is a desire to complement an existing design with a higher density of samples, such as when a watershed group within a state/province wants to add extra locations.
Finally, there is an opportunity to benefit from an emerging trend of decentralized data collection [33] through volunteered geographic information or "citizens as sensors" [34], or crowdsourcing [35]. It is increasingly easy to locate (through GPS), collect, integrate, and visualize environmental data (e.g., through Google Earth, Collect Earth, or GeoWiki; see [36]). Although recent work has addressed challenges in visualizing these global datasets [37], it remains difficult to organize these often ad hoc data collection efforts. A number of efforts have used simple systematic sampling generated by the intersections of whole degrees of latitude and longitude, such as the Degree Confluence Project [38], which attempts to provide a picture and field-based description of each latitude/longitude confluence at 1° intervals and will ultimately catalogue over 50,000 locations (for latitudes below 70°). This work is interesting but does not fulfill the probability-based requirement, nor can it easily be extended or its intensity increased: the next step, moving to 0.5° confluences, requires four times the intensity of data collection.
Given these challenges and opportunities, my goal in this paper is to describe a global sample design that supports regional- to global-scale monitoring of environmental resources and that is flexible, comprehensive, and general purpose. To accomplish this goal, I pursue three objectives: (a) review the survey design that is generated by a GIS-based tool called Reversed Random Quadrant-Recursive Raster (RRQRR; [4]); (b) describe a global survey design (Global Grid, GG) and dataset that employs a sample generated by RRQRR; and (c) provide the GG as a ready-made ("canned") sampling design dataset in open-source formats, at a variety of scales, that can be readily incorporated into various software platforms.

Materials and Methods
To generate a general purpose spatial survey design for monitoring global environmental change, I first describe how a spatially-balanced sample design is generated using the Reversed Random Quadrant-Recursive Raster (RRQRR) algorithm and then discuss the generation of a specific application of RRQRR called the Global Grid (GG).

RRQRR Sample Design
The RRQRR algorithm for developing an SBS is detailed elsewhere [4], so here I briefly review key aspects of the method and highlight the properties of RRQRR survey designs that are unique and relevant to the goal of generating a global survey design. In brief, RRQRR sample designs are probability-based, spatially balanced samples that are highly flexible.
A sample design is generated by recursively subdividing an area into four units or cells (2 × 2) encoded with the values 0, 1, 2, and 3. Rather than using a consistent ordering system, such as the Morton (or "z") order, the ordering at each subdivision is randomized (specifically, permuted into one of the 24 possible orderings of the four codes). Each resulting cell is assigned an independent random value drawn from a uniform distribution, and its probability of being selected is adjusted by the area of the resource in the cell. RRQRR has been used to develop sample designs for a variety of purposes (e.g., [39-43]), including sampling within NEON domains. Software is readily available (see Supplementary Material), and a version of RRQRR is implemented in ArcGIS software [44]; see the example used by [41].
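To make the recursive, randomly permuted ordering concrete, here is a minimal Python sketch on a square raster. It is an illustration under stated assumptions, not the RRQRR implementation from [4] or the ArcGIS tool: it assumes the four quadrant codes are permuted independently for every parent cell, and it interprets "reversed" as reading each cell's base-4 address from the finest level to the coarsest, so that the coarsest code becomes the least significant digit (the reverse hierarchical ordering familiar from GRTS). The function name `rrqrr_sequence` is hypothetical.

```python
import itertools
import random

# All 4! = 24 possible orderings of the quadrant codes 0, 1, 2, 3.
QUADRANT_PERMUTATIONS = list(itertools.permutations(range(4)))


def rrqrr_sequence(levels, seed=None):
    """Return a 2**levels x 2**levels grid of sequence values (hypothetical sketch).

    Each parent cell is split 2 x 2 and its four children receive the codes
    0-3 in one of the 24 permutations, chosen at random. A cell's address is
    its list of codes from the coarsest to the finest level; its sequence
    value is that address read in reverse as a base-4 number.
    """
    rng = random.Random(seed)
    n = 2 ** levels
    grid = [[0] * n for _ in range(n)]

    def subdivide(row, col, size, address):
        if size == 1:
            seq = 0
            for code in reversed(address):  # finest code ends up most significant
                seq = seq * 4 + code
            grid[row][col] = seq
            return
        half = size // 2
        codes = rng.choice(QUADRANT_PERMUTATIONS)
        # Child quadrants in NW, SW, NE, SE order (row 0 is the northern edge).
        children = [(row, col), (row + half, col), (row, col + half), (row + half, col + half)]
        for (r, c), code in zip(children, codes):
            subdivide(r, c, half, address + [code])

    subdivide(0, 0, n, [])
    return grid


if __name__ == "__main__":
    for row in rrqrr_sequence(levels=2, seed=7):
        print(row)
```

Because the coarsest code is the least significant digit in this sketch, sequence values 0-3 always fall in four different top-level quadrants and values 0-15 cover all sixteen level-2 cells, so truncating the ordered list at any length keeps the selected cells spread out.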
A first unique property of RRQRR is that, rather than generating a sample of the complete list or sampling frame, it generates a full or "quasi-complete" sample of all possible locations for the entire geographic domain that contains the sampling frame. I call this a "quasi-complete" sample because, although there are theoretically an infinite number of locations (if one assumes 0-dimensional sample locations, or points), in practice there are a finite number of areas (cells) in a given tessellation. RRQRR uses a raster representation, so the spatial precision is defined by the cell area, and the raster bounds are defined as the minimum enclosing rectangle around the sampling frame of interest, typically buffered by about 10%. This quasi-complete sample provides near-complete assurance that a sample will be robust to changes in a population frame. Note that "over-sampling" approaches (e.g., [30]) provide some robustness only when additional samples are needed, not when the dataset representing the frame is incorrect. For example, if a population frame representing streams is imperfect because a stream segment was subsequently found in the field but was not originally mapped, then, by definition, that segment cannot be over-sampled because it was never in the frame to sample from. However, because RRQRR provides a quasi-complete sample of the world, all geographic features (to a given level of resolution) can be included in a probability-based sample design.
A second unique property of the RRQRR design is that, in addition to the sequence raster dataset S, an additional raster layer (R) is provided at the same resolution that contains random values drawn from a uniform distribution on (0, 1). R is used to "filter" locations based on a user-provided raster (A) that specifies the probability that a given location (raster cell) will be selected relative to other locations, to account for changes in the area of the resource being sampled at a given location. This maintains the requirement of probability-based sampling that every location has a known, non-zero probability of being sampled, while the probability of a cell being included in a sample is adjusted as a function of its area. If the sample unit is relatively small (<1 km²) and the required precision of estimates is relatively modest (i.e., within 0.01 km²), then area can be approximated using planar area calculations scaled by cos(Lat) to account for the decline in cell area with increasing latitude. For higher precision, however, spherical geometry formulas should be used [45]. The practical advantages and simplicity of this system outweigh any minor limitations with shape, distance from centroid, and topology. Because of the extreme topological distortion as latitude approaches the poles, this approach is not appropriate for locations beyond ±85°.
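The gap between the cos(Lat) approximation and an exact spherical calculation can be checked with a short Python sketch. It assumes a spherical Earth with a mean radius of 6371.0088 km (an assumption; an ellipsoidal area calculation on WGS84 will differ slightly) and uses the standard result that a cell bounded by two parallels and two meridians has area $R^2\,\Delta\lambda\,(\sin\varphi_2 - \sin\varphi_1)$. The function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius of a spherical Earth


def spherical_cell_area(lat_s, lat_n, lon_w, lon_e, radius=EARTH_RADIUS_KM):
    """Exact area (km^2) on a sphere of a cell bounded by two parallels and two meridians."""
    dlon = math.radians(lon_e - lon_w)
    return radius ** 2 * dlon * (math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s)))


def planar_cell_area(lat_s, lat_n, lon_w, lon_e, radius=EARTH_RADIUS_KM):
    """Planar approximation: equatorial cell area scaled by cos(central latitude)."""
    height = radius * math.radians(lat_n - lat_s)
    width = radius * math.radians(lon_e - lon_w)
    return height * width * math.cos(math.radians((lat_s + lat_n) / 2))


if __name__ == "__main__":
    for edge in (180 / 2 ** 14, 45.0):  # a ~1.2 km GG14 cell and a 45-degree cell
        exact = spherical_cell_area(45.0, 45.0 + edge, 0.0, edge)
        approx = planar_cell_area(45.0, 45.0 + edge, 0.0, edge)
        print(f"edge {edge:.5f} deg: relative error {approx / exact - 1:.2e}")
```

For a ~1 km cell the two estimates agree to better than one part in 10^8, whereas for a 45° cell the planar shortcut overestimates the area by roughly 2.6%, which is why the approximation is reserved for small sample units.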
For a global sample, the values in A would equal the area of a cell divided by the maximum area of any cell (i.e., at the equator). Cells where R < A would be included in the sample. This accounts for the decreasing area of cells with increasing latitude away from the equator. For a regional or local sample, one could instead use the maximum area of a cell within the local domain. Where the resource of interest varies in area within a cell (e.g., different lengths of streams), A could reflect not simply the area of the cell but the area of the resource. Similarly, R can be compared to a user-specified raster layer (I) containing inclusion probabilities, for example to reflect strata sampled with unequal probabilities; cells where R < I would be retained in the sample. The resulting sequence values in S may no longer be a consecutive list of integers (e.g., 1, 2, 3, 4, 5 . . .) and may have gaps (e.g., 1, 2, 4, 6, 7 . . .), but the ordering of these sequence values remains important. This makes it easy to modify strata simply by adjusting the inclusion probabilities. Note that if unequal inclusion probabilities are used, then population estimates need to incorporate a weight w (typically w = 1/I).
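The filtering and weighting rules above can be sketched in a few lines of Python. In this illustration each cell carries its sequence value (S), its uniform random value (R), a user-supplied inclusion probability (I), and a hypothetical observed value y; the `Cell` class and function names are assumptions made for the example. Weighting each retained cell by w = 1/I is the Horvitz-Thompson estimator of a population total.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    seq: int         # sequence value from S
    r: float         # uniform(0, 1) random value from R
    incl: float      # inclusion probability from I (0 < I <= 1)
    y: float = 0.0   # observed response at the cell (hypothetical)


def filter_cells(cells):
    """Retain cells where R < I, preserving sequence order (gaps are expected)."""
    return sorted((c for c in cells if c.r < c.incl), key=lambda c: c.seq)


def estimate_total(sample):
    """Weight each retained cell by w = 1/I (Horvitz-Thompson estimate of a total)."""
    return sum(c.y / c.incl for c in sample)


if __name__ == "__main__":
    rng = random.Random(42)
    # 20,000 cells, each holding one unit of resource; odd cells form a
    # low-intensity stratum (I = 0.2), even cells a high-intensity one (I = 0.8).
    population = [Cell(seq=i, r=rng.random(), incl=0.2 if i % 2 else 0.8, y=1.0)
                  for i in range(20_000)]
    sample = filter_cells(population)
    print(len(sample), round(estimate_total(sample)))  # estimate should be near 20,000
```

The retained list keeps the original sequence order, so a user can still work through cells in that order even though the sequence values now contain gaps.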

The Global Grid
To facilitate collaboration among researchers across different geographies, I provide a ready-made ("canned") map or database of sample locations produced using RRQRR, called the Global Grid (GG). The GG was generated by representing the globe using two cells, each spanning 180° of longitude and latitude, which divide the world into eastern (GGe) and western (GGw) hemispheres. Then, for each hemisphere, the first level (L = 1; GGw1, GGe1) is ordered 0, 1, 2, 3 (northwest, southwest, northeast, southeast). After the first level, each cell or quadrant is recursively subdivided into four and its sequence order is randomized. For instance, at level 2 (GGw2 and GGe2), cells are 45° on a side and the globe is represented by 32 cells (Figure 1). The GG can be further subdivided, generating nested hierarchical samples at each level (Table 1). At L = 14, sample cells are roughly 1.2 km on a side. The GG also allows users to modify the resulting sequence of sample locations by adjusting the relative inclusion probability based on the amount of a resource contained in a raster cell, rather than simply assuming that all (equal-area) cells contain the same amount of resource (e.g., [12,13]). Because cells (as quadrants) at different latitudes represent different surface areas, the relative inclusion probability needs to be adjusted by a weight w that ranges from 1.0 at the equator to nearly 0.0 at the poles. For example, for GG9 the area of cells at the equator is 1531.553 km², which is used to normalize the $w_9$ values from 1.0 at the equator to 0.003 near the poles. An extension of this concept can also incorporate the variability in the extent of a resource within a cell; e.g., the length of a stream represented by a cell depends on where the stream crosses the cell (through the center or nipping a corner), as well as on its sinuosity.
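The cell sizes implied by this construction can be tabulated with a short Python sketch. It assumes, consistent with the description above, that level L divides each 180° hemisphere cell into $2^L \times 2^L$ cells (so level 2 gives 45° cells and 32 cells globally), and it converts cell edges to kilometres along the equator using an assumed spherical radius of 6371.0088 km. It is an illustration, not a reproduction of Table 1.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed spherical Earth radius


def gg_level_stats(level):
    """Cell edge (degrees), global cell count, and equatorial edge length (km) at GG level L."""
    edge_deg = 180.0 / 2 ** level                       # each hemisphere spans 180 degrees
    n_cells = 2 * 4 ** level                            # two hemispheres, 4**L cells each
    edge_km = EARTH_RADIUS_KM * math.radians(edge_deg)  # arc length along the equator
    return edge_deg, n_cells, edge_km


if __name__ == "__main__":
    print(f"{'L':>2} {'edge (deg)':>12} {'cells':>15} {'edge at equator (km)':>21}")
    for level in range(1, 17):
        edge_deg, n_cells, edge_km = gg_level_stats(level)
        print(f"{level:>2} {edge_deg:>12.6f} {n_cells:>15,d} {edge_km:>21.3f}")
```

At L = 14 the equatorial edge comes out at about 1.22 km, in line with the "roughly 1.2 km" stated above; the east-west edge of a cell shrinks toward the poles in proportion to cos(Lat).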
To account for the unequal surface area represented by each grid cell, a weight $w_L$ is computed for each cell as the ratio of the cell's area $a_L$ to the largest cell area in the sample, $a^{\times}_L$, so that $w_L = a_L / a^{\times}_L$. A random value r is drawn from a uniform distribution on (0, 1), and the cell is retained only if $w_L > r$; otherwise it is removed from the resulting list. For a global sample, $w_L$ is approximated by the cosine of latitude, so cells are retained where cos(0.017453292520 × Lat) > r (the constant is the degree-to-radian factor π/180). For example, Figure 2 shows the first 1000 points of GG7. For samples with a global extent, this results in the removal of 34% of the points. A second issue is scale: different types of resources of interest are appropriately sampled at different scales. Hierarchical nesting provides a multi-scale, nested design such that the GG sequence value can be obtained at any level and will be consistent between scales if a coarser or finer resolution is needed. If unequal inclusion probabilities are desired (e.g., to sample some areas at a higher density), then $w_L$ can be multiplied by a relative inclusion probability $x_L$ (0.1 to 1.0), giving $w'_L = x_L w_L$.
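The global thinning rule and the combined weight $w'_L = x_L w_L$ can be expressed as a small Python sketch. It approximates $w_L$ by cos(Lat) at the cell centre and computes the expected fraction of cells retained when every row of GG cells within a latitude band is present; the function names and the cell-centre convention are assumptions made for the illustration. The retained fraction depends on the latitude range considered (and, in practice, on which cells are kept, e.g., land only).

```python
import math


def keep_cell(lat_deg, r, x=1.0):
    """Retain a cell if r < w'_L = x_L * w_L, with w_L approximated by cos(Lat).

    math.radians applies the degree-to-radian factor pi/180 (about 0.017453292520).
    """
    w = math.cos(math.radians(lat_deg))
    return r < x * w


def expected_retention(level, max_lat=90.0):
    """Mean of cos(Lat) over GG row centres with |Lat| <= max_lat.

    Every latitude row of a GG level holds the same number of cells, so the
    mean over row centres equals the mean over all cells in the band.
    """
    edge = 180.0 / 2 ** level
    centres = [-90.0 + edge * (k + 0.5) for k in range(2 ** level)]
    kept = [c for c in centres if abs(c) <= max_lat]
    return sum(math.cos(math.radians(c)) for c in kept) / len(kept)


if __name__ == "__main__":
    print(f"GG7, full globe: {expected_retention(7):.3f}")
    print(f"GG7, within +/-85 deg: {expected_retention(7, 85.0):.3f}")
```

Over the full globe the expected retention is close to 2/π ≈ 0.64; restricting the band to ±85° raises it slightly because the smallest polar cells are excluded.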
If the geographic domain over which an estimate is needed is not global, then the cells in the desired area can be extracted from GGL to make the resulting datasets smaller and easier to manipulate. This would typically be done to develop a specific design for a given resource type, for example, to target river networks [46]. Note that the sequence values in the resulting dataset will have large gaps (i.e., they will no longer be consecutive), but the order of the sequence values remains important.
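Extracting a regional subset while keeping the sequence order can be sketched as follows. The records are assumed to be dictionaries with hypothetical keys `seq`, `lat`, and `lon` (cell centre), the toy coordinates are made up, and the bounding-box test is an illustrative stand-in for whatever region polygon a user would actually apply.

```python
def extract_region(cells, south, north, west, east):
    """Return cells whose centres fall inside the box, sorted by sequence value.

    Sequence values in the result are not consecutive; only their order matters.
    """
    inside = [c for c in cells
              if south <= c["lat"] <= north and west <= c["lon"] <= east]
    return sorted(inside, key=lambda c: c["seq"])


if __name__ == "__main__":
    cells = [
        {"seq": 0, "lat": 10.5, "lon": -60.2},
        {"seq": 1, "lat": -33.1, "lon": 151.0},
        {"seq": 2, "lat": 12.0, "lon": -61.9},
        {"seq": 5, "lat": 11.2, "lon": -62.5},
        {"seq": 3, "lat": 48.8, "lon": 2.3},
    ]
    region = extract_region(cells, south=9.0, north=13.0, west=-63.0, east=-59.0)
    print([c["seq"] for c in region])  # [0, 2, 5]
```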

Results
The Global Grid (GG) dataset is available in a variety of spatial representations, including raster, point, and area (i.e., quadrangle) datasets, to make it readily usable in various settings. All data are in geographic coordinates (latitude/longitude) on the WGS84 datum. For each sample location, two attributes are provided: s, the sequence number (specific to each level and hemisphere), and r, a random value drawn from a uniform distribution, which is compared against any inclusion-probability weight during the filtering stage. The GG datasets are available as a .csv file (a plain-text format that can be ingested into QGIS [47]), as a shapefile (for Esri's ArcGIS and other geographic information system software), and as a .kml file for use in Google Earth (only the first 100,000 samples are provided because of display limitations). Note that these datasets are sorted in ascending order by sequence value, which makes it easy to select a subset with a specific number of samples (based on the ordered attribute FID). Raster data are also provided in GeoTIFF format.
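Reading the point .csv and taking the first n samples in sequence order can be sketched with Python's standard `csv` module. The attributes `s` and `r` are named in the text; the column headers `s`, `r`, `lat`, and `lon` are assumptions about the file layout, so check the header of the actual download before using this.

```python
import csv
import io


def read_gg_csv(text_stream):
    """Parse GG point records; column names 's', 'r', 'lat', 'lon' are assumed."""
    rows = []
    for rec in csv.DictReader(text_stream):
        rows.append({
            "s": int(rec["s"]),  # Python ints are unbounded, so 64-bit values are safe
            "r": float(rec["r"]),
            "lat": float(rec["lat"]),
            "lon": float(rec["lon"]),
        })
    return rows


def first_samples(rows, n):
    """Return the first n records in ascending sequence order."""
    return sorted(rows, key=lambda row: row["s"])[:n]


if __name__ == "__main__":
    demo = io.StringIO("s,r,lat,lon\n0,0.41,10.5,-60.2\n7,0.93,-2.0,30.1\n3,0.12,45.0,7.5\n")
    print([row["s"] for row in first_samples(read_gg_csv(demo), 2)])  # [0, 3]
```

Sorting again after reading is cheap insurance; the distributed files are already sorted by sequence value, in which case the first n rows are the sample.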

Discussion
The sample designs and datasets described here have numerous possible applications for environmental monitoring. For example, the RRQRR sampling design has been used to develop 6000 sample locations to validate estimates of the degree of human modification [48]. Currently, the Global Grid is being used to develop a training and/or validation dataset of global human modification and land use/cover. The design uses inclusion probabilities to sample at different intensities, in this case to ensure adequate sampling of more urbanized areas, which occupy ~5% of the terrestrial surface. To do this, I used "stable nightlights" for 2013 [49], calculated the mean brightness value within a radius of 10 km, and then applied a natural log transform and rounded up to generate five classes (0-4) ranging from rural to urban. I then generated an initial list of ~10,000 sample locations for terrestrial areas at level 14 (~1 km²) (Figure 3). The more heavily developed portions of the world are readily visible because the sampling intensity was stratified to place more random locations in urbanized areas. The Global Land Use Emergent Database (GLUED; [50]) protocol is followed as the response design: 10 simple-random locations are placed within each ~1 km² sample "chip", and interpreters are encouraged to add up to 10 additional locations selected to represent rare features within the chip. This allows population estimates to be generated from the random locations, while training data can also include the convenience-sample points.
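Placing simple-random locations inside a sample chip can be sketched as below. This is not the GLUED software: it assumes the chip is a latitude/longitude quadrangle and draws points uniformly by area on a sphere (uniform in longitude and in sin(latitude)), which is one standard way to obtain simple random locations within such a cell. The function name and the example chip bounds are illustrative.

```python
import math
import random


def random_points_in_chip(lat_s, lat_n, lon_w, lon_e, k=10, seed=None):
    """Draw k area-uniform random (lat, lon) points inside a lat/lon quadrangle."""
    rng = random.Random(seed)
    sin_s, sin_n = math.sin(math.radians(lat_s)), math.sin(math.radians(lat_n))
    points = []
    for _ in range(k):
        # Uniform in sin(lat) gives equal probability per unit area on a sphere.
        lat = math.degrees(math.asin(rng.uniform(sin_s, sin_n)))
        lon = rng.uniform(lon_w, lon_e)
        points.append((lat, lon))
    return points


if __name__ == "__main__":
    edge = 180 / 2 ** 14  # a GG14 chip is ~0.011 degrees on a side
    for lat, lon in random_points_in_chip(40.0, 40.0 + edge, -105.0, -105.0 + edge, seed=3):
        print(f"{lat:.6f}, {lon:.6f}")
```

For a chip only ~1 km across, drawing latitude uniformly would be almost indistinguishable; the sin(latitude) draw simply keeps the sketch correct for larger cells as well.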
The Global Grid is designed to facilitate collaboration across regions by providing a single, stand-alone dataset that covers the entire globe at multiple resolutions. By providing both the sequence raster (S) and a random value raster (R), this database provides a platform for collaboration. For example, scientists with expertise in a given region or domain could use the GG samples to interpret high-resolution aerial photography (or to make observations on the ground) to quantify land cover and land use types, degree of human modification, impervious surface, etc. Scientists in an adjacent region could then collect data using the same protocol and leverage the adjacent data, because both come from the same sample design; spatial balance is therefore maintained, ensuring the statistical rigor of a design-based sample. This same comprehensiveness also allows easy expansion of a study area into adjacent areas that were not previously considered, perhaps to adjust to changing conditions of where a population is likely located, or simply as more study resources become available.
There are a few notable limitations of the GG. The surface area represented by each quadrant decreases toward the poles, which can be easily accounted for, but the shape and topological distortion become extreme near the poles. Hence, the GG is not appropriate for use at latitudes beyond ±85°. Because the quadrants (cells) are predefined and the area of each quadrant changes by a factor of four between successive levels, the GG may not be appropriate for an application that requires a sample unit of a specific area tailored to a given purpose. Finally, the GG datasets are very large at fine resolutions (levels above L10), particularly above L15, which require a 64-bit integer representation.
General guidance for applying the Global Grid to a local or regional extent is as follows: 1. Level: determine the appropriate level (L) for the GGL sampling "chips" (see Table 1).Typically, GG14 is used for land use/cover validation (~1 km 2 ).Note that GG sequence values are nested among scales, but each sequence value is particular to a given level.2. Area: account for area differences in the amount of resource within each sample (quadrangle).The Global Grid is designed to facilitate collaboration across various regions, by providing a single, stand-alone dataset that covers the entire globe at multiple resolutions.By providing both the sequence raster (S) and a random value raster (R), this database provides a platform for collaboration.For example, scientists who have expertise in a given region or domain could use the GG samples to interpret high-resolution aerial photography (or perhaps on the ground) to quantify land cover and land use types, degree of human modification, impervious surface, etc.Scientists in an adjacent region could then collect data using the same protocol and leverage the adjacent collected data because it comes from the same sample design so that spatial balance is maintained, thus ensuring the statistical rigor of a design-based sample.This same comprehensiveness also allows for easy expansion of a study area into adjacent areas that previously were not considered, perhaps to adjust to changing conditions of where a population is likely located, or simply as more study resources become available.
There are a few notable limitations of the GG.The surface area represented by each quadrant decreases as one moves pole-ward-which can be easily accounted for-but the shape and topological distortion becomes extreme near the poles.Hence, the GG is not appropriate for use at locations above ±85 degrees.Since the quadrants (cells) are predefined and the area of each quadrant at each subsequent level changes by a factor of four, it may not be appropriate for an application that requires a specific area tailored for a given purpose.Finally, the file sizes of the GG datasets are very large at fine resolution (levels > L 10 ), particularly for >L 15 which require a 64-bit integer representation.
General guidance for applying the Global Grid to a local or regional extent is as follows:

1. Level: determine the appropriate level (L) for the GG_L sampling "chips" (see Table 1). Typically, GG14 is used for land use/cover validation (~1 km²). Note that GG sequence values are nested across scales, but each sequence value is specific to a given level.
2. Area: account for differences in cell area and in the amount of resource within each sample (quadrangle).
   2.1. Global: if a global sample is desired, samples must be thinned according to their area, which changes with the cosine of latitude. Query the GG_L file to find samples where cos(Lat) > R, where R is a random value drawn from a uniform probability distribution.
   2.2. Regional: although less flexible and extendable, a regional sample may be preferable for some purposes. In this case, the area of each sample polygon (quadrangle) can be calculated, and an area-based inclusion probability A = A_i / A_x computed, where A_i is the area of sample i and A_x is the largest area within the regional sample (typically that of the cell closest to the equator). Query the GG_L file to find samples where R < A.
   2.3. Variable area: frequently, the areal extent of the resource to be sampled (e.g., a river or animal habitat) varies within a given sample unit. To account for this, the user can provide an exogenous raster layer giving a_i, the proportion of the resource found within cell i. Query the GG_L file to find samples where cos(Lat) × a_i > R.
3. Filtering: additional filtering is often desired to adjust the sampling intensity, either to account for relatively rare features (e.g., using various strata) or to account for practical challenges in collecting response data at a given location (e.g., reducing intensity with increasing distance from a road). If non-uniform inclusion probabilities are desired, a separate spatial raster layer (I) with the same resolution as GG_L can be generated, with values ranging from 0.0 to 1.0. Query the GG_L file to find samples where cos(Lat) × a_i > R and I > R.
4. Sequence: as standard practice, samples are sorted in ascending order of their sequence values (the RRQRR_L field), and data capture typically proceeds in that order.
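Each of the steps above reduces to a per-cell comparison against the random value R. The Python sketch below applies them to an in-memory list of records. The `Cell` fields, the function names, and the spherical-Earth quadrangle-area formula used for step 2.2 (with an assumed mean radius of 6371 km) are illustrative assumptions, not part of the GG distribution, which supplies the S and R values as rasters.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


@dataclass
class Cell:
    """One GG_L sample record; field names are hypothetical."""
    seq: int        # sequence value (RRQRR_L field)
    lat: float      # latitude of the cell centre, in degrees
    r: float        # value from the uniform random raster R, in [0, 1)
    a: float = 1.0  # proportion of the resource in the cell, a_i (step 2.3)
    i: float = 1.0  # user-defined inclusion intensity I, 0.0 to 1.0 (step 3)


def coslat(c: Cell) -> float:
    return math.cos(math.radians(c.lat))


def quad_area_km2(lat_center: float, side_deg: float) -> float:
    """Quadrangle area on a sphere: R^2 * dlon * (sin(top) - sin(bottom))."""
    h = side_deg / 2
    return (EARTH_RADIUS_KM ** 2 * math.radians(side_deg)
            * (math.sin(math.radians(lat_center + h))
               - math.sin(math.radians(lat_center - h))))


def global_sample(cells):
    """Step 2.1: keep a cell when cos(Lat) > R."""
    return [c for c in cells if coslat(c) > c.r]


def regional_sample(cells, side_deg):
    """Step 2.2: keep a cell when R < A, with A = A_i / A_x."""
    areas = [quad_area_km2(c.lat, side_deg) for c in cells]
    a_x = max(areas)
    return [c for c, a_i in zip(cells, areas) if c.r < a_i / a_x]


def variable_area_sample(cells):
    """Step 2.3: keep a cell when cos(Lat) * a_i > R."""
    return [c for c in cells if coslat(c) * c.a > c.r]


def filtered_sample(cells):
    """Step 3: keep a cell when cos(Lat) * a_i > R and I > R.

    Both tests use the same R, so a cell is kept with probability
    min(cos(Lat) * a_i, I) rather than the product of the two.
    """
    return [c for c in cells if coslat(c) * c.a > c.r and c.i > c.r]


def first_n(cells, n):
    """Step 4: sort ascending by sequence value and take the first n."""
    return sorted(cells, key=lambda c: c.seq)[:n]


cells = [
    Cell(seq=5, lat=0.0, r=0.90),
    Cell(seq=2, lat=60.0, r=0.40),
    Cell(seq=7, lat=60.0, r=0.60),
    Cell(seq=1, lat=30.0, r=0.10, a=0.2, i=0.05),
]
print([c.seq for c in first_n(filtered_sample(cells), 10)])  # [2, 5]
```

For equal-angle cells, the step 2.2 ratio simplifies because sin(φ + h) − sin(φ − h) = 2 cos φ sin h, giving A_i / A_x = cos(φ_i) / cos(φ_x). When the regional sample includes cells at the equator, steps 2.1 and 2.2 therefore select the same cells.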

Conclusions
The Global Grid (GG) is a database that provides a multi-scale, comprehensive spatial sampling design suitable for global environmental monitoring. It is generated using the Reversed Random Quadrant-Recursive Raster algorithm [4]. GG is a probability-based and spatially balanced design. Because it is simple, flexible, and provides a quasi-complete sampling of the entire globe, it supports collaboration among disparate projects and scientists, aligning individual efforts so that their observations can be stitched together into a coherent whole. GG provides an unprecedented platform on which to conduct global monitoring while simultaneously facilitating coordination among regional and local efforts. GG is open source and freely available [51]. Possible future extensions of this work include providing finer resolutions (levels 15 and beyond), making the GG available online through platforms such as Google Earth, and interfacing with the Group on Earth Observations Biodiversity Observation Network Working Group 7: In Situ and Remote Sensing Integration.
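The sequence values come from the Reversed Random Quadrant-Recursive Raster algorithm [4]. The Python sketch below is a simplified illustration of why taking samples in sequence order keeps them spatially balanced. Each cell of a single 2^L × 2^L grid receives a quadrant-recursive base-4 address using the NW = 0, SW = 1, NE = 2, SE = 3 numbering of Figure 1. That numbering is optionally permuted at random at each node, and the address digits are then reversed. This is not the published RRQRR implementation. The digit-reversal and per-node permutation details are assumptions, the function names are hypothetical, and the two hemispheres and the R raster are not modelled.

```python
import random
from typing import Dict, Optional, Tuple


def reverse_base4(x: int, ndigits: int) -> int:
    """Reverse the order of the ndigits base-4 digits of x."""
    out = 0
    for _ in range(ndigits):
        out = out * 4 + x % 4
        x //= 4
    return out


def reversed_quadrant_order(
    level: int, rng: Optional[random.Random] = None
) -> Dict[Tuple[int, int], int]:
    """Assign a sequence value to every (row, col) of a 2**level x 2**level grid.

    Row 0 is the northern edge and column 0 the western edge. Quadrants are
    numbered 0=NW, 1=SW, 2=NE, 3=SE (Figure 1). If rng is given, that
    numbering is randomly permuted independently at every node of the tree.
    """
    n = 2**level
    perms: Dict[Tuple[int, int], list] = {}  # (bit depth, parent address) -> permutation
    order = {}
    for row in range(n):
        for col in range(n):
            addr = 0  # quadrant-recursive address, coarsest digit first
            for k in range(level - 1, -1, -1):
                digit = 2 * ((col >> k) & 1) + ((row >> k) & 1)
                if rng is not None:
                    key = (k, addr)
                    if key not in perms:
                        perms[key] = rng.sample(range(4), 4)
                    digit = perms[key][digit]
                addr = addr * 4 + digit
            # Reversing the digits makes the finest digit most significant, so
            # consecutive sequence values cycle through the four coarsest quadrants.
            order[(row, col)] = reverse_base4(addr, level)
    return order


order = reversed_quadrant_order(2)
first_four = sorted(order, key=order.get)[:4]
print(first_four)  # [(0, 0), (2, 0), (0, 2), (2, 2)]: one cell per top-level quadrant
```

Because any prefix of the ordering is spread across the top-level quadrants, a user who stops after the first n sequence values still obtains a spatially dispersed sample.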

Supplementary Materials:
The following are available online at www.mdpi.com/2072-4292/8/10/813/s1: datasets in KML format of Global Grid level 10 sample locations on land, with sampling intensity inversely proportional to the rural-to-urban gradient class 3.

Figure 1. The global sampling design strategy is hierarchical, multi-scalar, and comprehensive, providing a tessellation of cells and a sampling order that maintains spatial balance. At the first level (L) of the Global Grid (GG1), both the eastern and western hemispheres are divided into cells of 90° on a side, numbered in the sequence 0, 1, 2, 3 (northwest, southwest, northeast, southeast). Per hemisphere, GG2 (top) has 16 cells of 45° on a side, while GG3 (bottom) has 64 cells of 22.5°. Note that the raster at level L + 1 is nested within the raster at level L, so that its sequence numbers follow from those of the parent level.


Figure 2. The first 1000 points of the Global Grid at level 7 (GG7), adjusted for global use.


Figure 3. A global sample design of the Global Grid for terrestrial land use/cover, stratified on an urban-to-rural gradient generated from 2013 "nightlights" imagery.


Table 1. The length of a side of a cell (in degrees and kilometers) and the number of cells for each level of the Global Grid.