A High-Accuracy GNSS Dataset of Ground Truth Points Collected within Îles-de-Boucherville National Park, Quebec, Canada

Abstract: A new ground truth dataset generated with high-accuracy Global Navigation Satellite Systems (GNSS) positional data of the invasive reed Phragmites australis subsp. australis within Îles-de-Boucherville National Park (Quebec, Canada) is described. The park is one of five study sites for the Canadian Airborne Biodiversity Observatory (CABO) and has stands of invasive P. australis spread throughout. Previously, within the context of CABO, no ground truth data consolidating the locations of P. australis had been collected within the park. This dataset was collected to serve as training and validation data for CABO airborne hyperspectral imagery acquired in 2019 to assist with the detection and mapping of P. australis. The locations of the ground truth points were found to be accurate within one pixel of the hyperspectral imagery. Overall, 320 ground truth points were collected, representing 158 locations where P. australis was present and 162 locations where it was absent. Auxiliary data include field photographs and digitized field notes that provide context for each point.


Summary
Phragmites australis subsp. australis (hereafter Phragmites) is considered one of the most aggressive invasive plant species in eastern North America. It can be found in all 49 of the mainland states in the United States, as well as throughout the southern portions of six Canadian provinces, in a variety of dry and wet soil habitats (both freshwater and brackish conditions) [1,2]. Phragmites is considered an indicator of ecosystem disturbance due to its tendency to grow in dense monotypic stands that prevent the growth of surrounding vegetation, thereby greatly reducing plant biodiversity [3]. These stands can grow up to 3 m tall and have extensive networks of stolons and rhizomes that make removal of the plant difficult, especially once it has spread over a large extent. Phragmites is able to survive under a wide range of conditions, although it has demonstrated an affinity for wetlands, especially those that have been enriched with nitrogen from agricultural or residential sources [4]. Management and eradication of Phragmites populations is a difficult and ongoing process. Most removal methods require extensive physical labor and field work over vast areas. Physical methods of removal must be carried out with great caution to avoid accidental transfer of Phragmites or the stimulation of further growth [5,6]. Therefore, it is critical that land managers are able to locate stands of Phragmites while they are still relatively young and small in extent, so that they can more easily be eradicated or prevented from spreading using sensible and appropriate methods.
Data 2021, 6, 32
As part of the Canadian Airborne Biodiversity Observatory, airborne hyperspectral imagery (HSI) was acquired with the Compact Airborne Spectrographic Imager-1500 (CASI-1500) over the entire extent of Îles-de-Boucherville National Park in 2019. One of the purposes of the imagery was to identify and map invasive Phragmites across the entire park. For this approach, a ground truth dataset is needed both to train target detection algorithms and to assess their resulting accuracies. The airborne HSI is geo-corrected with a resampled pixel size of 2 m × 2 m. The reported positional accuracy of the HSI is an average error in easting and northing of 0.91 m and 0.67 m, respectively (Table 1). However, as described by [7], only 55.5% of the signal from a given pixel originates from materials within its spatial boundaries, while the remaining signal is from the materials within neighboring pixels. For the CASI-1500 HSI, the net point spread function (PSF) extends approximately 3 m in the along-track direction and 2 m in the cross-track direction. Based on the characteristics of the imagery, ground truth points with less than 2.25 m error in both northing and easting were required for this application. The requirement for precision of the ground truth points was set to less than one half of the standard deviation of the net PSF in the cross-track (i.e., narrower) direction (16.5 cm). Therefore, ground truth data needed to be collected using a high-precision GNSS (Global Navigation Satellite Systems) instrument in order to ensure that the points are geolocated correctly within the HSI.

Figure 1. A subset of the 320 ground truth data points listed in the .csv file. The fields contain the FID, longitude, latitude, elevation, the date and time of the beginning/end of collection, the rover's solution status, and antenna height above the ground. Additionally, there are fields for the calculated precision (i.e., standard deviation, SD) in the x, y, and z dimensions, as well as the radial standard deviation (SD_r). The approximate horizontal and vertical accuracy at the 95% confidence level is also reported. Antenna height, SD, and approximate accuracy at the 95% confidence level are reported in meters.

Figure 2. A subset of the digitized field notes for the ground truth dataset. For each point collected, there is auxiliary information detailing the survey date, FID, presence/absence of Phragmites, the site characteristics, the corresponding field photograph ID, and any other notes. The full set of digitized field notes is available for download.

Figure 3. A subset of the field photographs taken for each measured location. Each photograph has an identifier that corresponds to a specific point's FID, as well as accompanying metadata containing location and copyright information. The full set of photos is available for download.

Field             Description
FID               The point's numerical identifier (ObjectID of the ESRI shapefile)
longitude         The measured longitude of the point
latitude          The measured latitude of the point
elevation         The measured elevation of the point (ellipsoidal height)
collection_start  The date and time that collection of the point began
collection_end    The date and time that collection of the point ended
solution_status   The solution status at the time the point was collected
antenna_height    The height of the antenna above the ground (m)
sample_count      The number of measurements that were used in the calculation of the position
SD_r *            The radial standard deviation (m)
SD_x *            The standard deviation in the x dimension (m)
SD_y *            The standard deviation in the y dimension (m)
SD_z *            The standard deviation in the z dimension (m)
ACC_r             The approximate horizontal accuracy at the 95% confidence level (m)
ACC_z             The approximate vertical accuracy at the 95% confidence level (m)
* See Section 3.3 for a discussion on the SD fields as reported by the Reach RS+ unit.

Table 3. A list and descriptions of the fields in the auxiliary field notes file.

Field                 Description
GPS Survey Date       The date that the points were collected
FID                   The point's numerical identifier (ObjectID of the ESRI shapefile)
Phrag? (Y/N)          "Y" for "yes" or "N" for "no", representing whether Phragmites was present at the specified point
Site Characteristics  A brief description of the immediate area surrounding the measured point
Photo ID              The identifier for the corresponding field photograph
Other notes           Any other notes/comments regarding the measured point, surrounding area, or other characteristics
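As a minimal sketch of how the two files described by the field lists above might be combined, the following Python example joins ground truth records to their field notes on the shared FID and tallies presence/absence by solution status. The column names follow the field lists; the inline rows are invented placeholders for illustration only (they are not values from the dataset), and the helper name join_by_fid is hypothetical.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; NOT values from the dataset.
POINTS_CSV = """FID,longitude,latitude,solution_status,SD_r
0,-73.47,45.59,FIX,0.004
1,-73.48,45.60,FLOAT,0.052
2,-73.46,45.61,FLOAT,0.031
"""

NOTES_CSV = """FID,Phrag? (Y/N),Photo ID
0,Y,P001
1,N,P002
2,Y,P003
"""


def join_by_fid(points_text, notes_text):
    """Attach each point's field-note row to its GNSS record via the shared FID."""
    notes = {row["FID"]: row for row in csv.DictReader(io.StringIO(notes_text))}
    joined = []
    for point in csv.DictReader(io.StringIO(points_text)):
        note = notes.get(point["FID"])
        if note is None:
            continue  # skip points that have no matching field note
        joined.append({**point, **note})
    return joined


records = join_by_fid(POINTS_CSV, NOTES_CSV)

# Tally (solution status, presence flag) pairs across the joined records.
tally = Counter((r["solution_status"], r["Phrag? (Y/N)"]) for r in records)
print(tally)
```

With the real files, the two strings would be replaced by the contents of the ground truth .csv file and the digitized field notes.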

Study Area
Îles-de-Boucherville National Park is located in the Saint Lawrence River between the Island of Montreal and the municipality of Boucherville in Quebec, Canada (Figure 4). The park covers an area of approximately 8 km² and consists of five main islands, the largest of which is Île Grosbois (244 ha) [9]. At the end of the 17th century, the first settlers arrived at the islands and began a history of intensive agricultural activity [10], which continues today on two of the islands (Île de la Commune and Île Grosbois). The park is characterized by a continental climate, and is relatively flat with an average elevation of 12 m above sea level. Îles-de-Boucherville National Park contains more than 450 plant species in terrestrial, aquatic, and semi-aquatic ecosystems [11]. The majority of the park is covered by young herbaceous and shrub vegetation following the abandonment of several agricultural fields when the area was designated as a protected area by the Quebec Ministry of Forests, Wildlife and Parks in 1984 [9]. Examples include Populus deltoides (eastern cottonwood), Solidago gigantea (giant goldenrod), Solidago altissima (late goldenrod), and Phalaris arundinacea (reed canary grass). Some scattered forested areas are present, with the only mature forest located on Île Grosbois, covering an area of 18 ha [9] and consisting of Fraxinus pennsylvanica (green ash), Tilia americana (basswood), Acer saccharinum (silver maple), Ulmus americana (American elm), Carya cordiformis (bitternut hickory), and Quercus macrocarpa (bur oak) [11,12]. The invasive common reed Phragmites australis subsp. australis is found throughout the park, within river channels and on riverbanks, along hiking trails, and in agricultural ditches, as Phragmites readily colonizes disturbed ecosystems such as former agricultural fields and farmland [4]. Additionally, the largest stand of invasive Phragmites in the province of Quebec is located in the Courant Channel along the western edge of the park.

Ground Truth Data Collection
The dataset comprises 320 points collected over 6 days between September 5 and 17, 2019 (Figure 5). A Reach RS+ (EMLID, St. Petersburg, Russia) single-band real-time-kinematic (RTK)-capable GNSS receiver was used. It is capable of receiving signals from several satellite constellations, including GPS, GLONASS, Galileo, QZSS, and BeiDou. The Reach RS+ was mounted to the top of a fully extended surveying monopole with an attached bubble level to ensure the system would remain stable during data collection. The antenna height was measured as 2.065 m, which includes the height of the surveying monopole (2 m) and the phase center offset (0.065 m), which is the given distance from the bottom of the system housing to the position of the antenna within the housing. The ground truth data points were collected following an established protocol for the unit [13] and utilized incoming corrections from a SmartNet North America (Norcross, GA, USA) base station located in Longueuil, Quebec (Station QCLO), with a baseline distance that ranged up to 16 km (depending on where in the park collection was taking place at the time). Corrections were received via NTRIP (Networked Transport of Radio Technical Commission for Maritime Services (RTCM) via Internet Protocol) [14] on an RTCM3-iMAX (individualized master-auxiliary) mount point. The RTCM3-iMAX utilizes real reference base stations to send the network corrections, which provides traceability and consistency for incoming corrections received by the GNSS unit [15]. Both GPS and GLONASS constellations were used, with an update rate of 5 Hz. The manufacturer-reported approximate accuracy of the Reach RS+ is 7 mm + 1 mm/km for FIX positions with an NTRIP stream baseline of <10 km, and 1 m for FLOAT positions with an NTRIP stream baseline of <30 km [16]. The necessary wireless hotspot was created using a Novatel Wireless MiFi-7000, and an iPhone Xs running the EMLID Reach application controlled the Reach RS+ rover.
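The antenna height and the manufacturer's FIX accuracy figure quoted above are both simple sums, which can be sketched as follows. This is only an illustration of the arithmetic: the function name fix_accuracy_mm is hypothetical, and the formula is evaluated only inside the <10 km baseline range for which the manufacturer's figure is quoted.

```python
def fix_accuracy_mm(baseline_km, constant_mm=7.0, per_km_mm=1.0):
    """Constant term plus a baseline-proportional term (7 mm + 1 mm/km by default)."""
    if baseline_km < 0:
        raise ValueError("baseline must be non-negative")
    return constant_mm + per_km_mm * baseline_km


# Antenna height: surveying monopole (2 m) plus phase center offset (0.065 m).
antenna_height_m = 2.0 + 0.065
print(f"antenna height: {antenna_height_m:.3f} m")

# Approximate FIX accuracy at two example baselines (km) within the quoted range.
for baseline_km in (5.0, 10.0):
    print(f"{baseline_km:.0f} km baseline: {fix_accuracy_mm(baseline_km):.0f} mm")
```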
In order to respect the Société des établissements de plein air du Québec (SEPAQ) park management team's request to limit the use of automobiles within the park so that park visitors would not be disturbed, a bicycle was used as the mode of transportation on the roads and trails, with the remainder of the sampling off-trail conducted on foot.

In addition to the ground truth points, detailed field notes were also collected for each sampling point. These field notes included a site photo of the measurement point (along with the appropriate photo ID number), whether invasive Phragmites was present or not, basic site characteristics such as the type(s) of vegetation present, and more detailed notes regarding the location or characteristics of the site. The data underwent quality assessment and any improperly collected points were removed from the dataset, such as duplicate points (due to operator error) or points that reported a horizontal standard deviation of 0 m. Ultimately, 320 points were retained: 158 points where Phragmites was present and 162 points where Phragmites was absent. The 320 points were distributed over as much of the accessible extent of the park as possible (Figure 5). In order to avoid a bias toward points collected near trails or other easily accessible areas (such as near beaches or parking lots), care was taken to go off-trail into harder-to-reach areas of the park in order to capture the natural variation and distribution of vegetation species. This off-trail sampling was done with permission from the SEPAQ park management team, and no sampling took place in any of the active restoration areas of the park (Figure 5).

Approximate Positional Accuracy
The Reach RS+ rover internally calculates the position of sample points relative to a continuously updating average position [17]. The deviation from the average position is reported by the unit both on-screen and in the output files as the root mean squared error (RMSE). The RMSE is a measure of the difference between values that are predicted and those that are observed. However, because there is no 'observed' known location to which the collected sample point locations are compared, the reported values are the standard deviation. In this dataset, the RMSE field names have been changed to SD to reflect what the values represent.
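The distinction drawn above can be illustrated with a short Python sketch: a root mean squared deviation taken about the samples' own average is, by definition, the (population) standard deviation, whereas a true RMSE needs an independently known position. The sample values and the "known" position below are invented for illustration, and the sketch uses the final average rather than the unit's continuously updating one; whether the unit applies the population or sample form is not stated in the source and is assumed here.

```python
import math
import statistics


def rmse(values, reference):
    """Root mean squared difference between the values and one reference value."""
    return math.sqrt(sum((v - reference) ** 2 for v in values) / len(values))


# Invented easting samples (m) for a single occupation; not receiver output.
samples = [10.02, 9.98, 10.01, 9.99, 10.03, 9.97]

# Deviation about the samples' own average: this is precision (SD).
mean_position = statistics.fmean(samples)
spread_about_mean = rmse(samples, mean_position)
population_sd = statistics.pstdev(samples)

# Deviation from a hypothetical surveyed position: this is accuracy (RMSE).
known_position = 10.10
error_vs_truth = rmse(samples, known_position)

print(spread_about_mean, population_sd, error_vs_truth)
```

The first two printed values agree, while the third is larger because it also contains the offset between the average and the reference position.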
For this application, we report SD_x, SD_y, SD_z, and the horizontal linear SD in the radial direction (SD_r) from Equation (1) [18]. These are referred to incorrectly as RMSE_x, RMSE_y, RMSE_z, and RMSE_r in the original data files output by the unit.
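A sketch of the radial combination referred to as Equation (1), assuming it follows the standard NSSDA form for the horizontal radial error with SD substituted for RMSE:

```latex
\begin{equation}
SD_r = \sqrt{SD_x^{2} + SD_y^{2}}
\end{equation}
```

Under that assumption, a point with SD_x = 3 cm and SD_y = 4 cm (illustrative values, not taken from the dataset) would have SD_r = 5 cm.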
The SD values are reported in the ground truth dataset file, as well as in the attribute table of the shapefile. An approximate horizontal (~ACC_r) and vertical (~ACC_z) accuracy were calculated from Equations (2) and (3), where δ_r is the uncertainty in the radial dimension (SD_r) and δ_z is the uncertainty in the z dimension (SD_z). Because the SD values represent precision, rather than accuracy compared to a known position, the manufacturer-stated approximate accuracy (AA) of 1 m for baselines < 30 km is adopted here. Equation (2) is modified from the National Standard for Spatial Data Accuracy (NSSDA) horizontal accuracy (ACC_r) at the 95% confidence level [18]. However, because ACC_r as described by [18] is calculated using the error compared to a known position, the NSSDA ACC_r could not be determined for this dataset, and an approximate horizontal accuracy at the 95% confidence level is reported. Equation (3) is modified from the ASPRS Accuracy Standards for Digital Geospatial Data, which note that the vertical error in vegetated terrain does not typically follow a normal distribution [18]. Because ACC_z as described by [18] is calculated using the error compared to a known position, the ASPRS ACC_z could not be determined for this dataset, and an approximate vegetated vertical accuracy at the 95th percentile is reported.
The same EMLID Reach RS+ unit with incoming corrections from SmartNet North America via NTRIP had been previously verified for accuracy using a Natural Resources Canada High Precision 3D Geodetic Network station (#95K0003) [19]. The positional accuracy was assessed to be <2.5 cm (x and y) and <3 cm (z) [20]. A brief summary of the average, minimum, maximum, and standard deviation of SD in each dimension, ~ACC_r, and ~ACC_z for all 320 points is given in Table 4. Tables 5 and 6 contain the average, minimum, and maximum SD in each dimension, ~ACC_r, and ~ACC_z for the ground truth points with a solution status of FIX (143 points) or FLOAT (177 points), respectively. All 320 points had an ~ACC_r < 2.25 m and an SD_r < 16.5 cm, which meets the requirements of the project as determined by the characteristics of the HSI.
All ground truth points had an ~ACC_z ≥ 1 m, and 89.69% of all ground truth points had an ~ACC_z < 2 m. Thirty-three points (10.31%) had an ~ACC_z ≥ 2 m, but no points had an ~ACC_z > 3 m. This indicates that the vertical data would not be suitable for geospatial mapping projects that require high vertical accuracy, but would be suitable for visualization and projects without high vertical accuracy requirements.
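The shares quoted above follow directly from the point counts, as the short Python sketch below shows: 33 of 320 points is 10.3125% and the remaining 287 points are 89.6875%, which the text reports as 10.31% and 89.69%. The helper share_below and the example ~ACC_z values are hypothetical and only illustrate how such a share could be computed from the ACC_z field.

```python
def share_below(values, threshold):
    """Percentage of values strictly below the threshold."""
    return 100.0 * sum(v < threshold for v in values) / len(values)


# Counts reported in the text: 33 of 320 points had ~ACC_z >= 2 m.
total_points = 320
at_or_above_2m = 33
pct_at_or_above = 100.0 * at_or_above_2m / total_points
pct_below = 100.0 * (total_points - at_or_above_2m) / total_points
print(pct_at_or_above, pct_below)

# The same kind of share computed from invented ~ACC_z values (m).
example_acc_z = [1.2, 1.6, 1.9, 2.4]
print(share_below(example_acc_z, 2.0))
```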
The empirical cumulative distribution functions (CDFs) were determined for SD_r and ~ACC_r for the full dataset, the FLOAT data subset, and the FIX data subset, as shown in Figure 6. The corresponding frequency distribution histograms are shown within each CDF plot.

Table 4. The average, minimum, maximum, and standard deviation of the precision and the approximate ACC_r and ACC_z calculated for the entire set of 320 points, reported in centimeters.
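An empirical CDF of the kind described above can be sketched in a few lines of Python. This is a generic construction, not necessarily the procedure used to produce Figure 6; the function names and the SD_r values are invented for illustration.

```python
def ecdf(values):
    """Return the sorted values and, for each, the fraction of observations <= it."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


def ecdf_at(values, x):
    """Empirical probability that an observation is <= x."""
    return sum(v <= x for v in values) / len(values)


# Invented SD_r values (m); not values from the dataset.
sd_r = [0.051, 0.004, 0.090, 0.012, 0.030]

xs, ps = ecdf(sd_r)
for x, p in zip(xs, ps):
    print(f"SD_r <= {x:.3f} m: {p:.2f}")

# Fraction of points meeting the 16.5 cm precision requirement.
print(ecdf_at(sd_r, 0.165))
```

Plotting xs against ps as a step function gives the CDF curve, and evaluating ecdf_at at a requirement threshold gives the share of points that satisfy it.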

Figure 6. The empirical cumulative distribution functions for different subsets of the data. Subsets (A) and (B) are the SD_r and ~ACC_r, respectively, for the full dataset of 320 points. Subsets (C) and (D) are the SD_r and ~ACC_r, respectively, for the subset of 177 points where the solution status was FLOAT. Subsets (E) and (F) are the SD_r and ~ACC_r, respectively, for the subset of 143 points where the solution status was FIX. Each plot also illustrates the frequency distribution histogram for the associated metric.

Conclusions
A new ground truth dataset containing detailed positional information regarding the presence of invasive Phragmites and other vegetation within Îles-de-Boucherville National Park is described. The dataset was collected using an established protocol and a high-accuracy GNSS system consisting of an EMLID Reach RS+ rover and incoming RTK corrections via internet using the SmartNet North America NTRIP service. The purpose of these data is to provide ground truth for the training and validation of Phragmites target detection using airborne HSI acquired over the extent of the park, in order to map the presence of Phragmites. The precision and approximate accuracy of the dataset were assessed and were found to meet the requirements of the application (i.e., <16.5 cm for precision and <2.25 m for accuracy), as determined by the characteristics of the HSI (Figure 7). Auxiliary field notes are provided in order to give context to the measured points, and include brief descriptions of the site characteristics as well as corresponding field photographs for visual reference.

User Notes
Users should be sure to note the "Solution Status" field of the dataset, as solutions of "FLOAT" and "FIX" may have implications for the overall accuracy of the collected location. While both "FLOAT" and "FIX" solution statuses indicate that base corrections are included and positioning is relative to the determined base coordinates, "FLOAT" means that the integer ambiguity is not resolved, while "FIX" means that the integer ambiguity is resolved. Precision in float mode is typically at the sub-meter level, while precision in fix mode is at the centimeter level. Points collected in dense vegetation, such as within a developed Phragmites stand that could measure over 2 m in height, are likely to have lower accuracy. This can be attributed to the fact that such dense vegetation restricts the view of the sky, and therefore restricts the ability of the Reach RS+ to see and acquire sufficient satellites to produce lower errors. To improve the positional accuracy of the ground truth points, a multi-band receiver with incoming corrections over a shorter baseline is recommended. For the Reach RS+ specifically, a local base station (<10 km baseline) in addition to the incoming NTRIP corrections would be required for higher accuracy.
Due to logistical constraints, it was not possible to remeasure the 320 points in order to recheck them for accuracy. Additionally, there are no previously established geodetic monuments within the park that could serve as a reference point. Because the Reach RS+ unit had been previously assessed for accuracy using a Natural Resources Canada High Precision 3D Geodetic Network station (see Section 3.3), the 320 points were collected in the field with confidence, and their approximate accuracy exceeds the requirements of the project under which they were acquired.
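As a hedged illustration of how the solution status noted above might be taken into account, the following Python sketch groups records by solution_status and summarizes SD_r for each group, similar in spirit to the per-status summaries in Tables 5 and 6. The records are invented placeholders that reuse the dataset's field names, and the helper summarize_by_status is hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Invented records using the dataset's field names; not actual dataset values.
points = [
    {"FID": 0, "solution_status": "FIX", "SD_r": 0.004},
    {"FID": 1, "solution_status": "FLOAT", "SD_r": 0.052},
    {"FID": 2, "solution_status": "FIX", "SD_r": 0.006},
    {"FID": 3, "solution_status": "FLOAT", "SD_r": 0.120},
]


def summarize_by_status(records, field="SD_r"):
    """Group records by solution status and report count, mean, min, and max."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["solution_status"]].append(rec[field])
    return {
        status: {
            "n": len(vals),
            "mean": fmean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for status, vals in groups.items()
    }


summary = summarize_by_status(points)
for status, stats in summary.items():
    print(status, stats)
```

A user who needs only centimeter-level precision could apply the same grouping and keep the "FIX" subset.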