Identifying Invasive Weed Species in Alpine Vegetation Communities Based on Spectral Profiles

Abstract: This study examined the use of hyperspectral profiles for identifying three selected weed species in the alpine region of New South Wales, Australia. The targeted weeds were Orange Hawkweed, Mouse-ear Hawkweed and Ox-eye daisy, which have caused great concern for regional biodiversity and the health of the environment in Kosciuszko National Park. Field surveys using a spectroradiometer were undertaken to measure the hyperspectral profiles of leaves and flowers of the selected weeds and companion native plants. Random Forest (RF) classification was then applied to identify which spectral bands differentiate the weeds from the native plants. Our results showed that an accuracy of 95% was achieved if the spectral profiles of the distinct flowers of the weeds were considered, and an accuracy of 80% was achieved if only the profiles of the leaves were considered. Emulation of the spectral profiles of two multispectral sensors (Sentinel-2 and Parrot Sequoia) was then conducted to investigate whether comparable classification accuracy could be achieved using wider spectral bands.


Introduction
Invasions by non-indigenous floral and faunal species are considered one of the most formidable threats to ecosystems and socioeconomic conditions, particularly in Australia [1]. The direct impact of invasive species in Australia is estimated to be as high as AUD 6.4 billion per annum [2]. This excludes flow-on impacts on the environment, including native species extinctions, reduction in biodiversity, damage to ecosystem services, reduced aesthetics, impacts on fire regimes, and other potential feedback influences [1,3,4]. The impacts are even more significant in areas such as the Australian Alps due to their Indigenous and European Australian heritage and culture [5].
The Australian Alps region contains several unique ecosystems, including one of the only seasonally snow-adapted ecosystems on the continent. It is home to a large array of rare and unique floral and faunal species, accentuating the need to preserve the biodiversity of the park. The presence of noxious weeds in Kosciuszko National Park (KNP) and its surroundings has become a key threat to the local biodiversity and health of the environment, with significant potential to cause negative environmental, social, and economic impacts [6][7][8][9].
There is an urgent need for prevention of weed spread in the national park, particularly Orange Hawkweed, as it disperses easily and prolifically, and is hazardous to the environment. This prevention process traditionally involves monitoring existing infestations and scouting for additional infestations using field survey approaches [10,11]. However, the demand and urgency to control the plants' domination has led to a more diverse
1. Develop a collection of spectral profiles (spectral library) for the three targeted weed species and the main co-occurring native species of the Australian alpine vegetation community.
2. Determine whether the spectral profiles of the weed species measured by a spectroradiometer can be distinguished from the other companion plants.
3. Further investigate the feasibility of mapping the targeted weed species using multispectral remote sensing systems using emulation.

Study Area
Kosciuszko National Park (KNP) is located in the Snowy Mountains of New South Wales (NSW), in south-eastern Australia (latitude: 35°30′ S to 37°02′ S; longitude: 148°10′ E to 148°52′ E; Figure 1). It is the largest national park in NSW at approximately 6900 km² [25]. Sites for this project were chosen in liaison with NSW National Parks officers, targeting areas within the park where Ox-eye daisy, Orange Hawkweed and Mouse-ear Hawkweed were found and have been recorded in the past. The hawkweeds currently have a very sparse and patchy distribution due to the ongoing weed eradication program [11].
Geomatics 2021, 1, FOR PEER REVIEW
Figure 1. Overview of the study site Kosciuszko National Park (KNP) in south-eastern New South Wales, Australia.

Methods
This study aims to establish the spectral profiles of three noxious weed species plus co-occurring native vegetation, and statistically define their separability at ground level. This will ascertain if the existing or upcoming remote sensing systems can assist in determination of the spatial distribution of noxious weeds in the Australian Alpine environment using airborne and/or spaceborne imagery. The methods applied in this study consist of: (1) in situ data collection; (2) spectral analysis and vegetation species classification using machine learning; and (3) feasibility tests for identifying weeds using emulations of drone and satellite multispectral imagery.

In Situ Data Collection
The collection of spectral signatures must be accurate and representative of the targets. The spectral measurements are highly influenced by the methodology of their capture, environmental conditions, equipment responses and calibration quality. The "Supervising Scientist Report 195-Standards for reflectance spectral measurement of temporal vegetation plots" by the Australian Government aimed to collate the literature regarding in situ spectral data collection methods in order to collect data for a national spectral database [26]. This study adopted the data collection method recommended by the Standards.
A Spectral Evolution PSR+3500 spectroradiometer [27] was utilized for the field measurements. This device has a spectral range of 350-2500 nm and output in 1 nm increments [27]. Reflectance profiles of the targets were collected in the summer of 2017 and 2018. The first field survey was conducted in January 2017 and measured the average reflectance of the plants using an optical lens at a field of view of 8 degrees at approximately 1 m above ground. Experimental controls to minimize the spectral variation included: performing reference calibration every ten minutes or after changing illumination condition (e.g., by scattered clouds); ensuring proper warm-up time of the spectroradiometer; only performing measurements at as close to noon as possible; and ensuring a suitable number of samples were collected. As a result, a total of 11 species were sampled, which are summarized in Table 1. Our preliminary results of the field data collected in 2017 were presented in [28].

Table 1. Summary of the 11 species sampled using optical lens in the first field survey conducted in January 2017. The total number of plants and total number of samples before and after the removal of outliers are listed.
Due to the high diversity of plants on the ground and the absence of dense weed mats for sampling, two further field surveys were conducted in early January 2018 and late January to early February 2018 to measure the spectral profiles of the plants directly using a leaf clip. A greater number of native species were sampled (30 species and 724 plants in total; Table 2). Each sample was the average of ten readings, and each plant was sampled three to four times. However, the targeted weeds were still difficult to find due to the extensive hawkweed eradication program performed by volunteers and national park staff.
Table 2. Summary of the 30 species sampled using leaf clip in the second field survey conducted in January and February 2018. The total number of plants and total number of samples before and after the removal of outliers are listed.
The site locations and representative ground photographs are shown in Figure 2. Examples of the targeted invasive and native species are shown in Figure 3. Our surveys were timed to coincide with the flowering season of hawkweeds in KNP and were carried out where access to the sites was possible.

Spectral Profile Analysis and Classification
Initially, the metadata and field notes were manually cross-referenced to the spectral profile samples to identify and remove those which were erroneous or unsuitable. These included accidental triggers and samples affected by variable weather conditions, e.g., wind and change of illumination due to scattered clouds (first survey only). The survey data collected by the optical lens in 2017 were trimmed to wavelengths between 400 and 1300 nm. This ensured that the noisy section of the profiles (>1300 nm) measured by the optical lens, as shown in Figure 4a, was excluded from the analyses, as it would otherwise affect the results [26,28].

Analysis of the spectral profiles was then undertaken through several workflows in R [29] using pre-existing and new scripts specifically developed for this project [30]. To remove the outliers, an automated outlier removal algorithm based on a depth measure was applied [31,32]. The spectral profiles of the samples before and after pre-processing are shown in Figure 4. In contrast, the profiles across the full spectral range that were collected using a leaf clip in the surveys in 2018 with the outliers removed were used for the classification.
Supervised classification algorithms are machine learning techniques that allow for a detailed analysis and comparison of spectral profiles. Random Forest (RF) is one of the most popular and accurate techniques for classifying large datasets [33][34][35]. The RF method works by generating a large collection of decision trees, each of which is constructed from a subset of the original data, sampled at random with replacement [34]. In this study, RF classification was performed using the caret package in R [36]. The pre-processed 2017 dataset was split at a ratio of 80:20 into training and testing datasets, as the spectral variation within the same species was wider when measured by the optical lens. As more species and samples were collected using a leaf clip in the surveys in 2018, this new dataset was split at a ratio of 75:25, which allowed a higher number of validation samples.
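The resampling step that underlies each tree can be illustrated with a short sketch. The Python below is not the R workflow used in this study; it only shows, under stated assumptions, what "sampled at random with replacement" means for a training set. The function name `bootstrap_sample`, the training-set size of 1000 and the random seed are all hypothetical.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns the in-bag indices (duplicates allowed) and the sorted list of
    out-of-bag indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


# Hypothetical training set of 1000 spectra; the seed is fixed for repeatability.
rng = random.Random(42)
n_training = 1000
in_bag, out_of_bag = bootstrap_sample(n_training, rng)

# Fraction of spectra left out of this particular bootstrap sample.
oob_fraction = len(out_of_bag) / n_training
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For a large training set, roughly one third of the samples are expected to fall outside any single bootstrap sample, which is why each tree in a forest sees a slightly different view of the data.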
One thousand trees were selected for the RF method, which provides a good balance between accuracy, processing time and memory usage [37]. Bootstrap resampling with 100 iterations was applied to the training data to estimate the mean classification error of the training data and build the best possible model. A variety of results were chosen to be output by the algorithm, including: confusion matrix and statistics; overall statistics; statistics by class (i.e., by species); and variable importance (key wavelengths for separability). This process provided the overall statistical analysis of the spectral profiles.

Multispectral Drone and Satellite Based Sensor Emulation
Whilst the spectral classification method provides significant insights into the separability of the species' spectral profiles across the hyperspectral wavelength range, we also investigated whether the key discriminating wavelengths fall within the ranges available in current multispectral cameras and sensors. To do this, a downsampling approach was applied to the data. For this paper, the Parrot Sequoia [38] was hypothesised as a potential sensor for detecting the invasive weed species. This was due to its affordability, small size and weight, compatibility with both fixed-wing and multi-rotor drones, and capacity for self-calibration, all of which improve its accessibility for field use [38].
In order to ascertain the usability of this sensor for discriminating the weed species, a new classification was performed. For this step, the processed scans were trimmed to the wavelengths that can be captured by the Parrot Sequoia. These wavelengths were: 550 ± 10 nm (green); 660 ± 10 nm (red); 735 ± 5 nm (red-edge); and 790 ± 10 nm (near-infrared). The individual wavelengths within each band were then binned and averaged across the band's range, simulating the data captured within each pixel band. These band means were then run through the RF classification with the same parameters as before. This provides an indicative insight into the potential ability of the Parrot Sequoia to detect the invasive species.
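The binning step described above can be sketched as follows. This is a minimal illustration in Python rather than the R scripts used in the study, and it assumes an equal-weight mean over each band window (a real sensor has a non-uniform spectral response). The band centres and half-widths are those quoted in the text; the function name `emulate_bands` and the synthetic linear profile are hypothetical.

```python
# Band centres and half-widths in nm, as quoted in the text.
SEQUOIA_BANDS = {
    "green": (550, 10),
    "red": (660, 10),
    "red_edge": (735, 5),
    "nir": (790, 10),
}


def emulate_bands(wavelengths, reflectance, bands=SEQUOIA_BANDS):
    """Average the reflectance samples that fall inside each band window."""
    emulated = {}
    for name, (centre, half_width) in bands.items():
        values = [
            r
            for w, r in zip(wavelengths, reflectance)
            if centre - half_width <= w <= centre + half_width
        ]
        if not values:
            raise ValueError(f"no samples fall inside band {name!r}")
        emulated[name] = sum(values) / len(values)
    return emulated


# Synthetic profile at 1 nm steps over 350-2500 nm (a linear ramp, not field data).
wavelengths = list(range(350, 2501))
reflectance = [0.001 * (w - 350) for w in wavelengths]

band_means = emulate_bands(wavelengths, reflectance)
for name, value in band_means.items():
    print(f"{name}: {value:.3f}")
```

Because the synthetic profile is linear and each window is symmetric, every band mean equals the reflectance at the band centre, which gives a quick sanity check on the binning. The same function could be given a different band table to emulate another sensor.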
To further evaluate weed discriminability, the Sentinel-2 multispectral satellite of the European Space Agency (ESA) Copernicus Program was selected for further spectral emulation. The Sentinel-2 multi-spectral instrument (MSI) measures the Earth's surface in 13 spectral bands over the visible and near-infrared (VNIR) and shortwave infrared (SWIR) spectra at spatial resolutions ranging from 10 to 60 m. The detailed spatial and spectral resolutions of Sentinel-2 MSI can be found at the European Space Agency's webpage (https://sentinel.esa.int/web/sentinel/missions/sentinel-2/instrument-payload/resolution-and-swath [accessed on 25 January 2021]).

Spectral Analysis and Classification Results
Over 39 invasive and native co-occurring plant species were sampled. This included both leaves and flowers where possible to give the highest possible chance of finding the differences in the spectral profiles of the plants.
The average spectral profiles collected using the spectroradiometer and lens are shown in Figure 5. It is visually evident that there are some distinct differences across the mean profiles.
The confusion matrix of the RF classification of the first survey data is presented in Table 3. The overall accuracy of the classification is 0.6968 with a Kappa value of 0.6651, representing a substantial strength of agreement. Table 3 shows the misclassifications between the Ox-eye daisy plants only and Ox-eye daisy flowers, and this is due to the sampling with lens. The other important output, "variable importance measures", of the Random Forest classification provided an indicative insight into the wavelengths where the discriminability of the weed species is maximized. The most useful top 20 wavelengths for spectral discrimination were found to include the bands: 400-420, 440-480, 510-550, 570-580, 640-690, 710-750 and 1300 nm, as shown in Figure 5b.
Table 3. Confusion matrix of the RF classification of the first survey data, listed by class (221 samples in total). "Row counts" gives the non-empty cells of each row in their original column order; the column positions of these cells, and the label and row of the first class, are not recoverable. UA (%) is the share of each row total that was correctly classified and PA (%) is the share of each column total that was correctly classified.

Class         Row counts        Correct  Row total  UA (%)  Column total  PA (%)
(label lost)  n/a               n/a      n/a        n/a     3             67
AGR           3 3 1             3        7          43      7             43
ASP           1 13 1 1 1        13       17         76      16            81
BOS           11 2 1 3          11       17         65      11            100
BSE           2 7 1 1           7        11         64      9             78
CAS           6 1 1 4           6        12         50      12            50
KAG           1 4 1 4           4        10         40      9             44
OHff          2                 2        2          100     3             67
OHfp          1 14 1            14       16         88      19            74
OHgf          1 12              12       13         92      12            100
OHgp          7 1               7        8          88      8             88
OXf           1 30 7 1          30       39         77      41            73
OXp           1 3 1 2 11 13     13       31         42      27            48
PDf           5 1               5        6          83      6             83
PDp           1 3               3        4          75      8             38
SGR           1 1 1 22          22       25         88      30            73
Total                                    221                221

For the profiles collected using the leaf clip, the average spectral profiles of the sampled species are shown in Figure 6. Their error matrix results showed an overall classification accuracy of 80% for the leaves only (Table 4) and 97% for the flowers (Table 5). The error matrix results show that when the flowers and leaves are available, the accuracy of the detection increases significantly since both profiles can be used to identify the plant and reduce the likelihood of misclassifications.
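The overall accuracy and Kappa statistics reported for the confusion matrices can be computed from any square confusion matrix as sketched below. This is an illustrative Python version, not the caret output used in the study. It assumes rows hold the predicted class and columns the reference class, and the three-class matrix is hypothetical rather than taken from the survey data.

```python
def accuracy_stats(matrix):
    """Overall accuracy, Cohen's kappa, and per-class user's and producer's accuracy.

    Assumes rows are predicted classes and columns are reference classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    observed = sum(diagonal) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    users = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    producers = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]
    return observed, kappa, users, producers


# Hypothetical three-class confusion matrix (illustrative values only).
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
overall, kappa, users, producers = accuracy_stats(example)
print(f"overall accuracy: {overall:.4f}, kappa: {kappa:.4f}")
```

Kappa discounts the agreement that would be expected by chance given the class totals, which is why it is lower than the overall accuracy for the same matrix.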

Multispectral Profile Emulation: Parrot Sequoia and Sentinel-2
The targeted weeds have only very sparse and patchy coverage (i.e., patches of a few to several plants) in the national park due to the successful long-term and ongoing weed control and mitigation program. Therefore, even the finest spatial resolution of the Sentinel-2 MSI sensor, at 10 m, is not fine enough to detect the weeds, as the spectral profile measured by a pixel is dominated by other land covers and plants. In order to focus on the separability of the weeds and other alpine plants based on their spectral characteristics, this section examines and presents the emulated spectral profiles for Parrot Sequoia and Sentinel-2 based on the spectral library derived from the field measurements.
The resampled profiles matched to the Sentinel-2 satellite sensor (13 bands) and Parrot Sequoia drone sensor (4 bands) showed overall classification accuracies of 69.7% and 56.8%, respectively (Tables 6 and 7), which are much lower than the accuracies achieved using the hyperspectral profiles. It is important to note that the emulated multispectral profiles assume the pixels are "pure"; that is, each pixel contains only one plant species. Therefore, the effect of spatial resolution is not considered in the emulation test. Both Tables 6 and 7 show that Orange Hawkweed is identified more accurately while it is flowering, even with the four bands of the Parrot Sequoia camera. The accuracy dropped significantly without the flowers.
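The "pure pixel" caveat can be made concrete with a small worked example. The sketch below assumes a simple linear (area-weighted) mixing model, which is an assumption introduced here for illustration and not a method used in this study. The function name `mixed_pixel`, the four-band reflectance values and the 1 m² patch size are all hypothetical.

```python
def mixed_pixel(weed, background, weed_fraction):
    """Area-weighted linear mixture of two reflectance profiles."""
    if not 0.0 <= weed_fraction <= 1.0:
        raise ValueError("weed_fraction must lie in [0, 1]")
    if len(weed) != len(background):
        raise ValueError("profiles must have the same number of bands")
    return [
        weed_fraction * w + (1.0 - weed_fraction) * b
        for w, b in zip(weed, background)
    ]


# Hypothetical four-band reflectances (green, red, red-edge, near-infrared).
weed_profile = [0.10, 0.30, 0.45, 0.50]
background_profile = [0.06, 0.05, 0.30, 0.45]

# Assumed 1 m^2 weed patch inside a 10 m x 10 m pixel.
fraction = 1.0 / (10.0 * 10.0)
pixel = mixed_pixel(weed_profile, background_profile, fraction)

# Largest change the patch causes in any band, relative to pure background.
max_shift = max(abs(p - b) for p, b in zip(pixel, background_profile))
print(f"weed fraction: {fraction:.2f}, largest band shift: {max_shift:.4f}")
```

Under these assumed values the patch changes the pixel reflectance by a quarter of a percentage point at most, which illustrates why a classifier trained on pure profiles is unlikely to transfer directly to 10 m pixels containing only a few plants.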

Discussion
Establishing the spectral library of vegetation species and determining their discrimination ability has provided significant insights into the potential use of remote sensing and machine learning to locate species of noxious weeds in Kosciuszko National Park in Australia. There are many notable differences and similarities between the native and invasive species, as well as across the individual species themselves, based on the measurements of reflectance using a spectroradiometer. Most notable in these results is the strong visual separation of the profiles of the weeds, Orange Hawkweed, Mouse-ear Hawkweed and Ox-eye daisy, from those of the native species. The profile of the alpine everlasting (Xerochrysum subundulatum) lies between the spectral profiles of the weeds. Ox-eye daisy appears the most different, especially between 400 and 650 nm, where it has a substantially higher reflectance than all other species, potentially a result of its white flowers. The dark red-orange flowers of Orange Hawkweed are also highly distinguishable.
This study of spectral profiling and classification of invasive and native species has proven to be useful as an assessment tool for determining the prospective use of remote sensing imagery capture and classification. The ability to ascertain potential classification accuracies prior to full-scale deployment increases the productivity and effectiveness of future efforts. Specifically, for Orange Hawkweed, Mouse-ear Hawkweed and Ox-eye daisy, the results of this analysis have provided substantial benefits to current and future eradication efforts. Obtaining the spectral profiles of the invasive species and their native cohabitants through field sampling and post-processing already allows for significant future work in this space.
Whilst the results of this study are positive, it is important to discuss the limitations of our study. Firstly, this study looked specifically into discriminability and separability through a statistical machine learning analysis of the plant species based on their spectral profiles alone. It is then necessary to assess this based on actual aerial hyperspectral imagery of known infestations and control samples. Further, the spatial and spectral resolutions of the sensor need to be maximised while the geometric distortion of the imaged features should be minimised. Another constraint of our study was acquiring enough samples of Orange Hawkweed and Mouse-ear Hawkweed. Due to the extreme demand for controlling the invasive species, most samples that were found had already been treated, were in isolated forms and not in patches, and were blended with the local native vegetation. As such, a lower diversity of individual plant examples was collected than was desired.

Conclusions
The aim of this study was to investigate the use of remote sensing to determine and discriminate the spectral profiles of invasive and native plant species. Specifically, for Orange Hawkweed, Mouse-ear Hawkweed and Ox-eye daisy in Kosciuszko National Park (KNP), this study determined that there is significant separability between these invasive species and their native co-habitants. The consequences of unchecked weed proliferation in KNP are serious, putting the park's cultural and heritage values and biodiversity at risk and causing substantial environmental and socioeconomic impacts. By utilising remote sensing in a multi-faceted approach that combines the emulation of multispectral bands and analysis with ground surveys, weed management in the park stands to benefit considerably. Overall spectral separability accuracies of 80% for hyperspectral profiles, 69.7% for emulated Sentinel-2 multispectral bands and 56.8% for the emulated Parrot Sequoia four-band camera were identified in this study. The classification accuracy for Orange Hawkweed increased to >95% when the flowers were present. A similar finding for Mouse-ear Hawkweed is also likely.
If established, Orange Hawkweed poses a significant threat, resulting in severe costs to the ecosystem and grazing industry. By conducting a focused analysis of the spectral detection abilities in KNP, this paper has provided insights into the application potential of this discipline, determining its specific use in relation to Orange Hawkweed, Mouse-ear Hawkweed and Ox-eye daisy compared to the Australian alpine natives. In conclusion, this paper finds that the use of spectral profiles to locate and eradicate weeds in KNP through determining and discriminating spectra is effective, and should be used as part of a multidisciplinary environmental management approach.
Funding: This project was funded through the NSW Adaptation Research Hub-Biodiversity Node which is supported by the NSW Office of Environment and Heritage. We gratefully acknowledge additional funding provided by NPWS and Macquarie University for the project.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.