Article

Weed Species Identification: Acquisition, Feature Analysis, and Evaluation of a Hyperspectral and RGB Dataset with Labeled Data

1 Laboratory for Multidimensional Analysis in Remote Sensing (MARS), Department of Mapping and Geoinformation Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel
2 Department of Plant Pathology and Weed Research, Agricultural Research Organization, Newe Ya’ar Research Center, Ramat-Yishai 30095, Israel
* Author to whom correspondence should be addressed.
Fadi Kizel is a Neubauer Assistant Professor.
Remote Sens. 2024, 16(15), 2808; https://doi.org/10.3390/rs16152808
Submission received: 8 May 2024 / Revised: 24 July 2024 / Accepted: 27 July 2024 / Published: 31 July 2024
(This article belongs to the Special Issue Remote Sensing Data Sets II)

Abstract
Site-specific weed management employs image data to generate maps through various methodologies that classify pixels corresponding to crop, soil, and weed. Further, many studies have focused on identifying specific weed species using spectral data. Nonetheless, the availability of open-access weed datasets remains limited. Remarkably, despite the extensive research employing hyperspectral imaging data to classify species under varying conditions, to the best of our knowledge, there are no open-access hyperspectral weed datasets. Consequently, accessible spectral weed datasets are primarily RGB or multispectral and mostly lack the temporal aspect, i.e., they contain a single measurement day. This paper introduces an open dataset for training and evaluating machine-learning methods and spectral features to classify weeds based on various biological traits. The dataset comprises 30 hyperspectral images, each containing thousands of pixels with 204 unique visible and near-infrared bands captured in a controlled environment. In addition, each scene includes a corresponding RGB image with a higher spatial resolution. We included three weed species in this dataset, representing different botanical groups and photosynthetic mechanisms. In addition, the dataset contains meticulously sampled labeled data for training and testing. The images represent a time series of the weeds’ growth through their early stages, which are critical for precise herbicide application. We conducted an experimental evaluation to test the performance of a machine-learning approach, a deep-learning approach, and Spectral Mixture Analysis (SMA) to identify the different weed traits. In addition, we analyzed the importance of features using the random forest algorithm and evaluated the performance of the selected algorithms while using different sets of features.

1. Introduction

Weeds within agricultural fields present a substantial challenge, often causing detrimental impacts on crop plants and significantly reducing yield quality and quantity [1]. While herbicides effectively control weeds, their application is associated with environmental pollution and risks to human health. To address this pressing issue, the emergence of the Site-Specific Weed Management (SSWM) approach has offered a pathway toward more sustainable weed control strategies. This approach facilitates the precise application of herbicides based on the specific location of the weed, its density, and its species composition [2]. SSWM utilizes image data, which is employed to generate weed maps through various methodologies capable of identifying pixels corresponding to crops, soil, and distinct weed species. However, the complexity of differentiating between various weed species due to their similarities presents a formidable challenge.
In this regard, the potential of spectral data in classifying weed species has been the subject of many studies [3]. Scenarios such as dense weed populations, overlapping leaves, early growth stages, and coarse spatial resolutions pose challenges in identifying weed species using texture and shape features and can benefit from spectral data. Spectral data can capture plant physiological trait variations, encompassing leaf anatomy and biochemistry. This functionality is exhibited through the Visible Range (VIS), which responds to variations in pigment content and photosynthetic activity; the Near-Infrared Range (NIR), which is sensitive to anatomical leaf traits; and the Shortwave Infrared Range (SWIR), indicative of water, sugars, and protein content within leaves [4,5].
Spectral data can handle scenarios where several species are densely mixed and pixel-wise classification is required. Therefore, exploring methods for analyzing spectral data is highly relevant for SSWM methodologies. Machine-learning techniques, including Convolutional Neural Network (CNN) approaches, have shown promising results in classifying weed species [6]. Using CNNs allows the development of non-linear and relatively more complex models than those of other machine-learning methods. As a result, classification can be applied to simple data such as RGB images.
On the other hand, the large number of features available in spectral data allows the use of simpler machine-learning models that are easier to train and require lower computational demands. However, since weed spectra exhibit variations across distinct growth stages [7] and under varying environmental conditions [8], expansive and diverse training datasets are needed to represent various scenarios. Producing labeled datasets that cover a wide range of imaging conditions is tedious and requires extensive collaboration and data sharing between research groups. Facilitating access to labeled datasets aims to support the development of machine-learning and deep-learning models in agriculture [9,10].
Unfortunately, most public weed datasets predominantly comprise only RGB images [6,11,12,13]. Some studies have acquired and published multispectral data for weed-related research. For example, there exists a dataset capturing nine weed species through UAV-mounted multispectral sensors in a sugar beet field [14] and a sugar beet/weed dataset from a controlled field experiment alongside pixel-wise labeled data [15]. However, the availability of open-access weed datasets remains limited. Furthermore, existing accessible spectral weed datasets are primarily multispectral and confined to single-day measurements. Remarkably, despite the extensive research employing hyperspectral imaging data to classify species under varying conditions [3,7,16], to the best of our knowledge, there are no open-access hyperspectral weed datasets. Therefore, recognizing the significance of hyperspectral data in studying unique weed species features, we present a dataset encompassing three common weed species during their early growth stages. In addition, our dataset includes corresponding RGB images of the scenes with a higher spatial resolution and accurately sampled labels.
This paper aims to provide a detailed description of the dataset’s construction and comprehensively evaluate the data. Consequently, we tested the performance of a machine-learning approach, a deep-learning approach, and an unmixing algorithm for identifying traits based on different features. Our results indicated that none of the examined classification approaches were significantly advantageous. However, we found that the classification of some species significantly depends on features from the VIS region, while others rely more on features from the NIR. Consequently, the classification of all species in the scene required using both spectral regions.

2. Materials and Methods

2.1. Dataset

2.1.1. Scene Construction and Data Acquisition

Our dataset comprises five scenes. Each scene was created by sowing weed seeds within a 34 cm × 54 cm sowing tray divided into cells of 2 cm × 2 cm. The weeds were grown in a greenhouse with daily irrigation. We recorded the five scenes over six days within two weeks following weed sowing, using a hyperspectral camera (Specim IQ, Oulu, Finland).
We acquired the images at 7, 8, 9, 12, 13, and 14 Days After Sowing (DAS), when the weeds started at stage 12 and reached stages 13–14 on the BBCH scale.
Some weed traits, such as the weed botanical group, are essential for making informed decisions about herbicide selection. Accordingly, three distinct weed species were carefully chosen for the dataset (Figure 1 and Figure 2b): Amaranthus retroflexus (Ar), Solanum nigrum (Sn), and Setaria adhaerens (Sa). These species were selected because they represent different botanical groups and photosynthetic pathways, as detailed in Figure 2a. This selection facilitates a comprehensive exploration of the spectral resemblances and disparities among the chosen weed species.
The image acquisition process involved maintaining a consistent camera distance of 1.5 m, with the scene illuminated by two halogen light spots (Figure 2c). The hyperspectral images encompass 512 × 512 pixels with a spatial resolution of 0.1 cm and 204 spectral bands within the visible and near-infrared range (400–1000 nm). In addition to the hyperspectral image, the Specim camera concurrently captured an RGB image of the scene with a size of 645 × 645 pixels.

2.1.2. Image Calibration

We placed a barium-sulfate calibration panel within each scene (see Figure 3). This panel approximates a lossless Lambertian surface; thus, we used it to calibrate the images and convert their units into reflectance values. Accordingly, after dark correction, the reflectance factor R is calculated by dividing the pixel’s radiance by the radiance of the white reference target, as follows:
$$R(\lambda) = \frac{L_{\mathrm{pixel}}(\lambda)}{L_{\mathrm{reference}}(\lambda)},$$
where $R(\lambda)$ is the reflectance factor at wavelength $\lambda$, and $L_{\mathrm{pixel}}(\lambda)$ and $L_{\mathrm{reference}}(\lambda)$ are the radiance of a pixel and of the white reference target at wavelength $\lambda$, respectively.
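For illustration, a minimal NumPy sketch of this calibration step is shown below; the array names and shapes are our assumptions, not part of the camera’s software:

```python
import numpy as np

def to_reflectance(raw, dark, white):
    """Dark-correct a raw cube and convert it to reflectance factors.

    raw   : (rows, cols, bands) raw scene cube
    dark  : (bands,) mean dark-frame spectrum
    white : (bands,) mean raw spectrum over the white-reference panel pixels
    """
    num = raw.astype(np.float64) - dark                       # L_pixel(lambda)
    den = np.maximum(white.astype(np.float64) - dark, 1e-12)  # L_reference(lambda)
    return np.clip(num / den, 0.0, None)                      # R(lambda) >= 0
```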

2.1.3. Data Labeling

The dataset includes labeled data for each image. We first sampled the labels on the RGB images (Figure 3b,c) and then transformed them into the corresponding spectral images (Figure 3d,e). The enhanced spatial resolution and quality of the RGB images afforded higher labeling accuracy than labeling the hyperspectral images directly.
The process involved manually assigning a corresponding label to each weed pixel within the images. Then, to label soil pixels, we calculated the Excess Green (ExG) index [20], and all non-weed pixels below an optimized threshold were assigned a soil label. Subsequently, we established a geometric transformation to align the RGB images with their corresponding hyperspectral counterparts. This alignment was facilitated by employing Oriented FAST and Rotated BRIEF (ORB) features [21]. Then, the pixel labels were adjusted using the transformation to accurately fit the hyperspectral image format (a minimal sketch of this alignment step appears at the end of this subsection). Finally, we generated three distinct labeled sets, each characterizing specific attributes of weeds and soil:
  • Species labels: featuring designations for Ar, Sn, Sa, and soil.
  • Botanical group labels: including categorizations for monocotyledons (Sa), dicotyledons (Ar, Sn), and soil.
  • Photosynthesis mechanisms labels: distinguishing between C3 weeds (Sn) and C4 weeds (Ar, Sa) and soil.
Figure 3 presents representative RGB and hyperspectral images from the dataset (Figure 3b,c), an example of sampled weed species and soil spectra (Figure 3a), and the corresponding labeled data (Figure 3d,e). Table 1 presents the number of labeled pixels available for each class in the dataset.
With the corresponding labels, our dataset is organized to analyze the weeds’ spectral characteristics based on different traits.
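The following OpenCV sketch illustrates the ORB-based label transfer. The homography model, the RANSAC estimation, and all function and variable names are our assumptions; the paper only states that ORB features were used to establish the geometric transformation:

```python
import cv2
import numpy as np

def transfer_labels(rgb_gray, hsi_gray, rgb_labels):
    """Warp labels sampled on the RGB image onto the hyperspectral grid.

    rgb_gray, hsi_gray : uint8 grayscale renderings of the two images
    rgb_labels         : uint8 label map aligned with the RGB image
    """
    orb = cv2.ORB_create(5000)
    k1, d1 = orb.detectAndCompute(rgb_gray, None)
    k2, d2 = orb.detectAndCompute(hsi_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Nearest-neighbor interpolation keeps the labels as discrete class IDs.
    return cv2.warpPerspective(rgb_labels, H, hsi_gray.shape[::-1],
                               flags=cv2.INTER_NEAREST)
```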

2.2. Experimental Analysis and Evaluation

We conducted comprehensive experiments based on key remote sensing applications to test and evaluate the dataset’s significance, advantages, and limitations. Our analysis included exploring different features for classification and assessing the performance of machine-learning and deep-learning approaches for classifying the various traits. Furthermore, we examined the performance of Spectral Mixture Analysis (SMA) for identification at the sub-pixel level.

2.2.1. Feature Selection

Random Forest

We utilized a random forest algorithm to determine the importance of various spectral features for classifying weed traits. The random forest model, consisting of 100 trees, was employed to enhance the robustness and accuracy of the classification process. Each tree in the forest was trained on a bootstrap sample of the data, with a subset of spectral features randomly selected at each split to ensure diversity among the trees. In particular, the algorithm was trained using 5% of the labeled data. Feature importance was then computed based on the mean decrease in impurity (Gini importance) across all trees, providing insights into which spectral features were most influential in distinguishing the different weed groups.
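The scikit-learn sketch below mirrors this setup (100 trees, bootstrap sampling, Gini importance); the synthetic stand-in data and variable names are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: rows are pixel spectra (204 bands), labels are trait classes.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 204))
y_train = rng.integers(0, 4, 1000)

# 100 trees, bootstrap samples, random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)

# Gini importance (mean decrease in impurity), one value per spectral band.
importance = rf.feature_importances_
print(importance.argsort()[::-1][:10])  # ten most informative bands
```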

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) was employed to further analyze the spectral features used to classify the different weed groups. PCA is a dimensionality reduction technique that transforms the original spectral data into a new set of uncorrelated variables, known as principal components, which capture the most significant variance in the data. By applying PCA, we reduced the complexity of the dataset while retaining the essential information required for classification.
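A minimal scikit-learn sketch, assuming the same kind of pixel-spectra matrix as above; the choice of ten components follows Section 3.2:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((1000, 204))                    # stand-in pixel spectra

pca = PCA(n_components=10)                     # ten PCs sufficed in our experiments
X_pc = pca.fit_transform(X)                    # (1000, 10) reduced features
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance explained
```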

2.2.2. Supervised Classification

Following feature selection, we compared the classification performance when using the selected features or a subset of the selected features to the classification achieved using the full spectra.

Support Vector Machine (SVM)

We performed an experimental analysis using the hyperspectral dataset and its corresponding labels to assess the Support Vector Machine (SVM) classifier. SVM is a supervised classification algorithm that finds separating hyperplanes in the hyperspectral feature space and assigns each pixel to a class. Previous works showed the SVM’s robustness for weed classification [22]. Accordingly, we used 5% of the labeled data to train the classifier. We trained an SVM model for each growth stage to obtain optimal classification results. To quantitatively analyze the classification results, we calculated the image’s overall accuracy for each growth stage.
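As a sketch, the scikit-learn pipeline below reflects the described protocol (5% of the labeled pixels for training, one model per growth stage); the RBF kernel, the feature standardization, and the synthetic stand-in data are our assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((20000, 204))   # stand-in pixels of one growth stage
y = rng.integers(0, 4, 20000)  # stand-in species labels (incl. soil)

idx = rng.permutation(len(X))  # random 5% / 95% train-test split
n_train = int(0.05 * len(X))
train, test = idx[:n_train], idx[n_train:]

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # kernel is an assumption
svm.fit(X[train], y[train])
print("OAA:", accuracy_score(y[test], svm.predict(X[test])))
```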

One-Dimensional Convolutional Neural Network

We designed and implemented a One-Dimensional Convolutional Neural Network (1D-CNN) with the following architecture (Figure 4): The network begins with a sequence input layer tailored to the number of input channels, followed by a convolutional layer with a kernel size of [1,15], 32 filters, and causal padding, which keeps the length of the vector unchanged. The convolutional layer is succeeded by a Rectified Linear Unit (ReLU) activation function to introduce non-linearity and a normalization layer to stabilize and accelerate the training process. A global average pooling layer reduces the feature map dimensions, thus preventing overfitting and reducing computational complexity. The network concludes with a fully connected layer that maps the extracted features to the desired number of classes and a SoftMax layer that outputs the class probabilities; the highest probability determines the pixel’s class. We used 5% of the labeled data to train the network, 5% for validation, and 90% for testing.
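A PyTorch sketch of the described architecture is given below. The framework, the use of batch normalization as the normalization layer, and the four-class default output are our assumptions; at training time the softmax is typically folded into the cross-entropy loss:

```python
import torch
import torch.nn as nn

class WeedCNN1D(nn.Module):
    """Sketch of the 1D-CNN described above (names and framework assumed)."""

    def __init__(self, n_bands=204, n_classes=4, k=15):
        super().__init__()
        self.pad = nn.ConstantPad1d((k - 1, 0), 0.0)  # causal padding keeps length
        self.conv = nn.Conv1d(1, 32, kernel_size=k)   # 32 filters, kernel of 15
        self.act = nn.ReLU()
        self.norm = nn.BatchNorm1d(32)                # normalization layer
        self.pool = nn.AdaptiveAvgPool1d(1)           # global average pooling
        self.fc = nn.Linear(32, n_classes)            # fully connected output

    def forward(self, x):                             # x: (batch, n_bands)
        x = x.unsqueeze(1)                            # -> (batch, 1, n_bands)
        x = self.norm(self.act(self.conv(self.pad(x))))
        x = self.pool(x).squeeze(-1)                  # -> (batch, 32)
        return self.fc(x)                             # logits; softmax at inference

logits = WeedCNN1D()(torch.randn(8, 204))  # batch of eight pixel spectra
probs = torch.softmax(logits, dim=1)       # per-class probabilities
```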

2.2.3. Spectral Unmixing

Spectral mixture analysis is a common method used to extract subpixel information from spectral data and can potentially improve attempts to upscale precision agriculture applications [23]. We assessed the Vectorized Projected Gradient Descent Unmixing (VPGDU) algorithm [24]. We extracted the endmembers (EMs) automatically using the N-FINDR and Vertex Component Analysis (VCA) algorithms [25]. Furthermore, we derived a supervised set of EMs for each growth stage. Specifically, we computed the mean spectra from 5% randomly selected pixels within each group for the different traits. We repeated this process eight times to mitigate bias, employing the mean spectra as the set of EMs. The results presented here are those of the supervised extraction, which yielded slightly better accuracy. To quantitatively evaluate the Fraction Map (FM), we compared the predicted coverage from the FM to the actual coverage in the labeled data. We initially partitioned each FM into 64 equal cells for coverage estimation, each measuring n × n pixels (n = 64). Subsequently, we calculated the coverage of each EM within every cell based on the fraction map as follows:
$$\mathrm{Coverage}_{\mathrm{EM}_i} = \frac{\sum_{r=1}^{n}\sum_{c=1}^{n} f_i(c,r)}{n^2},$$
where $f_i(c,r)$ is the fraction value of the $i$th EM at the cell’s pixel located at $(c,r)$. We also computed the actual fractions in every cell from the labeled data. Finally, we calculated the Mean Absolute Error (MAE) of coverage estimation for each EM in every growth stage as follows:
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| f_{gt_i}(c,r) - f_i(c,r) \right|}{n},$$
where $f_{gt_i}(c,r)$ is the actual fraction value of the $i$th EM at the cell’s pixel located at $(c,r)$.
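The per-cell coverage comparison can be sketched as follows; the function and variable names are our illustration for a single EM’s fraction map, not part of the VPGDU implementation:

```python
import numpy as np

def coverage_mae(fraction_map, gt_map, n=64):
    """MAE between predicted and actual per-cell coverage of one EM.

    fraction_map : (rows, cols) predicted fractions for a single EM
    gt_map       : (rows, cols) ground-truth fractions from the labels
    n            : cell side length in pixels (64 x 64-pixel cells here)
    """
    errors = []
    rows, cols = fraction_map.shape
    for r0 in range(0, rows - n + 1, n):
        for c0 in range(0, cols - n + 1, n):
            pred = fraction_map[r0:r0 + n, c0:c0 + n].sum() / n**2
            true = gt_map[r0:r0 + n, c0:c0 + n].sum() / n**2
            errors.append(abs(true - pred))
    return float(np.mean(errors))
```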

3. Results

3.1. Classification Based on Full Spectra

First, we performed a qualitative analysis by visually comparing each product to its corresponding labeled data. Figure 5 presents a segment of the classification map generated by each method compared to the labeled data. Notably, both classification methods successfully separated soil and weed pixels and, as Figure 5 shows, accurately classified all weed traits. The quantitative analysis revealed that the classification accuracies when using the full spectra ranged between 0.88 and 0.96 for all traits using SVM and between 0.90 and 0.97 using the 1D-CNN (Table 2).

3.2. Feature Selection and Reduction

Random forest is commonly used in spectral analysis to evaluate the importance of features [26]. Consequently, we used this model for each growth stage to assess the significance of different spectral bands. As Figure 6 shows, we observed relatively high importance of the VIS bands between 500 and 600 nm and the NIR bands between 700 and 800 nm for all traits. The importance of different bands within those regions varied between growth stages. Consequently, we selected a range of 35 bands in the VIS between 500 and 600 nm and 34 bands in the NIR between 700 and 800 nm. Then, we evaluated the performance of the different classification approaches using each range separately and combined.
We also examined the PCA method for feature reduction. PCA is a statistical dimensionality reduction method commonly applied to spectral data [27]. We evaluated the performance using the first ten principal components, which explain ~100% of the variability of the dataset.

3.3. Classification Based on Selected Features

Figure 7 shows a representative segment from the classification maps generated based on the different sets of features at 14 DAS. There was no significant difference between the SVM approach and the CNN approach; misclassified pixels were mainly found around the same areas in the image. The highest Overall Accuracies (OAA) were achieved using the full spectra, the combined VIS and NIR features, or the PCA features (Table 2).
However, the reduced classification performance when using a single region was not reflected in the OAA due to the relatively high number of soil pixels and the high level of separation that could be achieved between soil and weeds, regardless of the features used. The effect of selected features on the classification performance is reflected in the precision and recall values in Table 3. When we used a single region in the VIS or NIR, the recall and precision values were reduced significantly for the weed classes (Table 3).
Nonetheless, some separation could still be achieved when we used a single selected region. The VIS region allowed separation into two groups, Sa and Ar, in which Sn pixels were mainly classified as Sa. Conversely, the NIR region allowed separation into two groups, Sa and Sn, in which Ar pixels were primarily classified as Sa. This is reflected in Sn’s and Ar’s high precision (100%) and low recall (5%) for SVM classification in the VIS and NIR, respectively (Table 3). The 1D-CNN classification using the VIS region achieved a relatively similar result; however, precision values were very low for Sn, as fewer than 100 pixels (i.e., 0.001%) were classified in this class. While the 1D-CNN recall and precision values were lower for all weed classes when using a single region, the SVM classifier could still classify some classes correctly. The performance was significantly reduced when using the VIS alone for botanical group classification, with Sa and Sn mainly classified into the same group. Photosynthetic pathway classification results were negatively affected to a large extent when using a single region, with most pixels classified as the C4 class.

3.4. Spectral Mixture Analysis

A qualitative evaluation of the fraction maps showed spatial coherency between the fraction maps of the different EMs and their ground truth images (Figure 8). Table 4 presents the MAE values calculated for coverage estimation by spectral unmixing using different sets of features at 14 DAS. In most cases, MAE values increased slightly, by 1–4%, when we did not use the full spectra. However, we could not find a consistent reduction when using specific features across EMs. Table 5 presents the MAE values for all EMs at the different growth stages. We observed an increased MAE value as growth progressed (Table 5).

4. Discussion

This paper evaluated a hyperspectral dataset for weed species classification, using methods commonly applied in remote sensing and spectral analysis. As demonstrated in our study, this shared dataset offers an opportunity to examine the performance of different methods for weed classification during critical early growth stages, which are pivotal for effective weed detection and management. The chosen weed species possess diverse anatomical and physiological traits, including the weed botanical group, which is essential for site-specific herbicide selection. The comprehensive labeled data provide a substantial number of pixels for analysis, allowing examination of a method’s performance using random training sets of various sizes, as well as of the stability of its performance across growth stages for each selected trait.
During the acquisition process, the controlled environment minimized spectral variation that results from natural illumination and physiological differences between the weeds, thus focusing on the ability of the tested algorithms to separate unique species traits.
The weeds in the provided dataset hold different characteristics that were expected to be spectrally differentiable. Differences between weed botanical groups, such as the higher abundance of airspaces in the dicotyledons’ spongy mesophyll compared to monocotyledons, cause spectral variation between those groups [28]. Weeds of different photosynthetic pathways differ in their leaf anatomy: C4 weeds are characterized by their Kranz anatomy, and thus bundle sheath and mesophyll cell arrangements may cause variation in light scattering and absorption [29,30]. Spectral variation between species is expected to result from such traits and unique characteristics. Accordingly, our analysis confirmed that these traits are well represented in the hyperspectral data, as both simple linear models and more complex non-linear CNN models achieved satisfactory results for all weed traits.
Our feature analysis revealed that the important features lie mainly between 500 and 600 nm and between 700 and 800 nm, i.e., in the VIS and NIR regions. Similar conclusions were reported in previous vegetation research [31]. When we classified the images using only the specified regions and their combination, we found that combining both regions is essential for classifying all the different traits, although some of the variation separating one species from the others could be captured using a single region (VIS or NIR).
While hyperspectral data are relatively complex, they allow for accurate results using a linear classification approach. We expected to observe some advantage of the 1D-CNN over the SVM classifier when fewer features were used; however, this was not the case, and both methodologies produced very similar results. It is possible that a different structure of a higher-dimensional CNN, capturing spatial characteristics, would perform better with a limited number of spectral features. Consequently, the data can be used to design, train, and evaluate other CNN structures.
Feature reduction is used to reduce the dimensionality of the data and consequently reduce computational complexity in the classification process. Our experiment demonstrated that reducing the dimensionality to ten PCs is sufficient to achieve satisfactory classification results close to those achieved using the entire spectra.
Feature selection can contribute to the optimal choice of filters and sensors for specific tasks. Our analysis utilized a single feature selection method from many used in the literature. Different features are required for discriminating different combinations of plant species. Furthermore, selecting important features depends on the technique used and the data of interest [16]. In this regard, data sharing is essential to better understanding the variation between datasets and finding general patterns preserved across different datasets, environments, growth stages, and acquisition conditions. Consequently, this dataset can evaluate and compare different feature selection methods.
Acknowledging the significance of SMA methods in analyzing spectral data, we demonstrated that SMA could detect the different trait fractions with an error rate mostly lower than 10%. Our previous work [23] provides a comprehensive analysis of spectral mixture analysis using the proposed dataset, including the simulation of different spectral and spatial resolutions.
As mentioned, we include an RGB image of each scene with a higher spatial resolution and corresponding labeled data to complement the dataset. Although we used only the hyperspectral images for the analysis, the combination of spectral and RGB data allows users to explore different spectral and spatial features from both hyperspectral and RGB images and, in turn, to study the benefits of feature fusion or other data-fusion approaches for weed species classification [32,33].
While this dataset is limited to three common weed species, future datasets should include species from different environments and species representative of various traits. A comprehensive analysis of different datasets will allow better exploration of the main factors driving spectral variation within and between species.

5. Conclusions

The results of our analysis in this paper demonstrated how researchers can use the proposed dataset to evaluate the performance of different methodologies in classifying weed species and their unique traits. Notably, a nonlinear classification approach using a 1D-CNN did not provide an advantage in this case over a simpler algorithm such as SVM. On the other hand, feature analysis revealed the importance of spectral bands within the VIS (500–600 nm) and NIR (700–800 nm) regions out of the full 204-band spectrum (400–1000 nm). In this regard, the results showed that each of the VIS and NIR feature sets separately allows the distinction of certain species. For example, the VIS bands allowed the accurate separation of Ar from the other species, while the NIR bands were advantageous for separating Sn from the rest. Finally, we provided a detailed explanation of the construction of the dataset together with suggestions for further uses and discussed the importance of hyperspectral data sharing for SSWM.

6. Data Construction

The dataset includes data from 30 scenes: each of the six measurement days has five scenes captured from different sowing plates. We provide a raw image, a dark frame, a white reference, metadata, and reflectance data for each plate’s measurement. Figure 9 illustrates the data structure. At the root of the dataset are six folders corresponding to the measurement days. Within each folder, the five plates are organized in individual folders containing a set of files and subfolders as follows (a minimal loading sketch appears after this list):
  • A PNG file with an RGB composite image derived from the visible bands in the spectral image.
  • A PNG file with the image acquired by the RGB sensor.
  • The Capture folder contains the raw data, dark frame, and white reference data cubes.
  • The Results folder contains the “.dat” and “.hdr” files of the scene’s reflectance data cube.
  • The Label folder contains two sub-folders: the “RGB” folder contains the original labels sampled on the RGB image and the “Hyperspectral” folder contains the transformed labels for the hyperspectral images. Both folders include three image files, with a label assigned for each pixel according to its species, botanical group, and photosynthetic mechanism.
  • The root folder includes a README file providing information about the numerical label for each class in the Label files.
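As an illustration, the ENVI-format reflectance cubes can be read with the open-source `spectral` package; the folder and file names below are placeholders that should be replaced with the actual paths from Figure 9 and the README:

```python
import numpy as np
import spectral.io.envi as envi
from PIL import Image

# Placeholder paths; substitute the actual day/plate folder and file names.
refl = envi.open("day_12/plate_3/Results/REFLECTANCE.hdr",
                 "day_12/plate_3/Results/REFLECTANCE.dat")
cube = np.asarray(refl.load())        # (512, 512, 204) reflectance cube

labels = np.array(Image.open("day_12/plate_3/Label/Hyperspectral/species.png"))

X = cube.reshape(-1, cube.shape[-1])  # one 204-band spectrum per pixel
y = labels.reshape(-1)                # numeric class IDs per the README
```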

Author Contributions

Conceptualization, I.R. and F.K.; methodology, I.R.; software, I.R.; validation, I.R., F.K. and R.N.L.; formal analysis, I.R.; investigation, I.R.; resources, F.K. and R.N.L.; data curation, I.R.; writing—original draft preparation, I.R.; writing—review and editing, I.R., F.K. and R.N.L.; visualization, I.R.; supervision, F.K. and R.N.L.; project administration, I.R.; funding acquisition, I.R., F.K. and R.N.L. All authors have read and agreed to the published version of the manuscript.

Funding

The Israeli Council for Higher Education (CHE)’s planning and budgeting committee (PBC) partially supported this work.

Data Availability Statement

The original data presented in the study are openly available in the following link: Kizel, Fadi; Ronay, Inbal (2024), “Weed Species Identification: A Hyperspectral and RGB Dataset with Labeled Data”, Mendeley Data, V1, https://data.mendeley.com/datasets/6wm4kzf9y6/1 (accessed on 26 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pimentel, D.; Zuniga, R.; Morrison, D. Update on the Environmental and Economic Costs Associated with Alien-Invasive Species in the United States. Ecol. Econ. 2005, 52, 273–288.
  2. Lati, R.N.; Rasmussen, J.; Andujar, D.; Dorado, J.; Berge, T.W.; Wellhausen, C.; Pflanz, M.; Nordmeyer, H.; Schirrmann, M.; Eizenberg, H.; et al. Site-specific Weed Management—Constraints and Opportunities for the Weed Research Community: Insights from a Workshop. Weed Res. 2021, 61, 147–153.
  3. Li, Y.; Al-Sarayreh, M.; Irie, K.; Hackell, D.; Bourdot, G.; Reis, M.M.; Ghamkhar, K. Identification of Weeds Based on Hyperspectral Imaging and Machine Learning. Front. Plant Sci. 2021, 11, 611622.
  4. Buitrago, M.F.; Groen, T.A.; Hecker, C.A.; Skidmore, A.K. Spectroscopic Determination of Leaf Traits Using Infrared Spectra. Int. J. Appl. Earth Obs. Geoinf. 2018, 69, 237–250.
  5. Ronay, I.; Ephrath, J.E.; Eizenberg, H.; Blumberg, D.G.; Maman, S. Hyperspectral Reflectance and Indices for Characterizing the Dynamics of Crop–Weed Competition for Water. Remote Sens. 2021, 13, 513.
  6. Hasan, A.S.M.M.; Sohel, F.; Diepeveen, D.; Laga, H.; Jones, M.G.K. A Survey of Deep Learning Techniques for Weed Detection from Images. Comput. Electron. Agric. 2021, 184, 106067.
  7. Basinger, N.T.; Jennings, K.M.; Hestir, E.L.; Monks, D.W.; Jordan, D.L.; Everman, W.J. Phenology Affects Differentiation of Crop and Weed Species Using Hyperspectral Remote Sensing. Weed Technol. 2020, 34, 897–908.
  8. Zhang, Y.; Slaughter, D.C. Hyperspectral Species Mapping for Automatic Weed Control in Tomato under Thermal Environmental Stress. Comput. Electron. Agric. 2011, 77, 95–104.
  9. Persello, C.; Grift, J.; Fan, X.; Paris, C.; Hänsch, R.; Koeva, M.; Nelson, A. AI4SmallFarms: A Dataset for Crop Field Delineation in Southeast Asian Smallholder Farms. IEEE Geosci. Remote Sens. Lett. 2023, 20, 2505705.
  10. Nascimento, E.; Just, J.; Almeida, J.; Almeida, T. Productive Crop Field Detection: A New Dataset and Deep-Learning Benchmark Results. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5002005.
  11. Krestenitis, M.; Raptis, E.K.; Kapoutsis, A.C.; Ioannidis, K.; Kosmatopoulos, E.B.; Vrochidis, S.; Kompatsiaris, I. CoFly-WeedDB: A UAV Image Dataset for Weed Detection and Species Identification. Data Brief 2022, 45, 108575.
  12. Olsen, A.; Konovalov, D.A.; Philippa, B.; Ridd, P.; Wood, J.C.; Johns, J.; Banks, W.; Girgenti, B.; Kenny, O.; Whinney, J.; et al. DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning. Sci. Rep. 2019, 9, 2058.
  13. Sudars, K.; Jasko, J.; Namatevs, I.; Ozola, L.; Badaukis, N. Dataset of Annotated Food Crops and Weed Images for Robotic Computer Vision Control. Data Brief 2020, 31, 105833.
  14. Sa, I.; Chen, Z.; Popovic, M.; Khanna, R.; Liebisch, F.; Nieto, J.; Siegwart, R. WeedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming. IEEE Robot. Autom. Lett. 2018, 3, 588–595.
  15. Sa, I.; Popović, M.; Khanna, R.; Chen, Z.; Lottes, P.; Liebisch, F.; Nieto, J.; Stachniss, C.; Walter, A.; Siegwart, R. WeedMap: A Large-Scale Semantic Weed Mapping Framework Using Aerial Multispectral Imaging and Deep Neural Network for Precision Farming. Remote Sens. 2018, 10, 1423.
  16. Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sens. 2020, 12, 113.
  17. Gold, S. (n.d.). Setaria adhaerens [Photograph]. Wild Flowers. Available online: https://www.wildflowers.co.il/images/merged/1374-l.jpg?Setaria%20adhaerens (accessed on 24 July 2024).
  18. Gold, S. (n.d.). Solanum nigrum [Photograph]. Wild Flowers. Available online: https://www.wildflowers.co.il/images/merged/190-l-1.jpg?Solanum%20nigrum (accessed on 24 July 2024).
  19. Livne, E. (n.d.). Amaranthus retroflexus [Photograph]. Wild Flowers. Available online: https://www.wildflowers.co.il/images/merged/510-l.jpg?Amaranthus%20retroflexus (accessed on 24 July 2024).
  20. Meyer, G.E.; Neto, J.C. Verification of Color Vegetation Indices for Automated Crop Imaging Applications. Comput. Electron. Agric. 2008, 63, 282–293.
  21. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
  22. Wang, A.; Zhang, W.; Wei, X. A Review on Weed Detection Using Ground-Based Machine Vision and Image Processing Techniques. Comput. Electron. Agric. 2019, 158, 226–240.
  23. Ronay, I.; Nisim Lati, R.; Kizel, F. Spectral Mixture Analysis for Weed Traits Identification under Varying Resolutions and Growth Stages. Comput. Electron. Agric. 2024, 220, 108859.
  24. Kizel, F.; Shoshany, M.; Netanyahu, N.S.; Even-Tzur, G.; Benediktsson, J.A. A Stepwise Analytical Projected Gradient Descent Search for Hyperspectral Unmixing and Its Code Vectorization. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4925–4943.
  25. Luo, B.; Yang, C.; Chanussot, J.; Zhang, L. Crop Yield Estimation Based on Unsupervised Linear Unmixing of Multidate Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2013, 51, 162–173.
  26. Sapkota, B.; Singh, V.; Cope, D.; Valasek, J.; Bagavathiannan, M. Mapping and Estimating Weeds in Cotton Using Unmanned Aerial Systems-Borne Imagery. AgriEngineering 2020, 2, 350–366.
  27. Machidon, A.L.; Del Frate, F.; Picchiani, M.; Machidon, O.M.; Ogrutan, P.L. Geometrical Approximated Principal Component Analysis for Hyperspectral Image Analysis. Remote Sens. 2020, 12, 1698.
  28. Gausman, H.W. Plant Leaf Optical Properties in Visible and Near-Infrared Light; International Center for Arid and Semiarid Land Studies (ICASALS): Lubbock, TX, USA, 1985.
  29. Liu, L.; Cheng, Z. Mapping C3 and C4 Plant Functional Types Using Separated Solar-Induced Chlorophyll Fluorescence from Hyperspectral Data. Int. J. Remote Sens. 2011, 32, 9171–9183.
  30. Adjorlolo, C.; Mutanga, O.; Cho, M.A.; Ismail, R. Spectral Resampling Based on User-Defined Inter-Band Correlation Filter: C3 and C4 Grass Species Classification. Int. J. Appl. Earth Obs. Geoinf. 2013, 21, 535–544.
  31. Chang, G.J.; Oh, Y.; Goldshleger, N.; Shoshany, M. Biomass Estimation of Crops and Natural Shrubs by Combining Red-Edge Ratio with Normalized Difference Vegetation Index. J. Appl. Remote Sens. 2022, 16, 014501.
  32. Kizel, F. Resolution Enhancement of Unsupervised Classification Maps Through Data Fusion of Spectral and Visible Images from Different Sensing Instruments. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021.
  33. Kizel, F.; Benediktsson, J.A. Spatially Enhanced Spectral Unmixing Through Data Fusion of Spectral and Visible Images from Different Sensors. Remote Sens. 2020, 12, 1255.
Figure 1. The weed species in the dataset: (a) Setaria adhaerens (Sarah Gold, n.d. [17]) (Sa), (b) Solanum nigrum (Sarah Gold, n.d. [18]) (Sn), and (c) Amaranthus retroflexus (Eli Livne, n.d. [19]) (Ar).
Figure 2. (a) The weed species included in the dataset, their photosynthetic pathways, and their classification into botanical groups. (b) The weed species noted on the RGB composite of the hyperspectral images: Amaranthus retroflexus (Ar), Solanum nigrum (Sn), and Setaria adhaerens (Sa). (c) The experimental setup. (d) The Specim camera.
Figure 3. (a) The spectral signatures from the marked selected pixels in (b). (b,c) Examples of an RGB image and an RGB composite of the hyperspectral image, respectively. (d) The original labels sampled on the RGB image, corresponding to the area in the red box in (b). (e) The labels transformed onto the hyperspectral image, corresponding to the area in the red box in (c).
Figure 4. A diagram of the 1D-CNN used for classification. Each pixel from the hyperspectral cube of m × n pixels with b bands is given as the input to the network. After propagation through the network, the output is a probability for each of the c classes. The input size of each layer is noted above the layer in the diagram.
Figure 5. A representative segment of the SVM classification maps at 14 DAS compared to the labeled data for the different weed traits. The black color indicates unclassified pixels.
Figure 6. Spectral feature importance as obtained from the random forest classification for (a) Species, (b) Botanical groups, and (c) Photosynthetic pathways.
Figure 7. Visualization of the classification results of the images taken at 14 DAS as predicted by SVM and the 1D-CNN using the full spectra, the VIS and NIR features separated and combined, and the PCA features: (a) Species, (b) Botanical groups, (c) Photosynthetic pathways.
Figure 8. A representative part of the unmixing abundance maps at 14 DAS compared to the labeled data for each of the EMs of the different weed traits.
Figure 9. File tree illustrating the dataset structure.
Table 1. Number of pixels for each class in the dataset.

DAS | Sa | Ar | Sn | Monocots | Dicots | C4 | C3 | Soil
7 | 21,218 | 13,290 | 6321 | 21,218 | 19,611 | 34,508 | 6321 | 372,700
8 | 38,351 | 24,578 | 14,730 | 38,351 | 39,308 | 62,929 | 14,730 | 267,935
9 | 45,008 | 30,893 | 18,686 | 45,008 | 49,579 | 75,901 | 18,686 | 374,651
12 | 67,132 | 44,083 | 31,935 | 67,132 | 76,018 | 111,215 | 31,935 | 386,183
13 | 79,511 | 56,523 | 44,939 | 79,511 | 101,462 | 136,034 | 44,939 | 342,641
14 | 87,074 | 57,529 | 50,612 | 87,074 | 108,141 | 144,603 | 50,612 | 323,484
Table 2. Overall Classification Accuracies (OAA) achieved using the different sets of features with SVM and the 1D-CNN.

Trait | DAS | SVM Full Spectra | SVM VIS | SVM NIR | SVM VIS + NIR | SVM PCA | 1D-CNN Full Spectra | 1D-CNN VIS | 1D-CNN NIR | 1D-CNN VIS + NIR | 1D-CNN PCA
Species | 7 | 0.96 | 0.96 | 0.94 | 0.97 | 0.97 | 0.96 | 0.96 | 0.94 | 0.96 | 0.97
Species | 8 | 0.93 | 0.89 | 0.85 | 0.92 | 0.92 | 0.92 | 0.89 | 0.85 | 0.92 | 0.93
Species | 9 | 0.93 | 0.91 | 0.87 | 0.94 | 0.94 | 0.93 | 0.91 | 0.87 | 0.94 | 0.95
Species | 12 | 0.89 | 0.83 | 0.82 | 0.90 | 0.90 | 0.92 | 0.85 | 0.83 | 0.92 | 0.92
Species | 13 | 0.88 | 0.83 | 0.78 | 0.91 | 0.91 | 0.92 | 0.84 | 0.79 | 0.91 | 0.91
Species | 14 | 0.88 | 0.80 | 0.78 | 0.90 | 0.90 | 0.90 | 0.81 | 0.80 | 0.90 | 0.91
Botanical groups | 7 | 0.96 | 0.96 | 0.95 | 0.97 | 0.97 | 0.97 | 0.96 | 0.94 | 0.97 | 0.98
Botanical groups | 8 | 0.94 | 0.89 | 0.87 | 0.93 | 0.93 | 0.93 | 0.90 | 0.87 | 0.93 | 0.94
Botanical groups | 9 | 0.93 | 0.91 | 0.89 | 0.95 | 0.95 | 0.95 | 0.91 | 0.89 | 0.95 | 0.95
Botanical groups | 12 | 0.90 | 0.85 | 0.85 | 0.92 | 0.92 | 0.93 | 0.86 | 0.85 | 0.93 | 0.93
Botanical groups | 13 | 0.89 | 0.85 | 0.84 | 0.92 | 0.92 | 0.92 | 0.85 | 0.84 | 0.92 | 0.92
Botanical groups | 14 | 0.89 | 0.82 | 0.83 | 0.91 | 0.92 | 0.92 | 0.82 | 0.83 | 0.91 | 0.92
Photosynthetic pathway | 7 | 0.96 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 | 0.97
Photosynthetic pathway | 8 | 0.93 | 0.90 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.92 | 0.94
Photosynthetic pathway | 9 | 0.93 | 0.92 | 0.93 | 0.93 | 0.93 | 0.93 | 0.92 | 0.93 | 0.93 | 0.95
Photosynthetic pathway | 12 | 0.89 | 0.87 | 0.89 | 0.90 | 0.89 | 0.92 | 0.87 | 0.89 | 0.92 | 0.93
Photosynthetic pathway | 13 | 0.90 | 0.87 | 0.87 | 0.91 | 0.92 | 0.93 | 0.87 | 0.87 | 0.92 | 0.93
Photosynthetic pathway | 14 | 0.89 | 0.84 | 0.86 | 0.90 | 0.91 | 0.92 | 0.85 | 0.87 | 0.91 | 0.92
Table 3. Precision (P) and recall (R) values achieved using the different sets of features with the SVM and the 1D-CNN.

Trait | Method | Class | P Full Spectra | P VIS | P NIR | P VIS + NIR | P PCA | R Full Spectra | R VIS | R NIR | R VIS + NIR | R PCA
Species | CNN | Sa | 85.5 | 54 | 51.8 | 81.8 | 85.5 | 82.7 | 83.4 | 70.8 | 86.4 | 85.1
Species | CNN | Ar | 83.9 | 69.7 | 51.9 | 80.3 | 79 | 71.4 | 65.8 | 21.1 | 76.1 | 81.2
Species | CNN | Sn | 87.8 | 26.7 | 61.4 | 83 | 87.8 | 74.4 | 0 | 48.1 | 73.9 | 72.4
Species | CNN | Soil | 92.5 | 94.1 | 93.4 | 95 | 94.7 | 97.9 | 95.9 | 97.3 | 96.1 | 97
Species | SVM | Sa | 89.3 | 53.4 | 48 | 84 | 83.5 | 79 | 86.1 | 84.2 | 84.9 | 84.7
Species | SVM | Ar | 83.8 | 76 | 100 | 81.3 | 81.4 | 74.9 | 52.9 | 5 | 76.9 | 78.6
Species | SVM | Sn | 91.6 | 100 | 70.6 | 88.5 | 88.5 | 69.2 | 5 | 44 | 69.3 | 69.3
Species | SVM | Soil | 90.6 | 93.2 | 94.6 | 93.7 | 94.1 | 98.6 | 96.7 | 96.9 | 97.5 | 97.5
Botanical groups | CNN | Mono | 86.3 | 60.6 | 62.6 | 83.3 | 86.1 | 83.8 | 62.7 | 33.4 | 84 | 84.1
Botanical groups | CNN | Di | 88.9 | 67.9 | 60.9 | 87.3 | 88.9 | 85.1 | 56.3 | 80.5 | 83.8 | 85.3
Botanical groups | CNN | Soil | 94.7 | 93.1 | 94.7 | 94.9 | 95 | 96.8 | 96.4 | 96.4 | 96 | 96.9
Botanical groups | SVM | Mono | 90.2 | 58.4 | 64.4 | 85.4 | 86.2 | 78.5 | 63.1 | 49.8 | 83.8 | 82.9
Botanical groups | SVM | Di | 91.6 | 66.1 | 68.1 | 89.3 | 89.6 | 81.3 | 57.6 | 73.3 | 83.1 | 84.1
Botanical groups | SVM | Soil | 91.7 | 93.6 | 94 | 94.4 | 94.5 | 98.4 | 96.8 | 97.3 | 97 | 97.4
Photosynthetic pathways | CNN | C4 | 87.2 | 67.4 | 73 | 84.5 | 85.9 | 88.9 | 90.4 | 87.2 | 89.3 | 88.6
Photosynthetic pathways | CNN | C3 | 89.3 | 100 | 67.9 | 85.4 | 85.7 | 74.3 | 0 | 20.6 | 69.6 | 73.8
Photosynthetic pathways | CNN | Soil | 95 | 91.8 | 94.5 | 95.4 | 95.5 | 96.7 | 95.2 | 96.6 | 95.8 | 96.3
Photosynthetic pathways | SVM | C4 | 89.5 | 68.9 | 72.4 | 81.7 | 83 | 84.5 | 89.8 | 90.7 | 89.6 | 89.5
Photosynthetic pathways | SVM | C3 | 92.9 | 100 | 95.6 | 92.4 | 92.8 | 67.6 | 5 | 14.6 | 53.8 | 59.4
Photosynthetic pathways | SVM | Soil | 92.1 | 91.5 | 94.9 | 94.8 | 95 | 98.4 | 95.8 | 96.7 | 96.9 | 97
Table 4. MAE of coverage estimation by unmixing using different features for all EMs at 14 DAS.

Trait | EM | Full Spectra | VIS | NIR | VIS + NIR | PCA
Species | Sa | 10.92 | 8.60 | 9.45 | 7.71 | 10.91
Species | Ar | 6.76 | 5.86 | 6.85 | 6.35 | 10.74
Species | Sn | 7.82 | 8.71 | 5.64 | 4.58 | 6.34
Species | Soil | 4.58 | 4.50 | 6.35 | 8.17 | 6.06
Botanical groups | Mono | 6.75 | 7.83 | 9.83 | 7.46 | 9.43
Botanical groups | Dicot | 6.81 | 11.84 | 9.95 | 7.25 | 8.78
Botanical groups | Soil | 3.96 | 6.15 | 7.71 | 10.43 | 6.02
Photosynthetic pathway | C4 | 12.10 | 17.89 | 9.61 | 11.17 | 10.40
Photosynthetic pathway | C3 | 9.52 | 13.16 | 5.71 | 5.54 | 6.98
Photosynthetic pathway | Soil | 4.52 | 6.32 | 6.29 | 9.45 | 6.08
Table 5. MAE of coverage estimation by unmixing for all EMs at different growth stages.

DAS | Sa | Ar | Sn | Soil | Monocots | Dicots | Soil | C4 | C3 | Soil
7 | 2.01 | 2.30 | 3.76 | 3.70 | 2.09 | 2.98 | 3.09 | 3.49 | 4.59 | 3.82
8 | 5.19 | 3.30 | 7.13 | 4.52 | 3.33 | 4.59 | 4.37 | 7.60 | 8.38 | 4.56
9 | 3.90 | 2.93 | 5.50 | 4.12 | 3.54 | 3.61 | 3.66 | 6.60 | 6.55 | 3.73
12 | 8.79 | 5.57 | 7.83 | 4.38 | 6.57 | 6.48 | 4.35 | 10.51 | 10.47 | 4.67
13 | 12.33 | 7.16 | 9.23 | 7.15 | 8.89 | 7.69 | 6.81 | 14.64 | 10.33 | 7.07
14 | 10.92 | 6.76 | 7.82 | 4.58 | 6.75 | 6.81 | 3.96 | 12.10 | 9.52 | 4.52
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
