Site-specific fertilization (SSF), proposed within the framework of precision agriculture [1], aims to apply fertilizer accurately according to the spatial variability of crop and soil. This is of great significance for improving economic and environmental outcomes by reducing inputs and environmental pollution [2]. Maize is a major crop grown with wide row spacing. At the seedling stage, the canopy cover of maize is limited, there are gaps between the plants, and weeds can grow within the crop rows. When top-dressing is conducted with blanket spraying systems, areas of soil and weeds, where no fertilizer input is needed, are also covered, causing unnecessary input and waste. Target fertilization (TF) is proposed as an enhanced SSF strategy to meet these requirements during the seedling period. Specifically, target spraying systems can automatically apply fertilizer only in areas where maize plants (targets) are located, thus avoiding the waste that blanket spraying incurs over weed and soil background areas (non-targets). High target detection performance, which relies on the accurate and rapid identification and localization of maize plants against a background of weeds and soil, is the critical prerequisite for TF because it provides the information needed for decision making in subsequent TF operations [5]. However, accurate detection of target plants is challenging in practice due to complex conditions such as changing light, vehicle vibration, and mixed weeds. Therefore, this study proposes a reliable method for maize plant detection, providing support for practical TF application.
Image processing technology is a promising approach for target detection tasks in agriculture [7], and the relevant methods can be summarized into two categories: threshold-based and learning-based methods [10]. The principle of threshold-based methods is to group pixels into different categories by comparing the value of each pixel in a grayscale image with one or more preset threshold values. The grayscale images are generated by transforming the original images so as to accentuate regions of interest (ROIs) and attenuate undesired regions. The threshold values can be determined by various thresholding methods [12].
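As a minimal illustration of this principle (the pixel values and the fixed threshold below are invented, not data from this study), grouping pixels reduces to a single per-pixel comparison:

```python
import numpy as np

# Toy grayscale image (made-up values). Pixels above a preset threshold
# are assigned to the region of interest (1); the rest to background (0).
gray = np.array([[12, 200, 45],
                 [180, 30, 220],
                 [60, 90, 150]], dtype=np.uint8)

threshold = 100                          # a fixed, preset threshold value
binary = (gray > threshold).astype(np.uint8)
print(binary)                            # → [[0 1 0] [1 0 1] [0 0 1]]
```

A fixed threshold like this works only under stable conditions; the text below motivates adaptive alternatives such as Otsu's method.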
One widely used transformation is to calculate color indices from the original RGB (red, green, and blue) values. For example, the excess green (ExG) index provides a clear contrast between plant objects and soil background and has performed well in vegetation segmentation [14]. Based on the cone composition of the human retina (4% blue, 32% green, and 64% red cones), the excess red (ExR) index was introduced to separate leaf regions from the background [15]. Furthermore, the ExG minus ExR (ExGR) index was defined by subtracting ExR from ExG to combine their respective advantages [17]. Many other color indices, such as the normalized green-red difference index (NGRDI) [18] and modified ExG, have also been introduced [19]. Additional color features can be obtained from the HSV (hue, saturation, and value) color space [20], which can be converted from the RGB color space. The determination of threshold values is the critical element of threshold-based methods. Although a fixed threshold can be set on the basis of empirical knowledge, such a value is only suitable for dedicated situations and cannot cope with lighting variations [21]. Dynamically adjustable threshold values are required for complex field environments. Otsu's method is a leading automatic thresholding approach, whose criterion for determining a threshold is the minimization of within-class variance, equivalently the maximization of between-class variance [22].
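The transformation-plus-thresholding pipeline can be sketched as follows. The index formulas are the commonly cited definitions (chromatic coordinates r, g, b, then ExG = 2g - r - b and ExR = 1.4r - g), and the Otsu step is a generic histogram implementation; the tiny image is invented for illustration and is not from this study:

```python
import numpy as np

def color_indices(rgb):
    """ExG, ExR, ExGR from an RGB image, using the widely cited definitions:
    chromatic coordinates r, g, b, then ExG = 2g - r - b, ExR = 1.4r - g."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0                      # avoid division by zero
    r, g, b = np.moveaxis(rgb / total, -1, 0)
    exg = 2.0 * g - r - b
    exr = 1.4 * r - g
    return exg, exr, exg - exr                   # ExG, ExR, ExGR

def otsu_threshold(gray, bins=256):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, edges = np.histogram(gray, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                            # class-0 probability
    mu = np.cumsum(p * centers)                  # cumulative mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]                          # class 0 is `value <= t`

# Tiny synthetic image: two green "plant" pixels against brownish "soil".
img = np.array([[[60, 160, 40], [120, 90, 70]],
                [[110, 70, 60], [50, 170, 45]]], dtype=np.uint8)
exg, _, _ = color_indices(img)
mask = exg > otsu_threshold(exg)                 # vegetation mask
```

On this toy image the two green pixels end up above the automatically chosen threshold and the two soil-like pixels below it, mirroring the segmentation described in the text.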
Grayscale images can be converted into binary images using a suitable threshold, so that vegetation pixels are distinguished from the soil background; the vegetation segmentation process thus classifies pixels into two classes, vegetation and non-vegetation. However, the vegetation class includes not only crops but also weeds, so a non-crop removal step is necessary for TF application. Features such as shape [23], texture [24], and venation [25] are generally used to discriminate between different plants. However, extracting these features is a labor-intensive and time-consuming process that depends on expert knowledge, and the selected features are not sufficiently robust for all scenarios [13]. Therefore, the detection accuracy of threshold-based methods remains a challenge in practice under complex conditions such as mixed weeds, changing light, and vehicle vibration [10]. As a result, research has turned to learning-based methods to deal with complex field environments.
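As one simple example of such a removal step, component area (arguably the simplest shape feature) can filter small weed blobs out of a vegetation mask. The mask and area threshold below are invented for illustration; real systems would use the richer shape, texture, or venation descriptors cited above:

```python
import numpy as np

def connected_components(mask):
    """4-connected component labeling via flood fill (no external deps)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                            and mask[y, x] and labels[y, x] == 0):
                        labels[y, x] = current
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, current

# Hypothetical vegetation mask: one large "maize" blob, one tiny "weed" blob.
veg = np.array([[1, 1, 0, 0, 0],
                [1, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 0]], dtype=bool)

labels, n = connected_components(veg)
min_area = 3                                   # illustrative area threshold
crop = np.zeros_like(veg)
for k in range(1, n + 1):
    if (labels == k).sum() >= min_area:        # keep only large components
        crop |= labels == k
```

Here only the six-pixel blob survives the area filter; the isolated single-pixel "weed" is discarded. Such a hand-tuned rule is exactly the kind of brittle, scenario-specific feature the text criticizes.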
Learning-based methods classify the pixels of an image into different categories according to common properties of the objects learnt by a machine learning algorithm. Learning-based algorithms used for vegetation segmentation include the decision tree [26], random forest (RF) [27], Bayesian classifier [28], back-propagation neural network [29], Fisher linear discriminant [30], and support vector machine (SVM) [31]. Deep learning (DL), a particular kind of machine learning, has gained momentum in a wide range of applications. DL is a powerful and flexible approach because it transforms data hierarchically through several levels of abstraction [32
]. Milioto et al. [33] proposed a convolutional neural network (CNN) to classify sugar beet plants and weeds. Qiu et al. [34] used a mask region-based CNN (Mask R-CNN) to detect Fusarium head blight in wheat from color images. Tian et al. [35] used a YOLOv3-based model for apple detection and counting across different growth stages. Kamilaris et al. [36] surveyed 40 DL-based studies on agricultural tasks such as leaf classification, plant detection, and fruit counting, and also compared DL with other popular existing techniques; their findings indicated that DL performed better than other learning-based methods such as SVM and RF. The key advantage of DL is its capacity to create and extrapolate new features from raw input data, locating the important features itself through training [37]. Accordingly, feature extraction (FE) occurs automatically [38], without the labor-intensive manual effort, expert knowledge, and professional skills that conventional FE requires [39]. Moreover, DL models have proven more robust than other methods under challenging conditions such as changing illumination, occlusion, and appearance variation [40].
Motivated by the advantages of DL reported in the literature, this study addressed the maize plant detection task for future TF application using a DL-based method. Four color index-based methods were used as benchmarks for comparing detection performance. In line with the challenges described above, the performance evaluation covered identification and localization accuracy under different conditions. Furthermore, detection time was considered for practical TF application.
The DL method used in this study was YOLOv3, a state-of-the-art object detection algorithm designed for real-time processing. As a single-stage detector, YOLOv3 performs localization and classification in a single pass of one CNN; detection is therefore generally faster than with two-stage detectors such as Faster R-CNN [42]. It has shown significant potential in agricultural detection tasks [43]. Furthermore, a faster variant, YOLOv3_tiny, has been developed with fewer convolution layers to enable real-time application on embedded computing devices, which have limited computing capability. Naturally, the higher speed of YOLOv3_tiny comes at the expense of reduced precision. Therefore, both YOLOv3 and YOLOv3_tiny were evaluated for maize plant detection in this study, in pursuit of a balance between detection accuracy and speed.
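Single-stage detectors such as YOLOv3 still rely on post-processing to merge overlapping box predictions for the same plant. The sketch below shows textbook intersection-over-union (IoU) and greedy non-maximum suppression in plain numpy; it is a generic illustration with invented boxes, not the implementation used in this study:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes; format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over scored boxes."""
    order = np.argsort(scores)[::-1]        # highest confidence first
    keep = []
    while order.size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop boxes overlapping the kept box too strongly.
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep

# Two overlapping detections of one plant, plus one separate detection.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # → [0, 2]; the 0.8 box is suppressed
```

The duplicate box is removed because its IoU with the higher-scoring box (about 0.82) exceeds the suppression threshold.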
The color indices used in this study included ExG, ExR, and ExGR, which have been widely used by researchers as benchmarks for performance evaluation [27]. The H component of the HSV color space was also included because it describes object color separately from the lighting conditions [20]. The corresponding thresholds were determined by Otsu's method, which is likewise widely applied [13].
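For reference, the H component can be computed from RGB with the standard HSV conversion formulas. The sketch below is a generic implementation with a few sample pixels, not tied to this study's code:

```python
import numpy as np

def rgb_to_hue(rgb):
    """Hue in degrees (0-360) from RGB, per the standard HSV conversion."""
    rgb = rgb.astype(np.float64) / 255.0
    r, g, b = np.moveaxis(rgb, -1, 0)
    mx = rgb.max(axis=-1)
    delta = mx - rgb.min(axis=-1)
    h = np.zeros_like(mx)                  # hue undefined for gray: leave 0
    nz = delta > 0
    rmax = nz & (mx == r)                  # piecewise by the maximum channel
    gmax = nz & (mx == g) & ~rmax
    bmax = nz & ~rmax & ~gmax
    h[rmax] = (60.0 * (g[rmax] - b[rmax]) / delta[rmax]) % 360.0
    h[gmax] = 60.0 * ((b[gmax] - r[gmax]) / delta[gmax] + 2.0)
    h[bmax] = 60.0 * ((r[bmax] - g[bmax]) / delta[bmax] + 4.0)
    return h

# Sample pixels: pure red, pure green, and a leaf-like green.
pix = np.array([[[255, 0, 0], [0, 255, 0], [60, 160, 40]]], dtype=np.uint8)
hue = rgb_to_hue(pix)                      # ≈ 0, 120, and 110 degrees
```

Because hue is largely independent of intensity, a hue band around green can be thresholded under varying illumination, which is the property the text attributes to the H component.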
This study addressed the problem of maize plant detection. The ultimate goal is target fertilization, in which fertilizer is applied only to maize plant targets, thus reducing fertilization waste. The study focused on evaluating conventional color index-based methods and DL methods using several indicators, namely identification accuracy, localization accuracy, robustness to complex field conditions, and detection speed, to support reliable detection decisions in practical applications. The main conclusions are as follows.
Firstly, the color index-based methods (ExG, ExR, ExGR, and the H value) have a limited ability to distinguish maize plant targets from weeds, especially when the leaves of maize plants and weeds overlap. These methods are therefore not sufficiently robust under complex field conditions and can only be used for maize plant target detection in specific scenarios.
Secondly, compared with the color index-based methods, the two DL methods, YOLOv3 and YOLOv3_tiny, achieve higher detection accuracy and are more robust under complex conditions. These findings reaffirm the superiority of DL over conventional methods in automatically extracting and locating significant features from raw input data. However, the detection speed of the DL methods is lower than that of the color index-based methods because of their computational complexity. This gap is expected to narrow with advances in network optimization technology and computing power.