In precision agriculture applications, the typical object of interest is a prescription map that directs tractor operations, such as nitrogen spraying or weed treatment, over the agricultural field. Generating a correct prescription map requires the definition of specific management zones that reflect areas and their status [1]. The planning of agricultural tasks requires a deep knowledge of crop state [2]. An important and typical case is the application of variable rate nitrogen fertilizers, as discussed in [3]. Vineyards and fruit plantations are especially good examples of complex exercises in crop detection and agricultural image segmentation [4], with problems such as weed detection [5], nitrogen application at hot-spots and selective harvesting.
Unsupervised algorithms (e.g., hierarchical clustering, ISODATA, K-means) require that the area contains objects (e.g., tree, crop, soil) that are spectrally separable, which is not always the case. In particular, soil covered by grass produces incorrect results when the classification relies only on the spectral response of bare soil compared with that of grassland. In non-radiometric approaches, crop/soil detection is usually carried out with parametric algorithms based on hand-crafted feature classification, for example frequency analysis [17], Hough space clustering or total least squares as in [18]. These methodologies have their limitations, so fully automated techniques are now used to produce high quality prescription maps. These automatic techniques provide crop/soil segmentation while improving the spectral response of radiometric filters. The development of such a data filter, and the classification of its output by deep learning networks, is a general statement of the problem.
The scheme of this paper is therefore set to test this idea: a noise free DSM/NDVI probability map is derived and then fed into a “basic” CNN image classifier. The baseline result is the output of the classifier when it processes the DSM and NDVI images separately. We do not favor a non-parametric methodology such as Support Vector Machines (SVMs) as a candidate for the classifier, because it relies on specialized feature extraction and generalizes poorly. Supervised deep learning, on the other hand, can capture different crop typologies such as grassed soil, bare soil and tree/canopy, if properly trained. We feel that the purpose of extending the definition of a radiometric index to the geo-radiometric DSM/NDVI generalization is to provide data to a neural image processor instead of a parametric classifier. In fact, the NDVI and DSM extractions referred to in this paper are already forms of low order parametric classifiers.
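To illustrate the last remark, NDVI followed by a single cut-off behaves as a one-parameter classifier. The sketch below uses the standard definition NDVI = (NIR − Red)/(NIR + Red); the reflectance values, the 0.3 threshold and the function names are illustrative assumptions and are not taken from this paper.

```python
def ndvi(nir, red):
    """Standard NDVI for one pixel; returns 0.0 when both bands are zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def classify(nir, red, threshold=0.3):
    """One-parameter classifier: the threshold value is a hypothetical choice."""
    return "plant" if ndvi(nir, red) > threshold else "soil"


# Hypothetical (NIR, Red) reflectances for three kinds of surface.
pixels = {
    "canopy": (0.60, 0.10),
    "grass": (0.50, 0.10),
    "bare soil": (0.30, 0.25),
}

labels = {name: classify(nir, red) for name, (nir, red) in pixels.items()}
for name, label in labels.items():
    print(f"{name:10s} NDVI={ndvi(*pixels[name]):.2f} -> {label}")
```

With these assumed values the grass pixel and the canopy pixel receive the same label, which is the kind of NDVI degeneracy that adding DSM height information is meant to resolve.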
The strategy adopted in this article is to take the DSM/NDVI image and refine it through a very basic, general purpose CNN.
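The DSM/NDVI image combines two per-pixel sources of evidence. The paper's own fusion formula is derived in Section 2.2 and is not reproduced here; as a generic illustration only, the sketch below fuses two per-pixel probabilities with Bayes' theorem under the assumption that the two sources are conditionally independent given the class. The function name, the shared prior and the example probabilities are all hypothetical.

```python
def fuse_probabilities(p_ndvi, p_dsm, prior=0.5):
    """Naive-Bayes fusion of two posteriors P(canopy | source) sharing one prior.

    Assumes conditional independence of the two sources given the class,
    which is an illustrative simplification, not the paper's formula.
    """
    for_class = p_ndvi * p_dsm * (1.0 - prior)
    against = (1.0 - p_ndvi) * (1.0 - p_dsm) * prior
    total = for_class + against
    # Fully contradictory certainties carry no usable evidence: fall back to the prior.
    return prior if total == 0 else for_class / total


# Hypothetical per-pixel probabilities: (from NDVI, from DSM height).
cases = {
    "tree canopy": (0.9, 0.8),  # green and elevated
    "grass": (0.9, 0.1),        # green but not elevated
    "building": (0.1, 0.9),     # elevated but not green
    "bare soil": (0.1, 0.1),    # neither
}

fused = {name: fuse_probabilities(a, b) for name, (a, b) in cases.items()}
for name, value in fused.items():
    print(f"{name:12s} fused probability = {value:.3f}")
```

Under these assumptions the fused value is high only where both sources agree, so grass and buildings no longer score as confidently as canopy.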
The paper is structured as follows. Section 2.1
shows that the probability density of scattered objects over terrain has a natural connection to the DSM image Fourier transform (through regularly repeating objects in the terrain). These considerations lead to a segmentation formula where the DSM image data can be interpreted as a probability density across the orthophoto and the spatial frequency of the local object field is related to the integration window over the terrain. Section 2.2
applies Bayes' theorem to integrate the NDVI and DSM data streams into a single fused DSM/NDVI index. This is an attempt to resolve the degeneracy of the NDVI index in a formulaic way. If it succeeds, the index does not confuse, for example, canopy with spurious objects. In this sense the image is of higher quality as input to a prospective neural network classifier. The DSM/NDVI index can be understood as a form of regularization that is not applied to the neural network cost function directly, as is common practice with parameter penalty norms, but through a deliberate reduction in the number of input cases (or input space) the network is expected to decipher. Specifically, the better the initial image construction, the lower the capability the neural network classifier needs in order to distinguish the vegetation; the problem is regularized by simplifying the input problem space instead of tinkering directly with the network weights. The section ends by introducing the real DSM data models and terrain orthophotos that are used to demonstrate and verify the DSM/NDVI index in the following sections. Section 3.3
proceeds to convert the DSM formula (see Equation (13)) into a Cartesian scanning algorithm, CARSCAN. The performance of this technique is studied on an artificially created test data set: a Gaussian hill with a constant height crop, polluted with a known noise distribution and designed to benchmark the algorithm’s segmentation performance in the presence of noise. In practice, high frequency and high amplitude noise can simulate or include the presence of irregularities, crop and other objects placed upon the slowly undulating terrain. The artificially added noise seeks to demonstrate that Equation (13) tends to automatically produce a successful segmentation provided the noise frequencies are high and not similar to harmonics present in the terrain undulation. By successfully segmenting the noise into the object field, the procedure verifies the mathematical properties of Equation (13), among which is the important property that the equation acts as a low pass frequency filter over the terrain. This fact is critical to the robustness of the DSM algorithm. For the sake of completeness, the results of the segmentation are then fed into a MultiLayer Perceptron (MLP) network to denoise them automatically. A reasonable copy of the original noise distribution is thus obtained (see Figure 6). Section 3.4
discretizes Equation (13) using a radial coordinate system (see Figure 8), FANSCAN. This second system is used because it allows a better appreciation of the periodicity patterns of objects scattered on the terrain than CARSCAN, even though it is of much higher complexity. Comparing the design of the two coordinate systems, together with an assumption of the Cauchy convergence criterion (nearly always true for low signal to noise ratio images), yields a metric that characterizes the difference between the two algorithms in terms of the object distribution frequencies (Fourier transform) on the terrain they analyze. This distance leads directly to a way of discerning when the image extraction is of maximal quality (see Equation (21)). Section 3.6
develops the simplistic CNN used for identifying vegetation areas in the DSM/NDVI index image. As already mentioned, one of the important aims of this research was to reduce the neural network capability required for the crop/soil differentiation stage. Therefore, having developed the DSM/NDVI index, we deliberately chose a relatively low capability image processing CNN for the final soil/crop identification stage to test this idea. Application to deep learning is therefore discussed in the context of a simple 12 layer, 28 × 28 pixel pattern recognition convolutional neural network (CNN) with three cross-entropy classification states: ‘plant’, ‘soil’ and ‘other’. This CNN topology is commonly available, or can be easily assembled (see Table 2), in many technical computing environments such as Python and MATLAB®. It is general purpose and designed for small image recognition problems, such as alphabet and handwriting recognition, that fit easily into its 28 × 28 pixel image size. An immediate advantage of this is the gain in processing speed and ease of training the CNN. This encourages better dropout [20] and regularization in the CNN hidden layers. We provide a demonstration of this by using the DSM/NDVI filter to remove artifacts such as buildings and grass from the large input image (∼3000² pixels) to the CNN and obtaining the correct result. Equation (24) quantifies the success of the entire operation by showing improvements of 65% for the DSM/NDVI index over the NDVI radiometric index alone. Section 4.1
analyzes the performance of Equation (13) in Fourier space. The equation is seen to be effective even when treating different terrain types (see Figure 4). The stability of the DSM algorithm (see Algorithm 2) is discussed along with its mathematical properties when applied to the real data sets (see Section 2.3.2) used in this research. Section 4.2
discusses a potentially serious problem with Algorithm 2. The problem concerns a weakness of the segmentation Equation (13) when a radial arm aligns with, for example, a row of trees on the DSM image. This situation, along with its “solution”, is discussed and codified (see Equation (25)). Section 4.3
discusses the results obtained in Section 3.6 in detail. In particular, a detailed evaluation is made as to why Equation (24) seems to imply gains exceeding a hundred percent in the case where only a DSM object field is input into the CNN. Finally, Section 5 is a summary of what has been achieved.
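The low pass property attributed to Equation (13) in the outline above can be illustrated on a one-dimensional transect. Equation (13) itself is not reproduced in this section, so the sketch below substitutes a plain moving average as the low pass filter; it is not the CARSCAN or FANSCAN algorithm. The hill shape, crop row spacing, window size and detection threshold are all assumed values, and the noise used in the paper's benchmark is omitted to keep the example deterministic.

```python
import math


def moving_average(signal, half_window):
    """Low pass filter: mean over a centred window, truncated at the edges."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


n = 200
# Slowly undulating terrain: a Gaussian hill, 10 units high.
terrain = [10.0 * math.exp(-(((i - 100) / 40.0) ** 2)) for i in range(n)]

# Constant height crop in regular rows: 3 samples wide, repeating every 10 samples.
crop_height = 2.0
is_crop = [(i % 10) < 3 for i in range(n)]
dsm = [t + (crop_height if c else 0.0) for t, c in zip(terrain, is_crop)]

# The window spans about two row periods but is narrow compared with the hill,
# so the smoothed signal follows the terrain and averages out the rows.
smooth = moving_average(dsm, half_window=10)

# The residual is the high frequency object field sitting on the terrain.
residual = [d - s for d, s in zip(dsm, smooth)]
detected = [r > 0.5 * crop_height for r in residual]

accuracy = sum(d == c for d, c in zip(detected, is_crop)) / n
print(f"fraction of samples labelled correctly: {accuracy:.3f}")
```

The separation relies on the condition stated in the text: the object frequencies must be high compared with the terrain undulation. If the window were made as wide as the hill, or the rows as broad as the window, terrain and objects would no longer separate cleanly.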