1. Introduction
The grape-growing and wine-making industry is one of the most important food industries and is currently introducing several enhanced vineyard management techniques, such as automatic leaf estimation [1], fruit harvesting [2,3], yield estimation [4,5,6,7], grape quality evaluation [8] and grapevine variety identification [9]. The wine industry faces the challenge of performing accurate yield prediction, estimation and quality control [10], because these factors are affected by environmental variables (soil factors, climate, plant diseases), forecast and pollen concentration [11], farming factors, such as added products (water, pesticides, fertilizers, herbicides) [12,13], and agricultural tasks [14] (number of sprouts, informed pruning, shoot thinning, bunch thinning, number of bunches, prune weight, etc.), all of which complicate the feasible management of the vineyard.
Crop management can be improved by using remote sensors [6] configured in airborne [14,15] and terrestrial applications, such as crop classification, crop area estimation, canopy measurement, identification of harvest dates, crop yield estimation, detection of pest and disease occurrence, mapping of weed infestation and monitoring of abiotic stress. For example, in [7], a terrestrial LIDAR device was proposed to obtain canopy volume and tree shape information in peach orchards and to analyze the relationships between the measured LIDAR tree volume and the yield and fruit weight; the conclusion was that LIDAR is a suitable technique to assess fruit tree production capacity. An alternative is the use of specialized terrestrial vision systems, for example, for the location and detection of fruits on trees [16] by placing a camera at different positions and different viewing (azimuth and zenith) angles. In this case, the best results were obtained when locating the camera in front of the fruit with a zenith angle of 60° upwards, and the maximum fruit detection rate (90%) was achieved when using five viewpoint positions. In [17], a new method based on segmenting the point cloud obtained with a 3D camera into convex surfaces was implemented for individual fruit recognition and detection. The conclusion was that the proposed method can be used for fruit detection, although this detection is extremely sensitive to changes in lighting conditions [4,5,18] and to the color similarity between the fruit and the background [19]. In this direction, in [20], an image-processing technique was proposed to detect fruits of different degrees of ripeness by using RGB images in combination with automatic machine learning, obtaining classification ratios from 0.78 to 1.00 for different ripening conditions. In [21], the specific problems originating from daylight operation were identified: skylight transmission from the back side of trees, direct sunlight reflectance from non-fruit materials and variations in natural lighting. Similarly, in [22], it was proven that changing solar angles, cloud cover and leaf occlusions lead to lighting variations that complicate the segmentation process.
In order to manage fruit skin color variability under daylight, the proposals of [18,23] were to define linear color models in the RGB color space and to compute the pixel color intensity distance to these models for direct fruit segmentation. Nevertheless, the general conclusion is that the same segmentation techniques cannot be applied to different scenarios [24].
Regarding the specific case of a vineyard, the problem is the definition of an automatic procedure to recognize and identify grapes or clusters of grapes [19] in order to estimate the yield or the optimal harvesting period. In general, this detection is affected by color similarities between the grapes and the background, the different sizes and scales of the grapes, occlusions originating from leaves and/or branches, weather conditions, light variations, reflections and shadows. For example, the proposal of [5] was to characterize the diameter and ripeness of grapes in vineyard images. The segmentation was performed in the HSI (hue, saturation and intensity) color space by applying a threshold segmentation level and two additional restrictions: the image regions must have a predefined range of intensities, and the objects must have a feasible diameter. In [25], the proposal to avoid illumination variability was to perform the grape segmentation in the CIELab color space; this segmentation considers the lightness component of the pixels in relation to the color characteristics of the crop, allowing a segmentation quality of 87.2%. In [26], an automatic system to detect berries and to determine their size and weight was proposed. In this case, the segmentation technique was based on applying a Bayesian discriminant model using as inputs the RGB pixels from two classes of objects selected in the image: background and fruit (peel/stem). These results were then stored in a look-up table (LUT) to perform fast segmentation. The system properly estimates the berry weight (R² > 0.96) and size (R² > 0.97), extending the suitability of the system to other types of fruits and vegetables.
The use of artificial illumination at night was proposed in [27], where color mapping combined with morphological operators was used to localize clusters of red and white grapes, obtaining cluster detection efficiencies of 97% and 91%, respectively. More recently, [28] presented the first complete system for vineyard yield estimation evaluated with artificial illumination at night over several years and a large number of vines in different vineyards. In this case, the system captures 75% of the spatial yield variance with average errors between 3% and 11% of the total yield, values that represent the state-of-the-art in this field.
In a similar direction, the proposal of this paper is to perform yield estimation by applying controlled artificial illumination at night in a vineyard in order to avoid the color variability and changes induced by daylight natural illumination. The main goal is to assess different methods suitable for grape segmentation in vineyard images, perform an estimate of the area and volume of the cluster of grapes and, finally, estimate the vineyard yield.
3. Yield Calibration
The objective of the yield calibration stage is to develop a procedure to obtain the relationship between grape-cluster image analysis parameters (measured directly in pixels) and grape-cluster weight (measured in grams). The calibration parameters selected in this paper are the area and volume of the cluster of grapes but, if available, other parameters, such as the number of grapes in the images, can also be used as calibration parameters [28,30,31]. These calibration parameters will enable the automatic estimation of the vineyard yield based on the analysis of the images obtained from the vineyard. The typical hypotheses adopted in this calibration procedure are:
- There is a relationship between the weight, size and volume of the cluster of grapes [4,10,28,30,31].
- The obtained relationship between the weight and size of the cluster of grapes is valid during a measurement experiment.
- The grape variety analyzed in this paper is of high quality and low cluster density.
- The distance between the grapes and the image acquisition system is constant during the entire process [28,32].
Similarly to the calibration procedure proposed in [28], the yield calibration has been performed off-line in laboratory conditions in order to guarantee enough weight and cluster size variability, but a practical development of this proposal will require a detailed on-line application procedure. The yield calibration procedure is based on the following steps: (1) a set of representative clusters of grapes is manually harvested and selected for the calibration; (2) the weight of each cluster of grapes is manually measured; (3) each cluster of grapes is hung in front of a white background in order to obtain a reference image with the image acquisition device; (4) the image of the cluster of grapes is automatically segmented by applying the Otsu method [33] combined with morphological operators (10 erosions and dilations) in order to remove noisy pixels from the images (Figure 2a); and (5) the segmented image is used to estimate the area and volume of the cluster of grapes expressed in pixels. The total area of the cluster of grapes is computed as the number of white pixels in the segmented image (Figure 2a); this area estimate summarizes the effect of all of the existing grapes in a cluster. Alternatively, the proposal is to compute the volume of the clusters. For example, in [28], the volume of the clusters is estimated by using a 3D ellipsoidal model, but in this paper, the proposal is to estimate the volume of the grapes by interpreting the segmented area as a solid of revolution. This volume estimate is computed as a solid of revolution (Figure 2b), where the unit-height cylinder of each row is centered on the column center of the object: if k is a row of the segmented image and s(k) and e(k) are the first and last image columns of the pixels classified as a cluster of grapes, then the partial volume of this slice is computed as π·((e(k) − s(k))/2)².
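To make the computation concrete, the following Python sketch (a minimal illustration; the function name and the synthetic elliptical mask are ours) accumulates this per-row slice volume over a binary segmentation mask:

```python
import numpy as np

def revolution_volume(mask: np.ndarray) -> float:
    """Sum pi*((e(k) - s(k))/2)**2 over every row k of a boolean
    segmentation mask, treating each row as a slice of unit height."""
    volume = 0.0
    for row in mask:
        cols = np.flatnonzero(row)               # grape pixels in this row
        if cols.size:
            radius = (cols[-1] - cols[0]) / 2.0  # (e(k) - s(k)) / 2
            volume += np.pi * radius ** 2        # one unit-height slice
    return volume

# Toy usage with a synthetic elliptical mask:
yy, xx = np.mgrid[0:100, 0:60]
mask = ((yy - 50) / 50.0) ** 2 + ((xx - 30) / 25.0) ** 2 <= 1.0
print(revolution_volume(mask))
```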
Figure 3 and Figure 4 show the calibration results obtained with 29 representative clusters of grapes. Figure 3 shows the relationship between the area of the cluster of grapes (expressed in pixels) and its weight (in grams); the coefficient of correlation was 0.9557. Alternatively, Figure 4 shows the linear relationship between the volume of the cluster of grapes (expressed in pixels) and its weight (in grams); the coefficient of correlation was 0.9635. In [31], similar results were achieved in a less controlled environment.
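The following sketch illustrates how such a calibration line can be fitted and applied with a least-squares regression; the five (area, weight) pairs are made-up values used only for demonstration, not the measurements behind Figure 3:

```python
import numpy as np

# Hypothetical calibration pairs: segmented cluster area (pixels) and
# manually measured weight (grams); the paper used 29 real clusters.
area_px  = np.array([152000.0, 238000.0, 310000.0, 405000.0, 520000.0])
weight_g = np.array([180.0, 265.0, 350.0, 460.0, 590.0])

slope, intercept = np.polyfit(area_px, weight_g, 1)  # weight = slope*area + intercept
r = np.corrcoef(area_px, weight_g)[0, 1]             # coefficient of correlation

# Predicted weight of a newly segmented cluster of 287,000 pixels:
print(slope * 287000.0 + intercept, "grams; r =", round(r, 4))
```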
Figure 2. (a) Example segmented image of a cluster of grapes; (b) Representation of the solid of revolution of the cluster of grapes estimated from the segmented image.
Figure 3. Relationship between the weight and area of the cluster of grapes analyzed.
Figure 4. Relationship between the weight and volume of the cluster of grapes analyzed.
4. Automatic Segmentation of Clusters of Grapes
This section focuses on the automatic segmentation of the clusters of grapes as a procedure required to estimate the yield. The hypothesis is that the controlled artificial illumination used during the acquisition of the images at night will simplify the grape segmentation procedure and will allow the application of a pixel-based segmentation method. The main advantage of using a pixel-based classification strategy is that complex classification rules can be mapped directly into a LUT [23] and applied in a real-time implementation.
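As an illustration of this strategy, the following Python sketch bakes an arbitrary pixel rule into a boolean LUT indexed by the (R, G, B) triple, so that segmenting an image reduces to one table look-up per pixel; the toy rule used here (red clearly dominating green) is only a placeholder for the methods assessed below:

```python
import numpy as np

levels = np.arange(256, dtype=np.uint16)
r, g, b = np.meshgrid(levels, levels, levels, indexing="ij")
lut = 10 * r > 12 * g          # (256, 256, 256) booleans, one per RGB triple

def segment(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) uint8 RGB image -> (H, W) boolean grape mask."""
    return lut[img[..., 0], img[..., 1], img[..., 2]]
```

However complex the rule that fills the LUT, the per-pixel cost at run time stays constant.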
The pixel-based segmentation procedures assessed in this section are: threshold segmentation, Mahalanobis distance segmentation, the Bayesian classifier, the direct three-dimensional histogram and linear color models. The empirical assessment is based on the analysis of a continuous sequence of vineyard images acquired with the image acquisition system under controlled artificial illumination at night. In this paper, the images of the clusters of grapes analyzed are not affected by occlusion. The final goal of this proposal is to classify all of the pixels of the high-resolution vineyard images as grape (“1” or white) or background (“0” or black).
The tuning of the segmentation procedures used in this paper may require the previous manual selection of complementary reference templates in a representative image of the vineyard. In this paper, the proposal is to define two basic auxiliary templates (Figure 5): a grape template and a background template. In the case of the grape template, a predominant reddish color from the skin of the grapes is expected, whereas in the case of the background template, a mixture of different predominant colors from leaves, branches and other dark image areas is expected. These two templates will be required by some segmentation methods in order to define two basic classification classes: grape and background. The effect of defining more classes (for example, splitting the background class into leaf, branch and dark background classes) has not been evaluated, except in the case of the linear color models, for which doing so is mandatory.
Figure 5 shows a zoomed part of a representative vineyard image where an expert human operator has already selected two reference templates by applying a circular selection tool.
Figure 5. Example of a manual selection of the grape and background templates.
The accurate validation of the different grape-cluster segmentation methods assessed requires two operations: (1) an accurate manual labeling of the regions of the images covered by grapes; and (2) the comparison of the automatic classification results with the manual labeling in order to extract statistical similitude information. As an example, Figure 6 shows this validation procedure applied to one cluster of grapes of the vineyard (Figure 6a). Figure 6b shows the accurate result obtained with a manual selection of the contour of the cluster of grapes (with approximately 600 contour points), and Figure 6c shows an example of automatic pixel-based grape segmentation. Figure 6d shows the differences between the manual labeling of the cluster and the automatic segmentation, obtained by applying an exclusive-or (XOR) function between both segmented images. Finally, the size segmentation error is computed in this paper as the number of differing pixels (the XOR differences) divided by the number of pixels of the cluster of grapes obtained from the manual selection of the contour. These image differences can be further reduced by applying an optimized sequence of morphological operators to the segmented image.
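A minimal sketch of this error metric, assuming both segmentations are available as boolean masks of the same size:

```python
import numpy as np

def size_segmentation_error(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Number of differing pixels (XOR of the two masks) divided by
    the number of pixels of the manually labeled cluster."""
    differing = np.logical_xor(auto_mask, manual_mask).sum()
    return differing / manual_mask.sum()
```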
Figure 6. (a) Vineyard image with a cluster of grapes; (b) Manual labeling of the cluster of grapes; (c) Example automatic grape segmentation results; (d) XOR differences between the manual labeling and the automatic segmentation.
The color spaces considered for all of the methods assessed in this paper were the original RGB color space and the transformed HSV color space [34], with the hue component shifted 180° in order to place the red color intensity in the center of the H plane and to simplify the detection of the predominant reddish color of the grapes.
The segmentation results obtained with each assessed method can be improved by applying an empirically optimized (or minimal) sequence of morphological operators, such as hole filling, erosion and dilation, in order to refine the segmentation. This empirically optimized morphological filtering sequence must be interpreted as the minimal combination of erosion and dilation operators required to eliminate noisy pixels from the images. In the case of big objects, such as clusters of grapes, using a longer sequence with more erosion and dilation operators only increases the execution time, because both operators have a neutral, compensating effect.
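A sketch of such a refinement sequence using SciPy is shown below; the default of 10 erosions and dilations mirrors the calibration stage, but the optimal count is an empirical, image-dependent choice:

```python
import numpy as np
from scipy import ndimage

def refine(mask: np.ndarray, iterations: int = 10) -> np.ndarray:
    """Hole filling, erosion and dilation applied to a boolean mask."""
    mask = ndimage.binary_fill_holes(mask)                       # fill gaps inside the cluster
    mask = ndimage.binary_erosion(mask, iterations=iterations)   # remove isolated noisy pixels
    mask = ndimage.binary_dilation(mask, iterations=iterations)  # restore the cluster contour
    return mask
```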
4.1. Threshold Segmentation
The application of threshold levels in one or several color layers in order to segment the areas covered by grapes or other fruits in color images is very common in the literature. For instance, in [35], a transform of the original RGB color space into the YIQ color space was proposed, and the threshold intensity levels were established by manual trial and error. The proposal of [36] was to implement one of the methods for vineyard yield estimation by manually establishing the red, green and blue thresholds in a reference RGB image and applying these settings to the remaining images. In [37], a transform from the RGB image into the Ohta color space [38] was proposed, applying a constant threshold for fruit segmentation. In [5], the grapes were segmented by applying a threshold level to the transformed HSI color space [39]; the threshold levels were computed by analyzing the histogram of the transformed images [40]. Similarly, in [27], the grapes in vineyard images taken at night were detected by defining, through trial and error, a region of pixel intensities in the RGB color space.
Inspired by these cited works, the threshold segmentation method is applied to five different cases: the red (R), green (G), blue (B), gray (I) and transformed hue (H) layers, in order to take advantage of the existing color differences between the reddish grapes and the greenish background. In each intensity color layer considered, the segmentation threshold level was fixed by applying the Otsu method [33] to the first image acquired with the image acquisition device.
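A sketch of this thresholding for the transformed hue case, using OpenCV; note that OpenCV stores hue as 0..179 (degrees/2), so the 180° shift becomes an offset of 90 in this representation, which is our mapping of the paper's shifted H layer:

```python
import cv2
import numpy as np

def otsu_hue_mask(img_bgr: np.ndarray):
    """Otsu thresholding on the shifted hue layer of a BGR image."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h_shifted = ((hsv[..., 0].astype(np.int32) + 90) % 180).astype(np.uint8)
    level, mask = cv2.threshold(h_shifted, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return level, mask > 0   # reuse `level` on the remaining images
```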
4.2. Mahalanobis Distance Segmentation
Mahalanobis distance [
41] segmentation is the second method tested, which consist of computing the distance between the three-dimensional color intensities of the pixels of the image and the ellipsoidal pixel intensity distributions of the existing grape and background templates. This distance can be used to classify each pixel as a member of its closest template. Compared with the Euclidean distance, the Mahalanobis distance also considers the differences in the variances of each intensity layer of the reference templates, so it is a robust method against small color variance caused by small changes in lighting. For example, in [
4], eight different templates were proposed in daylight field conditions: grapes, wood, background and four classes of leaf, depending on their age.
Inspired by these cited works, the Mahalanobis distance segmentation will be applied with only two classes: grape and background. The variances of each intensity color layer were obtained by analyzing the grape template and the background template (selected in the first image obtained in the vineyard). Then, the color intensity of an image pixel is compared with those models by computing the Mahalanobis distance, and the pixel is classified with the nearest class.
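A compact sketch of this two-class rule, assuming the template pixels are available as (N, 3) intensity arrays:

```python
import numpy as np

def fit_template(template_px: np.ndarray):
    mean = template_px.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(template_px, rowvar=False))
    return mean, inv_cov

def mahalanobis_sq(px: np.ndarray, mean: np.ndarray, inv_cov: np.ndarray):
    d = px - mean
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)   # squared distances

def segment(img: np.ndarray, grape_px: np.ndarray, back_px: np.ndarray):
    """Assign each pixel of img (H, W, 3) to the nearest template class."""
    px = img.reshape(-1, 3).astype(float)
    d_grape = mahalanobis_sq(px, *fit_template(grape_px))
    d_back = mahalanobis_sq(px, *fit_template(back_px))
    return (d_grape < d_back).reshape(img.shape[:2])
```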
4.3. Bayesian Classifier
The Bayesian classifier is a well-known classifier widely used for image analysis in agricultural applications. For instance, in [42], a Bayesian classifier was implemented to detect plants, and in [43], oranges in trees were discriminated with the aim of providing guidance information for a robotic manipulator.
The Bayesian classifier is a probabilistic technique based on the previous definition and analysis of image features (pixel color intensities in this paper) corresponding to different classes. Based on these characteristic statistical features, a Bayesian classifier is able to analyze and classify each pixel of the image into one of the specified classes. Inspired by the work of [43], the implementation of the Bayesian classifier is based on the simplified discriminant function, which assumes that the covariance matrices of the two reference templates used in the learning stage are not equal and that the color features describing the grapes and the background are not statistically independent. Finally, the color intensity of a pixel is assigned to the template class with the larger discriminant value.
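One standard form of this discriminant (a sketch assuming equal class priors, which the paper does not specify) extends the Mahalanobis rule above with a log-determinant term that accounts for the unequal covariances:

```python
import numpy as np

def discriminant(px: np.ndarray, template_px: np.ndarray):
    """Quadratic discriminant with a full, class-specific covariance."""
    mean = template_px.mean(axis=0)
    cov = np.cov(template_px, rowvar=False)
    d = px - mean
    maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
    return -0.5 * maha - 0.5 * np.log(np.linalg.det(cov))

def segment(img: np.ndarray, grape_px: np.ndarray, back_px: np.ndarray):
    px = img.reshape(-1, 3).astype(float)
    is_grape = discriminant(px, grape_px) > discriminant(px, back_px)
    return is_grape.reshape(img.shape[:2])
```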
4.4. Linear Color Model Segmentation
The proposal of applying linear color models (LCM) to detect fruit in color images [18] is based on the prior selection of small object regions in the image whose pixels have a linear intensity relationship (in a three-dimensional color space) that can be modeled with a linear regression. A class or object is then defined by several linear regressions that describe the different color relationships of the object, and the color intensity of a pixel is classified by finding the minimum Euclidean distance to all of the linear regressions defined. This classification method is very powerful, as it can model objects with a non-uniform color distribution that are affected by daylight illumination changes, but it requires the definition of one class per image object and several templates per class in order to model all object color variabilities. In this paper, three different template selections have been used to define the color variability of the grape class, and six different template selections (leaves, branches, etc.) have been used to define the color variability of the background (or non-grape) class.
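One way to realise this scheme (a sketch; [18] gives the original formulation) is to fit each template with a 3D line through its mean along its first principal direction and to score pixels by their orthogonal distance to the nearest line of each class:

```python
import numpy as np

def fit_line(template_px: np.ndarray):
    """Fit a 3D line (mean plus first principal direction) to template pixels."""
    mean = template_px.mean(axis=0)
    _, _, vt = np.linalg.svd(template_px - mean, full_matrices=False)
    return mean, vt[0]                      # point on the line, unit direction

def distance_to_line(px: np.ndarray, mean: np.ndarray, direction: np.ndarray):
    d = px - mean
    along = d @ direction                   # component along the line
    return np.linalg.norm(d - np.outer(along, direction), axis=1)

def segment(img, grape_templates, back_templates):
    """Each argument is a list of (N, 3) template pixel arrays (3 grape
    and 6 background selections in this paper)."""
    px = img.reshape(-1, 3).astype(float)
    d_g = np.min([distance_to_line(px, *fit_line(t)) for t in grape_templates], axis=0)
    d_b = np.min([distance_to_line(px, *fit_line(t)) for t in back_templates], axis=0)
    return (d_g < d_b).reshape(img.shape[:2])
```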
4.5. Histogram Segmentation
An alternative way to detect the skin of the grapes in the images is to compare pixel color intensities with a three-dimensional color-intensity histogram obtained from a grape template (in either the RGB or HSV color space). However, this detection method is dependent on the manually selected grape template and may require the selection of additional grape templates in order to include all of the skin-color relationships of the grapes. In order to overcome this problem, the proposal of [27] was to fill the gaps in the three-dimensional histogram and to complete the color relationships by applying a morphological dilation with a 3 × 3 box structuring element. Inspired by this proposal, this paper proposes to dilate the color relationships appearing in the three-dimensional color-intensity histogram computed from the grape template by convolving the histogram with a solid sphere. The segmentation is then performed by applying a zero threshold level to the dilated histogram.
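A sketch of this dilated-histogram segmentation; the 64-bin resolution and sphere radius are our assumptions, and the binary dilation with a spherical structuring element is equivalent to convolving the histogram with a solid sphere and thresholding at zero:

```python
import numpy as np
from scipy import ndimage

BINS = 64                                   # histogram resolution per channel

def build_histogram_lut(template_px: np.ndarray, radius: int = 2) -> np.ndarray:
    """3D color histogram of the grape template, dilated with a solid sphere."""
    idx = (template_px // (256 // BINS)).astype(int)
    hist = np.zeros((BINS, BINS, BINS))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    z, y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1, -radius:radius + 1]
    sphere = x ** 2 + y ** 2 + z ** 2 <= radius ** 2
    return ndimage.binary_dilation(hist > 0, structure=sphere)

def segment(img: np.ndarray, lut: np.ndarray) -> np.ndarray:
    idx = (img // (256 // BINS)).astype(int)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]
```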
6. Conclusions
This paper proposed a method for vineyard yield estimation based on the application of artificial illumination in a vineyard facility during the night in order to obtain high-resolution images of clusters of red grapes with small illumination variance. Five grape segmentation methods have been empirically assessed under such illumination conditions. The quality of the segmentation was computed by comparing the automatic segmentation results with a manual labeling of the clusters of grapes in the images. The direct segmentation results have been improved by applying a sequence of morphological operators in order to fill gaps and eliminate noisy pixels from the segmented images.
Empirical results showed that the use of controlled illumination at night combined with high-resolution, high-quality images of the vineyard simplifies the detection of clusters of grapes, because the color variability is small and the number of pixels available in each grape is very large (250,000 pixels on average). In the case of clusters of grapes not affected by occlusion, nine optimized implementations provided grape size errors under 15%. The best estimate of the area of the grapes was obtained when applying a threshold level to the transformed H color layer of the images, obtaining a cluster size estimate error of 13.55%, which improved to 10.01% when applying an optimized sequence of morphological operators. In this case, the H layer was shifted 180° in order to move the reddish components of the grapes into the center of the H circular vector. The additional advantage of threshold segmentation is the simplicity of its real-time implementation in an agricultural machine designed to locate clusters of grapes and to estimate their size.
The area and estimated volume of the clusters of grapes obtained from the analyzed images have been converted into vineyard yield by using specific calibration curves, which require manual operation. The results obtained with the proposed methods have shown that the yield can be predicted with errors of 16% and −17% when using the size information of the segmented area of the grapes and the volume of the solid of revolution computed from the segmented area, respectively.
The results of this paper will have application to vineyard management by optimizing the resources required to harvest, transport, store and, if needed, manufacture the vineyard product. Future work will be focused on developing a real-time system in order to estimate yield in large vineyard facilities and to evaluate the robustness of the estimators against diseases and the ripening stage of the grapes.