1. Introduction
Viticulture, a branch of horticulture, is the cultivation and harvesting of grapes and is carried out in many countries. The tasks involved in viticulture include monitoring, irrigation, adding fertilizers, canopy management, controlling pests and diseases, monitoring fruit development and characteristics, deciding the harvesting time, and vine pruning during the winter months. Among these, tasks such as harvesting and vine pruning are performed at specific times. The task of irrigating the vineyard is simple to automate. However, monitoring fruit development and characteristics is typically carried out frequently over the entire area of the vineyard, and is a labor-intensive and time-consuming task as it involves visual inspection of the fruit and plants by the farmer.
Frequent monitoring of the crop is important to check for pests and diseases in leaves and grapes, to check the growth of grapes, and to inspect for any damage. The optimal harvest time is also typically estimated visually and may vary for different types of grapes in different areas. Monitoring of humidity and mineral levels in the soil, temperature, etc. is done by directly embedding sensors in the soil, and various IoT approaches have been proposed [1,2,3,4,5,6,7,8,9]. However, visual inspection of the crop is another key aspect of monitoring. Visual monitoring is important for a viticulturist and involves the following key features:
Grape growth: A viticulturist generally inspects the growth of grapes visually.
Damage inspection: During the flowering of the vine, strong winds and hail can cause damage. Cold temperatures may cause millerandage, producing clusters with berries of varying sizes or without seeds [10]. On the other hand, hot conditions may cause coulure, causing grape clusters to either drop or not develop fully. This monitoring is often performed visually.
Oidium inspection: Oidium is a fungal disease which has the potential to attack all the green parts of the vine with devastating consequences [11]. This is inspected visually.
Peronospora inspection: Peronospora are obligate plant pathogens producing a downy mildew disease, which in turn produces stains on leaves [12]. Its treatment involves spraying copper sulphate [13]. This can be inspected visually at an early stage.
Phylloxera inspection: Phylloxera [14] is a pest of commercial grapevines worldwide and can easily be inspected visually.
Monitoring for green harvest: Green harvesting is the process in which immature and green grape bunches are purposefully removed so that the vine uses all the nutrients for developing the remaining grapes. This helps the remaining grapes to develop and ripen with good flavor.
Estimation of harvest time and areas: Visual monitoring is important to estimate the appropriate harvest time and areas of the vineyard.
Yield estimation: Visual monitoring is important to obtain an approximate yield estimate for the vineyard.
Thus, visual monitoring of the vineyard is crucial and has many advantages. However, manual monitoring is a labor- and time-consuming task.
The present work was conducted in Japan, which prominently grows around 30–40 varieties of grapes, each with its own unique taste and fragrance [15]. Worth mentioning among these are the Kyoho variety, also known as the king of grapes and popular for its juiciness and plumpness; the Muscat of Alexandria; the small seedless Delaware; and the Pione, famous for its flavor [15,16]. Almost all areas except the Nansei Islands are suitable for grapes, so grapes are produced over a wide range, from Hokkaido to Kyushu. In Japan, nearly 90% of the produced grapes are consumed fresh, whereas less than 10% are used for processing into wine, grape juice, confectionery, etc. Japan does not export grapes, but around 10,000 tons of grapes are imported annually [17]. The most cultivated variety in Japan is Kyoho, grown on 5465 ha, followed by Delaware with 2967 ha, Pione with 2430 ha, and Campbell Early with 655 ha [18].
There are different varieties of grapes, and the practice of viticulture varies from place to place. In Japan, vineyards are located in hilly regions and are generally characterized by grape plants cultivated in nearly straight lines. Grapevines are climbing plants that do not have their own natural support as trees do; hence, the grape plants are supported between wooden pillars, which are called trunks. The grape plants grow to a certain height and hang on supporting wires or ropes at a certain distance above the ground. Hence, the pillars can serve as concrete features in the vineyard.
The focus of the proposed work is on visual monitoring using autonomous robots. Autonomous robots have successfully been employed in construction, manufacturing, and many aspects of the agriculture industry. The motivation behind the proposed work lies in reducing the labor of farmers and bringing efficiency to grape production through the use of autonomous robots.
Recently, significant works related to autonomous vineyard robots have been proposed for different purposes. Some researchers have focused on localization of the robots, while others have focused on trunk recognition for single- [19,20] and multi-robot scenarios [21]. Apart from these, researchers have also addressed the problems of autonomous pruning [22], irrigation [23], yield estimation [24,25], and skeletonization [26] in vineyards using autonomous robots. Color image-based grape detection is proposed in [27]. In [28], a monitoring robot for mountain vineyards is proposed. Mapping and localization in vineyards are discussed in [29]. Researchers have also focused on improving plague control tasks, specifically the distribution and placement of pheromone dispensers for mating disruption in vineyards [30]. Precision agriculture using multi-rotor micro aerial vehicles and a human-carried multi-spectral 3D imaging device is proposed for automated monitoring in [31]. A wireless sensor network for vineyard monitoring that uses image processing is proposed in [32]. Other projects (e.g., Vinbot [33]) are also worth mentioning.
The novel contributions of this paper are summarized below:
A vineyard monitoring system that uses only an inexpensive camera sensor is proposed.
We propose a novel way to semantically label image data in vineyards by detecting the pillars set up in the vineyard to support the grapes. The system labels the image data on the basis of field name, lane number, and pillar number, which are automatically identified through image processing. Unlike traditional monitoring techniques, the proposed semantic labeling enables pin-point monitoring of the vineyard. In other words, farmers do not need to access the whole dataset, but can instead specify the exact location in the field which needs to be monitored. This is very efficient and time saving. Feature detection is important for semantic labeling. While extracting features such as walls, corners, and straight lines is easier in indoor environments, robust feature extraction in vineyards is difficult due to the dynamic nature of the environment (viz. moving leaves, plant trunks, changing lighting conditions, etc.). Hence, robust feature extraction is a major challenge faced by mobile robots in farms, which generally lack static features. Due to this, many researchers tend to use expensive sensors such as GPS, dense RGBD sensors, and 3D Lidar (e.g., VLP-16/32), or sensor networks [34,35,36]. However, such sensors increase the system cost. In the proposed research, feature detection is done using inexpensive cameras.
A way to increase the robustness of the system by varying the range of detection is proposed.
An algorithm to automatically turn data logging on and off based on the motion of the robot is proposed.
Interactive software has been developed through which farmers can monitor the vineyard.
The rest of the paper is organized as follows. Section 2 explains the main idea and system overview. Section 3 explains the landmark (pillar) detection algorithm, which forms the basis of semantic data labeling. Section 4 shows how the robustness of the landmark detection algorithm is improved without significantly impacting the processing time. Semantic data logging is explained in Section 5. Section 6 discusses experimental results in real environments with actual robots: Section 6.1 shows the results of pillar detection in an actual vineyard, Section 6.2 discusses the processing time, and Section 6.3 explains the monitoring software used to interactively monitor the vineyard on a pin-point basis. Finally, Section 7 concludes the paper.
3. Pillar Detection Algorithm
Feature detection is important for labeling image data. In the proposed method, the pillars set up in the vineyard to support the grape plants are detected as features using image processing. The algorithm to detect pillars is given in the flowchart of Figure 2. The algorithm is divided into four parts:
Setting Pillar Parameters: We first set the static parameters used for pillar detection. These include the approximate width and height of the pillars, the approximate threshold area of detection, and the range along the horizontal axis of the image within which pillars should be detected (the x-range). At the start of the algorithm, a flag which controls the labeling and saving of images in the database is initialized.
Pillar Detection in HSV Colorspace: The camera mounted on the robot reads an RGB color image when the robot starts to move. The camera captures the image in full HD (1920 × 1080) pixel resolution. This image is resized to a lower resolution for faster image processing. Next, pillars are detected in the HSV colorspace, and the various steps are explained below.
Checking Detected Pillar’s Dimensions: If n contours are detected, then the dimensions of each contour are checked. Using the contour parameters (top-left x- and y-coordinates, width, height, and area), the condition for pillar detection is checked using Algorithm 1.
Image Labeling and Saving in Database: The flag controls the semantic indexing of image data in the robot’s database. For each frame, the flag is set to the output of Algorithm 1. When the flag becomes true, the pillar number is incremented, and successive images are logged based on the new pillar index (a minimal sketch of this logic is given below).
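As an illustration only, the following Python sketch shows one plausible reading of this flag-based indexing logic. The helpers robot_is_moving(), read_camera_frame(), detect_pillar() (standing in for Algorithm 1), and save_image(), as well as the field and lane values, are hypothetical and not from the paper; incrementing the counter only on the rising edge of the flag is likewise an assumption made so the same pillar is not counted in consecutive frames.

```python
# Hypothetical sketch of flag-based semantic indexing (not the authors' exact code).
pillar_number = 0
prev_flag = False            # flag is initialized before the robot starts moving

while robot_is_moving():
    frame = read_camera_frame()
    flag = detect_pillar(frame)          # output of Algorithm 1 for this frame

    if flag and not prev_flag:           # assumed: count a pillar once, on the rising edge
        pillar_number += 1

    # Successive images are logged under the current semantic index
    # (field name, lane number, pillar number).
    save_image(frame, field_name="field_A", lane_number=1, pillar_number=pillar_number)
    prev_flag = flag
```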
Algorithm 1: Contour check for pillar.
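The body of Algorithm 1 is not reproduced in this excerpt. The following is a hedged Python sketch of the contour check as described in the surrounding text; the parameter names (w_min, w_max, h_min, a_min, x_range) are hypothetical, and interpreting the x-range condition as "the contour's top-left x-coordinate must not exceed the x-range" is an assumption.

```python
def contour_is_pillar(x, y, w, h, area,
                      w_min, w_max, h_min, a_min, x_range):
    """Sketch of Algorithm 1: decide whether a detected contour is a pillar.

    (x, y, w, h, area) are the top-left coordinates, width, height, and area
    of the contour; the threshold arguments are hypothetical placeholders.
    """
    # Assumed interpretation of the x-range: the contour must lie within the
    # allowed horizontal detection range.
    if x > x_range:
        return False
    # Pillars are tall and narrow: check width, height, and area thresholds.
    if not (w_min <= w <= w_max):
        return False
    if h < h_min:
        return False
    if area < a_min:
        return False
    return True
```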
Figure 4 shows the results of the pillar detection. Figure 4a is the resized input image in RGB colorspace. Figure 4b shows the image converted to HSV colorspace. Masking is applied to avoid detection of pillars and soil in the background; Figure 4c shows the result of masking. The color of the pillar is searched for in this image between an upper HSV limit of (34, 110, 255) and a lower HSV limit of (17, 26, 50), and the resultant binary image is shown in Figure 4d. In this image, the white pixels are those whose values fall within the color search range. It can be seen that the pillar and the stem of the grape plant are predominantly emphasized by this operation. At the same time, noise can also be seen in the image as small, independent white blobs. By applying the morphological operations of erosion and dilation, this noise is removed; the result is shown in Figure 4e. Finally, contours are retrieved from the noise-removed image, and the contour’s width, height, area, etc. are checked for detecting the pillar. The result of the detected pillar is shown in Figure 4f, in which the detected pillar is marked by a blue rectangle.
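For illustration, the following OpenCV (Python) sketch reproduces this processing chain under stated assumptions: the resized resolution, the masked band limits, and the structuring-element size are placeholders and not values from the paper, while the HSV limits are those quoted above.

```python
import cv2
import numpy as np

# Hypothetical placeholders: the actual resized resolution and mask band are not
# reproduced in this excerpt.
RESIZED_W, RESIZED_H = 640, 360
HSV_LOWER = np.array([17, 26, 50])     # lower H, S, V limit (from the paper)
HSV_UPPER = np.array([34, 110, 255])   # upper H, S, V limit (from the paper)

def find_pillar_candidates(frame_bgr):
    # (a) Resize the full-HD frame for faster processing.
    img = cv2.resize(frame_bgr, (RESIZED_W, RESIZED_H))
    # (b) Convert to HSV colorspace.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # (c) Keep only a horizontal band so background pillars and soil are not
    #     detected (band limits are placeholders).
    mask = np.zeros(hsv.shape[:2], dtype=np.uint8)
    mask[100:260, :] = 255
    hsv = cv2.bitwise_and(hsv, hsv, mask=mask)
    # (d) Threshold on the pillar color range to obtain a binary image.
    binary = cv2.inRange(hsv, HSV_LOWER, HSV_UPPER)
    # (e) Remove small, isolated blobs with erosion followed by dilation.
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.erode(binary, kernel, iterations=1)
    binary = cv2.dilate(binary, kernel, iterations=1)
    # (f) Retrieve contours; their dimensions are later checked by Algorithm 1.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) + (cv2.contourArea(c),) for c in contours]
```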
Effect of X-Range Parameter
Section 3 described many parameters used in the pillar detection algorithm. Among these, the x-range parameter is briefly explained here. This parameter sets the range along the horizontal axis of the image within which a pillar should be detected. In the proposed work, the x-range is set to 350, and pillars are detected only within this range. The same pillar observed at different angles has varying height, width, and area; therefore, different thresholds would need to be set for different angles. Hence, for accurate estimation, the pillar is said to be detected only when the line joining the camera and the pillar is perpendicular to the direction of the robot’s motion. The effect of this parameter on pillar detection is shown in Figure 5. In Figure 5a, a red vertical line is drawn at the x-range. A pillar appears on the right side of the line; although visible, it is not detected at this stage. As the robot moves, the pillar gets close to the red vertical line, as shown in Figure 5b. Finally, when the pillar’s top-left x-coordinate falls within the x-range and the other conditions on height, width, and area are satisfied, the pillar is detected, as shown in Figure 5c.
Section 3 describes masking to produce a horizontal band of HSV pixels. The results of this masking are shown in Figure 3 and Figure 4c. This was performed to avoid detection of pillars and soil in the background. Moreover, masking was not applied in the part of the image where upcoming pillars first appear, because the algorithm has an alert feature which tracks soon-to-appear pillars for safety.
4. Improving the Robustness of Pillar Detection Algorithm
In real-world scenarios, it is possible that an object (e.g., a box) whose color resembles the color of the pillars is placed in the vineyard. This may lead to false data logging. To avoid this, it is important to improve the robustness of the algorithm. To do this, once a pillar has been detected using the algorithm described in Section 3, the pillar is searched for again within a larger search space.
The algorithm is shown in Figure 6. The algorithm begins by reading an image from the camera, and initial pillar detection is done according to the flowchart given in Figure 2. If a pillar is detected, the flowchart in Figure 2 outputs the blob dimensions, namely the top-left x-coordinate and y-coordinate, width, height, and area of the detected pillar. As shown in Figure 6, the search range is then expanded based on the parameters retrieved from the initial detection. This expansion is controlled by two parameters, one for the x-axis and one for the y-axis. The x-axis parameter is a vector of two components representing expansion in the left and right directions along the x-axis, respectively. Similarly, the y-axis parameter is a vector of two components representing expansion in the up and down directions along the y-axis, respectively.
Given the top-left coordinates, width, height, and area of the pillar detected using the flowchart in Figure 2, the pillar is searched for again in an expanded range whose position and size are given by Equation (6).
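Equation (6) itself is not reproduced in this excerpt. Under the description above, one plausible reconstruction, using hypothetical symbols (x_e, y_e, w_e, h_e) for the expanded range and (e_{x,l}, e_{x,r}, e_{y,u}, e_{y,d}) for the left, right, up, and down expansions, is:

```latex
% Hypothetical reconstruction of the expanded search range
% (not necessarily the paper's exact Equation (6)).
\begin{aligned}
x_e &= x - e_{x,l}, &\qquad y_e &= y - e_{y,u},\\
w_e &= w + e_{x,l} + e_{x,r}, &\qquad h_e &= h + e_{y,u} + e_{y,d},
\end{aligned}
```

where (x, y, w, h) are the top-left coordinates, width, and height obtained from the initial detection.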
The HSV colorspace image is cropped to the dimensions given in Equation (6), generating a cropped image. Color is then searched for in this cropped image between an upper and a lower HSV limit. The values of these limits are the same as those used in Section 3: the upper limit is (34, 110, 255) and the lower limit is (17, 26, 50) for the H, S, and V values. This results in a binary image. Noise is removed using erosion and dilation, and contours are retrieved using the algorithm given in [42]. The dimensions of the detected contours are checked against new thresholds of height and area. As shown in Figure 6, if the conditions are satisfied, the pillar is said to be detected, and the pillar number counter is incremented.
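A minimal sketch of this second-stage check is given below, reusing the hypothetical expansion notation introduced above and placeholder names h_min_exp and a_min_exp for the new height and area thresholds; it assumes the same imports (cv2, numpy as np) and OpenCV conventions as the earlier sketch and is not the authors' implementation.

```python
def confirm_pillar(hsv_img, x, y, w, h,
                   e_left, e_right, e_up, e_down,
                   h_min_exp, a_min_exp):
    """Re-detect the pillar within an expanded range for robustness.

    (x, y, w, h) come from the initial detection; the expansion amounts and
    the new thresholds are hypothetical placeholders.
    """
    img_h, img_w = hsv_img.shape[:2]
    # Expanded search window, clamped to the image boundaries.
    x0 = max(0, x - e_left)
    y0 = max(0, y - e_up)
    x1 = min(img_w, x + w + e_right)
    y1 = min(img_h, y + h + e_down)
    cropped = hsv_img[y0:y1, x0:x1]

    # Same color limits as in Section 3.
    binary = cv2.inRange(cropped, np.array([17, 26, 50]), np.array([34, 110, 255]))
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.dilate(cv2.erode(binary, kernel), kernel)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # The pillar is confirmed if any contour satisfies the new height and
    # area thresholds for the expanded range.
    return any(cv2.boundingRect(c)[3] >= h_min_exp and
               cv2.contourArea(c) >= a_min_exp
               for c in contours)
```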
Figure 7 shows the initial-range detection and the expanded-range detection used in this work. The initial detection range is shown in Figure 7a. It can be seen that the pillar is detected within a horizontal band, bounded by two y-values, extending over the entire x-axis. Once the pillar is detected, the range is expanded, as shown in Figure 7b. In this work, the parameters of Equation (6) are set to fixed values, which expand the search range for pillar detection along both the y-axis and the x-axis, as shown in Figure 7b. Note that the downward expansion is set to 0 to avoid noise due to soil. The initial detection in the narrow range is performed for faster detection; once a pillar has been detected, the search range is expanded and the pillar is detected again for robustness.