Automatic Tomato and Peduncle Location System Based on Computer Vision for Use in Robotized Harvesting

Protected agriculture is a field in which the use of automatic systems is a key factor. Nevertheless, the automatic harvesting of delicate fruit has not yet been perfected. This issue has received a great deal of attention over the last forty years, although no commercial harvesting robots are available at present, mainly due to the complexity and variability of the working environments. In this work we developed a computer vision system (CVS) to automate the detection and localization of fruit in a tomato crop in a typical Mediterranean greenhouse. The tasks to be performed by the system are: (1) the detection of the ripe tomatoes, (2) the location of the ripe tomatoes in the XY coordinates of the image, and (3) the location of the ripe tomatoes' peduncles in the XY coordinates of the image. Tasks 1 and 2 were performed using a large set of digital image processing tools (enhancement, edge detection, segmentation, and feature description of the tomatoes). Task 3 was carried out using basic trigonometry and numerical and geometrical descriptors. The results are very promising for beef and cluster tomatoes, with the system able to classify 80.8% and 87.5%, respectively, of fruit with visible peduncles as “collectible”. The average processing time per image for visible ripe and harvested tomatoes was less than 30 ms.


Introduction
Few crops in the world are in such high demand as the tomato. It is the most widespread vegetable in the world and the one with the highest economic value. During the 2003-2017 period, world tomato production increased annually from 124 million tons to more than 177 million tons. In the last 15 years, consumption has experienced sustained growth of around 2.5% [1]. These data make the tomato one of the most important vegetables in terms of job creation and wealth, and its outlook is equally positive. According to data from the FAO [1], even though tomatoes are grown in 169 countries (for both fresh consumption and industrial use), the 10 main producers in 2017 (of which Spain was in eighth place) accounted for 80.45% of the world total. These countries are China, India, the United States, Turkey, Egypt, Italy, Iran, Spain, Brazil and Mexico. The European Union is the world's second largest tomato producer after China. In Almería (south-east Spain), where the largest concentration of greenhouses in the world is located (more than 30,000 hectares), the main crop is tomato, representing 37.7% of total production [2]. According to data on the overall labor distribution in tomato cultivation, between 25% and 40% of all labor is employed in the highly repetitive task of harvesting [3]. Traditionally, harvesting is done manually with low-cost mechanical aids (harvesting trolleys, cutting tools, etc.), so most of the expense corresponds to human labor.
Automation is essential in any production system that aims to be competitive: it reduces production costs and improves product quality [2,4–6]. Protected agriculture is a sector where such techniques are needed, particularly for the automatic harvesting of fruit (from trees) and vegetables. Harvesting is a typical candidate for robotization because it is a repetitive pick-and-place task.

Literature Review
Over the last 40 years, considerable research effort has been devoted to developing harvesting robots for fruits and tomatoes [5–13]. Mavridou et al. [14] presented a review of machine vision techniques in agriculture-related tasks, focusing on crop farming. In [15], Schillaci et al. attempted to solve the problem of recognizing mature greenhouse tomatoes using an SVM (support vector machine) classifier; however, the results of this work were not quantified. Ji et al. [16] achieved a success rate of 88.6% by segmenting the tomatoes with the color-difference feature 2R−G−B and a threshold, although an artificial tomato clip was used to detect the peduncle. Feng et al. [17] used a CCD camera and an HSI color model for image segmentation, and the 3D distance to the center of each segmented tomato was obtained using a laser. The success rate for harvesting the tomatoes and the execution time of a single harvest cycle (tomato location, arm movement and picking) were 83.9% and 24 s, respectively. In [18], the images captured by a color camera were processed by extracting Haar-like features from sub-windows of each original image. An Adaboost classifier followed by a color classifier then recognized 96% of the ripe tomatoes, although 10.8% were false negatives and 3.5% of the tomatoes were not detected. The same authors [19] used an adaptive threshold algorithm to obtain the optimal threshold. Two images (a* and I) were then obtained from the L*a*b space and fused by means of wavelets; ninety per cent of the target tomatoes were recognized in a test set of 200 samples. Li et al. [20] used a region segmentation method followed by erosion and dilation to enhance the contour, and fuzzy control to determine the locus of the tomatoes. According to the authors, the recognition time was significantly reduced compared with other methods, but no details of the error rates were given.
In [21], a human operator marked the location of the tomatoes on a screen, after which the position was obtained by a stereo camera; with this human-robot cooperation, a detection success rate of about 94% was achieved. Taqi et al. [22] reproduced a greenhouse in a highly controlled environment in which ripe tomatoes were easy to detect by their red color. In [23], Wang et al. used a binocular stereo vision system with the Otsu method to segment the ripe tomatoes. The ripe-tomato recognition rate was 99.3%, while recognizing and picking each tomato took about 15 s, with a success rate of 86%. Zhang et al. [24] used a convolutional neural network (CNN) as a classifier, with a classification success rate of 92%. Kamilaris et al. [25] surveyed deep learning in agriculture. In [26], the R-G plane was used to segment the tomato branch; 83% of the mature test branches were harvested, but 1.4 attempts and 8 s were needed per branch. Malik et al. [27] used an HSV transform to detect only red tomatoes and a watershed algorithm to separate connected tomatoes; about 81.6% of the red tomatoes were detected. In [28], a dual-arm robot with binocular vision and an Adaboost and color analysis classifier achieved a classification rate of 96%. Lin et al. [29] developed a novel approach for recognizing different types of fruit (lemons, tomatoes, mangoes and pumpkins), using the Hough transform to detect curved sub-fragments in images of real tomato environments. To remove false positive centers, an SVM was applied to the mixed contours. Depending on the type of fruit, the precision of this method varied between 0.75 and 0.92. Yuan et al. [30] proposed a CNN-based method for cherry tomato detection that reduces the influence of illumination, growth differences and occlusion. Yoshida et al. [31] obtained 3D images of bunch tomato crops and detected the position of the bunch peduncle.
Only six sample images were used in this work and the computation time for each image was not specified. The authors achieved a precision rate of 98.85%.

Objectives
In this work, the detection and automatic location of the ripe fruit and their peduncles in the (x,y) plane were performed with a single camera. The peduncle is located because, later on, the mechanical system in charge of harvesting must be told the exact place where the fruit should be separated from the plant. We consider tomato peduncle detection to be the main novelty of the work.
To date, we have not seen this issue addressed in the reviewed literature. To achieve this, an exhaustive study was conducted into the different digital image processing techniques [32], applying those that provided the best results, then analyzing the problems arising and providing possible solutions. In our study, other techniques from the fields of pattern recognition or computer vision, such as deep learning, were not used because our goal was not to recognize or classify different tomato types.
As discussed in a previous work [33], when designing a harvesting robot, its morphology must be considered in order to work with irregular volumes. Two key factors should also be taken into account: (i) since plants and trees can be spread over large areas of land, the robots need to be mobile (they are usually of the hybrid harvester type, i.e., manipulator arms mounted on platforms or mobile robots); (ii) for the fruit-picking operation, the robot must pick the fruit and separate it from the plant or tree, so the end-effector design is fundamental. Once the harvesting robot has been designed, it must carry out the following phases to pick the fruit or vegetables: (1) system guidance, (2) environment positioning, (3) fruit detection, (4) fruit location, (5) approach of the robot end-effector to the fruit, (6) grasping the fruit, (7) separating the fruit, and (8) stacking or storing the harvested fruit. This paper focuses on the automatic detection and location of ripe tomato fruit. As Figure 1 shows, the fruit-location subsystem must provide the position and orientation of the end-effector ($Tool) so that it coincides with the position and orientation of the different elements of each fruit to be harvested ($Peduncle and $Fruit) in the manipulator workspace. The peduncle and fruit are considered separately because different end-effectors exist: some separate the fruit from the plant by cutting the peduncle [34], while others embrace the fruit or hold it by suction [35,36].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 21
An optimal solution to the position and orientation problem involves six degrees of freedom [37]: three for position in space (x, y, z) and three for orientation (pitch, roll and yaw), although certain hypotheses can be made to simplify it. The first is to ignore orientation, because the end-effector can be designed so that only the position of the fruit elements is needed; thus, one needs to know the coordinates (X, Y, Z) of $Peduncle and $Fruit. The idea is to combine a computer vision subsystem that provides the (x, y) coordinates with a laser mounted on a servo-based pan-tilt subsystem, which points at the position calculated by the vision system to determine the z-coordinate of the tomato elements.
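The camera-plus-laser idea above can be sketched with a pinhole camera model: the pixel found by the vision system defines a ray, the pan-tilt unit aims the laser along that ray, and the measured range fixes the 3D point. This is a minimal sketch under assumptions the paper does not state: the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, the pan-tilt pivot is taken to coincide with the camera's optical centre, and zero pan/tilt is taken to point along the optical axis. `Intrinsics`, `pixel_to_pan_tilt` and `ray_point` are illustrative names, not part of the authors' system.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """HYPOTHETICAL pinhole-camera parameters in pixels (not reported in the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def normalised(u, v, k):
    """Normalised image coordinates of pixel (u, v); the ray direction is (x, y, 1)."""
    return (u - k.cx) / k.fx, (v - k.cy) / k.fy


def pixel_to_pan_tilt(u, v, k):
    """Pan and tilt (radians) that aim the laser along the ray through pixel (u, v).

    Assumes the pan-tilt pivot sits at the camera's optical centre and that
    zero pan and tilt point along the optical axis.
    """
    x, y = normalised(u, v, k)
    pan = math.atan2(x, 1.0)                  # rotation about the vertical axis
    tilt = math.atan2(y, math.hypot(x, 1.0))  # elevation applied after panning
    return pan, tilt


def ray_point(u, v, distance, k):
    """Camera-frame point (X, Y, Z) at the laser-measured `distance` along the ray."""
    x, y = normalised(u, v, k)
    norm = math.sqrt(x * x + y * y + 1.0)
    return distance * x / norm, distance * y / norm, distance / norm
```

Note that the laser returns the range along the ray, not Z itself; only for a pixel on the optical axis do the two coincide, which is why `ray_point` divides by the length of the ray direction.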
This work presents the first step towards full automation, namely the automatic 2D detection and location of ripe tomato fruits and their peduncles, as shown in Figure 2. To this end, an exhaustive study was carried out of the different computer vision and digital image processing [29] techniques, applying those that provided the best results.

As mentioned above, some tomato harvesting systems work by first pressing and then pulling on the tomatoes. In this work, a first step towards a tomato harvesting system is presented in detail, in which multiple digital image processing tools are used to obtain the position not only of the tomato but also of its peduncle. This was applied to two types of crop, beef and cluster tomatoes, with the fruit to be collected by cutting the peduncle rather than by pressing on the tomato, thus avoiding possible damage. These objectives were divided into a series of sub-objectives:

•
Detection of the ripe tomatoes. From the image provided, the system must detect tomatoes that are ripe and segment them from the rest of the image.

•
Location of the ripe tomatoes in XY. After recognizing the ripe tomatoes, the system should position them in the XY plane of the image.

•
Location of the peduncle in XY. The system should provide the location of the peduncle of the ripe tomatoes in the XY plane of the image.
The main contributions of this work are: (1) the identification and location of the ripe tomatoes and their peduncles; (2) the processing of every image in less than 30 ms; and (3) a system that can be used with any end-effector based on cutting or suctioning the tomatoes. The last contribution is particularly important because the same vision system can serve any tomato harvesting robot, without a new vision system having to be developed for each end-effector prototype.
This paper is organized as follows: in Section 2, the different materials and techniques used for the automatic detection and location of ripe tomato fruit and peduncles are described. In Section 3, the results of these processes are shown and discussed with regard to two tomato varieties: beef and cluster. Lastly, in Section 4, the main conclusions and future works are summarized.

Greenhouse Environment
The data used to develop the first version of the algorithm were acquired in the greenhouses of the Cajamar Foundation's Experimental Station in El Ejido, Almería Province, Spain (2°43′00″ W, 36°48′00″ N, and 151 m above sea level). The tomato crops were grown in a multi-span "Parral-type" greenhouse with a surface area of 877 m² (37.8 × 23.2 m). The greenhouse is oriented east to west, whilst the crop rows are aligned north to south, with double plant rows separated by 1.5 m. The tomato crop was transplanted in August and the season finished in June (long season). This variety has indeterminate growth: the fruit ripens according to its height and position on the branch, so cultivation tasks are continuous throughout the season.
In this situation, tomatoes are harvested at least once a week from November to June. The growing conditions and crop management are very similar to those in commercial tomato greenhouses. The climate parameters inside the greenhouse are monitored continuously, every 30 s. Outside the greenhouse, a weather station measures air temperature and relative humidity (with a ventilated sensor), solar radiation and photosynthetically active radiation (with a silicon sensor) and precipitation (with a rain detector). It also records the CO₂ concentration and wind speed and direction.
During the experiments, the indoor climate variables were also recorded, in particular air temperature, relative humidity, global solar radiation, photosynthetically active radiation, soil and cover temperature, water and electricity consumption, readings from an irrigation demand tray, water content, electrical conductivity and soil temperature.

Image Acquisition and Processing

The computer used was a MacBook Pro (Intel i9, 2.33 GHz, 16 GB DDR4) running Windows 10 under Boot Camp. The system was built with the NI Vision Development Module of NI LabVIEW 2015.

Tomato Detection Algorithm
As can be observed in Figures 3 and 4, the mature tomatoes are usually located in the lower part of the plant, where there are practically no leaves.
The system performs a series of operations to detect the ripe tomatoes that are in the foreground (not occluded) and to segment them from the rest of the image. At the end of this stage, each ripe tomato is represented by a single region. The flowchart of the operations performed to detect the ripe tomatoes is shown in Figure 5.

During this stage, several operations were carried out simultaneously on different copies of the original image. The example image was chosen because its characteristics clearly show the result of each sequence:

•
Tomato-Edge Detection. Figure 6a illustrates a typical situation in tomato greenhouses. The green container in the image measures the amount of drip irrigation water supplied to the plants and is present in many greenhouse corridors. As shown in Figures 3 and 4, the tomatoes begin to ripen at the bottom of the plant, where there are few leaves. Smaller leaves and tomatoes also appear on the right and left sides; these are removed by segmentation and other processes. In addition, in greenhouse horticulture the lower leaves are usually removed (the standard cultivation technique), so the article only reproduces conditions that are normal for greenhouses in this area.
First, we choose the R component of the RGB image (Figure 6b). To enhance the contrast, we apply the power-law transform $s = c\,r^{\gamma}$, where $r$ are the initial gray levels of the R image, $s$ are the final gray levels after contrast enhancement, $c$ is a constant (usually 1), and $\gamma < 1$ lightens the image while $\gamma > 1$ darkens it (Figure 6c and Figure 7a). The parameters $\gamma$ and $c$ vary with the type of tomato in the image, because color and reflectance differ between tomato types.
After increasing the contrast, tomato-edge detection could be carried out with one of several operators (Sobel, Roberts, Prewitt, etc.); after an exhaustive study comparing these operators, we chose Sobel because it provided the most precise positioning of the tomatoes and peduncles (Figure 7b).
Noise and the outlines of shadows on the fruit surface make it difficult to capture the exact contour of the fruit. To keep only the relevant information, a series of operations was applied to the image in Figure 7b. The first was a segmentation based on gray level (intensity), which eliminates a large part of the image noise and the effects of shadows (Figure 7c); this was followed by a segmentation based on size, in which regions of connected pixels smaller than a certain number of pixels are eliminated (Figure 7d).
Next, the morphological operation of dilation was applied (Figure 7e). The aim of the dilation is to join the dashed lines into a contour without discontinuities, or at least with far fewer discontinuities than at the start.
Finally, we again performed a segmentation based on size (Figure 7f) to eliminate elements that still appear on the fruit surface but are not part of its contour.

•
Image Binary Inversion. After edge detection, the resulting image is not yet ready for the edge subtraction (Figure 5), so the binary image of the fruit contours is inverted (Figure 8a). A segmentation based on size was then carried out (Figure 8b) to eliminate the small regions remaining inside the contours, making the contours better defined.

•
Segmentation based on color 1 (Figure 9), to obtain a separate region for each mature tomato that appears in the image.

•
Edge subtraction (Figure 10): next, we applied the logical AND function to Figure 8b and Figure 9b. The result is a new binary image in which the region or regions representing the total ripe surface appear divided into regions that already represent individual ripe tomatoes (Figure 10b).
The subsequent processing stage (Figure 11a) was a new color-based segmentation (color segmentation 2), aimed at obtaining a binary image in which each ripe tomato is represented by a single region, separated from the rest (Figure 11b). This task is quite complicated, since ripe tomatoes often appear superimposed on one another in the image, or so close together that their regions merge. The difficulty is that ripe tomatoes are all practically the same color, which makes it hard to obtain a separate region for each of them. However, tomatoes appear much brighter in their central area and darker towards the edges. This makes it possible to carry out a color-based segmentation in which only the central part of each ripe tomato is detected, so that the tomatoes are represented by separate regions even if they overlap (Figure 11b).

•
Image combination (Figure 12): the binary images resulting from edge subtraction (Figure 10b) and color-based Segmentation 2 (Figure 11b) were combined into a single image using the OR (logical addition) operation. Sometimes, after the edges are subtracted, a region belonging to a single tomato is divided into two or more smaller regions; this step links them back into one region representing the tomato. An added benefit is that the area of the regions corresponding to ripe tomatoes increases while the separation between them is maintained.
•
Segmentation based on size (Figure 13): the binary image obtained after combining the images contains not only the regions corresponding to the ripe tomatoes in the foreground (the ones of interest), but also many others, such as those belonging to tomatoes on more distant plants and to other objects in the environment whose color falls within the segmentation thresholds. The objective of the segmentation based on size is to eliminate all of these regions, keeping only those that represent ripe tomatoes in the foreground. It also removes regions belonging to ripe tomatoes cut off by the edge of the image (Figure 13b). Two size-based segmentations were needed: the first (Figure 13c) removes small regions, and the second removes regions smaller than half the size of the largest region (Figure 13d). In this way, the case in which no ripe tomato appears in the image no longer poses a problem.

• Representation of the regions (Figure 14): this step shows the user which of the regions remaining after the size-based segmentation represent possibly "collectible" tomatoes. To achieve this, we computed the convex area of each region in Figure 13d. Not all of these tomatoes will actually be collectible, since this depends on whether their peduncles are visible from the perspective from which the image was taken.
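The size-based filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary image is a list of lists of 0/1 values, regions are 8-connected, the absolute threshold `min_area` is a hypothetical parameter whose value the text does not give, and a region counts as "cut off by the edge" if any of its pixels lies on the outermost row or column.

```python
from collections import deque


def label_regions(mask):
    """Group 8-connected foreground pixels of a 0/1 image into regions of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(region)
    return regions


def size_segmentation(mask, min_area):
    """Keep only regions that plausibly correspond to ripe foreground tomatoes."""
    rows, cols = len(mask), len(mask[0])
    regions = label_regions(mask)
    # Drop regions cut off by the image edge (any pixel on the outermost row/column).
    regions = [reg for reg in regions
               if all(0 < y < rows - 1 and 0 < x < cols - 1 for y, x in reg)]
    # Step 1: an absolute threshold removes small regions (min_area is hypothetical).
    regions = [reg for reg in regions if len(reg) >= min_area]
    if not regions:
        return []  # no ripe tomato in the foreground: nothing survives
    # Step 2: remove regions smaller than half the largest remaining region.
    largest = max(len(reg) for reg in regions)
    return [reg for reg in regions if 2 * len(reg) >= largest]
```

The relative threshold in step 2 adapts to the apparent size of the tomatoes in each image, so it does not depend on the camera distance in the way a single fixed area threshold would.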

Location of the Tomatoes and Their Peduncles
During this stage (Figure 15a), the system provides the location of each ripe tomato in the XY plane of the image by computing the center of gravity (c.g.) of the convex area of each tomato; in the text, we call this the "center". It also calculates the position of the tomato's peduncle in the XY plane of the image, because the robot will later need to be told where the ripe tomato must be separated from the rest of the plant. This stage starts from the image produced by the previous stage (Figure 13b), in which only the regions representing ripe tomatoes (one region per tomato) appear. Before calculating the positions of the tomatoes and their peduncles, a series of descriptors must be computed for these regions.
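A minimal sketch of how the "center" of a tomato could be computed from the convex area of its region. It assumes the convex area is approximated by the convex hull of the pixel centers (Andrew's monotone chain algorithm) and that the center is the area-weighted centroid of that polygon (shoelace formula); the helper names are hypothetical and the authors' exact procedure may differ.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices of (x, y) points, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        w = x0 * y1 - x1 * y0  # twice the signed area of one triangle fan slice
        area2 += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    if area2 == 0:
        # Degenerate hull (a point or a line): fall back to the vertex mean.
        return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def tomato_center(region):
    """Center (x, y) of a region given as (row, col) pixels, via its convex hull."""
    hull = convex_hull([(c, r) for r, c in region])
    return polygon_centroid(hull)
```

Using the convex hull rather than the raw pixels makes the center less sensitive to notches left in the region by leaves or specular highlights.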

The regions obtained for each ripe tomato after the detection stage may have gaps or "holes" inside. The first operation is therefore to fill in these gaps (Figure 15b), so that the measurements carried out below are more precise. After that, we computed the external gradient of the resulting image (Figure 15c).
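The two morphological operations can be sketched in plain Python as follows, assuming 0/1 binary images stored as lists of lists. Hole filling marks every background pixel 4-connected to the image border as "outside" and fills everything else; the external gradient is taken as the dilation minus the original image. The 3x3 square structuring element is an assumption, since the text does not state which element was used.

```python
from collections import deque


def fill_holes(mask):
    """Fill background pixels that are not 4-connected to the image border."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not mask[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    while queue:  # flood the true background from the border inwards
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc] and not outside[nr][nc]:
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]


def dilate(mask):
    """Binary dilation with a 3x3 square structuring element (assumed)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols:
                            out[nr][nc] = 1
    return out


def external_gradient(mask):
    """Morphological external gradient: dilation minus the original image."""
    d = dilate(mask)
    return [[d[r][c] & (1 - mask[r][c]) for c in range(len(mask[0]))]
            for r in range(len(mask))]
```

Because holes are filled first, the external gradient contains only the outer contour of each tomato, with no spurious inner rings around the former gaps.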
For each of these regions, two sets of descriptors were obtained. The first set consisted of:
• Center X: x coordinate (in pixels) of the region's c.g.;
• Center Y: y coordinate (in pixels) of the region's c.g.;
• Height and width (in pixels) of the circumscribed rectangle;
• Minor axis (in pixels) of the equivalent Feret ellipse;
• Orientation: ellipse orientation in degrees.
To obtain the second set of descriptors, we computed the external gradient of the regions without gaps. This operator returns a binary image with the external contour of the input image regions (Figure 15c). From this new image, we built the second set of descriptors, consisting of:
• Center XGdExt: x coordinate (in pixels) of the center of gravity of the region's external contour (named this way to distinguish it from Center X in the first set);
• Center YGdExt: y coordinate (in pixels) of the center of gravity of the region's external contour (named this way to distinguish it from Center Y in the first set).
After a large number of tests with different combinations of these and other features, these descriptors proved to give the most accurate results when locating the peduncles.
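As an illustration, the first set of descriptors can be computed from image moments. The sketch below assumes the common definition of the equivalent ellipse as the ellipse with the same normalized second central moments as the region (axis lengths 4√λ for the eigenvalues λ of the pixel covariance matrix); the text speaks of an "equivalent Feret ellipse", so the authors' exact definition may differ. Orientation is measured from the +x axis in image coordinates, where y grows downwards. The function name is hypothetical.

```python
import math


def region_descriptors(pixels):
    """First-set descriptors for a region given as a list of (row, col) pixels."""
    n = len(pixels)
    xs = [c for _, c in pixels]
    ys = [r for r, _ in pixels]
    cx, cy = sum(xs) / n, sum(ys) / n
    # Normalized second central moments (entries of the covariance matrix).
    mu20 = sum((x - cx) ** 2 for x in xs) / n
    mu02 = sum((y - cy) ** 2 for y in ys) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / n
    # Eigenvalues of the covariance matrix give the equivalent-ellipse axes.
    half_sum = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    major = 4 * math.sqrt(half_sum + root)
    minor = 4 * math.sqrt(max(half_sum - root, 0.0))
    # Angle of the major axis w.r.t. +x, in image coordinates (y grows downwards).
    orientation = math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
    return {
        "center_x": cx, "center_y": cy,
        "width": max(xs) - min(xs) + 1, "height": max(ys) - min(ys) + 1,
        "major_axis": major, "minor_axis": minor,
        "orientation_deg": orientation,
    }
```

Applied to the contour pixels of the external-gradient image instead of the filled region, the `center_x` and `center_y` entries of the same function would give Center XGdExt and Center YGdExt.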

Plant Detection
The main objective of this process is to obtain the approximate position of the stem from which the tomatoes in the image "hang".
In addition, we obtained a binary image showing the "green" parts of the plant in the foreground (stem, branches, peduncles, calyces, etc.). Figure 16 shows the steps to locate the centroid of the plant stem.
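The text does not detail how the "green" parts of the plant are segmented. One common option, used here purely as an assumed stand-in, is the excess-green index ExG = 2G - R - B with a hypothetical threshold; the centroid of the resulting mask then gives only a rough proxy for the position of the plant in the image, not the authors' stem-location procedure.

```python
def excess_green_mask(rgb, threshold=20):
    """0/1 mask of 'green' pixels using ExG = 2G - R - B (threshold is hypothetical)."""
    return [[1 if 2 * g - r - b > threshold else 0 for (r, g, b) in row] for row in rgb]


def mask_centroid(mask):
    """Centroid (x, y) of the foreground pixels of a 0/1 mask, or None if it is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
```

The same green mask can also serve as the "plant" image against which candidate peduncle positions are checked.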


Peduncle Detection
The approximate position of the peduncle is obtained by applying a series of geometric rules based on the morphology of the plant (Figure 17), which yield four possible peduncle positions for each ripe tomato that is a candidate for harvesting. The final position of the peduncle is the candidate that meets certain requirements. If none of the four candidates fulfils these requirements, the peduncle is assumed not to be visible, which is frequently the case. Usually, the peduncle lies on the upper straight line perpendicular to the tomato's main axis. By computing the centroid, the equivalent ellipse, and the major and minor axes of the ellipse or the circumscribed rectangle, and applying elementary trigonometry, it is possible to compute Δx and Δy, and thus the peduncle position. Finally, it is necessary to check that the peduncle position does not lie on the tomato and that it lies over the plant (Figure 18).
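The peduncle rules can be sketched as follows. This is a hedged illustration, not the authors' exact geometry: it assumes the four candidates are the points just beyond the two ends of the major and minor axes of the equivalent ellipse (the `margin` offset is hypothetical), that Δx and Δy come from the orientation by elementary trigonometry, and that candidates are tried from the highest in the image downwards. A candidate is accepted only if it does not lie on the tomato and does lie on the plant mask; if none qualifies, the peduncle is treated as not visible.

```python
import math


def peduncle_candidates(cx, cy, orientation_deg, major_axis, minor_axis, margin=5.0):
    """Four candidate points just beyond the ends of the equivalent ellipse's axes."""
    t = math.radians(orientation_deg)
    ux, uy = math.cos(t), math.sin(t)    # unit vector along the major axis
    vx, vy = -math.sin(t), math.cos(t)   # unit vector along the minor axis
    a = major_axis / 2 + margin          # distance to just beyond a major-axis end
    b = minor_axis / 2 + margin          # distance to just beyond a minor-axis end
    # Each offset (b * vx, b * vy) or (a * ux, a * uy) is one (Δx, Δy) pair.
    return [
        (cx + b * vx, cy + b * vy), (cx - b * vx, cy - b * vy),  # minor-axis ends
        (cx + a * ux, cy + a * uy), (cx - a * ux, cy - a * uy),  # major-axis ends
    ]


def select_peduncle(candidates, tomato_mask, plant_mask):
    """Return the first valid candidate as (x, y) pixels, or None if none is valid."""
    rows, cols = len(tomato_mask), len(tomato_mask[0])
    for x, y in sorted(candidates, key=lambda p: p[1]):  # highest point (smallest y) first
        r, c = int(round(y)), int(round(x))
        inside = 0 <= r < rows and 0 <= c < cols
        if inside and not tomato_mask[r][c] and plant_mask[r][c]:
            return (c, r)
    return None  # no valid candidate: the peduncle is assumed not visible
```

Trying the highest candidate first reflects the observation that the peduncle usually sits above the tomato, on the line perpendicular to its main axis.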

Results
The system under study was tested using 175 images captured in a real environment for two different crop types: beef and cluster tomatoes. For each type, the success and failure rates (Tables 1 and 2) were calculated in relation to three different "objects":
1. The location of the tomatoes;
2. The location of the peduncles;
3. The tomato-peduncle set.
There are three different types of failures, which we will call:
• Failure 1: an object that should have been detected/located is NOT detected or located;
• Failure 2: an object is detected or located that should NOT have been detected/located;
• Failure 3: an object that should be detected/located by the system is detected/located, but not correctly.
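To make the three failure types concrete, the sketch below counts them for a single image. The greedy nearest-neighbor matching and the two distance thresholds (`match_radius`, `tolerance`) are assumptions for illustration only; the text does not describe how detections were compared with the ground truth.

```python
import math


def classify_failures(ground_truth, detections, match_radius, tolerance):
    """Count successes and Failures 1-3 for one image.

    ground_truth, detections: lists of (x, y) positions in pixels.
    A detection within match_radius of a still-unmatched ground-truth object is
    matched to it; a match farther away than tolerance is a Failure 3.
    """
    counts = {"success": 0, "failure1": 0, "failure2": 0, "failure3": 0}
    unmatched = list(ground_truth)
    for dx, dy in detections:
        best = min(unmatched, key=lambda g: math.hypot(g[0] - dx, g[1] - dy), default=None)
        if best is None or math.hypot(best[0] - dx, best[1] - dy) > match_radius:
            counts["failure2"] += 1  # detected, but should not have been
            continue
        unmatched.remove(best)
        dist = math.hypot(best[0] - dx, best[1] - dy)
        counts["success" if dist <= tolerance else "failure3"] += 1  # located, but not correctly
    counts["failure1"] += len(unmatched)  # should have been detected, but was not
    return counts
```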
The system must only detect fruit located in the foreground that is not occluded (or whose occlusion is not significant) because only tomatoes meeting these characteristics, in addition to those related to the degree of maturity, can be collected first. To detect (and collect) occluded ripe tomatoes, it is first necessary to harvest the tomatoes in front of them. Moreover, each time a tomato is harvested, the position of the other ripe fruit is usually affected. For these reasons, if the system were implemented on a real robotic picker, a new image of the plant would need to be taken after each tomato is collected, and the position of the next tomato to be harvested recalculated, since picking its neighbor could have altered its previous position.

Beef Tomatoes
For this type of crop, 100 images were taken, covering configurations of all kinds: natural lighting only, camera flash, very different distances, and even images in which the camera was not positioned perpendicular to the ground. Of these images, only 79 met the conditions established for correct system operation. We analyze the results provided by the system for these beef-tomato images (Table 1).
In a research experiment, errors are never desirable, but not all types of error are equally important. For example, miscalculating the location of a tomato or its peduncle (Failure 3) is much more serious than failing to detect a fruit or peduncle that should have been detected (Failure 1). This is because, if the system were implemented in a real harvesting robot, an error in the calculated position of the fruit or peduncle could cause irreparable damage to the plant or surrounding fruit during the picking attempt, whereas failing to detect a fruit or peduncle has no harmful effect on the environment. As can be seen in Table 1, error types 2 and 3 are 0% in all cases. There is one Failure 2, but it corresponds to a partially covered tomato whose peduncle is visible, which is an excellent outcome. The processing time per image was 27 ms.
The success and failure rate for the "tomato-peduncle" set is what predicts the final success of the system, since it indicates how many of the ripe tomatoes with visible peduncles can finally be harvested. According to the results, 80.8% of these tomatoes were classified as "collectible" by the system. The system failed to detect the remaining 19.2%, which, in theory, could also have been collected. A very positive outcome is that there were no location errors, nor any errors classifying "not harvested" tomatoes as "harvested". Figures 19 and 20 show two examples of the results for a set of beef tomatoes.

Cluster Tomatoes
In this case, about 75 images were taken, of which 42 met the conditions established for correct system operation. The system classified as "collectible" 87.5% of the tomatoes with visible peduncles. This percentage is at least as good as that obtained for beef tomatoes, which is noteworthy given that the system was designed solely on the basis of the results obtained for the beef-type crop. Figures 21 and 22 show two examples of the results for a set of cluster tomatoes.
In Figure 21, eight ripe tomatoes appear; all of them are mature and therefore ready for harvest. One of them (c1), which is in the shade, was not detected by the system. The algorithm detects and correctly locates the other seven tomatoes, whose peduncles are visible and are also located by the system. These results could be improved significantly if the images could be made invariant to a set of transformations, such as changes in light intensity and affine transformations composed of translations, rotations and changes in size. Table 2 shows the results for cluster tomatoes.
The average processing time per image for the visible ripe tomatoes and harvested tomatoes was 29 ms. The worst results were obtained with this type of tomato (17%); this might be due to having worked with roughly half as many images meeting the established requirements (42, compared with 79 for beef tomatoes).

Discussion
In this work, the following objectives were achieved:
• Detection of ripe tomatoes: the system detected the ripe tomatoes located in the foreground of the image whose surfaces were not occluded by the surrounding plant or fruit, or at least not so much that they could not be collected. Specifically, it detected the "candidate" tomatoes for harvesting, representing each of them by a single region (convex area) separated from the rest.
• Location of the ripe tomatoes in XY: once detected, the system located the ripe tomatoes in the XY plane of the image by calculating the position of their centers.
• Location of the tomato peduncle in XY: for each ripe tomato detected, the system indicated whether or not its peduncle was visible from the position where the image was captured. If the peduncle was visible, the system located it, providing its position in the XY plane of the image, and reported that the tomato could be collected. If the peduncle was not visible, the system reported this and indicated that the tomato could not be collected.

Conclusions
It is rarely a simple task, in a given field of study, to find the sequence of processes needed to enhance and segment images, and in our case it was particularly complex. We consider the main novelties and contributions of this work to be:
1. The identification and location of the ripe tomatoes and their peduncles;
2. The computing time achieved for processing (identifying and locating) an image, which was of the order of milliseconds, whereas in other works [18,24,27] it was of the order of seconds;
3. The use of flash to acquire the images, which minimized the effects of illumination variations;
4. The fact that this vision system can be used with any tomato-harvesting robot, without having to develop a new vision system for each end-effector prototype, because it locates the tomato parts needed for the different types of harvesting: cutting or embracing/absorbing.
Furthermore, as we noted in this paper, this is only a first, yet important step, leading to other tasks that will complete the harvesting automation process (calculating the z-position and cutting or suctioning the tomatoes, improving the detection of tomatoes in poor lighting, etc.).
Consequently, the objectives proposed in this work were successfully achieved, although there are numerous lines of research that could be followed in the future, both to improve the performance of the system already implemented and to expand its computer-vision functionality and versatility: detecting commercial fruit, sorting tomatoes by quality criteria such as color or size, or using other algorithms (for example, applying CNNs for tomato detection). In addition, the design of the robotic part of the project and the integration of the robot and computer-vision subsystems (calculating the z coordinate, developing the cutting end-effector, exploring pressure systems and picking tomatoes) should be studied.