Intelligent Control System to Irrigate Orchids Based on Visual Recognition and 3D Positioning

This work develops a novel automatic irrigation system that delivers customized and accurate watering to individual seedlings. The system integrates three modules: visual recognition of the stem-leaf junction, identification of the stem-root junction as the watering point, and control of the spraying nozzle. YOLOv3 is employed to detect the stem-leaf junction of an orchid seedling, and the depth map acquired by Semi-Global Block Matching (SGBM) is then used to extract the three-dimensional (3D) coordinates of the junction center. Next, the concept of the leaf vector is introduced to identify the stem-root junction of the orchid seedling as the accurate watering point, which the spraying nozzle is controlled to reach in order to supply a specific amount of water. A number of experiments were conducted to verify the proposed irrigation system for orchid seedlings at different locations and heights. The experimental results show that the rates of successful watering are 82% and 83.3% for the uni-pot and multi-pot orchid seedlings, respectively.


Introduction
Phalaenopsis are high-priced plants that need careful attention before they are sold. Orchid seedlings are traditionally watered one by one according to their growth condition. However, in-person watering requires intensive manpower, which limits the mass-production capacity of orchid seedlings, especially in countries facing a labor shortage. To address this, the so-called sprinkler irrigation is one of the most popular and economical irrigation systems for the greenhouse, where the seedlings are watered automatically by showerheads installed over the assembly line. Because this approach is advantageous for watering a large area, various global irrigation schemes supplemented with machine vision were proposed to locate and water the seedlings in the greenhouse [1,2]. Nevertheless, a high volume of water is poured onto the top of the seedlings, which then suffer from overly moist leaves, uneven watering, or even a deluge of water at the root. Consequently, the Phalaenopsis plants may become diseased.
To solve this problem, this work aims at an effective vision-based watering approach that prevents plants on the assembly line from being mis-irrigated. In contemporary research, deep-learning based computer vision has contributed to detecting and inspecting abnormal incidents in a wide range of fields. For instance, a rapid recognition method was presented to examine the defects of electronic components [3]. Moreover, traffic conditions can be monitored by computer vision techniques based on YOLOv3 (You Only Look Once version three) and spatial pyramid pooling (SPP) [4,5]. A low-cost swine surveillance system was accomplished with an automated vision-based object detector, allowing husbandmen to manage a large-scale swine farm in a cost-effective manner [6]. Additionally, machine vision with deep learning has become increasingly popular for object detection in several agricultural applications, e.g., coffee beans and orchid seedlings [7][8][9][10][11].
Over the past decade, plenty of deep-learning based techniques have been launched for object detection and recognition. For example, the region-based convolutional neural network (R-CNN) [12] was proposed by Girshick et al. and succeeded in applying deep learning to object recognition, where a convolutional neural network (CNN) and a selective search for region proposals were hybridized for object detection and classification. Fast R-CNN [13] advanced the computational efficiency by utilizing region of interest pooling (RoIpooling) to create the feature map with only a single computation for the same region proposal, which was repeatedly computed in the classical R-CNN.
Even though Fast R-CNN is superior to R-CNN in computational efficiency, both of them spend much time acquiring the region proposals because they rely on the same strategy, selective search (SS). For this reason, the region proposal network (RPN) was first implemented in Faster R-CNN [14] to obtain the region proposals in an image frame by using neural networks to search for positive and negative anchors, which were then classified by a softmax function and positioned more accurately by bounding-box (bbox) regression. Despite the satisfactory accuracies of the R-CNN based methods, their high computational cost results in a low detection speed on general computers, e.g., desktops and laptops.
To overcome the above-mentioned difficulty, a new deep learning framework called YOLO [15] was invented by Redmon et al. Unlike Faster R-CNN, YOLO treats object detection as a regression problem and detects all objects of interest in an entire image in a single pass of the CNN, so it outperforms R-CNN in terms of computational efficiency. Soon afterwards, Redmon and Farhadi proposed YOLOv2 [16], which offered better speed and accuracy for object detection and recognition. The algorithm of YOLOv2 obtained the aspect ratio of the image border by introducing the idea of the anchor box from Faster R-CNN and applying k-means clustering, replaced the dropout strategy with batch normalization, and turned the CNN into darknet-19. Later, in 2018, to further raise the performance of YOLOv2, Redmon and Farhadi developed YOLOv3 [17] with the following improvements. Firstly, the deeper darknet-53 is substituted for darknet-19. Secondly, the classification is achieved by logistic regression instead of the softmax function. Moreover, the Feature Pyramid Network (FPN) is introduced to realize multi-scale inspection. The accompanying comparison figure shows that YOLOv3 spends only 22 ms to detect objects and performs more accurate recognition than a large portion of other detection methods. In that comparison, YOLOv3 competes with other detection methods of comparable performance, run on either an M40 or a Titan X, which are essentially the same GPU [17].
Accordingly, based on the object detection techniques of YOLOv3, this work develops a novel automatic irrigation system to implement customized and accurate watering for an individual seedling, in the following ways:
1. Orchid seedlings are visually identified and selected by YOLOv3.
2. Orchid seedlings are accurately positioned in three-dimensional (3D) space.
3. The stem-root junction is deemed the watering target to avoid damp leaves.
4. Intelligent control is introduced to adjust the watering amount.
Appl. Sci. 2021, 11, 4531
The remainder of this paper is organized as follows. The system design and methodology are detailed in Section 2. The feasibility of the proposed system is verified through a number of experiments, as described in Section 3. Finally, Section 4 concludes this research and provides possible extensions.

System Design and Methodology
The proposed irrigation system for orchid seedlings consists of several modules: visual recognition, construction of the 3D coordinate system, positioning of the watering point, and system process control, as shown in Figures 2 and 3. The framework of YOLOv3 is used for the visual recognition to identify and frame the seedlings. The 3D coordinate system is established through the depth maps corresponding to the output images of YOLOv3, in order to localize the seedlings in 3D space. The desired watering point is the stem-root junction of the seedling, which can be positioned and watered accurately by the spraying nozzle. The system process control handles the sequential control of the automatic watering process.

Object Detection System
To accurately identify the location of each seedling to be watered and to distinguish the characteristics of each seedling in the top view, this study frames the junction between the stem and leaf of the seedling, and collects hundreds of top-view seedling images to train the YOLOv3 model.
Firstly, the junctions between stems and leaves of all seedlings in the image datasets are manually framed with the software LabelImg. Secondly, the image datasets are divided into a large training set and smaller validation and testing sets. Finally, cross validation is adopted to train the YOLOv3 model. The training and validation sets are used, respectively, to train the model and to check the prediction accuracy of the trained model, after which the testing set provides an objective evaluation of the final model fitted to the training set.
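The division into training, validation, and testing sets can be sketched as follows. This is only a minimal illustration: the 80/10/10 ratios, the fixed random seed, the file names, and the helper name `split_dataset` are assumptions, since the text specifies only a high ratio of training data, and the sketch shows a plain hold-out split rather than the cross validation used for training.

```python
import random


def split_dataset(items, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle items and split them into training, validation and testing lists."""
    if not 0 < train_ratio < 1 or not 0 <= val_ratio < 1:
        raise ValueError("ratios must lie between 0 and 1")
    if train_ratio + val_ratio >= 1:
        raise ValueError("ratios must leave a share for the testing set")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Hypothetical file names standing in for the labeled top-view images.
image_names = [f"seedling_{i:03d}.jpg" for i in range(200)]
train_set, val_set, test_set = split_dataset(image_names)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the testing set untouched until the end is what allows it to serve as the objective evaluation mentioned above.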

Construction of the 3D Coordinate System
The binocular visual system is established for the 3D coordinate system in this work. The working principle is to capture images of an object from two different positions with a binocular (dual-lens) camera. As illustrated in Figure 4, the true distance Z between the object and the lens can be calculated as follows:

$$Z = \frac{f\,b}{d} \tag{1}$$

where $d = X_L - X_R$ is the disparity, $X_L$ and $X_R$ are the x-coordinates on the left and right images, respectively, b is the length of the base line (distance between the optical axes of both cameras), and f is the focal length of the camera.
The image coordinates (u, v) can be reprojected to the world coordinates (X, Y, Z) through the 4 × 4 reprojection matrix Q in order to obtain the actual 3D coordinates, which are related according to Equation (2):

$$Q \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} \tag{2}$$

where d is the disparity and W is the scale parameter of the homogeneous coordinates. A widely used algorithm for matching feature points is Semi-Global Block Matching (SGBM) [19][20][21], which uses the Sum of Absolute Differences (SAD) over a window to calculate the matching cost [22], and takes the left-camera image as the reference and the right-camera image as the target to perform pixel feature matching along the same epipolar line. Finally, the obtained parallax is utilized by the SGBM algorithm to carry out the stereo vision. In this work, the left- and right-camera images are converted to grayscale, and then the feature matching is performed through the SGBM algorithm.
In the OpenCV function library [23], the function cv2.StereoSGBM_create() executes the SGBM algorithm. There are no fixed values for the SGBM parameters used to output the grayscale depth map, so they have to be tuned for the setup; the function cv2.reprojectImageTo3D() is then required to obtain the true 3D coordinates corresponding to the pixel of interest. Given the grayscale depth map and the reprojection matrix Q, cv2.reprojectImageTo3D() outputs a map in which each pixel gives the 3D coordinates in the workspace with respect to the origin, i.e., the optical center of the left lens in Figure 4.
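The reprojection step can be illustrated without OpenCV by applying a reprojection matrix to a single pixel. The sketch below is a simplification under stated assumptions: the matrix built by the hypothetical helper `make_q` assumes a rectified camera pair with identical principal points, and the numerical values of f, b, and the principal point are invented for the example. A matrix Q obtained from an actual stereo calibration may use different sign conventions and contain an additional principal-point term.

```python
def make_q(f, cx, cy, b):
    """Simplified 4x4 reprojection matrix (assumed form, rectified pair).

    f: focal length in pixels, (cx, cy): principal point in pixels,
    b: base line in the desired output unit (e.g., cm).
    """
    return [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, 1.0 / b, 0.0],
    ]


def reproject(q, u, v, d):
    """Map pixel (u, v) with disparity d to 3D coordinates relative to the left lens."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    pixel = [u, v, d, 1.0]
    # Homogeneous result [X, Y, Z, W] = Q [u, v, d, 1]^T.
    X, Y, Z, W = (sum(row[k] * pixel[k] for k in range(4)) for row in q)
    # Dividing by W converts the homogeneous coordinates to metric ones.
    return X / W, Y / W, Z / W


q = make_q(f=800.0, cx=320.0, cy=240.0, b=8.0)
x, y, z = reproject(q, u=400.0, v=240.0, d=64.0)
print(x, y, z)  # z equals f * b / d = 800 * 8 / 64 = 100
```

With this form of Q, the depth returned after division by W reduces to the familiar relation Z = f b / d for a rectified stereo pair.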
To make the output depth map easier to inspect, the original noisy grayscale depth map is first denoised by the weighted least squares (WLS) filter, an edge-preserving filter that smooths the whole image, and is then converted into a pseudo-color depth map. OpenCV provides the function cv2.applyColorMap() to convert grayscale images with one of 12 color maps. Here, the COLORMAP_JET mode is selected for the conversion. In this mode, Figure 5 displays a pseudo-color map, where pixels with deeper red and deeper blue colors hold higher and lower gray values, respectively, in the original grayscale image. Moreover, the deeper red a pixel is, the closer the corresponding physical point is to the optical center of the left lens in Figure 4.

Positioning of the Watering Point
The YOLOv3 detection result and the depth map are then merged. When seedlings appear within the field of view of the binocular lens, the following steps are executed: 1. YOLOv3 is applied to the left-lens image to check whether a target (the junction between stems and leaves) is present; 2. if a target is detected, the pixel coordinates of the upper-left and lower-right corners of the prediction box in Figure 6, (x_min, y_min) and (x_max, y_max), respectively, are returned to calculate the central position of the prediction box; 3. the central position is input to the depth map to obtain the corresponding real 3D coordinates.
After the 2D coordinates of the central point of the prediction box are obtained, its actual 3D coordinates can be obtained by looking up these 2D coordinates in the 3D map that cv2.reprojectImageTo3D() computes from the depth map.
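The center calculation and the subsequent lookup can be sketched as follows. The helper names `box_center` and `lookup_3d` and the tiny synthetic map are assumptions made for illustration; in practice the per-pixel 3D map would be the array produced by the reprojection step.

```python
def box_center(x_min, y_min, x_max, y_max):
    """Return the integer pixel (u, v) at the center of a prediction box."""
    return (x_min + x_max) // 2, (y_min + y_max) // 2


def lookup_3d(points_3d, u, v):
    """Read the (X, Y, Z) entry of a per-pixel 3D map.

    Image arrays are indexed as [row][column], i.e., [v][u], which is a
    common source of swapped-axis mistakes.
    """
    return points_3d[v][u]


# Synthetic 4-row by 6-column map in which every pixel stores (column, row, 50.0).
points_3d = [[(float(col), float(row), 50.0) for col in range(6)] for row in range(4)]

u, v = box_center(1, 0, 4, 3)
print((u, v), lookup_3d(points_3d, u, v))
```

The only subtle point is the index order: the horizontal pixel coordinate selects the column, so it appears second when indexing the map.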
To prevent diseases caused by watering directly on the leaves, the direction vector of the leaves (hereafter called the leaf vector), acquired by image processing, is defined as the direction vector of the straight line fitted to the two main leaf contours, as shown in Figure 7. After the leaf vector is calculated, the desired watering point is locked to the stem-root junction of the seedling, as depicted in Figure 7.
After the junction between the stem and the leaf is selected by the YOLOv3 prediction box, the image around the center point of the prediction box is cropped, as shown in Figure 8. As different image sizes would require repeated adjustments in the subsequent erosion and dilation processes, the shorter side (length or width) of the cropped image is set to a constant by the OpenCV function cv2.resize() to simplify the image processing.
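Fixing the shorter side of the cropped image amounts to choosing one scale factor for both dimensions. The sketch below only computes the target size; the value of 200 pixels and the helper name `target_size` are assumptions, as the constant used in this work is not stated.

```python
def target_size(width, height, short_side=200):
    """Return (new_width, new_height) so that the shorter side equals short_side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # One common scale factor preserves the aspect ratio of the crop.
    scale = short_side / min(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(target_size(120, 90))   # landscape crop
print(target_size(50, 100))   # portrait crop
```

The resulting (width, height) pair is in the order expected by the size argument of cv2.resize().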
Then, to reduce noise, the function cv2.cvtColor() is used to convert the cropped image to grayscale. Next, the Gaussian blur function cv2.GaussianBlur() is utilized to suppress noise such as spots, and finally the function cv2.Canny() is applied for edge detection. The effects of grayscale processing, Gaussian blur, and edge detection are demonstrated in Figure 9a-c, respectively.
In order to merge the effective line segments at the edge of the main body, the contour is expanded using the function cv2.dilate(), as shown in Figure 10a. As the corners of a line segment may influence the calculation of the leaf vector, i.e., the folding of the line segment may reduce the single directionality of the contour area, the function cv2.erode() is invoked to properly erode the expanded contour, as illustrated in Figure 10b.
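The effect of dilation followed by erosion on broken edge segments can be reproduced on a small binary grid. The following is a pure-Python stand-in for illustration, not the OpenCV implementation; the 3 × 3 square structuring element, the helper names, and the toy edge map are assumptions.

```python
def _morph(image, size, combine):
    """Apply a size x size max or min filter; pixels outside the image are ignored."""
    rows, cols = len(image), len(image[0])
    r = size // 2
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [image[y][x]
                      for y in range(max(0, i - r), min(rows, i + r + 1))
                      for x in range(max(0, j - r), min(cols, j + r + 1))]
            row.append(combine(window))
        out.append(row)
    return out


def dilate(image, size=3):
    """Grow the foreground: a pixel becomes 1 if any neighbor is 1."""
    return _morph(image, size, max)


def erode(image, size=3):
    """Shrink the foreground: a pixel stays 1 only if all neighbors are 1."""
    return _morph(image, size, min)


# Toy edge map: two collinear segments separated by a one-pixel gap.
edges = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
closed = erode(dilate(edges))
for row in closed:
    print(row)
```

In this toy case the gap between the two segments is filled while the overall extent of the line is unchanged, which is the merging behavior the dilation and erosion steps rely on.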
The function cv2.findContours() is utilized to extract all blocks, which are ranked by area through the function sorted(); the largest one represents the contour of interest of the image. As represented in Figure 11a, the area marked in red is the largest part, for which the function cv2.fitLine() is used to get the unit vector of the fitted line, which is overlaid on the original image, as depicted in Figure 11b.
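The direction of a fitted line can be illustrated with a short least-squares computation on a list of contour points. This sketch is a stand-in written for illustration and is not claimed to reproduce the internals of cv2.fitLine(); the helper name `fit_direction` and the sample points are assumptions. Like cv2.fitLine(), it yields a unit direction vector and a point on the line.

```python
import math


def fit_direction(points):
    """Return ((vx, vy), (x0, y0)): unit direction and centroid of the fitted line.

    The direction is the principal axis of the point set, i.e., the line
    minimizing the sum of squared perpendicular distances.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n
    sxx = sum((x - x0) ** 2 for x, _ in points)
    syy = sum((y - y0) ** 2 for _, y in points)
    sxy = sum((x - x0) * (y - y0) for x, y in points)
    # Orientation of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (x0, y0)


# Hypothetical contour points lying along a diagonal leaf edge.
direction, centroid = fit_direction([(0, 0), (1, 1), (2, 2), (3, 3)])
print(direction, centroid)
```

Note that a fitted direction is only defined up to sign, so the opposite vector describes the same line.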
The desired watering point on the stem-root junction is calculated after the actual 3D coordinates of the center point and the leaf vector are obtained. Through a customized function, the original fitted line vector is input to compute the perpendicular (vertical) unit vector by means of the inner product. Since the perpendicular unit vector can point in either of two directions, the system is set to water from the target with the smaller X-coordinate to the one with the larger X-coordinate, in order to reduce the total moving distance of the motor. Then, the distance between the new watering point and the central point of the seedling is set and multiplied by the selected unit vector to obtain the corrected distances in the X- and Y-directions. According to the size of the seedling, the new watering point is 1.5 to 2.5 cm away from the center of the seedling, on the straight line of the leaf growth, as illustrated in Figure 12.
In addition to the corrected distances in the X- and Y-directions, since the stem-root junction is slightly lower than the center of the seedling, an additional offset in the Z-direction is taken into account. After the tri-axial correction values are included, the original 3D coordinates are converted into the real 3D coordinates for accurately targeting the watering point. If the leaf vector cannot be found after the YOLOv3 prediction box is selected, the distances in the X- and Y-directions are not corrected. For the Z-direction, however, the watering height is raised via the customized functions to prevent the sprinkler from colliding with the center of the seedling. The whole correction process for the 3D coordinates of the seedling is summarized in Figure 13.
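The tri-axial correction, including the fallback when no leaf vector is found, can be sketched as below. Several points are assumptions made for illustration only: the third coordinate is treated as a height that increases upward, the default offsets (2.0 cm in the plane, 1.0 cm lower, 2.0 cm higher) are invented values, the choice between the two perpendicular directions is reduced to taking the one with a non-negative X component, and the function names are hypothetical.

```python
import math


def perpendicular_unit(vx, vy):
    """Return a unit vector whose inner product with (vx, vy) is zero."""
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        raise ValueError("leaf vector must be non-zero")
    px, py = -vy / norm, vx / norm
    # Two opposite directions are possible; keep the one pointing toward +X (assumed rule).
    if px < 0 or (px == 0 and py < 0):
        px, py = -px, -py
    return px, py


def corrected_target(center, leaf_vector=None, offset_cm=2.0, drop_cm=1.0, raise_cm=2.0):
    """Return the corrected (x, y, height) of the watering point in cm."""
    x, y, h = center
    if leaf_vector is None:
        # No leaf vector: X and Y stay unchanged, the nozzle is kept higher.
        return x, y, h + raise_cm
    px, py = perpendicular_unit(*leaf_vector)
    # Shift in the plane along the selected unit vector and lower the target slightly.
    return x + offset_cm * px, y + offset_cm * py, h - drop_cm


print(corrected_target((10.0, 5.0, 8.0), leaf_vector=(0.0, 1.0)))
print(corrected_target((10.0, 5.0, 8.0)))
```

In a camera-centered frame where Z is the distance from the lens, the signs of the height adjustments would be reversed.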

System Process Control
The system process control is divided into four parts: the motor control, the command control, the triggering process and the automatic watering process.
The three-axis sliding table is driven by three stepper motors with synchronous-wheel timing belts and screws. Firstly, for the motor control, the computer uses Python and Arduino to execute the operation instructions of the stepper motors through the Arduino control board. Secondly, the command control that lets the stepper motors accomplish the specified task is implemented by mutual communication between the Arduino and the computer via the Python package pySerial. The function mot.write() is used to pass a string combining a letter and a number (e.g., j0), which stand for the code of the task type and the relevant parameter, respectively. Afterwards, the functions Serial.read() and Serial.parseInt() are used on the Arduino to receive the commands that control the motor.
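The letter-plus-number command format can be sketched with two small helpers. The helper names `encode_command` and `decode_command` are hypothetical, and the meaning of any particular task letter is not defined here; only the example string j0 comes from the text. The decoder mirrors on the computer side what reading one character and then parsing an integer would do on the Arduino side.

```python
def encode_command(task_code, value):
    """Build a command such as 'j0': one letter for the task type, then an integer."""
    if len(task_code) != 1 or not task_code.isalpha():
        raise ValueError("task code must be a single letter")
    return f"{task_code}{int(value)}"


def decode_command(command):
    """Split a command string into its task letter and integer parameter."""
    if len(command) < 2 or not command[0].isalpha():
        raise ValueError("command must start with a letter followed by a number")
    return command[0], int(command[1:])


message = encode_command("j", 0)
payload = message.encode("ascii")  # a serial port transmits bytes, not str
print(message, payload, decode_command(message))
```

With pySerial, a payload of this kind would be passed to the write() method of an open serial port object, since that method expects bytes.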

Thirdly, the triggering process launches the automatic watering process, as addressed in Figure 14. When the seedlings enter the camera view, YOLOv3 boxes the junctions between stems and leaves into several blocks, and a specific one of them is selected in the picture to trigger the watering process. The triggering area is drawn with the function rectangle() as a green rectangle in the real-time image of the left lens. It is located above the center of the left lens, with a length of 240 pixels and a width of 120 pixels. When the central point of the prediction box enters the triggering area, the watering process is triggered.
Whether the prediction frame has just entered the triggering area is determined as follows. If the number of prediction frames is greater than zero and the central points of all prediction frames are in the triggering area, the trigger signal is set to 1. If the number of frames is greater than zero but the central point of the prediction frame is not in the triggering area, the trigger signal is set to 0. When the number of prediction frames equals zero, the trigger signal is also set to 0. The automatic watering process is triggered only when the current trigger signal is greater than the previous one. This approach ensures that a prediction frame staying in the triggering area does not cause repeated watering.
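The trigger rule above amounts to rising-edge detection on a binary signal, which can be sketched as follows. The class and function names are hypothetical, and so is the position of the triggering area: only its size of 240 by 120 pixels comes from the text, and treating 240 as the horizontal extent is an assumption.

```python
def trigger_signal(centers, area):
    """Return 1 if at least one prediction frame exists and the central
    points of all frames lie inside the area, otherwise 0.

    centers -- list of (cx, cy) central points in pixels
    area    -- (x_min, y_min, x_max, y_max) of the triggering area
    """
    if not centers:
        return 0
    x_min, y_min, x_max, y_max = area
    inside = all(x_min <= cx <= x_max and y_min <= cy <= y_max
                 for cx, cy in centers)
    return 1 if inside else 0


class WateringTrigger:
    """Fire only when the signal rises from 0 to 1 between two frames."""

    def __init__(self, area):
        self.area = area
        self.previous = 0

    def update(self, centers):
        current = trigger_signal(centers, self.area)
        fire = current > self.previous
        self.previous = current
        return fire


# Hypothetical 240 x 120 pixel area; the corner coordinates are made up.
trigger = WateringTrigger((200, 60, 440, 180))
events = [trigger.update(c) for c in ([], [(300, 100)], [(300, 100)],
                                      [(10, 10)], [(300, 100)])]
# events == [False, True, False, False, True]
```

The third frame does not fire because the signal stays at 1, which is the repeated-watering case the comparison with the previous signal is meant to prevent.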
Finally, the watering process is completed by the customized functions, which first define the moving boundary of each axis motor and check whether each group of target coordinates exceeds it. Any group of target coordinates that exceeds the moving boundary is removed. When the sprinkler reaches the plane position of the targeted watering point, the Z-axis motor lowers it to the specified height. Then, the valve is opened for the duration that delivers the preset watering volume, which completes the pouring step. Once the irrigation is accomplished, the Z-axis motor is raised to the specified height, e.g., 2 to 5 cm, based on the dimensions of the seedlings. The overall flowchart is detailed in Figure 15.
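The boundary check can be illustrated with a short sketch. It assumes that the targets are already expressed in platform coordinates measured from the platform origin (0, 0, 0) and that the moving boundary equals the allowable travel lengths of 40 cm, 40 cm and 16 cm reported in the System Setup section. The function names are hypothetical, and sorting by ascending X follows the watering order stated for the coordinate correction.

```python
# Allowable travel of the platform in X, Y and Z (cm), from the System Setup.
TRAVEL_CM = (40.0, 40.0, 16.0)


def within_travel(target, travel=TRAVEL_CM):
    """True if every coordinate lies between 0 and its travel length."""
    return all(0.0 <= c <= limit for c, limit in zip(target, travel))


def plan_targets(targets, travel=TRAVEL_CM):
    """Drop targets outside the moving boundary, then order by X."""
    kept = [t for t in targets if within_travel(t, travel)]
    return sorted(kept, key=lambda t: t[0])


# Made-up targets: the second exceeds X travel, the fourth exceeds Z travel.
plan = plan_targets([(30.0, 10.0, 5.0), (45.0, 10.0, 5.0),
                     (5.0, 20.0, 3.0), (10.0, 10.0, 17.0)])
# plan == [(5.0, 20.0, 3.0), (30.0, 10.0, 5.0)]
```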

System Setup
The system platform is shown in Figure 16, where an Arduino control board, a stepper motor drive board, a linear-slider driving system, a solenoid valve, a water storage container and a stereo camera are all installed on the three-axis motion platform. The platform is an aluminum extruded structure of 61.6 cm (width) × 61.6 cm (length) × 40.2 cm (height); the accessory carrier is an aluminum extruded structure of 21.0 cm (width) × 40.2 cm (length) × 62.5 cm (height). Furthermore, the watering nozzle is at a height of 22.9 cm from the ground.

For the image processing, the system employs a binocular camera that faces downward, parallel to the ground, and is fixed at a height of 63 cm from the ground. For the object detection with YOLOv3 and the acquisition of depth maps, the left lens of the stereo camera is used as the main image acquisition device, so the center of the left lens is placed on the horizontal axis of symmetry of the three-axis motion platform. Figure 17 defines the positive directions of the X-, Y-, and Z-axes of the camera coordinates, indicated by yellow arrows labeled X(+), Y(+) and Z(+). As drawn in Figure 18, the origin of the three-axis motion platform is denoted as (0, 0, 0), and the yellow arrows labeled X(+), Y(+) and Z(+) represent the positive directions of its X-, Y-, and Z-axes, respectively. Furthermore, the allowable travel lengths of the platform in the X-, Y-, and Z-directions are 40 cm, 40 cm and 16 cm, respectively.

Model Training Results
The dataset for the frame selection in this work contains 950 images in total, with 600, 150, and 200 images in the training, validation, and test sets, respectively. The training and validation sets were used together in the model training. After the training process, the accuracy rates (also known as the average precision, AP) on the validation set and the test set are 92.63% and 86.23%, respectively, which implies that this work is of high practicability. The object detection results for part of the training set are shown in Figure 19.

3D Localization
There are two conditions for the experiments of 3D localization. Under Condition 1, the central point of the target is considered the measurement point and departs from the optical center of the left lens by 40 cm, as displayed in Figures 20 and 21.
In the first experiment, Condition 1 provided the measuring method, and the measurement errors were recorded in Figure 24 for Z-axis distances ranging from 20 to 80 cm. When the Z-axis distance ranges from 20 to 35 cm, the error is obviously large. Once the Z-axis distance exceeds 80 cm, the error starts to grow again, but with a negative sign. Figure 25 clearly indicates that the best working distance of this binocular camera is from 35 to 75 cm, within which the absolute values of the average, maximum, and minimum errors are 0.22 cm, 0.42 cm, and 0.02 cm, respectively.
Additionally, Condition 2 was used for the second experiment, and the measurement results are listed in Table 1. The maximum absolute errors of the X-, Y-, and Z-axes over the four measurement points are 0.18 cm, 0.2 cm, and 0.32 cm, respectively, whereas the minimum absolute errors are 0.1 cm, 0.12 cm, and 0.18 cm, respectively. The four measurements in Figure 23 thus deviate from the accurate positions by mm-scale errors, which are small in comparison with the nozzle size. Therefore, the nozzle still waters the correct position despite the measurement errors presented in Table 1.

Water Flow Control Experiment of Solenoid Valve
This research also recorded the experimental results in Figures 26 and 27 to relate the watering amount (g) to the duty time for valve opening (s). In the measurement, the duty time for valve opening started from 3 s and was increased in one-second steps. The experimental results are linear in the range between 4 and 11 s; hence, the relationship between the watering amount and the duty time for valve opening, formalized in Equation (3), is adopted to control the amount of watering.
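Using such a linear relationship to control the valve means inverting it: given a target amount of water, solve for the duty time and accept it only inside the linear range. The sketch below shows that step under loud assumptions. The coefficients of Equation (3) are not reproduced in this section, so slope_g_per_s and intercept_g are placeholder parameters, the numbers in the example are made up, and the function name is hypothetical.

```python
def duty_time_for_amount(amount_g, slope_g_per_s, intercept_g,
                         t_min=4.0, t_max=11.0):
    """Invert amount = slope * t + intercept for the valve duty time t (s).

    The default bounds are the 4 to 11 s range reported as linear; a request
    that falls outside it is rejected rather than extrapolated.
    """
    if slope_g_per_s <= 0:
        raise ValueError("slope must be positive")
    t = (amount_g - intercept_g) / slope_g_per_s
    if not t_min <= t <= t_max:
        raise ValueError("requested amount is outside the linear range")
    return t


# Placeholder coefficients only (NOT the values of Equation (3)):
# with 5 g/s and -10 g, a request of 30 g maps to 8 s of valve opening.
seconds = duty_time_for_amount(30.0, slope_g_per_s=5.0, intercept_g=-10.0)
```

Rejecting out-of-range requests keeps the controller within the region where the measured relationship was actually observed to be linear.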

Experiments of Automatic Irrigation System
Two experiments, one for a single seedling and one for multiple seedlings, were considered in the experimental configuration of the automatic irrigation system. In the configuration for the single-seedling experiments, as shown in Figure 28, the triggering area in the identification screen was split into five sections, named Upper left, Lower left, Central, Upper right, and Lower right, as denoted in Figure 29.
In order to evaluate the experimental results numerically, three types of indexes are defined. The first type is the number of correct waterings onto the stem-root junction, as depicted in Figure 30. The second type counts the waterings onto the junction between stems and leaves, as highlighted in Figure 31. The third type counts the incidents of improper watering onto the leaf and of the nozzle colliding with the leaf, as described in Figure 32. Two measures of the watering effectiveness are formulated by Equations (4) and (5).
$$\text{Success watering rate (\%)} = \frac{\#\,\text{1st type} + \#\,\text{2nd type}}{\#\,\text{1st type} + \#\,\text{2nd type} + \#\,\text{3rd type}} \tag{4}$$
$$\text{Perfect watering rate (\%)} = \frac{\#\,\text{1st type}}{\#\,\text{1st type} + \#\,\text{2nd type}} \tag{5}$$
where # 1st type, # 2nd type, and # 3rd type are the resultant counts of watering with the first, second, and third types, respectively.
In this experiment, the distance from the optical center of the left lens of the camera to the ground is 64.1 cm. The potted plants were placed in the five sections shown in Figure 29, and the watering was repeated ten times. Table 2 reports that the total successful and perfect watering rates are 82% and 70.7%, respectively. Whether the watering position is proper can be affected by an incorrect leaf vector, which results from surplus dilation and erosion on the framed orchid image. However, the records in Table 2 show a low occurrence ratio of improper watering, which indicates that the system can be feasibly applied to practical customized watering of orchids.
For the multiple-seedling experiments, four pots of seedlings were placed in a rectangular arrangement in the pot holder, as displayed in Figure 33, where the seedlings were spaced by 9 cm and 18 cm in the vertical and horizontal directions, respectively. Figure 34 demonstrates the experimental configuration for placing the four pots of seedlings in the triggering area. In addition to the three types of indexes used to evaluate the watering results of the single-pot experiments, a fourth-type index is introduced to count the failures of watering caused by an excessive Z-axis distance error, and the watering operation rate is given by Equation (6).
$$\text{Watering operation rate (\%)} = \frac{\#\,\text{1st type} + \#\,\text{2nd type} + \#\,\text{3rd type}}{\#\,\text{1st type} + \#\,\text{2nd type} + \#\,\text{3rd type} + \#\,\text{4th type}} \tag{6}$$
During the multiple-seedling experiments, the distance between the ground and the optical center of the left lens of the camera is 64.1 cm. The experiment was repeated 15 times to collect 60 resultant sets. The statistical results show that the rates of operated, successful, and perfect watering are 70%, 83.3%, and 62.9%, respectively. The experimental statistics are given in Table 3.
Table 3 shows that no watering occurred in 30% of the cases, for which there are two possible reasons. Firstly, measurement errors of the depth map may be caused by insufficient environmental lighting on the feature points. Moreover, the watering process might be cancelled because of a wrongly computed moving distance to the watering point, which would have moved the nozzle beyond the working boundary.
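Equations (4) to (6) can be checked with a few lines of Python. The counts used below are not copied from Tables 2 and 3, which are not reproduced here. They are one set of counts inferred back from the reported percentages, assuming 50 single-seedling trials (five sections, ten repetitions each) and the 60 multiple-seedling sets stated in the text. The function name is hypothetical.

```python
def watering_rates(n1, n2, n3, n4=0):
    """Evaluate Equations (4), (5) and (6) in percent.

    n1..n4 are the counts of the first to fourth index types.
    """
    operated = n1 + n2 + n3   # waterings that were actually carried out
    succeeded = n1 + n2       # water reached one of the two junctions
    if operated == 0 or succeeded == 0:
        raise ValueError("rates are undefined for empty counts")
    return {
        "operation": 100.0 * operated / (operated + n4),  # Equation (6)
        "success": 100.0 * succeeded / operated,          # Equation (4)
        "perfect": 100.0 * n1 / succeeded,                # Equation (5)
    }


# Inferred counts (assumption): 29 + 12 + 9 = 50 single-seedling trials.
single = {k: round(v, 1) for k, v in watering_rates(29, 12, 9).items()}
# Inferred counts (assumption): 22 + 13 + 7 + 18 = 60 multiple-seedling sets.
multi = {k: round(v, 1) for k, v in watering_rates(22, 13, 7, 18).items()}
# single["success"] == 82.0 and single["perfect"] == 70.7
# multi == {"operation": 70.0, "success": 83.3, "perfect": 62.9}
```

Under these assumed counts the three formulas reproduce the reported 82% and 70.7% for the single-seedling case and 70%, 83.3%, and 62.9% for the multiple-seedling case, which also shows that the successful and perfect rates are computed over the operated waterings only.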

Informed Consent Statement: Not applicable.
Data Availability Statement: Data generated or analyzed in this study are available from the corresponding author upon reasonable request. However, some data are disclosed under the protection of our partnership company.