Innovative Design of an Experimental Jasmine Flower Automated Picker System Using Vertical Gripper and YOLOv5

Background: Recently, there has been growing demand for the mechanization of flower harvesting to enhance harvesting efficiency. Therefore, the purpose of the current research was to design a jasmine flower automated picker system (JFAPS). The picking system incorporates a gripper that moves along the third, vertical axis and uses an Intel depth camera and the You Only Look Once (YOLOv5) deep learning system to locate and detect the flowers. Results: For the different design cross-sections, the minimum factor of safety was high enough to rule out any potential mechanical failure. Furthermore, the flowers' center points on the pixel plane were detected from the prediction box, while the real vertical position of the flowers was computed using the deep learning system. Consequently, the gripper moves down to pick the flowers and convey them to the storage system. Under these conditions, the detection method's average precision and recall for flowers were 100% and 90%, respectively. Conclusions: The JFAPS was balanced and efficient in detecting flowers. Therefore, future efforts will be directed at evaluating this system and confirming its efficacy in collecting flowers on an experimental farm.


Introduction
The cultivation of flowers, especially cut flowers like jasmine, is vital for the economy. Jasmine (Jasminum sambac) is one of the most widely grown ornamental flowering shrubs and is admired for its fragrant blossoms. Jasmine is utilized for therapeutic purposes and essential oil production, as well as in floral arrangements and decorations. Egypt and India have overtaken China, Morocco, and Italy as the top exporters of natural jasmine oil [1-3]. Although jasmine cultivation is a sound investment, harvesting is one of the biggest problems in the production supply chain. Expert labor is needed for collecting blooms, and jasmine requires more person-days than any other crop [1,4,5]. In this regard, automatic/robotic harvesting is essential because manual harvesting is labor-intensive, time-consuming, and difficult to perform [1,6]. With the advent of new technologies such as robotics combined with image processing, machine learning, and artificial intelligence, intelligent agricultural harvesting devices have entered agricultural production [4,7-9].
Furthermore, several automated harvesting techniques for saffron flowers have been used. The designs were based on estimates of the saffron plants' location, size, and sensitivity requirements, as well as other factors including the humidity of the surrounding air. Asimopoulos et al. [10] designed an autonomous harvester consisting of the vehicle, structure, sensors, and motors. The gripper moves mechanically and automatically along the axis, while the sensors are mounted on the axis shell (robotic arm). The gripper closes and advances 10 mm upward, cutting the blossom in a motion like that of a human hand, while the color sensor, located immediately above the gripper (and the flower), assesses whether the flower is ready for harvesting. Other concepts use image analysis to compute the best cutting spot, which is then passed to a drive that positions a straightforward mechanical cutting mechanism to produce a clean cut; the flower is pressed against the feed or front beater, which has a slot for the saw's cutting teeth, by a cutting disc that rotates continuously to make the cut [11]. Additionally, Denarda et al. [12] created a two-finger gripper, combined with a particular transmission system made up of a leaf spring and a camera, that enables delicate separation of the saffron flower from the stem for autonomous harvesting; airflow was used to gather the harvested saffron blossoms in a storage tank.
In this regard, automated or selective fruit harvesting typically requires three units: (a) a recognition system that validates the ripeness/quality and location of the fruits; (b) a moving system that moves a program-based subunit inside the farm; and (c) a picking system that executes gripping and cutting activities [13]. Researchers have also developed a gripper with an attached RGB-D camera (red, green, and blue plus depth data) that can "swallow" a target by spreading its fingers. It needs only the fruit location for harvesting because it is developed to focus on the crop rather than the stalk. An upgraded vision system based on color against light intensity was developed for strawberry harvesting to make it more resistant to lighting conditions. Additionally, an algorithm specifically for separating obstacles was added to allow the harvesting method to select strawberries that grow in clusters [14-16].
Moreover, the length, width, and weight of the fruit were taken into consideration when designing an end-effector for harvesting watermelons, which consists of two parts: the clamping mechanism and the cutting mechanism [17]. In this case, the screw nut subassembly transforms the stepper motor's rotating action into linear motion, allowing the connected slider to move up and down. The connecting slider and flexible finger are connected by a hinge, which allows the four fingers to open and close. To stop the cutting blade from severing the primary stem of the watermelon once the flexible fingers have clamped it, the end-effector drags the fruit under the drive of the robotic arm. The cutting blade then begins to rotate quickly, powered by a DC motor. The cutting tool, which is powered by a servo motor, then swings to the shearing point to cut the pedicel.
On the other hand, in computer vision applications, object detection is a fascinating task when utilized in real-time applications. Deep learning techniques include You Only Look Once (YOLO)-based CNNs, region-based convolutional neural networks (R-CNNs), etc. [18]. One of the most advanced object detection algorithms in use today is YOLOv5, a member of the YOLO family. Zhaoxin et al. [19] confirmed the link between the positions of tomatoes and peduncles using YOLOv5. According to the fruit's growth traits, the matching depth information is gathered and the robot is controlled to accomplish the picking task using the centers of the bounding boxes on the peduncles as the picking spots. For a successful and lossless harvest, it is crucial to forecast, locate, and segment separation spots in tomato images. According to the findings, this approach is capable of identifying and pinpointing tomato harvesting locations even against complex near-color backgrounds. The typical recognition time for a single frame image is 104 ms, which satisfies the automatic picking criteria for real-time processing. In addition, Egi et al. [20] utilized YOLOv5 to identify and count various tomato fruits. As a result, the current study sought to design a two-step automated system for picking jasmine flowers. The first step was to design a vertical gripper with three axes of movement based on the geometric characteristics of the jasmine shrubs and flowers. The second step was to use the image processing algorithm YOLOv5 to detect the flowers.

Experiment Setup
The current research was carried out at the Council of Scientific and Industrial Research-Central Mechanical Engineering Research Institute (CSIR-CMERI), Centre of Excellence for Farm Machinery, Punjab, India, during the 2022 season.

Jasmine Shrubs' and Flowers' Geometric Properties
To determine the dimensions of the jasmine flower automated picker system (JFAPS), the geometric properties of shrubs and flowers were collected from a traditional flower farm at Doraha, Punjab, India (30°47′43.4″ N 76°01′16.7″ E), where jasmine shrubs are planted in rows (Figure 1). The distance between rows and shrubs, the shrubs' height, and the radius and surface area of each shrub were measured with a tape measure (m), and the number of flowers in a group was counted. In addition, flower diameter (mm), bud and sepal length (mm), and flower radius (mm) were measured using a digital vernier scale. Moreover, the detachment and holding forces (N) were measured using a 1SF-1DF series digital force gage.

Experimental Design of Jasmine Flower Automated Picker System (JFAPS)
The experimental JFAPS was constructed of three main parts, as illustrated in Figure 2. The first part comprises the mechanical parts, including the aluminum frame, moving rails, guide bars, brackets, gripper, and wheels. The second part is the control system, containing the camera, IR sensor, Arduino UNO, Raspberry Pi, stepper motors, motor drivers, power supply, and battery. The third part comprises the storage parts, which include the bracket, holder, and storage bag. The 3D design of the JFAPS, drawn with Autodesk Inventor 2017, is shown in Figure 3. Each part of the JFAPS was fabricated from locally available materials, described as follows.

Mechanical Parts
The mechanical parts of the JFAPS design are based on a computer numerical control (CNC) machine layout. These parts are constructed with aluminum profiles (40 × 40 mm). The whole frame was made primarily from aluminum profiles and linear rails (ball screw axes) that permit linear motion. The final dimensions of the system frame were 970 × 970 × 1500 mm for length, width, and height, respectively, with the vertical arm having a length of 700 mm. Four rubber wheels (35 × 40 mm) were attached to the system frame for easy movement between rows on the farm. As illustrated in Figure 4, each ball screw axis (moving rail) is made up of a linear module with a ball screw that allows the back-and-forth movement of a carriage, and a rotary servo motor that powers the motion. The JFAPS consists of three axes; the Y-axis cross rail is carried by the X-axis rails, connecting the end point with the screw-bearing subassembly to the Z-axis, while the vertical axis is mounted on the screw-bearing subassembly of the Y-axis with the help of a specially designed "C" bracket. The "C" bracket, shown in Figure 5, was mounted on the Y-axis rail to balance the Y-axis and Z-axis (vertical axis) for easy movement. The aluminum "C" bracket contains three parts: (a) a bracket base with dimensions of 130 × 86 × 17.5 mm, with a bearing (SKF 6002, SKF, Gothenburg, Sweden) fitted with a nylon wheel ring on the outer face; (b) a side plate for connecting the screw-bearing subassembly of the Y-axis; and (c) a screw side plate for fixing the screw-bearing subassembly of the Z-axis. In this study, the gripper shown in Figure 6 has an overall size of 220 × 30 × 140 mm and was attached to the end of the Z-axis. A holder is added to the gripper; this holder helps in flower picking (especially of closed flowers, depending on the intended use). The holder is designed to be 23 mm in diameter, based on the average diameter of fully developed unopened flower buds, which is 20 mm.
In addition, according to the flower's mechanical and physical properties, this holder was made on a 3D printer from a soft material that does not cause any physical damage or color change to the flower's petals during the picking process. After flower detection, the flowers were transferred to the storage part using a suitable holding force ranging between 1.2 and 2.4 N (Tables 1 and 2), measured with a 1SF-1DF series digital force gage (Cutwel Ltd, Cleckheaton, UK), without affecting the flower quality.

Storage Parts
The storage parts were designed to be lightweight on the frame and to facilitate the gripper's movement during flower collecting, as shown in Figure 7A. They consist of a supporting bracket made of aluminum installed in the main frame by means of a corner connector, a rectangular loop made of aluminum pipe, and a collector bag with a volume of 180 L. The collector bag is made of wet gunny, shown in Figure 7B, of the kind used to bulk-pack and sell jasmine flowers in wholesale markets, as mentioned by [21]. This type of bag is designed to maintain some moisture and extend shelf life with less physiological loss of jasmine weight. The bag fits into the supporting bracket so that it can be easily removed and replaced with another collecting bag.

Control System
The entire system is operated by a Raspberry Pi that communicates with the Arduino over a serial protocol. At the start, the Raspberry Pi sends the actuation signal to the camera for image acquisition and starts the YOLOv5 deep learning program for flower detection and localization, then sends a reference to the Arduino UNO (A1) to carry out the required task, as shown in Figure 8. The JFAPS has four stepper motors (model no. ISC176, with a torque of 2.5 N·m): two for the X-axis, one for the Y-axis, and one for the Z-axis. The four microstep drivers move based on the position of the detected flower. To position the system at a specific coordinate or to open and/or close the gripper, commands must be communicated to the Arduino UNO (A2).
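The serial hand-off from the Raspberry Pi to the Arduino UNO can be sketched as follows. The MOV/GRIP command grammar and function names below are our assumptions for illustration (the paper does not specify the message format); in the actual system, the encoded bytes would be written over a serial link (e.g., with pyserial).

```python
# Minimal sketch of encoding Raspberry Pi -> Arduino commands.
# The command tokens (MOV, GRIP) are hypothetical, not the authors' protocol.

def make_move_command(x_mm: int, y_mm: int, z_mm: int) -> bytes:
    """Encode a target coordinate for the stepper drivers (hypothetical MOV token)."""
    return f"MOV,{x_mm},{y_mm},{z_mm}\n".encode("ascii")

def make_gripper_command(open_gripper: bool) -> bytes:
    """Encode a gripper open/close request (hypothetical GRIP token)."""
    return f"GRIP,{1 if open_gripper else 0}\n".encode("ascii")

print(make_move_command(120, 300, 450))  # b'MOV,120,300,450\n'
```

On the Arduino side, a matching parser would split each newline-terminated message on commas and dispatch to the motor drivers or gripper servo.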
Li-ion batteries offer an unrivaled combination of high energy and power density, making them ideal for powering the motors and their control system. An Intel® RealSense™ depth camera D435 (Santa Clara, CA, USA) was used to acquire the field images. The camera was attached at the bottom of the "C" bracket, which has a number of benefits: a set height; installation at a height of 1500 mm on the Y-axis rather than the Z-axis, which allows it to move up and down during harvesting, making it easier to harvest jasmine and giving a sufficient field of view across the breadth of the shrub; and an unobstructed view of the field.
Knowing the location of flowers relative to the gripper is crucial for achieving the best control of the cutting position. For this reason, the distances from the flowers to the Sharp IR sensors (0A41SK0F) were calculated. The trigger distance was set to 4 cm from the sensor for opening the gripper and picking the flowers, in order to control the gripper's opening and closing near the target flowers. Distance measurements were recorded every 89 ms, which was adequate for closed-loop control and continuous measurement. Figure 9 shows a photograph of the JFAPS.
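The 4 cm trigger threshold and 89 ms sampling interval described above suggest a simple closed-loop check, sketched below. The function names and loop structure are our assumptions; only the two constants come from the text.

```python
# Sketch of the IR-sensor gripper trigger. The 4 cm threshold and 89 ms
# sampling period are from the text; everything else is illustrative.

TRIGGER_DISTANCE_CM = 4.0   # open the gripper and pick when this close
SAMPLE_INTERVAL_MS = 89     # IR sensor reading period reported in the text

def should_actuate_gripper(distance_cm: float) -> bool:
    """Return True when the flower is within the picking distance."""
    return distance_cm <= TRIGGER_DISTANCE_CM

def first_trigger_index(readings_cm):
    """Scan a sequence of sensor readings; return the index of the first trigger."""
    for i, d in enumerate(readings_cm):
        if should_actuate_gripper(d):
            return i
    return -1
```

For example, for readings taken as the gripper descends toward a flower, `first_trigger_index([9.0, 6.1, 4.0, 2.0])` triggers at the third sample, i.e., roughly 178 ms into the approach at the stated sampling rate.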

Mathematical Model and Simulation of JFAPS Design
A mathematical model was developed to predict the bending moment loading based on force analysis, using Formulas (1) and (2) to obtain the load distribution over all the beams and the severest positions during loading. Moreover, force analysis is a vital step in reviewing the loading behavior on the bending moment diagrams of the beams, as bending of the beams is a critical obstacle facing the gripper movement.

Bz = Wg · d / L    (1)

Az = Wg − Bz    (2)

where Bz is the reaction of pin B in the z direction, Az is the reaction of pin A in the z direction, Wg is the weight of the gripping system, d is the distance between Wg and pin A, and L is the length of the beam. The mathematical model was solved using MATLAB code. Based on the mathematical model, the JFAPS was drawn in SOLIDWORKS (SW) to simulate the severest loading position.
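As a worked example of the pin-reaction balance in Formulas (1) and (2), the sketch below computes the reactions at pins A and B from a moment balance about pin A and a vertical force balance. The numerical values in the usage note are illustrative, not measured loads from the JFAPS.

```python
# Static reactions for a simply supported beam of length L carrying the
# gripping system's weight Wg at distance d from pin A.

def pin_reactions(wg_n: float, d_m: float, l_m: float) -> tuple:
    """Return (Az, Bz) in newtons.

    Bz from the sum of moments about pin A (Bz * L = Wg * d);
    Az from the sum of vertical forces (Az + Bz = Wg).
    """
    bz = wg_n * d_m / l_m
    az = wg_n - bz
    return az, bz
```

For instance, a 30 N gripping system located 0.25 m from pin A on a 1.0 m beam gives Az = 22.5 N and Bz = 7.5 N; sweeping d over the beam length reproduces the load distribution used for the bending moment diagrams.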

Flower Detection
This study provides a method based on a depth camera and YOLOv5 deep learning for flower detection and localization; target candidate boxes are generated and then classified in order to realize real-time flower harvesting on shrubs. The input image is divided into an S × S grid, and each grid cell is in charge of identifying any target that falls into it. A total of 5300 photographs of flowers were taken. Figure 10 shows flower samples, each at a distinct zoom level, angle, and illumination. These data were used to estimate the flower detection probabilities. The choice of photographs greatly helped the training model adapt to changes in background and lighting, which is crucial for teaching the network to recognize flowers that are partially hidden by foliage. The images were scaled down to 283 × 378 pixels in order to lower the training model's resource requirements, allowing the training to proceed more efficiently.
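One detail implied by the resizing step is that any pixel-space annotations must be rescaled together with the images. The sketch below shows this mapping; the helper name and the example source resolution are our assumptions (the paper only states the 283 × 378 target size).

```python
# Rescale a ground-truth box when the training image is resized to
# 283 x 378 pixels, as described in the text.

TARGET_W, TARGET_H = 283, 378

def rescale_box(box, src_w, src_h, dst_w=TARGET_W, dst_h=TARGET_H):
    """Map a pixel box (x1, y1, x2, y2) from the source to the target resolution."""
    sx, sy = dst_w / src_w, dst_h / src_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

For a hypothetical 566 × 756 source image (exactly twice the target in each dimension), every coordinate is simply halved.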

Camera, Raspberry Pi, and YOLOv5 Testing
To assess the accuracy of flower detection using the camera and Raspberry Pi, YOLOv5 was set up on the Raspberry Pi and attached to the camera. Photos of flowers were captured from five jasmine shrubs by the D435 depth camera at a height of 1350 mm, with the flowers located primarily in the middle of the images. About 3000 final photos were used to train the YOLOv5 model. The center points of the flowers on the pixel plane were determined from the prediction box. The outline of each flower was enclosed in a rectangular box using the Make Sense application to label the target flower. Then, the depth camera's real-time 3D point data were employed to determine the actual position of the flowers.
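The text does not detail how the camera's 3D point data are converted into a flower position; a common approach, shown here purely as an assumption, is to back-project the detected pixel center through the standard pinhole model using the camera intrinsics. The intrinsics in the usage note are placeholders, not the calibrated D435 values.

```python
# Back-project a detected flower's pixel center plus its depth reading
# into camera-frame coordinates with the standard pinhole model.
# (fx, fy) are the focal lengths in pixels; (cx, cy) is the principal point.

def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Return (X, Y, Z) in metres for pixel (u, v) at range depth_m."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m
```

With illustrative intrinsics fx = fy = 600 px and principal point (320, 240), a flower detected at the image center and 1 m away maps to (0, 0, 1) m in the camera frame; this is the same computation the RealSense SDK's deprojection utilities perform.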
Based on the object's (flower's) location in the image and its distance from the coordinates, the relative distances were determined for each coordinate. For the detected objects, the flowers' central coordinates in the plane coordinate system were determined using the center point of the prediction box produced by YOLOv5. The points (x1, y1) and (x2, y2) are the two opposite corner points, in pixel coordinates, of the observed prediction box obtained with the YOLOv5 model (Figures 11 and 12). Accordingly, the center point coordinate (x, y) was calculated using Formula (3) according to [22]:

x = (x1 + x2)/2,  y = (y1 + y2)/2    (3)

To recognize every single ground truth, the recall was calculated using Formula (4) according to [23]. In addition, to determine the proportion of correct positive predictions, the average precision was calculated using Formula (5) according to [23]:

Recall = TP / (TP + FN)    (4)

Precision = TP / (TP + FP)    (5)

where TP is the number of true positive detections of flowers, FN is the number of missed flowers (false negatives), and FP is the number of false positive detections. The final processing is shown in the flow chart in Figure 13.
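Formulas (3)-(5) can be expressed directly in code. The sketch below reproduces the reported 100% precision and 90% recall for the illustrative case where 9 of 10 ground-truth flowers are found with no false detections (the counts are our example, not the paper's raw data).

```python
def box_center(x1, y1, x2, y2):
    """Formula (3): midpoint of the prediction box's corner points."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def recall(tp, fn):
    """Formula (4): fraction of ground-truth flowers that were found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Formula (5): fraction of predicted flowers that were correct."""
    return tp / (tp + fp)
```

For example, a prediction box with corners (100, 50) and (140, 90) has its center at (120, 70), and tp = 9, fn = 1, fp = 0 gives recall 0.9 and precision 1.0.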

Statistical Analysis
The data were statistically processed to estimate the average ± standard deviation (SD) of triplicates. Statistical analysis was performed at the 0.05 level of significance with one-way analysis of variance (ANOVA) using the SPSS program for Windows (Version 21) (SPSS, IBM Corporation, Armonk, NY, USA).
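The analysis was run in SPSS; purely as an illustration of the underlying computation, the one-way ANOVA F statistic for k groups can be computed in a few lines of standard-library Python. This is a sketch of the statistic itself, not the SPSS procedure.

```python
# One-way ANOVA F statistic: between-group mean square over
# within-group mean square.

def one_way_anova_f(groups):
    """Return the F statistic for a list of samples (one list per group)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # between-group sum of squares, k - 1 degrees of freedom
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares, n_total - k degrees of freedom
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n_total - k))
```

The F statistic would then be compared against the F distribution's critical value at the 0.05 significance level (SPSS reports the corresponding p-value directly).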

Shrubs' and Flowers' Geometric Properties
Jasmine shrubs were located in rows with a maximum spacing of 1.30 m, and the distance between shrubs was 1.00 m. In addition, it was observed that the flowers were located in clusters of three to twelve. The flowers were divided into fully formed flowers and unopened flowers (Figure 14). Tables 1 and 2 illustrate the most important geometric properties of the jasmine shrubs and flowers used in designing the JFAPS. The maximum height of the jasmine shrubs while bearing flowers was about 1.5 m, with a surface area of 0.0000785 m². The shrubs bore white flowers about 20 mm in diameter. The average diameter of the unopened flower buds (one day before opening) was 8 mm, with an area of 200 mm², and the detachment force between the petal and sepal was 1.5 N. In addition, the compressive strength was 4.92 N. In contrast, the average diameter of the fully opened fresh flowers was 20 mm, the surface area of the open flowers was 1245 mm², and the detachment force between petal and sepal was 0.7 N.

Mathematical Model and Simulation Study
The mechanism shown in Figure 15 gave the required functions: moving in the X-Y plane and moving upward and downward in the Z direction to collect jasmine flowers. Since the mechanism is symmetric, the analysis conducted on the X-Z plane for the links is exactly the same as for the Y-Z plane. MATLAB code was used to solve this mathematical model (see Supplementary A) in the X-Z plane only (Figure 15). The severest loading position is attained when the gripper is at one of the corners (Figures 16 and 17).

Based on the mathematical model, the mechanism was drawn in SOLIDWORKS (SW) (https://www.solidworks.com/), followed by a simulation study at the severest loading position. This study showed the stress distribution on the mechanism and the position and value of the maximum deflection (Figure 18a,b). Figure 18c illustrates the factor of safety for the mechanism. It was observed that the minimum factor of safety is very high (greater than six). This is due to the cross-section of the links used in the mechanism; this cross-section was selected in order to recycle scrapped links that were already available. The complete report on the study is given in Supplementary B. On checking the reaction forces obtained from the mechanism link analysis, it was observed that the direction of the reactions was upward, which indicates the vital need for a counterweight to prevent the system from overturning. This counterweight is also needed for operational safety.

Flower Detection
As shown in Figure 19, the proposed YOLOv5 model was used to identify and categorize the various targets. Each bounding box was marked with the category to which it belongs and given a confidence score to indicate the likelihood that the detected object belongs to a particular class or that the target's center lies in the grid cell. Due to the large quantity of training examples and the class's distinctive color characteristics, the open jasmine class's average predictions have greater accuracy. However, the closed and yellow jasmine classes have less accurate predictions, caused by the closed flowers' small size, their location among the leaves, and their likeness to yellow blossoms. As a result, the open jasmine, closed jasmine, and yellow jasmine classes yielded average predictions of roughly 0.80, 0.65, and 0.55, respectively. Furthermore, Figures 20 and 21 illustrate the average precision when using YOLOv5 on five jasmine shrubs: the precision was 100% for the open and closed flowers, whereas for the yellow flowers it was between 95 and 98%, because these flowers were very small and possibly because of how the environment and lighting conditions affected the images. On the other hand, the average recall was between 80 and 100% for the open, closed, and yellow flowers, as illustrated in Figure 18.

Discussion
Based on the geometric traits of the jasmine shrubs and flowers, different designs were proposed to automatically harvest these flowers. The first system relies on movement between the rows of jasmine shrubs. Although one of its benefits is that the shrubs are harvested on both the right and the left, one of its most important drawbacks is that the machine cannot be moved because of the increase in shrub branches and their bifurcation between rows. The other proposed system was harvesting by suction. This system was determined to be difficult to accomplish due to the high energy consumption required; the increased component count, which places stress on the system; and its balance during movement. Therefore, this study designed the current JFAPS to avoid the other systems' drawbacks during harvesting with regard to system balance and energy consumption. This system also agrees with [24], who found that the most important cause of quality losses was the increasing load on the flowers due to wiper blades reducing the stem length when using linearly moving picking combs with additional stalk cutting. In addition, the results of ref. [25] were not compatible with ours, as they showed better performance in terms of energy saving when collecting the detached flowers through a vacuum collector.
On the other hand, in the current investigation, the authors expanded the training data set using data augmentation, which improved the prediction accuracy and thereby made flower detection easier; these results agree with [19,20]. Additionally, implementing the YOLOv5 algorithm in the system improved the robustness of the counting process. This disagrees with the findings of [20], who found that far-off tomatoes and flowers could not be counted using the YOLOv5 algorithm because not enough flowers were used in the training process. Moreover, Refs. [17,26] stated that the object detection algorithm can be improved using multimodal fusion to obtain both color information and location information by fusing color and depth images as the model input to the YOLOv5 algorithm. Correspondingly, our results revealed that the average precision when using YOLOv5 on jasmine shrubs in the natural environment (open field) was 100%, while [27] reported that about 60% of saffron stems and flowers were detached using a photodetector and a scan plane composing the vision system, with the photodetector placed in a cylindrical darkroom on one side of the module's cylindrical vertical pipe. Furthermore, Krishnaveni and Pethalakshmi [1] captured flower images with an accuracy of 83% using a high-resolution EOS 5DS R camera, while feature extraction using the average color difference (ACD), CEDD, LBP, and Zernike moments methods was simulated in MATLAB version 13.0 with the background set to blue to avoid illumination effects.
Additionally, Refs. [28,29] mentioned that the ZED camera can be used to provide 3D perception, while all the analyzed depth cameras (D435, D415, ZED, and ZED 2i) were suitable for applications where robot spatial orientation and obstacle avoidance were mandatory in conjunction with image vision. This is also consistent with the current study's results. On the other hand, Refs. [28,29] used the YOLOv4-tiny detection algorithm, for which the average precision and recall rate for potted flowers were 89.72% and 80%, respectively. Furthermore, Ref. [30] utilized the YOLOv4 deep learning system to detect apple blooms, for which the precision in a natural setting was 89.43%. This was because the flowers were so near to the background color and there was no space between neighboring flowers, which exacerbated the complexity of feature extraction and resulted in flowers being overlooked. These findings concur with this study's conclusions because the YOLOv5 method performs better in terms of detection accuracy than YOLOv4 and YOLOv3 [31].

Conclusions
The design described in this paper was discussed to assess the system's performance and to define possibilities for future work. The proposed JFAPS was designed based on the geometric properties of jasmine shrubs and flowers. The system depends on three arms, with separate Y-axis and Z-axis rails installed on the X-axis rails, and incorporates a gripper that moves along the third, vertical axis to pick jasmine buds. Moreover, it was observed that the minimum factor of safety was sufficiently high. In addition, the system uses an Intel depth camera and the YOLOv5 deep learning system to locate and detect the flowers. A total of 5300 images of flowers were divided into training, validation, and test sets in a 60:30:10 ratio. The average predictions were roughly 0.80, 0.65, and 0.55 for open, closed, and yellow flowers, respectively. Moreover, the average precision was 100% for the open and closed flowers, whereas for the yellow flowers it was between 95 and 98%. Furthermore, the average recall was between 80 and 100% for the three types of flowers.

Future Scope
Future efforts will be directed at evaluating this system for wide use in jasmine harvesting and confirming its efficacy in collecting jasmine flowers. We will further improve the gripper to account for situations in which the flowers are packed closely together. In addition, more visual techniques will be developed for frame detection from various perspectives, in order to view frames that are obscured from one perspective but visible from another. Finally, we will evaluate the usefulness of this approach for gathering flowers in large quantities.