A Two-Step Phenotypic Parameter Measurement Strategy for Overlapped Grapes under Different Light Conditions

Phenotypic characteristics of fruit particles, such as projected area, can reflect the growth status and physiological changes of grapes. However, complex backgrounds and overlaps constrain accurate recognition of grape borders and detection of fruit particles. Therefore, this paper proposes a two-step phenotypic parameter measurement strategy to calculate the areas of overlapped grape particles. The two steps are particle edge detection and contour fitting. For particle edge detection, an improved HED network is introduced: it makes full use of the outputs of each convolutional layer, adds a Dice coefficient term to the original weighted cross-entropy loss function, and applies image pyramids to achieve multi-scale edge detection. For contour fitting, an iterative least squares ellipse fitting and region-growth algorithm is proposed to calculate the area of grapes. Experiments showed that in the edge-detection step, compared with prevalent methods including Canny, HED, and DeepEdge, the improved HED extracted the edges of detected fruit particles more clearly, accurately, and efficiently, and detected overlapping grape contours more completely. In the shape-fitting step, our method achieved an average error of 1.5% in grape area estimation. This study therefore provides convenient means and measures for the extraction of grape phenotypic characteristics and the study of grape growth laws.


Introduction
Phenotypic characteristics of grape particles are important indicators in grape water diagnosis, berry growth monitoring, and grape growth modeling. They are also an important basis for monitoring the healthy growth of grapes and ensuring high-quality size control during the growing process [1][2][3]. Machine vision and its related advanced technologies have great application prospects in modern fruit planting and phenotype detection [4]; for example, machine vision methods have been applied to the segmentation of grape clusters in vineyards [5]. However, accurate detection of edge contours remains an urgent, unsolved problem in the field of grape phenotypic feature detection due to the common overlapping of grapes in natural environments.
Many researchers have proposed various methods for contour detection and segmentation using traditional machine vision technology. Jin Yan et al. [6] used an improved Hough transform to segment grapes, but it had low detection accuracy and could easily miss fruit when grapes overlapped. Arman Arefi et al. [7] used a watershed algorithm to segment overlapping tomatoes; the recognition accuracy reached 96.36%, but the method was prone to over-segmentation. Xiang Rong et al. [8] proposed an overlapping-tomato recognition method based on edge curvature analysis. The accuracy for slightly occluded overlapping tomatoes was 90.9%, but when the fruit occlusion rate was high, the recognition accuracy dropped. Wang Qiaohua et al. [9] used curvature angles and mutation points to extract the sizes of grape particles. Lei Yan et al. [10] proposed a particle-segmentation method based on contour and ellipse fitting. This method had high detection accuracy for elliptical fruit, but when the contour of the fruit was too short, it easily caused over-segmentation. Chenglin Wang et al. [11] used wavelet transform and a Retinex-based image-enhancement algorithm to segment oranges with an accuracy of 97.2%, but it was difficult to segment overlapping fruits. The gPb-Owt-Ucm image segmentation method proposed by Arbeláez Pablo et al. [12] achieved a score of 74% on the Berkeley segmentation data set, but the algorithm ran slowly. Dollár Piotr et al. [13] proposed an edge-detection algorithm based on structured forests that worked well, but the detected contours were thick. Although these methods achieved good results on grape segmentation to some extent, there still exist some shortcomings and limitations: (1) The traditional segmentation process is always multi-stage, and the complicated pre-processing and post-processing steps are relatively inefficient.
In addition, since most pre-processing and post-processing steps are not universal, when the environmental parameters and grape colors change, the parameters in the processing flow also need to be fine-tuned in order to achieve the best results. (2) General detection algorithms have weak resistance to false edges and noise in images under complex conditions, which affects the accurate detection of grapes. (3) The detected edges might be inconspicuous, discontinuous, or missing for overlapped grape particles.
With the rapid development of deep learning in recent years, neural networks have made it possible to extract high-dimensional contour features of overlapped grapes under complex conditions. Current methods can be divided into two branches. The first branch detects contours based on local areas, such as N4-fields [14], DeepEdge [15], and DeepContour [16]. Their core pipeline extracts a pixel block centered on each pixel, extracts features using a CNN, compares the features with real contours, and predicts the pixel-level probability of contours. These algorithms can identify object contours well, but they are time-consuming and memory-intensive due to local-area sampling. The other branch is end-to-end contour detection, represented by HED [17], RCF [18], and CEDN [19]. Its main idea is to directly predict pixel-level contour probabilities using a CNN. Such algorithms are simple and efficient, and their detection accuracy is high.

Methodology
To better extract features from overlapped grapes, this paper proposes a method with two steps: initial contour detection using a CNN-based network, and contour refinement using prior knowledge of boundary shape. These are introduced in the following two subsections.

Step One: Contour Detection
The HED network was proposed to solve the ambiguity problem of edge detection through automatic learning of rich image hierarchies [20]. It performs end-to-end pixel-level prediction using a VGG16 backbone that contains 13 convolutional layers. These layers are divided into five stages, each ending with a pooling layer. The last convolutional layer of each stage is connected to a side output layer that produces an edge-prediction map, and the final edge-prediction map is the fusion of all stages.
The original HED network can detect most edges in an image, but sometimes these edges are thick and indistinct, and some low-contrast edges might be missed, which would affect the accuracy of subsequent grape phenotypic parameter measurement. Therefore, this paper proposes three improvements to address these problems, as shown in Figure 1. First, more side output network structures are introduced in order to merge more edge-prediction maps at different levels, which is more conducive to extracting multi-scale features. Then, the Dice loss function [21] is added to the original weighted cross-entropy loss function. Finally, an image pyramid for multi-scale feature fusion is introduced. Details of these adjustments are given below.

First, conventional HED only leads out a side output layer at the last convolutional layer of each stage, which does not make full use of the edge information contained in the other convolutional layers. Inspired by RCF [18], we fused the resulting layers of the VGG16 network in selected stages. Given that our target scene is contour detection of grape particles, we paid more attention to high-dimensional feature maps, which characterize the outline of the main part of the image, while low-dimensional feature maps are better suited to characterizing image texture, noise, and other details. Therefore, unlike RCF, which accumulates the resulting layers of every stage using an eltwise layer to obtain hybrid features, the improved HED network connects only the convolutional layers in the third to fifth stages to the side output layers.
Taking the third stage as an example, each convolutional layer leads to a side output layer, the size of the convolution kernel is 1 × 1, and the channel depth is 25. The feature maps generated in each stage are superimposed and connected to a 1 × 1 convolutional layer with channel depth 1. The edge-prediction image of this stage is obtained by upsampling with bilinear interpolation to restore the predicted image to the original size. The feature maps of all five stages are merged to predict the final edge map.
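The side-output construction above can be sketched as follows. This is a rough NumPy illustration, not the authors' implementation; the channel count, weights, and sizes here are placeholder assumptions. A 1 × 1 convolution with channel depth 1 reduces to a per-pixel weighted sum across channels, followed by bilinear upsampling and a sigmoid to obtain edge probabilities:

```python
import numpy as np

def conv1x1(features, weights, bias=0.0):
    """1x1 convolution with one output channel = weighted sum across the
    channel axis at every pixel. features: (C, H, W), weights: (C,) -> (H, W)."""
    return np.tensordot(weights, features, axes=1) + bias

def bilinear_upsample(img, out_h, out_w):
    """Minimal bilinear interpolation of a 2D map to (out_h, out_w)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def side_output(stage_features, weights, input_size):
    """Fuse one stage's superimposed conv features into an edge-probability
    map at input resolution."""
    fused = conv1x1(stage_features, weights)      # 1x1 conv, channel depth 1
    up = bilinear_upsample(fused, *input_size)    # restore original size
    return 1.0 / (1.0 + np.exp(-up))              # sigmoid -> edge probability
```

In the full network, one such map is produced per stage and the five maps are merged into the final prediction.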
Second, most of the pixels in the image are non-edge pixels, so the distribution of edge/non-edge pixels is highly unbalanced, and the general cross-entropy loss function used in classic binary classification problems struggles with such imbalance. To solve this problem, borrowing from [22], our loss function involves a weighted cross-entropy loss:

L_w(P, G) = −β ∑_{j∈Y+} log Pr(y_j = 1 | X; W, w) − (1 − β) ∑_{j∈Y−} log Pr(y_j = 0 | X; W, w), (1)

where Y+ and Y− represent the collection of edge pixels and the collection of non-edge pixels in the picture, respectively; β = |Y−|/|Y| and 1 − β = |Y+|/|Y|, with |Y| = |Y+| + |Y−| the total number of pixels; X represents the input picture; and Pr(y_j | X; W, w) is calculated for pixel j through the sigmoid function. The weight coefficient β in Equation (1) is used to solve the non-convergence in training caused by the unbalanced pixel distribution, but it might also cause unclear edges in the prediction results. In order to resolve the contradiction caused by the weight coefficient, the Dice coefficient is introduced as a loss function [22]:

L_D(P, G) = 1 − 2 ∑_i p_i g_i / (∑_i p_i² + ∑_i g_i²), (2)

where G is the actual edge map of the image, P is the predicted image, and p_i and g_i represent the value of the i-th pixel in the predicted image and the actual edge image, respectively. The Dice coefficient addresses the unbalanced pixel distribution because it defines a measure of pixel-level similarity between the edge-prediction image and the ground truth. In this way, it supervises the model to converge to a high-quality outcome with thinner and more accurate boundaries.
Considering the characteristics of the research object in this paper, a large number of single grape particles are used as data samples during training, and the effective edge pixels of the fruit occupy a small proportion of the whole picture, so the problem of unbalanced pixel distribution is more serious than for other objects. Dice loss is therefore well suited to our object, and the final loss of the model in this paper is a weighted combination of the Dice loss and the weighted cross-entropy loss:

L(P, G) = α L_D(P, G) + (1 − α) L_w(P, G), (3)

where L_D(P, G) represents the Dice loss function, L_w(P, G) represents the weighted cross-entropy loss function, and α balances the two terms.

Third, grape particles usually appear at small scales in monitoring images, so image pyramids are used to enhance the model's ability to represent grape contours at different scales. Specifically, the resolution of the input image is adjusted to various levels using bilinear interpolation to form an image pyramid as network input. Then, edge-probability maps of different sizes are predicted by the network and resized to the input size using bilinear interpolation upsampling. Finally, these maps are fused to obtain the final prediction image. The whole process is illustrated in Figure 2.
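The combined loss can be sketched as follows. This is a NumPy illustration under the definitions above, not the authors' code: how α enters the combination is our assumption (a convex combination, with α = 0.6 following the comparative experiments reported later), and the clipping epsilon is an implementation detail added for numerical stability:

```python
import numpy as np

def weighted_bce(pred, gt, eps=1e-7):
    """Weighted cross-entropy: beta = |Y-|/|Y| up-weights the sparse edge pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    beta = 1.0 - gt.mean()                 # fraction of non-edge pixels
    loss = -(beta * gt * np.log(pred) + (1 - beta) * (1 - gt) * np.log(1 - pred))
    return loss.sum()

def dice_loss(pred, gt, eps=1e-7):
    """Dice loss = 1 - 2*sum(p_i*g_i) / (sum(p_i^2) + sum(g_i^2))."""
    inter = (pred * gt).sum()
    return 1.0 - 2.0 * inter / (np.square(pred).sum() + np.square(gt).sum() + eps)

def total_loss(pred, gt, alpha=0.6):
    """Assumed convex combination of Dice loss and weighted cross-entropy."""
    return alpha * dice_loss(pred, gt) + (1 - alpha) * weighted_bce(pred, gt)
```

A perfect prediction drives both terms toward zero, while an inverted prediction saturates the Dice term at its maximum of 1.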

Step Two: Contour Fitting
Under some poor environmental conditions, such as large overlapped areas or low light intensity, there might be large notches in the detected contours, as shown in Figure 3. These would hinder further feature extraction. Therefore, this paper proposes a new method based on grape contour fitting in candidate regions. As shown in Figure 4, contour extraction is performed on the original image of the grape cluster, and object detection of the fruit is performed to obtain candidate areas. After combining the contour map with the candidate areas, each rectangular candidate area contains the contour of one grape. Grape contour fitting is then performed on each candidate area in turn to obtain the contour results of all grape particles. This method, combined with the improved HED network, forms our proposed two-step measuring strategy, which detects and fits as many grapes as possible and ensures high-precision prediction of the long axis and the projected area of the fruit.

Candidate Region Generation
Learning-based object-detection methods can be divided into two-stage and single-stage methods. A representative two-stage detector is Faster R-CNN, proposed by Ren et al. [23], while a representative single-stage detector is YOLOv3 [24], which directly predicts targets from the input and runs at a higher speed. Since our monitoring scenario favors fast detection, YOLOv3 was used for candidate region generation in this paper.

Iterative Least Squares Ellipse Fitting
There sometimes exist noise, non-grape contours, or contours of other grapes in the candidate area, and these might lead to biased outputs if all the sample data points are submitted directly into the calculation. In order to reduce the influence of erroneous pixels on ellipse fitting, this paper proposes an iterative least squares ellipse fitting method based on the RANSAC algorithm. A set of raw data points can be divided into two categories: inner points and outer points. Inner points are target data points whose distribution can be described by a certain mathematical model; in this paper, inner points are the outline pixels of the target grape. On the contrary, outer points such as noise, non-grape contour pixels, and contour pixels of other grapes do not fit the above mathematical model. The RANSAC algorithm was introduced to search for the best-fit model that correctly separates inner and outer points using iterative mathematical criteria. The proposed iterative least squares ellipse fitting can be summarized as follows: (1) For the input edge contour map, record the edge contour pixels as E. (2) Determine the number of iterations K as

K = log(1 − p) / log(1 − (1 − err)^s),

where p is the probability that, in at least one of the K iterations, all sampling points are inner points; err is the proportion of outer points among all data points; and s is the number of sample points selected for each sampling. (3) Determine the evaluation criterion of model fitness as

F = N_{Dis≤d} / C,

where C is the perimeter of the ellipse obtained by fitting, and N_{Dis≤d} is the number of pixels in the image whose closest orthogonal distance to the ellipse is less than d.
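The sampling loop can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it scores candidate models by a simple inlier count over an algebraic conic residual rather than the orthogonal-distance fitness criterion of step (3), and the probability, outlier-ratio, and threshold values are placeholder assumptions:

```python
import numpy as np

def ransac_iterations(p=0.99, err=0.3, s=5):
    """K = log(1 - p) / log(1 - (1 - err)^s): iterations needed so that, with
    probability p, at least one sample contains only inner points."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - (1 - err) ** s)))

def fit_ellipse_lsq(pts):
    """Direct least-squares conic fit: A x^2 + B xy + C y^2 + D x + E y = 1."""
    x, y = pts[:, 0], pts[:, 1]
    M = np.column_stack([x * x, x * y, y * y, x, y])
    coef, *_ = np.linalg.lstsq(M, np.ones(len(pts)), rcond=None)
    return coef

def conic_residual(coef, pts):
    """Algebraic residual of each point with respect to the fitted conic."""
    x, y = pts[:, 0], pts[:, 1]
    return np.abs(coef[0]*x*x + coef[1]*x*y + coef[2]*y*y
                  + coef[3]*x + coef[4]*y - 1)

def ransac_ellipse(pts, thresh=0.05, p=0.99, err=0.3, s=5, seed=0):
    """Sample s contour pixels, fit, keep the model with the most inliers,
    then refit on all inliers of the best model."""
    rng = np.random.default_rng(seed)
    best_coef, best_count = None, -1
    for _ in range(ransac_iterations(p, err, s)):
        sample = pts[rng.choice(len(pts), s, replace=False)]
        coef = fit_ellipse_lsq(sample)
        count = int((conic_residual(coef, pts) < thresh).sum())
        if count > best_count:
            best_coef, best_count = coef, count
    inliers = conic_residual(best_coef, pts) < thresh
    return fit_ellipse_lsq(pts[inliers]), inliers
```

On a contour with a few far-off noise pixels, the final refit recovers the ellipse from the inner points alone, which is the behavior the iterative criterion is meant to guarantee.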

Experiments
The experiments were carried out in the greenhouse (31°11′ N, 121°29′ E) of the Modern Agricultural Engineering Training Center of Shanghai Jiao Tong University. In order to take pictures regularly, we developed an automatic image-acquisition system that included a digital camera with a maximum resolution of 3072 × 2304, a light source, and a timing controller. Figure 6 shows images of Sunshine Muscat grapes under different lighting conditions: side light, front light, and back light. The pictures were collected every hour during the expansion period and over 10 days during the color-changing period of the grapes.

Grape Collection and Labeling
A total of 200 photos (4032 × 3024 pixels) of single Sunshine Muscat grapes under different backgrounds and environments were captured and uniformly reduced to 384 × 544 pixels, and the contours of the grapes were manually marked as ground truth, as shown in Figure 7. The set was then augmented to 19,200 pictures by sequentially flipping each image, rotating it every 22.5°, expanding it to 1.5 times its size, and reducing it to 0.5 times its size.
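The augmentation factor can be checked by enumerating the transform grid. The factorization below (2 flip states × 16 rotations × 3 scales = 96 variants per photo) is our reading of the description; the paper does not spell out the exact combination order:

```python
from itertools import product

flips = [False, True]                     # original and flipped
angles = [i * 22.5 for i in range(16)]    # rotations every 22.5 degrees
scales = [1.0, 1.5, 0.5]                  # original, expanded, reduced

# Every combination of flip, rotation, and scale applied to each photo
augmentations = list(product(flips, angles, scales))
assert len(augmentations) == 96           # variants per photo
assert 200 * len(augmentations) == 19200  # total training images
```

This matches the reported jump from 200 captured photos to 19,200 training pictures.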

Data Preparation for Object Detection
The data set used for object detection contained a total of 300 pictures of grape clusters. In order to facilitate training, the sizes of the pictures were uniformly converted to 500 × 375 or 375 × 500, depending on their length and width. LabelMe [25] was used for annotation. Grapes that were not covered, or whose covered area did not exceed about 80%, were labeled, as shown in Figure 8.

Model Performance Comparison
Three criteria, optimal dataset scale (ODS), optimal image scale (OIS), and frames per second (FPS), were used as the model's detection indicators. ODS refers to the detection score when the same fixed threshold is applied to all test-set images, while OIS refers to the detection score when the best threshold is chosen for each image in the test set. Before evaluation, non-maximum suppression (NMS) was performed on the edge-prediction maps, and the Edge Boxes toolbox [26][27][28] was then used for testing.
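The two scoring protocols can be sketched as follows. This is our own simplified illustration: it matches edge pixels exactly and averages per-image F-measures, whereas the standard benchmark protocol matches edge pixels with a small spatial tolerance and aggregates counts across the dataset:

```python
import numpy as np

def f_score(pred_bin, gt):
    """Pixel-exact F-measure between a thresholded prediction and ground truth."""
    tp = np.logical_and(pred_bin, gt).sum()
    prec = tp / max(pred_bin.sum(), 1)
    rec = tp / max(gt.sum(), 1)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def ods_ois(prob_maps, gts, thresholds=np.linspace(0.1, 0.9, 9)):
    """ODS: best single threshold over the whole set; OIS: best threshold
    chosen per image, then averaged."""
    per_thresh = [np.mean([f_score(p >= t, g) for p, g in zip(prob_maps, gts)])
                  for t in thresholds]
    per_image_best = [max(f_score(p >= t, g) for t in thresholds)
                      for p, g in zip(prob_maps, gts)]
    return max(per_thresh), float(np.mean(per_image_best))
```

Because OIS picks the threshold per image, it is always at least as high as ODS under this formulation.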
In order to control the variables, each network adopted the optimized loss function, where α in Equation (3) was set to the optimal value of 0.6 determined by comparative experiments. Comparison results of our algorithm against DeepEdge, the multi-scale version of HED (HED-MS), and the multi-scale version of RCF (RCF-MS) are shown in Table 1 and Figure 9. The improved HED scored higher than the other models in terms of OIS and FPS. RCF outperformed our method in terms of ODS because of a bias in the test data set, which contained many more individual grape particles than grape clusters. However, Figure 10 shows that the improved HED could predict more complete contours of grape clusters.

Comparison Test of Different Illumination Angles
In this section, the performance of our model is compared with Canny, HED-MS, DeepEdge, RCF-MS, and an ablated version of our network without Dice loss. The following indicators were introduced to measure the quality of grape contour detection: the Dice coefficient (D), measuring the similarity between the predicted contour map and the ground truth; the recall rate (R), the ratio of correctly predicted edge pixels to all actual edge pixels; and the contour redundancy rate (A), the ratio of correctly predicted edge pixels to all edge pixels in the predicted contour map, which reflects the model's resistance to interference from contours or noise of objects other than grapes:

D(X, Y) = 2|X ∩ Y| / (|X| + |Y|),
R = TP / (TP + FN),
A = TP / Count(X),

where X is the predicted map; Y is the ground truth; TP is the number of edge pixels of the fruit that are correctly predicted; FN is the number of edge pixels of the fruit that are predicted as background pixels; and Count(X) is the total number of edge pixels in the output contour prediction image. Manual labeling was used to obtain the ground-truth outlines of the grapes for this evaluation.
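The three indicators can be computed from a pair of binary edge maps as below. This is our own NumPy illustration; in particular, the exact form of the redundancy rate A (correct edge pixels over all predicted edge pixels) is our reading of the description:

```python
import numpy as np

def edge_metrics(pred, gt):
    """Dice similarity D, recall R, and redundancy rate A for binary edge maps."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # correctly predicted edge pixels
    fn = np.logical_and(~pred, gt).sum()     # edge pixels predicted as background
    d = 2 * tp / (pred.sum() + gt.sum())     # Dice coefficient
    r = tp / (tp + fn)                       # recall rate
    a = tp / pred.sum()                      # redundancy rate (precision-like)
    return d, r, a
```

A spurious predicted pixel leaves recall untouched but lowers both D and A, which is exactly the anti-interference behavior the redundancy rate is meant to capture.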
In natural environments, changes in lighting conditions will affect grape contour detection. In order to prove the adaptability of our method in various light conditions, we tested and compared the edge-detection abilities of different algorithms on grapes under front light, side light, and back light.
It can be seen in Figure 10 that under side light, the edges of the grapes were clear and easy to separate, and the surfaces of the grapes were uniformly illuminated, so edge detection was easier. Under back-light conditions, the light intensity on the surfaces of the grapes was weak, and the contours were blurry. Under front-light conditions, some areas of the grape surfaces formed bright white spots due to light reflection, and the contours of the grapes were easily confused with the background. The contours of overlapped grapes detected by our improved HED network were more distinct, the completeness of the contours was higher under side-light conditions, and the unobstructed contours of grapes under back-light and front-light conditions were successfully recognized, although some contours of grapes occluded inside the cluster might have been missed.
Compared with HED, the contours of overlapped grapes detected by our algorithm were more complete, whereas HED sometimes mistook reflections for contours. Furthermore, although RCF-MS could detect smoother grape contours, our method provided more complete contours for overlapped grapes under all three conditions, especially side light and back light. In addition, the integrity of contour detection was higher with the introduction of Dice loss, although the effect did not degrade much in the ablated version without it. In general, our algorithm was more comprehensive in contour detection of overlapped grapes.
It can be seen in Table 2 that the Dice coefficient of the improved HED was more than 3% higher than those of the other algorithms, and the recall rate was more than 2% higher. The redundancy rate of grape contours under different lighting conditions was lower only than that of RCF, and higher than those of the other algorithms.

Candidate Region Generation Results

Figure 11a,b show the detection results of grape clusters without overlaps, while Figure 11c,d show those with overlaps. It can be seen that most obscured and unobstructed grapes were detected successfully. In addition, in Figure 11d, not only the grapes in the target area but also some small-scale grapes in the background were successfully detected. Tests were performed on a new set that contained 100 images of grape clusters. As shown in Table 3, YOLOv3 reached an F1 score of 95.44% for grapes, with a 94.56% accuracy rate and a 96.24% recall rate.

For contour fitting, the recall rate defines the ratio of the number of successfully fitted grape contours to the actual number of grape contours, calculated as

R = N_D / N_A,

where N_D is the number of correct grape contours obtained by fitting, and N_A is the actual number of grape contours in the image. A total of 60 grape clusters with different shapes under different environmental conditions were tested. Due to the growth characteristics of grapes, fruit grains were mainly occluded by adjacent fruit grains; occlusion by branches and leaves rarely occurred. As shown in Figure 12, heavily occluded grapes were ignored (most of the outlines of such grapes were not visible). Three algorithms were used to detect the 60 grape cluster images, and the average recall rate was calculated. As shown in Table 4, the recall rate of our method was higher than that of the other two algorithms.

Contour-Fitting Accuracy Test
In this section, different grapes with manually made missing or noisy contours were used to verify the accuracy of our ellipse-fitting algorithm for grape particles.
Results of fitting tests on 60 artificially processed grapes are shown in Figure 13; our method accurately fitted missing or noisy grape contours. The results showed high accuracy for incomplete grape-grain contour fitting under a complex background and strong resistance to interference from non-grape contour edges and noise. Table 5 shows that the error of the long-axis measurement of the artificially processed grape particle profiles was about 1%. Table 6 shows that the error of the projected-area measurement of the artificially processed grape particle contours was within 1.1%, which met practical application requirements. In these two tables, AARD denotes the average absolute relative deviation.


Continuous Monitoring of the Projected Area of Grapes
In this section, continuous monitoring of grape cultivation was used as a practical verification of the effectiveness of our method, during which hourly and overall changes in the projected areas of grapes were measured over six days during the expansion period. Instance-wise averaging was carried out over the raw measurement results to reduce random errors.
The measurement results are shown in Figures 14 and 15. The projected area of each grape fluctuated like a sinusoidal curve every day while showing an overall upward trend, which is consistent with the law that the projected area of grape fruit shrinks during the day and expands at night. The curve of the projected area measured using our method was consistent with the theoretical law obtained in [29], which verified the effectiveness of our method. Furthermore, continuous monitoring of the projected area of grapes can be used for irrigation decision-making during grape fruit development [30].

Summary and Conclusions
This paper proposed a two-step measurement strategy for contour analysis of overlapped grapes that consisted of edge detection and contour fitting. For particle contour detection, an improved HED network with a weighted Dice loss was introduced. It made full use of the internal features of each convolutional layer and applied image pyramids to achieve multi-scale edge detection. For contour fitting, a method combining iterative least squares ellipse fitting and a region-growth algorithm was proposed to calculate the projected area of grapes. Experiments showed that in the edge-detection step, compared with current prevalent methods, our improved HED was able to detect and extract complete fruit particle edges more clearly, accurately, and efficiently. In the shape-fitting step, our method achieved an average error of 1.5% in grape area estimation. We therefore hope this study can provide convenient means and measures for grape phenotypic characteristic extraction and the study of grape growth laws.