Sugarcane Stem Node Recognition in Field by Deep Learning Combining Data Expansion

Abstract: The rapid and accurate identification of sugarcane stem nodes in the complex natural environment is essential for the development of intelligent sugarcane harvesters. However, traditional sugarcane stem node recognition has mainly been based on image processing and recognition technology, whose recognition accuracy is low in a complex natural environment. In this paper, an object detection algorithm based on deep learning is proposed for sugarcane stem node recognition in a complex natural environment, and the robustness and generalisation ability of the algorithm are improved by a dataset expansion method that simulates different illumination conditions. The impact of data expansion and of the lighting conditions in different time periods on the results of sugarcane stem node detection is discussed, and the superiority of YOLO v4, which performed best in the experiment, is verified by comparing it with four other deep learning algorithms, namely Faster R-CNN, SSD300, RetinaNet and YOLO v3. The comparison results showed that the AP (average precision) of the sugarcane stem nodes detected by YOLO v4 was 95.17%, which was higher than that of the other four algorithms (78.87%, 88.98%, 90.88% and 92.69%, respectively). Meanwhile, the detection speed of the YOLO v4 method was 69 f/s, exceeding the requirement of a real-time detection speed of 30 f/s. The research shows that this is a feasible method for the real-time detection of sugarcane stem nodes in a complex natural environment. This research provides visual technical support for the development of intelligent sugarcane harvesters.


Introduction
The area occupied by sugarcane planting in China ranks third in the world. However, the mechanisation of sugarcane harvesting is still at a relatively low level, mainly because mechanical harvesting destroys the stem nodes kept in the soil for the second year of growth, the impurity rate is high and the cutter is seriously worn by cutting into the soil. In contrast, although inefficient and labour-intensive, manual harvesting is widely adopted for its good quality and flexibility. Therefore, it is necessary to improve the intelligence of sugarcane mechanical harvesting, and the recognition of the sugarcane cutting location is the first step toward this intelligence.
Machine vision technology offers the possibility of identifying sugarcane stem nodes against a single background. Moshashai et al. [1] first studied the recognition of sugarcane stem nodes by comparing the diameter of different parts of the sugarcane and found that the diameter of the stem node was larger than the rest, which can be used to determine the position of the stem section. Shangping Lu et al. [2] proposed a feature extraction and recognition method of sugarcane stem nodes through the support vector machine method by extracting features of the S and H component images in the HSV colour space of sugarcane segment pictures. However, the background of the sugarcane image was an ideal one with a single colour. The local mean [3] was another method used to identify the sugarcane stem node by filtering the image and image segmentation on H components of the HSV colour space, and it then found the maximum grey value as the stem node position. The experimental object was the image with only a single sugarcane stem node. Weizheng Zhang et al. [4] studied the method of identifying and locating sugarcane stem nodes based on high spectral light imaging technology. Its recognition range was limited to the area around the sugarcane stem node, and the recognition accuracy was 98.33%. Yanmei Meng et al. [5] proposed a sugarcane node recognition algorithm based on multithreshold and multi-scale wavelet transform, even though the sugarcane could only be identified by stripping the sugarcane leaves in advance to expose the sugarcane node. Deqiang Zhou et al. [6] proposed a method of sugarcane stem node recognition based on Sobel edge detection to satisfy the working requirements of the sugarcane seed cutter. Jiqing Chen et al. [7] proposed a sugarcane nodes identification algorithm based on the sum of local pixels of the minimum points of vertical projection function to analyse the recognition of a single node and double nodes.
The methods mentioned above mainly relied on traditional image-processing machine vision methods. Such machine vision algorithms identify the object mainly by analysing the reflected light and transmitted light on the surface of the object, and therefore need to work in a simple environment. Although some of them cannot meet the requirements of real-time detection against a complex background or deal with sugarcane stem nodes wrapped in leaves, the research still put forward feasible visual identification techniques for sugarcane seeding or harvesting machines.
Unlike traditional image processing that focuses on image feature recognition, deep learning is a learning method driven by big data, which has been widely used in agriculture [8][9][10], crop classification [11], crop image segmentation [12] and crop object detection [13]. Parvathi, S. et al. [14] used the Faster R-CNN deep learning algorithm to detect coconuts in a complex background. Liang, Cuixiao et al. [15] studied the performance of the SSD algorithm in identifying litchi fruits and litchi branches at night. In 2020, Biffi, Leonardo Josoé et al. [16] studied the detection of apples in apple orchards based on the RetinaNet algorithm through a ground remote sensing system. Wu, Dihua et al. [17] discussed and compared the identification accuracy of apple flowers in the field based on the You Only Look Once v4 (YOLO v4) algorithm and the YOLO v3 algorithm. Deep learning is a new and efficient method for intelligent cultivation in sugarcane planting. In 2020, J. Scott [18] used deep learning to study the furrow mapping of sugarcane billet density, and Srivastava, S. [19] proposed an approach based on deep learning for sugarcane disease detection. These research studies demonstrated that deep learning has a stronger identification ability in the natural environment and in sugarcane fields. In 2019, Shangping Li et al. [20] introduced object detection technology based on deep learning for sugarcane stem node recognition, which was applied to the sugarcane cutting process for the first time. It used an improved YOLO v3 network to establish an intelligent recognition convolutional neural network model. However, the sugarcane samples were pre-processed by manually removing the leaves first in a single-colour background environment. Table 1 shows the relevant studies by the above-mentioned scholars on sugarcane stem node recognition.
Although deep learning has been adopted in other crop recognition applications, it is still rarely used in sugarcane stem node recognition.
The visual identification of sugarcane stem nodes in the complex natural environment still goes unreported due to the following difficulties: (1) The complex lighting conditions in the natural environment and unstable sunlight during the day reduce the image quality and affect the accuracy of the detection algorithm; (2) sugarcanes grow in clusters, and some of the sugarcane stem nodes are more or less covered by leaves; and (3) the diversity of the biological characteristics of sugarcane, including different stalk diameters and peel colours, increases the difficulty of identification. In order to solve these problems, this paper proposes a sugarcane stem node recognition algorithm based on deep learning driven by big data in the natural environment. The big data acquisition experiments were conducted at a real sugarcane farm, and the big data samples of the sugarcane stem node covered different light conditions and different shooting angles, using the data expansion technique and image lighting conversion. The object detection algorithm based on deep learning can learn and understand the characteristics of different sugarcane stem nodes in the natural environment by learning from big data. The rest of this article is organised as follows. The second section introduces the experimental procedure and data processing, including image acquisition, data expansion and the creation of image datasets. The third section introduces the sugarcane stem node detection model based on the YOLO v4 [21] algorithm in the natural environment; this algorithm is currently the best one-stage detection algorithm. The fourth section is the experimental part, which mainly discusses and analyses the experimental results. The last section presents the conclusions and prospects of this article. Figure 1 shows the systematic research route of this study.

Image Data Acquisition
The images of the bottom of the sugarcane were collected from a sugarcane farm in Fusui County, Guangxi, China. The sugarcane variety was Guitang No. 49, the sugarcane was in the mature stage and the average stem diameter was about 2.5 cm. The sugarcane was grown in the open air and planted side by side according to the requirements for mechanical harvesting. In order to capture the diversity of the sample environment, images were collected at 8:00, 12:00 and 18:00, and the lighting conditions included side light, forward light and back light. These were the three moments when the light intensity changed the most in the daytime. During image acquisition, forward light, side light and back light were simulated by setting the camera's shooting direction to be the same as, perpendicular to and opposite to the light propagation direction, respectively. Considering that the camera's shooting angle affects the detection performance, images were collected from multiple shooting angles.
The image set collected was composed of images of one single sugarcane stem node and images of multiple sugarcane stem nodes at a ratio of 1:3 to improve the robustness of the algorithm model. These two types of images are shown in Figure 2. Then, 1600 images were expanded to 8000 images using data expansion to generate the training data set and testing data set. The training data set was 7200 images, and the testing dataset was 800 images at the ratio of 9:1.
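The 9:1 train/test split described above can be sketched as follows (a minimal illustration; the file names are hypothetical):

```python
import random

def split_dataset(items, train_ratio=0.9, seed=42):
    """Shuffle and split a list of image paths into train and test sets."""
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 8000 expanded images -> 7200 for training, 800 for testing
images = [f"img_{i:04d}.jpg" for i in range(8000)]
train, test = split_dataset(images)
print(len(train), len(test))  # 7200 800
```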


Image Data Expansion
Because the angle and intensity of the light change greatly during the day, the ability of the neural network to process images collected at different times of the day depended on the integrity of the training dataset. In order to enhance the diversity of the data and improve the recognition ability of the model on different images, the collected images were pre-processed with random colour, brightness, rotation and mirror flip transformations. In this experiment, Python 3.6 was used to implement the data expansion, the development environment was PyCharm and the libraries used were Pillow, NumPy and OpenCV. The processed images are shown in Figure 3.

Data Expansion by the Random Colour Method
Human beings recognise objects through the visual system, which is not affected by the changes of light and colour on the surface of objects, but a visual imaging device does not have such an ability. Different lighting conditions will cause a certain deviation between the image colour and the true colour. Random colour processing of the image can further eliminate the influence of ambient light and improve the robustness of the detection model. The colour of the images was randomly adjusted by changing the saturation, sharpness, contrast and brightness of the image and superimposing these adjustments to achieve the effect of random colour processing.
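A minimal sketch of this random colour processing, assuming RGB images stored as NumPy arrays; the saturation, contrast and brightness jitters below approximate the Pillow ImageEnhance operators the pipeline likely used, and the sharpness adjustment (typically an unsharp-mask filter) is omitted for brevity:

```python
import numpy as np

def random_colour(img, rng):
    """Randomly jitter saturation, contrast and brightness of a uint8
    H x W x 3 RGB image, superimposing the three adjustments."""
    img = img.astype(np.float32)

    # Saturation: blend between the grayscale image and the original.
    gray = img.mean(axis=2, keepdims=True)
    img = gray + rng.uniform(0.5, 1.5) * (img - gray)

    # Contrast: scale deviations from the global mean intensity.
    mean = img.mean()
    img = mean + rng.uniform(0.5, 1.5) * (img - mean)

    # Brightness: scale all channels uniformly.
    img = img * rng.uniform(0.5, 1.5)

    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
jittered = random_colour(sample, rng)
```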

Data Expansion by the Image Rotation and Flip Method
In order to further extend the image dataset, the original image was rotated 30 degrees and flipped. Table 2 shows the number of images in the dataset after rotation and flip.
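The rotation and mirror flip can be sketched without any imaging library as below (nearest-neighbour resampling; a real pipeline would more likely use Pillow's `Image.rotate` and `Image.transpose`):

```python
import numpy as np

def mirror_flip(img):
    """Horizontal mirror flip of an H x W (or H x W x C) image."""
    return img[:, ::-1]

def rotate(img, degrees):
    """Nearest-neighbour rotation about the image centre, keeping the
    canvas size and filling out-of-frame pixels with 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find its source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx = np.round(src_x).astype(int)
    sy = np.round(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[ys[valid], xs[valid]] = img[sy[valid], sx[valid]]
    return out

img = np.arange(25, dtype=np.uint8).reshape(5, 5)
flipped = mirror_flip(img)
rotated = rotate(img, 30)
```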

Data Expansion by the Image Brightness Method
In sugarcane fields in a wild environment, the sugarcane leaves often block out the sun, which results in insufficient light at the bottom of the sugarcane. Using the method of image brightness enhancement to expand the dataset, it is possible to simulate the condition of making up for the lack of illumination with an added artificial light. These extended datasets can compensate for the small variation of illumination intensity due to the short collection time. The number of images processed is shown in Table 2.
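One plausible way to implement the brightness enhancement is a gamma adjustment, which lifts dark regions (such as the shaded bottom of the sugarcane) more than already bright ones; a sketch, not necessarily the paper's exact operator:

```python
import numpy as np

def brighten(img, gamma=0.6):
    """Gamma-based brightness enhancement of a uint8 image: gamma < 1
    brightens the image, affecting dark pixels the most."""
    norm = img.astype(np.float32) / 255.0
    return (255.0 * norm ** gamma).astype(np.uint8)

dark = np.full((4, 4), 40, dtype=np.uint8)   # a uniformly dark patch
bright = brighten(dark)                       # noticeably lifted values
```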

Image Annotation and Data Set Generation
The sugarcane images were manually labelled with LabelImg with bounding boxes drawn, classified into categories and saved in PASCAL VOC format. Marked rectangles were used to identify the sugarcane stem nodes. Data with insufficient or unclear pixel areas were not used to prevent overfitting in the neural network. The complete dataset is shown in Table 2.
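The PASCAL VOC annotations written by LabelImg are plain XML, so the labelled boxes can be read back with the standard library; the file contents below are hypothetical (one labelled stem node):

```python
import xml.etree.ElementTree as ET

# A minimal PASCAL VOC annotation such as LabelImg writes
# (hypothetical file contents and coordinates).
VOC_XML = """<annotation>
  <filename>cane_0001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>stem_node</name>
    <bndbox>
      <xmin>612</xmin><ymin>804</ymin><xmax>701</xmax><ymax>873</ymax>
    </bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for each object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes

print(parse_voc(VOC_XML))  # [('stem_node', (612, 804, 701, 873))]
```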


YOLO v4
The YOLO network is a one-stage deep learning object detection algorithm that converts the detection problem into a regression problem. Compared with the Faster Region-based Convolutional Neural Network (Faster R-CNN) [22], it does not need a region proposal network and can directly generate bounding box coordinates and the probability of each category through regression. This end-to-end object detection greatly improves the detection speed. YOLO v4 is the latest algorithm of the YOLO series and is regarded as an improved version of YOLO v3 [23]. Compared with YOLO v3, it adopts Mosaic data expansion in data processing and optimises the backbone, network training, activation function and loss function, making it faster than YOLO v3 and achieving the best balance between accuracy and speed among real-time object detection algorithms.
As shown in Figure 4, the YOLO v4 network uses the open-source neural network framework Cross Stage Partial Darknet53 (CSPDarknet53) [24] as the backbone network for training and extracting image features; the Path Aggregation Network (PANet) [25] is used as the neck network to better integrate the extracted features; and the head is the same as the object detection head of YOLO v3. The main modules of the sugarcane stem node detection model based on YOLO v4 in the complex natural environment were as follows:

YOLO v4 Algorithm Training Process
In order to realise the rapid detection of sugarcane stem nodes based on YOLO v4, the model weights of YOLO v4 were pre-trained on the Microsoft Common Objects in Context (MS COCO) dataset, and the model parameters of the network input size, number of categories, batch size and learning rate were fine-tuned. The total number of training epochs was 84, and the first 25 epochs were freezing training, which ensures that the initial weights are not destroyed and speeds up the training. The main settings are shown in Table 3.
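The two-phase fine-tuning schedule can be sketched framework-agnostically as below; the 84-epoch total and 25 frozen epochs come from the text, while the remaining values are illustrative assumptions (Table 3 holds the actual settings):

```python
# Sketch of the two-phase fine-tuning schedule. The 84/25 epoch counts
# are from the text; input size, class count and any other values here
# are illustrative placeholders, not the paper's reported settings.
CONFIG = {
    "input_size": (416, 416),   # illustrative
    "num_classes": 1,           # only "stem node" is detected
    "total_epochs": 84,
    "freeze_epochs": 25,        # backbone frozen to protect pretrained weights
}

def phase(epoch, cfg=CONFIG):
    """Return which training phase a (1-based) epoch belongs to."""
    return "frozen-backbone" if epoch <= cfg["freeze_epochs"] else "full-network"

print(phase(25), phase(26))  # frozen-backbone full-network
```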

The training set and testing set were used to train and test the YOLO v4 sugarcane stem node detection model. As shown in Formulas (1)-(4), the loss function used in training mainly included the position loss of the bounding box, the confidence loss and the classification loss.
The c and d in Formula (2) are the distance between the centres of the two bounding boxes and the diagonal distance of their union, respectively.
S is the number of grids and B is the anchor number corresponding to each grid in Formulas (3) and (4).
where IOU, as an abbreviation for Intersection over Union, is the ratio of the intersection to the union of the ground truth bounding box (A) and the predicted bounding box (A′) in Formula (5).
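The IOU of Formula (5) can be computed directly from box corners, as in this sketch:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Identical boxes give 1.0; disjoint boxes give 0.0.
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0
```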
where BCE represents the cross-entropy loss function of the true value (n) and the predicted value (n̂).
where w_gt and h_gt are the width and height of the ground truth bounding box, and w and h represent the width and height of the predicted bounding box. Formula (8) was derived jointly from Formulas (5) and (7).
In Formula (9), K stands for weight. If there is an object in K, its value is 1. p is the probability that the detection object is a sugarcane stem node. Figure 5 shows the total loss function curve during training. In the initial training stage of the sugarcane stem node detection model, the model learning efficiency was high, and the training converged fast. With the deepening of the training, the slope of the training curve decreased gradually. When the number of training iterations reached 80, the model learning gradually stabilised.
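Putting Formulas (2), (5) and (7) together, the ingredients of the CIoU-style position loss described above can be sketched numerically (a standard CIoU computation, not the paper's exact code):

```python
import math

def ciou_terms(a, b):
    """For boxes (xmin, ymin, xmax, ymax), return the CIoU ingredients:
    IoU (Formula (5)), the normalised centre distance c^2/d^2 using the
    centre distance c and enclosing-box diagonal d of Formula (2), and
    the aspect-ratio consistency term v of Formula (7)."""
    # IoU
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    iou = inter / (area(a) + area(b) - inter)

    # c: distance between box centres; d: diagonal of the enclosing box
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    c2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    d2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
          + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)

    # v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w / h))^2
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    v = (4 / math.pi ** 2) * (math.atan(wa / ha) - math.atan(wb / hb)) ** 2
    return iou, c2 / d2, v

# Identical boxes: perfect overlap, zero centre distance, matching aspect ratio.
overlap, dist, v = ciou_terms((0, 0, 4, 2), (0, 0, 4, 2))
```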

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²   (7)
The detection result of sugarcane stem nodes based on YOLO v4 is shown in Figure 6. The algorithm detected the sugarcane stem nodes in the original image, in the three kinds of data-enhanced images and in most of the images processed by random colours, which proved that the algorithm had high accuracy. In Figure 6e, the lowest sugarcane stem node was over-exposed after random colour processing, so it could not be identified.

Performance Evaluation Index of Algorithm Model
Five commonly used indicators, precision P, recall rate R, mAP (Formula (14)), detection speed and F1 score (Formula (12)), were used to verify the performance of the model. For a binary classification problem, the samples can be divided into four types according to the combination of the true category and the predicted category of the sample: TP (True Positive), FP (False Positive), TN (True Negative) and FN (False Negative). In this paper, a detection with IOU ≥ 0.5 was counted as a True Positive and one with 0 < IOU < 0.5 as a False Positive. When IOU = 0, only background was detected, which was regarded as a True Negative. A ground truth box for which no prediction reached IOU ≥ 0.5 was counted as a False Negative. The confusion matrix for the classification results is shown in Table 4.

The precision P and recall rate R were defined as Formulas (10) and (11). P describes the proportion of predicted-positive samples that were truly positive, and R describes the proportion of labelled-positive samples that were correctly predicted as positive. The higher these two values, the better the performance of the algorithm. Using the precision P as the vertical axis and the recall rate R as the horizontal axis, the precision-recall (PR) curve was obtained.
The F1 score is a reference value derived from recall and precision, and its value is usually close to the smaller of the two. A high F1 score indicates that both the recall and precision are high, which is the desired outcome; the F1 score is defined in Formula (12). The average precision (AP) shows the overall performance of a model under different score thresholds. In this paper, AP was obtained by averaging the precision values on the PR curve, as defined in Formula (13). mAP is the sum of the AP values over all categories divided by the number of categories C. Since only sugarcane stem nodes were detected in this paper, C = 1 was used in this study.
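Formulas (10)-(13) can be sketched numerically as follows; the trapezoidal AP approximation here is one common way to average precision over the PR curve, and the counts and curve points are illustrative:

```python
def pr_metrics(tp, fp, fn):
    """Precision (Formula (10)), recall (Formula (11)) and F1 score
    (Formula (12)) from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

def average_precision(precisions, recalls):
    """AP (Formula (13)) as the area under sampled (precision, recall)
    points, approximated by trapezoidal integration."""
    pts = sorted(zip(recalls, precisions))
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2  # one trapezoid slice
    return ap

p, r, f1 = pr_metrics(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82
ap = average_precision([1.0, 0.8, 0.6], [0.0, 0.5, 1.0])  # approximately 0.8
```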
The detection speed (f/s) is the reciprocal of the computational time required by each method to recognise one sample. All the aforementioned methods were coded in Python 3.6, and the deep learning framework was Keras. A workstation with two 2.3 GHz Intel 5218 processors, 64 GB RAM and an 11 GB NVIDIA RTX 2080Ti GPU was used for computation and image processing.
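The f/s figure can be measured as the reciprocal of the mean per-image inference time; a sketch with a dummy detector standing in for the real model:

```python
import time

def frames_per_second(detect, images):
    """Detection speed (f/s): number of images divided by total wall time,
    i.e. the reciprocal of the mean per-image inference time."""
    start = time.perf_counter()
    for img in images:
        detect(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed

# A dummy ~1 ms "detector" stands in for a real model here.
fps = frames_per_second(lambda img: time.sleep(0.001), range(50))
```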

The Recognition Effect of Different Algorithms
Four object detection algorithms, Faster R-CNN, SSD300 [27], RetinaNet [28] and YOLO v3, were selected and compared with the YOLO v4 algorithm to verify the recognition effect of the algorithm. The backbones of these four algorithms were ResNet50, VGG16, ResNet50 and Darknet53, respectively.
The training set was applied to the above five algorithms, and the test set was employed to evaluate the performance of the different detection algorithms. The P-R curves of the different algorithms are shown in Figure 7, a two-dimensional plot with precision and recall as the vertical and horizontal coordinates. When the P-R curve of one algorithm is completely enclosed by that of another, the latter performs better than the former. In Figure 7, except for the YOLO v3 and YOLO v4 algorithms, the curves of the other three algorithms all approached the coordinate point (1,0) at the end. Combined with Formulas (10) and (11), it was clear that the numbers of false detections (FP) of the Faster R-CNN, SSD300 and RetinaNet algorithms on the test set were all fairly high.
The F1 scores varying with the confidence threshold are shown in Figure 8. The F1 score is a harmonic mean of precision and recall and ranges from 0 to 1, where 1 represents the best output of the model and 0 the worst. When the threshold value was set to 0.5 in this paper, the F1 values of YOLO v3 and YOLO v4 achieved the highest scores, indicating that the optimal output of the algorithm can be obtained when the two methods simultaneously meet the requirements of high precision and high recall rates. In terms of detection speed, although YOLO v3 was slightly faster than YOLO v4, both of them far exceeded the real-time detection requirement of 30 f/s. As for the AP results, YOLO v4's AP was 16.3%, 6.19%, 4.29% and 2.48% higher than that of Faster R-CNN, SSD300, RetinaNet and YOLO v3, respectively. Through the analysis of the test results, it can be seen that the detection accuracy of YOLO v4 for sugarcane stem nodes was higher than that of the other four algorithms, while its detection speed was very close to that of the fastest algorithm. It was clearly more in line with the requirements of sugarcane stem node recognition in the complex natural environment.

Comparative Experiments of Recognition under Different Lighting Factors
The light environment will change during the continuous harvesting of sugarcane. In this experiment, different shooting time periods (morning, noon and nightfall) were used as control variables to represent different illuminance levels, which were respectively oblique strong light, direct strong light and oblique weak light. The number of images in each time period was 100. The statistical detection results are shown in Table 6, and some of the image detection results are shown in Figure 9. It can be seen from Table 6 that the precision, AP and F1 score of the YOLO v4 algorithm were the highest. The intensity of illuminance had a great influence on the accuracy of all the algorithms. The key factor determining the accuracy of the algorithm was whether the stem nodes were under direct strong light; the detection accuracy was reduced under oblique strong light or oblique weak light. Therefore, it is recommended that the intelligent sugarcane harvester be equipped with an illumination device to improve the detection accuracy when working continuously in dim daytime light.
It can be seen from Figure 9 that the colour and texture of the sugarcane stem nodes were clear and easy to recognise in the morning and at noon. At nightfall, due to the dimming of the illuminance and the shade from the branches and leaves, the illuminance of the sugarcane peel was reduced greatly, although the object detection algorithm based on deep learning can still accurately identify the location of the sugarcane stem nodes.

The Recognition Effect of Different Data Expansion Methods
As mentioned above, four data expansion methods were used in this article: rotation, mirror flipping, random colour processing and brightness enhancement. In order to verify the effect of these four data expansion methods on the performance of the algorithm, the variable control method was applied to delete the image data corresponding to each data expansion method from the training set; the testing set was then used to test the trained algorithm model, and the YOLO v4 detection results were obtained. The results are shown in Table 7. It can be seen from Table 7 that the method of rotation was very helpful in improving the detection accuracy. After deleting the images produced by the rotation method, the AP of the YOLO v4 detection model was reduced by 16.69%, and the F1 score was reduced by 0.11. The method of mirror flipping had the least impact on the detection accuracy. After removing the mirror flip images, the performance of the trained model was only slightly lower than that of the model trained on the complete dataset: the AP of the YOLO v4 detection model was reduced by 5.73%, and the F1 score was reduced by 0.03.
Compared to the model trained on the dataset without the random-colour-processed images, the model trained with the complete dataset had higher detection accuracy. After the removal of the random colour processing from the training set, the AP of the YOLO v4 detection model decreased by 9.27% and the F1 score decreased by 0.82. This indicated that random colour processing was very beneficial in improving the robustness of the model.
The recognition model trained without images processed by brightness enhancement performed worse than the model trained with the complete dataset. The AP of the YOLO v4 detection model was reduced by 9.27% and the F1 score was reduced by 0.04. Brightness enhancement helped the model adapt to the lighting conditions of the complex natural environment.
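The four expansion methods above are standard image augmentations. A minimal, library-free sketch on a toy RGB image (illustrative only; the authors' actual augmentation pipeline and parameters are not specified) could look like this:

```python
import random

def rotate_90(img):
    """Rotate an image (a list of rows of pixels) 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def mirror_flip(img):
    """Flip the image horizontally (mirror along the vertical axis)."""
    return [row[::-1] for row in img]

def adjust_brightness(img, factor):
    """Scale every channel by `factor`, clipping to the [0, 255] range."""
    return [[tuple(min(255, int(c * factor)) for c in px) for px in row]
            for row in img]

def random_colour(img, jitter=0.2, seed=None):
    """Scale each RGB channel by an independent random gain (colour jitter)."""
    rng = random.Random(seed)
    gains = [rng.uniform(1 - jitter, 1 + jitter) for _ in range(3)]
    return [[tuple(min(255, int(c * g)) for c, g in zip(px, gains))
             for px in row] for row in img]

# Toy 2x2 image: each pixel is an (R, G, B) tuple
img = [[(10, 20, 30), (40, 50, 60)],
       [(70, 80, 90), (100, 110, 120)]]
print(rotate_90(img)[0])                  # first row after clockwise rotation
print(adjust_brightness(img, 1.5)[0][0])  # → (15, 30, 45)
```

Rotation and flipping simulate different shooting angles, while brightness scaling and per-channel colour jitter approximate the varying field illumination, which is consistent with their measured contributions in Table 7.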

Comparison with Previous Related Recognition Methods
In 2014, Girshick et al. proposed the RCNN (Region-based Convolutional Neural Network) algorithm [29], which opened up a new era of object detection algorithms based on deep learning; deep learning technology has since been applied to the agricultural field on a large scale [8]. The previous methods for intelligent identification of sugarcane stem nodes were fully discussed in Table 1 above, but they did not address the impact of lighting changes, sugarcane leaves and biological characteristics on recognition in complex environments. The research in this paper focuses on the recognition of sugarcane stem nodes in the field under a complex natural environment, which has not yet been addressed. In order to improve the robustness and detection accuracy of the algorithm model, the data expansion method was used to enrich the datasets and simulate sugarcane images under different light conditions. Table 1 also includes our research results on the recognition of sugarcane stem nodes.
In Table 1, comparing this paper with previous studies, we can find that deep learning technology has the advantage of not only recognising image features but also understanding the image content; this technology can detect sugarcane stem nodes more than 10 times faster than machine vision technology [3] and can thus satisfy the requirements of real-time detection.
At the same time, it is worth noting that, after fully considering the influence of light conditions, sugarcane leaves and biological characteristics in the complex environment, the detection speed of this paper's method was twice as fast as that of the YOLO v3 method on a simple background, and its accuracy was 4.74% higher [20].

Conclusions
The object detection algorithm for sugarcane stem node recognition based on YOLO v4 in the natural environment was introduced in this paper for the first time, achieving rapid and accurate recognition of sugarcane stem nodes during harvest in the natural environment. The robustness and generalisation ability of the algorithm were improved by the dataset expansion method, which simulated different illumination conditions. The images were collected under different lighting conditions of side light, forward light and back light. The impact of data expansion and of lighting conditions at different times of the day on the detection results was discussed, and the superiority of YOLO v4, which performed best in the experiment, was verified by comparison with four different deep learning algorithms, namely Faster R-CNN, SSD300, RetinaNet and YOLO v3. The main conclusions are as follows.
In the absence of a large amount of data, a data expansion method was adopted by simulating different illumination conditions and different shooting angles to train the detection model of sugarcane stem node recognition based on YOLO v4. The 1600 original images were expanded to 8000 images using data expansion to generate the training dataset and testing dataset. Through this method, the robustness of the model was effectively improved.
The AP of the object detection algorithm based on YOLO v4 was the highest, at 95.17%. Although the detection speed of YOLO v3 (72 f/s) was slightly faster than that of YOLO v4 (69 f/s), both far exceeded the real-time detection requirement of 30 f/s. Compared with previous studies on sugarcane stem node recognition, the object detection algorithm based on YOLO v4 in a complex natural environment can detect sugarcane stem nodes wrapped by leaves more than 10 times faster than machine vision technology operating on a pre-processed single-colour background. Meanwhile, after fully considering the influence of light conditions, sugarcane leaves and biological characteristics in a complex environment, the detection speed of this paper's method was twice as fast as the previous method using YOLO v3 on a pre-processed single-colour background, and its accuracy was also 4.74% higher. These results indicate that the detection method based on YOLO v4 is feasible for fast and accurate detection of sugarcane stem nodes in the complex natural environment. This method provides effective visual technical support for the intelligent sugarcane harvester.