Research on Vision-Based Navigation for Plant Protection UAV under the Near Color Background

Abstract: GPS (Global Positioning System) navigation in agriculture faces many challenges, such as weak signals in orchards and high cost for small plots of farmland. With the falling cost of cameras and the emergence of powerful visual algorithms, visual navigation can solve these problems. Visual navigation is a navigation technology that uses cameras to sense environmental information as the basis of aircraft flight. It is mainly divided into five parts: image acquisition, landmark recognition, route planning, flight control, and obstacle avoidance. Here, landmarks are plant canopies, buildings, mountains, and rivers with unique geographical characteristics in a place. During visual navigation, landmark location and route tracking are the key links. When there are significant color differences (for example, the differences among red, green, and blue) between a landmark and the background, the landmark can be recognized with classical visual algorithms. However, in the case of non-significant color differences (for example, the difference between dark green and vivid green) between a landmark and the background, there are no robust, high-precision methods for landmark identification. In view of this problem, visual navigation in a maize field is studied. First, a block recognition method based on fine-tuned Inception-V3 is developed; then, the maize canopy landmark is recognized with this method; finally, local navigation lines are extracted from the landmarks based on the maize canopy grayscale gradient law. The results show that the recognition accuracy is 0.9501. When the block number is 256, the block recognition method achieves the best segmentation: the average segmentation quality is 0.87, and the time is 0.251 s. This study suggests that stable visual semantic navigation can be achieved under a near color background. It will be an important reference for the navigation of plant protection UAVs (Unmanned Aerial Vehicles).


Introduction
In recent years, the UAV (Unmanned Aerial Vehicle) has been widely used in agriculture and has important application value for crop growth data acquisition, pesticide spraying, pest detection, and so on. In particular, vision-based UAV navigation has attracted more and more attention. Visual navigation is a navigation technology that uses cameras to sense environmental information as the basis of aircraft flight. It is mainly divided into five parts: image acquisition, landmark recognition, route planning, flight control, and obstacle avoidance. Here, landmarks are plant canopies, buildings, mountains, and rivers with unique geographical characteristics in a place. During UAV visual navigation, landmark recognition and route planning are the core links, and complex image algorithms are often used to extract and match colorful landmarks. Compared with navigation in other areas, farmland mainly consists of a near color background. The classical segmentation methods are able to achieve good results against backgrounds with significant color differences [1,2]. However, under the near color background, there is no mature, effective, and universal solution for plant canopy landmark segmentation. For example, a background of green leaves makes it difficult to recognize green cucumbers, green peppers, and green apples when robots pick fruits or vegetables [3,4]. Similarly, a background of green duckweed and cyanobacteria makes it hard to extract the navigation line when robots navigate a paddy field visually [5]. In order to effectively segment a plant canopy landmark under the near color background, it is necessary to develop specific segmentation algorithms.
Visual navigation based on a plant canopy landmark is one of the main navigation modes in agriculture and can be achieved in three steps. First, vegetation is segmented; then, plant canopy landmarks are segmented from the vegetation; finally, the navigation line is detected according to the landmarks. For the first two steps, many algorithms have been developed for vegetation and crop canopy segmentation, such as color-index-based segmentation: CIVE (color index of vegetation extraction), NGRDI (normalized green red difference index), VEG (vegetative index), and COM1 (combined indices 1) [6][7][8][9]; threshold-based segmentation: EH (entropy of a histogram) and AT (automatic threshold) [10,11]; and learning-based segmentation: FC (fuzzy clustering), SVM (support vector machines), DTSM (decision tree based segmentation model), and PSO-MM (Particle Swarm Optimization clustering and morphology modelling) [12][13][14][15]. Color-index-based and threshold-based segmentation occupy fewer memory resources, but color-index-based segmentation is susceptible to light interference. As there is no significant difference from the green leaf base to the green leaf edge, it is difficult to achieve crop canopy segmentation based on a pixel feature threshold. Moreover, the green background is one of the main reasons for poor segmentation quality. For some learning-based methods, such as FC and PSO-MM, poor real-time performance is a common drawback. For the last step, a variety of navigation line detection methods have been proposed in recent years. Generally, these methods are classified into several categories according to their detection principles, such as HT (Hough transform), LR (linear regression), SA (speckle analysis), SV (stereo vision), and HF (horizontal fringes) [16][17][18][19][20]. It is worth noting that the crop row is approximated as a straight line in the above methods. However, actual crop rows are discrete broken lines, and a more specific route is often needed during slow-speed, high-precision navigation for weeding, fixed-point fertilization, and so on. After analysis, it was found that piecewise linearization of the crop row is a good way to address the deficiencies of the existing research.
In order to stabilize navigation, colorful landmarks are often deliberately created for route tracking. As mentioned, landmark location under the near color background still poses great challenges. Fortunately, deep learning has developed rapidly in recent years [21,22]. It has successfully solved various recognition problems in speech recognition, image processing, and natural language processing, and good results have also been achieved in crop segmentation and recognition [23][24][25][26][27]. Based on these facts, a good solution is to apply deep learning to landmark location under the near color background, which improves the segmentation effect and further ensures the robustness of visual navigation. In this study, a fine-tuned Inception-V3 deep convolutional neural network framework is developed for landmark recognition [28]. First, the block recognition method based on the fine-tuned Inception-V3 is developed; then, the maize canopy landmark is recognized with this method; finally, piecewise linearized local navigation lines are extracted from the landmarks based on the maize canopy gray gradient law.

Maize Samples in Laboratory Environment
Cultivated maize seedlings were used as ground landmarks for visual navigation in a laboratory environment, as shown in Figure 1. Up to now, "Zhengdan 958" has been the most widely grown maize variety in China, planted on nearly 30 million hectares across the country. In order to make the experiment representative, "Zhengdan 958" seedlings were used as the canopy recognition objects in the laboratory environment. Based on an estimate of canopy and root growth, the maize seedlings were planted in white pots with a diameter of 18 cm. In addition, maize is adapted to the hot summer and early autumn and has high requirements for temperature and light intensity; therefore, the temperature was adjusted to 30 degrees Celsius, and the light intensity was 10,000 to 60,000 lx.

Maize Samples in Farmland Environment
The research was carried out in the State Key Laboratory of Crop Science, and the farmland environment samples were from Mao Village Crop Cultivation Base, Zhengzhou. In order to facilitate mechanized operation and intelligent recognition, the maize plants were planted according to agronomic requirements, with a row spacing of 60 cm and a plant spacing of 40 cm [29]. A training set and a verification set were created as follows: first, 1000 maize plant images and 1000 vegetation background images were captured in different scenes, times, weather, and light conditions; then, the captured images were divided into image blocks of a size that can be recognized by the fine-tuned Inception-V3; finally, 2800 image blocks were randomly selected as the training set and 800 image blocks as the verification set.
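The random split described above can be sketched as follows; the `split_blocks` helper and the total block count of 3600 are illustrative assumptions, not code or data from the paper:

```python
import random

def split_blocks(blocks, n_train=2800, n_val=800, seed=42):
    """Randomly partition image blocks into a training set and a verification set."""
    assert len(blocks) >= n_train + n_val
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = blocks[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

blocks = list(range(3600))          # stand-ins for 3600 image blocks
train, val = split_blocks(blocks)   # 2800 training / 800 verification blocks
```

Fixing the random seed keeps the training/verification split reproducible across runs, which matters when comparing classifiers on the same data.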

UAV Image Acquisition Device
As shown in Figure 2, the 's70w' UAV, with a size of 40 cm × 40 cm × 16 cm, was used for maize canopy image acquisition; it carries a nine-axis high-precision gyroscope and GPS sensors. Its main functions include barometric altitude hold; 2.4 GHz remote control; real-time picture transmission; one-button take-off, return, and landing; and pointing follow, which greatly facilitated the image acquisition work. The flight time was about 15 minutes, and the picture quality was 1080p, which ensured the canopy image quality.


Deep Learning Workstation
The parameters of the workstation are shown in Table 1.


Image Preprocessing
The data enhancement method can greatly increase the sample size of a training data set and the generalization ability of the network model.Most deep learning frameworks have some basic functions, which can directly implement common data conversion.In order to build a specific image recognition system, our task is to determine a meaningful transformation method for existing data sets.Common conversion methods include: Pixel color jitter, rotation, cutting, random clipping, horizontal flip, lens stretching, and lens correction, etc.
In a farmland environment, the spatial orientation of objects depends on the shooting position, and it is impractical to photograph maize leaves from every possible angle. Image rotation is an effective preprocessing method to cover these cases. The pixel position under the rotation transformation is given by Equation (1):

x = x0 cos θ − y0 sin θ, y = x0 sin θ + y0 cos θ, (1)

where x0, y0 are the original pixel coordinates, θ is the rotation angle, and x and y are the coordinates after rotation.
As shown in Figure 3, each image is rotated to generate four images, in which the angles of rotation are 90°, 180°, and 270°, and there is mirror symmetry.
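A minimal sketch of the rotation transform of Equation (1); the helper name `rotate_pixel` is ours, not from the paper:

```python
import math

def rotate_pixel(x0, y0, theta_deg):
    """Rotate pixel coordinates (x0, y0) about the origin by theta degrees,
    following the rotation transform of Equation (1)."""
    t = math.radians(theta_deg)
    x = x0 * math.cos(t) - y0 * math.sin(t)
    y = x0 * math.sin(t) + y0 * math.cos(t)
    return x, y
```

For data augmentation the angles used are the 90° multiples of Figure 3, which keep the image grid aligned and need no interpolation.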


Fine-Tuned Inception-v3 and Network Layer Feature Representation
In order to classify image blocks, the fine-tuned Inception-V3 classifier was designed as shown in Figure 5. First, the block features were extracted by Inception-V3, which includes convolution, pooling, and mixed layers. Figure 4 shows the feature representation at different layers. Then, the pool_2 layer was connected to the feature classifier, which consists of a FC (fully connected) layer and a soft-max layer. Finally, the probability distribution results were output. Among them, the filter is the basic functional unit of the above network layers, as shown in Figure 6. The data of the former convolution layer, X_i^(L−1) (an image or a feature map), are processed by filter k_ij^L to generate the data of the latter convolution layer, X_j^L (a new feature map), which can be expressed as Equation (2):

X_j^L = f(Σ_i X_i^(L−1) * k_ij^L + b_j^L), (2)

The lower sampling layer is also called the pooling layer. Its operation is basically the same as that of the convolution layer, except that the convolution core of the lower sampling only takes the maximum or average value of the corresponding positions (maximum pooling or average pooling) and is not modified by back propagation, as shown in Equation (3):

X_j^L = f(β_j^L · down(X_j^(L−1)) + b_j^L), (3)

where '*' is the convolution operation, b_j^L is the bias term, f(.) is the activation function, β_j^L is the weight term, and down(.) is the down-sampling function.
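The layer operations of Equations (2) and (3) can be illustrated with a minimal single-channel sketch; the function names are ours, and, as in most deep learning frameworks, the "convolution" is implemented as cross-correlation:

```python
import numpy as np

def conv2d_valid(x, k, b):
    """One term of Equation (2): single-channel 'valid' convolution
    (cross-correlation) of feature map x with filter k, plus bias b."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k) + b
    return out

def max_pool(x, size=2):
    """The down(.) operation of Equation (3): non-overlapping max pooling."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

In the full network these per-channel terms are summed over input channels and passed through f(.), and frameworks fuse the loops into a single optimized kernel.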

Loss Function
Before the emergence of CNNs, the RBF function was often used as the activation function of single-hidden-layer neural networks because of its excellent local approximation property. However, the RBF function does not have the good probabilistic characteristics of the soft-max function. Therefore, the soft-max function is generally adopted as the activation function of the output layer of a CNN. For the soft-max function, the probability that the image block, x, belongs to class n is expressed as Equation (4):

P(y = n | x) = exp(z_n^x) / Σ_j exp(z_j^x), (4)

An extreme example illustrates the probabilistic property of soft-max: Equation (4) shows that if one of the z_j^x is far greater than the others, its mapped value approaches 1 while the rest approach 0. Next, the cross-entropy loss function was used to measure the prediction performance for M groups of N-class samples, as shown in Equation (5):

Loss = −(1/M) Σ_m Σ_n y_mn log(ŷ_mn), (5)

where y_mn is the true label and ŷ_mn is the predicted probability.
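A minimal numeric sketch of Equations (4) and (5), including the extreme example discussed above; the helper names are ours:

```python
import numpy as np

def softmax(z):
    """Equation (4): soft-max over logits z (shifted by max(z) for stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, one_hot):
    """Equation (5) for a single sample: -sum(y * log(y_hat))."""
    return -np.sum(one_hot * np.log(p))

# Extreme example: one logit far greater than the rest.
z = np.array([10.0, 0.0, 0.0])
p = softmax(z)   # p[0] is close to 1, the others close to 0
```

With equal logits the soft-max output is uniform, and the cross-entropy against a one-hot label reduces to log of the class count, which is a useful sanity check when debugging a classifier head.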


Weight Optimization
In order to prevent the activation values from becoming too large or too small, the weight scale was set to w = 1/n, and a regularized Gaussian distribution was used to initialize the weights of the fully connected layer, as shown in Equation (6):

W^L = G(0, 1/n), (6)

where W^L is the weight matrix of layer L and G(.) is the regularized Gaussian distribution function.
For the Inception-V3 classifier, the activation value is denoted as X^(L−1) and the forward propagation can be expressed as Equations (7) and (8):

Z^L = W^L X^(L−1) + b^L, (7)
Y^L = f(Z^L), (8)

where W^L is the weight of the fully connected layer, b^L is the bias term, and Y^L is the activation value calculated by the activation function f(.).
In addition, the error, ε^L, produced by soft-max back propagation can be expressed as Equation (9), where ∇ is the gradient operator, ⊙ is the Hadamard product operator used for point-to-point multiplication between matrices or vectors, and Δ(.) is the difference operator.
Finally, the SGD training parameters can be expressed as Equations (10) and (11):

W^L ← W^L − η ∂Loss/∂W^L, (10)
b^L ← b^L − η ∂Loss/∂b^L, (11)

where η is the learning rate.
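Equations (6)-(8), (10), and (11) can be sketched for one fully connected layer as follows; the helpers are ours, ReLU is assumed as f(.) for illustration, and the gradients passed to the update are placeholders for the back-propagated values:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out):
    """Equation (6)-style initialisation: zero-mean Gaussian scaled by 1/n_in."""
    return rng.normal(0.0, 1.0 / n_in, size=(n_out, n_in))

def forward(W, b, x):
    """Equations (7) and (8): Z = W x + b, then Y = f(Z), with ReLU as f."""
    z = W @ x + b
    return np.maximum(z, 0.0)

def sgd_step(param, grad, eta=0.01):
    """Equations (10)/(11): param <- param - eta * grad."""
    return param - eta * grad
```

In practice the gradients come from the back-propagated error of Equation (9); here the update rule itself is the point being illustrated.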

Other Parameter Settings
Table 2 shows the specific parameter settings. In general, if η is too small, the model converges too slowly; if it is too large, the cost function oscillates and the optimal solution is not reached. The batch size determines the direction of optimization: the bigger the batch size, the more accurate the descent direction, but an excessive batch size leads to high memory occupancy. Weight decay multiplies the regular term and controls the weight of the regularization term in the loss function. The choice of epoch number varies with the data set: when the epoch number is too small, under-fitting occurs; when it is too large, over-fitting occurs.

Inception-V3 Network Training
Figure 7A shows that the average recognition accuracy is about 0.95 after 500 iterations, which shows that the Inception-V3 classifier has good recognition ability for maize image blocks. At the same time, the validation curve and training curve have good consistency, and the distribution range of the residuals is narrow, which indicates that the network has strong generalization ability under different backgrounds and weather. Figure 7B shows that the loss quickly reaches the requirement after 250 iterations, which shows that the SGD optimization algorithm makes the weights converge quickly.


Design of the Mask Extractor
After the implementation of the Inception-V3 classifier, the block mask extractor needs to be designed. First, the Inception-V3 classifier was integrated into a classifier array, which converts each patch into a class matrix; then, the class matrix was transformed into block classification labels for the maize regions; finally, the mask was generated according to the labels, as shown in Figure 8.
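The final step, expanding a per-block class matrix into a pixel-level mask, can be sketched as follows; the `blocks_to_mask` helper and the 2 × 2 label matrix are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def blocks_to_mask(class_matrix, block_size):
    """Expand a per-block class matrix (1 = maize, 0 = background) into a
    pixel-level binary mask by repeating each label over its block area."""
    tile = np.ones((block_size, block_size), dtype=np.uint8)
    return np.kron(class_matrix, tile)

labels = np.array([[1, 0],
                   [0, 1]], dtype=np.uint8)   # hypothetical 2 x 2 class matrix
mask = blocks_to_mask(labels, 4)              # 8 x 8 pixel mask
```

`np.kron` replaces each entry of the label matrix with a `block_size × block_size` tile of that value, which is exactly the label-to-mask expansion described above.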


Maize Canopy Landmark Location and Local Route Tracking
Previous studies have found that the canopy of maize plants has a gray gradient distribution law in the radial direction (Figure 9B) (Wang et al., 2018). The canopy landmarks in the maize region were identified according to this law (Figure 9C). As a result, local route tracking was realized based on the maize canopy landmark information and the canopy gray gradient distribution (Figure 9D).


The Effect of Neural Technologies for Fine-Tuned Inception-V3
In order to verify the influence of image preprocessing, a contrast experiment was performed under the same experimental conditions. The experiment showed that the Inception-V3 network with image preprocessing obtains an accuracy of 95.01%, while the proposed model without image preprocessing only obtains an accuracy of 93.15%; the recognition accuracy is thus improved by about 1.86 percentage points. This shows that image enhancement can improve the generalization ability and robustness of the model by increasing the amount of training data. In addition, batch normalization and dropout bring obvious improvements: when these model enhancement technologies are used simultaneously, the accuracy on the verification set rises from about 90.15% to about 95.01%.
In the process of model training, it was found that not all optimization algorithms converge quickly; for example, compare the convergence of the RMSProp and SGD algorithms.
As shown in Figure 10, the training accuracy of RMSProp is still below 0.5 after the same number of iterations, while the SGD algorithm quickly reaches above 0.8. Compared with RMSProp, the SGD algorithm converges quickly. Generally, the SGD optimization algorithm has a stable convergence rate, can optimize the network with a minimum number of training iterations, and helps prevent over-fitting.

Performance of Fine-Tuned Inception-V3 on the Training and Test Set
Figure 11 shows that the accuracy of BP-Net is the lowest, averaging 0.64. SVM is an excellent algorithm for small-sample classification; it converges after fewer iterations, but its recognition accuracy is also relatively low, averaging 0.68. As Inception-V3 is able to extract general features, only the back-end network needs to be fine-tuned. Compared with the former two, it achieves excellent classification accuracy on the training set, averaging 0.95. Then, taking VGG16 and the original Inception-V3 as benchmarks, the fine-tuned Inception-V3 was evaluated on the test set.

Here, MAP (mean average precision) addresses the limitation of the single-valued P, R, and F-measure by taking the ranking of the retrieval results into account. Table 3 shows that the recognition accuracy and speed of the fine-tuned Inception-V3 on the test set were not significantly decreased.


Recognition Results of Maize Region Under the Background of Non-Significant Color Differences
Figure 12 shows that the green vegetation in the background is highly similar to the green maize plants, and some green background, such as weeds, is identified as part of the maize region. However, the misidentified blocks are in an "island" state and can easily be filtered out by post-processing [30]. In general, the algorithm realizes the recognition of the maize region under the background of non-significant color differences. Next, the accuracy and effect of the block recognition were evaluated and compared with classical green vegetation segmentation methods, as shown in Table 4. As there is a color cross between the green vegetation and the green maize plants, the color-based and threshold-based methods, such as COM1 and AT in Table 4, segment poorly. The clustering method is based on the principle that similar features are grouped into the same super-pixel, so green vegetation and green maize plants are recognized as the same super-pixel. Moreover, the clustering method is time-consuming; for example, the time of PSO-MM is 0.215 s, as shown in Table 4. Compared with the above methods, at a similar time cost, good segmentation results can be obtained by the mask extractor based on the fine-tuned Inception-V3.

Inception-V3 Performance of the Mask Extractor under Different Block Numbers
It is necessary to quantitatively evaluate the impact of the block number on recognition performance. Precision and recall were used as the main criteria to evaluate the segmentation effect. Typically, there are four kinds of classification results: TP (true positives, positive classes judged as positive), FN (false negatives, positive classes judged as negative), FP (false positives, negative classes judged as positive), and TN (true negatives, negative classes judged as negative). As a result, precision and recall can be expressed as Equations (12) and (13):

P = TP/(TP + FP), (12)
R = TP/(TP + FN), (13)

These two parameters can be used to evaluate the segmentation performance comprehensively. The evaluations are shown in Table 5. Table 5 shows that when the number of blocks is in the range (64, 144), the P-mean is high, but the R-mean is very low, which indicates that the integrity of the mask cannot be guaranteed when the number of blocks is small. The performance of the mask extractor is best when the number of blocks is 256. The P-variance and R-variance are large when the number of blocks exceeds 256, which indicates that the performance of the Inception-V3 classifier decreases when the input image blocks are small.
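Equations (12) and (13) in code, with hypothetical classification counts for illustration:

```python
def precision_recall(tp, fp, fn):
    """Equations (12) and (13): P = TP/(TP+FP), R = TP/(TP+FN)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r

# Hypothetical counts: 90 maize blocks found correctly, 10 background blocks
# misjudged as maize, 30 maize blocks missed.
p, r = precision_recall(tp=90, fp=10, fn=30)
```

A high P with low R, as in the (64, 144) block range above, corresponds to few false positives but many missed maize blocks, i.e. an incomplete mask.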

Comparison of the Ideal Route and Visual Route
To verify the feasibility and accuracy of route planning in the field environment, the experiment was carried out in the maize plot. Figure 13 shows the performance of the visual route tracking, taking the route planned in the laboratory environment as the benchmark. It shows that, except for individual outliers, the visual path fluctuates around the ideal path within a small range and has a stable positive and negative standard deviation.
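The comparison above can be summarized numerically by the per-point lateral deviation of the visual route from the benchmark, together with its mean and standard deviation. This is a hedged sketch; the sample coordinates below are illustrative values, not measurements from the experiment.

```python
# Compare the visual route against the ideal (benchmark) route by the
# lateral offset at each sample point along the flight direction.
import statistics

def route_deviation(visual, ideal):
    """Per-point offset of the visual route from the ideal route."""
    return [v - i for v, i in zip(visual, ideal)]

# Lateral positions (e.g., in cm) at matched sample points; illustrative.
ideal = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
visual = [1.2, -0.8, 0.5, -0.3, 0.9, -1.1]

dev = route_deviation(visual, ideal)
print(round(statistics.mean(dev), 3), round(statistics.stdev(dev), 3))
```

A small mean deviation with a stable standard deviation is what "fluctuates around the ideal path within a small range" corresponds to quantitatively.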

Conclusions
This study set out to research the process of maize canopy landmark location against a background with non-significant color differences and realized visual route planning for agricultural intelligent equipment. This research has shown that there is a robust and effective block recognition method to identify landmarks based on the fine-tuned Inception-V3 classifier. The findings of this study suggest that stable visual semantic navigation can be achieved against a background with non-significant color differences. This research will serve as a base for future studies of visual navigation based on crop landmarks. A limitation of this study was that the optimal block size was bound by the Inception-V3 network structure. More research is required to eliminate this limitation and promote the practical application of the method.

Figure 2. Unmanned aerial vehicle and image acquisition device.

Figure 3. Image rotation: (A) initial; (B) 90°; (C) 180°; (D) 270°; and (E) mirror symmetry.

Furthermore, the change of illumination causes great interference to target recognition under the near color background. To improve the generalization ability of the CNN (Convolutional Neural Network), it is essential to train on images under different illumination conditions. The corresponding images were generated by adjusting the brightness, contrast, and sharpness, as shown in Figure 4.

Figure 6. Feature representation at different network layers.

Figure 9. Maize canopy landmark location and local route tracking.

Figure 11. Contrast of training characteristics of classifiers.

Symmetry 2019, 16
Figure 12. Recognition results of the maize region under the background of non-significant color differences.

Figure 13. Comparison of the ideal route and visual route.

Table 1. Parameters of the workstation.

Table 2. Other parameter settings.

Table 3. Comparison of the recognition accuracy of the classifier on the test set.

Table 4. Comparison of the segmentation results with different segmentation methods.

Table 5. Recognition results of the mask extractor under different block numbers.