1. Introduction
Agriculture is essential for human survival and the development of the national economy [1]. Fruits represent a key agricultural component, with China leading global apple production and ranking third in citrus production [2,3]. Most traditional orchards are transitioning into modern systems, with trellis cultivation emerging as the predominant method due to its numerous advantages. The trellis cultivation model offers strong resistance to adverse conditions, high fruit quality, high unit efficiency, easy field management, and efficient use of land, water, and light resources, making it widely used in the production of kiwifruit, pears, and grapes [4]. In the trellis orchard cultivation model, fruit trees are planted at fixed intervals, and most branches are tied to supports made of steel pipes and wires through pruning and other horticultural techniques. The tree canopy is concentrated on a horizontal plane at a certain height, creating straight pathways between the trees. This organized structure surpasses that of traditional orchards, providing stronger support for the advancement of orchard mechanization.
Automatic navigation of orchard vehicles is a necessary method for mechanized operations in orchards [5,6], and autonomous positioning and navigation technology is a prerequisite for achieving this [7,8]. This technology uses satellites, LiDAR, or vision-based systems to enable the robot to sense its surrounding environment, thereby achieving precise self-positioning, target trajectory tracking, and obstacle detection [9]. However, in trellis orchards, the tree canopies are primarily positioned at the upper level in a horizontal arrangement, the trunks are relatively slender, and both the canopies and fruits are mainly distributed in the middle and upper sections. Additionally, there are many weeds on the ground. These characteristics make path recognition in trellis orchards more complex and challenging to study.
With the development of navigation technology, many navigation methods have been created [10]. Among these methods, satellite-based positioning is the most widely used. For example, Xiong Bin et al. designed an automatic spraying machine using the BeiDou satellite navigation system (N71J-BDS, CTI, Shenzhen, China), which can plan navigation paths based on the terrain characteristics of orchard environments [11]. Liu Zhaopeng et al. developed a navigation system for a sprayer using RTK-GNSS dual-antenna sensors (BD982, Trimble, Westminster, CO, USA), enabling straight-line driving and headland turns [12]. In trellis orchards, however, satellite navigation is often obstructed by dense tree canopies and branches, leading to signal loss and making it difficult to achieve stable and continuous navigation. As LiDAR technology has advanced, more researchers have utilized it for navigation. Zhang J et al. designed an automatic orchard vehicle navigation system that generates navigation reference lines using LiDAR (SICK LMS511-20100 PRO®, Waldkirch, Germany), allowing the vehicle to navigate autonomously at a speed of 0.8 m/s [13]. Yue Shen et al. developed a posture-adjustable laser scanner to scan artificial trees on uneven road surfaces [14]. However, in trellis orchards, LiDAR is easily affected by weeds and branches, making it difficult to extract accurate navigation lines. Additionally, the similar morphological characteristics throughout the orchard, without obvious distinguishing features, make precise positioning via point cloud matching challenging. These systems require near-perfect working environments and are relatively expensive, making them unsuitable for agricultural applications.
With the advancement of depth cameras, an increasing number of researchers are using vision-based navigation. Neto proposed monocular vision-based navigation software that utilizes distributed processing, Sobel operators combined with a region-growing strategy, and the maximum between-class variance method to detect road edges [15]. Zhang Tiemin et al. used the 2G-R-B color difference method to identify green plants, extract edge information points, and fit the centerline [16]; they applied the Hough transform to fit multiple lines for path extraction. Cheng HY et al. combined the Mean Shift technique with Bayesian theory to effectively classify irregular road areas and identify suitable road boundaries [17]. Hong-Kun et al. used a Naive Bayesian classifier to distinguish between tree trunks and the ground in orchards and developed a new algorithm to accurately recognize orchard paths [18]. Radcliffe designed a vision-based navigation system that captures sky and canopy information by leveraging the clear contrast between the sky and the tree canopy; the root mean square error (RMSE) of this system is 2.13 cm [19].
In the trellis orchard environment, the presence of shadows, light spots, and varying light intensities complicates vision-based navigation [20]. The performance of these methods is influenced by several factors, such as the distribution of weeds, different lighting conditions, and the overlap or obstruction between crops and weeds. These factors make the real-world environment highly complex, and there is an urgent need for an automated and efficient algorithm that can function under such challenging conditions. In recent years, significant improvements in computing power and advances in deep learning frameworks have driven innovations in many fields. Semantic segmentation techniques based on convolutional neural networks (CNNs) have enabled efficient feature extraction from regions of interest, significantly enhancing the accuracy of object–background separation in complex environments [21]. However, deep neural network structures are complex, making it difficult to identify an effective semantic segmentation network. In trellis orchards, light variations further impact performance, requiring a lightweight network structure for accurate path and tree row segmentation.
Most research in this field has focused on relatively simple road environments, while trellis orchards present a more complex scenario. This paper proposes a deep network model tailored to the complex environment of trellis orchards, achieving accurate path segmentation. To validate the proposed model, a comparative study of three network models was conducted. Additionally, a trellis orchard path recognition method based on attention mechanisms is introduced, combining the practical requirements of orchard environment recognition to build a model of the orchard environment.
2. Material and Methods
2.1. Data Acquisition
This study focuses on a standard pear orchard in Jurong, Jiangsu Province. The photos were taken in June 2023. The distance between rows is 4 m, the distance between trees is 3 m, and the trellis height is 1.8 m. The trunk height of the pear trees is between 0.9 and 1.0 m. A Kinect V2 camera, with a resolution of 1920 × 1080 and a frame rate of 30 fps, was used to capture the orchard paths, and the photos were taken between 8 a.m. and 6 p.m. to include images with different light intensities for diversity. A total of 400 images were selected after excluding blurred ones. Data augmentation techniques were applied to increase diversity and improve the model's generalization ability and robustness. The images were rotated clockwise and counterclockwise by 10°, 20°, and 30°, as shown in Figure 1, increasing the number of images from 400 to 2400. The dataset was then divided into training and test sets in a 3:1 ratio. Figure 1 shows the images after data augmentation. Path labeling was conducted using the "Labelme" tool (CSAIL, MIT, Cambridge, MA, USA), with the designated label being "road".
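As a concrete illustration of this augmentation step, the following Python sketch (using the Pillow library; the folder names are hypothetical placeholders, not the paths used in this study) applies the six rotations to each source image, expanding 400 images into 2400:

import os
from PIL import Image

SRC_DIR = "orchard_images"       # hypothetical input folder
DST_DIR = "orchard_augmented"    # hypothetical output folder
ANGLES = [10, 20, 30, -10, -20, -30]  # clockwise and counterclockwise rotations

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name))
    for angle in ANGLES:
        # rotate about the image center; uncovered corners are filled with black
        rotated = img.rotate(angle)
        rotated.save(os.path.join(DST_DIR, f"{angle:+d}_{name}"))

Each of the six rotations yields one new image per original, which is how 400 source images become 2400 augmented images.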
2.2. Convolutional Neural Network
The U-Net (UNet), Fully Convolutional Network (FCN), and Segmentation Network (SegNet) algorithms are generally based on an encoder–decoder architecture. Both UNet and FCN incorporate skip connections, while SegNet employs unpooling for upsampling, relying heavily on unpooling layers and stepwise reconstruction within the decoder. The input image first passes through the encoder module, which consists of multiple convolutional layers and pooling layers that progressively extract features and reduce spatial resolution. The encoder output is then connected to a bottleneck layer to further compress feature information. Subsequently, in the decoder module, a series of deconvolutional layers and upsampling layers restore the spatial resolution, generating an output image that matches the input size. Each convolutional and deconvolutional operation is followed by a ReLU activation function to introduce non-linearity. The final output image retains the same size as the input image, as shown in Figure 2.
The UNet architecture is a symmetric encoder–decoder structure designed for image segmentation tasks [22]. The encoder part consists of multiple convolutional layers and max-pooling operations, which extract features at different scales while progressively reducing the resolution of the feature maps. The decoder part uses transposed convolutions (upsampling) to gradually restore the spatial resolution of the image. Additionally, skip connections are employed at each layer to fuse the features from the corresponding encoder layers, preserving detailed information. This structure enables precise localization and segmentation of targets within the image, making it especially effective for tasks requiring fine-grained segmentation.
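To make the skip-connection idea concrete, a minimal PyTorch sketch of one UNet decoder step is shown below (channel sizes and layer choices are illustrative assumptions rather than the exact configuration evaluated in this paper):

import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One UNet decoder step: upsample, concatenate the encoder feature, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # transposed convolution doubles resolution
        x = torch.cat([x, skip], dim=1)  # skip connection: fuse encoder features
        return self.conv(x)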
The Fully Convolutional Network (FCN) is another type of CNN used for image segmentation [23]. It replaces the fully connected layers of a traditional CNN with convolutional layers, allowing pixel-level predictions for input images of any size. The FCN extracts features using convolution and pooling layers, and then up-samples the feature maps through transposed convolutions to restore the original resolution. Skip connections merge spatial information from shallow layers with semantic information from deeper layers, enhancing segmentation accuracy. As an innovative model in image segmentation, FCN enables efficient, end-to-end processing.
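The key idea, replacing the fully connected classifier with convolutions so that inputs of any size yield dense per-pixel predictions, can be sketched as follows (a deliberately tiny backbone for illustration, not the FCN variant trained here):

import torch.nn as nn

class TinyFCN(nn.Module):
    """Fully convolutional head: a 1x1 convolution replaces the fully connected
    classifier, and a transposed convolution restores the input resolution."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(   # downsamples by a factor of 4
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.classifier = nn.Conv2d(128, num_classes, 1)   # per-pixel class scores
        self.upsample = nn.ConvTranspose2d(num_classes, num_classes,
                                           kernel_size=4, stride=4)

    def forward(self, x):
        return self.upsample(self.classifier(self.features(x)))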
SegNet is a convolutional neural network specifically designed for semantic segmentation, featuring an encoder–decoder architecture [24]. The encoder part consists of multiple convolutional layers and max-pooling layers, progressively extracting image features and reducing feature-map resolution, similar to the VGG16 network. In the decoder part, transposed convolutions are used for upsampling, restoring the feature maps to the original image resolution. During the decoding process, SegNet leverages max-pooling indices from the encoder to retain spatial information, allowing for better detail recovery during upsampling. This design enables SegNet to achieve high-precision pixel-level segmentation while maintaining robustness and flexibility, making it widely applicable in fields such as autonomous driving.
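The index-preserving unpooling that distinguishes SegNet can be demonstrated directly in PyTorch, where max pooling can return the locations of the maxima for the decoder to reuse (a minimal sketch):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 64, 128, 128)
y, indices = pool(x)     # encoder: keep the locations of the maxima
z = unpool(y, indices)   # decoder: place values back at those exact locations
print(z.shape)           # torch.Size([1, 64, 128, 128])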
2.3. Attention Mechanism
Accurate recognition of edge information is essential for effective orchard navigation. The attention mechanism enhances this process by continuously focusing on critical areas, capturing relevant information while filtering out irrelevant details, which significantly improves the efficiency and accuracy of information processing. There are three types of attention mechanisms: the channel attention mechanism, the spatial attention mechanism, and the combined attention mechanism. By focusing on both spatial and channel dimensions, these mechanisms can better capture important information within the features.
The Channel Attention Mechanism (CAM) enhances the performance of convolutional neural networks by adaptively adjusting the weights of features in each channel, thereby emphasizing important features and suppressing less significant ones [25]. This mechanism typically uses global average pooling or global max pooling to compress the spatial information of the feature map into channel descriptors. These descriptors are then processed through a series of fully connected layers to generate attention weights for each channel. These weights are multiplied element-wise with the original feature map, achieving a weighted adjustment of the feature map and ultimately increasing the model's focus on key features and enhancing its expressiveness. First, the input feature map $F \in \mathbb{R}^{C \times H \times W}$ (where $C$ is the number of channels, and $H$ and $W$ are the height and width of the feature map) is processed: a global average pooling operation generates the channel descriptor vector $Z \in \mathbb{R}^{C}$, which represents the global features of each channel. The channel descriptor vector $Z$ is then input into a small fully connected neural network, and an activation function is applied to generate the channel attention weights $M_c \in \mathbb{R}^{C}$, representing the importance of each channel. The generated attention map is multiplied element-wise with the original feature map to obtain the weighted feature map $F' = M_c \otimes F$, where $\otimes$ denotes element-wise multiplication. The feature map processed by the channel attention mechanism can then serve as input for subsequent layers of the network, improving the final classification, detection, or segmentation results.

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j), \qquad M_c = \sigma\left(W_2\,\delta(W_1 Z)\right), \qquad F' = M_c \otimes F$$

where $z_c$ is the compressed value of the $c$-th channel; $W_1$ and $W_2$ are learnable weight matrices; $\delta$ is the ReLU activation function; $\sigma$ is the Sigmoid activation function; $M_c$ is the generated attention map; and $F$ is the original feature map.
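A compact PyTorch implementation of this channel attention is sketched below (an SE-style module consistent with the equations above; the reduction ratio r = 16 is an illustrative assumption):

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(                 # W1, ReLU, W2, Sigmoid
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, H, W)
        z = x.mean(dim=(2, 3))                       # global average pooling -> (B, C)
        m = self.fc(z).unsqueeze(-1).unsqueeze(-1)   # channel weights M_c: (B, C, 1, 1)
        return x * m                                 # element-wise reweighting of F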
The Spatial Attention Mechanism (SAM) is a technique in deep learning used to process spatial information, commonly applied in image processing, computer vision, and natural language processing [26]. The primary goal of this mechanism is to allow the model to allocate different levels of attention to different positions in the input data, enabling more flexible handling of local information. The input feature map is represented as $F \in \mathbb{R}^{C \times H \times W}$ (where $C$ is the number of channels, and $H$ and $W$ are the height and width of the feature map). The spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$ is computed using convolution operations (typically a 1 × 1 convolution), capturing the importance of various spatial locations.

$$M_s = \sigma\left(f^{1 \times 1}(F)\right), \qquad F' = M_s \otimes F$$

where $F$ represents the input feature map; $M_s$ denotes the spatial attention map; $\sigma$ is the Sigmoid activation function; and $F'$ is the resulting weighted feature map.
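A matching sketch of the spatial attention, using the 1 × 1 convolution mentioned above (the single-layer convolution is an assumption about the exact form):

import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel score
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                  # x: (B, C, H, W)
        m = self.sigmoid(self.conv(x))     # spatial attention map M_s: (B, 1, H, W)
        return x * m                       # broadcast over channels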
The Convolutional Block Attention Module (CBAM) combines channel and spatial attention to enhance the neural network's focus on both kinds of information [27]. This mechanism enables the network to selectively concentrate on specific regions and channels of the input data, improving the model's ability to extract important features. CBAM consists of two main modules: the channel attention module and the spatial attention module. Global pooling is applied to the feature map to obtain the channel descriptors, and a feedforward neural network generates the channel attention weights $M_c$, which emphasize the important channels. The spatial attention $M_s$ is applied to the feature map $F$, resulting in the weighted feature map $F'$. The channel attention $M_c$ is then applied to $F'$, yielding the final feature map $F''$. The resulting weighted feature map $F''$ can be used as input for subsequent layers of the network [28].

$$M_c(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right)$$
$$M_s(F) = \sigma\left(f\left([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]\right)\right)$$
$$F' = M_s(F) \otimes F, \qquad F'' = M_c(F') \otimes F'$$

where $\mathrm{AvgPool}$ represents the average pooling operation; $\mathrm{MaxPool}$ denotes the maximum pooling operation; $[\,\cdot\,;\,\cdot\,]$ refers to concatenation of the feature maps along the third (channel) dimension; and $M_c$ and $M_s$ signify the attention features.
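Composing the two modules in the order described above (spatial attention first, then channel attention) gives a CBAM-style block; this sketch reuses the simplified ChannelAttention and SpatialAttention modules from the preceding sketches rather than the dual-pooling form in the equations:

import torch.nn as nn

class CBAMBlock(nn.Module):
    """Sequential spatial-then-channel attention, following the order in the text."""
    def __init__(self, channels):
        super().__init__()
        self.sa = SpatialAttention(channels)  # defined in the sketch above
        self.ca = ChannelAttention(channels)  # defined in the sketch above

    def forward(self, x):
        x = self.sa(x)     # F'  = M_s(F) (x) F
        return self.ca(x)  # F'' = M_c(F') (x) F'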
2.4. Improved UNet Model
The UNet network was selected as the primary model for this study due to its accurate handling of boundary information and superior performance in target recognition under low-light conditions, making it particularly suitable for orchard path recognition tasks. Furthermore, compared to Fully Convolutional Networks (FCNs) and SegNet, UNet's symmetric encoding–decoding architecture and skip connection mechanism more effectively preserve and utilize multi-scale feature information, thereby enhancing robustness in complex environments.
In the improved UNet model, attention mechanisms are introduced at the end of each downsampling module, as shown in Figure 3. The input images of trellis orchard paths have a resolution of 512 × 512. These images pass through five effective feature layers in the main feature extraction part, where each layer applies two 3 × 3 convolutions and one 2 × 2 max-pooling operation. After each max-pooling operation, an attention mechanism is added. The images are downsampled four times, extracting detailed feature information. In the enhanced feature decoding section, the five decoded effective feature layers undergo upsampling, and feature fusion techniques are applied to merge all features into a single effective feature layer. Each layer in the decoding process first uses bilinear interpolation for upsampling, followed by concatenation with the corresponding shallow feature layer through convolution. This process continues until the final layer, which applies a 1 × 1 convolution for channel adjustment, producing the final segmentation output.
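One encoder stage of this design might look like the following PyTorch sketch (an illustration of the "two convolutions, max pooling, then attention" pattern; the CBAMBlock from Section 2.3 is reused, and the returned skip tensor feeds the decoder's concatenation step; this is not the authors' exact code):

import torch.nn as nn

class DownBlock(nn.Module):
    """Encoder stage: two 3x3 convolutions, 2x2 max pooling, then attention."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)
        self.attn = CBAMBlock(out_ch)  # attention added after max pooling

    def forward(self, x):
        skip = self.conv(x)            # kept for the decoder skip connection
        return self.attn(self.pool(skip)), skip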
2.5. Evaluating the Model’s Performance
Different network architecture models are primarily evaluated using three parameters: accuracy, recall, and Intersection over Union (IoU). These metrics assess the model's generalization ability and aid in adjusting its structure. Accuracy is a metric used to evaluate classification models, representing the ratio of correctly predicted instances to the total number of instances. Recall is the proportion of true positive samples that are correctly identified among all positive samples. Intersection over Union (IoU) is a fundamental metric used to quantify the overlap between predicted regions and ground truth regions in object detection and segmentation tasks. A higher IoU value indicates better alignment between the predicted and actual regions, reflecting greater accuracy of the model.
The number of true positive samples (labeled as positive and classified as positive) is referred to as True Positive (TP). The number of false negative samples (labeled as positive but classified as negative) is referred to as False Negative (FN). The number of false positive samples (labeled as negative but classified as positive) is referred to as False Positive (FP). The number of true negative samples (labeled as negative and classified as negative) is referred to as True Negative (TN).
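In terms of these four counts, the three metrics take their standard forms:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{IoU} = \frac{TP}{TP + FP + FN}$$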
2.6. Model Training Platform and Experimental Conditions
In our experiments, all models were trained in an Anaconda (Python 3.7) environment using PyCharm (PyCharm 2023, JetBrains, Prague, Czech Republic). GPU acceleration was utilized during both the training and testing phases. The experimental hardware environment included an Intel® Core™ i7-7700T (1.3 GHz base frequency, up to 3.9 GHz with Intel® Turbo Boost technology, 8 MB cache, and 4 cores) (Intel, Santa Clara, CA, USA). Since the images were captured at different sizes, all images were resized to 1920 × 1080 pixels. Three models were trained using the training and testing datasets. The entire dataset was divided into 70%, 10%, and 20% for training, validation, and testing, respectively. The Adam optimizer was utilized to enhance efficiency and minimize the loss function.
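The optimization setup can be summarized in a short PyTorch sketch (the cross-entropy loss is an assumption, as the text does not name the loss function; the model and data are dummies reusing the TinyFCN sketch from Section 2.2, with the learning rate and batch size following the values reported in Section 3.1):

import torch
import torch.nn as nn

model = TinyFCN(num_classes=2)     # reusing the sketch from Section 2.2
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()  # assumed loss for 2-class segmentation

# one illustrative step on dummy data (batch of 4 images at 512 x 512)
images = torch.randn(4, 3, 512, 512)
masks = torch.randint(0, 2, (4, 512, 512))
optimizer.zero_grad()
loss = criterion(model(images), masks)
loss.backward()
optimizer.step()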
3. Results
3.1. Overall Model Performance for Different Combinations of the Dataset
In the experiments, the UNet, FCN, and SegNet models were trained using a consistent input image size of 512 × 512, a convolution kernel size of 3 × 3, and a pooling window size of 2 × 2. The initial channel count was set to 64, with a learning rate of 0.001 and a batch size of 4. The Adam optimizer was used, and skip connections were implemented in the decoding stage to link corresponding encoding layers, enhancing feature recovery. To expand the training dataset, data augmentation techniques were applied, including angle rotation (±30°) and resizing, which were implemented via Python scripts. This enriched dataset enabled the training of the UNet, FCN, and SegNet models on the training data, followed by validation on 10% of the dataset. The accuracy of the three models after 250 epochs of training was then compared. The UNet model performed the best, achieving an accuracy of 97.32% and a recall of 94.49%.
3.2. Model Performance with Different Combinations of Attention Mechanisms
As shown in Figure 4, all three models effectively recognize the features of the pear orchard pathways under varying lighting conditions, demonstrating strong segmentation performance. The UNet model yields smoother results, particularly excelling in accurately fitting the edges of the pathways.
Due to variations in dataset quality and model architecture, the loss rates of the models were compared over 250 epochs, as shown in Figure 5. The training loss of the models decreased as the number of epochs increased. The UNet model stabilized after 120 iterations, indicating that it reached a steady state, with the loss value converging to 0.0094, the lowest among the three models. The SegNet model stabilized after 130 iterations, with its loss value converging to 0.0098, reflecting stable performance. In contrast, the FCN model exhibited significant oscillations after a rapid decrease in loss, suggesting difficulties in handling the complex dataset. Even after 200 iterations, oscillations persisted, with the loss value ultimately converging to 0.019.
In comparing the three networks in terms of accuracy, recall, and Intersection over Union (IoU), the UNet model outperforms the others across all metrics. Specifically, UNet's accuracy is 0.7% higher than that of the FCN and 0.4% higher than that of SegNet. In terms of recall, UNet improves upon the FCN by 1.2% and SegNet by 2.2%. Furthermore, UNet's IoU is 0.7% higher than that of the FCN and 1.3% higher than that of SegNet, as shown in Figure 6.
This study selects the UNet convolutional neural network as the overall network architecture, incorporating attention mechanisms at the end of each downsampling module, following the design described in Section 2.4: input images of the pear orchard pathways sized at 512 × 512 pixels pass through five effective feature layers in the backbone, each performing two 3 × 3 convolutions and one 2 × 2 max-pooling operation followed by the attention mechanism, and the decoder restores resolution through bilinear upsampling, concatenation with the corresponding shallow feature layers via convolution, and a final 1 × 1 convolution for channel adjustment to produce the output segmentation. The experimental parameters were set with an initial learning rate of 0.0001, a total of 250 epochs, a batch size of 2, and two segmentation classes.
Based on the experimental results, the UNet model demonstrates strong performance in accuracy, recall, and Intersection over Union (IoU). Further optimizations were made to the UNet model by adding three types of attention mechanisms at the end of each encoding stage, focusing on key areas of the images during the downsampling process. These enhanced models are named UNet_CAM, UNet_SAM, and UNet_CBAM.
As shown in Figure 7, the UNet_CBAM model demonstrates superior performance in terms of accuracy, recall, and Intersection over Union (IoU). Specifically, its accuracy is higher by 0.8% and 1.22% compared to the UNet_CAM and UNet_SAM models, respectively. Additionally, its recall is greater by 2.14% and 3.18%, and its IoU is higher by 1.3% and 2.49%, compared to the same two models.
Among the three network architectures, the UNet model outperforms the others in accuracy and was retained for predicting orchard field data. After training, the addition of the spatial and channel attention mechanisms resulted in high accuracy in path recognition, and the model exhibits high confidence in predicting both fruit trees and pathways.