Article

Design and Research on a Reed Field Obstacle Detection and Safety Warning System Based on Improved YOLOv8n

1 Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
2 Graduate School of Chinese Academy of Agricultural Sciences, Beijing 100083, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1158; https://doi.org/10.3390/agronomy15051158
Submission received: 17 March 2025 / Revised: 27 April 2025 / Accepted: 28 April 2025 / Published: 9 May 2025
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Unmanned agricultural machinery can significantly reduce labor intensity while substantially enhancing operational efficiency and production benefits. However, the presence of various obstacles in complex farmland environments is inevitable. Accurate and efficient obstacle recognition technology, along with a reliable safety warning system, is a crucial prerequisite for ensuring the safe and stable operation of unmanned agricultural machinery. This study proposes a lightweight model for farmland obstacle detection by improving the YOLOv8n object detection algorithm. Specifically, we introduce the Context-Guided Block (CG Block) in the C2f module and the Context-Guide Fusion Module (CGFM) in the Feature Pyramid Network (FPN) to enhance the model’s contextual information perception during feature extraction and fusion. Additionally, we employ a Lightweight Shared Convolutional Separable Batch Normalization Detection Head in the detection head, which significantly reduces the number of parameters while improving detection accuracy. Experimental results demonstrate that our method achieves a mean average precision (mAP) of 92.3% at 59.1 frames per second (FPS). The improved model reduces parameter count and computational complexity by 31.9% and 33.4%, respectively, with a model size of only 4.2 MB. Compared to other algorithms, the proposed model maintains an optimal balance between parameter efficiency, computational cost, detection speed, and accuracy, exhibiting distinct advantages. Furthermore, we propose a safety warning strategy based on the relative velocity and distance between obstacles and the unmanned agricultural machinery. Field experiments conducted under this strategy reveal an overall warning accuracy of up to 86%, verifying the reliability of the safety warning system. This ensures that unmanned agricultural machinery can effectively mitigate potential safety risks during field operations.

1. Introduction

Reeds (Phragmites australis) play a significant role in China’s agriculture and wetland conservation due to their extensive economic uses and ecological value [1]. In recent years, the planting area of reeds in China has surpassed 3 million acres, and the large-scale, intensive planting model has revealed numerous bottlenecks within traditional agriculture, such as labor shortages, low operational efficiency, and safety risks. To address these challenges, unmanned agricultural machinery has increasingly garnered the attention of researchers in the field of agricultural engineering [2]. However, obstacles such as people, poles, pylons, and stones are inevitably present in reed fields, posing numerous challenges for unmanned machinery operations. Therefore, the detection of obstacles in agricultural fields is an essential component for the realization of unmanned agricultural machinery. Onboard sensors enable real-time detection of obstacles in the machinery’s surrounding environment and measurement of their distance; when a collision risk is identified, the system issues an appropriate level of alert to the operator, ensuring the safety of agricultural operations.
The effectiveness and accuracy of farmland obstacle detection largely depend on the sensors employed. Currently, three types of sensors are widely used in the field of obstacle detection: LiDAR [3,4,5], millimeter-wave radar [6], and vision sensors [7,8]. Compared to LiDAR and millimeter-wave radar, vision sensors offer advantages such as lower cost, easier deployment, and the ability to capture richer environmental information [9]. Vision sensors primarily include monocular and stereo cameras. Monocular cameras have a simple structure but can only obtain two-dimensional (2D) information in non-specific environments. In contrast, stereo cameras not only capture 2D image data but also acquire three-dimensional (3D) spatial information [10]. Since this study requires both obstacle detection and localization, stereo cameras are the optimal choice. They provide scene images with depth data, striking a balance between accuracy and sensor cost.
Obstacle detection based on stereo vision requires capturing environmental image data through cameras, followed by the application of relevant object detection algorithms to identify obstacles in the images [11]. Currently, object detection algorithms can be categorized into two main types: traditional object detection algorithms and deep learning-based object detection algorithms [12]. Traditional object detection algorithms rely on predefined classifiers to detect a limited number of categories and typically depend on manually designed feature extraction methods, such as Histogram of Oriented Gradients (HOG) [13] and Scale-Invariant Feature Transform (SIFT) [14], along with classifiers like Support Vector Machine (SVM) [15] and Adaboost [16]. These algorithms follow a multi-step process: first, candidate regions are generated using a sliding window approach; then, each candidate region is classified and undergoes bounding box regression using the classifier, ultimately determining the object’s position and category [17,18]. However, these methods often struggle with complex scenes, multi-scale objects, and large category sets, exhibiting limited performance and low computational efficiency. With advancements in deep learning, object detection algorithms based on Convolutional Neural Networks (CNNs) have gradually replaced traditional methods. CNN-based models can automatically learn features and achieve end-to-end detection, significantly improving both detection accuracy and efficiency. Current mainstream deep learning-based detection algorithms can be broadly divided into two categories: two-stage detection algorithms, represented by the R-CNN [19,20,21,22] series, and single-stage detection algorithms, such as the Single Shot MultiBox Detector (SSD) [23] and You Only Look Once (YOLO) [24,25,26,27,28,29,30,31] series [32]. Two-stage detection algorithms employ a more complex detection pipeline, yielding higher detection accuracy but requiring longer inference times. Consequently, they fail to meet the real-time detection requirements of unmanned agricultural machinery.
Compared to two-stage object detection algorithms, single-stage object detection algorithms complete the detection process with only one feature extraction step, resulting in superior speed while maintaining considerable detection accuracy [33]. In recent years, numerous researchers worldwide have focused on applying single-stage object detection algorithms to autonomous obstacle avoidance, aiming to achieve effective obstacle detection. Liu et al. [34] improved the SSD algorithm to address the challenge of accurately detecting obstacles in complex orchard environments. Wei et al. [35] enhanced YOLOv3 by employing Darknet-53 as the feature extraction network and incorporating residual modules to mitigate gradient issues, achieving accurate obstacle detection in farmland. Cai et al. [36] proposed an improved YOLOv4-based orchard obstacle detection algorithm, reducing the model size by 75% and increasing inference speed by 29.4%. Wang et al. [37] introduced a hierarchical residual structure and SE attention mechanism into YOLOv5 and applied model pruning to enable efficient obstacle detection for mowing robots. Liu et al. [38] developed a tree trunk detection algorithm tailored for camellia orchards, integrating an attention mechanism into YOLOv7, which improved detection accuracy to 89.2%. Similarly, Brown et al. [39] utilized YOLOv8s to detect tree trunks and measure their width in orchards, achieving a mean average precision (mAP) of 98.9%. In another study, Zhang et al. [40] enhanced YOLOv8 by incorporating a large-separable kernel attention mechanism and an auxiliary detection head, improving obstacle detection accuracy in complex farmland environments to 91.8%. Although these studies have successfully advanced obstacle detection, most models still struggle to achieve an optimal balance between detection speed and accuracy.
In summary, considering the driving safety and operational efficiency of unmanned agricultural machinery, the rapid detection of field obstacles and an effective safety warning system are of paramount importance. Due to its outstanding performance, the YOLO algorithm has been widely applied across various engineering fields. Among its versions, YOLOv8 stands out for its high detection speed and accuracy. Building upon the aforementioned studies, we improved the YOLOv8 algorithm with a focus on balancing detection speed and accuracy, ensuring its adaptability to low-cost hardware. Furthermore, we addressed the critical issue of minimizing safety hazards for agricultural machinery by designing a dedicated safety warning system for field obstacles. This system integrates target frame information with a stereo matching algorithm to achieve precise obstacle localization. Additionally, it calculates the Time-to-Collision (TTC) between obstacles and agricultural machinery and implements a graded warning mechanism for potential collision risks.
The remainder of this paper is structured as follows: Section 2 describes the dataset construction process, including data collection, data augmentation, and implementation details during training. Section 3 presents the proposed improvements in detail and introduces the design of the obstacle detection and safety warning system. Section 4 outlines the experimental setup and methodology, followed by an in-depth analysis of the experimental results. Finally, Section 5 summarizes the conclusions and outlines directions for future work.

2. Dataset

2.1. Data Collection

The experimental data collection site was located in Yancheng, Jiangsu Province. To ensure that the collected data used for training closely aligns with the actual detection perspectives encountered in real-world agricultural machinery operations, maintaining both authenticity and feasibility, we utilized a ZED 2 stereo camera mounted at the front of an unmanned agricultural vehicle to record videos of obstacles in reed fields. The ZED 2, developed by Stereolabs, is a stereo camera designed for outdoor applications. It enables the timely detection of obstacles while ensuring that the unmanned agricultural vehicle maintains a safe operating distance. After recording, we employed Python scripts to segment the video into individual frames and, following a selection process, obtained a self-collected dataset consisting of 1200 images. To enhance the model’s generalization capability, we further expanded the dataset by using web crawlers to retrieve images from Google Images based on the keyword ‘field obstacles’. To ensure dataset reliability, we intentionally selected images featuring obstacles from diverse locations, in various orientations, and with distinct feature details. This approach contributes to the robustness of the deep learning model during detection. Combining both sources, we obtained a raw dataset comprising 2589 images, all with a resolution of 640 × 480 pixels. Each image contains at least one obstacle, with the dataset primarily including five common field obstacles: person, pylon, agricultural machinery, pole, and stone. Representative data samples are illustrated in Figure 1.
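To make the frame-extraction step above concrete, the following is a minimal OpenCV sketch of splitting a recorded video into individual images, in the spirit of the Python scripts mentioned above; the file names and the sampling interval are placeholders rather than the exact settings used for the dataset.

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, every_n: int = 15) -> int:
    """Save every n-th frame of a recorded field video as an image file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.png", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example: sample roughly two frames per second from a 30 FPS recording.
# n_images = extract_frames("reed_field.mp4", "dataset/raw", every_n=15)
```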

2.2. Data Preprocessing

The dataset images were annotated using the open-source annotation tool Labelme (v5.4.1), with the annotation data saved in txt file format. Notably, during the annotation process, we only labeled obstacles within a 10 m range from the camera. This range was determined based on a comprehensive consideration of the operating speed and braking performance of agricultural machinery. A 10 m distance was identified as a reasonable and effective annotation range since objects beyond this threshold appear too small, leading to a significant decline in detection accuracy. Consequently, obstacles outside this range were excluded from the annotation process. To ensure a balanced distribution of annotated samples across different obstacle categories, we applied data augmentation techniques to underrepresented classes. This targeted augmentation increased the diversity of images for each category, thereby improving the overall dataset quality and enhancing the model’s training performance. After augmentation, the final dataset expanded to 4459 images. A comparison of the number of annotations before and after data augmentation is presented in Table 1.
To ensure a balanced data distribution and reliable experimental results, we employed a stratified sampling strategy to randomly divide the original dataset into training, validation, and test sets in an 8:1:1 ratio. This approach ensures that each subset maintains a consistent sample distribution while remaining mutually exclusive and independent. Additionally, to simulate diverse environmental conditions encountered by agricultural machinery in real-world field operations, we considered various weather conditions (e.g., sunny, overcast, and light rain), different lighting directions (e.g., front light and backlight), various times of the day (morning, noon, afternoon, and evening), and scenarios with blurred camera captures. To enhance the model’s adaptability to these real-world conditions, we applied Albumentations to introduce weather and environmental effects into selected training images while maintaining a consistent sample distribution. These effects included haze, rain, facula, and darkening adjustments, thereby improving the model’s generalization capability and robustness. The image processing methods are illustrated in Figure 2.
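As an illustration, the weather and environmental effects described above can be applied with Albumentations roughly as follows; the specific transforms stand in for haze, rain, facula, and darkening, and the probabilities and file paths are illustrative assumptions rather than the exact augmentation pipeline used for the dataset.

```python
import cv2
import albumentations as A

# Approximate the four effects described above: haze, rain, facula, darkening.
weather_aug = A.Compose([
    A.RandomFog(p=0.3),                                               # haze
    A.RandomRain(p=0.3),                                              # rain streaks
    A.RandomSunFlare(p=0.2),                                          # facula (lens flare)
    A.RandomBrightnessContrast(brightness_limit=(-0.3, 0.0), p=0.3),  # darkening
])
# If bounding-box labels must follow the image, pass bbox_params=A.BboxParams(...)
# to A.Compose so the annotations are transformed consistently.

image = cv2.imread("dataset/train/frame_00001.png")
augmented = weather_aug(image=image)["image"]
cv2.imwrite("dataset/train/frame_00001_weather.png", augmented)
```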

3. Methods for Field Obstacle Detection and Warning Systems

3.1. Framework for Obstacle Detection and Safety Warning System

To ensure the safe operation of unmanned agricultural machinery in complex field environments, we designed a system that integrates stereo vision and deep learning algorithms to achieve rapid obstacle detection, precise localization, and timely warning. The principle framework of the system is illustrated in Figure 3.
The system is primarily divided into three main components:
(1)
Utilizing a stereo vision camera to capture RGB images and depth information of the environment in front of the agricultural machinery. The RGB images are used for obstacle detection, while the depth information is employed to calculate the spatial position of obstacles. The CGL-YOLOv8n object detection algorithm is applied to process the RGB images, accurately identifying obstacle categories and extracting their pixel coordinates.
(2)
Integrating the depth information provided by the stereo camera, the obstacle’s pixel coordinates are transformed into three-dimensional coordinates in the camera coordinate system. Through coordinate transformation, the absolute position of the obstacle in the world coordinate system is obtained, enabling precise localization of obstacles.
(3)
After acquiring the world coordinates of the obstacles, the system further calculates the TTC between the obstacle and the unmanned agricultural machinery. TTC is computed based on the relative velocity and distance between the obstacle and the machinery, serving as a critical indicator for collision risk assessment.

3.2. Improved YOLOv8n Lightweight Network for Field Obstacle Detection

Currently, YOLOv8 demonstrates outstanding performance in object detection tasks. However, due to the complexity of field environments and the limited computational capacity of onboard equipment in unmanned agricultural machinery, obstacle detection remains a challenging task. The primary issues can be summarized as follows: Firstly, field obstacles are often occluded by reeds, making it difficult to detect small objects, which can lead to missed detections and false positives. Secondly, the detection accuracy of the algorithm is affected by outdoor lighting conditions and various adverse weather conditions, requiring further improvements. Thirdly, the original YOLOv8n model has a high computational burden and a large number of parameters, resulting in relatively slow obstacle detection. This makes it difficult to achieve an optimal balance between detection speed and accuracy. To address these challenges, we propose the CGL-YOLOv8n model for real-time obstacle detection in reed fields, aiming to enhance both detection accuracy and real-time performance. The overall structure of the improved model is shown in Figure 4. The main optimization strategies are as follows:
  • The CG Block is used as the main gradient flow branch to replace all BottleNeck Blocks in the C2f module. This enhances the model’s ability to extract feature information in complex field environments, improving its capacity to detect small or hard-to-detect objects;
  • The Context-Guide Fusion Module is introduced into the Feature Pyramid Network (FPN), which strengthens important features during the feature fusion process while suppressing irrelevant features. This effectively addresses potential issues with detection accuracy caused by factors such as lighting and weather conditions;
  • The original detection head is replaced with a lightweight Shared Convolutional Separated BN Detection Head to resolve discrepancies in statistical features across different levels. This significantly reduces the number of parameters, resulting in a more lightweight model.

3.2.1. YOLOv8 Object Detection Algorithm

YOLOv8 utilizes deep neural networks for object recognition and localization. Its architecture is a variant of the YOLO series object detection models, using a single neural network to predict the bounding boxes and labels of objects in an image. The network structure of YOLOv8 consists of several key components, including the backbone network, neck network, and detection head. The backbone network is responsible for extracting feature maps from the input image, while the neck and head networks are used to predict the bounding boxes and labels of objects in the feature maps. From the structure of YOLOv8, it is evident that it inherits the architectural concepts of YOLOv5, including the use of the CSPDarkNet (Cross Stage Partial DarkNet) [41] as the backbone network and the design of the neck section, as well as considerations for models at different scales. However, compared to YOLOv5, YOLOv8 includes several notable improvements, including the following:
Backbone: YOLOv8 draws inspiration from the design philosophy of YOLOv7’s Efficient Layer Aggregation Network (ELAN) and uses the CSPNet with two fused layers (C2f) module in the backbone section to replace the CSP Bottleneck with three convolutions (C3) module in YOLOv5, achieving further lightweight optimization. At the same time, it retains the Spatial Pyramid Pooling Fusion (SPPF) module used in architectures such as YOLOv5.
Neck: Although YOLOv8 still adopts the combined idea of Feature Pyramid Network (FPN) [42] and Path Aggregation Network (PAN) [43], structurally, it removes the convolutional structure in the PAN-FPN upsampling stage from YOLOv5 and replaces the C3 module with the C2f module.
Detection Head: The detection head network has been replaced with the currently popular Decoupled-Head structure, which decouples the classification and regression tasks, making the network’s training and inference more efficient. Additionally, the Anchor-Based approach has been discarded in favor of the Anchor-Free concept.
Loss: The loss function used in the YOLOv8 architecture is a combination of classification loss and regression loss. BCE Loss is used for classification loss, while DFL Loss and CIoU Loss are employed for regression loss.

3.2.2. C2f—Context-Guided

The C2f module extracts and transforms input data features through operations such as feature transformation, branch processing, and feature fusion, thereby generating more representative outputs. While branch processing and feature fusion enhance gradient flow information, the structure of the C2f module remains relatively fixed, limiting its ability to effectively leverage global contextual information. In complex scenarios, the model struggles to fully capture the relationship between targets and their surrounding environment. Moreover, the Bottleneck layer in C2f is a crucial component originally introduced in deep residual networks (ResNet) [44], as illustrated in Figure 5. The Bottleneck layer reduces the number of channels to lower computational complexity; however, this compression leads to the loss of critical feature information. This drawback becomes particularly significant in complex field environments, where target diversity and background interference are high. In such cases, the model exhibits insufficient feature extraction capabilities for small or difficult-to-detect targets.
To enhance the model’s feature extraction capability in complex field environments and improve its ability to detect small or hard-to-detect targets, we modified the Bottleneck component of the C2f module by incorporating the Context-Guided Block (CG Block) from CGNet [45]. The core concept of CGNet is to leverage contextual information to assist in object detection. Traditional object detection algorithms primarily focus on the features of the target itself while often neglecting the surrounding contextual information. The CG Block serves as the fundamental unit of CGNet, designed to enhance the model’s representation capability by integrating local features, surrounding contextual features, and global contextual information. Specifically, the CG Block first reduces the number of input channels using a 1 × 1 convolution to lower computational complexity. It then employs a 3 × 3 standard convolution to extract local features and a 3 × 3 dilated convolution to capture a broader range of surrounding contextual information. The extracted features are concatenated, followed by batch normalization and activation function processing. Subsequently, a global context module further refines the features. Finally, based on the add parameter, the output may be summed with the input to form a residual connection, thereby achieving multi-scale contextual information fusion and efficient feature enhancement.
The CG Block consists of four key components, as illustrated in Figure 6, each playing a crucial role in feature extraction and fusion:
Local Feature Extractor f_loc(·): Utilizes standard convolution to extract local features. Through convolution operations, it captures fine details in the input feature map, such as edges and textures. These local features serve as the foundation for subsequent contextual understanding.
Surrounding Context Extractor f_sur(·): Employs dilated convolution to extract surrounding contextual features. Dilated convolution expands the receptive field without increasing the number of parameters, enabling the model to capture a broader range of contextual information. This is particularly beneficial for understanding the global structure of an image, especially in complex scenes.
Joint Feature Extractor f_joi(·): Integrates local and surrounding contextual features using a simple concatenation layer, followed by batch normalization and the PReLU activation function. Batch normalization accelerates training, while the activation function enhances the model’s nonlinear representational capability, ensuring a more effective fusion of local and contextual features.
Global Feature Extractor f_glo(·): Extracts features through a global pooling layer and then connects two fully connected layers to generate a weight vector. This weight vector guides the fusion of joint features, enabling the model to focus on critical features while enhancing useful information during the fusion process. By incorporating global information, this component further optimizes detection performance.
To conclude, replacing all Bottleneck Blocks within the C2f modules with CG Blocks introduces an improvement centered on multi-scale feature extraction and fusion. This approach effectively captures both fine-grained local details and global contextual information, thereby enhancing the model’s performance in complex field environments. Beyond addressing challenges associated with detecting small or hard-to-detect targets and reducing missed or false detections, this modification also contributes to the lightweight design of the model.
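To make the data flow described above concrete, the following is a minimal PyTorch sketch of a CG Block assembled from the four components f_loc, f_sur, f_joi, and f_glo, with the initial 1 × 1 channel reduction and the optional residual connection; the use of channel-wise (depthwise) convolutions, the reduction ratio in the global branch, and an even output channel count are assumptions rather than the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class ContextGuidedBlock(nn.Module):
    """Sketch of a CG Block: local and surrounding-context extraction,
    joint fusion, and global-context re-weighting, with an optional residual."""

    def __init__(self, in_ch: int, out_ch: int, dilation: int = 2, add: bool = True):
        super().__init__()
        half = out_ch // 2                                   # out_ch assumed even
        self.reduce = nn.Sequential(                         # 1x1 conv: channel reduction
            nn.Conv2d(in_ch, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.PReLU(half))
        self.f_loc = nn.Conv2d(half, half, 3, padding=1,
                               groups=half, bias=False)      # local features
        self.f_sur = nn.Conv2d(half, half, 3, padding=dilation,
                               dilation=dilation, groups=half, bias=False)  # context
        self.f_joi = nn.Sequential(nn.BatchNorm2d(out_ch), nn.PReLU(out_ch))
        self.f_glo = nn.Sequential(                          # global pooling + 2 FC layers
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid())
        self.add = add and in_ch == out_ch

    def forward(self, x):
        y = self.reduce(x)
        joi = self.f_joi(torch.cat([self.f_loc(y), self.f_sur(y)], dim=1))
        out = joi * self.f_glo(joi)                          # re-weight joint features
        return x + out if self.add else out
```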

3.2.3. ContextGuideFPN

The PAN-FPN used in YOLOv8 is an optimized and improved version of the classic Feature Pyramid Network (FPN), designed to enhance the model’s capability in detecting multi-scale objects by fusing feature maps at different scales, as illustrated in Figure 7a. However, PAN-FPN primarily relies on simple upsampling and addition operations for feature fusion, lacking effective utilization of contextual information and exhibiting limited feature representation capabilities. In complex field environments, where the distinction between targets and background is low, PAN-FPN struggles to adaptively enhance critical features or suppress noisy ones. Relying solely on local features makes accurate target recognition challenging, particularly under uneven lighting conditions or adverse weather.
Based on the aforementioned issues, we propose the Context-Guide Fusion Module (CGFM), an innovative feature fusion module designed to improve the FPN in YOLOv8. Inspired by PSFusion [46], CGFM incorporates contextual guidance and adaptive adjustment in multi-scale feature fusion. By introducing a channel attention mechanism and bidirectional feature interaction, CGFM significantly enhances feature fusion effectiveness, particularly improving object detection performance in challenging conditions such as lighting variations and weather disturbances.
The forward propagation process of CGFM is illustrated in Figure 7b. Initially, a 1 × 1 convolution is applied to adjust the channel number of the input feature X_0 to match that of X_1. Then, X_0 and X_1 are concatenated along the channel dimension, forming a richer feature representation. The module incorporates a channel attention mechanism based on the Squeeze-and-Excitation (SE) block [47], which fundamentally aims to adaptively recalibrate channel-wise feature responses by leveraging global contextual information, thereby promoting informative features while suppressing redundant ones. After the channel attention calculation, X_0 and X_1 generate weight maps X_0_weight and X_1_weight, respectively. These weights are then used to re-weight the original features by performing an element-wise multiplication, where X_1_weight is applied to X_0, and X_0_weight is applied to X_1. This mutual adjustment allows for bidirectional information flow, avoiding isolated information, as shown in Equations (1) and (2). This method of cross-adding and concatenating the weighted features, through the weighted feature reorganization operation, enhances the discriminative power of the feature map. Additionally, it ensures that the fused features retain both their original information and the important information from the other feature, preventing complete overlap or loss of the original features. The overall structure of CGFM is relatively simple, not introducing excessive computational overhead, making it highly suitable for real-time object detection tasks.
X_0^i = X_0^i ⊕ (X_1^i ⊗ δ(SEAn(GAP(C(X_0^i, X_1^i)))))        (1)
X_1^i = X_1^i ⊕ (X_0^i ⊗ δ(SEAn(GAP(C(X_0^i, X_1^i)))))        (2)
where C denotes the concatenation operation, GAP represents global average pooling, SEAn denotes the SE channel attention operation, δ is the activation function that produces the weight maps, and ⊗ and ⊕ denote element-wise multiplication and addition, respectively.
In conclusion, CGFM enhances the performance of YOLOv8 in multi-scale feature fusion by introducing a channel attention mechanism and bidirectional feature interaction. Its improved contextual information perception, dynamic channel adjustment mechanism, and robustness to complex environments enable YOLOv8 to maintain high detection accuracy even under conditions such as lighting variations and weather disturbances. This improvement not only enhances the model’s practicality but also provides a new solution for applying object detection tasks in complex environments.
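The following is a minimal PyTorch sketch of the fusion step formalized in Equations (1) and (2), assuming the SE output is split into one weight map per branch; the exact pairing of the weight maps with the two branches and the final concatenation are assumptions based on the description above, not the authors’ reference code.

```python
import torch
import torch.nn as nn

class ContextGuideFusionModule(nn.Module):
    """Sketch of CGFM: SE-style channel attention over the concatenated inputs
    produces per-branch weights that are applied crosswise before fusion."""

    def __init__(self, ch0: int, ch1: int, reduction: int = 16):
        super().__init__()
        self.adjust = nn.Conv2d(ch0, ch1, 1) if ch0 != ch1 else nn.Identity()
        cat_ch = 2 * ch1
        self.se = nn.Sequential(                  # squeeze-and-excitation on the concat
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(cat_ch, cat_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(cat_ch // reduction, cat_ch, 1), nn.Sigmoid())

    def forward(self, x0, x1):
        x0 = self.adjust(x0)                      # 1x1 conv so channels match x1
        w = self.se(torch.cat([x0, x1], dim=1))
        w0, w1 = torch.chunk(w, 2, dim=1)         # X0_weight and X1_weight
        y0 = x0 + x1 * w1                         # cross re-weighting, Eqs. (1)-(2)
        y1 = x1 + x0 * w0
        # A following 1x1 conv (not shown) would typically restore the expected width.
        return torch.cat([y0, y1], dim=1)
```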

3.2.4. SharedSepHead

In the detection head of YOLOv8, due to significant differences in statistical properties between features at different levels, the Normalization layer remains indispensable. However, directly introducing Batch Normalization (BN) in the shared-parameter detection head can lead to errors in the sliding average values, thereby affecting model performance. While Group Normalization (GN) can avoid this issue, it increases the computational overhead during inference. To address this, we refer to the approach used in NASFPN [48] and design a lightweight Shared Convolutional Separated BN Detection Head. This design allows the detection head to share convolutional layers while independently computing the BN layers, without parameter sharing. As a result, each detection layer can learn distinct feature distributions, enhancing the model’s expressive capacity. This design maintains lightweight characteristics while avoiding feature confusion, thereby improving detection accuracy.
As shown in the SharedSepHead structure diagram in Figure 8, its core architecture consists of multiple detection layers. Each detection layer first uses independent convolutional layers to convert the input feature map’s channel count to the number of hidden layer channels. The feature map then undergoes feature extraction and normalization through a shared 3 × 3 convolutional layer and separated Batch Normalization (BN) layers. Afterward, a SiLU activation function is applied to introduce non-linearity. Finally, two 1 × 1 convolutional layers are used to separately output the bounding box regression results and the class prediction results, which are then concatenated as the final output. During the inference stage, the module dynamically generates anchor points and strides, decodes the bounding boxes, and calculates class probabilities, ultimately outputting the detection results. It also supports various export formats to accommodate different deployment scenarios.
To summarize, SharedSepHead significantly reduces the number of parameters and computational load by using shared convolutional layers, where multiple detection layers share the same convolutional weights. Compared to the original detection head in YOLOv8, this design reduces model complexity while maintaining performance, making it more suitable for deployment on resource-constrained devices.
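As a sketch of the shared-convolution, separated-BN idea, the module below shares one 3 × 3 convolution across detection levels while keeping an independent BatchNorm per level, followed by the two 1 × 1 output convolutions; the channel widths, class count, and number of distribution-focal-loss bins are placeholders, and the anchor generation and decoding performed at inference are omitted.

```python
import torch
import torch.nn as nn

class SharedSepHead(nn.Module):
    """Sketch of a shared-convolution detection head with separated BN layers:
    convolution weights are shared across levels, BN statistics are not."""

    def __init__(self, in_channels=(64, 128, 256), hidden=64,
                 num_classes=5, reg_max=16):
        super().__init__()
        # Per-level 1x1 convs map each input feature map to the shared hidden width.
        self.stems = nn.ModuleList([nn.Conv2d(c, hidden, 1) for c in in_channels])
        # One shared 3x3 conv, but one BatchNorm per level (separated statistics).
        self.shared_conv = nn.Conv2d(hidden, hidden, 3, padding=1, bias=False)
        self.bns = nn.ModuleList([nn.BatchNorm2d(hidden) for _ in in_channels])
        self.act = nn.SiLU(inplace=True)
        # Shared 1x1 output convs for box regression (DFL bins) and class scores.
        self.box_head = nn.Conv2d(hidden, 4 * reg_max, 1)
        self.cls_head = nn.Conv2d(hidden, num_classes, 1)

    def forward(self, feats):
        outputs = []
        for stem, bn, x in zip(self.stems, self.bns, feats):
            y = self.act(bn(self.shared_conv(stem(x))))
            outputs.append(torch.cat([self.box_head(y), self.cls_head(y)], dim=1))
        return outputs
```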

3.3. Formatting of Mathematical Components

The principle of a stereo camera is similar to that of the human eye. The human eye is able to perceive the distance of objects because the same object is imaged with a slight offset in the two eyes, which is referred to as “disparity”. Stereo imaging is therefore based on the principle of disparity, as illustrated in the model shown in Figure 9. The two cameras are placed horizontally, with a distance b between them and a focal length of f. Point P_c represents a point in the real world, which is projected onto point P_l on the left imaging plane Y_l and point P_r on the right imaging plane Y_r, while Z_c is the distance from point P_c to the depth camera. Since the cameras are horizontally placed, the positional deviation only occurs along the X-axis, meaning the coordinates along the Y-axis are identical. The positional disparity of point P_c between the left and right imaging planes is denoted as C_l P_l = X_l and P_r C_r = X_r (X_r < 0). From the similarity of triangles △P_c P_l P_r ∼ △P_c O_l O_r, the following relationship can be derived:
Z_c / (Z_c − f) = b / (b − (X_l − X_r))        (3)
In the equation, b and f are camera parameters obtained through calibration. X_l − X_r represents the disparity, which is obtained through pixel point matching. The final depth distance Z_c is given by
Z_c = f·b / (X_l − X_r)        (4)
Mapping a point from the two-dimensional pixel plane to a position in the three-dimensional world, a series of transformations are required, including the conversion between the pixel coordinate system, the image coordinate system, the camera coordinate system, and the world coordinate system [49]. The pixel coordinate system is first translated to the image coordinate system, and the image coordinate system is then transformed to the camera coordinate system through perspective projection. The three-dimensional coordinates of the obstacle’s location in the camera coordinate system can be calculated using Equation (5):
X_c = Z_c (u − u_0) / f_x,  Y_c = Z_c (v − v_0) / f_y,  Z_c = f·b / (X_l − X_r)        (5)
The transformation from the camera coordinate system to the world coordinate system is a rigid body transformation, which can be achieved through rotation and translation. The transformation formula is as follows:
[X_w  Y_w  Z_w  1]^T = [ R  T ; 0  1 ]^{−1} [X_c  Y_c  Z_c  1]^T        (6)
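Putting Equations (4)–(6) together, a minimal NumPy sketch of mapping a matched pixel to world coordinates might look as follows; it assumes the calibrated focal lengths f_x and f_y, principal point (u_0, v_0), baseline b, and extrinsic parameters R and T are available, and all names are illustrative.

```python
import numpy as np

def pixel_to_world(u, v, disparity, fx, fy, u0, v0, baseline, R, T):
    """Map a matched pixel (u, v) with disparity Xl - Xr to world coordinates."""
    Zc = fx * baseline / disparity          # Eq. (4): depth from disparity
    Xc = Zc * (u - u0) / fx                 # Eq. (5): back-projection into the
    Yc = Zc * (v - v0) / fy                 #          camera coordinate system
    Pc = np.array([Xc, Yc, Zc, 1.0])
    # Eq. (6): the extrinsic matrix maps world to camera coordinates, so its
    # inverse carries the camera-frame point into the world frame.
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = T
    Pw = np.linalg.inv(extrinsic) @ Pc
    return Pw[:3]
```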

3.4. Safety Warning Strategy

The safety warning strategy is a core element in ensuring the safety of autonomous agricultural machinery during field operations. Currently, safety distance-based forward safety warning systems are one of the mainstream approaches. The safety distance refers to the minimum distance that an autonomous agricultural machine must maintain from an obstacle in order to avoid a collision during its operation. However, warning systems that solely rely on safety distance have certain limitations, especially when dealing with dynamic obstacles. The relative speed of the agricultural machinery can significantly affect the effectiveness of the warning. Therefore, we introduce the concept of safety time and design a safety time-based warning system, which becomes an important direction for enhancing the scientific and practical nature of the warning system.
Safety time refers to the minimum time required for an autonomous agricultural machine to avoid a collision with an obstacle ahead. Compared to safety distance, safety time not only considers the relative distance between the agricultural machine and the obstacle but also incorporates the factor of relative speed, thus providing a more comprehensive reflection of potential collision risks. Figure 10 illustrates the relationship between the vehicle’s distance and the target ahead based on safety time. By acquiring continuous frame images from the visual sensors, the system can calculate the inter-frame distance change and, combined with the time interval, derive the relative speed between the agricultural machine and the obstacle. Based on relative distance and relative speed, the TTC formula, which aligns with safety warning logic, can be further derived:
TTC = −d_1 / v        (7)
v = (d_0 − d_1) / t = Δd / t        (8)
In Equation (8), d_1 represents the distance between the agricultural machine and the obstacle ahead at the current frame; d_0 represents the distance between the agricultural machine and the obstacle ahead in the previous frame; t is the time interval between the two frames.
TTC can dynamically reflect the safety status between the agricultural machine and the obstacle. When the unmanned agricultural machine starts and reaches a constant operating speed, the safety warning system is activated and categorizes the warning levels based on the TTC value. Specifically, when TTC falls within the interval (−∞, −6.2 s], the system does not issue a warning, indicating no collision risk and that the environment is safe. When TTC is in the interval (−6.2 s, −4.2 s], the system triggers a level one yellow warning, alerting the operator to be cautious of the obstacle ahead. When TTC is in the interval (−4.2 s, 0 s], the system triggers a level two red warning, urging the operator to take evasive or braking actions.
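A minimal sketch of the graded warning logic follows, assuming the distance to the same obstacle is available in two consecutive frames; the thresholds mirror the TTC intervals given above, and the frame interval and distances in the usage comment are made-up numbers.

```python
def ttc_warning_level(d_prev: float, d_curr: float, dt: float) -> str:
    """Compute TTC from two consecutive distance measurements (Eqs. (7)-(8))
    and map it to the warning levels used by the system."""
    v = (d_prev - d_curr) / dt       # Eq. (8): closing speed, > 0 when approaching
    if v <= 0:                       # obstacle not getting closer
        return "no warning"
    ttc = -d_curr / v                # Eq. (7): negative while closing in
    if ttc <= -6.2:
        return "no warning"          # (-inf, -6.2 s]: safe
    elif ttc <= -4.2:
        return "level 1 (yellow)"    # (-6.2 s, -4.2 s]
    else:
        return "level 2 (red)"       # (-4.2 s, 0 s]

# Example: 9.5 m in the previous frame, 9.0 m now, frames 0.5 s apart
# -> v = 1.0 m/s, TTC = -9.0 s -> "no warning".
```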

4. Experimental Results and Analysis

4.1. Experimental Environment

To ensure the fairness and reliability of each experiment, all experimental parameters are listed in Table 2.
The setting of hyperparameters is aimed at ensuring efficient training and validation of the model, maximizing its performance and accuracy. The hyperparameter settings used in the experimental model are provided in Table 3.
It is worth noting that we implemented an early stopping mechanism during the training process: if no improvement in validation performance is observed over 50 consecutive epochs, training is halted to prevent potential overfitting. Additionally, mosaic data augmentation is disabled during the last 10 training epochs, which helps to better adjust the model parameters.
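For illustration, the early-stopping and late-mosaic-shutdown behaviour described above is typically configured through the Ultralytics training interface roughly as follows; the model and dataset configuration names are hypothetical, and the remaining hyperparameters are placeholders rather than the exact values listed in Tables 2 and 3.

```python
from ultralytics import YOLO

# Hypothetical config file describing the improved CGL-YOLOv8n structure.
model = YOLO("cgl-yolov8n.yaml")
model.train(
    data="reed_obstacles.yaml",   # placeholder dataset configuration
    epochs=300,
    imgsz=640,
    patience=50,                  # stop early if no improvement for 50 epochs
    close_mosaic=10,              # disable mosaic augmentation for the last 10 epochs
)
```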

4.2. Evaluation Metrics

To evaluate the performance of our CGL-YOLOv8n model in detecting obstacles in reed fields, we used three commonly used evaluation metrics in object detection tasks: Precision, Recall, and Average Precision (AP). These metrics comprehensively assess the model’s performance in terms of classification accuracy, localization precision, and operational efficiency. Therefore, the combined evaluation of these metrics provides important guidance for model optimization. The formulas for calculating Precision, Recall, AP, and mAP are shown in Equations (9)–(12) below:
Precision = TP / (TP + FP) × 100%        (9)
Recall = TP / (TP + FN) × 100%        (10)
AP = ∫₀¹ P(R) dR        (11)
mAP = (1/n) Σ_{i=1}^{n} AP_i        (12)
where Precision denotes the precision rate; Recall the recall rate; mAP the mean average precision; TP the number of correctly detected obstacles; FP the number of false positive detections; FN the number of missed obstacles; and AP the area under the Precision–Recall curve.
In addition to the above evaluation metrics, we also use FLOPs, Params, and FPS as evaluation indicators. FLOPs denotes the number of floating-point operations required by the model for a single forward pass and Params the number of model parameters; together they are used to assess the model’s computational complexity and size. FPS refers to the number of image frames processed per second during inference and is used to characterize the detection speed of the model.
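For reference, a short NumPy sketch of the AP computation in Equation (11) (area under the Precision–Recall curve) and the per-class averaging of Equation (12) is given below; the interpolation scheme shown is one common choice and not necessarily the exact evaluation protocol used in the experiments.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the precision-recall curve (Eq. (11)) via numerical integration."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # Replace precision with its monotonically non-increasing envelope.
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.trapz(p, r))

# mAP (Eq. (12)) is the mean of the per-class AP values, e.g. for the five classes:
# mAP = np.mean([ap_person, ap_pylon, ap_machinery, ap_pole, ap_stone])
```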

4.3. Model Training and Detection Results

We compared the performance of the CGL-YOLOv8n model with the baseline YOLOv8n model. Both models were trained using the same dataset, but the baseline YOLOv8n model did not incorporate the Context-Guided Block (CG Block), Context-Guided Fusion Module (CGFM), and Shared Convolutional Separated BN Detection Head (SharedSepHead).
From Figure 11, we can clearly analyze the changes in box loss and mAP during the training process. It is evident from the figure that CGL-YOLOv8n converges faster than YOLOv8n. The predicted bounding boxes are able to locate closer to the true bounding boxes more quickly, while also exhibiting lower loss values. This indicates that CGL-YOLOv8n achieves better convergence performance. The change in mAP further demonstrates that CGL-YOLOv8n significantly outperforms YOLOv8n in terms of accuracy during training.

4.4. Ablation Experiment

We designed ablation experiments using the same dataset to verify the effectiveness of the proposed Context-Guided Block, Context-Guide Fusion Module, and SharedSepHead in improving model performance. The ablation experiments involve adding or replacing the improved modules on the baseline model for parameter comparison, aiming to evaluate the effectiveness and feasibility of the improved YOLOv8 algorithm. The results of the ablation experiments are shown in Table 4.
As shown in Table 4, after using the C2f-CG module, although Precision and FPS experienced a slight decline, both Recall and mAP@0.5 significantly improved, while FLOPs and Params were greatly reduced by 28.4% and 30.2%, respectively. Furthermore, after introducing the CGFPN module, Precision, mAP@0.5, and FPS returned to levels similar to the original model, while Recall increased substantially by 2.6%. However, the results for Params and FLOPs showed a slight increase in model complexity. After incorporating the SharedSepHead module, we observed that while FLOPs and Params decreased, Precision and Recall further improved, achieving model lightweighting while also accelerating inference speed. Finally, when the C2f-CG, CGFPN, and SharedSepHead modules were combined, Precision and mAP@0.5 were significantly enhanced by 1.5% and 0.7%, respectively, while FLOPs and Params were notably reduced to 5.4 G and 2.05 M. Although the inference speed was 1.8 FPS lower than that of the original network, the overall detection speed still reached 59.1 frames per second, meeting the real-time requirements for practical applications. In summary, all three methods positively impacted YOLOv8’s detection performance, demonstrating the rationality of our improvements.
The results of the ablation experiments are illustrated in Figure 12. From the final outcomes, it is evident that our improved model significantly outperforms the baseline model in terms of Precision, Recall, mAP@0.5, FLOPs, and Params. This combined approach achieves a substantial reduction in model complexity while simultaneously delivering comprehensive improvements in Precision and mAP@0.5, making it the optimal choice for balancing both performance and efficiency.

4.5. Comparison Experiment

Several popular models were trained on the same dataset, including Faster R-CNN, Cascade R-CNN, RetinaNet, and the YOLO series (YOLOv5n, YOLOv6n, YOLOv7, YOLOv8n, YOLOv9t, YOLOv10n). These models were compared with the proposed improved algorithm in terms of detection accuracy, computational load, model size, and FPS to verify the performance of the proposed algorithm. The experimental results are shown in Table 5.
As shown in the table, the proposed CGL-YOLOv8n achieves a good balance between performance and efficiency compared to other mainstream object detection models. First, in terms of detection accuracy (mAP), CGL-YOLOv8n reaches 92.3%, significantly outperforming the two-stage object detection algorithms, Faster R-CNN and Cascade R-CNN. It also outperforms the one-stage object detection algorithms, RetinaNet and the YOLO series models, demonstrating its superior performance in object detection tasks. Secondly, in terms of computational efficiency, CGL-YOLOv8n further reduces the computational load compared to the commonly used lightweight algorithm, YOLOv5, with a FLOPs of only 5.4 G, standing out among all models and indicating its significant advantage in computational efficiency. Additionally, CGL-YOLOv8n has a model size of only 4.2 MB. Although YOLOv9t also excels in accuracy and model size, it has a relatively low FPS. For real-time field target detection, ensuring both accuracy and speed is crucial. Therefore, the proposed CGL-YOLOv8n strikes a better balance between accuracy and detection speed compared to the aforementioned different types of detection models. Moreover, its smaller model size makes CGL-YOLOv8n the optimal choice for deploying perception systems.
We selected the lightweight YOLOv5n, YOLOv6n, YOLOv8n, YOLOv9t, YOLOv10n, and CGL-YOLOv8n for comparative evaluation on the test set of the dataset. This approach allows for a more intuitive assessment of the performance of the improved algorithm in terms of detection accuracy. The test results are presented in Figure 13. As observed in the figure, compared to other YOLO algorithms, the proposed algorithm demonstrates superior accuracy, particularly in scenarios where obstacles are partially occluded by crops or significantly affected by lighting conditions. Moreover, when detecting small obstacle targets such as stones, the proposed model exhibits no missed detections or false positives. Overall, these results validate the effectiveness of the improved YOLOv8 model for obstacle detection in reed fields and confirm that the model successfully balances detection speed and accuracy requirements.

4.6. Accuracy Test of the Safety Warning System

To verify the reliability of the safety warning system, real-world field tests were conducted in Yancheng, Jiangsu Province. Before the experiment, a binocular stereo camera was securely mounted on the unmanned agricultural vehicle, with its lenses facing directly forward toward the farmland. By utilizing the binocular stereo camera, common obstacles in the field ahead of the unmanned agricultural vehicle were detected, and their corresponding categories, confidence scores, and three-dimensional coordinates were obtained. Finally, the unmanned agricultural vehicle was set to move forward at a constant speed, and obstacles appearing in the operational area ahead were classified into different safety warning levels accordingly.
During the obstacle detection experiment, the numbers of detected targets for Person, Pylon, Agricultural Machinery, Pole, and Stone were 38, 29, 35, 31, and 30, respectively. The successfully detected targets were 36, 27, 35, 29, and 27, corresponding to detection success rates of 94.7%, 93.1%, 100%, 93.5%, and 90%, respectively. The overall detection success rate reached 94.5%, demonstrating the accuracy of our improved model in obstacle detection. To further establish the reliability of the safety warning system and validate its positioning accuracy, we conducted a positioning accuracy test for five types of obstacles. The test was performed within a 2–10 m range, where obstacles present in the reed field were localized, and their three-dimensional center coordinates were recorded. A laser rangefinder (with an error of ±1.5 mm) was employed to verify and obtain the actual distances, which were then used as reference measurements for evaluation and analysis. The experiment was repeated multiple times while continuously altering the distance, and all results were recorded accordingly. To better illustrate the discrepancy between actual and measured distances, we conducted a comparative analysis of 50 experimental trials (10 trials per obstacle type). The comparison included actual distance, measured distance, absolute error, and relative error, and a statistical comparison chart was plotted, as shown in Figure 14. In summary, as the distance increased, the measurement accuracy for all obstacle types gradually decreased. However, within a 5 m range, the relative error remained below 3%, and within a 10 m range, the relative error was below 5%. Overall, the detection accuracy fully meets the real-time distance measurement requirements.
Based on ensuring the accuracy of obstacle recognition and the precision of localization, a safety warning experiment was designed to validate the reliability of the safety warning system. The experimental design is as follows: First, a fixed calibration point was set in the field as the warning initiation point. The unmanned agricultural vehicle traveled at a constant speed of 3 km/h, and obstacle monitoring commenced as the vehicle passed this point. Various types of field obstacles were placed ahead of the calibration point to assess the accuracy of the system’s warnings. The warning triggers were classified into two levels: the first-level warning was triggered 6.2 s before an anticipated collision, while the second-level warning was triggered 4.2 s prior to a collision. The warning trigger points were calculated based on vehicle speed and estimated collision time. The evaluation criteria for the experiment included the following: before reaching the calibration point, the system should remain in a no-warning state. If no warning was issued, it was recorded as a valid no-warning case; if a false alarm was triggered, it was not considered a valid no-warning case. After passing the calibration point, the system was required to accurately trigger the corresponding first- or second-level warning based on obstacle position; failure to do so was recorded as a warning failure. In the experiment, manually recorded theoretical warning occurrences were used as the reference standard to evaluate system performance. This design allows for an assessment of the unmanned agricultural vehicle’s ability to trigger warnings accurately at different time points, ensuring that no warnings are issued before the calibration point while correctly triggering warnings after it. Otherwise, the warning is considered invalid. Through this experiment, the accuracy and reliability of the warning algorithm can be systematically evaluated. Table 6 presents the statistics on warning occurrences and accuracy.
Table 6 shows that the overall warning accuracy achieved is 86%, confirming the reliability of the safety warning system and demonstrating that the system can effectively address the issue of minimizing safety risks for agricultural machinery. The field detection and safety warning visualization results are shown in Figure 15. During the experiment, we observed that the system still experienced some degree of missed alarms and false positives during actual operation. After a thorough analysis of the sample data, we identified two main reasons for this: First, the stereo camera is affected by the accuracy of stereo matching, and the target may experience some deviation during the localization process, resulting in errors in the calculated target distance. The accumulation of these distance errors further impacts the accuracy of the TTC calculation, potentially causing the system to issue incorrect warnings or fail to provide timely warnings. Second, when insufficient valid localization pixels are successfully retrieved within the target box, the system may incorrectly judge that the target does not exist, leading to a missed alarm. This situation may be related to factors such as changes in environmental lighting and weak surface texture features of the target. To address these issues, future work will focus on optimizing the stereo matching algorithm to improve target localization accuracy, as well as refining the effective pixel extraction strategy within the target area to reduce the occurrence of false positives and missed alarms, thereby enhancing the overall reliability of the system.

5. Conclusions

The agricultural environment is complex and highly variable, with various obstacles inevitably present in the fields, posing significant safety risks during the autonomous operation of unmanned agricultural machinery. Due to the limitations of existing satellite navigation technologies, which are susceptible to obstruction and interference in field conditions, visual sensors are employed to provide accurate detection and target localization for unmanned agricultural machinery. To address the issues of high model complexity and poor real-time performance in existing detection algorithms, this paper presents an improved lightweight YOLOv8n model for field obstacle detection based on CG Block, CGFPN, and SharedSepHead. Additionally, by incorporating stereo cameras to obtain the three-dimensional coordinates of obstacles, a safety warning strategy is proposed. The strategy calculates the Time to Collision (TTC) using the relative speed and distance information between the obstacles and the agricultural machinery, and classifies the warning levels based on the TTC value. The main conclusions are as follows:
(1)
On the test set, the improved model demonstrates excellent discrimination and recognition performance for five types of obstacles. Compared to the YOLOv8n model, the improved model achieves a mean Average Precision (mAP) of 92.3%, with a reduction in the number of parameters and computational complexity by 31.9% and 33.4%, respectively. The model size is only 4.2 MB, a reduction of 30%, and the detection speed reaches 59.1 frames per second. Although this is a slight decrease compared to the baseline model, the improved model still meets the real-time detection performance requirements. When compared to Faster R-CNN, Cascade R-CNN, RetinaNet, and the YOLO series models (YOLOv5n, YOLOv6n, YOLOv7, YOLOv8n, YOLOv9t, YOLOv10n), the main advantage of our model lies in its ability to maintain the best balance between parameters, computational load, detection speed, and accuracy. While maintaining high detection precision, the model is more lightweight, consumes less memory, and offers better real-time performance. This makes it more suitable for deployment on edge devices with limited computational power and memory, thus reducing costs and improving efficiency. Additionally, the model lowers the usage threshold for unmanned agricultural machinery, facilitating the advancement of agricultural automation.
(2)
In the field dynamic experiments, the overall detection success rate for obstacle target detection reached 94.5%, with positioning accuracy showing a relative error of less than 3% within a 5 m range and less than 5% within a 10 m range, fully meeting the requirements for real-time detection and distance measurement. Based on ensuring the accuracy of obstacle recognition and positioning, further safety warning experiments were conducted. By comparing the number of manual warnings with the system’s warnings, the final overall warning accuracy reached 86%, confirming the reliability of the safety warning system. This study also provides a reference for obstacle detection and safety warning in other scenarios.
(3)
Future research will focus on further optimizing obstacle detection and early warning systems under various complex weather and lighting conditions, enhancing the robustness and adaptability of models in extreme environments. Additionally, plans include integrating unmanned agricultural machinery control systems to develop a unified autonomous farming platform, promoting the integration of detection, early warning, and decision control applications. Furthermore, the proposed lightweight detection models and safety warning methods can be directly deployed on agricultural machinery terminals with limited computational power, not only improving the intelligence level of agricultural operations but also helping to reduce equipment costs and usage difficulty, with promising prospects for widespread application.

Author Contributions

Conceptualization, Y.Z. and J.H.; methodology, Y.Z., K.T. and J.H.; software, Y.Z. and J.H.; validation, Y.Z. and B.Z.; formal analysis, Y.Z., B.Z. and Z.M.; investigation, Y.Z. and B.Z.; resources, Y.Z., J.H., K.T. and Z.M.; data curation, Y.Z., K.T. and Z.M.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z. and K.T.; visualization, Y.Z. and Z.M.; supervision, Z.M. and J.H.; project administration, J.H.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jiangsu Province Agricultural Science and Technology Independent Innovation Fund Program (CX(22)3096).

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

The authors thank the editor and anonymous reviewers for providing helpful suggestions for improving the quality of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yin, Q.; Li, Y.M.; Ji, B.B.; Chen, L.P. Design and Experiment of Clamping and Conveying Device for Self Propelled Reed Harvester. J. Agric. Mech. Res. 2023, 45, 113–118. [Google Scholar]
  2. Li, S.; Xu, H.; Ji, Y.; Cao, R.; Zhang, M.; Li, H. Development of a following agricultural machinery automatic navigation system. Comput. Electron. Agric. 2019, 158, 335–344. [Google Scholar] [CrossRef]
  3. Shang, Y.H.; Zhang, G.Q.; Meng, Z.J.; Wang, H.; Su, C.H.; Song, Z.H. Field Obstacle Detection Method of 3D LiDAR Point Cloud Based on Euclidean Clustering. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2022, 53, 23–32. [Google Scholar]
  4. Hu, L.; Wang, Z.M.; Wang, P.; He, J.; Jiao, J.K.; Wang, C.Y.; Li, M.J. Agricultural robot positioning system based on laser sensing. Trans. Chin. Soc. Agric. Eng. 2023, 39, 1–7. [Google Scholar]
  5. Xie, P.; Wang, H.C.; Huang, Y.X.; Gao, Q.; Bai, Z.H.; Zhang, L.N.; Ye, Y.X. LiDAR-Based Negative Obstacle Detection for Unmanned Ground Vehicles in Orchards. Sensors 2024, 24, 7929. [Google Scholar] [CrossRef]
  6. Xue, J.L.; Cheng, F.; Wang, B.Q.; Li, Y.Q.; Ma, Z.B.; Chu, Y.Y. Method for Millimeter Wave Radar Farm Obstacle Detection Based on Invalid Target Filtering. Trans. Chin. Soc. Agric. Mach. 2023, 54, 233–240. [Google Scholar]
  7. Lai, H.R.; Zhang, Y.W.; Zhang, B.; Yin, Y.X.; Liu, Y.H.; Dong, Y.H. Design and experiment of the visual navigation system for a maize weeding robot. Trans. Chin. Soc. Agric. Eng. 2023, 39, 18–27. [Google Scholar]
  8. Liu, H.; Zheng, X.P.; Shen, Y.; Wang, S.Y.; Shen, Z.F.; Kai, J.R. Method for the target detection of seedlings and obstacles in nurseries using improved YOLOv5s. Trans. Chin. Soc. Agric. Eng. 2024, 40, 136–144. [Google Scholar]
  9. Lan, Y.B.; Yan, Y.; Wang, B.J.; Song, C.C.; Wang, G.B. Current status and future development of the key technologies for intelligent pesticide spraying robots. Trans. Chin. Soc. Agric. Eng. 2022, 38, 30–40. [Google Scholar]
  10. He, Y.; Jiang, H.; Fang, H.; Wang, Y.; Liu, Y.F. Research progress of intelligent obstacle detection methods of vehicles and their application on agriculture. Trans. Chin. Soc. Agric. Eng. 2018, 34, 21–32. [Google Scholar]
  11. Sun, Y.P.; Sun, J.; Yuan, B.K.; Fang, Z.; Qin, Y.; Zhao, D.A. Lightweight crab pond obstacle detection and location method based on improved YOLOv5s. Trans. Chin. Soc. Agric. Eng. 2023, 39, 152–163. [Google Scholar]
  12. Xu, Y.; Xiong, J.J.; Li, L.; Peng, Y.J.; He, J.J. Detecting pepper cluster using improved YOLOv5s. Trans. Chin. Soc. Agric. Eng. 2023, 39, 283–290. [Google Scholar]
  13. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
  14. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar]
  15. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
  16. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  17. Viola, P.; Jones, M. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
  18. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; pp. 1584–1598. [Google Scholar]
  19. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  20. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  21. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  22. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498. [Google Scholar] [CrossRef]
  23. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  24. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  25. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  26. Glenn, J. YOLOv5 Release v6.1. 2022. Available online: https://github.com/ultralytics/yolov5/releases/tag/v6.1 (accessed on 16 March 2025).
  27. Li, C.Y.; Li, L.L.; Jiang, H.L.; Weng, K.H.; Geng, Y.F.; Li, L.; Ke, Z.D.; Li, Q.Y.; Cheng, M.; Nie, W.Q.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  28. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  29. Glenn, J. Ultralytics YOLOv8. 2022. Available online: https://github.com/ultralytics/ultralytics (accessed on 27 April 2025).
  30. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  31. Wang, A.; Chen, H.; Liu, L.H.; Chen, K.; Lin, Z.J.; Han, J.G.; Ding, G.G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  32. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.H.; Ye, J.P. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  33. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef]
  34. Liu, H.; Zhang, L.S.; Shen, Y.; Zhang, J.; Wu, B. Real-time Pedestrian Detection in Orchard Based on Improved SSD. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2019, 50, 29–35, 101. [Google Scholar]
  35. Wei, J.S.; Pan, S.G.; Tian, G.Z.; Gao, W.; Sun, Y.C. Design and experiments of the binocular visual obstacle perception system for agricultural vehicles. Trans. Chin. Soc. Agric. Eng. 2021, 37, 55–63. [Google Scholar]
  36. Cai, S.P.; Sun, Z.M.; Liu, H.; Wu, H.X.; Zhuang, Z.Z. Real-time detection methodology for obstacles in orchards using improved YOLOv4. Trans. Chin. Soc. Agric. Eng. 2021, 37, 36–43. [Google Scholar]
  37. Wang, X.Y.; Yi, Z.Y. Research on obstacle detection method of mowing robot working environment based on improved YOLOv5. J. Chin. Agric. Mech. 2023, 44, 171–176. [Google Scholar] [CrossRef]
  38. Liu, Y.; Wang, H.R.; Liu, Y.H.; Luo, Y.Y.; Li, H.Y.; Chen, H.F.; Liao, K.; Li, L.J. A trunk detection method for camellia oleifera fruit harvesting robot based on improved YOLOv7. Forests 2023, 14, 1453. [Google Scholar] [CrossRef]
  39. Brown, J.; Paudel, A.; Biehler, D.; Thompson, A.; Karkee, M.; Grimm, C.; Davidson, J.R. Tree detection and in-row localization for autonomous precision orchard management. Comput. Electron. Agric. 2024, 227, 109454. [Google Scholar] [CrossRef]
  40. Zhang, Y.; Tian, K.; Huang, J.; Wang, Z.L.; Zhang, B.; Xie, Q. Field obstacle detection and location method based on binocular vision. Agriculture 2024, 14, 1493. [Google Scholar] [CrossRef]
  41. Misra, D. Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv 2019, arXiv:1908.08681. [Google Scholar]
  42. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.M.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
  43. Liu, S.; Qi, L.; Qin, H.F.; Shi, J.P.; Jia, J.Y. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  44. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  45. Wu, T.Y.; Tang, S.; Zhang, R.; Zhang, Y.D. CGNet: A Light-weight Context Guided Network for Semantic Segmentation. IEEE Trans. Image Process. 2021, 30, 1169–1179. [Google Scholar] [CrossRef] [PubMed]
  46. Tang, L.F.; Zhang, H.; Xu, H.; Ma, J.Y. Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity. Inf. Fusion 2023, 99, 101870. [Google Scholar] [CrossRef]
  47. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  48. Ghiasi, G.; Lin, T.Y.; Pang, R.M.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7029–7038. [Google Scholar]
  49. Duan, J.L.; Wang, Z.R.; Zou, X.J.; Yuan, H.T.; Huang, G.S.; Yang, Z. Recognition of bananas to locate bottom fruit axis using improved YOLOv5. Trans. Chin. Soc. Agric. Eng. 2022, 38, 122–130. [Google Scholar]
Figure 1. Partial Data Samples.
Figure 2. Image Processing Methods: (a) Sensor Noise; (b) Darkening; (c) Shadows; (d) Haze; (e) Rain; (f) Facula.
Figure 3. Principle Framework of Obstacle Detection and Safety Warning System.
Figure 4. Improved YOLOv8n network model architecture.
Figure 5. C2f and Bottleneck Structure Diagrams.
Figure 6. An overview of the Context-Guided block.
Figure 7. (a) PAN-FPN; (b) Context-Guide Fusion Module.
Figure 8. Shared Convolutional Separated BN Detection Head.
Figure 9. Schematic diagram of distance measurement principle.
Figure 10. Relationship Between the Vehicle and the Target Ahead Based on Safety Time.
Figure 11. Analysis of the training process. (a) Box loss; (b) mAP.
Figure 12. Ablation Experiment Results Analysis Diagram, where A: C2f-CG; B: CGFPN; C: SharedSepHead. (a) Comparison of P, R, and mAP values; (b) Comparison of FLOPs and Params values.
Figure 13. Detection results of five types of field obstacles across different models. (a) YOLOv5n; (b) YOLOv6n; (c) YOLOv8n; (d) YOLOv9t; (e) YOLOv10n; (f) CGL-YOLOv8n.
Figure 14. Statistical Analysis Chart of Localization Experiment Results. (a) Comparison of actual distance and test distance; (b) absolute error statistics analysis; (c) relative error statistics analysis.
Figure 15. Field Detection and Safety Warning Visualization Results.
Table 1. The number of images and labels for the five categories.
Class    | Image | Instance: Person | Pylon | Agri-Machinery | Pole | Stone | All
Original | 2589  | 1334             | 783   | 662            | 938  | 1224  | 4941
Final    | 4459  | 1982             | 1532  | 1590           | 1736 | 1670  | 8510
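The growth from 2589 original to 4459 final images in Table 1 comes from the augmentation operations illustrated in Figure 2. Below is a minimal Python/OpenCV sketch of two of these operations (sensor noise and darkening); the noise level, brightness factor, and file names are assumptions for illustration rather than the settings used to build the dataset.

```python
import cv2
import numpy as np

def add_sensor_noise(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Add zero-mean Gaussian noise to mimic sensor noise (cf. Figure 2a)."""
    noise = np.random.normal(0.0, sigma, img.shape).astype(np.float32)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def darken(img: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Scale pixel intensities to mimic low-light conditions (cf. Figure 2b)."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Hypothetical file names used only for illustration.
img = cv2.imread("reed_field_sample.jpg")
cv2.imwrite("sample_noise.jpg", add_sensor_noise(img))
cv2.imwrite("sample_dark.jpg", darken(img))
```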
Table 2. Experimental environment.
Type                               | Parameter
OS                                 | Windows 11
GPU                                | NVIDIA RTX A4000
CPU                                | Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz
RAM/VRAM                           | 128 GB/16 GB
Interpreted language version       | Python 3.8.16
Framework and accelerator versions | PyTorch 1.12.1, CUDA 11.3, cuDNN 8.2.1
Table 3. Hyperparameter settings.
Hyperparameter | Value
Epochs         | 300
Batch size     | 32
Workers        | 4
Optimizer      | SGD
Patience       | 50
Close mosaic   | 10
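For reference, the settings in Table 3 map directly onto the Ultralytics training interface used with YOLOv8. The snippet below is a minimal sketch of how such a run could be launched; the model configuration file and dataset YAML names are placeholders assumed for illustration and are not released with this article.

```python
from ultralytics import YOLO

# Hypothetical configuration file for the improved network; the actual
# architecture definition is not distributed with this article.
model = YOLO("cgl-yolov8n.yaml")

# Hyperparameters taken from Table 3; the data path is a placeholder.
model.train(
    data="reed_field_obstacles.yaml",  # assumed dataset definition file
    epochs=300,
    batch=32,
    workers=4,
    optimizer="SGD",
    patience=50,        # early-stopping patience (epochs without improvement)
    close_mosaic=10,    # disable mosaic augmentation for the final 10 epochs
)
```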
Table 4. Comparison Results of Ablation Experiments.
C2f-CG | CGFPN | SharedSepHead | P/%  | R/%  | mAP@0.5/% | FLOPs/G | Params/M | FPS
       |       |               | 91.3 | 84.6 | 91.6      | 8.1     | 3.01     | 60.9
       |       |               | 90.6 | 86.8 | 92.1      | 5.8     | 2.10     | 55.2
       |       |               | 91.5 | 87.2 | 91.7      | 8.3     | 3.16     | 59.5
       |       |               | 91.8 | 85.6 | 91.8      | 6.5     | 2.36     | 68.5
       |       |               | 91.3 | 85.3 | 92.1      | 7.0     | 2.69     | 57.6
√      | √     | √             | 92.8 | 85.0 | 92.3      | 5.4     | 2.05     | 59.1
Note: The check mark (√) in the table denotes that the corresponding module is activated.
Table 5. Comparative Results from Various Object Detection Models.
Group  | Model         | mAP/% | FLOPs   | Model Size | FPS
Others | Faster R-CNN  | 89.7  | 201.0 G | 166.8 MB   | 14.5
       | Cascade R-CNN | 90.5  | 228.0 G | 276.8 MB   | 12.0
       | RetinaNet     | 90.1  | 211.9 G | 145.6 MB   | 16.2
       | YOLOv5n       | 90.7  | 7.1 G   | 5.0 MB     | 58.1
       | YOLOv6n       | 91.0  | 11.8 G  | 8.3 MB     | 69.5
       | YOLOv7        | 90.9  | 105.2 G | 71.4 MB    | 28.3
       | YOLOv8n       | 91.6  | 8.1 G   | 6.0 MB     | 60.9
       | YOLOv9t       | 91.8  | 7.6 G   | 4.4 MB     | 26.4
       | YOLOv10n      | 91.5  | 6.5 G   | 5.5 MB     | 53.7
Ours   | CGL-YOLOv8n   | 92.3  | 5.4 G   | 2.05 MB    | 59.1
Table 6. Statistical Table of Warning Frequency and Accuracy.
Obstacle Type  | Vehicle Speed | Manual Warning Count (No/Level 1/Level 2) | System Warning Count (No) | System Warning Count (Level 1) | System Warning Count (Level 2) | Per-Class Warning Accuracy | Overall Warning Accuracy
Person         | 3 km/h        | 10                                        | 8                         | 9                              | 10                             | 90%                        | 86%
Pylon          | 3 km/h        | 10                                        | 7                         | 8                              | 8                              | 77%                        |
Agri-machinery | 3 km/h        | 10                                        | 9                         | 9                              | 10                             | 94%                        |
Pole           | 3 km/h        | 10                                        | 9                         | 10                             | 9                              | 94%                        |
Stone          | 3 km/h        | 10                                        | 7                         | 8                              | 8                              | 77%                        |
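The per-class and overall accuracies in Table 6 follow from comparing the system warning counts with the ten manually recorded warnings at each level. A short sketch of that bookkeeping is shown below (counts copied from Table 6; the published per-class figures may differ slightly due to rounding).

```python
# System warning counts per obstacle type, ordered (No, Level 1, Level 2);
# each level was manually recorded 10 times (Table 6).
system_counts = {
    "Person":         (8, 9, 10),
    "Pylon":          (7, 8, 8),
    "Agri-machinery": (9, 9, 10),
    "Pole":           (9, 10, 9),
    "Stone":          (7, 8, 8),
}
MANUAL_PER_LEVEL = 10

total_correct = total_manual = 0
for obstacle, counts in system_counts.items():
    correct, manual = sum(counts), MANUAL_PER_LEVEL * len(counts)
    total_correct += correct
    total_manual += manual
    print(f"{obstacle}: {correct}/{manual} = {correct / manual:.0%}")

print(f"Overall: {total_correct}/{total_manual} = {total_correct / total_manual:.0%}")
# Overall: 129/150 = 86%, matching Table 6.
```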
