1. Introduction
Soybean is rich in high-quality plant protein and various essential nutrients, making it a major source of dietary protein. It is widely used in edible oil processing, feed production, and the food industry and plays a crucial role in ensuring global food security and promoting sustainable agricultural development [
1,
2]. Consequently, soybeans are widely cultivated worldwide. Weed management during the seedling stage is a crucial component of soybean production, as weeds directly compete with crops for essential resources such as nutrients, water, and light, thereby adversely affecting crop growth and yield [
3,
4]. Currently, weed control in soybean fields is predominantly achieved through the application of herbicides. Conventional herbicide application in farmland mainly relies on continuous full-coverage spraying. However, a large proportion of the applied chemicals either evaporates or infiltrates into the soil, and only a small fraction is effectively deposited on the target weeds. This practice leads to excessive herbicide use, chemical residues, and environmental pollution, thereby hindering sustainable agricultural development [
5,
6,
7]. In China, the effective utilization rate of pesticides for major grain crops is only 41.8%. Target spraying technology enables real-time identification of weeds and other targets through object detection, allowing pesticides to be applied precisely according to their spatial distribution and size. Compared with conventional continuous full-coverage spraying, this approach significantly reduces pesticide consumption, improves application efficiency, and mitigates environmental pollution [
8].
Accurate target detection is a fundamental prerequisite for the effective implementation of target spraying. According to the type of sensors employed, existing systems can be broadly classified into ultrasonic-based, LiDAR-based, and vision-based approaches. LiDAR measures distances by emitting laser pulses and recording the time of flight of the reflected signals. When combined with the rotational angle of the scanning unit, LiDAR enables the acquisition of three-dimensional (3D) point cloud data of plant structures. Owing to its high spatial resolution, strong robustness to illumination variations, and large sensing range, LiDAR has been widely applied in orchard environments for 3D reconstruction and for estimating canopy characteristics such as volume, height, and density, thereby providing a basis for variable-rate spraying decisions [
9,
10,
11,
12]. According to measurement configuration, LiDAR systems can be classified into single-line and multi-line scanners. Single-line LiDAR utilizes a single laser channel to perform point-by-point scanning and is typically used for reconstructing local or partial structures [
13], whereas multi-line LiDAR employs multiple laser beams to capture high-density point cloud data and is more suitable for reconstructing entire orchard scenes [
14]. Ultrasonic sensors measure distances by emitting sound waves and measuring the time required for the echoes to return. However, their response speed is relatively slow due to the limited propagation velocity of sound in air and the constraints of analog signal processing. Ultrasonic sensors have been used in agricultural applications, including the estimation of canopy volume and density [
15,
16], leaf area density in Osmanthus trees [
17], and the height of blueberries and weeds [
18]. In recent years, machine vision technology has been extensively applied in crop and weed detection research [
19]. Visual sensors are used to capture images, which are subsequently analyzed using traditional machine learning methods or deep learning algorithms to achieve accurate crop and weed recognition. Commonly used visual sensors include monocular cameras, stereo cameras, and RGB-D cameras [
20]. Monocular RGB cameras operate based on the pinhole imaging principle, projecting three-dimensional spatial information onto a two-dimensional image plane. Owing to their simple structure, low cost, and the availability of well-established algorithms, these sensors have been widely used in agricultural applications such as canopy analysis, pest and disease detection [
21], fruit detection [
22,
23], and weed identification [
24,
25]. In contrast, RGB-D cameras are capable of actively acquiring depth information, primarily through infrared structured light projection or time-of-flight (TOF) techniques. As a result, they are particularly suitable for tasks requiring three-dimensional perception and have been widely employed in fruit detection and localization applications [
26,
27].
Conventional weed detection algorithms predominantly rely on machine learning approaches to distinguish crops from weeds, such as k-nearest neighbors (k-NN) [
28], support vector machines (SVMs) [
29,
30], random forests [
31], and artificial neural networks (ANNs) [
32]. However, these traditional approaches typically rely on manually designed features, such as color, texture, and shape. Consequently, their robustness is limited under real field conditions, including variable illumination, high morphological similarity between crops and weeds, leaf occlusion, and complex backgrounds [
33,
34]. Moreover, due to constraints in sample size and variations in data distribution, these methods often demonstrate poor generalization across different weed species and operational scenarios, making it difficult to satisfy the accuracy and stability requirements of target spraying applications. Recent advances in deep learning have significantly improved weed detection in agricultural fields. Representative object detection algorithms include R-CNN, Faster R-CNN, SSD, Mask R-CNN, and the YOLO series [
35,
36,
37]. Among them, YOLO is an end-to-end, single-stage object detector that performs target localization and classification within a single forward pass. Owing to its high detection speed, strong real-time capability, and relatively simple network architecture, YOLO has been widely applied in real-time recognition of weeds, crops, and pests in agricultural scenarios. Several improved YOLO-based models have been proposed to enhance detection accuracy and efficiency. Wang et al. [
38] proposed a YOLOv5-SGS model for multi-species weed recognition in wheat fields, achieving a mean average precision (mAP) of 91.4% and an F1 score of 85.3%. Xu et al. [
39] proposed the W-YOLOv5 algorithm for weed detection among crop seedlings, reporting an overall mAP of 87.6% and demonstrating its capability to recognize multiple weed species. Rahman et al. [
40] evaluated thirteen one-stage and two-stage detectors, including YOLOv5n and Fast R-CNN, for weed detection in cotton fields. RetinaNet (R101-FPN) achieved the highest detection performance with an mAP@0.50 of 79.98%, although its inference time was relatively long. Rai et al. [
41] proposed the YOLO-Spot model based on YOLOv7-tiny, reducing parameters by over 75% and GFLOPs by 86%, while improving mAP@0.50 by 2.7% compared with YOLOv7-Base. Sunil [
42] trained six YOLOv8 and eight YOLOv9 variants on datasets comprising eight crop species and five weed species, achieving an overall mAP@50 of 86.2%. Li et al. [
43] developed an improved YOLOv10n-FCDS model for weed detection in UAV-acquired rice field images. By integrating a FasterNet backbone with a CGBlock module, the model effectively enhanced the detection accuracy of small and occluded weeds, achieving an mAP@50 of 87.4%. To address computational efficiency, lightweight models have also been developed. Fan et al. [
44] proposed YOLO-WDNet for weed detection in cotton fields, reducing parameters by 82.3% and model size by 91.6% compared with contemporary models. He et al. [
45] developed the EDS-YOLOv8 weed detection algorithm, employing EfficientViT as the backbone, optimizing key modules, and integrating the SimAM attention mechanism, resulting in a significant performance improvement. Liu et al. [
46] proposed an improved lightweight weed detection model based on YOLOv9s, achieving an mAP of 81.7% by optimizing anchor boxes and introducing the SPPELAN-ECA and AGSConv modules. Overall, YOLO-based models demonstrate strong capabilities in weed detection; nevertheless, their high computational requirements often limit real-time performance and operational efficiency. Therefore, achieving a balance between model lightweighting and recognition accuracy is essential to enhance practical applicability in field operations.
In precision target spraying systems and devices, Wang et al. [
38] developed a lightweight and improved YOLOv5s model and designed a target spraying decision and hysteresis algorithm. The experiments indicated that, at operational speeds of 0.3–0.6 m/s, the system achieved a spraying accuracy of 95.7%, demonstrating its effectiveness in real-time field applications. Zhao et al. [
30] developed a cabbage identification and pesticide spraying control system based on an artificial light source, in which weeds were detected using an SVM. The results demonstrated a maximum identification accuracy of 95.7%. However, as the vehicle speed increased, target displacement also increased, with a maximum centroid deviation of 28.6 mm observed at 0.93 m/s. Xu et al. [
39] proposed a hierarchical detection algorithm for multi-species weed identification and developed a variable-rate spraying system based on the severity of weed infestation, categorized into five levels. Field trials demonstrated that the system could achieve a spraying accuracy of 90.32% at an operational speed of 4 km·h−1. Jiang et al. [
47] developed a weeding method in which herbicides were applied following mechanical injury to weed tissues. Field tests on Chinese cabbage demonstrated a weed removal rate of 94.5% while using only 15.3% of the herbicide required by conventional chemical methods. Sunil et al. [
48] proposed a grid map creation algorithm using the YOLOv4 model to control the nozzles of a robotic platform. Based on the grid map algorithm, herbicide application was reduced by 79%. Although existing research on target spraying systems has made significant progress in weed detection algorithms, few studies have addressed the challenge of maintaining accurate herbicide application at varying operational speeds using advanced control strategies, which remains a critical issue for practical field applications.
In this study, weed control is investigated during the seedling stage of non-GMO cultivars of soybean, specifically at the V1–V2 growth stage (one- to two-trifoliate leaf stage). To address the challenges of accurate and efficient weed management at this stage, a deep learning-based targeted spraying method for soybean fields is proposed, along with a grid-based matching spraying algorithm for precise weed elimination. Field weed images are first acquired using a camera, and weeds are detected using an improved YOLOv5 model. Based on the detection results, the proposed algorithm controls the opening and closing of solenoid valves in real time to ensure accurate herbicide application to target weeds. The performance of the developed system is subsequently evaluated through both laboratory and field experiments. Overall, this study provides a practical solution for integrating weed detection with precision target spraying, enabling reliable spraying accuracy under variable field conditions.
2. Materials and Methods
2.1. Design of the Target Spraying Device
The target spraying device was integrated into a 3WPZ-200 self-propelled electric boom sprayer and comprised four main subsystems, i.e., an image acquisition unit, a pesticide supply unit, a spray execution unit, and a traveling system, as shown in
Figure 1. The image acquisition unit was responsible for real-time field image collection, while the pesticide supply and spray execution units cooperatively enabled precise pesticide delivery. The traveling system provided stable forward motion during field operations.
The image acquisition unit consisted of two cameras (MV-CA016, Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou, China) and an onboard computer (Intel NUC, Intel Corporation, Santa Clara, CA, USA) equipped with an Intel i7-1165G7 CPU, an NVIDIA RTX 2060 GPU with 6 GB memory, and 16 GB RAM. The cameras, each with a resolution of 1440 × 1080 pixels and equipped with a 4 mm focal-length lens, were used to acquire field images. To ensure full coverage of the spray boom, the two cameras were mounted symmetrically at one-quarter and three-quarters of the boom length on the left and right sides, respectively, with each camera responsible for monitoring half of the operating area. The cameras were installed at a height of 0.5 m above the spray boom. The onboard computer received the video streams captured by the cameras, executed deep learning-based target detection and precision target spraying strategy in real time, and transmitted control commands to the controller via serial communication.
The pesticide supply unit mainly consisted of a pesticide tank, a filter (Kaiping WOEN Sanitary Ware Co., Ltd., Kaiping, China), a pump (Dafengda 5G-210, Chaozhou, China), a buffer tank (TY-11-0.5G-5, Taizhou Tianyang Electrical Co., Ltd., Taizhou, China), a flow sensor (Shanghai Weill Instrument Co., Ltd., Shanghai, China), and a pressure sensor (MIK-P300, Hangzhou MEACON Automation Technology Co., Ltd., Hangzhou, China). The filter was installed upstream of the pump to remove impurities from the pesticide solution. The pump was used to pressurize the spray liquid and deliver it to the nozzles for atomization. The buffer tank was used to attenuate pressure fluctuations in the liquid flow, thereby ensuring stable spray pressure. The flow sensor was used to measure the real-time flow rate in the pipeline. The pressure sensor was used to monitor the pressure of the spraying system in the range of 0–1.0 MPa, with a measurement accuracy of 0.005 MPa.
The spray execution unit consisted of a controller, solenoid valves (2V025, AirTAC International Group, Ningbo, China), MOSFET-based valve driver circuits, an incremental encoder (E6B2-CWZ3E, VEHA Corporation, Shenzhen, China), and nozzles (model 2501, Dongguan Wuyuan Spraying and Purification Technology Co., Ltd., Dongguan, China). The controller was based on an STM32F103ZET6 microcontroller (Guangzhou Xingyi Electronic Technology Co., Ltd., Guangzhou, China). The solenoid valve controlled the opening and closing of the nozzle at a voltage of DC 24 V and a pressure range of 0–1.0 MPa, with a maximum switching frequency of 10 Hz. In the de-energized state, the solenoid valve remained closed under the action of the spring. When the solenoid coil was energized, the valve core was rapidly attracted, switching the valve to the open state and enabling spray on/off operation. The MOSFET-based valve driver circuits converted the control signals from the controller into driving signals for the solenoid valves. The incremental encoder was used to measure the forward speed of the sprayer. Flat-fan stainless-steel nozzles with a spray angle of 25° were used, with a nozzle spacing of 15 cm. When the solenoid valves were activated, the pesticide solution flowed through the nozzle, enabling precision target spraying.
The traveling unit consisted of a liftable spray boom, chassis, steering system, battery (60 V), and drive motors. A four-wheel steering mode was adopted to enhance maneuverability under field conditions. The spray boom had a working width of 3 m. Detailed technical parameters of the sprayer are summarized in
Table 1.
During operation, the cameras capture field images and transmit them to the onboard computer. The computer preprocesses the images and performs target detection using a pre-trained deep learning model. Based on the detection results, decisions are made regarding the opening and closing of the solenoid valve assembly. These control signals are transmitted in real time to the spray execution unit via serial communication as data frames. The controller of the spray execution unit parses the data frames, sets the corresponding control pins to the low or high level, and, after processing by the valve driver circuit, actuates the solenoid valves. Consequently, the pesticide solution is sprayed from the nozzles, enabling precision target spraying. Meanwhile, the buffer tank of the pesticide supply unit mitigates pressure fluctuations in the pipeline caused by intermittent spraying, maintaining a constant supply pressure and ensuring stable nozzle atomization quality. The schematic diagram of the main components of the target spraying system is shown in
Figure 2.
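The structure of the control data frames sent over the serial link is not specified in the text. As an illustration only, a minimal Python sketch of one plausible encoding, packing the on/off states of ten solenoid valves into a bitmask framed by a header byte, an XOR checksum, and a tail byte (all hypothetical choices), could look like this:

```python
def encode_valve_frame(valve_states):
    """Pack ten solenoid-valve on/off states into a serial data frame.

    Hypothetical frame layout: 0xAA header, 2-byte little-endian bitmask
    (bit i = valve i), XOR checksum of the two mask bytes, 0x55 tail.
    """
    if len(valve_states) != 10:
        raise ValueError("expected states for 10 valves")
    mask = 0
    for i, on in enumerate(valve_states):
        if on:
            mask |= 1 << i
    lo, hi = mask & 0xFF, (mask >> 8) & 0xFF
    checksum = lo ^ hi
    return bytes([0xAA, lo, hi, checksum, 0x55])
```

On the receiving side, the microcontroller would verify the checksum, unpack the bitmask, and drive the corresponding control pins high or low.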
2.2. Weed Detection Method
2.2.1. Image Acquisition
Field image data were collected at three locations in Henan Province in July 2022: Yuan Zhuang Village, Suixian County (34.136° N, 115.343° E); Zhou Zhuang Village, Linying County (33.774° N, 113.837° E); and the experimental field at the Changyuan Branch of the Henan Academy of Agricultural Sciences (35.428° N, 114.289° E). All experiments were carried out in non-GMO soybean fields at the early post-emergence stage (V1–V2, one- to two-trifoliate leaf stage). Weed samples in soybean fields are shown in
Figure 3. The images were captured using a smartphone (Redmi K30 Pro, Xiaomi Corporation, Beijing, China) in JPEG format with a resolution of 1440 × 1080 pixels. The weed species collected was
Cirsium setosum (also known as
Cirsium arvense var.
integrifolium). The dataset included images taken under various weather conditions, such as sunny, cloudy, and post-rain, as well as different land backgrounds, including bare soil and wheat stubble fields. These images exhibit diversity in environmental lighting and background conditions, which enhances the generalization ability of the trained model.
2.2.2. Data Augmentation
To improve the generalization ability of the model, data augmentation was applied to increase the diversity and size of the original dataset [
49]. Mosaic online data augmentation was employed during model training. This technique involves randomly cropping and concatenating multiple images to create a new training sample, referred to as a mosaic sample, which contains multiple objects and backgrounds. During training, the model learns to detect and classify these different targets while distinguishing their relationships with the background. Mosaic augmentation also reduces dependence on the training data, mitigates the risk of overfitting, and improves model performance. Weed-labeled images after mosaic augmentation are shown in
Figure 4.
In this study, a total of 3200 images were annotated using the CVAT image annotation tool. The annotated dataset was then randomly divided into training, validation, and test sets at a ratio of 7:1:2, resulting in 2240 images for the training set, 320 images for the validation set, and 640 images for the test set. These subsets were subsequently used for model training and evaluation.
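The 7:1:2 split described above can be reproduced with a short, deterministic helper (a sketch; the tooling actually used for the split is not stated in the text):

```python
import random

def split_dataset(items, ratios=(7, 1, 2), seed=42):
    """Randomly split annotated images into train/val/test subsets at 7:1:2."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(items)
    total = sum(ratios)
    n = len(items)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the 3200 annotated images, this yields subsets of 2240, 320, and 640 images, as reported.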
2.2.3. Weed Detection Model Based on YOLOv5-MobileNetv3-SE
This study is based on the YOLOv5 object detection algorithm. The extensive convolutional operations in the CSPDarknet backbone of YOLOv5 require substantial computational resources and time, which makes it unsuitable for deployment on resource-constrained edge devices [
50]. To improve efficiency, the CSPDarknet backbone was replaced with the more lightweight MobileNetV3, reducing computational load and model size. MobileNetV3 is constructed from depthwise separable convolutions and inverted residual blocks with linear bottlenecks, which significantly reduce the number of parameters and floating-point operations while preserving feature extraction capability. Additionally, because field images are complex and some weed targets are small, which can lead to false or missed detections, an SE (Squeeze-and-Excitation) attention module was added after each of the three output layers of the backbone network. Analogous to human visual selective attention, the SE mechanism increases the weight of task-relevant feature channels and suppresses irrelevant ones, allowing the network to focus on informative features and improving the accuracy of small-target weed detection. Specifically, the SE module first performs a squeeze operation using global average pooling to capture global channel-wise contextual information. This is followed by an excitation operation implemented through fully connected layers to generate channel-wise weighting coefficients. These weights are then applied to the original feature maps to adaptively enhance important feature channels while attenuating irrelevant ones. The network takes 640 × 640 RGB images as input, and its output consists of three tensors of different sizes: 80 × 80 × 255, 40 × 40 × 255, and 20 × 20 × 255, corresponding to detection layers with strides of 8, 16, and 32, respectively.
In each feature map, the dimension “255” represents the prediction vector at each spatial location, which is generated from three preset anchors per scale, each predicting 4 bounding box coordinates, 1 objectness confidence score, and 80 class probabilities (i.e., 3 × (4 + 1 + 80) = 255). The architecture of the improved model is shown in
Figure 5.
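The squeeze-excitation-scale sequence described above can be illustrated with a minimal NumPy sketch. The weights here are random and the shapes purely illustrative; in the actual network the SE module operates on backbone feature maps with learned parameters:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation attention over a feature map x of shape (C, H, W).

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights,
    where r is the channel reduction ratio.
    """
    # Squeeze: global average pooling -> one descriptor per channel, shape (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, yielding channel weights in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Scale: reweight each channel of the original feature map
    return x * s[:, None, None]

rng = np.random.default_rng(0)
c, r = 16, 4
x = rng.standard_normal((c, 8, 8))
y = se_block(x, rng.standard_normal((c // r, c)), rng.standard_normal((c, c // r)))
```

Because the sigmoid gate lies in (0, 1), each output channel is a damped copy of its input channel, with informative channels damped least.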
2.2.4. Model Training and Parameter Settings
The hardware environment used for model training in this study consisted of an NVIDIA GeForce RTX 3090 GPU with 24 GB of VRAM, an Intel® Core™ i9-12900K processor, and 64 GB of RAM. The software environment included the Windows 10 operating system, Python 3.7, PyTorch 1.11, CUDA 11.5, and PyCharm 2020. The input image size was set to 640 × 640, with padding applied to maintain the original aspect ratio. The initial learning rate was set to 0.1, and a cosine annealing schedule was used during training. The Adam optimizer was employed for model optimization, and the model was trained for 300 epochs. Model convergence was assessed by monitoring the loss value during training and the variation in mAP curves on the validation set. Once convergence was achieved, the weights corresponding to the lowest loss in the final training epochs were selected as the trained model.
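The cosine annealing schedule mentioned above follows the standard form lr(t) = lr_min + ½(lr0 − lr_min)(1 + cos(πt/T)). A sketch with the stated initial rate of 0.1 over 300 epochs, assuming the minimum rate anneals to 0 (not stated in the text):

```python
import math

def cosine_lr(epoch, total_epochs=300, lr0=0.1, lr_min=0.0):
    """Cosine-annealed learning rate for a given epoch (0-indexed)."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```

The rate starts at 0.1, passes through 0.05 at the schedule midpoint (epoch 150), and decays smoothly toward 0 by epoch 300.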
2.3. Precision Target Spraying Strategy
Based on the weed detection results obtained by the proposed YOLOv5-MobileNetv3-SE model, a precision target spraying strategy was developed to achieve accurate synchronization between target position and spray actuation. The overall strategy consisted of a grid-based matching spraying algorithm, system time delay analysis, and a time-delay compensation method.
2.3.1. Grid-Based Matching Spraying Algorithm
Line-crossing detection algorithms have been widely used in applications such as video surveillance and traffic safety, where targets are identified by determining whether they cross predefined virtual lines, enabling effective monitoring and management of designated regions [
51,
52]. Based on this concept, an improved grid-based matching spraying algorithm was proposed in this study. The algorithm established a correspondence between the targets detected by the image acquisition unit and the spray nozzles mounted on the boom, thereby determining which nozzles should be activated and when they should be triggered. Based on this matching relationship, control data frames for the solenoid valve array were generated and transmitted to implement an on–off control strategy, whereby each solenoid valve was switched on or off according to the algorithm decision results. This enabled precise regulation of solenoid valve opening and closing, ultimately achieving accurate target spraying.
The specific implementation of the proposed algorithm is illustrated in
Figure 6. A series of grids is overlaid on the image plane, where each grid corresponds one-to-one with a solenoid valve on the spray boom. The width of each grid is set equal to the average spray width of a single nozzle, while the grid height is fixed at 60 pixels. During forward operation of the boom sprayer, the ground scene moves downward in the image frame. The deep learning-based target detection algorithm continuously detects weed targets and generates regions of interest (ROIs), which are represented by red bounding boxes. When an ROI overlaps with a grid, it indicates that the target has entered the spraying area of the corresponding nozzle. The intersection area between each grid and the ROI is calculated. If the intersection area exceeds a predefined threshold, the corresponding grid is assigned a value of 1, indicating that the solenoid valve should be activated for spraying; otherwise, it is assigned a value of 0, indicating that the valve remains closed. In this study, the threshold is set to 20% of the grid area.
Several spatial relationships between ROIs and grids may occur during operation. When the ROI of a single weed intersects with only one grid, a single nozzle is activated. When the ROI intersects with two or more grids, multiple nozzles are activated simultaneously. In cases where multiple weeds overlap spatially, the resulting ROIs intersect multiple grids, and the corresponding solenoid valves are activated accordingly. Furthermore, due to the inherent opening and closing response time of the solenoid valves during practical operation, a continuous spraying strategy is adopted when adjacent weeds are closely spaced along the forward travel direction. Specifically, if the longitudinal spacing between adjacent weeds is shorter than the effective spraying distance corresponding to the valve response time, the grid control signal is maintained at 1 (as indicated by the yellow region in
Figure 6), ensuring uninterrupted spraying and preventing missed targets. This effective spraying distance is determined by the product of the sprayer’s forward speed and the measured solenoid valve response time, thereby defining a clear distance threshold for continuous spraying. In this study, each camera is responsible for controlling ten nozzles. Data exchange between the onboard computer and the controller is performed via serial communication.
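The grid-ROI matching rule can be sketched as follows. The grid geometry here (ten grids of 96 × 60 pixels in a band starting at image row 900) is illustrative; only the 20% overlap threshold is taken from the text:

```python
def grid_signals(rois, n_grids=10, grid_w=96, grid_y=900, grid_h=60, thresh=0.2):
    """Per-nozzle on/off decisions from detected weed ROIs.

    rois: list of (x1, y1, x2, y2) bounding boxes in image pixels.
    Grid cell i spans x in [i*grid_w, (i+1)*grid_w] and y in
    [grid_y, grid_y + grid_h]; a cell fires (1) when its overlap with
    any ROI exceeds thresh (20%) of the cell area, otherwise stays 0.
    """
    cell_area = grid_w * grid_h
    signals = []
    for i in range(n_grids):
        gx1, gx2 = i * grid_w, (i + 1) * grid_w
        on = 0
        for (x1, y1, x2, y2) in rois:
            iw = min(gx2, x2) - max(gx1, x1)   # horizontal overlap
            ih = min(grid_y + grid_h, y2) - max(grid_y, y1)  # vertical overlap
            if iw > 0 and ih > 0 and iw * ih > thresh * cell_area:
                on = 1
                break
        signals.append(on)
    return signals
```

A single ROI spanning two grid cells activates both corresponding valves, matching the multi-nozzle case described above; the continuous-spraying hold for closely spaced weeds would be layered on top of these per-frame signals.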
2.3.2. System Time Delay Analysis
In the above analysis, it is assumed that the grid positions coincide with the locations of the spray boom within the camera field of view. Accordingly, when the intersection area between the ROI and a grid exceeds a predefined threshold, the onboard computer sends an opening signal to the corresponding solenoid valve. However, during actual operation, a certain amount of time is required from the moment the control command is issued by the computer to the moment when pesticide droplets are deposited on the weeds. This time delay causes a longitudinal offset between the actual spraying region and the target spraying region, resulting in missed spraying regions in precision target spraying applications, as illustrated in
Figure 7.
Further analysis indicates that the total system time delay in the target spraying process mainly consists of three components: image processing delay, communication and control delay, and spray deposition delay. Therefore, it is necessary to quantitatively measure each delay component through dedicated experiments to determine the overall system latency. The resulting total delay can then be used to compensate for the spraying lag by adjusting the spatial offset distance, thereby reducing spray omission and improving target spraying accuracy.
- 1.
Image processing delay
The image processing delay mainly originated from the inference time required by the deep learning-based weed detection model. Although the target detection model was lightweighted in this study to reduce computational complexity, the delay introduced by image processing remained non-negligible. To quantify the image processing delay, the trained detection model was deployed on the onboard computer. A total of 200 field-acquired images were processed using a weed detection model based on YOLOv5-MobileNetv3-SE. The total inference time was measured as 5.58 s, corresponding to an average processing time of 27.9 ms per image.
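A typical way to obtain such a per-image figure is to time the forward pass over a batch of images; a minimal sketch, where `predict` is a hypothetical stand-in for the deployed detection model's inference call:

```python
import time

def mean_inference_ms(predict, images):
    """Average per-image inference time in milliseconds.

    predict: callable performing one forward pass (hypothetical stand-in
    for the deployed weed detection model).
    """
    t0 = time.perf_counter()
    for img in images:
        predict(img)
    total = time.perf_counter() - t0
    return 1000.0 * total / len(images)
```

With 200 images and a measured total of 5.58 s, this yields the reported average of 27.9 ms per image.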
- 2.
Communication and control delay
The communication and control delay referred to the time consumed from the moment when the onboard computer transmitted a control command to the moment when the microcontroller decoded the data frame, set the corresponding control pins, and output the driving voltage through the solenoid valve driver circuit to actuate the solenoid valve. To measure the communication and control delay, a single-channel solenoid valve control setup was constructed, consisting of a microcontroller, a solenoid valve driver board, and a solenoid valve, as shown in
Figure 8. A valve-opening command was transmitted from the onboard computer using serial communication debugging software to trigger the opening of the solenoid valve. Meanwhile, a digital oscilloscope (TBS1102C, Tektronix, Inc., Beaverton, OR, USA) was used to monitor the signals. One channel of the oscilloscope was connected to the serial input pin of the microcontroller, while the other channel was connected to the output terminal of the solenoid valve driver board. By capturing and comparing the waveforms from the two channels, the time difference between the input control signal and the output driving voltage was determined. The results showed that the time interval from the reception of the first pulse signal at the microcontroller serial port to the output of the 24 V driving voltage by the solenoid valve driver board was 6.37 ms.
- 3.
Spray deposition delay
The spray deposition delay referred to the time interval from the moment when the solenoid valve received the driving signal from the driver board and initiated opening to the moment when the pressurized pesticide was atomized through the nozzle, traveled through the air, and finally deposited onto the weed. In this study, the spray process was captured using a high-speed camera, and the deposition delay was determined through frame-by-frame image analysis, as shown in
Figure 9. The high-speed camera operated at a frame rate of 960 frames per second, corresponding to a temporal resolution of 1.04 ms per frame. To accurately determine the initial moment when the solenoid valve was energized, a green indicator light was installed above the solenoid valve. The indicator illuminated simultaneously with valve energization and turned off when the power was cut, thereby serving as a reference for the timing of valve coil energization and de-energization. A container filled with water dyed with carmine was placed beneath the nozzle. When spray droplets reached the water surface, visible surface disturbances were generated, which were used to identify the moment when the droplets arrived at the target surface.
During the experiment, the spray pressure was set to 0.3 MPa, and the nozzle height above the ground was 50 cm. The solenoid valve was briefly energized to generate a single intermittent spraying event. The spraying process was recorded using a high-speed camera, and the corresponding image frames are shown in
Figure 10. The spraying sequence was divided into six key stages: signal onset, spray initiation, droplet arrival at the ground, continuous spraying, signal termination, and spray termination. The frame in which the indicator light switched from off to on was defined as frame 0. Spray droplets were first observed emerging from the nozzle at frame 6. At frame 35, visible ripples appeared on the water surface in the collection tray placed on the ground, indicating that the droplets had reached the ground surface. Continuous spraying was maintained until frame 43, when the indicator light began to turn off. Subsequently, the ripples on the water surface gradually weakened and nearly disappeared by frame 90, marking the end of the intermittent spraying event. Based on the frame intervals, the elapsed time from the indicator light turning on to the initial droplet ejection was approximately 6.24 ms, and the elapsed time from valve energization to droplet arrival at the ground was approximately 36.40 ms, of which about 30.16 ms corresponded to airborne travel from the nozzle to the ground. After the control signal was terminated at 44.72 ms, spraying continued until 93.60 ms, resulting in a solenoid valve closing delay of 48.88 ms from the open to the closed state. Therefore, the spray deposition delay was determined to be 36.40 ms.
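The timing figures above follow directly from the frame indices, using the rounded frame period of 1.04 ms adopted in the text:

```python
FRAME_MS = 1.04  # frame period at 960 fps, rounded as in the text

def t(frame):
    """Elapsed time (ms) at a given frame index; frame 0 = valve energized."""
    return frame * FRAME_MS

spray_start      = t(6)                    # first droplets leave the nozzle: 6.24 ms
deposition_delay = t(35)                   # droplets reach the ground: 36.40 ms
signal_off       = t(43)                   # control signal terminated: 44.72 ms
spray_end        = t(90)                   # spraying ceases: 93.60 ms
closing_delay    = spray_end - signal_off  # valve closing lag: 48.88 ms

# Total system delay = image processing + communication/control + deposition
total_delay = 27.9 + 6.37 + deposition_delay  # 70.67 ms
```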
Based on the above experimental measurements, the total system time delay of the precision target spraying system was determined to be 70.67 ms.
2.3.3. Time-Delay Compensation Method
In the camera field of view, the grid position determines the triggering timing of the solenoid valves. As discussed above, when the grid position coincides with the actual spray boom position, an overlap between the grid and the ROI leads to mistimed spraying due to system time delay, resulting in spray misalignment or missed spraying. Therefore, advancing the grid position relative to the spray boom to trigger the solenoid valve earlier effectively compensates for the system time delay, thereby reducing spray omission and improving the precision of target spraying. A schematic illustration of the time-delay compensation method is shown in
Figure 11. A planar coordinate system XOY is established, in which the center of the camera field of view is defined as the X-axis and the forward traveling direction of the sprayer is defined as the Y-axis. The distance between the nozzle and the camera along the forward direction is denoted as e. When the target spraying system is stationary, the grid centerline L2 coincides with the spray centerline L1, ensuring that the pesticide droplets accurately deposit onto the target. During forward operation, the grid is shifted upward in the image by a certain distance, allowing the predicted bounding box to intersect the grid earlier and thereby compensating for the system delay. Consequently, the relative distance g between the matching grid centerline L2 and the Y-axis in the world coordinate system can be expressed as:

g = d − e

where d is the distance between the grid centerline and the spray centerline (m), and e is the separation between the nozzle and the camera along the forward traveling direction (m), which was set to 0.1 m in this study.
The value of d is determined by the total system delay time and the real-time forward speed of the sprayer:

d = v·t

where v is the real-time forward speed of the sprayer (m·s−1) and t is the total delay time (s). Based on the above experimental measurements, t was determined to be 70.67 ms. The real-time forward speed of the sprayer was obtained using an incremental encoder.
The incremental encoder is coaxially mounted on the sprayer wheel. Given the known wheel diameter and the encoder resolution of 1000 pulses per revolution, the real-time forward velocity v of the sprayer is calculated by counting the number of encoder pulses within a fixed sampling interval Δt:

v = (π·D·n)/(N·Δt)

where n is the number of pulses detected during the sampling interval Δt (s), N is the encoder resolution (1000 pulses per revolution), and D is the wheel diameter (m).
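The encoder-based speed estimate and the resulting grid advance distance d = v·t can be sketched as follows (the wheel diameter of 0.4 m and the 100 ms sampling window are illustrative assumptions, not values from this study):

```python
import math

N = 1000           # encoder resolution, pulses per revolution
T_DELAY = 0.07067  # total system delay time t (s), measured above

def forward_speed(n_pulses: int, dt: float, wheel_diameter: float) -> float:
    """Forward speed v = pi * D * n / (N * dt), in m/s."""
    return math.pi * wheel_diameter * n_pulses / (N * dt)

def advance_distance(v: float) -> float:
    """Grid advance distance d = v * t for the measured system delay."""
    return v * T_DELAY

# Hypothetical reading: 44 pulses in a 100 ms window with a 0.4 m wheel.
v = forward_speed(44, 0.1, 0.4)   # ~0.55 m/s (about 2 km/h)
d = advance_distance(v)           # ~0.039 m
```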
Accordingly, the grid position in the image is dynamically adjusted according to the sprayer speed to advance the opening timing of the solenoid valves, thereby achieving accurate alignment between the spray deposition area and the target region.
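The grid–ROI intersection step can be illustrated with a simple axis-aligned overlap test (a minimal sketch; the grid layout and bounding-box coordinates are hypothetical pixel values):

```python
def boxes_intersect(a, b):
    """Axis-aligned overlap test; boxes are (x1, y1, x2, y2) in pixels."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def grid_states(grid_cells, rois):
    """Set each grid cell to 1 (open valve) if it intersects any ROI, else 0."""
    return [1 if any(boxes_intersect(c, r) for r in rois) else 0
            for c in grid_cells]

# Four adjacent grid cells across the boom width (hypothetical layout).
cells = [(0, 400, 380, 500), (380, 400, 760, 500),
         (760, 400, 1140, 500), (1140, 400, 1520, 500)]
rois = [(300, 420, 500, 480)]       # one detected weed bounding box

states = grid_states(cells, rois)   # -> [1, 1, 0, 0]
```

Only the first two valves open, since the detected bounding box overlaps only the first two cells.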
To implement the grid-based matching algorithm, the compensated grid position is first transformed from the world coordinate system to the pixel coordinate system. Subsequently, an intersection calculation is performed between the transformed grid and the pixel coordinates of the ROI obtained from target detection. The grid state (1 or 0) is then determined to control the opening and closing of the corresponding solenoid valves. Through this dynamic grid adjustment strategy, accurate alignment between the spray deposition area and the target spraying region is achieved. The world coordinate system is a three-dimensional Cartesian coordinate system used to describe the spatial relationship between the camera and observed objects, whereas the pixel coordinate system is defined on the image plane output by the camera and is used to represent pixel locations in the image. In this study, the world coordinate system is denoted as OXYZ, where the Z-axis points toward the camera viewing direction, the X-axis points to the right side of the image, and the Y-axis points downward, as shown in Figure 12. A spatial point P in the world coordinate system is represented as [X, Y, Z]T. The pixel coordinate system corresponding to the camera imaging process is denoted as ouv, with the origin o located at the upper-left corner of the image. The u-axis is parallel to the X-axis, and the v-axis is parallel to the Y-axis. Accordingly, the pixel coordinates of point P on the image plane can be expressed as p = [u, v]T.
According to the pinhole camera model, the relationship between the world coordinate system and the pixel coordinate system can be expressed as follows:

Z·[u, v, 1]T = [fx 0 u0; 0 fy v0; 0 0 1]·[X, Y, Z]T

Here, the matrix composed of the intermediate parameters (fx, fy, u0, v0) represents the intrinsic parameter matrix
K of the camera. In this study, the camera was calibrated using Zhang’s calibration method. The calibration results indicate that the focal lengths in the horizontal and vertical directions are approximately 1205.5 pixels, and the principal point is located near (759.8, 558.7) in the image plane. The camera is mounted above and slightly ahead of the spray nozzle. The width W of the field of view on the ground is related to the lens viewing angle θ and the installation height:

W = 2·(h1 + h2)·tan(θ/2)

where θ is the lens viewing angle, h1 is the installation height of the spray nozzle (m), and h2 is the installation height of the camera relative to the nozzle (m). In this study, the nozzle installation height was set to 0.5 m, and the camera was mounted 0.5 m above the nozzle. Consequently, the pixel coordinates of point p in the image plane can be obtained.
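Using the calibrated intrinsic parameters, the world-to-pixel mapping of the pinhole model can be sketched as follows (the projected point is a hypothetical example):

```python
# Calibrated intrinsic parameters (pixels), from Zhang's method above.
FX = FY = 1205.5
U0, V0 = 759.8, 558.7

def project(X: float, Y: float, Z: float) -> tuple:
    """Pinhole projection of a world point [X, Y, Z]^T to pixel coords [u, v]^T."""
    u = FX * X / Z + U0
    v = FY * Y / Z + V0
    return u, v

# A point on the camera's optical axis projects to the principal point.
u, v = project(0.0, 0.0, 1.0)   # -> (759.8, 558.7)
```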
The above analysis mainly aims to ensure that the onset of the actual spraying region coincides with that of the target spraying region. High-speed camera observations of the spraying process indicate that, after the stop signal is issued, spray droplets continue to be discharged for a short duration before completely ceasing, resulting in overspray at the end of the spraying operation, as illustrated in
Figure 13. Therefore, to ensure that the termination of spraying accurately coincides with the target area, the stop signal must be issued in advance during the fitting of the weed prediction bounding box, as shown in
Figure 7a. Accordingly, the length L of the actual target spraying region can be expressed as:

L = LROI − v·t0

where LROI is the length of the ROI fitted by the deep learning-based detection algorithm after target recognition (m), v is the forward speed of the sprayer (m·s−1), and t0 is the time interval from the issuance of the stop signal to the complete cessation of spraying, measured as 48.88 ms (Figure 10). With this advance stopping strategy, the termination position of the actual spraying precisely coincides with the target spraying region.
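Assuming the stop signal is advanced by the distance traveled during the valve closing delay t0 (i.e., the effective spray length is the ROI length minus v·t0), the shortened spray length can be sketched as follows (the ROI length and forward speed are illustrative values):

```python
T0 = 0.04888  # valve closing delay t0 (s), measured with the high-speed camera

def spray_length(l_roi: float, v: float) -> float:
    """Actual target spraying length L = L_ROI - v * t0, in metres."""
    return l_roi - v * T0

# At about 2 km/h (~0.556 m/s), a 0.30 m ROI is shortened by roughly 2.7 cm.
L = spray_length(0.30, 0.556)   # ~0.273 m
```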
The overall control flow chart of the target spraying system is illustrated in
Figure 14.
2.4. Evaluation Experiments
An experimental evaluation of the target spraying system was conducted using both laboratory and field experiments. The laboratory experiments primarily assessed model recognition performance, target spraying accuracy, pesticide reduction rate, and spray distribution uniformity. The laboratory test platform, shown in
Figure 15, consisted of a rail system, a power supply module, an electric rail vehicle, and the target spraying system. The platform had a maximum load capacity of 150 kg, a maximum forward speed of 8 km·h
−1, and an adjustable spray boom height ranging from 0 to 120 cm.
2.4.1. Model Detection Performance
An experimental evaluation was conducted to assess the weed recognition performance of the trained YOLOv5-MobileNetv3-SE model. The evaluation metrics included precision (P), recall (R), mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5), model size, and frames per second (FPS). Precision and recall are calculated as:

P = TP/(TP + FP)

R = TP/(TP + FN)

where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives.
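The precision and recall metrics can be computed directly from the confusion-matrix counts (the counts below are hypothetical):

```python
def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP): fraction of detections that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN): fraction of true weeds that are detected."""
    return tp / (tp + fn)

# Hypothetical counts for illustration.
P = precision(90, 10)   # 0.9
R = recall(90, 30)      # 0.75
```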
To further evaluate the performance of the proposed model, comparative experiments were conducted with several classical deep learning-based detection models, including Faster R-CNN, YOLOv3, YOLOv5s, and YOLOv5x. All models were trained using identical training strategies, data augmentation methods, and parameter settings, and were evaluated on the same experimental platform using the same dataset.
2.4.2. Target Spraying Accuracy
The laboratory test for target spraying accuracy is illustrated in
Figure 16. Considering the acceleration and deceleration phases of the electric rail vehicle, the first and last 2 m of the track were excluded, and only the central 8 m section with constant speed was used for testing. The width of the test area was 1.2 m. To facilitate the experiments, plastic weed models were used to replace real weeds. Each plastic weed model was approximated as a circular projection on the ground with a diameter of 0.12 m, corresponding to a single-weed coverage area of 0.0113 m
2. With a weed coverage rate of 10%, a total of 85 plastic weed models were randomly distributed within the test area. Additionally, three rows of plastic soybean models (60 plants in total) were uniformly arranged as interference objects, with both row spacing and plant spacing set to 0.5 m. To determine whether spray droplets reached the weeds, a 2 cm × 2 cm water-sensitive paper was attached to the leaf of each plastic weed model. During the experiment, the forward speed of the electric rail vehicle was set to 1, 2, 3, and 4 km·h
−1. The experimental metrics include the weed detection accuracy rate (WDAR) and the spraying accuracy rate (SAR):

WDAR = (Wt/Wa) × 100%    (11)

SAR = (Ws/Wt) × 100%    (12)

where Wa is the total number of weeds in the test area, Wt is the actual number of detected weeds, and Ws is the number of weeds effectively sprayed. Detection accuracy was determined by the appearance of a purple bounding box around the weed (Figure 16). For spraying accuracy, if the actual spray length covered at least 60% of the target spray length, the weed was considered successfully sprayed (Figure 13b).
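A minimal sketch of the two metrics, assuming WDAR relates detected weeds to the total and SAR relates effectively sprayed weeds to those detected (all counts are hypothetical):

```python
def wdar(w_detected: int, w_total: int) -> float:
    """Weed detection accuracy rate (%): detected / total weeds."""
    return w_detected / w_total * 100.0

def sar(w_sprayed: int, w_detected: int) -> float:
    """Spraying accuracy rate (%): effectively sprayed / detected weeds."""
    return w_sprayed / w_detected * 100.0

# Hypothetical run: 85 weeds in the area, 80 detected, 76 effectively sprayed.
WDAR = round(wdar(80, 85), 1)   # 94.1
SAR = round(sar(76, 80), 1)     # 95.0
```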
2.4.3. Pesticide Reduction Rate
Precision target spraying can effectively reduce pesticide consumption compared to conventional full-coverage spraying. In this study, the pesticide reduction rate of target spraying under different weed coverage levels was evaluated by comparing it with conventional continuous spraying. Weed coverage is an important factor influencing the pesticide reduction rate. To simulate different weed coverage scenarios, plastic weed models were used instead of real weeds. Each plastic weed model had an approximately circular ground projection with a diameter of 0.12 m, corresponding to an area of approximately 0.0113 m
2. Four simulated field environments with weed coverage rates of 5%, 10%, 15%, and 20% were established within a test area of 1.2 m × 5 m, corresponding to 13, 27, 53, and 106 weed models, respectively. The plastic weed models were randomly distributed within the experimental area. Water was used in the pesticide tank instead of actual pesticide, and target spraying experiments were conducted on the precision spraying test platform at a forward speed of 2 km·h
−1. For each weed coverage level, 30 experimental runs were performed. Additionally, a control group was tested using conventional continuous spraying, in which all solenoid valves remained fully open, also repeated 30 times. After each test, the remaining liquid volume in the pesticide tank was recorded. The pesticide reduction rate
S was calculated as the percentage reduction in liquid consumption achieved by precision target spraying compared with conventional continuous spraying:

S = ((Qc − Qt)/Qc) × 100%

where Qc and Qt are the pesticide consumption under conventional continuous spraying and precision target spraying, respectively.
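The reduction rate is a straightforward relative difference; a minimal sketch with hypothetical consumption values:

```python
def reduction_rate(q_conventional: float, q_target: float) -> float:
    """Pesticide reduction rate S = (Qc - Qt) / Qc * 100%."""
    return (q_conventional - q_target) / q_conventional * 100.0

# Hypothetical consumption per run (L): 2.5 L conventional vs 1.0 L targeted.
S = reduction_rate(2.5, 1.0)   # 60.0
```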
2.4.4. Spray Distribution Uniformity
The uniformity of pesticide droplet distribution directly affects the efficacy of pesticide application. The target spraying system is an upgraded version of traditional sprayers, retaining compatibility with conventional full-coverage spraying. In target spraying mode, ensuring even droplet distribution is crucial for optimal application effectiveness. Therefore, a droplet distribution scanner (SALVARANI, AAMS Co., Ltd., Maldegem, Belgium), as shown in
Figure 17, was used to measure spray distribution uniformity. Each collecting channel of the scanner had a width of 10 cm, and a standard graduated cylinder equipped with a liquid level sensor was installed beneath each channel. The filling time of each cylinder was automatically recorded, enabling determination of the spray flow rate for each channel. The scanner was mounted on a motor-driven rail, allowing lateral movement beneath the spray boom to measure the overall transverse distribution of the spray volume.
The experiment was conducted in accordance with the requirements of the national standard GB/T 24677.1–2009 [
53], China. Water was used as the test medium, and the measurements were performed at a pressure of 0.3 MPa, a nozzle spacing of 15 cm, and a nozzle height of 50 cm above the ground. The coefficient of variation (CV) was employed to evaluate the spray distribution uniformity; a lower CV value indicates a more uniform distribution of spray volume along the boom. The CV is calculated as:

CV = (S/q̄) × 100%

S = sqrt(Σ(qi − q̄)²/(n − 1))

q̄ = Q/n

where qi is the spray flow rate of the ith nozzle (L/min); n is the number of nozzles; q̄ is the average spray flow rate (L/min); Q is the total spray flow rate of the sprayer (L/min); S is the standard deviation (L/min). After three repeated tests, the CV of the nozzles was 5.9%. This value satisfies the requirement specified in GB/T 24677.1–2009 [53], which stipulates that the CV of spray volume distribution should be less than 20%.
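The CV computation with the sample standard deviation (n − 1 denominator) can be sketched as follows (the per-nozzle flow rates are hypothetical):

```python
import math

def spray_cv(flow_rates):
    """Coefficient of variation (%) of per-nozzle flow rates, sample std (n-1)."""
    n = len(flow_rates)
    mean = sum(flow_rates) / n
    s = math.sqrt(sum((q - mean) ** 2 for q in flow_rates) / (n - 1))
    return s / mean * 100.0

# Hypothetical per-nozzle flow rates (L/min).
cv = spray_cv([0.9, 1.0, 1.1])   # ~10.0
```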
2.4.5. Field Performance of Precision Target Spraying
To verify the field performance of the precision target spraying system, field experiments were conducted in Suixian City, Henan Province, China (34.136° N, 115.343° E). The experiment employed commonly used selective herbicides appropriate for the soybean seedling stage, with application rates strictly following pesticide registration safety regulations to ensure that the herbicides would not harm the soybean crops while evaluating system performance. Non-genetically modified soybeans at the seedling stage were used, planted using a conventional row spacing of 40 cm. All field tests were conducted under clear weather conditions with wind speeds below 2 m/s to minimize environmental interference and ensure accurate assessment of the system’s performance. Within the experimental field, three rectangular plots, each measuring 20 m × 3 m, were delineated as sampling areas. The number of weeds and the dimensions of their bounding rectangles within each plot were recorded. Precision target spraying trials were conducted at forward speeds of 2, 3, and 4 km·h
−1. The field experimental setup and spraying process are illustrated in
Figure 18.
A 2 cm × 2 cm water-sensitive paper was affixed to the leaves of each weed. The paper changed color to red upon contact with spray droplets, allowing verification of whether the pesticide effectively reached the target, as shown in
Figure 19. The weed detection accuracy rate and spraying accuracy rate were then calculated using Equations (11) and (12), respectively.