2.2.2. Software
YOLO11-WeedNet
To achieve robust and efficient weed detection in complex field environments, the YOLO series (nano variants) was evaluated as the baseline framework. The target classes were defined as ‘weeds’ and ‘cabbage’, enabling precise differentiation to support subsequent weeding operations. A comparative performance analysis of commonly used YOLO versions was conducted, and the results are summarized in
Table 2 (epochs = 300, batch size = 16, optimizer = SGD, seed = 42, lr0 = 0.01) [15,16,17,18,19]. Precision mechanical weeding requires not only high detection accuracy but also reliable real-time performance. Based on the comparative results, YOLO11 was selected as the base model because it provides a favorable balance among detection accuracy, inference speed, and parameter count, making it suitable for real-time deployment on agricultural platforms.
To improve feature representation and detection robustness in complex field environments, targeted optimizations were applied to both the backbone and detection head of the YOLO11 network. In the backbone, an Efficient Channel Attention (ECA) module was introduced to enhance channel-wise feature interactions in shallow layers. This design allowed the model to better capture weed targets with soil-like colors and weak texture characteristics [20]. In the detection head, a Convolutional Block Attention Module (CBAM) was embedded after each feature fusion layer. The CBAM module adaptively refined spatial and channel feature distributions, suppressed background interference, and emphasized discriminative weed features [21].
In this study, an Efficient Channel Attention (ECA) module was embedded into the initial layer of the Backbone to enhance adaptive channel weighting. By facilitating local cross-channel interactions, the ECA module optimizes feature perception, effectively highlighting target characteristics in low-contrast areas and complex backgrounds. Furthermore, in the Neck architecture, a Convolutional Block Attention Module (CBAM) was appended to each C3k2 block. By unifying spatial and channel attention mechanisms, the CBAM adaptively prioritizes target regions while effectively attenuating background noise. The synergistic integration of these modules significantly bolsters the model’s ability to discern fine-grained weed features, thereby improving detection robustness and stability in unstructured field environments. The enhanced architecture, designated as YOLO11-WeedNet, is illustrated in
Figure 5.
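As a rough illustration of the local cross-channel interaction that the ECA module performs, the following Python sketch computes ECA-style channel weights from per-channel global averages. The kernel-size rule is the one given in the ECA-Net paper with its default constants (gamma = 2, b = 1); the function names, the plain-list representation, and the example kernel are illustrative assumptions and do not reproduce the trained module used in this study.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from the ECA-Net paper: |log2(C)/gamma + b/gamma|_odd."""
    k = int(abs(math.log2(channels) / gamma + b / gamma))
    return k if k % 2 == 1 else k + 1


def eca_channel_weights(channel_means, kernel):
    """Weight each channel using only its neighbouring channels.

    channel_means: one global-average-pooled value per channel.
    kernel: 1D convolution weights shared by all channels (learned in a
            real network; passed in explicitly here as an assumption).
    """
    k = len(kernel)
    pad = k // 2
    # Zero-pad so that the first and last channels also get k inputs.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate
    return weights
```

Each feature map would then be multiplied by its weight, so channels that respond to low-contrast weed texture can be amplified without a fully connected layer over all channels.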
To further evaluate the effect of structural improvements on feature representation, a heatmap visualization method was used to compare attention regions during weed detection between YOLO11-WeedNet and the original YOLO11. This method visualizes intermediate feature responses of the network and highlights the spatial regions emphasized during inference. It provides an intuitive basis for interpreting model behavior in subsequent result analyses.
In the generated heatmaps, different colors represent the response intensity of the model, with transitions from cool to warm colors indicating increasing confidence levels. Target centers and primary coverage areas are typically highlighted by warm colors, whereas non-target regions are mainly represented by cool colors. This visualization enables a direct comparison of model attention in terms of target location, spatial distribution, and scale, thereby revealing differences in feature extraction and attention allocation between the two models.
Coordinate Mapping
During selective weeding, accurate conversion of weed image coordinates obtained by visual detection into operational coordinates is required to ensure precise positioning of the end-effector. The intrinsic parameters of the RGB camera were calibrated using the OpenCV checkerboard-based method. The resulting camera intrinsic matrix is expressed as:
$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
where $f_x$ and $f_y$ are the focal lengths in pixels, $(c_x, c_y)$ is the principal point, and the radial and tangential distortion coefficients are given by $D = (k_1, k_2, p_1, p_2, k_3)$.
In this study, a depth camera was mounted above the weeding mechanism, and its positional layout is shown in
Figure 1. For extrinsic parameter calibration, we performed a camera-to-world coordinate transformation to align the camera coordinate system with the working coordinate system. Additionally, RGB-depth alignment error analysis was conducted by calculating the reprojection error between the RGB and depth images. The center coordinates $(u, v)$ of the detection box output by YOLO11-WeedNet were used to index the corresponding pixel in the depth map and extract the depth value $Z$. As the camera captured RGB images and depth information simultaneously, the image coordinates $(u, v)$ were directly mapped to spatial coordinates $(X_c, Y_c, Z_c)$, enabling three-dimensional weed localization.
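The spatial coordinate-depth projection model was expressed as:
$$X_c = \frac{(u - c_x)\,Z}{f_x}, \qquad Y_c = \frac{(v - c_y)\,Z}{f_y}, \qquad Z_c = Z$$
where $f_x$, $f_y$ and $c_x$, $c_y$ are the camera intrinsic parameters, $Z$ is the depth value, and $(u, v)$ are the center coordinates of the detection box.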
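The back-projection step can be sketched in a few lines of Python. This is a minimal illustration of the standard pinhole model, assuming an undistorted image and a depth map already aligned to the RGB frame; the function name, argument names, and the rejection of non-positive depth readings are assumptions made for the example.

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth Z into camera coordinates.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    depth:  depth value Z at (u, v), in the unit wanted for the output.
    """
    if depth <= 0:
        # Assumption: a non-positive reading marks an invalid depth pixel.
        raise ValueError("invalid depth reading")
    x_c = (u - cx) * depth / fx
    y_c = (v - cy) * depth / fy
    return x_c, y_c, depth
```

A detection centered on the principal point maps to the optical axis, so `pixel_to_camera(320, 240, 500.0, 600.0, 600.0, 320.0, 240.0)` gives a point directly below the camera at the measured depth.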
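With the camera oriented vertically downward and its optical axis essentially aligned with the Z-axis, the operational coordinates can be expressed as:
$$X_O = X_c + \Delta x, \qquad Y_O = Y_c + \Delta y$$
where $(X_c, Y_c)$ are the camera-frame coordinates of the weed and $(\Delta x, \Delta y)$ is the offset between the camera’s optical center and the origin O of the operational coordinate system.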
The depth image was aligned with the RGB image using both intrinsic and extrinsic parameters, and the alignment accuracy was evaluated through the reprojection error. The results show that the alignment error had minimal impact on system performance, with an average error of 0.2154 mm.
In this study, an operational coordinate system (O-XY) was established for the end-effector. The origin O was located at the lower-left corner of the mechanism. The X-axis pointed left along the transverse synchronous belt, and the Y-axis pointed forward (downward) along the longitudinal synchronous belt. To prevent damage to crop roots during weeding operations, a protected area was defined within the working region. The radius of the protected area was carefully selected based on the average size of cabbage seedlings and the safe working distance to avoid contact with the root system. The radius was set to 10 cm and calibrated to ensure it did not overlap with the cabbage root system during weeding. As shown in
Figure 6, the red circular area represents the protected zone. When a weed detection point falls within this area, its coordinates are excluded from subsequent path planning and weeding operations. This radius was verified through multiple trials to ensure it effectively protects the seedlings while controlling weeds.
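The protected-area rule amounts to a distance test against each crop position. The sketch below is one possible form, assuming that each protected circle is centered on a detected cabbage position expressed in operational coordinates (mm) and that points lying exactly on the boundary are also excluded; the function and parameter names are hypothetical.

```python
import math


def filter_protected(weeds, crop_centers, radius_mm=100.0):
    """Drop weed points that fall inside any protected circle.

    weeds:        list of (x, y) weed coordinates in mm.
    crop_centers: list of (x, y) cabbage positions in mm (assumed centers).
    radius_mm:    protected radius; 10 cm = 100 mm as stated in the text.
    """
    kept = []
    for wx, wy in weeds:
        inside = any(
            math.hypot(wx - cx, wy - cy) <= radius_mm
            for cx, cy in crop_centers
        )
        if not inside:
            kept.append((wx, wy))
    return kept
```

Only the points returned by `filter_protected` would be passed on to path planning, which keeps the end-effector from being sent toward the seedling root zone.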
The acquired depth values did not directly represent three-dimensional spatial coordinates. Therefore, the pinhole camera model was applied to project pixel coordinates and depth information into the camera coordinate system, generating a three-dimensional point representation in camera space. This intermediate 3D representation provided the geometric basis for mapping visual coordinates onto the planar operational coordinate system of the execution platform. After applying the crop protection constraint, the retained 3D weed coordinates were used as inputs for the path planning module. Through this process, a consistent geometric correspondence was established between visual detection results and the actuator motion space, enabling reliable weeding path planning. The entire coordinate conversion process is illustrated in Figure 7.
AHA+
In each frame, the YOLO11-WeedNet model identified $n$ weed targets and output the corresponding physical coordinates:
$$P = \{\, p_i = (x_i, y_i) \mid i = 1, 2, \ldots, n \,\}$$
The objective of path planning was to minimize the total travel distance of the end-effector while covering all weed points and avoiding repetitive motion.
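The total path length was expressed as:
$$L = \sum_{k=1}^{n-1} \left\lVert p_{\pi(k+1)} - p_{\pi(k)} \right\rVert$$
where $\pi$ denotes the visiting order of the $n$ weed points and $\lVert \cdot \rVert$ is the Euclidean distance.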
As the number of weeds in a single frame was limited, the problem was simplified to a Traveling Salesman Problem (TSP).
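For concreteness, the path-length objective can be evaluated as in the following sketch. It assumes planar Euclidean distances and, following the benchmark settings reported in this section (Start/End Point = (0, 0)), optionally closes the route through a common start and end point; the function name and the `close_loop` flag are illustrative.

```python
import math


def path_length(points, order, start=(0.0, 0.0), close_loop=True):
    """Total travel distance for visiting `points` in the given `order`.

    points: list of (x, y) weed coordinates.
    order:  list of indices into `points` defining the visiting sequence.
    start:  position the end-effector departs from (assumed to be (0, 0)).
    close_loop: if True, the route returns to `start` after the last point.
    """
    route = [start] + [points[i] for i in order]
    if close_loop:
        route.append(start)
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))
```

Every planner compared in this section can be scored with the same function, since each one only has to produce a permutation of the point indices.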
To balance computational efficiency and real-time performance and to evaluate the applicability of different path planning algorithms in field scenarios, benchmark tests were conducted on representative methods. The tested algorithms included Nearest Neighbor (NN) [22], Artificial Hummingbird Algorithm (AHA) [23], Ant Colony Optimization (ACO) [24], Genetic Algorithm (GA) [25], and Particle Swarm Optimization (PSO) [26].
The evaluation metrics included total path length $L$, reflecting actuator motion efficiency; runtime, representing computational cost in a single-threaded Python 3.9 environment; and the mean and maximum segment-to-segment jumps, used to measure path smoothness (Target Points = 15, Iterations = 350, Start/End Point = (0, 0), Seed = 42). The test data consisted of the weed physical coordinate set $P$ obtained in Section 2.3. The comparative results of each algorithm are presented in Table 3.
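To make the baseline and the smoothness metrics concrete, the sketch below implements a Nearest Neighbor ordering and reports total, mean, and maximum segment length. It assumes that a "segment-to-segment jump" is the Euclidean length of one move between consecutive points and that the route starts and ends at (0, 0); both function names are hypothetical.

```python
import math


def nearest_neighbor_order(points, start=(0.0, 0.0)):
    """Greedy ordering: always move to the closest unvisited point."""
    remaining = list(range(len(points)))
    order, current = [], start
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(current, points[i]))
        remaining.remove(nxt)
        order.append(nxt)
        current = points[nxt]
    return order


def segment_stats(points, order, start=(0.0, 0.0)):
    """Return (total length, mean segment, max segment) for a closed route."""
    route = [start] + [points[i] for i in order] + [start]
    segs = [math.dist(a, b) for a, b in zip(route, route[1:])]
    return sum(segs), sum(segs) / len(segs), max(segs)
```

For the three collinear points `[(10, 0), (1, 0), (5, 0)]`, the greedy order is `[1, 2, 0]`, and the long return move from (10, 0) to the origin dominates the maximum segment, which illustrates why NN paths tend to score poorly on smoothness.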
The comparative results reveal clear differences among the algorithms in terms of total path length, computation time, and path smoothness. As shown in Table 3, all algorithms generated feasible paths within the 500 × 600 mm working area; however, their performance differences have a substantial impact on practical mechanical execution. In a three-axis synchronous belt mechanism, insufficient path smoothness can cause frequent sharp turns, increased vibration, and repetitive end-effector motions, which reduce operational efficiency. Therefore, both path smoothness and real-time performance are critical criteria for selecting a path planning algorithm.
The Nearest Neighbor (NN) algorithm provides the highest computational efficiency but produces longer paths with limited smoothness. Ant Colony Optimization (ACO) and the Artificial Hummingbird Algorithm (AHA) generate shorter and smoother paths, although ACO requires longer computation time. Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) exhibit intermediate performance in both planning efficiency and path smoothness. Overall, AHA achieves a favorable balance between computational efficiency and path optimization, producing shorter paths with smoother transitions. Considering both real-time requirements and planning accuracy, AHA was selected as the core algorithm for subsequent actuator path control, supporting efficient and precise selective weeding during the cabbage seedling stage.
Although the original Artificial Hummingbird Algorithm (AHA) provides a reasonable balance between exploration and exploitation, it exhibits slow convergence when handling dense weed clusters or closely distributed target points. In addition, fixed control parameters and limited neighborhood perturbation restrict its adaptability to complex and dynamic field environments. To address these limitations, an enhanced AHA algorithm was developed. The proposed method introduces adaptive parameter adjustment and a multi-phase local search strategy, improving convergence stability and enhancing path smoothness.
(1) Adaptive Parameter Adjustment Strategy
To balance global exploration and local exploitation, an adaptive parameter adjustment mechanism was introduced. This mechanism adjusts the search coefficients dynamically during the optimization process. Specifically, the step size coefficient $\alpha$ and the perturbation coefficient $\beta$ are updated according to the iteration index $t$ using cosine annealing and exponential decay functions, respectively:
$$\alpha(t) = \frac{\alpha_0}{2}\left(1 + \cos\frac{\pi t}{T_{\max}}\right), \qquad \beta(t) = \beta_0 \exp\!\left(-\frac{t}{T_{\max}}\right)$$
where $\alpha_0$ and $\beta_0$ are the initial coefficients, and $T_{\max}$ is the maximum number of iterations. This dynamic control mechanism enables the algorithm to conduct extensive global exploration in the early stages and shift towards refined local search in later phases, effectively enhancing convergence accuracy while preventing premature stagnation.
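The two schedules can be sketched as follows. The exact functional forms are assumptions: a standard half-cosine annealing curve for the step size coefficient and a plain exponential decay for the perturbation coefficient, with a hypothetical `decay` rate parameter that the text does not specify.

```python
import math


def alpha_schedule(t, t_max, alpha0):
    """Cosine annealing: alpha0 at t = 0, falling smoothly to 0 at t = t_max."""
    return 0.5 * alpha0 * (1.0 + math.cos(math.pi * t / t_max))


def beta_schedule(t, t_max, beta0, decay=1.0):
    """Exponential decay: beta0 at t = 0, beta0 * exp(-decay) at t = t_max."""
    return beta0 * math.exp(-decay * t / t_max)
```

With 350 iterations, as used in the benchmark, `alpha_schedule(175, 350, 1.0)` is about half of the initial value, so large exploratory steps dominate the first half of the run and fine adjustments the second.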
(2) Neighborhood Perturbation and Periodic Local Optimization
A neighborhood perturbation operator was introduced during the foraging phase. This operator executes a random segment reversal operation (2-opt) every 10 iterations to locally optimize the best path sequence and eliminate redundant intersections.
The acceptance probability for inferior solutions follows the Metropolis criterion:
$$P_{\text{accept}} = \begin{cases} 1, & \Delta L \le 0 \\ \exp\!\left(-\dfrac{\Delta L}{T}\right), & \Delta L > 0 \end{cases}$$
where $\Delta L$ represents the change in path length, and $T$ is the annealing temperature that decays exponentially with the number of iterations. This probabilistic acceptance mechanism ensures the algorithm can escape local optima and maintain population diversity during the optimization process.
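One way to combine the segment reversal with Metropolis acceptance is sketched below. How the two are coupled inside AHA+ is not specified in detail, so the `perturb` step, the helper names, and the closed route through (0, 0) are assumptions made for illustration only.

```python
import math
import random


def tour_length(points, order, start=(0.0, 0.0)):
    """Length of the closed route start -> points[order...] -> start."""
    route = [start] + [points[i] for i in order] + [start]
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def two_opt_reverse(order, i, j):
    """Copy of `order` with the segment order[i..j] reversed (one 2-opt move)."""
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def metropolis_accept(delta, temperature, rng=random):
    """Accept improvements always; accept worse moves with prob. exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def perturb(points, order, temperature, rng):
    """Try one random segment reversal and keep it if Metropolis accepts."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    candidate = two_opt_reverse(order, i, j)
    delta = tour_length(points, candidate) - tour_length(points, order)
    return candidate if metropolis_accept(delta, temperature, rng) else order
```

Calling `perturb` once every 10 iterations on the current best order, with a temperature that shrinks over time, would match the schedule described above: early on, slightly longer paths are sometimes kept, while late in the run only shortening reversals survive.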
The enhanced Artificial Hummingbird Algorithm (AHA+) incorporates three key improvements over the standard AHA. First, a greedy strategy is employed to generate high-quality initial paths, reducing ineffective traversal during the early search stages. Second, adaptive parameter tuning is introduced to dynamically balance guided, territorial, and migratory foraging behaviors across iterations, thereby improving global exploration capability. Third, a periodic lightweight 2-opt local search is embedded to enhance path refinement and convergence stability in the later stages of optimization.
2.2.3. Hardware
To ensure that the three-axis synchronous belt mechanism met the speed, precision, and load capacity requirements of selective weeding operations, appropriate motor selection and parameter matching were conducted for the drive system. To prevent fatigue or damage to the synchronous belts caused by repeated acceleration and deceleration, the system was configured with a maximum operating speed of and a maximum acceleration of .
During operation, each axis was required to bear the combined loads from the slider assembly, the end-effector, and soil contact forces. The axial equivalent force can be expressed as:
where
and
are the masses of the X, Y, and Z-axis sliders and their carriages, respectively;
is the mass of the end-effector weeding mechanism;
is the gravitational acceleration;
is the initial frictional force;
is the soil resistance encountered when the auger enters the soil (Here, the theoretical maximum soil resistance is referenced based on compacted clay conditions, where
[27]. However, the actual operating environment for cabbage seedlings consists of loose topsoil, and the rotary cutting action of the auger significantly reduces axial resistance compared to static penetration. Therefore, based on the specific soil texture and the rotary operational mode, the effective axial resistance
) and
is the pitch diameter of the synchronous pulley.
Motor selection was required to simultaneously satisfy the torque, speed, and positioning accuracy demands of the system. In summary, the selected motor was required to meet the following comprehensive criteria:
where
is the safety factor (set to 2 in this study),
is the belt tooth pitch, and
is the number of belt teeth.
The end-effector was subjected to direct operational loads during weeding, including cutting, soil breaking, and impact forces, and its performance directly influenced weeding depth, removal efficiency, and overall operational stability. In this study, a single-auger end-effector was adopted. It was driven by a Z-axis linear module to perform a cyclic sequence of downward pressing, rotation, and lifting. To satisfy the power requirements and ensure stable soil-breaking and weed-removal performance under varying soil conditions, the load characteristics of the end-effector were analyzed, and an appropriate drive motor was selected accordingly.
The total load and total torque on the end-effector in the Z-axis direction are as follows:
where
is the acceleration in the Z-axis direction,
is the soil-cutting resistance torque,
is the frictional resistance torque of the auger and drive shaft, and
is the inertial torque required to accelerate the auger’s own inertia.
In summary, based on the load and acceleration requirements of the three-axis synchronous belt mechanism and the transmission parameters of the synchronous pulleys, the driving force and torque demands calculated using Equations (11)–(13) were all below 1 N·m. Considering uncertainties such as dust accumulation, belt tension variations, and impact loads during operation, a safety factor of 2 was applied. For the X- and Y-axes, closed-loop stepper motors (model 57-112) with a rated torque of 3.2 N·m were selected to ensure stable high-speed reciprocating positioning.
The Z-axis was responsible for lifting, pressing, and vertical driving of the end-effector, resulting in more complex loading conditions. Therefore, a stepper motor (model 86BYG250B) with a rated torque of 4.5 N·m was selected to ensure sufficient soil-breaking and weed-removal capability under varying soil conditions. The force and torque requirements of the end-effector were determined using Equations (14) and (15). Considering the measured range of soil resistance and the applied safety factor, a stepper motor with a rated torque of no less than 2 N·m was ultimately selected for the end-effector to ensure reliable performance during soil-breaking, cutting, and weed-removal operations.
The onboard controller used a laptop as the computational core of the system. An Acer Predator Helios Neo equipped with an Intel i9-14900HX processor, 16 GB RAM, a 1 TB solid-state drive, and an RTX 4060 GPU was employed to provide sufficient computational capacity for real-time data processing. The host computer communicated with the lower-level controller through a serial interface, forming a master–slave control architecture. In this framework, coordinate points, control commands, and execution parameters were transmitted in packet format to the lower-level controller, enabling real-time interaction between path planning outputs and actuator motion control.
The lower-level controller adopted an Arduino Mega 2560 as the core control unit for the three-axis synchronous belt system. It received real-time motion commands from the upper-level computer and converted them into pulse and direction signals to drive the coordinated motion of the X-, Y-, and Z-axis belt-driven mechanisms. Owing to its abundant I/O resources and stable timing performance, the Arduino Mega 2560 was well-suited for synchronized multi-axis motion control tasks requiring deterministic execution.
Following the hardware configuration and motor selection of the three-axis mechanism, a unified three-axis motion control strategy was designed to ensure accurate execution of the path planning results by the synchronous belt system. After weed coordinate extraction and path planning were completed by the host computer, the target points were sequentially transmitted to the lower-level controller in packet format. The lower-level controller converted the commanded displacements into pulse counts based on the synchronous belt transmission parameters and applied synchronous linear interpolation for the X- and Y-axes to achieve planar motion. During interpolation, a trapezoidal velocity profile (acceleration–constant speed–deceleration) was adopted to reduce belt shock caused by frequent high-speed starts and stops.
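The displacement-to-pulse conversion and the trapezoidal profile can be sketched as follows. The sketch is written in Python for consistency with the host-side code rather than as controller firmware, and the drive parameters (200 full steps per revolution, 2 mm tooth pitch, 20 teeth) are hypothetical placeholders; only the 16-microstep setting is taken from this section.

```python
import math


def mm_to_pulses(distance_mm, steps_per_rev=200, microsteps=16,
                 pitch_mm=2.0, teeth=20):
    """Convert belt travel in mm to a pulse count.

    One pulley revolution moves the belt pitch_mm * teeth millimetres.
    All defaults except microsteps are assumed values for illustration.
    """
    mm_per_rev = pitch_mm * teeth
    return round(distance_mm * steps_per_rev * microsteps / mm_per_rev)


def trapezoid_times(distance, v_max, accel):
    """Return (t_accel, t_cruise, t_decel, v_peak) for a symmetric profile.

    If the move is too short to reach v_max, the profile degenerates to a
    triangle with no constant-speed phase.
    """
    d_ramp = v_max ** 2 / accel  # distance used by accel plus decel
    if distance >= d_ramp:
        t_a = v_max / accel
        t_c = (distance - d_ramp) / v_max
        return t_a, t_c, t_a, v_max
    v_peak = math.sqrt(distance * accel)
    t_a = v_peak / accel
    return t_a, 0.0, t_a, v_peak
```

With these placeholder values, one pulley revolution corresponds to 40 mm of travel and 3200 pulses, and a short 8 mm move at 200 mm/s² never reaches a 100 mm/s speed limit, which is the case where the triangular branch avoids commanding an unreachable cruise speed.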
The Z-axis actuator employed a dual-stage descent strategy: a rapid approach to a pre-defined safe altitude, followed by a low-speed penetration to minimize impact on crop roots. Upon reaching the target depth, constant rotation was maintained for soil breaking and weeding. The cycle concluded with a vertical retraction to the safe height for the subsequent operation.
The actuation system utilizes a three-axis synchronous belt drive to ensure precise positioning and stable movement of the end-effector. To safeguard the crop, the control system dynamically adjusts the trajectory based on identified weed locations, incorporating a protected-area constraint to prevent accidental contact with cabbage seedlings. To enhance reliability, the lower-level controller continuously monitors limit switches and motor status, immediately halting pulse output and initiating a safe shutdown upon detecting over-travel, overload, or communication anomalies.
A control loop rate of 500 Hz is implemented for rapid response, complemented by 16-microstep control to ensure smooth motion and minimize mechanical jumps. Real-time feedback integrated with PID control further refines the target position, maintaining a system response time within 100 ms to ensure immediate and stable operation.
During the weeding process, the walking mechanism remains stationary until all targets in the current area are processed. As weed positions are static during operation, path planning is treated as a Traveling Salesman Problem (TSP) to minimize travel distance and maximize efficiency. Path recalculation is triggered only upon detecting new targets or completing the current task, ensuring efficient weeding under stable conditions.