1. Introduction
United Nations projections indicate that the global population will reach 9.2–10.1 billion by 2050, requiring a 70% increase in food production (vs. 2009 levels) to address undernourishment and growing demand [1,2]. This imperative drives the development of precision agriculture technologies, among which field scouting is critical for monitoring crop status and enabling targeted management [3,4]. As a staple food crop, corn’s yield stability directly impacts global food security, but its narrow row spacing and closed mid-to-late growth stage canopy create significant challenges for traditional manual scouting—limiting efficiency, coverage, and observational detail [5]. Smart agricultural robots have emerged as promising solutions for automated cornfield scouting.
Quadruped robots offer unique advantages for cornfield operations: their multi-legged mobility adapts to sloped and furrowed terrain, minimizes soil compaction, and allows row-crossing movement [6,7]. Their payload capacity supports integrated sensors for real-time data collection, establishing them as reliable scouting platforms [8]. Autonomous navigation is a prerequisite for their deployment. Traditional agricultural navigation largely relies on crop-row extraction. Several methods have been effective in structured environments such as orchards or vineyards, where row spacing is wider and canopy occlusion is less severe [9,10]. For narrow-row crops, Higuti et al. [11] used 2D LiDAR for histogram-based row segmentation; Gasparino et al. [12] enhanced stalk features to improve robustness under partial occlusion; and Affonso et al. [13] applied a self-supervised neural network to predict crop row parameters via LiDAR point cloud projection. However, crop row-based navigation has notable limitations in cornfields: it depends on continuous, intact rows (accuracy drops sharply under closed canopies, weeds, or missing plants) and only enables local “feature-following” without global positioning capability.
Simultaneous Localization and Mapping (SLAM) technology, which enables multi-feature fusion and the construction of a global environmental map, presents a viable solution for autonomous navigation within complex under-canopy settings. Nevertheless, prevailing LiDAR-SLAM methodologies encounter substantial limitations when deployed in densely planted cornfields. Conventional 2D SLAM algorithms, such as GMapping [14] and Hector SLAM [15], are constrained by their planar sensing capability and exhibit marked vulnerability to visual occlusion. Meanwhile, typical 3D SLAM methods face fundamental difficulties. Matching-based algorithms like ICP [16] suffer from performance degradation in environments dominated by repetitive plant structures, while even advanced frameworks that rely on feature extraction and matching, including LOAM [17], LeGO-LOAM [18], and their subsequent derivatives [19,20,21], are prone to pose estimation errors induced by dynamic foliar movement, uneven terrain, and a general scarcity of distinctive, stable geometric features.
To adapt SLAM to agricultural environments, researchers have developed field-specific improvement schemes, which form the core foundation for crop field navigation. For olive groves, Cheein et al. [22] proposed an extended information filter-based SLAM method, fusing visual HOG features with LiDAR data to enhance olive trunk detection robustness, achieving reliable state correction while maintaining estimation consistency. Shalal et al. [23] integrated camera and LiDAR data for orchard mapping, using tree/non-tree positions as priors and combining with an extended Kalman filter for localization—achieving a 0.103 m Euclidean distance RMSE in real orchard tests. Lepej et al. [24] used 2D LiDAR and phase correlation to calculate inter-frame poses, providing stable positioning for robots moving parallel to crop rows in apple and grape orchards. Pierzchala et al. [25] built GTSAM-based point cloud maps in forests, extracting trunk features to support forestry vehicle positioning. These studies confirm SLAM’s effectiveness in sparse, structurally stable agricultural scenes (e.g., orchards) by focusing on salient feature extraction (e.g., trunks). However, cornfields differ drastically: sub-meter narrow rows limit movement, mid-to-late growth closed canopies block features, and dynamic factors (leaf swaying, weeds) reduce feature stability—rendering orchard SLAM methods infeasible. Targeting cornfield specificity, recent progress includes: Li et al. [26] fused odometry to reduce 2D LiDAR motion distortion, cutting GMapping’s cornfield mapping error by 0.05 m; Dong et al. [27] proposed a stalk semantic feature-based LOAM improvement, segmenting corn stalks via region growing and parameterizing them as linear models, achieving a 0.88% relative trajectory error (outperforming traditional methods) with clearer plant edges. Nevertheless, facing extreme conditions such as high-density occlusion under corn canopies and dynamic plant deformation, further enhancement of feature stability and anti-interference capability is still required.
Notably, while quadruped robots show great potential in cornfield scouting, existing SLAM algorithms tailored for quadruped platforms further highlight the research gap. Most of these quadruped-specific SLAM methods are designed for relatively simple or structured terrains, lacking adaptation to cornfield complexity. For instance, Ou et al. [28] proposed Leg-KILO, a kinematic-inertial-lidar odometry framework that tightly couples leg odometry, lidar odometry, and loop closure via graph optimization to mitigate high-dynamic motion impacts. Yang et al. [29] developed Cerberus, a visual-inertial-leg odometry system achieving sub-1% drift through online kinematic calibration for agile locomotion in general indoor/outdoor environments. Zhou et al. [30] proposed a tightly coupled LiDAR-IMU SLAM method for quadruped robots, reducing positioning error by 65.08% compared to LOAM via NDT-sliding window registration. These algorithms leverage obvious environmental features or open spaces to ensure performance, but they cannot address the core challenges of cornfield navigation.
Although existing research has made progress in cornfield-oriented feature extraction and quadruped robot motion-induced anti-interference for SLAM systems, two critical limitations remain. First, conventional SLAM methods struggle to adapt to measurement noise caused by gait variations, as they inadequately account for sensor jitter and abrupt pose changes resulting from the high-dynamic motion of quadruped robots. Second, the inadequate integration of crop row features with unstructured terrain leads to feature matching failures in extreme scenarios such as straw obstruction and dense weed growth. To address these issues, this paper proposes the CK-SLAM algorithm, which integrates quadruped robots’ kinematic constraints with crop row information. Its key innovations are as follows:
A body perception fusion strategy for legged robots based on the Invariant Extended Kalman Filter (InEKF) is proposed to establish a leg odometer, which adapts to the motion characteristics and terrain adaptability of legged robots.
An innovative crop plane feature extraction and matching method is designed to enhance adaptability to complex farmland scenarios.
A multi-feature hierarchical constraint and factor graph global optimization framework is developed to achieve precise robot pose estimation in farmland environments.
2. Materials and Methods
2.1. Experimental Platform
In this study, the JueYing Mini quadruped robot developed by Hangzhou CloudMinds Technology was adopted as the experimental platform. It is equipped with a core computing unit consisting of an Intel Core i7 processor (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA Jetson Xavier NX edge computing module (NVIDIA Corporation, Santa Clara, CA, USA), which can provide powerful parallel computing capabilities and real-time data processing performance. This configuration meets the computational requirements of feature extraction, point cloud registration, and multi-sensor fusion in SLAM algorithms. The JueYing Mini quadruped robot has a total weight of 20 kg, with a ground contact pressure of 0.06 MPa (calculated based on the contact area of the four feet, each with a contact area of 80 cm²). This parameter is critical for agricultural scenarios, as excessive soil compaction may affect crop root growth. This contact pressure is far lower than the critical pressure for soil compaction (typically 0.2–0.3 MPa for agricultural soil [31]). The robot has a standing dimension of approximately 700 mm × 400 mm × 500 mm, featuring a compact mechanical structure. It supports various gaits such as diagonal walking, trotting, and jumping, and possesses excellent dynamic balance performance and all-terrain adaptability. These characteristics enable the robot to navigate flexibly through corn rows with a spacing of 0.8–1.2 m, as well as handle common terrain undulations and small obstacles in the field.
In terms of sensor configuration, the robot is standardly equipped with an RS-LiDAR-16 (with a horizontal field of view of 360°, vertical field of view of −15°~+15°, ranging distance of 0.1~100 m, and point cloud frequency of 10 Hz) and absolute encoders for 12 driving joints, which can collect in real time 3D environmental point clouds, joint angles, and output torque data. To improve the accuracy of motion state perception and the capability of positioning ground truth verification, an MTi-710 high-precision IMU (Xsens Technologies B.V., Enschede, Overijssel, The Netherlands) and a Lite RTK module (Qianxun Spatial Intelligence Inc., Shanghai, China) are additionally configured. Specifically, the MTi-710 IMU achieves a pitch and roll accuracy of 0.2°, a heading accuracy of 1°, a gyroscope bias stability of 10°/h, and an accelerometer bias stability of 15 μg, with a maximum output frequency of 2 kHz, enabling precise capture of the robot’s dynamic motion states. The Lite RTK module supports signals from five major satellite systems (Beidou, GPS, GLONASS, GALILEO, and QZSS) and incorporates a u-blox ZED-F9P centimeter-level positioning module. It delivers an RTK positioning accuracy of ±(1 cm + 1 ppm) in the horizontal direction and ±(2 cm + 1 ppm) in the vertical direction, with an update rate of 10 Hz, capable of providing centimeter-level positioning ground truth. As shown in Figure 1, the experimental platform integrates the above-mentioned core components.
All sensor data are synchronously transmitted via the EtherCAT high-speed bus (Beckhoff Automation GmbH & Co. KG, Verl, North Rhine-Westphalia, Germany) with a synchronization accuracy of ≤1 ms, and data routing, management, and fusion processing are completed based on the Robot Operating System (ROS Noetic). This provides stable and reliable hardware support for testing and verifying the field SLAM algorithms.
2.2. System Overview
The overall system framework is shown in Figure 2. The system fuses data from 3D LiDAR, IMU, joint encoders, and joint torque sensors. The InEKF module first fuses motion information from the IMU, joint encoders, and contact detection results based on joint torque, completing the prediction and iterative update of the robot’s motion state, outputting a high-frequency motion estimate with controlled drift. Subsequently, the 3D LiDAR data undergoes preprocessing based on this motion estimate for motion distortion compensation, eliminating point cloud distortion induced by robot motion. The distortion-compensated point cloud data further extracts three types of features: line features, plane features, and crop plane features. The LiDAR odometry uses these features with a scan-to-map matching method, aligns current scan features with an incrementally built local submap, and fuses the motion state output from InEKF to calculate the LiDAR odometry. The system simultaneously maintains a robot-centric incremental map, which is stored using an incremental kd-tree (ikd-tree). This map supports dynamic addition and deletion of point clouds and manages keyframes by defining a robot-centric circular search region. Finally, the backend uses a factor graph optimization framework to fuse the leg odometry factor from the InEKF motion estimate and the LiDAR odometry factor from scan-to-map alignment. Through Maximum a Posteriori (MAP) estimation, it minimizes the residuals of each factor, achieving global optimization of the robot trajectory and map geometry.
2.3. Quadruped Robot State Estimation Based on InEKF
2.3.1. State Equation
During the motion of a quadruped robot, frequent foot-ground impacts make the accelerometer data acquired by the IMU unstable. This instability leads to significant drift when IMU measurements are directly used for state estimation, most noticeably in the z-axis (vertical) direction. In addition, because cornfield terrain is uneven, its soft soil structure and irregular surface can cause foot slippage or sudden contact state changes, further exacerbating IMU measurement errors. Therefore, this study adopts the InEKF method proposed in previous work, filtering IMU data by fusing motion information from the IMU, joint encoders, and contact detection results based on joint torque. This method models foot slippage as a bias term in the body velocity and incorporates it into the state model, while dynamically adjusting the observation noise covariance using a foot contact probability model, providing a relatively accurate initial pose and velocity reference for the system.
We define the state of the quadruped robot in the world coordinate system as an element $X_t$ of a matrix Lie group $\mathcal{G}$:

$X_t = \begin{bmatrix} R_t & v_t & p_t & d_t^{FL} & d_t^{FR} & d_t^{RL} & d_t^{RR} & v_t^{s} \\ 0_{7\times3} & I_{7} \end{bmatrix},$

where $R_t \in SO(3)$ represents the body orientation, $p_t \in \mathbb{R}^3$ denotes the fuselage position, and $v_t \in \mathbb{R}^3$ indicates the fuselage velocity, while $d_t^{FL}$, $d_t^{FR}$, $d_t^{RL}$, $d_t^{RR}$ correspond to the positions of the four foot ends (front-left, front-right, rear-left, and rear-right, respectively). $v_t^{s}$ signifies the slip velocity caused by unstable contact or sliding at the foot end, modeled as an exponentially decaying process with decay rate $\lambda$ [32]:

$\dot{v}_t^{s} = -\lambda\, v_t^{s} + w_t^{s}.$
Neglecting IMU biases (i.e., gyroscope and accelerometer biases), the dynamic model of the system is given by:

$\dot{R}_t = R_t\big(\tilde{\omega}_t - w_t^{g}\big)^{\wedge}, \quad \dot{v}_t = R_t\big(\tilde{a}_t - w_t^{a}\big) + g, \quad \dot{p}_t = v_t,$
$\dot{d}_t^{\,i} = R_t\, h_R\big(\tilde{\alpha}_t^{\,i}\big)\, w_t^{d,i}, \quad i \in \{FL, FR, RL, RR\}, \qquad \dot{v}_t^{s} = -\lambda\, v_t^{s} + w_t^{s}.$

Among them, $\tilde{\omega}_t$ and $\tilde{a}_t$ are the angular velocity and acceleration measured by the IMU, respectively. $\tilde{\alpha}_t^{\,i}$ denotes the measured joint angles of the $i$-th leg, acquired from the joint encoders. $g$ is the gravity vector, $w_t^{g}$, $w_t^{a}$, $w_t^{d,i}$ are Gaussian noise terms, $(\cdot)^{\wedge}$ denotes the skew-symmetric matrix of a vector, and $h_R(\tilde{\alpha}_t^{\,i})$ is derived from forward kinematics, representing the measured orientation of the contact frame relative to the IMU frame. Rewritten in matrix form:

$\frac{\mathrm{d}}{\mathrm{d}t} X_t = f_{u_t}(X_t) + X_t\, \hat{w}_t,$
where $(\cdot)^{\wedge}$ is an isomorphism that maps elements of the tangent space of the Lie group $\mathcal{G}$ at the identity element to their corresponding matrix representations. In the InEKF, the left-invariant error $\eta_t^{l} = \hat{X}_t^{-1} X_t$ and the right-invariant error $\eta_t^{r} = X_t \hat{X}_t^{-1}$ are used to define the error between the estimated and true values, where $\hat{X}_t$ denotes the estimated state of the system at time $t$. This study employs the right-invariant error derivation:

$\eta_t = X_t \hat{X}_t^{-1} = \exp(\xi_t),$

where $\xi_t$ is the error state vector. Let the Lie algebra corresponding to $\mathcal{G}$ be $\mathfrak{g}$, i.e., $\hat{\xi}_t = \xi_t^{\wedge} \in \mathfrak{g}$, where $\exp(\cdot)$ denotes the exponential map, which maps an element of the Lie algebra $\mathfrak{g}$ to the corresponding element of the Lie group $\mathcal{G}$, and $\log(\cdot)$ maps an element of $\mathcal{G}$ to the corresponding element of $\mathfrak{g}$. Using the first-order Taylor approximation of the exponential function, $\exp(\xi_t) \approx I + \xi_t^{\wedge}$, we obtain the linearized error dynamics. Subsequently, the time derivative of the error state vector is derived as:

$\dot{\xi}_t = A_t\, \xi_t + \mathrm{Ad}_{\hat{X}_t}\, w_t,$

where $A_t$ is the state transition matrix and $\mathrm{Ad}_{\hat{X}_t}$ denotes the group adjoint map [33]. The covariance matrix $P_t$ is solved via the Riccati equation:

$\dot{P}_t = A_t P_t + P_t A_t^{\top} + \mathrm{Ad}_{\hat{X}_t}\, Q_t\, \mathrm{Ad}_{\hat{X}_t}^{\top}.$
2.3.2. Observation Equation
Based on the forward kinematics of the quadruped robot, the position of each foot relative to the body frame can be obtained as:

$h_p\big(\tilde{\alpha}_t^{\,i}\big) = R_t^{\top}\big(d_t^{\,i} - p_t\big) + n_p^{\,i}.$

This equation describes the relationship between the foot-end position relative to the body frame (derived from the joint angles) and the global foot-end position, with $n_p^{\,i}$ representing the measurement noise of the foot-end position. The body velocity observation is given by

$\tilde{v}_t^{\,b} = \frac{\sum_{i \in C_t} P\big(c_t^{\,i} = 1 \mid f_t^{\,i}\big)\, v_t^{\,b,i}}{\sum_{i \in C_t} P\big(c_t^{\,i} = 1 \mid f_t^{\,i}\big)}, \qquad P\big(c_t^{\,i} = 1 \mid f_t^{\,i}\big) = \frac{1}{1 + \exp\!\big(-(\beta_1 f_{z,t}^{\,i} + \beta_0)\big)},$

where $P(c_t^{\,i} = 1 \mid f_t^{\,i})$ represents the probability of stable contact between foot $i$ and the ground at time $t$, $C_t$ denotes the set of feet in potential contact with the ground, $c_t^{\,i} = 1$ indicates that the $i$-th foot is in stable contact with the ground, $f_t^{\,i}$ is the ground reaction force acting on foot $i$ at time $t$, $f_{z,t}^{\,i}$ is the z-axis component of $f_t^{\,i}$, and $\beta_0$ and $\beta_1$ can be regarded as the weights of a logistic regression classifier. $\tilde{v}_t^{\,b}$ is the fuselage velocity in the body frame and $v_t^{\,b,i}$ is the velocity of foot $i$ in the body frame at time $t$:

$v_t^{\,b,i} = -J_i\big(\tilde{\alpha}_t^{\,i}\big)\, \dot{\tilde{\alpha}}_t^{\,i} - \tilde{\omega}_t^{\wedge}\, h_p\big(\tilde{\alpha}_t^{\,i}\big),$

where $J_i(\cdot)$ denotes the Jacobian matrix of the forward kinematics for the $i$-th leg and $\tilde{\omega}_t$ is the angular velocity of the robot in the body frame. To align the body velocity with the world coordinate system and account for slip effects, we transform the foot-end velocity from the body frame to the world frame:

$v_t = R_t\, \tilde{v}_t^{\,b} + v_t^{s} + n_v,$

where $n_v$ is the body velocity error, including the errors from each foot velocity observation. The final observation model is expressed in the right-invariant form

$Y_t = X_t^{-1}\, b + V_t,$

where $Y_t$ and $b$ are observation variables derived from the system’s right-invariant observation model, and their details are consistent with the framework proposed in [32]. All noise terms $n_p^{\,i}, \ldots, n_v$ are modeled as zero-mean Gaussian noise. Based on the definition of the right-invariant error, the observation equation can be written in the linearized form $z_t = H_t\, \xi_t + \text{noise}$, and the update step of the InEKF then follows the standard right-invariant Kalman gain computation and state correction described in [32].
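To make the contact-weighted velocity observation concrete, the following Python sketch (a minimal illustration, not the actual implementation) shows how a logistic contact probability computed from the vertical ground reaction force can weight per-leg velocity estimates into a single body-velocity measurement. The parameter values beta0 and beta1, the data layout, and the function names are assumptions for illustration only.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def contact_probability(f_z, beta0=-30.0, beta1=0.6):
    """Logistic model of stable contact from the vertical ground reaction force.
    beta0 and beta1 play the role of logistic-regression weights (placeholder
    values, not the tuned parameters of the paper)."""
    return 1.0 / (1.0 + np.exp(-(beta1 * f_z + beta0)))

def body_velocity_observation(legs, omega_body):
    """Fuse per-leg velocity estimates into one body-velocity measurement.

    legs: list of dicts with keys
        'J'      -- 3 x n_joints forward-kinematics Jacobian of the leg
        'qdot'   -- joint velocities (n_joints,)
        'p_foot' -- foot position in the body frame (3,)
        'f_z'    -- vertical ground reaction force (scalar)
    omega_body: body angular velocity from the gyroscope (3,)
    """
    weights, velocities = [], []
    for leg in legs:
        w = contact_probability(leg['f_z'])
        # Body velocity implied by a momentarily static foot:
        # v_body = -J(q) qdot - omega x p_foot
        v = -leg['J'] @ leg['qdot'] - skew(omega_body) @ leg['p_foot']
        weights.append(w)
        velocities.append(v)
    weights = np.asarray(weights)
    if weights.sum() < 1e-6:          # no foot in reliable contact
        return None
    return np.average(velocities, axis=0, weights=weights)

# Minimal usage example with two legs (random placeholder values)
rng = np.random.default_rng(0)
legs = [{'J': rng.normal(size=(3, 3)), 'qdot': rng.normal(size=3),
         'p_foot': np.array([0.2, 0.15, -0.35]), 'f_z': 60.0} for _ in range(2)]
print(body_velocity_observation(legs, omega_body=np.array([0.01, 0.02, 0.0])))
```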
2.4. Point Cloud Preprocessing
In the cornfield operation scenario, the quadruped robot’s body is prone to bumping, tilting, and other irregular movements while walking, and abrupt posture changes caused by foot slippage further increase the motion of the LiDAR within a single scanning cycle, producing geometric distortions such as stretching and offset in the point cloud. Therefore, deskewing of the LiDAR point cloud is necessary in the preprocessing stage. Because the IMU acceleration measurements on the quadruped robot are unstable, this study uses the leg odometry derived from the InEKF for distortion compensation.
During a single scanning cycle, the LiDAR sequentially collects multiple laser points and forms one frame of point cloud data. Let the start time of the scanning cycle be $t_s$ and the end time be $t_e$. Each $k$-th laser point carries a relative time offset $\Delta t_k \in [0,\, t_e - t_s]$ in the raw point cloud data, representing the time interval from the start of the scan to the completion of data acquisition for that point. Thus, the absolute acquisition time of the $k$-th laser point is:

$t_k = t_s + \Delta t_k.$

The core of point cloud deskewing is to match each laser point with the precise robot pose at its acquisition moment, thereby eliminating motion interference through coordinate transformation. The Invariant Extended Kalman Filter (InEKF) outputs the robot’s motion states at a frequency of 200 Hz, with its time series $\{t_m\}$ covering the global timeline. For the acquisition moment $t_k$ of the $k$-th laser point, the adjacent InEKF state timestamps are selected according to the timeline inclusion principle:

$t_m \le t_k < t_{m+1}.$

Let the poses of the InEKF at times $t_m$ and $t_{m+1}$ be $T_m = (R_m, p_m)$ and $T_{m+1} = (R_{m+1}, p_{m+1})$, respectively; the pose $T_k = (R_k, p_k)$ corresponding to the $k$-th laser point’s acquisition moment is then calculated as:

$R_k = \mathrm{SLERP}(R_m, R_{m+1}, s), \quad p_k = \mathrm{LERP}(p_m, p_{m+1}, s), \quad s = \frac{t_k - t_m}{t_{m+1} - t_m}.$

Herein, SLERP denotes Spherical Linear Interpolation and LERP denotes Linear Interpolation. Given the extrinsic parameters between the LiDAR and the IMU as $T_{LI}$, the coordinates of the $k$-th laser point in the world coordinate system after deskewing are:

$p_k^{W} = T_k\, T_{LI}\, p_k^{L},$

where $p_k^{L}$ denotes the raw coordinate vector of the $k$-th laser point in the LiDAR coordinate system and $p_k^{W}$ denotes the coordinate vector of the deskewed $k$-th laser point in the world coordinate system.
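A minimal sketch of this per-point deskewing, assuming a buffer of 200 Hz InEKF poses and using SciPy's Slerp for rotation interpolation with per-axis linear interpolation for translation, is given below; the identity LiDAR-IMU extrinsics are a placeholder, not the calibrated values.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_points(points, point_dt, t_scan_start,
                  state_times, state_rots, state_pos):
    """Motion-compensate one LiDAR frame with per-point pose interpolation.

    points       : (N, 3) raw points in the LiDAR frame
    point_dt     : (N,) per-point time offsets from the scan start [s]
    t_scan_start : absolute scan start time [s]
    state_times  : (M,) timestamps of the 200 Hz InEKF states (ascending)
    state_rots   : Rotation object holding M orientations (world <- body)
    state_pos    : (M, 3) body positions in the world frame
    Returns (N, 3) points expressed in the world frame.
    """
    slerp = Slerp(state_times, state_rots)          # SLERP over the pose buffer
    t_pts = np.clip(t_scan_start + point_dt, state_times[0], state_times[-1])
    R_pts = slerp(t_pts)                            # per-point interpolated rotation
    p_pts = np.empty((len(points), 3))
    for k in range(3):                              # per-axis LERP of positions
        p_pts[:, k] = np.interp(t_pts, state_times, state_pos[:, k])
    # Assumed LiDAR -> body extrinsics (identity placeholder)
    R_li, t_li = np.eye(3), np.zeros(3)
    pts_body = points @ R_li.T + t_li
    return R_pts.apply(pts_body) + p_pts
```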
2.5. Feature Extraction
The feature extraction module serves as a crucial hub linking system perception, pose estimation, and map construction. Using the deskewed 3D LiDAR data, the module first separates the ground point cloud from the non-ground point cloud by leveraging the ground segmentation method from LeGO-LOAM. Specifically, after projecting the point cloud into a range image, the angular and distance relationships of neighboring points are analyzed column-by-column. Points that conform to the geometric characteristics of the ground are selected to constitute the ground point cloud, and the remaining point set constitutes the non-ground point cloud. Furthermore, a three-category feature system tailored to the complex cornfield environment is established based on different point cloud scopes. On one hand, it directly inherits the core framework for feature extraction of LeGO-LOAM, extracting planar features from the ground point cloud and line features from the non-ground point cloud, thereby fully utilizing its advantages of being lightweight and highly robust. On the other hand, an innovative crop planar feature extraction method is designed according to the characteristics of the farmland scenario, realizing accurate perception of crop structures in complex farmland environments. These three types of features collectively provide comprehensive constraints for subsequent pose optimization and map construction.
2.5.1. Line Feature Extraction
The line feature extraction fully adopts the method of LeGO-LOAM and is only carried out based on the non-ground point cloud to accurately capture the edge structures of corn plants. Specifically, the non-ground point cloud is divided into local neighborhoods. The edge characteristics are quantified by calculating the local smoothness of points (the normalized value of distance deviations among points within the neighborhood), where a higher smoothness indicates a more prominent edge in the region where the point is located. Subsequently, within multiple sub-regions divided in the 360° horizontal field of view, feature points are selected by sorting according to smoothness and subjected to Non-Maximum Suppression (NMS) to ensure the uniform spatial distribution of line features. The line features extracted by this method can accurately match the edges of the slender structures of plants, such as corn stalks and leaf contours.
2.5.2. Planar Feature Extraction
Planar feature extraction also adopts the method proposed in LeGO-LOAM, with features selected exclusively from the ground point cloud. The key process involves the following steps: First, the local smoothness of each point in the ground point cloud is calculated (quantified by the distance deviations among points within the neighborhood). A lower smoothness value indicates that the region around the point is flatter and more consistent with planar characteristics. Subsequently, points are sorted by smoothness within horizontal sub-regions, and the points with the lowest smoothness values are selected as planar feature candidates. Finally, a set of ground planar features is formed by integrating the uniform sampling rule across sub-regions. This method screens points with flat and geometrically stable properties on the ground via local smoothness quantification, while uniform sampling in sub-regions ensures comprehensive coverage of features across the field surface. Thus, these planar features can stably reflect field undulations and furrow distributions, and beneath the corn canopy they can effectively distinguish between the ground and plant-covered areas.
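The smoothness criterion shared by the line and planar feature extraction can be illustrated with the following simplified sketch; sub-region splitting and non-maximum suppression are omitted, and the window size and feature counts are illustrative rather than the LeGO-LOAM defaults.

```python
import numpy as np

def loam_smoothness(ring_points, half_window=5):
    """LOAM-style smoothness per point on one scan ring.

    ring_points: (N, 3) points ordered by azimuth on a single ring.
    A large value means the point sticks out from its neighbours (edge-like);
    a small value means the local surface is flat (plane-like).
    """
    n = len(ring_points)
    c = np.full(n, np.nan)
    for i in range(half_window, n - half_window):
        neigh = ring_points[i - half_window:i + half_window + 1]
        diff = neigh.sum(axis=0) - (2 * half_window + 1) * ring_points[i]
        c[i] = np.linalg.norm(diff) / (np.linalg.norm(ring_points[i]) + 1e-9)
    return c

def select_features(ring_points, n_edge=2, n_plane=4):
    """Pick the sharpest points as line-feature candidates and the flattest
    points as planar-feature candidates on one ring."""
    c = loam_smoothness(ring_points)
    order = np.argsort(c)                 # ascending smoothness, NaNs last
    valid = order[~np.isnan(c[order])]
    planar_idx = valid[:n_plane]          # flattest points
    edge_idx = valid[-n_edge:]            # sharpest points
    return edge_idx, planar_idx
```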
2.5.3. Crop Planar Feature Extraction
Corn plants grow in regular rows, forming continuous planar clusters along the row direction. These crop row planes exhibit stable geometric characteristics, which can serve as reliable scene constraints to make up for the lack of salient features under dense canopies. The crop plane feature extraction module follows a “coarse screening→precise positioning→plane fitting” logic: First, it screens out potential crop areas from the complex point cloud environment via grid filtering; then it precisely locates crop row positions through differential evolution optimization; finally, it fits crop plane features via the RANSAC algorithm to provide effective constraints for SLAM.
Coarse screening of potential crop areas: The non-ground point cloud is projected into a 2D grid space to establish a local coordinate system with the sensor position as the origin. Let the grid resolution be $r$ and the grid size be $W \times H$. The mapping relationship between grid coordinates and 3D spatial coordinates is established through the row index $u = \lfloor x / r \rfloor$ and column index $v = \lfloor y / r \rfloor$, where $(x, y)$ are the horizontal coordinates of the point cloud in the local coordinate system and $\lfloor \cdot \rfloor$ denotes the floor function. The number of points $N_{u,v}$ in each grid cell is counted, and dense regions satisfying $N_{u,v} \ge N_{th}$ are selected via thresholding as potential crop distribution regions. The entire process of coarse screening and the resulting potential crop areas are illustrated in Figure 3.
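A compact sketch of this grid-density screening is shown below; the grid resolution and density threshold are placeholder values, not the tuned parameters of the system.

```python
import numpy as np

def coarse_crop_mask(points_xy, grid_res=0.1, n_min=5):
    """Coarse screening of potential crop areas by 2D grid density.

    points_xy : (N, 2) horizontal coordinates of non-ground points in the
                sensor-centred local frame.
    grid_res  : grid resolution r [m] (placeholder value).
    n_min     : density threshold N_th (placeholder value).
    Returns a boolean mask of points lying in sufficiently dense cells.
    """
    idx = np.floor(points_xy / grid_res).astype(np.int64)    # (row, col) indices
    # Count the points falling into every occupied cell.
    _, inverse, counts = np.unique(idx, axis=0,
                                   return_inverse=True, return_counts=True)
    return counts[inverse] >= n_min

# Usage: keep only points lying in dense grid cells
pts = np.random.default_rng(1).uniform(-5, 5, size=(1000, 2))
dense_pts = pts[coarse_crop_mask(pts, grid_res=0.25, n_min=3)]
```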
Precise positioning of crop rows: A differential evolution algorithm is adopted for crop row region detection. The crop row model is parameterized as $(\theta, d, s)$, where $\theta$ is the crop row direction angle (the angle with the $x$-axis), $d$ is the distance from the first crop row to the origin, and $s$ is the spacing between two adjacent crop rows. The objective function of the algorithm is defined as the weighted product of the total number of points in the two crop row regions and a balance factor:

$F(\theta, d, s) = \big(N_1 + N_2\big)\left(1 - w\,\frac{|N_1 - N_2|}{N_1 + N_2}\right),$

where $N_1$ and $N_2$ are the total numbers of points in the two crop row regions $\Omega_1$ and $\Omega_2$, respectively, and $w$ is the weight of the balance factor. $\Omega_1$ and $\Omega_2$ represent the region ranges of the two crop rows (with width $b$), satisfying

$\Omega_j = \big\{(x, y) : \big|\, x\sin\theta - y\cos\theta - d - (j-1)\, s \,\big| \le b/2 \big\}, \quad j = 1, 2.$

The differential evolution algorithm generates an initial population within the parameter space of $\theta$, $d$, and $s$, and obtains the optimal parameters after iterations of mutation, crossover, selection, and other operations.
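The row-detection step can be sketched as follows with SciPy's differential evolution solver; the signed-distance line model, the band width, and all bounds other than the 0.8–1.2 m row spacing stated in the paper are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def crop_row_objective(params, pts_xy, row_width=0.2, w_balance=0.5):
    """Negative fitness of a two-row model (theta, d, s): weighted product of
    the point counts in both row bands and a balance factor."""
    theta, d, s = params
    # Signed distance of every point to the line family with direction theta.
    rho = pts_xy[:, 0] * np.sin(theta) - pts_xy[:, 1] * np.cos(theta)
    n1 = np.count_nonzero(np.abs(rho - d) <= row_width / 2)
    n2 = np.count_nonzero(np.abs(rho - (d + s)) <= row_width / 2)
    total = n1 + n2
    if total == 0:
        return 0.0
    balance = 1.0 - w_balance * abs(n1 - n2) / total
    return -(total * balance)            # minimise the negative fitness

def detect_crop_rows(pts_xy):
    """Search (theta, d, s) with differential evolution; the 0.8-1.2 m spacing
    bound follows the paper, the remaining bounds are illustrative."""
    bounds = [(-np.pi / 4, np.pi / 4),   # row direction angle theta [rad]
              (-2.0, 2.0),               # offset d of the first row [m]
              (0.8, 1.2)]                # row spacing s [m]
    result = differential_evolution(crop_row_objective, bounds, args=(pts_xy,),
                                    maxiter=80, seed=0, polish=False)
    return result.x                      # best (theta, d, s)
```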
Crop planar fitting and feature extraction: For the two detected crop rows, point clouds within a region of width $b$ around each crop row are extracted from the original non-ground point cloud as planar-fitting candidates; the distinction between crop lines and crop planes is illustrated in Figure 4. The RANSAC algorithm is then employed to fit the crop planes. During the fitting process, inliers are screened using a distance threshold $\delta$, and reliable plane models whose number of inliers exceeds $N_{\min}$ are retained to form the crop planar feature point set.
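A self-contained RANSAC plane-fitting sketch of this step is given below; the distance threshold, iteration count, and minimum inlier count are placeholders rather than the parameters used in the experiments.

```python
import numpy as np

def ransac_plane(points, dist_thresh=0.05, n_iters=200, min_inliers=50, rng=None):
    """Fit one plane to a candidate crop-row point set with RANSAC.

    dist_thresh : inlier distance threshold delta (placeholder value)
    min_inliers : minimum inlier count N_min for a plane to be accepted
    Returns (normal, centroid, inlier_mask) or None if no reliable plane exists.
    """
    rng = rng or np.random.default_rng(0)
    best_mask, best_count = None, 0
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                       # degenerate (collinear) sample
            continue
        n = n / norm
        dist = np.abs((points - sample[0]) @ n)
        mask = dist < dist_thresh
        if mask.sum() > best_count:
            best_count, best_mask = mask.sum(), mask
    if best_mask is None or best_count < min_inliers:
        return None
    inliers = points[best_mask]
    centroid = inliers.mean(axis=0)
    # Refine the normal on all inliers (direction of smallest variance).
    _, _, vt = np.linalg.svd(inliers - centroid)
    return vt[-1], centroid, best_mask
```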
The module is designed with strong environmental adaptability, and its handling strategies for typical complex scenarios are as follows:
Non-crop areas: The module judges whether valid crop rows are extracted based on two core criteria derived from the detected crop lines. First, it checks if the number of points crossing each fitted crop line is sufficient (meeting the preset requirement for effective crop row representation). Second, it verifies that the number of points between the two detected crop lines is not significantly different (ensuring balanced distribution of crop plants). When both criteria are satisfied, the system confirms the presence of crop rows and incorporates the extracted crop plane features into odometry calculation, using them to constrain the yaw, x, and y directions. If either criterion is not met, the module determines that no valid crop rows are present (i.e., the area is a non-crop region such as field ridges or paths) and disables crop plane feature extraction. In this case, the system relies solely on line and ground plane features for odometry calculation to maintain continuous navigation.
Uneven crop row spacing: The cornfield in the experimental scenario is planted with a standard row spacing of 1 m. However, due to sowing equipment errors, terrain undulations, or manual operation deviations, the actual row spacing may deviate from the standard value. To adapt to this unevenness, the differential evolution algorithm in the module sets the search range of the row spacing parameter s to 0.8–1.2 m (covering the common deviation range of the 1 m standard row spacing). This design allows the algorithm to flexibly fit crop rows with non-uniform spacing within the reasonable deviation range.
Weed interference: Weeds usually distribute sparsely and randomly, and their point cloud performance is significantly different from the dense continuous distribution of corn plants. The module uses a two-stage suppression strategy: in the grid preprocessing stage, sparse grids are directly removed to eliminate most scattered weed point clouds; in the plane fitting stage, for locally dense weeds, the crop row direction constraint is applied—only the planes whose normal vector direction is consistent with the overall crop row direction are retained, thereby further excluding weed interference and ensuring the purity of crop plane features.
2.6. Feature Matching
The feature matching method is carried out around the three types of features extracted from LiDAR (line features, planar features, and crop planar features).
For the matching of line features and planar features, the real-time pose provided by leg odometry (based on the InEKF) is used as the initial pose guess. An ikd-tree is employed to quickly search for the nearest neighbor features of the current frame’s line feature points and planar feature points in the local map, thereby constructing initial matching pairs.
The matching of crop planar features is based on their core geometric characteristics. The crop planar feature is defined as $\Pi = \{c, V, n\}$, consisting of the plane centroid $c$, boundary convex hull vertex set $V$, and unit normal vector $n$. These three types of core features are extracted from the pre-fitted crop planar point cloud, comprehensively characterizing the spatial position, spatial scope, and spatial attitude of the plane. For a crop planar point set $P = \{p_1, p_2, \ldots, p_m\}$ (where $m$ is the number of inlier points), the centroid $c$ is calculated as:

$c = \frac{1}{m}\sum_{i=1}^{m} p_i,$

which is used to represent the spatial position of the crop plane. The boundary convex hull vertex set $V = \{v_1, \ldots, v_q\}$ (where $q$ is the number of convex hull vertices) is obtained by projecting the crop planar point set onto a local plane with $n$ as the normal vector and then extracting the vertices of the minimum convex polygon, serving to define the spatial scope of the crop plane. The unit normal vector $n$ is solved via Principal Component Analysis (PCA): first, perform PCA on the crop planar point set $P$ to construct the covariance matrix $C = \frac{1}{m}\sum_{i=1}^{m}(p_i - c)(p_i - c)^{\top}$; then, conduct eigenvalue decomposition on $C$. The eigenvector corresponding to the smallest eigenvalue, after normalization, is the plane normal vector $n$, which characterizes the spatial attitude of the crop plane.

Under the constraints of these three core features, the crop plane is searched and matched in the local map. Based on the initial pose provided by leg odometry, the current crop planar feature $\Pi_{\mathrm{cur}}$ is transformed into the map frame. Subsequently, the historical plane object closest to $\Pi_{\mathrm{cur}}$ is selected from the local map. Let the historical crop planar features stored in the local map be $\{\Pi_1, \Pi_2, \ldots, \Pi_M\}$, where $M$ is the number of historical crop planes. For the $j$-th historical crop planar object $\Pi_j$ in the local map, the proximity between $\Pi_{\mathrm{cur}}$ and $\Pi_j$ in three dimensions (spatial position, scope, and attitude) is quantified as follows:

$d_c^{\,j} = \big\| c_{\mathrm{cur}} - c_j \big\|, \qquad d_V^{\,j} = \frac{S\big(V_{\mathrm{cur}} \cap V_j\big)}{\min\big(S(V_{\mathrm{cur}}),\, S(V_j)\big)}, \qquad d_n^{\,j} = \big|\, n_{\mathrm{cur}} \cdot n_j \,\big|,$

where $S(\cdot)$ denotes the convex hull area calculation function and $\cap$ denotes the intersection operation. To comprehensively evaluate the overall proximity, a weighted comprehensive proximity index is defined:

$D_j = w_1\, d_c^{\,j} + w_2\,\big(1 - d_V^{\,j}\big) + w_3\,\big(1 - d_n^{\,j}\big),$

with the constraint $w_1 + w_2 + w_3 = 1$ (where $w_1$, $w_2$, $w_3$ are weights). By iterating through all historical plane objects, the $\Pi_{j^{*}}$ that satisfies $D_{j^{*}} = \min_j D_j$ and $D_{j^{*}} < D_{th}$ is selected as the optimal matching object for the current crop plane.
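The descriptor construction and proximity scoring can be sketched as follows; for simplicity the hull-overlap term is approximated by an area-difference ratio instead of an exact polygon intersection, and the weights are placeholders rather than the tuned values.

```python
import numpy as np
from scipy.spatial import ConvexHull

def plane_descriptor(points):
    """Centroid, 2D convex hull vertices (in the plane), and unit normal via PCA."""
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)
    eigval, eigvec = np.linalg.eigh(cov)
    normal = eigvec[:, 0]                       # eigenvector of the smallest eigenvalue
    basis = eigvec[:, 1:]                       # two in-plane directions
    uv = (points - centroid) @ basis            # project onto the plane
    hull = ConvexHull(uv)
    return centroid, uv[hull.vertices], normal

def hull_area(vertices_2d):
    """Shoelace area of an ordered 2D polygon."""
    x, y = vertices_2d[:, 0], vertices_2d[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def plane_proximity(desc_a, desc_b, w=(0.4, 0.3, 0.3)):
    """Smaller is closer: weighted combination of centroid distance, hull-area
    dissimilarity, and normal misalignment (weights are placeholders)."""
    (ca, va, na), (cb, vb, nb) = desc_a, desc_b
    d_pos = np.linalg.norm(ca - cb)
    area_a, area_b = hull_area(va), hull_area(vb)
    d_scope = abs(area_a - area_b) / max(area_a, area_b, 1e-9)
    d_att = 1.0 - abs(np.dot(na, nb))           # 0 when the normals are parallel
    return w[0] * d_pos + w[1] * d_scope + w[2] * d_att
```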
2.7. LiDAR Odometry
The LiDAR odometry proposed in this study adopts three types of LiDAR-extracted features (ground planar features, line features, and crop planar features) as core constraints, and employs a two-step Levenberg–Marquardt (LM) iteration strategy for pose estimation. Ground planar features primarily constrain the roll angle, pitch angle, and z-axis (height) components of the pose, mitigating pose deviations in the vertical direction and attitude tilt. Non-ground features (including crop planar features and line features) focus on constraining the yaw angle, x-axis, and y-axis (horizontal plane) components, ensuring the accuracy of horizontal pose estimation in cornfield scenarios. By precisely associating features with optimization dimensions and dynamically adjusting priorities, the odometry adapts to pose estimation requirements in complex cornfield scenarios.
The ground planar feature residual is defined as the distance from the current frame’s ground planar feature points to the corresponding historical planes in the local map:

$r_g^{(i)} = \big|\, n_g^{\top}\big(T\, p_i^{g} - c_g\big) \big|,$

where $p_i^{g}$ is the $i$-th ground planar feature point in the current frame, $T$ is the pose that transforms it into the local map $\mathcal{M}$, and $n_g$ and $c_g$ are the unit normal vector and centroid of the matched historical ground plane in $\mathcal{M}$, respectively.

The crop planar feature residual adopts the same form as the ground planar feature residual, i.e., the distance from the current frame’s crop planar feature points to the corresponding historical crop planes in the local map:

$r_c^{(i)} = \big|\, n_c^{\top}\big(T\, p_i^{c} - c_c\big) \big|,$

where $p_i^{c}$ is the $i$-th crop planar feature point in the current frame, and $n_c$ and $c_c$ are the unit normal vector and centroid of the matched historical crop plane in the local map, respectively.

The line feature residual is defined as the distance from the current frame’s line feature points to the corresponding historical line segments in the local map:

$r_l^{(i)} = \frac{\big\| \big(T\, p_i^{l} - e_1\big) \times \big(T\, p_i^{l} - e_2\big) \big\|_2}{\big\| e_1 - e_2 \big\|_2},$

where $p_i^{l}$ is the $i$-th line feature point in the current frame, $e_1$ and $e_2$ are the two endpoints of the matched historical line segment in the local map, $\times$ denotes the vector cross product, and $\|\cdot\|_2$ denotes the 2-norm of a vector.
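The three residuals reduce to standard point-to-plane and point-to-line distances, as the following short sketch illustrates; poses are represented as rotation-translation pairs and the numeric values are placeholders.

```python
import numpy as np

def point_to_plane_residual(p_world, plane_normal, plane_centroid):
    """Distance from a transformed feature point to a matched historical plane."""
    return abs(np.dot(plane_normal, p_world - plane_centroid))

def point_to_line_residual(p_world, e1, e2):
    """Distance from a transformed feature point to the line through e1 and e2."""
    cross = np.cross(p_world - e1, p_world - e2)
    return np.linalg.norm(cross) / np.linalg.norm(e1 - e2)

def transform(pose, p):
    """Apply an SE(3) pose given as (R, t) to a point."""
    R, t = pose
    return R @ p + t

# Usage: residuals of one ground-plane point and one line point under a pose guess
pose = (np.eye(3), np.array([0.1, 0.0, 0.02]))
r_plane = point_to_plane_residual(transform(pose, np.array([1.0, 2.0, 0.0])),
                                  plane_normal=np.array([0.0, 0.0, 1.0]),
                                  plane_centroid=np.zeros(3))
r_line = point_to_line_residual(transform(pose, np.array([0.5, 1.0, 0.8])),
                                e1=np.array([0.5, 0.0, 0.0]),
                                e2=np.array([0.5, 3.0, 0.0]))
```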
Pose optimization takes the initial pose $T_0$ provided by leg odometry as the iteration starting point and gradually converges to the optimal pose through two-step LM iterations. In the first step, focusing on ground planar feature residuals, a nonlinear least-squares problem for the pose parameter $T$ is constructed to primarily correct deviations in the roll angle, pitch angle, and z-axis drift in the initial pose:

$T_1 = \arg\min_{T} \sum_{i=1}^{N_g} \big\| r_g^{(i)}(T) \big\|^2,$

where $N_g$ is the number of ground planar feature points. This optimization problem is solved via LM iterations, outputting the preliminarily optimized pose $T_1$.

In the second step, based on the pose output from the first step, the yaw angle, x-axis, and y-axis components are further optimized. The optimization strategy is dynamically adjusted according to the matching results of crop planar features. When crop planar features exist and are successfully matched, a joint optimization objective is constructed with crop planar feature residuals as the main constraint and line feature residuals as the auxiliary constraint. Weight coefficients satisfying $\lambda_c > \lambda_l$ are used to ensure the dominant constraint role of crop planes:

$T_2 = \arg\min_{T}\Big( \lambda_c \sum_{i=1}^{N_c} \big\| r_c^{(i)}(T) \big\|^2 + \lambda_l \sum_{i=1}^{N_l} \big\| r_l^{(i)}(T) \big\|^2 \Big).$

When crop planar features are missing or matching fails, the optimization objective is constructed solely from the line feature residuals to ensure the continuity of horizontal pose optimization:

$T_2 = \arg\min_{T} \sum_{i=1}^{N_l} \big\| r_l^{(i)}(T) \big\|^2.$

This problem is also solved via LM iterations, ultimately outputting the twice-optimized pose $T_2$, i.e., the pose of the current frame’s LiDAR odometry.
After the LiDAR odometry solves the current frame’s pose, a keyframe selection mechanism is used to filter valid frames to support local map maintenance and subsequent optimization. The core logic of keyframe selection revolves around two dimensions: “motion significance” and “environmental information novelty.”
First, the pose deviation between the current frame and the previous keyframe is calculated. Motion significance is determined if the translation deviation $\Delta p > \tau_p$ or the rotation deviation $\Delta\theta > \tau_\theta$ (where $\tau_p$ and $\tau_\theta$ are preset thresholds). Second, the overlap ratio $o$ between the current frame’s point cloud and the historical keyframe point clouds is calculated based on the local map. Environmental information novelty is determined if $o < \tau_o$ (where $\tau_o$ is the overlap threshold). The current frame is marked as a valid keyframe only if both conditions are satisfied.
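A minimal sketch of this two-condition keyframe test is shown below; the threshold values are placeholders, and the overlap ratio is assumed to be computed elsewhere (e.g., by nearest-neighbor search against the local map).

```python
import numpy as np
from scipy.spatial.transform import Rotation

def is_keyframe(pose_cur, pose_prev_kf, overlap_ratio,
                trans_thresh=0.3, rot_thresh_deg=10.0, overlap_thresh=0.8):
    """Keyframe test combining motion significance and information novelty.

    pose_*        : (R, t) tuples in the world frame.
    overlap_ratio : fraction of the current scan already covered by the local map.
    Threshold values are placeholders, not the tuned parameters of the paper.
    """
    R_cur, t_cur = pose_cur
    R_prev, t_prev = pose_prev_kf
    d_trans = np.linalg.norm(t_cur - t_prev)
    d_rot = np.degrees(Rotation.from_matrix(R_prev.T @ R_cur).magnitude())
    motion_significant = (d_trans > trans_thresh) or (d_rot > rot_thresh_deg)
    info_novel = overlap_ratio < overlap_thresh
    return motion_significant and info_novel
```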
2.8. Optimization
In the backend fusion stage, factor graph optimization is adopted to achieve tight coupling between legged odometry and LiDAR odometry. This framework takes the robot states corresponding to LiDAR keyframes as nodes, where each node $x_k = (R_k, p_k)$ is defined by the attitude rotation matrix $R_k$ and position vector $p_k$ in the world coordinate system. The residual of the legged odometry factor $r_{\mathrm{leg}}$ is defined as the deviation between the predicted pose transformation $\hat{T}_{k,k+1}$ calculated from the states of nodes $x_k$ and $x_{k+1}$ and the measured pose transformation $\tilde{T}_{k,k+1}^{\mathrm{leg}}$ from the legged odometry. The residual of the LiDAR odometry factor $r_{\mathrm{lidar}}$ is defined analogously as the deviation between the predicted pose transformation $\hat{T}_{k,k+1}$ and the measured pose transformation $\tilde{T}_{k,k+1}^{\mathrm{lidar}}$ from the LiDAR odometry. The initial state residual $r_0$ corresponds to the deviation of the prior state obtained through static initialization when the robot starts. For covariance matrix adjustment, the covariance matrix $\Sigma_{\mathrm{leg}}$ of the legged odometry factor is determined by the posterior covariance matrix of the InEKF, while the covariance matrix $\Sigma_{\mathrm{lidar}}$ of the LiDAR odometry factor is determined by the statistical characteristics of the point cloud matching residuals. Subsequently, the objective function is constructed based on the MAP criterion:

$\{x_k\}^{*} = \arg\min_{\{x_k\}} \Big( \big\| r_0 \big\|^2_{\Sigma_0} + \sum_{k} \big\| r_{\mathrm{leg},k} \big\|^2_{\Sigma_{\mathrm{leg}}} + \sum_{k} \big\| r_{\mathrm{lidar},k} \big\|^2_{\Sigma_{\mathrm{lidar}}} \Big).$
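As an illustration of this MAP formulation, the following toy sketch optimizes a planar (x, y, yaw) pose graph with SciPy's least-squares solver, stacking whitened prior, leg-odometry, and LiDAR-odometry residuals between consecutive keyframe nodes. A real implementation would operate on SE(3) states with a dedicated factor graph library; all names and weights here are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose_residual(x_i, x_j, meas, info_sqrt):
    """Whitened residual between the predicted and measured relative pose.
    States and measurements are planar (x, y, yaw) for brevity."""
    dx = x_j[:2] - x_i[:2]
    c, s = np.cos(x_i[2]), np.sin(x_i[2])
    pred = np.array([c * dx[0] + s * dx[1],
                     -s * dx[0] + c * dx[1],
                     x_j[2] - x_i[2]])
    err = pred - meas
    err[2] = np.arctan2(np.sin(err[2]), np.cos(err[2]))   # wrap the yaw error
    return info_sqrt @ err

def solve_pose_graph(init_states, leg_meas, lidar_meas, leg_info, lidar_info):
    """MAP estimate: jointly minimise prior, leg-odometry, and LiDAR-odometry
    residuals between consecutive keyframe nodes."""
    n = len(init_states)

    def cost(flat):
        states = flat.reshape(n, 3)
        res = [10.0 * (states[0] - np.asarray(init_states[0]))]  # prior on node 0
        for k in range(n - 1):
            res.append(relative_pose_residual(states[k], states[k + 1],
                                              leg_meas[k], leg_info))
            res.append(relative_pose_residual(states[k], states[k + 1],
                                              lidar_meas[k], lidar_info))
        return np.concatenate(res)

    sol = least_squares(cost, np.asarray(init_states, dtype=float).ravel())
    return sol.x.reshape(n, 3)
```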
2.9. Mapping
The mapping module involves the maintenance of local and global maps. The local map adopts a maintenance mechanism centered on the robot’s current pose, with its core data derived from valid keyframes within the region. Combined with ikd-tree technology, efficient addition, deletion, and retrieval of feature point clouds are achieved. Line features and planar features are stored in two independent ikd-trees: $\mathcal{T}_{l}$ stores the line feature points and $\mathcal{T}_{p}$ stores the planar feature points. For crop planar feature storage, a collaborative mode of an ikd-tree for inliers and a hash table for geometric parameters is adopted: $\mathcal{T}_{c}$ stores the crop planar inlier set, while the core geometric parameters (centroid $c$, boundary convex hull vertex set $V$, and unit normal vector $n$) are stored in a hash table $\mathcal{H}$. The index key is the unique identifier of the crop plane, enabling rapid association between inlier retrieval and parameter invocation.
The construction of the global map is based on the keyframe poses fused with LiDAR odometry and legged odometry constraints after factor graph optimization, achieving cross-local-region feature integration and global consistent representation. Specifically, downsampled line feature points, planar feature points, and crop planar feature points are extracted from the valid keyframes output by factor graph optimization. These feature points are uniformly transformed from the local coordinate system to the world coordinate system according to the optimized poses of the keyframes. For line features and planar features, a global ikd-tree is used for storage and management, and duplicate redundant points are eliminated through nearest-neighbor retrieval to realize cross-keyframe feature fusion. The crop planar features adopt a storage mode consisting of inliers and their corresponding geometric parameters: after coordinate transformation, the inliers are stored in the global crop feature ikd-tree, while their geometric parameters are synchronously updated to the global hash table to maintain the efficiency of feature retrieval and parameter invocation.
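The storage layout for crop planar features can be sketched as follows, with SciPy's cKDTree standing in for the incremental kd-tree and a Python dictionary standing in for the hash table; both substitutions, and the class interface, are illustrative assumptions rather than the actual data structures.

```python
import numpy as np
from scipy.spatial import cKDTree

class CropPlaneMap:
    """Toy stand-in for the ikd-tree plus hash-table storage of crop planes:
    inlier points are indexed for nearest-neighbour queries, while geometric
    parameters (centroid, hull vertices, normal) live in a dict keyed by a
    unique plane id. A real ikd-tree would support incremental insertion
    and deletion instead of rebuilding."""

    def __init__(self):
        self._points = np.empty((0, 3))
        self._point_plane_id = np.empty(0, dtype=np.int64)
        self._params = {}                      # plane id -> geometric parameters
        self._tree = None
        self._next_id = 0

    def add_plane(self, inliers, centroid, hull_vertices, normal):
        pid = self._next_id
        self._next_id += 1
        self._points = np.vstack([self._points, inliers])
        self._point_plane_id = np.concatenate(
            [self._point_plane_id, np.full(len(inliers), pid)])
        self._params[pid] = {'centroid': centroid,
                             'hull': hull_vertices,
                             'normal': normal}
        self._tree = cKDTree(self._points)     # rebuild (placeholder for ikd-tree)
        return pid

    def nearest_plane(self, query_point):
        """Return the parameters of the plane owning the nearest stored inlier."""
        if self._tree is None:
            return None
        _, idx = self._tree.query(query_point)
        return self._params[int(self._point_plane_id[idx])]
```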
3. Experiments and Results Analysis
3.1. Experimental Scenarios and Data Collection
The experimental site is located in an experimental field in Santai County, Mianyang City, Sichuan Province, as shown in Figure 5. The terrain is generally flat but characterized by complex microtopography: small drainage ditches, shallow gullies, and field ridges are widely distributed, resulting in local terrain undulations. All experiments were conducted under low-wind conditions: during the early and late jointing stages (corn height < 220 cm), the sparse-to-moderate canopy led to slight leaf sway (amplitude < 5 cm) and a small number of dynamic points in the LiDAR point cloud; in the tasseling and milk ripening stages, the closed canopy effectively reduced wind penetration, resulting in minimized leaf motion (amplitude < 2 cm) and stable crop structural features. The experiments were performed under relatively dry soil conditions. Based on field observations throughout the tests, the robot’s feet exhibited no sinking—even after light rainfall—though occasional slippage was observed. This slippage was attributed to slight soil moisture retention following light precipitation, without any signs of muddiness or deep adhesion that would impede navigation. Corresponding to these soil conditions, soil moisture content is estimated to range from 10% to 23%.
To cover environmental differences throughout the corn growth cycle, tests were conducted across 4 key growth stages. The corn morphology and field environmental characteristics at each stage are summarized in Table 1.
The test trajectory for the early jointing, tasseling, and milk ripening stages was set to a 130 m round trip across 4 corn rows, following the same pre-defined path. However, due to inherent farmland microtopographic variations and occasional minor foot slippage induced by soil moisture, the RTK ground truth trajectories of repeated tests for these stages cannot be guaranteed to be completely identical. The test trajectory length for the late jointing stage was specifically designed as 50 m (shorter than the 130 m used for other stages) based on prior preliminary tests. These preliminary evaluations revealed that mainstream IMU-fused SLAM algorithms (i.e., LeGO-LOAM and Point-LIO) exhibited significant positioning drift and even loss of localization in long-distance cornfield scenarios (≥100 m) due to gait-induced IMU jitter and cumulative integration errors. In contrast, the late jointing stage features moderate canopy density and relatively stable environmental structure, which enables these algorithms to maintain valid localization results within a short trajectory (50 m).
3.2. Comparative Algorithms
To verify the superiority of CK-SLAM, mainstream methods covering different technical routes are selected as baselines to ensure the comprehensiveness and fairness of the comparison. All baselines are tested under the same parameter settings, hardware platform, and data input conditions to eliminate artificial biases. Priority is given to algorithms that can stably output valid results in the experiments, and three representative schemes are finally determined. A-LOAM—A classic LiDAR Odometry and Mapping (LOAM) algorithm. It relies on edge and planar features for point cloud registration without IMU fusion, representing a pure LiDAR SLAM scheme without inertial assistance, with parameters consistent with its open-source default configuration. LeGO-LOAM—A feature-based LiDAR SLAM scheme fused with IMU. IMU data is integrated in a loose coupling manner, mainly used for motion distortion compensation and initial pose guess for registration, representing a scene-adaptive feature extraction scheme with IMU assistance, adopting the same IMU data and parameter settings as CK-SLAM. Point-LIO—A tightly coupled LiDAR SLAM scheme fused with IMU. It does not rely on traditional edge or planar feature division but deeply fuses raw LiDAR points with IMU pre-integration data, representing an explicit feature-independent scheme under IMU tight coupling, with parameters kept consistent with its open-source default configuration and using the same sensor data as CK-SLAM.
3.3. Evaluation Metric Design
Evaluation metrics are designed from two dimensions—localization accuracy and real-time performance—combining quantitative and qualitative analyses.
3.3.1. Localization Accuracy Metrics
Two metrics are adopted: Absolute Pose Error (APE) and Relative Pose Error (RPE). APE evaluates global pose accuracy, defined as the direct deviation between the pose estimated by the algorithm and the ground truth pose. RPE assesses local pose accuracy, referring to the deviation of pose changes between adjacent moments (or at a fixed interval $\Delta$). Both metrics are typically presented in the form of Root Mean Square Error (RMSE), with the calculation formulas as follows:

$\mathrm{APE}_{\mathrm{RMSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big\| e_i \big\|^2}, \qquad \mathrm{RPE}_{\mathrm{RMSE}} = \sqrt{\frac{1}{N-\Delta}\sum_{i=1}^{N-\Delta}\big\| e_{i,\Delta} \big\|^2},$

where $e_i$ is the error between the estimated pose $\hat{T}_i$ and the ground truth pose $T_i$ at the $i$-th moment, and $e_{i,\Delta}$ is the deviation of the pose changes within the adjacent time interval $\Delta$.
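For reference, a minimal translational-error sketch of both metrics (as computed, e.g., by the evo tool on aligned trajectories) is given below; the toy trajectory is illustrative only.

```python
import numpy as np

def ape_rmse(est_xyz, gt_xyz):
    """Translational Absolute Pose Error RMSE between aligned trajectories."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return np.sqrt(np.mean(err ** 2))

def rpe_rmse(est_xyz, gt_xyz, delta=1):
    """Translational Relative Pose Error RMSE over a fixed frame interval delta."""
    d_est = est_xyz[delta:] - est_xyz[:-delta]
    d_gt = gt_xyz[delta:] - gt_xyz[:-delta]
    err = np.linalg.norm(d_est - d_gt, axis=1)
    return np.sqrt(np.mean(err ** 2))

# Usage with a toy trajectory
gt = np.cumsum(np.ones((100, 3)) * 0.1, axis=0)
est = gt + np.random.default_rng(2).normal(scale=0.05, size=gt.shape)
print(ape_rmse(est, gt), rpe_rmse(est, gt, delta=10))
```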
3.3.2. Real-Time Performance Metrics
To ensure comprehensive and reliable real-time evaluation, statistical characteristics of each algorithm running on the quadruped robot are recorded, including average single-frame runtime, minimum runtime, and maximum runtime. These metrics comprehensively reflect the algorithm’s overall processing efficiency, time redundancy capability in complex scenarios, and optimal response speed under ideal working conditions.
3.4. Experimental Results and Analysis
Figure 6 presents the top-down views of the trajectories of each algorithm across the four corn growth stages, where the black dashed line represents the RTK ground truth trajectory. It should be noted that the start and end points of LeGO-LOAM and Point-LIO trajectories in Figure 6 are inconsistent with the ground truth and other algorithms. This phenomenon arises from the trajectory alignment setting adopted in the evo trajectory evaluation tool: to eliminate the influence of initial pose offsets between different algorithms and facilitate quantitative comparison of trajectory shape consistency and drift characteristics, each algorithm’s trajectory was aligned with the RTK ground truth trajectory using a rigid-body transformation (including translation and rotation). During this alignment process, the relative positional relationship of each trajectory’s internal points is preserved, while the absolute start and end coordinates are adjusted to minimize the overall deviation from the ground truth. This alignment operation does not affect the validity of the experimental results. On one hand, the core evaluation metrics (APE RMSE and RPE RMSE) are calculated based on the deviation between the aligned trajectories and the ground truth, which can objectively reflect the localization accuracy of each algorithm. On the other hand, the relative drift trend, shape consistency with the ground truth, and algorithmic stability displayed by each trajectory (e.g., whether LeGO-LOAM maintains a stable path or Point-LIO produces chaotic trajectories) remain unchanged after alignment, ensuring the reliability of qualitative analysis.
Both CK-SLAM and A-LOAM demonstrate favorable localization performance throughout all growth stages. However, during the tasseling and milk ripening stages, although A-LOAM can maintain the trajectory shape, a significant deviation from the ground truth trajectory is observed. The advantages of CK-SLAM gradually become prominent, which stems from its integrated crop row extraction and crop planar feature constraint modules. As corn stems thicken and plant height increases, CK-SLAM can more easily extract crop line information. LeGO-LOAM performs adequately only in the short-trajectory scenario of the late jointing stage, while it completely loses accuracy in the early jointing stage. Except for the late jointing stage, Point-LIO exhibits chaotic localization in all other growth stages, indicating extremely poor adaptability.
Figure 7 shows the trajectory components over time for each algorithm across four growth stages, displaying the time-varying curves of the X, Y, and Z coordinates. During early jointing, CK-SLAM and A-LOAM closely matched RTK true values on all three axes, while LeGO-LOAM and Point-LIO exhibited significant errors on the Y-axis. During the late jointing stage, the shorter trajectory length prevented cumulative errors from fully manifesting, resulting in minimal algorithmic differences. By the tasseling and milk stages, as corn stalks thickened and plant height increased, row features became easier to extract. CK-SLAM’s crop row constraint mechanism enabled high alignment with true values across all three axes, while A-LOAM exhibited slight single-axis drift. After 100 s of operation, LeGO-LOAM exhibited noticeable deviations on both the X- and Y-axes, while Point-LIO showed severe deviation throughout the entire process.
Figure 6 and Figure 7 visually demonstrate the trajectory tracking performance of various algorithms on a quadruped robot platform. The LeGO-LOAM and Point-LIO algorithms, which fuse IMU data, exhibit significant trajectory deviations. In contrast, the pure LiDAR-based A-LOAM and CK-SLAM algorithms produce trajectories closer to the ground truth, with CK-SLAM maintaining high alignment with the ground truth throughout the entire testing period. The primary reason may be that the gait motion of quadruped robots causes high-frequency jitter in IMU data. Algorithms incorporating IMU failed to effectively process this jitter interference, instead introducing additional errors. LeGO-LOAM’s loosely coupled IMU failed to accurately compensate for the robot’s bumpy motion. Although Point-LIO’s tightly coupled architecture is theoretically superior, it was not optimized for the robot’s non-stationary motion characteristics, leading to accumulated IMU pre-integration errors and ultimately inferior positioning accuracy compared to pure LiDAR algorithms.
Table 2 presents the pose accuracy and real-time performance statistics for each algorithm across the four growth stages. The data represent the average of five test rounds, with bolded values indicating the optimal result.
In terms of the APE RMSE metric, CK-SLAM achieved an average value of 2.0939, representing reductions of 15.7%, 56.4%, and 72.2% compared to A-LOAM (2.4917), LeGO-LOAM (4.7994), and Point-LIO (7.5423), respectively, demonstrating a notable advantage in global positioning accuracy. The most pronounced performance improvement occurs during the milk ripening stage (characterized by the densest canopy and most severe occlusion). CK-SLAM achieves an APE RMSE of 1.9840, while A-LOAM, LeGO-LOAM, and Point-LIO register 2.8080, 6.3427, and 9.8624, respectively. This demonstrates that CK-SLAM, through targeted optimization for agricultural scenarios, effectively mitigates feature extraction interference caused by dense canopy cover, resolving the severe positioning drift issues faced by traditional algorithms in complex agricultural environments.
Regarding the RPE RMSE metric, CK-SLAM achieved an average value of 0.0946, maintaining a leading position. LeGO-LOAM attained lower RPE RMSE in certain growth stages but exhibited insufficient overall stability. Point-LIO’s average RPE RMSE of 0.11 was the highest among all algorithms. This reflects its inability to accurately capture subtle motion state changes caused by robot gait transitions or terrain undulations due to the combined effects of IMU pre-integration errors and failed point cloud matching during pose calculation between adjacent time steps. Its local pose estimation capability was significantly weaker than other algorithms. A-LOAM demonstrated stable RPE RMSE performance but was slightly inferior to CK-SLAM overall.
Figure 8 and Figure 9 show the runtime trends of each core module and the single-frame computation time histogram of CK-SLAM across four growth stages, respectively. The histogram reveals that CK-SLAM exhibits a typical unimodal distribution in single-frame computation time across all four scenarios, with the peak concentrated in the 33–40 ms range and an average time around 35 ms. This demonstrates the algorithm’s high processing stability and consistent scene adaptability. Among the four core modules, the feature extraction module consumes the most time, accounting for an average of 40% of the total runtime, mainly due to the innovative crop plane feature extraction process. The laser odometry module follows, while preprocessing and factor graph optimization have relatively lower and stable time consumption. The runtime trends of each module show no significant fluctuations across stages, with only a slight increase in the feature extraction module’s runtime during the tasseling stage due to denser canopies. Although CK-SLAM’s average single-frame computation time (approximately 35 ms) is longer than that of traditional algorithms such as LeGO-LOAM (average 14.24 ms) and A-LOAM (average 15.08 ms), it fully meets the real-time requirements for a 10 Hz frame rate of lidar systems.
3.5. Ablation Experiment
To quantify the independent contributions of CK-SLAM’s three core innovations—InEKF-based leg odometry, crop plane feature extraction, and multi-feature factor graph optimization—to localization performance, three ablation variants were designed by removing one core module each from the complete CK-SLAM algorithm, with the full algorithm serving as the baseline. The ablation experiments included four test groups: the full CK-SLAM system (referred to as “Full System”) and three variants with each core module removed sequentially, denoted as “W/O InEKF” (excluding the InEKF-based leg odometry module), “W/O Crop Plane” (removing the crop plane constraint module), and “W/O Optimization” (excluding the backend factor graph optimization module). The specific adjustments for each variant are as follows: For “W/O InEKF”, the point cloud deskewing and initial pose guess for laser odometry relied solely on raw IMU data integration, without fusing joint encoder data or the foot contact probability model, and the backend optimization only retained laser odometry factors. For “W/O Crop Plane”, the feature extraction process skipped grid filtering, differential evolution-based crop row detection, and RANSAC crop plane fitting, with laser odometry optimization constrained solely by line features and ground plane features. For “W/O Optimization”, the laser odometry results after two-step LM iteration were directly used as the final pose output, without global consistency optimization via factor graph fusion of leg odometry and laser odometry factors.
Table 3 presents the quantitative comparison results of the four test groups across all growth stages, and Figure 10 visually displays the global trajectories of each variant in representative stages (tasseling stage and milk ripening stage), while Figure 11 presents the XYZ axis position error curves of the variants in these stages. As shown in Table 3, the Full System achieved the lowest average APE RMSE (0.8137 m) across all stages, confirming the synergistic effectiveness of the three core modules in addressing cornfield navigation challenges throughout the crop growth cycle.
The “W/O InEKF” variant exhibited the most significant overall performance degradation, with an average APE RMSE of 4.3592 m (a 435.703% increase compared to the Full System). This degradation was particularly pronounced in the early jointing stage and milk ripening stage: in the early jointing stage, the variant’s APE RMSE reached 7.6589 m (1154.668% higher); in the milk ripening stage with dense canopy and severe occlusion, the APE RMSE surged to 4.4388 m (554.647% higher). Notably, this variant, which relies solely on raw IMU data for motion state estimation, exhibits trajectory behaviors similar to those of traditional SLAM algorithms such as LeGO-LOAM and Point-LIO, including trajectory regression and oscillation phenomena. This indicates that the InEKF-based leg odometry module effectively suppresses motion noise caused by quadruped robot gait variations and foot slippage by fusing joint encoder data and foot contact probability models. In contrast, raw IMU data alone cannot compensate for accumulated jitter and terrain-induced errors, leading to unstable trajectory estimation and obvious regression-oscillation characteristics.
The “W/O Crop Plane” variant showed an average APE RMSE of 0.9517 m (a 9.113% increase) and an average RPE RMSE of 0.4467 m (a 0.072% increase), with performance degradation exhibiting a distinct correlation with navigation distance. As visualized in Figure 11, during long-distance navigation tasks that involve traversing multiple crop rows and completing turning maneuvers, the “W/O Crop Plane” variant fails to maintain the expected parallel trajectory relationship between pre-turn and post-turn paths. Instead, the trajectory gradually drifts as the navigation distance increases, ultimately resulting in crossing between the post-turn and pre-turn trajectories. This phenomenon stems from the fact that the crop plane feature, as a scene-specific geometric constraint customized for cornfield environments, leverages the regular spatial distribution of crop rows to provide continuous and stable long-distance positioning references. It effectively restricts the cumulative drift of lateral position and heading angle during extended navigation, ensuring the consistency and parallelism of trajectories across multiple crop rows. Without this module, the algorithm relies solely on traditional line and ground plane features, which lack sufficient scene adaptability to constrain long-term drift in complex cornfield environments. As a result, the positioning errors accumulate continuously with increasing navigation distance, leading to trajectory crossing during row switching and turning in long-distance tasks. The quantitative results further verify this trend: in the early jointing stage with relatively short navigation distances, the APE increase is only 10.713%; while in the milk ripening stage involving longer navigation paths and denser vegetation, the APE increase rises to 18.427%, reflecting the growing importance of the crop plane constraint in long-distance navigation.
The “W/O Optimization” variant had an average APE RMSE of 2.5049 m (a 207.826% increase) while its average RPE RMSE remained essentially stable (0.4442 m, a 0.488% decrease). This pattern held across all stages: the variant’s RPE stayed close to the Full System for short-term motion estimation, but its APE diverged markedly as the trajectory lengthened. A key function of the backend factor graph optimization module is to restrict lateral drift during row switching and turning. As observed in Figure 11, without this module the lateral position deviation and angular error during these maneuvers are not adequately suppressed: during U-turn maneuvers intended to transition to an adjacent parallel crop row, the variant exhibits excessive lateral drift and angular deviation, so the post-turn trajectory fails to stay parallel to the pre-turn path and instead crosses it. This indicates that the backend factor graph optimization, by jointly incorporating leg odometry and laser odometry factors, corrects local estimation errors in real time and maintains accurate lateral position and heading angle through dynamic maneuvers such as row switching and turning. Without it, cumulative errors from local motion estimation cannot be eliminated, producing uncontrolled lateral drift during key navigation phases and subsequent trajectory crossing; this effect is most prominent in long-distance navigation across all growth stages, even though short-term local motion performance remains stable.
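To illustrate the kind of multi-constraint optimization described above, the following is a minimal pose-graph sketch using GTSAM's Python bindings, assuming one pose node per keyframe with between-factors from leg odometry and laser odometry. The two-node graph, noise values, and relative poses are illustrative assumptions rather than CK-SLAM's actual factor graph.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import X

graph = gtsam.NonlinearFactorGraph()

# Anchor the first keyframe pose.
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 6))
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(), prior_noise))

# Two relative-pose constraints between consecutive keyframes: one from leg
# odometry (noisier) and one from laser odometry (tighter); sigmas are illustrative.
leg_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.10] * 3))
lidar_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.03] * 3))
leg_delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.52, 0.03, 0.0))
lidar_delta = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.50, 0.01, 0.0))
graph.add(gtsam.BetweenFactorPose3(X(0), X(1), leg_delta, leg_noise))
graph.add(gtsam.BetweenFactorPose3(X(0), X(1), lidar_delta, lidar_noise))

# Initial guess and optimization; the optimized pose fuses both constraints
# according to their noise models.
initial = gtsam.Values()
initial.insert(X(0), gtsam.Pose3())
initial.insert(X(1), gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.45, 0.0, 0.0)))
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(X(1)))
```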
4. Discussion
Corn plants exhibit rapid growth in height and density throughout their lifecycle, posing significant challenges for manual field scouting due to narrow row spacing, dense canopy occlusion, and complex terrain. Autonomous scouting by quadruped robots has emerged as a promising solution, but reliable positioning remains a critical bottleneck. This study proposed CK-SLAM, integrating an Invariant Extended Kalman Filter (InEKF), innovative multi-feature extraction (including crop plane features), and a multi-constraint global optimization framework to address this bottleneck. Experimental validation across four corn growth stages demonstrated that CK-SLAM achieves more stable and accurate localization compared to mainstream SLAM methods, providing valuable insights for real-world agricultural robotics applications while also highlighting key limitations and future directions.
4.1. Performance Interpretation of Localization Accuracy Metrics
CK-SLAM achieved an average Absolute Pose Error (APE) RMSE of 2.0939 m, reducing errors by 15.7%, 56.4%, and 72.2% relative to A-LOAM, LeGO-LOAM, and Point-LIO, respectively. Previous studies in structured agricultural environments (e.g., orchards) reported lower positioning errors, but those results are not directly comparable because the environments are far less complex: CK-SLAM targets mid-to-late corn growth stages with closed canopies and sub-meter row spacing, where traditional methods struggle.
For practical autonomous scouting, CK-SLAM’s 2 m average APE RMSE is sufficient for inter-row navigation (corn row spacing 0.8–1.2 m), ensuring the robot avoids crop collisions. It meets the needs of tasks like canopy density assessment and disease detection, which prioritize coverage over centimeter-level precision. Notably, its performance is most impressive in the milk ripening stage (APE RMSE = 1.9840 m), where canopy occlusion is severe, addressing a key limitation of traditional methods.
4.2. Adaptability to Quadruped Robot Platform and Agricultural Scenarios
For quadruped robots, gait-induced motion noise (e.g., IMU jitter from foot-ground impacts and body bumps) is a major source of positioning error. Most mainstream SLAM algorithms (including LeGO-LOAM and Point-LIO) fuse IMU data but fail to account for the high-dynamic motion characteristics of legged robots. LeGO-LOAM’s loosely coupled IMU integration cannot effectively compensate for bumpy motion, while Point-LIO’s tightly coupled architecture relies on the assumption of “approximate motion smoothness” (valid for wheeled robots but not quadruped robots). As a result, LeGO-LOAM and Point-LIO exhibit chaotic trajectories in most corn growth stages, with APE RMSE values 2.3–3.6 times higher than CK-SLAM in the milk ripening stage.
In contrast, CK-SLAM’s InEKF-based leg odometry module addresses this issue by fusing IMU data with joint encoder data and a foot contact probability model (derived from joint torque). This fusion strategy dynamically adjusts observation noise covariance to suppress motion noise from gait variations and foot slippage, providing a reliable initial pose reference for point cloud deskewing. Ablation experiments confirmed the critical role of this module: the variant excluding InEKF (W/O InEKF) exhibited an average APE RMSE of 4.3592 m, a 435.7% increase compared to the full CK-SLAM system.
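As a simplified illustration of how a contact probability can drive the observation noise, the sketch below shows a standard Kalman-style velocity update in which the leg-odometry measurement covariance is inflated when contact is uncertain. It operates on the velocity sub-state only and is not the full right-invariant formulation used by CK-SLAM; all constants are assumptions.

```python
import numpy as np

def leg_velocity_update(v_est, P, v_meas, p_contact,
                        R0=np.diag([0.02, 0.02, 0.05]) ** 2):
    """Fuse a leg-kinematics velocity measurement into the filter state.
    The nominal covariance R0 is inflated as the contact probability drops,
    so slipping or swing-phase measurements are effectively down-weighted."""
    R = R0 / max(p_contact, 1e-3)       # adaptive observation noise
    H = np.eye(3)                        # velocity is observed directly
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    v_new = v_est + K @ (v_meas - H @ v_est)
    P_new = (np.eye(3) - K @ H) @ P
    return v_new, P_new
```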
CK-SLAM’s crop plane feature extraction module improves adaptability to cornfields in two complementary ways. First, it uses grid filtering, differential evolution, and RANSAC to capture the planar clusters formed by crop rows, directly complementing the line and plane features inherited from LeGO-LOAM and compensating for the scarcity of stable geometric features in cornfields. Second, because the extraction targets these row-aligned clusters, the crop plane features consist primarily of points on corn stalks; compared with leaves, stalks are far less affected by disturbances such as wind and remain structurally stable under dynamic field conditions. Together, these properties ensure reliable feature matching, especially under dense canopies where traditional features are scarce or easily disrupted.
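As a rough illustration of the plane-fitting step mentioned above, the following sketch fits a single plane to a pre-filtered crop-row point cluster with a basic RANSAC loop; the iteration count, inlier threshold, and interfaces are assumptions, and the grid-filtering and differential-evolution stages that precede this step in CK-SLAM are omitted.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.05, seed=0):
    """Fit a plane n·x + d = 0 to an (N, 3) crop-row point cluster.
    Returns the unit normal n, offset d, and a boolean inlier mask."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-8:
            continue                      # skip degenerate (collinear) samples
        n = n / norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (n, d), inliers
    return best_model[0], best_model[1], best_inliers
```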
4.3. Limitations of the Study
Despite promising performance, this study has several limitations.
4.3.1. Algorithmic Limitation: Sensitivity to Curved Crop Rows
Corn is predominantly planted in straight rows, with minor, scattered deviations from sowing tolerances or terrain undulations—scenarios CK-SLAM adapts to well. A core premise of the differential evolution algorithm is that crop rows appear as two parallel lines within the LiDAR scanning range. This enables adaptation to large-radius curved rows, as local segments approximate parallel lines sufficiently for parameter optimization and plane fitting. However, small-radius curved rows deviate significantly from this parallel linear model within the scanning window. The differential evolution algorithm, constrained by its parallel linear parameterization, cannot accurately capture the curved structure, leading to failure in extracting valid crop lines or the extraction of incorrect crop lines. It is worth noting that such small-radius curved planting scenarios are extremely rare in corn production, as they conflict with standard agronomic practices and mechanized farming operations, limiting the practical impact of this limitation.
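The parallel-line premise discussed here can be made concrete with a small sketch: two crop lines sharing one heading angle are parameterized by (θ, c1, c2), and differential evolution searches for the parameters minimizing the summed point-to-line distance of ground-projected stalk points. The bounds, cost function, and SciPy usage below are illustrative assumptions; a small-radius curved row violates the shared-heading model and makes this cost misleading.

```python
import numpy as np
from scipy.optimize import differential_evolution

def parallel_rows_cost(params, pts_xy):
    """params = (theta, c1, c2): both lines share direction theta and are written
    as x*sin(theta) - y*cos(theta) = c_i; each point is charged the distance
    to whichever of the two lines is nearer."""
    theta, c1, c2 = params
    d = pts_xy[:, 0] * np.sin(theta) - pts_xy[:, 1] * np.cos(theta)
    return float(np.sum(np.minimum(np.abs(d - c1), np.abs(d - c2))))

# Illustrative bounds: heading within +/-30 deg of the nominal row direction,
# signed offsets of the left/right rows within +/-2 m of the sensor.
bounds = [(-np.pi / 6, np.pi / 6), (-2.0, 0.0), (0.0, 2.0)]

# pts_xy: (N, 2) ground-projected stalk points from the current scan (assumed given).
# result = differential_evolution(parallel_rows_cost, bounds, args=(pts_xy,), maxiter=30)
# theta, c1, c2 = result.x
```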
4.3.2. Computational Efficiency Limitation: High Time Cost of Feature Extraction
CK-SLAM’s multi-feature strategy increases computational complexity. The differential evolution algorithm (20–30 iterations, roughly 40% of single-frame runtime) and the crop plane fitting and matching steps add overhead, yielding an average single-frame runtime of 35.48 ms, 2.3–2.6 times longer than LeGO-LOAM and A-LOAM. Although this still satisfies the 10 Hz LiDAR real-time requirement, the reduced timing margin increases the risk of frame loss in extreme scenarios, and the higher computational cost also restricts deployment on low-power edge platforms.
4.3.3. Other Application Limitations
CK-SLAM’s ~2 m APE RMSE suffices for inter-row navigation but not high-precision tasks (e.g., targeted pesticide application) requiring centimeter-level accuracy. In scenarios with high corn lodging rates or dense weed coverage, performance may degrade—lodging disrupts crop row structure, while dense weeds introduce false features that grid filtering and direction constraints cannot fully suppress. These scenarios were not fully tested, and future work should evaluate robustness under such conditions.
It should be noted that Figure 6 and Figure 7 present representative single-test trajectories. The start and end points of LeGO-LOAM and Point-LIO differ from the RTK ground truth because the evo tool applies rigid-body alignment, which removes initial pose offsets without affecting the validity of the metrics. Future visualizations will make this alignment step explicit.
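For readers unfamiliar with this alignment step, the sketch below shows an SE(3) (Umeyama/Kabsch-style, no-scale) alignment of an estimated trajectory to the reference before computing the APE RMSE, which is conceptually what evo does; the function names and the use of translation-only errors are simplifying assumptions.

```python
import numpy as np

def align_rigid(est_xyz, ref_xyz):
    """Find rotation R and translation t minimizing ||R @ est + t - ref|| (no scale)."""
    mu_e, mu_r = est_xyz.mean(axis=0), ref_xyz.mean(axis=0)
    H = (est_xyz - mu_e).T @ (ref_xyz - mu_r)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ S @ U.T
    t = mu_r - R @ mu_e
    return R, t

def ape_rmse(est_xyz, ref_xyz):
    """Translation-only APE RMSE after rigid alignment of the estimate to the reference."""
    R, t = align_rigid(est_xyz, ref_xyz)
    err = (est_xyz @ R.T + t) - ref_xyz
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```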
5. Conclusions
This study addresses the high-precision localization and mapping challenge for quadruped robots in cornfield scouting, where narrow row spacing, dense canopy occlusion, and complex terrain hinder traditional SLAM algorithms, by proposing CK-SLAM, a framework integrating quadruped kinematic constraints and crop row feature extraction. Its core innovations include an InEKF-based leg odometry module adapted to the high-dynamic motion of legged robots, a cornfield-specific crop plane feature extraction module, and a factor graph optimization framework fusing multi-source constraints for global pose refinement. Validated across four key corn growth stages (early jointing to milk ripening), CK-SLAM outperforms mainstream baselines (A-LOAM, LeGO-LOAM, Point-LIO) with an average APE RMSE of 2.0939 m (15.7–72.2% lower than competitors) and an average RPE RMSE of 0.0946 m, excelling particularly in the milk ripening stage (APE RMSE = 1.9840 m) under severe occlusion. Despite a longer average single-frame runtime (approximately 35 ms), it meets the real-time requirements of 10 Hz LiDAR systems, providing a reliable navigation solution for automated cornfield monitoring and laying a foundation for SLAM adaptation in other dense crop scenarios.
Notably, CK-SLAM has inherent limitations: it struggles with small-radius curved crop rows (rare in practical corn production), it faces computational overhead from the differential evolution-based crop row detection (restricting low-power edge deployment), and its ~2 m APE RMSE is insufficient for centimeter-level precision tasks (e.g., targeted pesticide application). Additionally, performance may degrade in high corn lodging or dense weed scenarios.
Future work will focus on three key directions: optimizing the computational efficiency of the differential evolution module to reduce runtime overhead, integrating multi-modal sensors (e.g., visual or semantic sensors) to enhance robustness under extreme occlusion and dynamic plant interference, and extending the algorithm to handle non-straight crop rows and high-disturbance field conditions (e.g., lodging and dense weeds) for broader agricultural applicability.