1. Introduction
Greenhouse environments play an important role in modern food production by providing controlled climates that reduce the impact of external weather conditions [1]. This ability to produce crops regardless of external weather conditions is a major benefit, helping to ensure a consistent food supply and support agricultural development [2]. Yet, the sector faces challenges, such as persistent labor shortages and escalating costs for farmers and agriculturists [3]. These pressures have made the automation of greenhouse operations desirable [4]. In response, unmanned agricultural robots have emerged as a promising solution, equipped with key capabilities such as navigation, detection, action, and mapping [5]. Of these, navigation stands out as particularly critical, since efficient and autonomous movement is required for robots to effectively operate within the complex layouts typical of greenhouse environments [6].
Building on the need for effective automation in greenhouses, a wide range of studies have investigated methods to achieve autonomous navigation in agricultural settings. Among these, one prominent solution is the use of the Global Navigation Satellite System (GNSS) combined with Real-Time Kinematic (RTK) technology, which delivers centimeter-level positional accuracy [7,8]. This high precision enables robots to accurately localize themselves and navigate fields [8]. However, greenhouse structures attenuate and reflect satellite signals, often causing multipath, poor satellite visibility, and loss of RTK corrections, which severely degrade accuracy and availability indoors [9]. As a result, many studies have shifted toward local, onboard sensor approaches for autonomous navigation in greenhouses.
Among these local sensor-based navigation approaches, Light Detection and Ranging (LiDAR) has become a popular alternative. LiDAR sensors generate point clouds, which, when combined with algorithms like Simultaneous Localization and Mapping (SLAM), enable robots to map and localize in real-time in unknown environments [10,11]. Despite these advantages, the high cost and complexity of LiDAR-based solutions have led researchers to investigate alternative navigation techniques.
Visual sensors have gained increasing favor in recent research [12]. Their low computational cost and reliability make them particularly advantageous for autonomous navigation tasks [12]. Current computer vision methods utilizing visual sensors can be broadly categorized into two main approaches: traditional methods and deep learning techniques.
Traditional computer vision methods typically rely on manually designed algorithms and mathematical models to process and interpret images, with the aim of extracting navigation lines from visual data. For instance, ref. [13] developed a computationally efficient algorithm that utilizes Otsu thresholding and blob detection with a low-cost 2D camera to identify the center of vineyard rows, enabling collision-free navigation. Similarly, ref. [14] proposed a multi-stage algorithm that segments images, extracts crop row feature points, and uses linear regression to accurately calculate navigation lines, enabling navigation for high-ridge, broad-leaved crops across different growth periods.
In another example, ref. [15] introduced a prediction-point Hough transform algorithm that uses a novel grayscale factor for improved image segmentation and achieves accurate navigation path fitting for greenhouse cucumber-picking robots. Meanwhile, ref. [16] presented a low-cost local motion planner for vineyard navigation, leveraging RGB-D cameras and depth maps for real-time row-end detection as the primary control algorithm.
While the traditional computer vision methods described previously can perform in targeted agricultural scenarios, they tend to rely on carefully tuned parameters or environment-specific cues, which limit their ability to adapt across diverse settings. As a result, their generalizability remains a significant challenge.
To overcome these limitations, recent research has shifted toward deep learning approaches, which are designed to address the generalization problem, although this often comes with increased computational demands. For instance, ref. [4] applies YoloV8 to segment crop-free ridges, which are the areas where the robot will traverse, and then uses the least-squares method (LSM) to obtain the ridge centerline. Similarly, ref. [12] enhances the ENet network to segment rice inter-rows and utilizes the LSM algorithm to obtain the navigation line, achieving a pixel accuracy of 96.39%. Ref. [17] uses a U-Net model to segment rows between trees, then uses a navigation line to traverse with an average mean intersection over union of 0.973. However, these approaches still struggle with common computer vision problems, such as errors under changing illumination.
Other methods, such as refs. [5,18], utilize YOLOV8 to detect tree trunks as reference points to define row limits, then calculate the navigation line, reporting 95% and 92.11% precision, respectively. In addition, ref. [19] introduces a multi-perception network that simultaneously detects tree trunks, obstacles, and traversable areas in a single pass. However, tree trunks may be unreliable landmarks inside greenhouses due to occlusions, crop-stage variability, and crop types lacking prominent trunks; moreover, as noted in [20], the computational overhead of deep learning can be prohibitive on low-resource systems typical of greenhouse robotics.
To address these persistent challenges, researchers have developed neural network architectures that prioritize lane identification over more complex and computationally heavy tasks, such as image segmentation and feature extraction. These specialized neural networks are designed to robustly detect navigation lines under diverse and challenging conditions, including low-light and visually complex environments. While much of the research on lane identification networks has emerged from the self-driving car domain [18], these advances have also found application in adjacent fields, such as autonomous mine exploration, where conventional computer vision methods often fail [21]. Among these, LaneATT stands out as an efficient approach capable of detecting lane anchors and achieving strong alignment with real-world lanes [22]. Despite its success in these domains, LaneATT remains largely unexplored in the agricultural sector.
This study introduces the novel use of lane identification networks, specifically the Real-Time LaneATT network, for autonomous greenhouse navigation. The model achieved a 91% F1 Score with real-time performance of 48 ms per frame on embedded hardware. Experimental validation demonstrates that the robotic platform can maintain its trajectory within a safe lateral deviation from the row center, thereby preventing interference with crops.
This paper is organized as follows:
Section 2 presents the LaneATT methodology with its ResNet-18 backbone and anchor-based detection mechanism.
Section 3 describes the implementation, including the robotic platform, ROS2 architecture, custom greenhouse dataset preparation, and control algorithms.
Section 4 demonstrates results achieving 91% F1 Score and 90% accuracy with real-time inference.
Section 5 concludes the work’s contributions to autonomous greenhouse navigation.
3. Implementation
3.1. Greenhouse
The research scenario depicted in Figure 3 is the tomato greenhouse at CAETEC (Tec de Monterrey Experimental Agricultural Field), which serves as the site for all of the experimental trials. Within this greenhouse, multiple narrow cultivation rows are arranged with white plastic mulch covering the soil, a common practice for moisture retention. Notably, each row measures an average of 104 cm in width, and the floor is characterized by loose, uneven soil. The experimental trials were conducted exclusively on straight row configurations with north–south orientation, as this layout represents the conventional architecture for indoor greenhouse tomato cultivation, where space optimization and irrigation efficiency favor linear row arrangements [26]. However, the LaneATT architecture has already demonstrated robust performance in handling curved lanes within the autonomous driving domain [24], suggesting that the model's capabilities can generalize to curved agricultural paths in future outdoor orchard applications.
3.2. Robotic Platform
The robotic platform used in this research is the Jackal UGV from Clearpath Robotics (Kitchener, ON, Canada), a skid-steer mobile robot designed for field robotics research (see "a" in Figure 4). While the Jackal employs a skid-steer drive configuration, its kinematics can be approximated as a differential drive to simplify the control strategy, which is explained in Section 2.4.
The Jackal features external dimensions of 508 mm in length, 430 mm in width, and 250 mm in height. All robot control and image processing were performed directly on the platform using Robot Operating System (ROS2 Humble).
For visual input, an Intel RealSense D435 depth camera (Intel Corporation, Santa Clara, CA, USA) was mounted on the system ("c" in Figure 4). The D435 features active stereoscopic depth technology with a depth resolution of 1280 × 720 at 15 frames per second. The camera was mounted on a Feiyu Mini 2 gimbal from FeiyuTech (Guilin, China), which was positioned atop the Jackal (see "b" in Figure 4). The gimbal features 3-axis stabilization and measures 210.1 × 133.7 × 296 mm in its balanced position. The gimbal was configured so the yaw remains unstabilized, allowing the camera to rotate freely for navigation, while the roll and pitch axes were actively stabilized to minimize image degradation caused by motion and to ensure high-quality image acquisition. The onboard computer, an 8 GB Jetson Xavier NX Developer Kit (NVIDIA Corporation, Santa Clara, CA, USA), has an NVIDIA Volta GPU with 384 CUDA cores and 48 Tensor cores, a 6-core Carmel ARM v8.2 CPU, and 8 GB of LPDDR4x memory. The board interfaces with the RealSense camera via USB 3.1.
3.3. Software
For the deployment on the Jetson Xavier NX Developer Kit, ROS2 Humble was chosen as the middleware due to its compatibility with Ubuntu 22.04, which is the default operating system for the Jetson platform. The system’s core functionality was implemented using both rclcpp 16.0.17 (C++) and rclpy 3.2.1 (Python) to leverage the strengths of each language in different modules. The essential ROS2 packages required for the software are geometry_msgs, for handling velocity and pose messages, and cv_bridge, which facilitates the conversion between ROS2 image messages and OpenCV image formats. OpenCV version 4.12 was used throughout the image processing pipeline. The software operates by continuously running the trained lane detection model, performing inference on images captured from the onboard camera. The detected lane centerline is then extracted and used in a control loop to guide the robot’s navigation within the greenhouse environment.
Figure 5 illustrates the ROS node architecture where the Image Publisher node acquires frames from the Intel RealSense D435 at 1280 × 720 resolution and 15 FPS, the Lane Detector node processes these frames to predict lane boundaries, and the Velocity Controller node computes and publishes linear and angular velocities to actuate the robot.
3.4. Framework and Environment
The implementation of LaneATT was carried out using the PyTorch deep learning framework. The model was trained and evaluated on a workstation equipped with an NVIDIA RTX 3070 Ti GPU with 8 GB of VRAM, running Ubuntu 22.04 with CUDA version 12.6. For reproducibility, all experiments were conducted using the same software environment, comprising Python 3.12.3, PyTorch 2.5, and other necessary libraries, such as OpenCV and NumPy 2.1.2. To address the research question, the network runtime was also tested on the Jetson Xavier NX (NVIDIA Corporation, Santa Clara, CA, USA) and on a 12th Gen Intel i5.
The dataset preprocessing, model training, and evaluation were automated using Python scripts, with a series of logs across all the processes, providing a way to manage the experimental workflow.
3.5. Dataset Preparation
The TuSimple dataset [27] was utilized, which contains images from various driving scenarios, including urban streets, highways, and rural roads. The dataset consists of 6408 annotated images with corresponding lane markings, making it suitable for training deep learning models for lane detection. Each image in the dataset is accompanied by a label file that contains pixel-wise lane annotations, which define the lane locations within the image.
The training set consisted of 3626 images, while the validation set comprised 358 images, with an additional 2782 images reserved for testing. The training images were normalized using the mean and standard deviation calculated from the ImageNet dataset to maintain consistency with the ResNet-18 backbone, which was pre-trained on ImageNet.
Although the TuSimple dataset is based on various driving scenarios, which differ significantly from the current application (greenhouse navigation), it can still be used for pre-training the network. Pre-training on TuSimple enables the network to learn visual features such as edge detection, line recognition, and spatial relationships that are transferable across domains. These low-to-mid-level features learned from detecting road lanes provide a foundation that can be fine-tuned for detecting navigation paths in greenhouses, resulting in faster convergence and better performance than training from scratch with limited greenhouse data alone. The large size of the dataset allows for the effective initialization of the network before fine-tuning it with the dataset prepared.
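As an illustration of the normalization step described above, a minimal NumPy sketch follows; the actual training pipeline presumably uses the framework's transform utilities, and the function name here is illustrative:

```python
import numpy as np

# ImageNet channel statistics (RGB), matching the pre-trained ResNet-18 backbone
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize_image(img_uint8: np.ndarray) -> np.ndarray:
    """Scale an HxWx3 uint8 RGB image to [0, 1], then standardize per channel."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - IMAGENET_MEAN) / IMAGENET_STD
```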
In addition to the TuSimple dataset, a custom dataset [28] was prepared in a greenhouse scenario, following the TuSimple annotation format. This choice was made because the network's dataloader is compatible with datasets in this format. To create the dataset, videos were recorded over a one-year period by mounting the camera atop the Jackal while navigating through the greenhouse at three distinct cultivation stages: during initial tomato growth, at harvest maturity, and post-harvest conditions. These videos were divided into individual frames, which were then labeled to indicate where the lanes should be. The two datasets used to train the model can be seen in Figure 6, which shows the different cultivation stages of the greenhouse captured in the custom dataset.
Ultimately, the dataset used to train the model comprises 3872 greenhouse images from CAETEC. The resolution of the images is 1280 × 720. The dataset consists of 3171 images for training, 501 for validation, and 200 for testing, as shown in Figure 7. All images were captured exclusively during daylight conditions, as the system operates under the assumption that the robotic platform will navigate the greenhouse during daytime hours when natural lighting is available. It is important to note that all images were captured from a single greenhouse environment (Figure 3); therefore, to apply this approach to different greenhouses, new images should be collected, and the model should be fine-tuned accordingly.
3.6. Model Training
The LaneATT model was initialized with weights from a ResNet-18 backbone pre-trained on ImageNet, which provided a strong starting point for lane detection. The training process was carried out using the Adam optimizer with an initial learning rate of and a weight decay of . A learning rate scheduler was employed, reducing the learning rate by a factor of 0.1 every 5 epochs to ensure stable convergence. The total number of epochs was set to 100, and a batch size of 8 was used for training.
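The step schedule described above (a factor-0.1 decay every 5 epochs) can be expressed as a simple function of the epoch index. The initial learning rate used below is a placeholder, since the exact value is not restated here:

```python
def stepped_lr(initial_lr: float, epoch: int, step: int = 5, gamma: float = 0.1) -> float:
    """Learning rate after applying a `gamma` decay once every `step` epochs."""
    return initial_lr * gamma ** (epoch // step)

# With a placeholder initial rate of 1e-3:
# epochs 0-4 use 1e-3, epochs 5-9 use 1e-4, epochs 10-14 use 1e-5, and so on.
```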
The loss function was focal loss, as described in Section 2.1.3. To further enhance the model's performance, a set of anchor points was carefully chosen to cover a wide range of lane shapes and positions in the image. During training, the model learned to adjust these anchors to fit the ground-truth lane annotations.
The training process was monitored using validation loss, F1 Score, Precision, Recall, and Accuracy as evaluation metrics. Model checkpoints were saved at each epoch, enabling systematic monitoring and recovery. The final model was selected based on the epoch with the lowest validation loss.
An important thing to note is that the projection of the anchors was modified to preserve the angles of the original image. Unlike the original LaneATT implementation, which distorted the angles when projecting anchors due to the mismatch in scale between the original image and the height and width of the activation volume, the method ensures that the geometric relationships of the anchors remain consistent. This adjustment enhances the alignment between the predicted anchors and the actual lanes, particularly in scenarios where maintaining angular consistency is crucial, enabling the network to achieve high results in fewer epochs.
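The effect of this projection fix can be illustrated numerically: scaling an anchor's endpoints by separate width and height factors (as when mapping a 1280 × 720 image onto a smaller activation volume) changes the anchor's angle, while a single uniform factor preserves it. The scale values below are illustrative only, not the network's actual feature-map dimensions:

```python
import math

def anchor_angle(p0, p1):
    """Angle of the segment p0 -> p1, in radians."""
    return math.atan2(p1[1] - p0[1], p1[0] - p0[0])

p0, p1 = (0.0, 0.0), (100.0, 100.0)   # a 45-degree anchor in image space
orig = anchor_angle(p0, p1)

# Naive projection: independent width/height scale factors distort the angle.
sx, sy = 40 / 1280, 23 / 720           # illustrative activation-volume scales
naive = anchor_angle((p0[0] * sx, p0[1] * sy), (p1[0] * sx, p1[1] * sy))

# Uniform projection: one shared factor keeps the anchor's geometry consistent.
s = 40 / 1280
uniform = anchor_angle((p0[0] * s, p0[1] * s), (p1[0] * s, p1[1] * s))
```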
3.7. Inference and Post-Processing
During inference, the trained LaneATT model takes an RGB image as input and outputs a set of lane predictions. For each image, the model produces lane existence probabilities, anchor offsets, and lane parameters. These parameters were used to compute the lane position and curvature in image space. Since lane lines can overlap in complex driving scenarios, non-maximum suppression (NMS) was applied to the output to remove duplicate lane predictions, ensuring that only the most confident lanes are retained.
The lane predictions were then transformed into polynomials, representing the lane curves in pixel coordinates. These curves were overlaid on the original image to visualize the detected lanes.
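This polynomial post-processing step can be sketched as follows; the fit degree and the point layout are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np

def lane_to_polynomial(points, degree: int = 2) -> np.ndarray:
    """Fit x = f(y) to predicted lane points. Lanes are near-vertical in image
    space, so x is modeled as a function of the row coordinate y."""
    ys = np.array([p[1] for p in points], dtype=float)
    xs = np.array([p[0] for p in points], dtype=float)
    return np.polyfit(ys, xs, degree)

def sample_lane(coeffs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Evaluate the fitted curve at the given rows, e.g. for overlay drawing."""
    return np.polyval(coeffs, ys)
```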
For real-time applications, the model was optimized using TorchScript 2.5.4, allowing the inference to be deployed on embedded systems. This optimization enabled fast inference times, achieving around 30 frames per second (FPS) on various systems, as detailed in Section 4, making the system suitable for real-time lane detection in autonomous vehicles.
3.8. Control
A proportional control strategy is employed; the following sections describe how the linear and angular velocities are obtained from the model outputs.
3.8.1. Calculation of Lane Point
The control strategy for calculating the centerline proceeds as follows: First, the trained neural network performs inference on each input image to detect lane points corresponding to the left and right lanes, which are then stored in separate arrays (see Figure 8a). Next, the midpoints between the two detected lanes are computed. From these midpoints, the first x points are selected starting from the bottom of the image upward, representing the region closest to the robot (Figure 8b). The average of these selected points yields the reference lane point for navigation control, calculated as follows:

(x_p, y_p) = (1/n) · Σ_{i=1}^{n} (x_i, y_i),

where n is the number of detected midpoints, and (x_i, y_i) represents each individual midpoint ("a" in Figure 9). Each point is defined in the image coordinate system, as shown in Figure 9. This averaging approach prioritizes immediate navigation decisions by focusing on the region closest to the robot while simultaneously filtering potential noise from discontinuous or deformed lane predictions that may arise during inference. With this approach, a higher number of detected points indicates greater model confidence in the lane structure.
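The midpoint-averaging procedure above can be sketched as follows; the array layout and names are illustrative, not the authors' implementation:

```python
import numpy as np

def reference_lane_point(left: np.ndarray, right: np.ndarray, x_expected: int = 7):
    """Average the first `x_expected` midpoints between the left and right lane
    point arrays (each row is an (x, y) pixel, assumed sorted bottom-up).
    Returns the reference point and the number of midpoints actually used,
    which doubles as a confidence indicator."""
    k = min(len(left), len(right))
    midpoints = (left[:k] + right[:k]) / 2.0   # element-wise midpoint per row
    selected = midpoints[:x_expected]          # region closest to the robot
    return selected.mean(axis=0), len(selected)
```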
3.8.2. Calculation of Linear and Angular Velocity
Inspired by [16], the control approach uses the calculated lane point to generate real-time navigation commands. First, the angle θ between the image center and the reference lane point is computed using

θ = atan2(y_c − y_p, x_p − x_c),

where (x_c, y_c) represents the horizontal center at the bottom of the image, and (x_p, y_p) is the computed lane point, as seen in Figure 8c. The angular error e is then obtained by subtracting π/2:

e = θ − π/2.

This transformation ensures that e ∈ (−π/2, π/2), where e = 0 indicates the robot is aligned with the lane centerline.
Linear velocity v is calculated based on the number of detected midpoints n relative to the maximum expected points x:

v = k_v · (n/x) · max_lin_vel,

where max_lin_vel is the maximum linear speed, and k_v is a proportional constant selected empirically. This formulation allows the robot to move at maximum speed when the model detects all expected points (high confidence), and reduces speed when fewer points are detected (lower confidence), prioritizing safer navigation during uncertain conditions.
The angular speed ω is computed using the proportional constant k_ω:

ω = k_ω · e · max_ang_vel,

where max_ang_vel is the maximum angular speed, and k_ω is a proportional constant selected empirically. These proportionality constants are tuned based on the navigation error and desired system response (see "b" in Figure 9). To enhance robustness, when no lane prediction is detected, the system maintains the previous control commands for up to two seconds. If no valid prediction is received for more than two seconds, such as during temporary occlusions or other failure scenarios, the robot is brought to a complete stop to ensure safe operation ("c" in Figure 9).
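Putting the pieces together, the proportional control law can be sketched as follows. The exact expressions and all gain values are assumptions consistent with the description above, not the authors' implementation:

```python
import math

def compute_velocities(lane_point, n_points, x_expected=7,
                       x_center=640.0, y_bottom=720.0,
                       max_lin_vel=0.7, max_ang_vel=1.0,
                       k_v=1.0, k_w=1.0):
    """Map the reference lane point and detection count to a (v, w) command.

    Assumed forms: v scales with the fraction of expected midpoints detected,
    and w is proportional to the angular error with respect to straight ahead."""
    x_p, y_p = lane_point
    theta = math.atan2(y_bottom - y_p, x_p - x_center)  # angle from bottom-center
    error = theta - math.pi / 2                          # 0 when aligned with the row
    v = k_v * min(n_points / x_expected, 1.0) * max_lin_vel
    w = k_w * error * max_ang_vel
    return v, w
```

With the reference point on the image centerline and all expected midpoints detected, the sketch commands full forward speed and zero rotation, matching the behavior described above.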
3.9. Experimental Setup
To evaluate the performance of the algorithm, tests were conducted in three different greenhouse rows. For each row, a 6 m straight path was designated with an ArUco marker placed at the 6 m endpoint. This setup allowed for consistent distance measurement and validation across all experimental trials. To ensure precise marker alignment, the row centerline was measured at each 0.1 s interval along the 6 m path, and the ArUco marker was positioned at the averaged centerline position. The alignment accuracy was verified by manually positioning the robotic platform at the marker location and confirming that the validation node reported lateral displacement within the acceptable range of [−3 cm, 3 cm].
Three initial positioning conditions were tested for each row to assess the system's ability to handle different starting scenarios and field-of-view configurations. The first condition placed the robot centered in the row with the camera facing directly forward, representing ideal alignment. The second condition positioned the robot 10 cm from the left side of the row, resulting in an off-center field of view. The third condition placed the robot 10 cm from the right side of the row, similarly challenging the system with a laterally shifted perspective. These varied starting positions were designed to evaluate the control system's performance in correcting both lateral positioning errors and maintaining consistent navigation under non-ideal camera viewing angles. Although the model was trained across different cultivation stages, the tests were only conducted under post-harvest greenhouse conditions with cleared rows, as shown in Figure 7. However, the model performance metrics and average yaw angle error were evaluated across all cultivation stages using the complete test dataset.
3.10. Parameter Tuning
According to [5], sprayers and harvesters achieve velocities between 0.44 m/s and 1.39 m/s in agricultural operations. Since the primary applications for autonomous navigation in greenhouses include monitoring and harvesting tasks, a maximum linear velocity of 0.7 m/s was selected to fall within this operational range.
For the expected points parameter (x), 7 was selected based on the model’s prediction capabilities and navigation requirements. The LaneATT model can generate up to 72 lane points representing the complete path from the robot to the end of the visible lane. By selecting the first 7 points (approximately 10% of the total predictions), the system focuses on the immediate path region, which corresponds to approximately 1.6 m ahead of the robot. Given the maximum linear velocity of 0.7 m/s, this provides the robot with approximately 2 s of lookahead time for trajectory adjustments.
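The roughly 2 s lookahead quoted above follows directly from these numbers:

```python
lookahead_distance = 1.6   # m, spanned by the first 7 of up to 72 predicted points
max_lin_vel = 0.7          # m/s, selected maximum linear velocity

lookahead_time = lookahead_distance / max_lin_vel   # about 2.3 s, i.e. roughly 2 s
```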
Finally, the timeout threshold of 2 s was selected based on navigation safety and model performance considerations. At 0.7 m/s, the robot travels approximately 1.4 m during this interval, which remains within the consistent row structure of the greenhouse, where previous predictions maintain validity. This duration was selected considering two constraints: it prevents excessive unguided movement (the robot maintains previous commands when the model outputs zero predictions), which could compromise navigation safety, while providing sufficient time for the model to recover from areas with challenging visual conditions where no lane predictions are generated.
The final parameters used for all the tests are shown in Table 1.
3.11. Validation Node
To track the performance of the algorithm, a validation node was implemented using ArUco marker detection. The ArUco marker, positioned at 6 m in front of the robot, served as a reference point for measuring the lateral error of the robot with respect to the row centerline.
The validation process operates as follows: First, the system detects the ArUco marker and computes the homogeneous transformation matrix T_cm representing the pose of the marker with respect to the robot's camera, as indicated in Equation (18):

T_cm = [ R  t ; 0  1 ],

where R is the rotation matrix, and t is the translation vector. From this, the inverse transformation provides the camera pose with respect to the marker frame:

T_mc = T_cm⁻¹ = [ Rᵀ  −Rᵀt ; 0  1 ].

Since the ArUco marker is positioned centered in the row, the deviation from the centerline can be measured by analyzing the z-axis displacement in the marker coordinate frame. As shown in Figure 10, this z-axis component represents the lateral displacement (in centimeters) of the robot from the row center, providing a direct quantitative measure of centering error throughout navigation. The recorded data were then used to analyze how the robot followed the track, enabling visualization of the corrections made by the navigation system over time.
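The pose inversion used by the validation node can be sketched with NumPy; in practice, R and t would come from OpenCV's ArUco pose estimation, and the function name here is illustrative:

```python
import numpy as np

def invert_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Invert a marker-in-camera pose T = [[R, t], [0, 1]] to obtain the camera
    pose in the marker frame. The z component of the resulting translation is
    the lateral displacement of the robot from the row center."""
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv
```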
4. Results
The quantitative performance of LaneATT was assessed on a benchmark lane detection dataset created from the custom dataset. To obtain the metrics reported in Figure 11 and Table 2, the model was run in an evaluation mode that computes all of the metrics on the validation dataset using the previously described methods. This evaluation mode was executed after each epoch to track performance throughout the entire training period.
Table 2 summarizes the final results in terms of precision, recall, F1 Score, accuracy, and runtime of the model trained for 100 epochs across the different cultivation stages.
The high precision and recall values demonstrate the model’s ability to accurately identify lanes while minimizing false positives and false negatives. The F1 Score of 0.91 indicates a strong balance between precision and recall, while the 90% accuracy highlights the reliability of predictions across diverse scenarios. Moreover, the inference speeds obtained demonstrate that the model can process frames fast enough to enable autonomous navigation in the greenhouse.
Apart from the dataset, LaneATT was also tested in real-world environments using the Jackal Robot.
Figure 7 illustrates examples of lane detection in a greenhouse setting under varying conditions, such as uneven lighting and partial occlusions. The model consistently identified lane boundaries. Furthermore, as shown in Table 3, the model achieved an average yaw angle error of 1.67°, demonstrating performance comparable to current state-of-the-art navigation methods.
Using the validation node described in Section 3.11, ten experimental runs were conducted along different greenhouse rows to evaluate the navigation accuracy and network performance under real operating conditions. For each trajectory, the lateral error was defined as the displacement of the robot with respect to the centerline of the row. Representative trajectories are illustrated in Figure 12.
For each run, the mean lateral error and its standard deviation were computed to assess systematic bias and lateral oscillations, respectively, as seen in Table 4. Additionally, the root mean square (RMS) lateral error was calculated for each trajectory to provide an overall accuracy metric that accounts for both bias and variability.
Across all trajectories, the mean RMS lateral error was 0.133 m with a standard deviation of 0.048 m. Considering an average greenhouse row width of 1.04 m, this corresponds to a normalized RMS error of 12.8 ± 4.6% of the row width. The maximum normalized RMS error observed among all runs was 17.4%, remaining below the predefined safety threshold of 20% of the row width (0.208 m).
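The normalized figures reported above can be checked directly:

```python
rms_error = 0.133      # m, mean RMS lateral error across all runs
row_width = 1.04       # m, average greenhouse row width

normalized_rms = rms_error / row_width   # fraction of the row width (~12.8%)
safety_limit = 0.20 * row_width          # 20% safety threshold, i.e. 0.208 m
```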
The mean lateral errors exhibited alternating signs and small magnitudes, indicating the absence of systematic drift toward either side of the row. Although lateral variability was observed due to real-world disturbances and sensing noise, the bounded RMS error across all experiments demonstrates that the robot consistently remained within the crop corridor, validating the practical suitability of the proposed navigation system for autonomous greenhouse operation.
At the same time, the runtime analysis revealed that LaneATT operates in real-time on both high-performance GPUs and embedded devices. The model achieved an average processing time of 11 ms per frame on an NVIDIA RTX GPU, 32 ms per frame on a 12th Gen Intel i5, and 48 ms per frame on NVIDIA Jetson Xavier. These results confirm the feasibility of deploying LaneATT for real-time applications on edge devices.
Apart from the previously mentioned results, Figure 11 shows that most of the metrics reached high values within only a few epochs, while the recall required more epochs to surpass 85%.
The results demonstrate that LaneATT delivers accurate lane detection in both controlled and real-world scenarios. The model's high precision and recall validate its robustness, while the runtime results affirm its suitability for embedded deployment. These results validate the model's ability to handle diverse lane types and challenging conditions, such as varying lighting and occlusions, without significant performance degradation. As shown in Table 5, the proposed model achieves a state-of-the-art precision of 0.94.
Additionally, as shown in Table 6, the model's compact size of 13.5 M parameters represents an improvement over current state-of-the-art segmentation and object detection approaches. This is the result of LaneATT's lightweight anchor-based architecture. Finally, LaneATT handled challenges such as varying lighting, occlusions, and complex lane geometries, further showcasing its reliability in practical settings.
5. Conclusions
In this work, an implementation and evaluation of the LaneATT model for lane detection was presented, with a focus on its application in greenhouse environments.
Quantitative evaluation on a benchmark dataset highlighted the model's robustness, achieving an F1 Score of 0.91, accuracy of 90%, and real-time inference speeds of 11 ms per frame on a high-performance GPU and 48 ms per frame on the embedded Jetson Xavier NX. The results demonstrate that LaneATT balances high accuracy, precision, and recall while maintaining computational efficiency.
LaneATT’s implementation with only 13.5 M parameters enabled real-time network execution on edge computing devices, providing enough inference speed for autonomous navigation and maintaining the capability for direct deployment on edge devices in agricultural environments.
Another noteworthy result is the network's ability to reach high metric values in just a few epochs, achieved mainly through the modified anchor proposal system, which projects the anchors onto the activation volume while preserving their angles in critical scenarios. This can be seen in Figure 11, where in fewer than 20 epochs the model achieved an F1 Score, accuracy, and precision above 85%.
Qualitative results from real-world tests in greenhouse environments further demonstrated LaneATT’s capability to identify lane boundaries accurately, enabling efficient navigation. This validates the model’s reliability in practical autonomous navigation tasks.
For future work, it is important to note that since the current research focused on validating LaneATT as an efficient alternative to segmentation and object detection models, the development of a specialized control system for dynamic obstacle handling in complex environments remains a future improvement. Furthermore, while the tests were conducted on the linear row configurations typical of conventional indoor greenhouses, future iterations should validate the model's performance on curved corridors, junctions, and irregular widths, either through simulated environments or by gaining access to greenhouses with more complex layouts than those currently available. Finally, to ensure the system's robustness across diverse agricultural settings, further data collection and training in multiple greenhouses will be conducted to generalize the model's capabilities.