Computer Vision in Self-Steering Tractors

Abstract: Automatic navigation of agricultural machinery is an important aspect of Smart Farming. Intelligent agricultural machinery applications increasingly rely on machine vision algorithms to guarantee enhanced in-field navigation accuracy by precisely locating the crop lines and mapping the navigation routes of vehicles in real-time. This work presents an overview of vision-based tractor systems. More specifically, this work deals with (1) the system architecture, (2) the safety of usage, (3) the most commonly faced navigation errors, (4) the navigation control system of tractors and presents (5) state-of-the-art image processing algorithms for in-field navigation route mapping. In recent research, stereovision systems emerge as superior to monocular systems for real-time in-field navigation, demonstrating higher stability and control accuracy, especially in extensive crops such as cotton, sunflower, maize, etc. A detailed overview is provided for each topic with illustrative examples that focus on specific agricultural applications. Several computer vision algorithms based on different optical sensors have been developed for autonomous navigation in structured or semi-structured environments, such as orchards, yet are affected by illumination variations. The usage of multispectral imaging can overcome the encountered limitations of noise in images and successfully extract navigation paths in orchards by using a combination of the trees' foliage with the background of the sky. Concisely, this work reviews the current status of self-steering agricultural vehicles and presents all basic guidelines for adapting computer vision in autonomous in-field navigation.


Introduction
Crop monitoring can lead to profitable decisions if properly managed. Recent advances in data analysis and management are turning agricultural data into the key elements for critical decision-making in favor of farmers. In-field acquired sensory data can be used as efficient information for effective resource management towards maximum production and sustainability [1]. Cloud computing has been subsequently developed to handle the unprecedented volume of acquired data, known as Big Data, creating new prospects for data-intensive techniques in the agricultural domain [2]. Data-based farm management, combined with robotics and the integration of Artificial Intelligence (AI) techniques, paves the way for the next generation of agriculture, namely Agriculture 5.0. Agriculture 4.0, also known as Digital or Smart Farming, incorporates precision agriculture principles and data processing to assist farmers' operational decisions [2,3]. Smart farming provides a practical and systematic tool that aims to detect unforeseen problems that are hard to notice either due to the lack of experienced workers or due to large-scale farms that are difficult to surveil. Going one step further, Agriculture 5.0 incorporates robotics and AI algorithms into already existing data-driven farms [1,4], implying autonomous decision systems and unmanned operations. The concept of Agriculture 5.0 along a crop management cycle is illustrated in Figure 1. The crop management system starts from the crop. Spatial measurements of crops imply on-the-go in-field monitoring platforms. The platforms collect data from the crop, soil and environment through remote sensing and provide spatial inputs to the decision system. AI algorithms are employed for effective real-time decision-making, and action, driven by the decision system, occurs as a reaction to the sensory feedback. The process is repeated throughout the crop's life cycle.
The advent of robots [5] and of non-invasive sensors of ever-decreasing size [6], together with emerging digital technologies such as remote sensing [7], the Internet of Things (IoT) [8] and Cloud Computing [9], supports the process. On-the-go monitoring platforms are mounted on agricultural vehicles. Smart sensors can provide conventional agricultural vehicles, such as tractors, with adequate self-awareness and extend them into self-steering vehicles with built-in intelligence, able to act autonomously in the field. Therefore, the agricultural vehicles of Agriculture 5.0 are either self-steering tractors or autonomous robots [10].
Agricultural machinery, such as tractors, is meant to operate for many hours in large areas and perform repetitive tasks. The automatic navigation of agricultural vehicles can ensure the high intensity of automation of cultivation tasks, the enhanced precision of navigation between crop structures, an increase in operation safety and a decrease in human labor and operation costs. Autonomous navigation systems have been employed toward the mechanization of different agricultural tasks [10]: weeding, harvesting, spraying, planting, etc. Autonomy is obtained by sensing the environment. In general, automated navigation of agricultural tractors can be achieved by using either local positioning information or global positioning information [11]. Local information refers to the relative position of the tractor with respect to the crops, provided by sensors mounted on the tractor, such as vision sensors (cameras), laser scanners, ultrasonic sensors, odometers, Inertial Measurement Units (IMU), gyroscopes, digital compasses, etc. Global information refers to the absolute position of the tractor in the field, provided by the Global Positioning System (GPS).
This work focuses on the use of local positioning information obtained by vision sensors toward self-steering tractors. The scope of the present research is to provide an overview of vision-based, self-steering tractor systems. The main contributions of this work are: (1) to highlight the degree of integration of computer vision in the field of tractors, identifying the usefulness of this technology in specific functions and applications, (2) to augment the knowledge on agricultural vision-based navigation methods, (3) to provide evidence on related trends and challenges and (4) to extend the knowledge on vision-based machinery, covering aspects such as architecture, sensors, algorithms and safety issues. This research aims to prove the feasibility of machine vision applications in the targeted problem of agricultural machinery navigation and extend the provided knowledge to other contexts in favor of the broader research community, e.g., towards autonomous navigation for off-road vehicles, etc.
Towards this end, this work reviews the following aspects: (1) the system architecture, (2) the safety of usage, (3) the most commonly faced navigation errors, (4) the navigation control system of vision-based, self-steering tractors and (5) state-of-the-art image processing algorithms for in-field navigation route mapping. The remainder of the paper focuses on the five aforementioned categories. A detailed overview is provided for each category with illustrative examples that focus on specific agricultural applications. Finally, a summary of the most important conclusions of the reviewed literature is presented.

Evolution of Vision-Based Self-Steering Tractors
The rapid development of computers, electronic sensors and computing technologies in the 1980s motivated interest in autonomous vehicle guidance systems. A number of guidance technologies have been proposed [12,13]: ultrasonic, optical, mechanical, etc. Since the early 1990s, GPS has been widely used as a relatively new and accurate guidance sensor in numerous agricultural applications towards fully autonomous navigation [14]. However, the high cost of reliable GPS sensors made their use prohibitive in agricultural navigation applications. Machine vision technologies based on local optical sensors could alternatively be used to guide agricultural vehicles when crop row structures can be observed. The camera system could then determine the relative position of the machinery in relation to the crop rows and guide the vehicle between them to perform field operations. Local features could help to fine-tune the trajectory of the vehicle on-site. The latter is the main reason why most of the existing studies on vision-based guided tractors focus on structured fields that are characterized by crop rows. A number of image processing methodologies have been suggested to define the guidance path from crop row images; yet only a limited number of vision-based guidance systems have been developed for real in-field applications [15].
Machine vision was first introduced for the automatic navigation of tractors and combines in the 1980s. In 1987, Reid and Searcy [16] developed a dynamic thresholding technique to extract path information from field images. The same authors, later in the same year [17], proposed a variation of their previous work. The guidance signal was computed by the same algorithm. Additionally, the distribution of the crop-background was estimated by a bimodal Gaussian distribution function, and run-length encoding was employed for locating the center points of row crop canopy shapes in thresholded images. Billingsley and Schoenfisch [18] designed a vision guidance system to steer a tractor relative to crop rows. The system could detect the end of the row and warn the driver to turn the tractor. The tractor could automatically acquire its track in the next row. The system was further optimized later by changes in technology; however, the fundamental principles of their previous research have remained the same [19]. Pinto and Reid [20] proposed a heading angle and offset determination using principal component analysis in order to visually guide a tractor. The task was addressed as a pose recognition problem where a pose was defined by the combination of heading angle and offset. In [21], Benson et al. developed a machine vision algorithm for crop edge detection. The algorithm was integrated into a tractor for automated harvest to locate the field boundaries for guidance. The same authors, in [22], automated a maize harvest with a combine vision-based steering system based on fuzzy logic.
In [23], three machine vision guidance algorithms were developed to mimic the perceptive process of a human operator towards automated harvest, both in the day and at night, reporting accuracies equivalent to that of a GPS. In [24], a machine vision system was developed for an agricultural small-grain combine harvester. The proposed algorithm used a monochromatic camera to separate the uncut crop rows from the background and to calculate a guidance signal. Keicher and Seufert [25] developed an automatic guidance system for mechanical weeding in crop rows based on a digital image processing system combined with a specific controller and a proportional hydraulic valve. Åstrand and Baerveldt performed extensive research on the vision-based guidance of tractors and developed robust image processing algorithms integrated with agricultural tractors to detect the position of crop rows [26]. Søgaard and Olsen [27] developed a method to guide a tractor with respect to the crop rows. The method was based on color images of the field surface. Lang [28] proposed an automatic steering control system for a plantation tractor based on the direction and distance of the camera to the stems of the plants. Kise et al. [29] presented a row-detection algorithm for a stereovision-based agricultural machinery guidance system. The algorithm used functions for stereo-image processing, extracted elevation maps and determined navigation points. In [30], Tillett and Hague proposed a computer vision guidance system for cereals that was mounted on a hoe tractor. In subsequent work [31], they presented a method for locating crop rows in images and tested it for the guidance of a mechanical hoe in winter wheat. Later, they extended the operating range of their tracking system to sugar beets [32]. Subramanian et al. [33] tested machine vision for the guidance of a tractor in a citrus grove alleyway and compared it to a laser radar. Both approaches performed similarly in path tracking.
An automatic steering rice transplanter based on image-processing self-guidance was presented by Misao [34]. The steering system used a video camera zoom system. Han et al. [35] developed a guidance directrix planner to control an agricultural vehicle that was converted to the desired steering wheel angle through navigation. In [36], Okamoto et al. presented an automatic guidance system based on a crop row sensor consisting of a charge-coupled device (CCD) camera and an image processing algorithm, implemented for the autonomous guidance of a weeding cultivator.
Autonomous tractor steering is the most established among agricultural navigation technologies; self-steering tractors have already been commercialized for about two decades [12,13]. Commercial tractor navigation techniques involve a fusion of sensors and are not based solely on machine vision; therefore, they are not in the scope of this research.
Although vision-based tractor navigation systems have been developed, their commercial application is still in its early stages, due to problems affecting their reliability, as reported subsequently. However, relevant research reveals the potential of vision-based automatic guidance in agricultural machinery; thus, the next decade is expected to be crucial for vision-based self-steering tractors to revolutionize the agricultural sector. A revolution is also expected from the newest trend in agriculture: agricultural robots, namely Agrobots, which aspire to replace tractors. Agrobots can navigate autonomously in fields based on the same principles and sensors and can work at crop scale with precision and dexterity [5]. However, compared to tractors, an Agrobot is a sensitive, high-cost tool that can perform specific tasks. In contrast, a tractor is very durable and sturdy, can operate under adverse weather conditions and is versatile, since it allows for the flexibility to adapt to a multitude of tools (topping tools, lawnmowers, sprayers, etc.) for a variety of tasks. Therefore, tractors are key pieces of equipment for all farms, from small to commercial scale, and at present, there is no intention to replace them but to upgrade them in terms of navigational autonomy.

Safety Issues
Most of the injuries related to agricultural activities are connected to the use of agricultural tractors [37,38]. The latter is attributed to the following reasons: (1) the large number of small farms lacking expert equipment and operators, (2) the wide range of agricultural tasks in need of machinery contribution, (3) the engagement of the same operators for all the different tasks, which require both the adaptation of different tools to the tractors and different handling, (4) the seasonal work associated with changes in the field per season in addition to the constant alteration of workspaces that do not allow the user to get acquainted with the environment, (5) the use of outdated machinery not complying with safety regulations and (6) the use of obsolete sensors that have not been updated to their more recent improved versions with better technical specifications and performance.
In order for self-steering tractors to fully act autonomously, the autonomous steering system needs to be safer and more precise than any human operator. Therefore, the study of tractor safety issues can help the design of safer systems towards complete navigation autonomy. However, self-steering tractors are meant only to provide steering aid to the human operator rather than to replace them. Driving for hours along vast farms is attention-intensive and tedious. The tractor needs to autonomously navigate through crop lines, and the human operator is present to respond to emergencies regarding navigation troubleshooting and to perform additional agricultural operations, e.g., pruning, spraying, etc. [11]. Many researchers have investigated tractor safety issues. Their focus is mainly on issues related to technical features such as vibrations [39], rollover protection systems (ROPS) [40], ergonomic design with respect to the operator's position [41], etc. Safety issues are also related to the operators' skills and attitudes [42]. Feasible solutions for monitoring mechanical hazards suggest devices to monitor the status of a tractor's components [43]. Other researchers investigated the augmentation of the visibility of the human operator [44], while advanced solutions such as virtual reality (VR) for intuitive tractor navigation have also been proposed [45]. Since error prevention is not always possible, it is common for the burden to fall on systems that monitor and report system malfunctions in a timely manner. Warning and alert systems [46], as well as emergency notification systems in case of accidents [47], have been developed to keep the operator awake and situationally aware. In general, situational awareness while operating agricultural machinery in complex and dynamic environments, such as fields, is critical. By using design and practical interventions, farmers' situational awareness can be supported and enhanced and, thus, prevent fatal incidents [48].
Figure 2 depicts the most common reasons leading to tractor safety issues [37]. When it comes to self-steering tractors, safety is closely related to the reliability of the steering system architecture, in terms of both hardware and software. The latter depends on the degree of autonomy of the system; yet, even for steering systems of the same degree of autonomy, architectures may significantly differ. A typical sensory-based autonomous navigation system consists of (1) a sensory-based perception system, (2) an algorithm-based decision system and (3) an actuator-based activation system. Errors can occur in all three parts of the system. Therefore, the most common errors can be either perception errors, decision errors or activation errors.
Activation errors include the instability of the electrohydraulic control system, the side slip of the tractor when turning at high speed, the real-time control of the vehicle, steering on upland terrain, the operating speed, etc. Decision errors mainly emanate from misjudging slopes. Perception errors are attributed to the measurements of the sensory system. In the case of self-steering tractor systems based on machine vision, all perception errors result from the image processing unit and methods [49]. Image acquisition and processing in real time with simultaneous decision making needs to be fast and accurate. The trade-off between the speed and accuracy of detection algorithms in dynamic environments such as fields is a challenge. In-field automated guidance is identified as crop row guidance. Crop conditions affect the system's performance to such an extent that it cannot function properly if the crop is not clearly detected. Therefore, crop rows need to be distinguishable under varying environmental conditions. Missing plants, small plants of different stages of growth or with different densities of leaves and weeds are the most common problems for identifying crop row structures. Weeds are highly similar in features to certain kinds of small crops; e.g., they can share the same green color, size and shape as sugar beet seedlings. Therefore, the vision algorithms need to be robust for crops of all stages of growth and tolerant to weeds [26]. Another disturbance is due to lighting conditions; changes in the brightness of the field may affect the algorithms. Moreover, direct sunlight may cause shadows from the tractor, leading to poor detection results [32].
The perception system includes optical sensors. The quality and positioning of the sensors are crucial, since proximal sensing can derive comprehensive data. A camera mounted on the cab has a wider view than one mounted on the front of the tractor. The height of crops can also restrict the crop row detection ability of the system. Tractors are ground vehicles; therefore, acquired sensory data is at crop scale and can be characterized by increased accuracy and high resolution. When the data is of such high quality, environmental conditions such as lighting and shadowing may severely deteriorate the accuracy of the system [50]. Figure 3 summarizes the main issues related to errors of the visual perception systems of self-steering tractors. According to the above, vision-based steering, although flexible, can be affected by in-field factors. Multi-sensory systems that fuse the information from a variety of sensors can significantly increase the steering accuracy [49]. This is the main reason why there are no navigation systems in the recent literature that rely solely on vision. Future automated guidance systems will mainly rely on multi-sensory fusing techniques. However, fields remain complicated and unstable environments, which directs future research in artificial intelligence and machine learning towards self-learning and self-adapting guidance systems.

Self-Steering Tractors' System Architecture
In what follows, the basic modeling of self-steering tractors is presented. A vision-based system architecture is provided and key elements essential for performing autonomous navigation operations are reviewed.

Basic Modeling
In order to develop autonomous driving machinery, a cyclic flow of information is required; it is known as the sense-perceive-plan-act (SPPA) cycle [51]. The SPPA cycle connects sensing, perceiving, planning and acting through a closed-loop relation; sensors collect (sense) physical information, the information is received and interpreted (perceive), feasible trajectories for navigation are selected (plan) and the tractor is controlled to follow the selected trajectory (act). Figure 4 illustrates the basic modeling of self-steering tractors. In order to automate the guidance of tractors, two basic elements need to be combined: basic machinery and cognitive driving intelligence (CDI). CDI needs to be integrated into both hardware and software for the navigation and control of the platform. Navigation includes localization, mapping and path planning, while control includes all regulating steering parameters, e.g., steering rate and angle, speed, etc. CDI is made possible by using sensory data from navigation and localization sensors, algorithms for path planning and software for steering control. The basic machinery refers to the tractor where the CDI will be applied. Based on the above and in relation to the SPPA cycle, the basic elements for the automated steering of tractors are sensors for object detection, localization and mapping [52][53][54], path planning algorithms [55], path tracking and steering control [56]. Table 1 includes a list of the aforementioned basic elements for an autonomous self-steering tractor. Most commonly used sensors and algorithms are also included in Table 1.
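As an illustration, the SPPA cycle can be sketched as a closed chain of four functions. All function names, data and the proportional steering rule below are hypothetical placeholders for the concepts in the cycle, not an implementation from any cited system:

```python
# Minimal sketch of the sense-perceive-plan-act (SPPA) cycle.
# Every value below is an illustrative placeholder.

def sense():
    """Collect raw sensory data (here: a fake lateral-offset reading in metres)."""
    return {"lateral_offset_m": 0.12}

def perceive(raw):
    """Interpret raw data into a state estimate (the offset from the crop row)."""
    return raw["lateral_offset_m"]

def plan(offset_m):
    """Select a corrective steering angle via a simple proportional rule."""
    k_p = 2.0  # illustrative gain: degrees of steering per metre of offset
    return -k_p * offset_m

def act(steering_deg):
    """Command the actuator; here we simply return the command."""
    return steering_deg

def sppa_step():
    """One pass through the closed sense-perceive-plan-act loop."""
    return act(plan(perceive(sense())))
```

In a real tractor this loop runs continuously, with `sense` fed by the navigation sensors of Table 1 and `act` driving the steering control.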

Vision-Based Architecture
A fundamental vision-based architecture for self-steering tractors is presented in [29]. Figure 5 illustrates the flow diagram of the proposed vision-based navigation system. The main sensor of the architecture is an optical sensor. The optical sensor captures images that are processed by a computer (PC), which also receives real-time kinematic GPS (RTK-GPS) information and extracts the steering signal. The steering signal is fed to the tractor control unit (TCU), which generates a pulse width modulation (PWM) signal to automate steering. The closed loop of the steering actuator comprises an electrohydraulic steering valve and a wheel angle sensor. The system prototype of Figure 5 was installed on a commercial tractor, and a series of self-steering tests were conducted to evaluate the system in the field. Results reported a root mean square (RMS) error of lateral deviation of less than 0.05 m on straight and curved rows for speeds of up to 3.0 m/s.
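A TCU of this kind typically maps the commanded wheel angle onto a PWM duty cycle for the electrohydraulic valve. The sketch below shows one plausible linear mapping; the angle range and duty limits are invented for illustration and are not taken from the prototype in [29]:

```python
# Hypothetical wheel-angle-to-PWM mapping for an electrohydraulic steering
# valve. All numeric parameters are illustrative assumptions.

def angle_to_duty(angle_deg, max_angle_deg=35.0, min_duty=0.05, max_duty=0.95):
    """Map a commanded wheel angle onto a PWM duty cycle in [min_duty, max_duty]."""
    # Clamp the command to the mechanical steering range.
    a = max(-max_angle_deg, min(max_angle_deg, angle_deg))
    # Map linearly: -max_angle -> min_duty, 0 -> mid-scale, +max_angle -> max_duty.
    frac = (a + max_angle_deg) / (2.0 * max_angle_deg)
    return min_duty + frac * (max_duty - min_duty)
```

The clamp models the hard mechanical limit of the steering linkage; the non-zero minimum duty reflects that proportional valves usually need a minimum drive signal to overcome stiction.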

Path Tracking Control System
The basic design principle of a path-tracking control system comprises three main systems, as depicted in Figure 6: image detection, tracking and a steering control system. Acquired images from an optical sensor, i.e., a camera, are sent to a computer for real-time processing. The center of the crop row line is identified, and the navigation path is extracted. The system uses a feedback sensory signal for the proportional steering control of the electrohydraulic valve of the vehicle for adaptive path tracking [57].
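A minimal sketch of such a proportional steering law follows, combining the lateral offset from the detected crop-row centre line with the heading error and saturating at the mechanical steering limit. The gains and limits are hypothetical assumptions, not values from [57]:

```python
# Illustrative proportional path-tracking law. Gains and the steering limit
# are invented for the sketch and would be tuned on a real vehicle.

def steering_command(offset_m, heading_deg, k_offset=8.0, k_heading=0.5,
                     max_angle_deg=30.0):
    """Wheel-angle command (degrees) from lateral offset and heading error."""
    raw = -(k_offset * offset_m + k_heading * heading_deg)
    # Saturate to the mechanical limit of the valve/wheel assembly.
    return max(-max_angle_deg, min(max_angle_deg, raw))
```

Feeding this command back through the electrohydraulic valve at each image frame closes the adaptive path-tracking loop described above.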

Basic Sensors
Sensors record physical data from the environment and convert them into digital measurements that can be processed by the system. Determining the exact position of sensors on a tractor presupposes knowledge of both the operation of each sensor (field of view, resolution, range, etc.) and the geometry of the tractor, so that by being placed in the appropriate position onboard the vehicle, the sensor can perform to its maximum [58]. Navigation sensors can be either object sensors or pose sensors. Object sensors are used for the detection and identification of objects in the surrounding environment, while pose sensors are used for the localization of the tractor. Both categories can include active sensors, i.e., sensors that emit energy in order to measure, such as LiDAR, radar or ultrasonic sensors, or passive sensors, e.g., optical sensors, GNSS, etc.
Sensory fusion can enhance navigation accuracy. The selection of appropriate sensors is based upon a number of factors, such as the sampling rate, the field of view, the reported accuracy, the range, the cost and the overall complexity of the final system. A vision-based system usually combines sensory data from cameras with data acquired from LiDAR, RADAR scanners, ultrasonic sensors, GPS and IMU.
Cameras capture 2D images by collecting light reflected from 3D objects. Images from different perspectives can be combined to reconstruct the geometry of the 3D navigation scenery. Image acquisition, however, is subject to the noise introduced by dynamically changing environmental conditions such as weather and lighting [59]. Thus, a fusion of sensors is required. LiDAR sensors can provide accurate models of the 3D navigation scene and, therefore, are used in autonomous navigation applications for depth perception. LiDAR sensors emit laser light, which travels until it bounces off objects and returns to the LiDAR. The system measures the travel time of the light to calculate distance, resulting in an elevation map of the surrounding environment. Radars are also used for autonomous driving applications [60]. Radars transmit an electromagnetic wave and analyze its reflections, deriving radar measurements such as range and radial velocity. Similar to radars, ultrasonic sensors calculate the object-source distance by measuring the time between the transmission of an ultrasonic signal and its reception by the receiver. Ultrasonic sensors are commonly used to autonomously locate and navigate a vehicle [61]. GPS and IMU are additional widely used sensors for autonomous navigation systems. GNSS can provide the geographic coordinates and time information to a GPS receiver anywhere on the planet as long as there is an unobstructed line of sight to at least four GPS satellites. The main disadvantage of GPS is that it sometimes fails to be accurate due to obstacles blocking the signals, such as buildings, trees or intense atmospheric conditions. Therefore, GPS is usually fused with IMU measurements to ensure signal coverage and precise position tracking. An IMU combines multiple sensors, such as a gyroscope, accelerometer, digital compass, magnetometer, etc.
When fused with a high-speed GNSS receiver and combined with sophisticated algorithms, an IMU can deliver reliable navigation and orientation.
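Since LiDAR and ultrasonic sensors both range by time of flight, the underlying computation is the same: the signal travels to the object and back, so range is half the round-trip time multiplied by the propagation speed. A minimal sketch:

```python
# Time-of-flight ranging, common to LiDAR (propagation at the speed of light)
# and ultrasonic sensors (propagation at the speed of sound).

SPEED_OF_LIGHT = 299_792_458.0   # m/s, for LiDAR
SPEED_OF_SOUND = 343.0           # m/s in air at ~20 °C, for ultrasonic sensors

def tof_range(round_trip_s, speed_m_s):
    """Distance to the reflecting object from a round-trip travel time."""
    # Divide by two because the signal covers the distance twice (out and back).
    return speed_m_s * round_trip_s / 2.0
```

The vastly different propagation speeds explain the sensors' complementary roles: a 10 ms echo corresponds to under 2 m for ultrasound, whereas LiDAR resolves the same range from sub-microsecond timings.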

Vision-Based Navigation
Vision-based navigation can be performed by using monocular vision, binocular vision or multi-vision, depending on the number of visual sensors and by using appropriate image processing algorithms.

Monocular Vision Methods
Monocular vision is widely used for navigation purposes in agricultural machinery [20]. Essentially, the problem of visual in-field navigation is about detecting crop lines and obstacles on the pathway in between the crop lines. In [62], a monocular vision system was proposed to guide a tractor; the vehicle captured images while moving through the crop rows and corrected the steering angle by identifying the heading and offset errors from the line. Results indicated acceptable performance, with a 0.024 m maximum error of position identification in the offset and 1.5° in the attitude angle for a 0.25 m/s navigation speed. In [19], the proposed monocular vision system was able to automatically drive a tractor for a 35 s trial at a speed of 1 m/s with an accuracy of 0.020 m. In [24], a monocular vision system was developed to guide a tractor at a maximum velocity of 1.3 m/s with an overall accuracy of 0.050 m, in day and night navigation trials. The monocular vision system in [63] resulted in steering performance comparable to steering by human operators, with an accuracy of 0.050 m at 0.16, 0.5 and 1.11 m/s. The same research team, in [64], developed a monocular vision guiding system that could navigate a 125-m run, keeping a stable distance of 10 cm from the left side of the crop row under varying environmental illumination at two different speeds: 1.33 m/s and 3.58 m/s. In both cases, over 70 trials, the robotic system completed 95% of the trials with a standard deviation (SD) from the predetermined route identical to that of a human driver. Monocular sensing for crop line tracking was also used in [65]; 95% of rows were segmented correctly over a distance of 5 km at a maximum speed of 1.94 m/s. In [36], the proposed monocular vision system achieved a root mean square (RMS) offset error and a heading error between the camera and the crop row of less than 0.030 m and 0.3°, respectively.
The robotic monocular vision-based sprayer of [66] reported an average error of 0.010 m inside a straight plant path and 0.011 m and 0.078 m before and after a 90° turn, respectively. Monocular vision, when compared to binocular vision, simplifies the hardware, but it needs to be coupled with more complex algorithms in order to function with adequate accuracy [70]. Towards this end, many algorithms have been developed, focusing mainly on: (1) expert systems, (2) image processing, (3) crop-row segmentation and (4) path determination.
Expert systems are based on human knowledge. Two basic approaches have been considered: one that uses only images and one that builds a map of the trajectory. The first approach relies on images extracted from the predetermined navigation route; the vehicle drives in the specified path as it is captured from one image to another by computing its relative position from the current image and moving accordingly [71][72][73][74]. This approach avoids reconstructing the entire navigation scene and defines the environment from overlapping images. The second approach builds a map of the environment a priori, resulting in faster and more accurate localization and navigation. On the one hand, the latter is time-consuming. On the other hand, the process is done offline and before use. In addition, fusion with appropriate sensors such as GPS can provide global coordinates for the localization of the vehicle [68,75]. SLAM combined with monocular vision has also been considered [76]; however, a small landmark database is required for real-time navigation responses.
Image processing algorithms are essential for effective navigation to deal with weeds, shadowing and other noise that affect in-field acquired images. To this end, monocular vision systems use near-infrared (NIR) cameras, grayscale cameras or filters to determine the optical properties of crops that are strongly related to their physical properties, such as greenness [77]. The latter can help segmentation tasks to detect crop rows by discriminating between green and non-green features in a scene or gray levels of soil [26] in order to deal with light changes, weed noise [78], etc.
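One widely used greenness measure for separating crop pixels from soil is the excess-green index, ExG = 2G − R − B. The sketch below applies it per pixel; the hard threshold value is an illustrative assumption, since each cited system tunes its own segmentation (e.g., via Otsu's method):

```python
# Excess-green (ExG) segmentation sketch. The index is standard in the
# crop-segmentation literature; the threshold below is an assumed value.

def excess_green(r, g, b):
    """ExG = 2G - R - B for one pixel, channels normalised to [0, 1]."""
    return 2.0 * g - r - b

def is_crop_pixel(r, g, b, threshold=0.2):
    """Illustrative hard threshold separating green vegetation from soil."""
    return excess_green(r, g, b) > threshold
```

Applying this test over a whole frame yields a binary vegetation mask, from which crop rows can then be located.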
Navigation methods based on crop-row segmentation focus on detecting multiple crop rows and determining their exact position so as to define the navigation pathway between them [79,80]. Alternatively, methods for direct path determination can be applied. A typical method to determine pathways is the Hough transform; yet, it is sensitive to discontinuities and needs considerable computational time [81]. Variations of the Hough transform, such as the adaptive Hough transform [82], intrinsic blob analysis [83] and curve fitting [84], have been introduced to deal with these shortcomings.
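The standard Hough transform mentioned above can be sketched in plain numpy: every foreground (crop) pixel votes for all lines ρ = x·cos θ + y·sin θ passing through it, and the accumulator peak approximates the dominant crop row (a didactic sketch; production systems use optimized implementations):

```python
import numpy as np

def hough_peak(mask, n_theta=180, n_rho=200):
    """Vanilla Hough transform over a binary crop mask.

    Each foreground pixel votes for every (rho, theta) line through it;
    the strongest accumulator cell gives the dominant crop row.
    """
    h, w = mask.shape
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta)
    diag = np.hypot(h, w)
    rhos = np.linspace(-diag, diag, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for y, x in zip(*np.nonzero(mask)):
        r = x * np.cos(thetas) + y * np.sin(thetas)
        acc[np.digitize(r, rhos) - 1, np.arange(n_theta)] += 1
    r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
    return rhos[r_idx], thetas[t_idx]

# A perfectly vertical 'crop row' at image column 5.
mask = np.zeros((20, 10), dtype=bool)
mask[:, 5] = True
rho, theta = hough_peak(mask)
```

The quadratic growth of voting time with image size illustrates why the literature reports considerable computational cost for Hough-based path determination.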

Binocular Vision Methods
Binocular vision combines two monocular cameras simultaneously so that each camera contributes to a single common perception. Information acquired by a binocular vision system can be used to define the exact location of objects in a scene. Compared to monocular vision, binocular vision can provide better overall depth, distance measurements and 3D viewing details; therefore, it is more resistant to varying illumination and more accurate in locating regions of interest [59]. Binocular vision systems are used for the autonomous navigation of agricultural vehicles. In [85], a low-cost binocular vision system is proposed for the automatic driving of an agricultural machine. The results indicated a mean deviation between the actual middle of the road and the traveled trajectory of 0.031 m, 0.069 m and 0.105 m for straight, multicurvature and undulating roads, respectively. An adaptive binocular vision-based algorithm was proposed in [86]. Experiments on S-type and O-type paths resulted in an absolute mean turning angle of 0.7° and an absolute standard deviation of 1.5° for navigation speeds below 0.5 m/s. In [59], a navigation algorithm based on binocular vision is proposed, resulting in a correct detection rate greater than 92.78%; for the average deviation angle, an absolute average value of less than 1.05° and an average standard deviation of less than 3.66° were reported in paths without turnrows. A tractor path-tracking control system based on binocular vision is presented in [57]. In-field experiments indicated a mean absolute course-angle deviation of 0.95° and a standard deviation of 1.26°. Binocular vision-based algorithms for autonomous vehicle navigation in agriculture focus mainly on: (1) obstacle detection, (2) 3D scene reconstruction and (3) crop-row detection.
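The depth cue that binocular systems exploit follows directly from triangulation on a rectified stereo pair: a point seen with disparity d (pixels) between the two views lies at depth Z = f·B/d, where f is the focal length in pixels and B the baseline between the camera centers. A minimal sketch with illustrative values (not taken from any cited system):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth for a rectified stereo pair: Z = f * B / d.

    disparity_px: horizontal pixel offset of the same point in both views.
    focal_px:     focal length expressed in pixels.
    baseline_m:   distance between the two camera centers in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Example: an 800 px focal length, 12 cm baseline and 64 px disparity
# place the point 1.5 m in front of the rig.
z = depth_from_disparity(64, 800, 0.12)
```

The inverse relation between disparity and depth also explains why distant obstacles, whose disparities approach zero, are the hardest to range accurately.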
Obstacle detection methods are critical for the safety of in-field automated operations of agricultural machinery. Binocular vision can provide the depth information of obstacles in an agricultural scene by stereo matching; thus, its application in obstacle detection is attracting growing attention [87]. One binocular vision approach is based on an inverse perspective transformation and the selection of non-zero disparity zones [88]. This method is effective when applied on flat surfaces. Other approaches use the plane-line projection characteristics of UV-disparity, where the height and width of obstacles are acquired from the height of vertical line segments in V-disparity maps and the length of horizontal line segments in U-disparity maps, respectively [89]. These methods can detect simple obstacles in structured environments. The most common approach is based on binocular stereo matching [90]. However, the stereo matching of in-field images is time-consuming and not very precise. In order to enhance the precision of stereo matching, motion analysis for object tracking can be considered. Moreover, the processing time could be reduced by considering fewer points rather than the entire 3D reconstruction of a field scene.
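The V-disparity construction mentioned above can be sketched in a few lines: stacking one disparity histogram per image row makes the ground plane trace a slanted line, while an upright obstacle (many rows sharing one disparity) traces a vertical segment. This is a hypothetical numpy illustration on a synthetic disparity image, not the cited implementation:

```python
import numpy as np

def v_disparity(disp, n_bins=64, max_disp=64):
    """V-disparity map: one disparity histogram per image row.

    On a dense disparity image, the ground plane (disparity growing
    toward the bottom of the image) appears as a slanted line, while a
    fronto-parallel obstacle appears as a vertical segment whose
    height reflects the obstacle's height.
    """
    vmap = np.zeros((disp.shape[0], n_bins), dtype=np.int32)
    for v in range(disp.shape[0]):
        vmap[v], _ = np.histogram(disp[v], bins=n_bins, range=(0, max_disp))
    return vmap

# Synthetic scene: ground disparity equals the row index; a 10-row
# obstacle patch shares the constant disparity 30.
ground = np.tile(np.arange(40, dtype=float)[:, None], (1, 30))
ground[10:20, 5:15] = 30.0
vmap = v_disparity(ground)
```

Detecting the obstacle then reduces to finding vertical segments of high counts in `vmap`, which is far cheaper than reasoning over the full 3D point cloud.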
The 3D reconstruction of a field scene can determine all surrounding environments accurately with adequate detail [85]. Binocular vision-based methods can provide 3D field maps even in unstructured and complex environments [91]. An accurate disparity map can help agricultural machinery to navigate safely in the fields [92]. However, stereo matching methods are preferable for 3D scene reconstruction due to reduced processing times, making them more flexible for real-time applications.
Crop-row detection algorithms based on binocular vision are effective when applied to fields where the crops are significantly higher than the weeds [93], due to complex in-field features that obstruct quick and accurate stereo matching. Crop rows are traditionally detected by Hough transform or by horizontal strip segmentation, but these methods do not fully exploit binocular vision techniques [59]. Binocular vision combined with a pure pursuit path-tracking algorithm can provide reliable information to navigate tractors in the fields [57]. The continuous advancements in image processing and automatic control can guarantee accurate real-time information about the surrounding environment for automatic vehicle control in the near future.

Classification of Stereovision Methods
Stereovision analysis consists of the following basic steps: (1) image acquisition, (2) modeling of the camera, (3) feature extraction, (4) stereo matching, (5) determination of depth and (6) interpolation. Stereo matching, i.e., the identification of pixels in two images that correspond to the same 3D point in the scene, is the most important step of the process. In order to resolve the stereo matching problem, a set of constraints is applied: epipolar geometry, similarity, uniqueness and smoothness [94]. Epipolar geometry defines the correspondence between two pixels in stereo images by relating 3D objects to their 2D projections. The similarity constraint matches pixels with similar properties. Uniqueness defines the existence of a unique match between two pixels in stereo images, apart from occlusions. Finally, the smoothness constraint enforces a smooth change in neighboring disparity values, apart from discontinuities resulting from sharp edges.
Stereo matching methods can be local, global or semi-global [95]. Local methods achieve matching within a local window; challenges arise in regions with repetitive or low texture that introduce ambiguities. Global methods result in disparity maps with high accuracy but are computationally expensive, since the disparity of every pixel in an image is calculated by optimizing a global energy function. Semi-global methods have been introduced to balance disparity estimation accuracy against computational time by performing the optimization of the global energy function on part of the image. Vision-based disparity estimation algorithms comprise the following basic steps that formulate stereo matching as a multistage optimization problem: (1) computation of cost, (2) aggregation of cost, (3) optimization of disparity and (4) refinement of disparity. The speed and accuracy of disparity estimation are equally important and are both taken into consideration in the overall performance evaluation of stereovision algorithms. Research focuses on reducing computational complexity while achieving better disparity estimation accuracy; the latter is the greater challenge when developing a stereovision algorithm [96]. Traditional stereo matching algorithms are mainly software-based implementations of global and local methods for generating disparity maps [97]. The ability to deliver stereo matching in real time by using parallel processing or additional hardware paved the way for new research in the field. Recently, due to the advancement of convolutional neural networks, stereo matching has been treated as a deep learning task [98].
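The cost-computation and disparity-optimization stages above can be illustrated with the simplest local method: window-based SAD matching along the epipolar rows of a rectified pair. This is a didactic numpy sketch (production systems use optimized semi-global or learned matchers):

```python
import numpy as np

def sad_disparity(left, right, max_disp=5, win=3):
    """Local stereo matching: slide a window along the same (epipolar)
    row of the right image and keep the disparity with the minimum sum
    of absolute differences (SAD) -- the 'cost computation' and
    'disparity optimization' steps in their simplest form."""
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right view is the left view shifted by
# two pixels, i.e., a constant ground-truth disparity of 2.
rng = np.random.default_rng(0)
left = rng.random((10, 20))
right = np.zeros_like(left)
right[:, :-2] = left[:, 2:]
disp = sad_disparity(left, right)
```

The nested loops make the local window's weakness concrete: cost is evaluated independently per pixel, so repetitive or low-texture regions yield several near-equal SAD minima and ambiguous disparities.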

Multi-Vision Methods
Multi-vision systems combine the input images of multiple vision sensors. In [99], images from three cameras and GPS data were fused to create a localization system. The use of multiple cameras reduced the complexity of the system, since the final image was produced by image stitching without the need to rotate the cameras to capture multiple viewing angles. Compared to conventional stereovision systems, the proposed multicamera system provided better viewing angles while operating four times faster. Trinocular vision and odometry were used in [100], while in [101] a multi-vision method was introduced for multi-sensor surveillance and long-distance localization.
A comparative table (Table 2) of all vision-based navigation algorithms for self-steering tractors reviewed in this work is subsequently provided. Details regarding the utilized vision systems and the performance of the algorithms are included in the same table. It should be noted that the methods included in Table 2 use cameras as their main navigation sensor; additional sensors are auxiliary or only used to enable comparison between sensors and methods. Moreover, only the methods that have been tested for navigation and localization purposes are included in Table 2. Many vision-based navigation algorithms have been developed, their performance in detecting crop rows has been investigated and their potential use in vehicle guidance has been evaluated [102,103], yet only a fraction of them have actually been integrated into agricultural vehicles and tested in real-life navigation applications.

Discussion
The autonomous navigation of agricultural machinery is an important aspect of smart farming and has been widely used in several agricultural practices for sustainable high-yield production and in-field automation. Computer vision has been integrated into agricultural machinery to guarantee enhanced navigation accuracy in real in-field conditions. Figure 7 illustrates the categories of agricultural machinery into which navigation algorithms have been integrated, according to the bibliography of Table 2. The actual problem that the use of computer vision aims to solve is the exact localization of crop lines, the mapping of navigation routes and their real-time correction. Navigation control systems are based on monocular vision and, more recently, on binocular vision or multi-vision systems. Optical sensors provide images to effective image processing algorithms. The data extracted from image processing are then fused with additional data from other on-board sensors. All processing is completed in real time on a computer mounted on the autonomous vehicle.
The reviewed bibliography indicates that multi-vision and stereovision systems are superior to monocular vision systems and meet the agricultural requirements for navigating vehicles in the fields. However, monocular systems are more often found in the literature, according to Table 2; this can be better visualized in Figure 8. The reason lies in the simpler system design, the comparatively low cost of monocular vision sensors and the less complex image processing algorithms that accompany them. Additionally, although the use of one camera is affected by environmental noise, navigation results still reach acceptable levels of accuracy, allowing monocular vision systems to drive a tractor safely. In particular, binocular vision systems demonstrate higher stability and control accuracy, allowing for automatic control of the navigation path between the crop lines, especially in extensive crops such as cotton, sunflower, maize, etc. Indicatively, an algorithm [59] for detecting crop rows based on binocular vision combines image pre-processing, stereo matching and centerline detection of multiple rows. The method first converts the stereoscopic image to grayscale by using an improved 2G-R-B grayscale transformation. Then, the Harris corner detector is employed to extract the candidates for stereo matching. The 3D coordinates of the crop rows are calculated with stereo matching from the disparities between the binocular images. Finally, the crop lines are determined by using the normalized sum of absolute differences (NSAD) metric for matching and the random sample consensus (RANSAC) method for the optimization of disparity. The results demonstrated the efficiency of the algorithm in dealing with various visual noises such as lighting, shadows, weeds and the density of crop rows.
Moreover, results revealed the satisfactory speed and accuracy of the algorithm, especially when the camera was mounted at an appropriate height on the vehicle and when the crops were significantly higher than the weeds. The latter is more common in orchards.
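The RANSAC step used in such pipelines can be illustrated in isolation: fit a crop line to noisy row points by repeatedly sampling minimal point pairs and keeping the model that gathers the most inliers (a generic sketch, not the implementation of [59]):

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Fit a line y = a*x + b to noisy crop-row points with RANSAC:
    fit to two random points per iteration and keep the model with
    the largest inlier count within vertical distance `tol`."""
    rng = np.random.default_rng(seed)
    best, best_inliers = (0.0, 0.0), 0
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue  # vertical sample; skip for the y = a*x + b model
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = int((np.abs(points[:, 1] - (a * points[:, 0] + b)) < tol).sum())
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best

# Synthetic crop-row points on y = 2x + 1 plus two 'weed' outliers.
xs = np.arange(20, dtype=float)
pts = np.column_stack([xs, 2 * xs + 1])
pts = np.vstack([pts, [[3.0, 30.0], [7.0, -10.0]]])
a, b = ransac_line(pts)
```

Because the consensus count ignores the outliers entirely, the recovered line is unaffected by the two spurious points, which is exactly the robustness that makes RANSAC attractive against weed and shadow noise.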
In the case of vision-based autonomous navigation in orchards, several algorithms have been developed that take advantage of terrestrial structures such as tree trunks and foliage. Many approaches combine data from two or more sensors in order to detect and locate objects, while in recent years RGB-D, Kinect and other sensors have been widely used [104]. Even though methods with sensor fusion perform satisfactorily in orchards, they are challenged by shading, the changing angle of the sun, the chromatic similarity of crops, the visibility of tree trunks in adjacent rows, etc.
A recent methodology [105] provided a solution to the above challenges by combining the tree foliage with the background sky instead of tree trunks and the ground; by looking up instead of down, the influence of environmental factors decreased considerably. The method used a multispectral camera mounted on the front of an agricultural vehicle to capture images and a computer to process them in real time. The captured image was cropped and the lower part was used to extract the green color plane, which provided greater contrast between the foliage and the sky. Simple thresholding was applied to derive the path plane of the agricultural vehicle. Finally, after filtering out noise, the centroid of the path plane was computed. Results revealed the potential of this original approach to successfully guide agricultural vehicles in orchards.
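The look-up path extraction described above reduces to a few operations per frame. The following is a minimal sketch with a hypothetical threshold value (the published method uses a multispectral green plane and additional noise filtering):

```python
import numpy as np

def path_centroid(frame, sky_thresh=120):
    """Orchard path extraction 'looking up': in the green plane the sky
    between the tree rows is bright and the foliage dark, so a simple
    threshold isolates the path plane; its centroid column serves as
    the steering reference. `sky_thresh` is a hypothetical tuning value."""
    green = frame[..., 1].astype(np.float32)
    sky = green > sky_thresh
    ys, xs = np.nonzero(sky)
    if xs.size == 0:
        return None          # no visible sky gap -> no path estimate
    return float(xs.mean())  # centroid column of the path plane

# Synthetic frame: bright sky gap in columns 8..12, dark foliage elsewhere.
frame = np.full((10, 20, 3), 50, dtype=np.uint8)
frame[:, 8:13, :] = 200
centroid = path_centroid(frame)
```

The offset between the centroid column and the image center can then be fed to the steering controller as a lateral error signal.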
Future work may include state-of-the-art sensors and algorithms for newly introduced vision-based system architectures. Regarding image processing algorithms, there are several issues to be addressed. First, algorithms must balance the processing of large amounts of environmental data with low processing times so that decisions can be made in real time. Complicated and multiple input data may provide a better understanding of the surroundings, yet they result in increased processing time, obstructing the real-time control and actions of the tractor. Second, optimal navigation accuracies are achieved with multi-sensor systems; vision, although flexible, is affected by environmental noise. A variety of sensors can lead to advanced accuracies in navigation, mapping and the position estimation of tractors. Therefore, future work must focus on fusion techniques that are able to deal efficiently with data from multiple sensors. Factors such as the variation of the vehicle's speed, irregular terrain and varying controllers should also be considered. Figure 9 summarizes the range of maximum navigation speeds at which the best performance of the algorithms in Table 2 was obtained. It seems that the optimal speed for a tractor is between 1 and 2 m/s, which is fast enough for sustainable automation and sufficient for the data processing of the autonomous navigation algorithms. However, it should be noted that speeds over 2 m/s are usually used in cultivations with tall (e.g., corn) or dense (e.g., cotton) plants, where the crop line is theoretically detected more easily. Effective control algorithms are very important in dynamic outdoor environments. When autonomous navigation is employed in structured or semi-structured environments where disturbances are known or can be easily predicted, simulations and expert knowledge can be used efficiently to control a guiding system. However, a complicated environment introduces many uncertainties into the system.
For this reason, future work should concentrate on control algorithms that are able to self-learn from and self-adapt to the environment. Finally, a future investigation should focus further on in-field testing for different types of crops at different growing stages and under varying environmental conditions.
The referenced literature revealed that autonomous navigation systems of agricultural vehicles have been in the spotlight for decades, paving the way for the sustainable agriculture of the future. However, research needs to be perpetual in order to face challenges and overcome all reported limitations.

Conclusions
Data-driven agriculture, combined with appropriate sensory systems and artificial intelligence methods, paves the way for the sustainable agriculture of the future. To this end, a vision-based system is capable of providing precise navigation information to autonomously guide a tractor along crop rows. Thus, the burden of monotonous crop-row-following tasks falls to the self-steering system, while the operator can engage in maneuvering and other tasks, increasing the guiding performance, the working efficiency and the overall safety of farming operations.
The present work demonstrated the feasibility of machine vision systems in self-steering tractors with respect to the following issues: vision-based tractors' system architecture, the safety of usage and navigation errors, the navigation control system of vision-based self-steering tractors and state-of-the-art image processing algorithms for in-field navigation route mapping. Research revealed the potential of machine vision systems to autonomously navigate agricultural machinery in open fields in the future.
The aim of this work is to augment the knowledge on agricultural vision-based navigation methods and provide evidence of trends and challenges in a systematic manner. The reviewed methods included in this work could be used for future studies to extend the knowledge of vision-based machinery architecture, sensors, algorithms and safety issues. The provided knowledge could be extended from agricultural navigation tasks to other contexts where the use of autonomously guided vehicles will be the focus of interest.