2. Background
Researchers of road traffic problems have pointed out that driving a car (or any other means of transport) is primarily based on the effective use of the organ of vision. This is a cliché, to a certain extent, but there are numerous aspects of it that apply to the problem at hand, as described below. The knowledge base regarding the role of vision in driving has been presented in publication [
1], also highlighting the matter of rehabilitation for visually impaired persons [
1]. In the context addressed in this article, this is particularly important, because there is a certain group of persons with disabilities (PWD) who—in addition to musculoskeletal disorders—also suffer from vision problems. Consequently, this group is doubly excluded from riding cycles intended for PWSN. Regarding the driver’s vision, researchers often focus on discussing factors that determine road traffic safety (RTS). Both in the field of applied science and in relation to what is commonly referred to as the human factor, the problem of vision appears to be dominant. Those who study this factor tend to highlight its complexity. Eye tracking (ET) techniques, implemented in research on the said problems over the last two decades, enable an effective analysis of the impact of vision problems while driving. One of the articles on this subject stated that the efficiency and nature of cognitive processes affect human activity in road traffic [
2]. This seems to be a key hypothesis in the context of this article. The method proposed by the authors for controlling the movement of a vehicle for PWSN using the organ of vision is intended to enable exactly this kind of activity, since otherwise it is often impossible. The study involved a literature review and synthesis of knowledge about cognitive processes and their relevance to drivers’ behaviour. The authors conclude that a significant share of road accidents result from inadequate perception of environmental stimuli as well as storage and use of information [
2]. In this context, researchers are often interested in stereoscopic vision, which is used to drive autonomous vehicles (AV) using cameras. Similarly to our method, other authors processed images obtained from cameras using vision techniques (VT). However, they used different software (MATLAB), while ours is an original solution. The software in question is diversified, and the authors of that paper presented an algorithm for lane detection and motion angle calculation for vehicle control along a path. For this purpose, known VT procedures were employed. A test vehicle (P3-DX Pioneer) was examined under laboratory conditions, moving along a given trajectory with a maximum deviation error of 2 cm [
3]. The study described in that article is similar to the one discussed in the present paper, but the former involved different procedures, and the basis for vehicle control was not human vision, but a camera image. Another publication on the same subject addressed different vectors of the development of future vision systems for ground vehicles [
4]. Its authors drew analogies to biological vision systems, criticising technical systems in this regard and arguing that biological and pseudo-biological vision is the optimal solution for vehicle driving. They described compact n-camera systems with a high level of performance in dynamic vision. It was assumed that such systems make it possible to understand the traffic scene (TS). Some authors concluded that the real challenge faced in this area is software (AI algorithms, deep learning) rather than detection and computing hardware [
4]. At this point (as the authors of this study note), it is difficult to agree with the foregoing, since detailed FHD or 4K images involve processing gigabytes of data in a short time. When in motion or in field conditions, this is difficult and often expensive. This issue was also addressed in yet another article [
5]. It discusses deep learning (DL) problems in the context of the use of image-derived data. Its authors point out that supervised methods are flawed: they require large volumes of training data and are unreliable. This is particularly true when the contextual situations of traffic scenes do not match the training set. The authors proposed a solution which entails using a non-labelled sequence of images. According to this method, predictive control is employed by utilising visualised trajectories to select vehicle control parameters. The images obtained at deflected trajectories, as they are referred to, are used to increase the model’s robustness. Experiments confirm that the performance of such a neural network is comparable to methods which involve additional data collection/supervision [
5]. Similarly, in line with our approach, vision is supported by certain additional data from outside the biological system, e.g., a GPS signal. Given the body of problems addressed in this article, interesting research was conducted under the EU HASTE project [
6]. The organ of vision was studied while driving tasks were performed with the assistance of artificial information systems on board vehicles (S-IVIS). Data was collected from over 100 people while driving on a motorway, a motorway in a simulator, and a country road. The authors noted that when the visual task assigned was more difficult, drivers shifted their gaze from the centre of the road more frequently to look at the displays. As the complexity of the driving task increased, so did the concentration of sight on the centre of the road [
6]. The authors studying the problem at hand generally agree that a vehicle equipped with a vision system can visualise a traffic scene (i.e., the situation one has encountered) [
7]. These objectives were pursued in the PROMETHEUS project, where vehicles travelled in AV mode in heavy traffic on motorways at speeds of up to 130 km/h. The tasks analysed during the tests were group driving, lane changing, and overtaking. The authors stated that, when a third-generation EMS-vision system was in use, the behavioural capabilities of agents were represented at an abstract level to characterise their potential behaviour patterns. What was demonstrated was the capacity of vehicles to drive in a network of secondary roads as well as off-road and to avoid obstacles [
7]. Note that driving a vehicle for PWSN is easier in this case, because it involves less complex infrastructure.
Official manuals state that approx. 90 per cent of the information used by drivers is visual [
8]. The referenced paper contains interesting information about night driving, the drivers’ ability to distinguish details from afar, to detect luminance differences, etc. One can also read in this publication that drivers with poor eyesight need to be closer to road signs or hazards to notice them. Drivers usually misjudge the distance at which pedestrians can be seen, while lighting can increase contrast and make objects on the road more visible. All such information is provided by the FHWA manual, which also addresses the issue of visual function degradation progressing with age [
8]. This is relevant in the context of the method proposed in this article, as it is based on eye measurement without taking the associated cognitive processes into account. This matter is discussed later in the paper. The low RTS level observed prompted other researchers to conduct literature studies in the field of visual navigation, smart vehicles, and visual clarity improvement theories [
9]. The research shows that a smart vehicle with improved visual clarity features performs better than a conventional one. The authors propose improvements to the steering algorithm by increasing the visual clarity of images through smart filtering [
9]. Many researchers address the problem of the capacity of vehicles featuring vision techniques (VT) to navigate effectively in difficult conditions. To this end, deep learning is often used to improve the perception of machine vision in AVs. In one of the articles on this subject, the authors focused on the problem of detection of small targets. Tests conducted with a camera mounted in a vehicle allowed them to verify the effectiveness of road vehicle identification [
10]. Most studies analyse vehicle control based on the visual information provided by cameras. In our research, the camera was directed at the driver’s eyes, and not at the TS itself, which represents a reversed arrangement. It is the driver’s eyes/mind that determine the direction and parameters of movement, not the processed TS. However, the various methods in question are similar in other aspects, since VTs are used to determine the vehicle control parameters. Vision image processing procedures include: binarization, edge detection, and transformations. These procedures are simplified by taking the configuration of the driving area (traffic scene) into account [
11]. In the context of the problem at hand, one should mention the Vision Zero strategy. It represents a set of rules applicable to the automation of the road transport system, according to which AVs will need to perform better than human drivers. Therefore, according to the authors of this study, the road transport system will have to be adapted to both unreliable humans and unreliable automated vehicles [
12]. Road traffic scene acquisition is a basic visual–motor task required of every driver. However, researchers point out that the way the task is performed is unclear [
13]. This is the case despite the widespread use of eye tracking (ET) and brain imaging research methods. Such studies often use measurements of the angular variables of the participants’ head and gaze. In this way, hypotheses concerning the visual strategies of the people examined are verified. According to the authors of this study, intercept control is best described as a constant target tracking strategy, with the gaze and head coordinated to continuously acquire visual information enabling the strategy to be pursued [
13]. Other authors introduced models of what is referred to as active visual scanning [
14]. This method consists of lateral control (position adaptation) and longitudinal control (speed adaptation), and is based on drivers’ visual input. The authors used scanning path analysis to examine visual scanning sequences over the course of driving. They have identified several stereotypical visual sequences: forward, steering, backward, landscape, and speed monitoring paths. According to the authors of the study, their results shed some new light on the patterns used while driving [
14]. The authors of another publication highlighted numerous challenges, such as real-time data processing, decision-making under uncertainty, and navigating complex environments [
15]. In this regard, they reviewed deep learning methodologies, including convolutional neural networks (CNN) and the recurrent neural networks used in AVs, against tasks such as object detection, scene understanding, and path planning [
15]. Importantly, their study makes it possible to identify gaps in the attempts to achieve full autonomy, improve sensor fusion, and optimise costs [
15]. Vision-based control systems were also studied under the C-ITS research project, where increased safety requirements, computing power in embedded systems, and the introduction of AVs were all relevant matters. The authors emphasised the interdisciplinary nature of their research, comprising aspects such as computer vision, machine learning, robotic navigation, embedded systems, and automotive electronics [
16]. They reviewed a list of advanced vision-based control systems [
16], where in addition to the control, per se, the operation of on-board devices is also considered important in this process. This problem was referred to in yet another article on the perception of vehicle interiors [
17]. The authors studied manipulators on board a car using all human senses under various conditions of sensory deprivation, spanning hearing, sight, and touch. Unlike in direct driving, touch plays a considerably greater role (three times greater) in vehicle operation [
17].
In the field of VT and AV, the Defence Advanced Research Projects Agency (DARPA) programme is widely known. Another scheme, UGCV PerceptOR Integration, introduced a new approach to all aspects of the design of autonomous mobile robots (MR), i.e., machines whose main purpose is to enable transport between points in a limited time and in a complex environment [
18]. The article provided a discussion on the capacity of these machines to detect terrain features and introduced various devices serving this purpose, in order to analyse the differences between AVs [
18]. Another important study concerned divided attention, analysed in terms of environmental events and vehicle driving-related tasks. In the said study, drivers’ eye movements were recorded using an eye tracker [
19]. As was easy to predict, eye movements proved to be highly dependent on the situation, with the main areas of interest being the windscreen, mirrors, and dashboard. The tasks assigned to the study participants affected the distribution of their attention in an interpretable manner [
19]. A subject similar to our research was addressed in yet another study, where an intelligent wireless car control system based on eye tracking was designed. Tobii ET was used to track and recognise the human line of sight. ZigBee wireless communication technology was used to provide connectivity [
20]. According to the authors, this form of control can perform forward and reverse driving, left and right turns, parking, and avoiding obstacles with an accuracy of approx. 98% [
20]. Problems with perception in traffic constitute enough of a reason to try to improve technology. To this end, the authors of another article proposed a cooperative visual perception model using 506 images of complex road traffic scenarios [
21]. An improved object detection algorithm intended for AVs was applied in their study. The average perception accuracy for individual TS elements reached 75%. Using the image fusion method, drivers’ points of view were combined with vehicle monitoring screens. This cooperative perception was able to cover the whole risk zone and predict the trajectory of a potential collision [
21]. Regarding the foregoing, it should be noted that perception errors are a major factor in accidents, and few of them can be attributed to vision. According to the authors of that paper, in the case of all drivers, the fact that visual acuity declines abruptly as angular distance increases poses problems [
22]. A phrase used by the authors to reflect these visibility limitations in the context of accidents is the statement “he was looking but did not see” [
22]. This is because drivers often operate beyond their visual/perceptual capabilities, especially at night. The authors believe that errors in assessing the situation at hand are inevitable, but they do not necessarily lead to accidents, owing to the safety margins of drivers and other road users. An interesting conclusion drawn by the authors is that, despite certain limitations and fallibility, an average driver takes part in few incidents [
22]. The aspects generally addressed in the research on vision-based driving are as follows: steering, visual cues, and human–vehicle interaction. This analysis comprises various modes of interaction between humans and vehicles in various scenarios, such as calling, stopping, steering, etc. [
23]. With this context in mind, the researchers demonstrated vehicles’ capacity to detect and understand human intentions and gestures. The authors concluded that safety can be improved by way of vision-controlled human–vehicle interactions [
23].
From the opposite perspective (namely, that of pedestrians), scenarios for road crossing were studied. Spontaneous gaze behaviour was among the problems investigated [
24]. The tests consisted of distinguishing between running speed and the time of a car’s arrival at a crossing. The role of saccadic eye movements was emphasised. These make it possible to predict the time of a vehicle’s arrival at a crossing [
24]. The knowledge of the driver’s role in the driver–vehicle system is very limited. There are many driver models (e.g., the driver as a follower of a lead vehicle), but few of them take the driver’s sensory dynamics into account [
25]. Therefore, the authors of another study reviewed the literature about sensory dynamics, delays, thresholds, and sensory stimulus integration [
25]. The concept of using vision in AVs has shown benefits in AV–pedestrian interactions. In the next study of interest, the researchers analysed a method of intuitive behaviour-based control. They proposed three algorithms for the control of AVs [
26]. An oculomotor system became the basis for developing a vision sensor system intended for AVs under yet another study [
27]. According to this approach, a stereoscopic control system mimicked the human vision system, performing tracking and cooperative movement tasks. The project culminated in a field experiment covering a pedestrian crossing [
27]. According to the authors of another article, visual interactions play a central role in the performance of a spatial task on a TS. Perception-based control is grounded in the idea that the visual and motor systems form a unified system in which the necessary information is naturally extracted by vision [
28]. The sensory/motor response is limited by the visual and motor mechanisms. This hypothesis was investigated in that article, referring to tasks involving indoor remote control of rotorcraft. The authors of the paper posed the following questions: what is the operator’s general control and steering strategy, and how does the operator obtain information about the vehicle and the environment? [
28]. The model proposed in the article enables profiling of the perception–action system. The purpose of this visual–motor model is to unburden the operator [
28]. Another review of the literature on the subject focuses on methods for modelling and detecting the spatio-temporal aspects of a driver’s attention [
29]. The paper refers to a machine learning-based approach to modelling and detection of the gaze for driver monitoring. Publicly available datasets containing recordings of drivers’ gaze have been provided [
29]. Numerous studies on the driver attention problem assume that attention is placed where the gaze rests. Drivers can use peripheral vision to perform certain tasks. With this goal in mind, the authors of another study analysed the influence of peripheral vision on lane keeping. The performance of novice drivers deteriorated when the central task was close to the periphery, while the performance of experienced drivers only dwindled when the central task was at the bottom of the console. The authors concluded that novice drivers learn to cope with peripheral vision as they gain experience dealing with road traffic [
30]. The research on the perception of TS raises many questions; for instance, about the way in which active-gaze-fixing patterns promote effective steering [
31]. According to the authors of this article, the role of stereotypical gazing patterns used while driving remains unclear. The key question is whether there is enough information available in the direction of driving to use it for steering. To answer the above questions, the authors validated driving models using data obtained from human vision [
31]. The vision control model can function based on what is referred to as a pure-pursuit controller, which calculates the trajectory towards the steering point, or a proportional controller. The foregoing studies imply that “looking where one is headed” can provide information for the steering system. According to the authors, capturing the variability in steering requires more sophisticated models or additional sensory information [
31]. Another literature review focuses on discussing gazing behaviour patterns and their implications for understanding vision-based steering. This study has made it possible to build a knowledge base for researchers investigating oculomotor behaviour and physiology. The problem of utilising gaze-fixing strategies while performing closed tasks at one’s own pace (with lesser dynamics) has been raised. The paper highlights the problem with separating vision-based steering strategies (closed) from open behaviour patterns [
32]. A wide range of adaptation problems in the driver–vehicle system was discussed in another article [
33]. A separate aspect of research, resulting from the implementation of AVs, is what is called situational awareness. It involves developing interfaces which will enable drivers to “be up to date/participate” in a traffic scene. In this context, a systematic review of human–vehicle interactions (HVI) ensuring situation awareness was proposed regarding AVs [
34]. They were analysed in terms of modality, location, information transmitted, evaluation, and experimental conditions [
34]. Another article also concerned the peripheral vision driving model, where the AV predicts vehicle speed based on camera recordings. The vehicle operates based on information from the driver’s sight. According to the authors, adding high-resolution input data based on predicted driver gaze-fixing locations improves driving accuracy [
35]. Drivers’ physical limitations are the subject of further studies [
34]. Specific driver models comprising such limitations have been identified to enable researchers to predict the performance of a driver–vehicle system under lateral and longitudinal control tasks [
36]. Eye tracking technologies are considered to play an important role in the context of these problems, clearly demonstrating how such solutions improve driving safety. The authors of that study managed to build a theoretical model of a human–machine system integrated with the eye tracking technology. The specific impact on improving driving safety and interaction efficiency was also investigated [
37]. In terms of road traffic safety, drivers rely heavily on visual perception. Its key elements, those that contribute to safe driving, include situational awareness, vehicle control, responsiveness, and anticipation of potential hazards. In this context, AVs are predominantly based on two basic groups of technologies: LiDAR-based systems and other sensing technologies. In one of the articles concerning this subject, an image classification model was proposed, aimed at evaluating AV safety, and an RTS framework was defined. The framework encompasses the application of the two groups of technologies, installed on board AVs for use under different weather conditions [
38]. With reference to that study, it should be noted that the role of sensors in terms of control efficiency depends on the data obtained. The sensors and cameras incorporated into a vehicle or tool can be used to provide such information, while the number of sensors and their arrangement will depend on the task at hand [
39]. The experiments conducted in this context concerned monocular, stereoscopic, or enhanced stereoscopic (hyperstereoscopic) information. They showed that the efficiency of remote control depended on the specific driving task, and that no significant difference in efficiency was found between the monocular and stereoscopic tasks [
39]. This is important for the methodology proposed, focusing mainly on monocular measurements. Further researchers have also highlighted the importance of predicting the temporal aspects of gaze-fixing patterns in natural multitasking situations. One of the solutions they propose is to break down complex tasks into modules which require independent sources of visual information. To this end, a softmax barrier model was introduced based on the use of two key elements: a priority parameter, representing the importance of a given task, and a noise estimate (representing uncertainty) regarding the state of the visual information relevant to the task [
40]. Another technique which has recently been growing in popularity in the research on perception is immersive virtual reality (VR). In this context, researchers have undertaken projects intended to assess the behavioural gap. It defines the disparity in a participant’s behaviour over the course of a VR experiment compared to an equivalent role in the real world. A digital twin, being a model of a pedestrian crossing, was created for this purpose. It is used to study pedestrian interaction with AVs in both real-life and simulated driving conditions. Experimental results show that pedestrians are more cautious and curious in VR [
41]. However, the outcome of the study was dependent on the interface used, which highlights the role of visual information in these processes [
41]. In terms of vehicle driving, vision impairments affect two areas. Where the eye is the problem, these include conditions such as glaucoma and macular degeneration [
42]. At the brain level, stroke and Alzheimer’s disease are the main issues, especially in older adults. Such disorders can increase the number of errors attributable to reduced visual acuity, sensitivity to contrast, and field of vision [
42]. Ageing and brain damage can also reduce the useful field of vision, increase the frequency of blinking, impair the perception of structure and depth, and reduce the perception of direction [
42]. Another study of relevance concerns the relationship between drivers’ eye movement patterns and driving performance under a dual-task driving paradigm. The first task consisted of following a car while maintaining a set headway from the preceding vehicle. The second task was used by the authors to examine the updating of traffic light signals. The performance under task one was measured by the distance, and under task two, by the reaction time and accuracy. The authors concluded that the frequency of fixations, as well as their duration and spatial distribution, were significantly correlated with the drivers’ performance. Driving performance improved with fewer eye movements, longer fixation times, and a more compact spatial distribution of fixations [
43]. The findings of that study run counter to the goals and assumptions of the project in question, since the latter focuses on maximising the use of eye movements for steering vehicles intended for PWSN. This will, consequently, be examined in detail in subsequent studies. The performance of a steering task is impacted by a visual field deficit [
44]. Binocular and monocular visual field deficits exert a negative impact on driving skills [
44]. Both central and peripheral visual field deficits cause various difficulties, but the degree thereof depends on the deficit’s severity and the driver’s capacity to compensate [
44]. In this context, the authors of one study discussed the problem of visual field deficit compensation. The measures undertaken in the case of central compensation for a visual field deficit include driving speed reduction, while in compensating for a peripheral visual field deficit, these include increased adaptive scanning [
44]. Many researchers point out that eye movements are determined by space-based attention, or, more specifically, by the distinctive features of a traffic scene. The authors noted that some research indicates that visual attention does not specifically select places of high salience, but functions on the level of the units of attention assigned by objects in the scene [
45]. The authors of that study analysed the relevance of such objects in directing attention using spatial models, including those based on low and high levels of salience, object-based models, and a mixed model. The authors conclude that scanning path models use object-based attention and selection, which implies that the units of object-level attention play an important role in attention processing [
45]. Some other authors note that navigation and event prediction depend on the optimal distribution of attention through overt eye movements. They refer to the essential parameters: fixations and saccades. In their perspective, fixations are periods of relative stability, when the eyes focus on the TS. Fixation means that the brain is processing the information it has fixed upon. Saccades are used to direct the focus of the eyes from one point of interest to another. No visual information is acquired during saccadic movements [
46]. Another of the articles reviewed describes new elements in the research on drivers’ gaze. It introduces IV Gaze—a pioneering dataset used to record gaze in a vehicle (in a sample of 125 persons). This dataset covers a wide range of gazes and head positions observed in vehicles. The authors’ research focuses on in-vehicle gaze estimation using IV Gaze. They have investigated a new strategy for classifying gaze zones by extending the solution known as GazeDPTR [
47]. This research is considered important on account of the fixed datasets utilised in our method, which, however, caused errors of varying significance in identifying the driver’s gaze or even head position, as discussed in more detail in the method validation section. In another, highly specific study, eye tracking technology was used to examine the impact of display icon colours on the preferences and perceptions of elderly drivers in human–vehicle interfaces. The study comprised an analysis of six foreground colours against two background colours. The results have shown that elderly drivers find orange icons easier to locate, while yellow icons are the most difficult to spot [
48]. According to further researchers, continuous tracking of the fixation point enables smoother motion control, which means that predicting the road curvature by tracking a distant point contributes to driving stability. The authors concur with the hypothesis that drivers focus their sight in-between traffic lane boundaries to set the path of movement [
49]. Yet another aspect of this discourse is the research on the effect of cognitive load on visual attention, given that cognitive load can impair driving performance. Cognitive load combines with a deficit of external cues when a driver briefly takes their eyes off the road. This problem was studied in the context of an auditory task. The performance measures applied included drivers’ adaptations to changes and their confidence in detecting them. According to the authors, cognitive load reduced the participants’ sensitivity and self-confidence, regardless of the external cues [
50].
3. Materials and Methods
The problem of using the organ of vision to control vehicle movement is complex and involves an analysis of characteristics and data retrieved from many different information channels. One of the objectives underpinning the study addressed in this paper was to develop a method for safe control of a cycle intended for PWSN using real-time analysis of the vision organ’s characteristics. At the same time, the solution was to be inexpensive. It was further envisaged that this system would be GPS-supported. The fundamental problem is whether the vision measurement data obtained using the measurement system are sufficient for the safe control of such a vehicle. Another problem is applying an appropriate methodology for separating signals within the vision characteristics acquired from a person driving an off-road vehicle designed for people with special needs. Therefore, the very essence of the study is to select adequate tools for tracking eye movements as well as to determine which movements and what range thereof can be effectively used to control a vehicle for PWSN. Another problem to be solved is whether it is possible to use only the organ of vision to control the vehicle’s movement without resorting to other information channels, i.e., data extracted from speech or gestures. The GPS is necessary in this context, since it is incorporated into the vehicle’s steering system to increase its safety (taking precedence over the designed system, and being installed as such).
The driver gaze tracking methods presented in this article have been in proven use for approximately 20 years. They have a rich citation base [
51,
52,
53,
54,
55,
56,
57]. They have been tested in numerous practical applications. However, they will be expanded to include current state-of-the-art object tracking methods based on neural networks. It is worth remembering that data about the visual system are not provided only by the eye itself. Such data can be obtained from the electrical potentials of the eye muscles. Furthermore, processes occurring in the eye, such as blinking, provide many other valuable clues about this organ. Below, we present one of them: blinking.
The authors have some experience in analysing the phenomena of microsleep and blinking in drivers [
58,
59]. In our opinion, data concerning driver microsleep and blinking can be analysed and used to start, steer, and brake vehicles intended for PWSN, as discussed further on in the paper. This phenomenon (closing the eyes for a shorter or longer period) has the potential to be developed for use in vision control systems.
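As an illustration of how such eye closures could be separated from ordinary blinks, the sketch below counts consecutive frames in which no dark pupil region is visible and classifies short versus long closures. It is a minimal sketch in Python with OpenCV; the threshold and frame-count values are assumptions chosen for a 30 fps eye camera, not parameters of the GBB_ET software.

```python
# Minimal blink/closure classifier sketch (assumed thresholds, not GBB_ET values).
import cv2

DARK_THRESHOLD = 50        # pixel intensity below which pixels count as "pupil-dark"
MIN_PUPIL_PIXELS = 150     # fewer dark pixels than this => eyelid closed in this frame
BLINK_MAX_FRAMES = 6       # closure up to ~0.2 s at 30 fps => ordinary blink
COMMAND_MIN_FRAMES = 15    # closure of ~0.5 s or longer => deliberate "command" closure

cap = cv2.VideoCapture(0)  # eye-facing camera
closed_frames = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dark = cv2.inRange(gray, 0, DARK_THRESHOLD)
    if cv2.countNonZero(dark) < MIN_PUPIL_PIXELS:
        closed_frames += 1                          # pupil not visible: eyelid presumably closed
    else:
        if 0 < closed_frames <= BLINK_MAX_FRAMES:
            print("ordinary blink")
        elif closed_frames >= COMMAND_MIN_FRAMES:
            print("deliberate closure (candidate control signal / microsleep check)")
        closed_frames = 0

cap.release()
```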
Although the problem of using vision techniques to process visual characteristics is not essentially new, their implementation for the control of off-road vehicles for PWSN is. This approach attempts to analyse the possibility of using visual characteristics to steer an off-road vehicle intended for PWSN, one which typically negotiates terrain that is unfavourable from the point of view of motion control: uneven, with large elevation differences, often narrow, and effectively reducing vehicle ground clearance. In contrast, the studies found in the literature mainly concern the movement of AVs on public roads, which are even and generally straight, with one exception being the subject of the DARPA project. Unlike autonomous vehicle control on public roads, our study benefits from the typically low traffic density on the mountain trails used by PWSN.
Another important aspect of the problem is the development of an entire control system to be installed on board the vehicles intended for PWSN at the lowest possible cost compared to the total vehicle price. Another novelty is defining the vehicle’s control logic using characteristics obtained from the organ of vision. The method of their processing as well as selective and sequential use is specifically defined to achieve effective and safe vehicle control. In a broader sense, controlling such a vehicle also entails operating its subsystems using the organ of sight, at least as far as this is possible. However, across the whole process, one should keep in mind that the problem at hand concerns a vehicle driven and operated by people with special needs, who must be provided with a higher level of safety.
The total cost of the prototype devices engineered under the research project in question varies, while still being low across all cases and amounting to several hundred PLN. As of 29 September 2025, the EUR to PLN exchange rate is 4.29. The cost of a reference device available on the market and intended for commercial use, e.g., Tobii Eye Glasses, also used over the course of the study, is approx. PLN 130,000, which exceeds that of even expensive cycles for PWSN [
The same can be said about the cost of the ET solutions from the now defunct company SMI (which supplied the most popular mobile ET solutions over the last dozen or so years). When intended to be used to control the movement of vehicles for PWSN, such glasses must additionally be programmed using an SDK. Furthermore, the operation of the ET glasses is no simpler than that of the devices we propose.
Essentially, there were several prototype variants studied, all based on the measurement of vision characteristics using small cameras mounted opposite the right and/or left eye, or possibly in front of the whole face. The eye tracker of our design is known as People GBB_eye_tracker (GBB_ET). The cameras or their sub-systems, as presented in this paper, are prototypes and are typically installed on jeweller’s goggles or in other systems enabling their flexible mounting on the driver’s head. They can be incorporated into a GoPro-type set or mounted on a cap resembling a jeweller’s headpiece (or goggles) with a visor (including an add-on scene camera on top of the eye monitoring camera). The jeweller’s type goggles or headpiece are devices designed for precision work, featuring a frame enabling a magnifying glass to be fitted. In the prototype version, a high-speed camera (120 fps) is often mounted on special glasses. The cameras used in the prototypes operate at frame rates ranging from 15 through 30 to 120 fps. The resolutions of the cameras typically used in practice range from 1280 × 720 to 1920 × 1080 pixels (although the capabilities of the cameras used in this area are much greater, reaching 5–12 Mpx).
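For orientation, the snippet below shows how one such camera might be opened and configured in software; OpenCV’s VideoCapture interface is assumed here (OpenCV is used elsewhere in the project), and the specific resolution and frame-rate values are illustrative rather than prescribed.

```python
# Illustrative capture configuration for an eye-facing camera; the resolution
# and frame rate are example values, and the camera may clamp them to what it
# actually supports.
import cv2

cap = cv2.VideoCapture(0)                 # index of the eye-facing camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 120)

# Report what the camera actually delivers.
print(cap.get(cv2.CAP_PROP_FRAME_WIDTH),
      cap.get(cv2.CAP_PROP_FRAME_HEIGHT),
      cap.get(cv2.CAP_PROP_FPS))
cap.release()
```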
The first of the prototypes in question is based on a single camera installed in front of the eye being monitored in the PWSN vehicle driver (equivalent to a monocular). The idea is that the driver’s field of view should be obscured by the camera as little as possible. Ultimately, the cameras are to be built into the frames of the goggles, as is the case with the commercial devices from SMI (originally a German company based in Berlin, acquired in 2017 by Apple, Cupertino, CA, USA) or Tobii (Danderyd, Sweden) [
61]. A variant of this prototype is a group of devices built using a camera installed centrally in front of the driver’s eyes on a GoPro-type boom. This kind of prototype features one or two independent cameras, covering the field of view of both eyeballs or the entire face. Further variations in these prototypes are devices mounted on the jeweller’s goggles or headpiece with a visor. The headpiece reduces reflections, while the visor is used to fit a separate scene-oriented camera. The scene camera monitors the area in front of the vehicle. The device with a scene camera is similar in terms of functionality to commercial glasses designed for eye tracking. Such a camera, observing the area in front of the vehicle, paired with the headpiece with two cameras fitted at the same time—one for eye movement tracking and the other monitoring what is in front of the vehicle—provides functionality like that of a conventional eye tracker. Regardless of the foregoing, a situational (silhouette type) camera, as it is typically referred to, is mounted on board the vehicle for PWSN, supporting analysis of the driver and monitoring the driver’s entire silhouette or only the head. The PWSN vehicle itself may also be equipped with a scene camera, independent of the driver’s scene camera (both providing two, often disparate, perspectives of the traffic scene). Two independent scene cameras also make it possible to separate the visual data used for control purposes. In the case of the prototypes presented in this article, the entire setup of such a system may seem complicated. In the commercial version, all the gear will be simplified as much as possible and integrated into the vehicle.
Figure 2 shows the general layout of the prototype measuring equipment in use against the background of the vehicle for PWSN and the driver’s face.
Figure 2a shows the layout of the vehicle’s cameras. The traffic scene camera, also known as the vehicle foreground camera (1), is mounted on the structural frame, at the highest point of the vehicle. The vehicle frame dimensions can be adjusted to the driver’s height. The camera or the system of cameras (2) for gaze monitoring is mounted on the driver. The silhouette camera (3) is installed on a special boom attached to the support frame.
Figure 2 also shows the seat belt system used to ensure the vehicle driver’s stability.
Figure 2b presents the spatial arrangement of individual cameras in relation to the driver’s face. Driver monitoring can cover the organs of vision, head, silhouette, and traffic scene. Procedures for detecting the biological and behavioural characteristics of a vehicle driver can be divided into four main groups: eye motion, face motion, body motion, and scene object detection, only the first two of which have been discussed in this article. In terms of the control of the vehicle for PWSN, one should distinguish between steering—meant as the operation of setting the direction of the vehicle movement and changing its speed—and the handling of other vehicle sub-systems. In the future, vision-based operation of certain PWSN vehicle sub-systems is also envisaged (employing the concept proposed). Essentially, the concept in question involves controlling the vehicle movement without the use of limbs. In this context, several vehicle control scenarios can be defined. The first, and most difficult to accomplish, consists of using only the analysis of movement and the visible position of the eyes, or more precisely—the pupils. The data utilised according to this approach is the information about fixations, saccadic eye movements, and the pupil size. In other words, once the data have been adequately converted, one extracts information about the points in the traffic scene where the driver’s gaze is fixed for a certain time (fixations) and the movements between these points (saccades), although both kinds are of completely different nature. Additional control information is derived from whether or not the driver closes their eyelids (blinking). The fact that blinking is used for vehicle control has been described further on in the paper with reference to the authors’ empirical observations of this process [
62]. The second scenario concerns supporting the characteristics of the sight organ with additional information retrieved from the peribulbar area, where suitable temporary markers (e.g., direction arrows) are placed. In this case, the marker itself can be a pictogram and contain information used to control the vehicle, e.g., concerning the direction of movement or speed control (up or down). The marker can be placed either on the eyelid or elsewhere near the eyes. For example, squinting the eye brings the marker into the camera’s field of view. The third scenario for vehicle control involves the use of an additional camera to monitor the head area, where facial gestures and/or whole head movements are also used for vehicle control. The fourth case concerns a situation where a general/situational (silhouette) camera placed on the PWSN vehicle frame covers the driver’s torso. With such a system layout, the driver can use upper limb movements or hand gestures to control the vehicle. This article addresses the first three scenarios. However, all four cases of control based on the driver’s characteristics are illustrated in
Figure 3.
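Staying with the first scenario for a moment, fixations and saccades can be separated from the stream of pupil-centre coordinates using a dispersion-threshold approach. The sketch below is illustrative only: the dispersion and duration limits are assumptions, not values taken from the GBB_ET software, and the input is assumed to be pupil centres in pixel coordinates sampled at the camera frame rate.

```python
# Illustrative dispersion-threshold (I-DT) fixation detection over pupil-centre
# samples; max_dispersion (pixels) and min_samples (frames) are assumed values.
def detect_fixations(points, max_dispersion=15.0, min_samples=8):
    """points: list of (x, y) pupil centres sampled at the camera frame rate.
    Returns a list of (start_index, end_index, centroid) fixations."""
    fixations = []
    start = 0
    while start < len(points):
        end = start + min_samples
        if end > len(points):
            break
        if _dispersion(points[start:end]) <= max_dispersion:
            # grow the window while the samples stay tightly clustered
            while end < len(points) and _dispersion(points[start:end + 1]) <= max_dispersion:
                end += 1
            xs = [p[0] for p in points[start:end]]
            ys = [p[1] for p in points[start:end]]
            fixations.append((start, end - 1, (sum(xs) / len(xs), sum(ys) / len(ys))))
            start = end
        else:
            start += 1          # samples between fixations are treated as saccadic
    return fixations

def _dispersion(window):
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))
```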
Figure 3 shows the potential reference systems for the vision control system. The first (
Figure 3a) is based solely on eye tracking. The source of data is solely the eyeball. The second relies on the use of additional markers in the eye socket area (
Figure 3b). Such markers are placed on the eyelids or around the eye socket to obtain additional data for control. The third takes the driver’s head orientation into account, while the fourth one considers the orientation of the driver’s entire body. There are special systems on the market which are dedicated to driver silhouette modelling, available at prices starting around EUR 100–200. These are called pose estimation systems, wherein the head is represented by “five to seven joints”, and several dozen additional points from the rest of the body can serve as references for it. At this stage of the research, however, we have not yet used such stereoscopic camera systems.
The choice of the procedures used to control vehicle movement as well as of the on-board devices to be installed, as discussed in this article, ultimately depends on the driver’s health condition, the algorithm’s efficiency, and—from a commercial perspective—the final price of the entire control system. It has been assumed that the control system will be designed in such a way as to increase the accessibility of this form of (mountain) tourism for people with special needs. Nevertheless, all the prototypes presented in the paper cost relatively little vis-à-vis the entire construction of a regular cycle, the estimated price of the former being a maximum of several hundred PLN. Therefore, the cost of the most advanced PWSN vehicle control system will not exceed 5–10% of the market price of the available vehicles for PWSN. This will allow even the most excluded group of users to control such vehicles. The process of monitoring the vehicle driver’s eyes is effectively based on several approaches. One involves vehicle control using information retrieved from a single eye. Another entails using data from both eyes, which is much easier, and can be achieved using one or two cameras, depending on the mounting solution (
Figure 4). There is also an approach which considers the fact that it is difficult to separate the signals for control purposes in the first two cases, and so it assumes that some additional information is to be used.
Figure 4 shows views from a camera monitoring the eyes of a PWSN vehicle driver. The choice of the optimal headset mounting method has not yet been investigated.
Figure 4 illustrates some real-life problems encountered when attempting to keep track of the driver’s gaze. Firstly, the eye can be moved in the direction of an extreme end position with a large amplitude of movement (
Figure 4b). Moreover, this may be accompanied by a sudden turn of the entire head (
Figure 4a). Observation of only one eye does not consider the differences between the two eyeballs. Even if this is not visible, one may have a slight squint (
Figure 4b). The characteristic features of an individual can make it difficult to record extreme eye positions (
Figure 4c). These three examples, selected from a wide range of problems related to eye position monitoring, show that this is by no means a simple issue.
Different measuring device prototypes were used over the course of the study. The precursor of the measuring equipment was a 2018 device (a video game controller) based on measuring the positions of three IR diodes. This device served to control the position of a moving object in a 3D simulator game. This was a popular control method in the 1990s and early 2000s, especially in computer games and simulators. Another device was a prototype based on a HAM radio kit with a Logitech C270 camera (Logitech, Lausanne, Switzerland) (
Figure 5a). This prototype was the first to apply procedures representing the group of vision techniques to control the pupil position using the OpenCV library [
63].
The following
Figure 5b,c illustrate the successive stages of development of the measuring devices used for gaze monitoring in drivers of vehicles for PWSN. Inexpensive off-the-shelf components are typically used to mount the prototypes. In the future, these will be 3D-printed components. The sets shown in
Figure 5a run on Windows, while the others run on the Raspbian operating system (Raspberry Pi Holdings, Cambridge, UK), currently Raspberry Pi OS. Various types of cameras were used in the prototypes, both with and without the autofocus feature, for daytime and night-time operation (also without an IR filter). Most of them are cameras representing what is commonly referred to as the spy camera category, i.e., units mounted without a PCB (Printed Circuit Board). In such cases, the camera’s electronic circuits are mounted directly on the Camera Serial Interface (CSI) tape. This gives the driver a better field of view and obstructs the traffic scene less. Additionally, the small size of the camera head (8 × 8 mm), in the commercial version, enables the device to stay hidden in the frame of the glasses. High-speed cameras operating at frequencies above 100 fps (up to 300 fps, which is faster than the popular commercial eye trackers available on the market) are currently being tested. The first (and the largest) of the cameras, shown in
Figure 5a, is representative of typical webcams. The Logitech C270 HD provides 720p resolution at 30 fps. The other cameras depicted in
Figure 5 are devices based either on different CSI connectors or on USB (enabling operation under Windows, and only with native drivers). Another camera, the OV5693 5 Mpx manufactured by OmniVision Technologies, is compatible with Raspberry Pi (Raspberry Pi Holdings, Cambridge, UK) and Jetson Nano (Nvidia, Santa Clara, CA, USA) [
64]. The cameras mentioned further on are mainly intended for Raspberry Pi Zero. These include the ZeroCam OV5647 5 Mpx night vision camera, the ZeroCam OV5647 5 Mpx with a fisheye lens and a field of view of up to 160 degrees, and finally the smallest model—the OV5647 5 Mpx, known as a spy cam. The latter camera features a flexible cable without a PCB, which enables convenient installation in the field of view of the PWSN vehicle driver. The last unit tested, with the catalogue number OV5640 5MP, represents USB cameras. These can also be used under the Windows OS, making it possible to process more complex video footage (on PCs with greater computing power). Where this is the case, images can be processed by the PWSN vehicle’s on-board computer. In some of the cases addressed in the study, the images of the eye/eyes are processed by the Raspberry Pi minicomputer systems. Where these computers do not have enough capacity, the video footage is processed under Windows 11. The following hardware platforms were used in the research: Raspberry Pi Zero 2 W, 512 MB RAM, 4 × 1 GHz, WiFi, Bluetooth, and Raspberry Pi version 4 B with 8 GB of RAM. The latter minicomputer was powered by the Broadcom (San Jose, CA, USA) BCM2711 quad-core 64-bit ARM Cortex-A72 (ARMv8) processor running at 1.8 GHz. It features a dual-band 2.4 and 5 GHz WiFi unit, Bluetooth 5/BLE, and an Ethernet port with speeds up to 1000 Mbps [
65,
66]. It was a deliberate choice to use minicomputers enabling wireless connectivity for the communication in the PWSN cycle prototype system.
The Dell 5070 computer powered by the Intel Core i3-8100 CPU operating at 3.60 GHz with 32 GB of memory and the Microsoft Surface Laptop Go 2 platform powered by the Intel Core i5-1135G7 processor were used for video analysis under Windows. Although the OpenCV library can be installed on the Raspberry Pi Zero, due to its severely limited computing power (merely a few GFLOPS, ca. five), it was mainly used for real-time image acquisition. Models from the 4B upwards (8–35 GFLOPS) are better suited for processing, although their significantly larger dimensions limit their suitability for mobile projects [
67]. However, the discussion on further integration of the measuring system goes beyond the thematic scope of this article.
8. Software
In the GBB_ET software package, measurement applications designed for the analysis of eye characteristics can be fundamentally broken down by the camera’s field of view and the objects that can be analysed within that field. Accordingly, individual programmes can be divided into those which analyse eye movement (eye movement detector) and the entire face (face detector), the latter often being combined with eye analysis. This is because eye detection takes place in individual iterations related to face detection. Another application is a body detector, which analyses the movement of the entire body, and particularly that of the lower and upper limbs, including gestures made with the head or hands.
The eye position analysis programme comprises two procedures. One is based on Haar-like features, i.e., it finds the position of the eyes based on the information collected. This is possible owing to standard OpenCV library procedures. The Haar procedure enables face detection using Haar cascades, as they are commonly referred to. They provide an effective way of detecting faces, although they are relatively inaccurate (but fast). The algorithm functions by comparing the image with pre-defined features. The input data for the detection procedure are contained in XML files of the haarcascades.xml type. They determine the accuracy of the procedure. OpenCV and other providers supply ready-to-use files which can serve various purposes, including recognition of selected features. According to the procedure in question, these features are grouped in what are known as cascades, and they are compared in steps. In the first step, only a few of the most important features are checked. If there is a match, the procedure checks the next group, i.e., the cascade which is quantitatively larger, then the subsequent one, and so on, hence the high processing rate. If the comparisons of all the features are correct, what follows is identification, which is provided in the form of a rectangular area where the object subject to recognition has been identified. This procedure is also used to recognise road signage when a cycle for PWSN is used to navigate the road network.
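A minimal sketch of this cascade-based detection step, using the stock OpenCV haarcascade XML files, is given below; the GBB_ET package may rely on its own or updated cascade files, so the file names and parameters shown here are illustrative assumptions.

```python
# Face-then-eye detection with OpenCV Haar cascades; the detected face rectangle
# serves as the spatial reference for the eye search, as described in the text.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_eyes(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    eyes_found = []
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        roi = gray[fy:fy + fh, fx:fx + fw]                 # search for eyes only inside the face
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5):
            eyes_found.append((fx + ex, fy + ey, ew, eh))  # rectangle in full-frame coordinates
    return eyes_found
```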
As the research has revealed, although very fast and not requiring advanced (time-consuming) processing of additional image frames, the Haar procedure is not accurate. However, its precision can be improved in many ways, e.g., by changing (updating) the *.xml feature files, or by developing one’s own files based on the characteristics of known vehicle end users with special needs. A diagram of such a procedure is shown in
Figure 12. According to this approach, such features would be constructed based on the PWSN cycle users, thus increasing the algorithm’s precision. The problem of the accuracy of the methods in question is discussed at the end of the article. The second method requires more operations to be performed on the image but is much more accurate. Under this procedure, the image is first colour-converted. Next, colour thresholding (elimination of colour variability) takes place in the colour space of a given format. It is assumed that one is looking for the colour black in the frame, which corresponds to the pupil of the eye (ignoring health conditions that may affect this procedure, which pertain to ca. 1% of the population). Once the image has been processed accordingly, assuming that the eye has been correctly framed, only the pupil will remain. Next, its contour (pupil) is found using standard OpenCV library procedures. The actual centre of the pupil is thus established, typically with much more precision than the Haar procedure would otherwise provide. Unfortunately, as tests have shown, the second procedure is susceptible to various types of artefacts in the frame image, such as reflections from car windows, affecting the recorded colour space. This undesirable phenomenon is eliminated by the application of filters on glasses (as in a conventional ET device) or by using goggles mounted on jeweller’s caps with visors (
Figure 5b,c). The second procedure evidently requires more intermediate steps when using the OpenCV library (more time, which can be critical in the process of control of a vehicle for PWSN).
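For illustration, the second procedure can be sketched as follows: the eye region is converted to greyscale, thresholded for the dark pupil, and the centre of the largest remaining contour is taken as the pupil centre. The threshold value below is an assumption for the sketch; in practice it would come from calibration, and the reflection artefacts mentioned above would still need to be handled.

```python
# Sketch of the thresholding-and-contour procedure for locating the pupil centre.
import cv2

def pupil_centre(eye_roi_bgr, dark_threshold=40):
    gray = cv2.cvtColor(eye_roi_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7, 7), 0)                 # suppress small glints/reflections
    _, mask = cv2.threshold(gray, dark_threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                                          # eyelid closed or pupil lost
    pupil = max(contours, key=cv2.contourArea)               # largest dark blob assumed to be the pupil
    m = cv2.moments(pupil)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```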
Information about the driver’s intentions is conveyed not only by the eyes, but also by head and body movements (this article only touches upon the characteristics obtained in the visible light spectrum). For this purpose, detection is also conducted using a vehicle camera, known as a situational/silhouette camera, as well as procedures for detecting the driver’s face and its elements. It should be mentioned that eye detection procedures are generally applied in conjunction with face detection procedures, as the detected face provides a spatial reference for the eye detection procedures. Nevertheless, in line with our approach, these procedures are sometimes combined, while being separated at other times, in different modules of the GBB_ET software package, depending on the system configuration. The foregoing has been schematically illustrated in
Figure 13.
Figure 13b clearly shows that the procedure can identify many objects incorrectly (in this case, the corner of the mouth is interpreted as a potential eye location). Therefore, it requires further modification, the results of which have been shown in
Figure 13c. In these two cases, an extension of the detection scheme is the detection of the entire body of the PWSN vehicle driver or of individual limbs, both separately and in pairs. Using this approach (and from such a perspective), it is also possible to detect gestures made with hands (sensors dedicated to microcontrollers can also be used for this purpose: Arduino (Monza, Italy), Raspberry Pi, STM (Plan-les-Ouates, Switzerland), etc.). The authors have some experience in the use of LoRa networks for the navigation of vehicles intended for PWSN. People who are incapable of operating wheelchairs should be offered the possibility of their remote control by pilots/operators via this network (a kind of virtual guidance). Another option is to control the PWSN vehicle using a wireless LoRa network dedicated to PWSN and specifically designed by the research team. In such cases, the PWSN vehicles would be remotely controlled along a route by guides assigned to persons with special needs. Navigation using the LoRa network for PWSN has been described in a separate publication [
72]. Such remote navigation can be conducted from distances reaching up to several kilometres, while the person using the PWSN vehicle always remains under the guide’s strict oversight.
The purpose of the software operating on the recorded image of eye movements is to develop steering and control signals for the PWSN vehicle, i.e., to determine when and where the vehicle should go left, right, or forward, start, stop, or reverse, etc. It is also to set specific objectives for the drive controller, transferred to and implemented as clear messages for the PWSN vehicle’s drive unit(s) (it has also been assumed that this measurement system is to be adapted to other mountain cycles for PWSN). Such messages as go, stop, brake, accelerate, go left/right, or turn lights on/off are intended to reflect the driver’s intentions, communicated via the organ of vision, potentially supported by markers and gestures, including facial expressions.
Figure 14a,b shows the messages assumed for specific pupil positions read while driving (complemented by the situational context, e.g., resulting from GPS data, and data from other sources). As the pictures demonstrate, the position of the pupils can be determined for each direction of movement within a certain range defined in the calibration procedure. Thus, a left turn can be defined within a range from slight to medium to full. In subsequent studies, we intend to develop a fuzzy logic system for these purposes. The same applies to moving to the right. Upward and downward eye movements mean acceleration and deceleration, respectively (it should be remembered that the vehicles are equipped with electric motors, generating high torque instantaneously, causing dynamic acceleration). Some of the tests have been performed in infrared, but apart from taking the specific lighting conditions of the traffic scene into account, they have not made much contribution to the research.
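By way of illustration, the mapping from a calibrated pupil position to the messages listed above can be sketched as follows; the dead zone, the thresholds, and the three-level gradation of turns are assumptions standing in for values obtained during the calibration procedure.

```python
# Hedged sketch of translating a calibrated pupil position into drive messages.
def command_from_pupil(dx, dy, dead_zone=0.1):
    """dx, dy: pupil displacement from the calibrated centre, normalised to [-1, 1].
    Negative dy is assumed to mean an upward eye movement in image coordinates."""
    if abs(dx) <= dead_zone and abs(dy) <= dead_zone:
        return "KEEP_COURSE"
    if abs(dx) >= abs(dy):
        level = "SLIGHT" if abs(dx) < 0.4 else "MEDIUM" if abs(dx) < 0.7 else "FULL"
        return f"TURN_{'LEFT' if dx < 0 else 'RIGHT'}_{level}"
    return "ACCELERATE" if dy < 0 else "DECELERATE"
```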
Figure 14 shows that during vision analysis, contextual information about the functioning of this organ is available.
One more problem to consider is that of torsional movements, which are best read using EMG electrodes installed on the temple.
Therefore, using only this form of PWSN vehicle control (i.e., eye steering) can be dangerous. Another issue is the separation of pupil movements from involuntary movements or from those related to observing the traffic scene (more broadly) in the natural environment, since this is precisely the reason why mountain trails are used, after all. Consequently, the said procedures have been modified by introducing additional information and a system of two cameras or a face camera monitoring both eyes (mounted on a GoPro-type support). Movement to the left is achieved by closing the right eye for a specific period of time (as a precaution when changing direction), with the required duration depending on the current blinking characteristics measured on an ongoing basis, followed by briefly glancing in the direction of movement or, with the vehicle’s situational camera in use, turning the head slightly to the left. To turn in the other direction (right), one closes the left eye and, if possible, performs a short head motion to the right. Forward movement is commanded by moving the eyes upwards, possibly accompanied by a head movement or with a specific eye closed (this being a matter of configuration of the file containing settings for the adopted mode of control). When steering the vehicle, one should keep the eyes in a given position for longer. The health-related context of this approach has not been studied, but according to the literature, intensive activity in terms of mobility and visual acuity should also promote the rehabilitation of the organ of sight; this is an additional benefit arising from this solution. Slowing the vehicle down correspondingly involves moving the eyes downwards, with the head moving downwards as well, if possible, and with one eye closed. To separate these signals from the motions performed in relation to nature observation, an additional gesture can be made, e.g., the free hand raised upwards, communicating such an intent in the field of view of the situational camera. The choice of the decision-making hand is determined by the functionality of the driver’s dominant hand and the corresponding installation of the movement manipulator on the left or right side of the PWSN vehicle (
Figure 1a,b).
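A simplified sketch of how an intentional eye closure can be separated from a natural blink is given below; the fixed hold time is an assumption, whereas the actual system adapts this threshold to the blinking characteristics measured on an ongoing basis.

```python
# Illustrative sketch only: a turn command is accepted when one eye stays
# closed for longer than a natural blink would plausibly last.
import time

class TurnIntentDetector:
    def __init__(self, hold_seconds=0.8):
        self.hold_seconds = hold_seconds            # assumed minimum closure time
        self._closed_since = {"left": None, "right": None}

    def update(self, left_eye_open, right_eye_open, now=None):
        now = time.monotonic() if now is None else now
        for eye, is_open in (("left", left_eye_open), ("right", right_eye_open)):
            self._closed_since[eye] = None if is_open else (self._closed_since[eye] or now)
        # Closing the right eye signals a left turn and vice versa (see text).
        if self._held("right", now):
            return "TURN_LEFT"
        if self._held("left", now):
            return "TURN_RIGHT"
        return None

    def _held(self, eye, now):
        start = self._closed_since[eye]
        return start is not None and (now - start) >= self.hold_seconds
```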
The diagram below represents a simplified model of control of the vehicle in question. It is noteworthy that, in this context, the vision control system can be divided into the part monitoring the area in front of the vehicle using a camera and the part based on a Lidar system. The data retrieved from the eye-based control subsystem are superimposed on this information, thereby limiting the steering capabilities that would otherwise be based solely on the control data obtained from the organ of vision. Such a limitation is implemented so that the target direction of the vehicle’s motion will cause no traffic safety issues. This is also controlled by accelerometer and GPS systems. Additionally, the GPS and navigation subsystems make use of GIS data. To this end, most of the mountain trails located in two Polish provinces were surveyed for this project.
As of now, the most unreliable of the subsystems is the one which uses the data extracted from the driver’s blinking process. Unfortunately, this problem requires further time-consuming and cost-intensive research. The control (drive control, CD) can be formulated as an ordered seven-tuple (a minimal data-structure sketch follows the list of its components below):
CD = (ACC, ET, GPS, VC, GIS, ENG, LID),
where
ACC—data from the accelerometer system,
ET—data from the eye tracker system,
GPS—data from the GPS system,
VC—data from the scene (situational/silhouette) camera,
GIS—selected data from the GIS system,
ENG—current data from the drive system,
LID—data from the LIDAR system.
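A minimal sketch of CD as a data structure is given below; the field types are placeholders, since the message formats of the individual subsystems are not specified in this article.

```python
# Sketch of the ordered seven-tuple CD as an immutable record.
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class DriveControl:
    acc: Any   # ACC - accelerometer system data
    et: Any    # ET  - eye tracker system data
    gps: Any   # GPS - GPS system data
    vc: Any    # VC  - scene/silhouette camera data
    gis: Any   # GIS - selected GIS data
    eng: Any   # ENG - current drive system data
    lid: Any   # LID - LIDAR system data
```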
A detailed vehicle control system will be presented in the next article, including the application of fuzzy logic. This approach stems from the different ranges of observed eye position variations in the context of a mountain trail.
Figure 15 shows a simplified control model.
9. Results
The article provides a discussion of the eye control device in question, its implementation, and pilot test results. The results have been compared with the characteristics obtained using relatively more expensive eye-tracking solutions (approximately three times the cost). The results are presented in the context of laboratory tests of the proposed prototype. These results are continuously updated due to the magnitude and nature of the study.
Figure 16 presents selected characteristics of the vehicle driver’s eye movements obtained using the procedures described above.
Figure 16a shows the distribution of fixation points on the traffic scene, typical of the ET technique. Every point in that graph corresponds to a specific measured position of the eye.
Figure 16b,c illustrate the movement of saccades (broken down into two axes), where successive eye movements to the left and right were recorded and accumulated between footage frames.
Figure 16d contains the statistics of left and right movements as well as of the zero position (central) fixation of the eye. The data provided in
Figure 16d indicates how complex it is to obtain information about the movement direction based on data retrieved from the driver’s sight. Such studies, apart from providing the possibility of applying their results to control the turning behaviour of a vehicle, also often show which eye is dominant. However, these characteristics were not used to analyse the upward and downward deviation of the pupil (as shown in
Figure 17c).
Figure 17 illustrates the data obtained from the authors’ own measuring device, which is not a commercial eye tracker. It presents a selection of the eye movement characteristics obtained.
Figure 17a represents eye movement speed, which, however, was not converted to the traffic scene dimensions, but to those of the screen frame.
Figure 17b, on the other hand, depicts eye acceleration.
Figure 17c indicates the angle, within the contour circumscribed about the eye socket, that corresponds to the combined position of the eye along the x and y axes.
Figure 17d is a diagram which represents the pupil diameter. As aforementioned, the pupil itself is extracted from the eye image via a series of colour transformations performed on the image.
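For illustration, characteristics of this kind can be derived from the sequence of pupil centres and the camera frame rate as sketched below; this is an assumption-laden outline rather than the processing actually used to produce Figure 17, and the pupil diameter itself comes from the thresholded pupil blob discussed earlier.

```python
# Sketch: velocity, acceleration, and a combined-position angle derived from
# per-frame pupil centres (in frame pixels) and a known frame rate.
import numpy as np

def eye_kinematics(centres_px, fps):
    """centres_px: array of shape (N, 2) with pupil centres per frame."""
    c = np.asarray(centres_px, dtype=float)
    dt = 1.0 / fps
    velocity = np.linalg.norm(np.diff(c, axis=0), axis=1) / dt   # px/s
    acceleration = np.diff(velocity) / dt                        # px/s^2
    dx, dy = c[:, 0] - c[:, 0].mean(), c[:, 1] - c[:, 1].mean()
    angle = np.degrees(np.arctan2(dy, dx))   # combined x/y position expressed as an angle
    return velocity, acceleration, angle
```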
In the future, these characteristics will make it possible to identify the points of gaze fixation in the traffic scene (identify objects in the traffic scene). This can be achieved by using the measurement system shown in
Figure 9a,b. It is a device which features synchronised cameras: one for the traffic scene, focused on the area in front of the PWSN vehicle, and one for monitoring the driver’s eye movement. It is through the integration of such sub-systems that one can create a prototype with functionality like that of the typical commercially available ETs. The work pertaining to this problem is still ongoing. The dual-camera systems were controlled from a single computer with a common time base, set using a tool of appropriate accuracy (requiring synchronisation with the GPS control unit), and synchronised by frame [
73]. Over the course of the analyses, it was noticed that the Haar procedure was notably inaccurate in pupil positioning. Significantly better results were obtained using image colour operations and contour detection. On the other hand, the latter (colour-based) procedure is more susceptible to external conditions during measurements. Therefore, the accuracy of both procedures was compared with the known centre position of the eye.
Figure 18a–c concern one test subject, while
Figure 18d–f apply to another person (for comparison): the former being a young (22 years old) person with no vision impairments, and the latter being a person over 50 years of age with significant vision impairments.
Figure 18a shows the deviations between the two methods in terms of the pupil positions measured. It clearly shows that larger deviations are observed on the y-axis (vertically in the frame), while smaller ones are observed on the x-axis (horizontally in the frame). For the first person, most of the horizontal deviations revealed between the methods were up to a few pixels (a result almost as good as that of professional eye trackers). Vertically, the differences reached several dozen pixels. For the second person, most of the horizontal deviations between the methods in question came to several dozen pixels, with high variance. Vertically, the differences also reached several dozen pixels, but they were much smaller than in the horizontal plane, also with high variance.
Figure 18b,c,e,f show the distributions of the pupil centre locations calculated using both methods. The differences are significant, but it seems that correction factors could be introduced into the Haar method. Alternatively, one can improve the feature description files at the algorithm’s disposal or develop better datasets.
Table 1 summarises the statistics obtained for both people and methods.
The table above is provided for reference purposes only; it was compiled from tests involving the project team members. It is intended to illustrate the problem of the high variability of the observed statistics. This variability cannot be explained solely by the accuracy of the device; it also results from other factors, such as the personal circumstances of the drivers, weather, and lighting conditions. As can be seen from the data included in
Table 1, the deviations on the Y axis are excessive. This is particularly problematic because this is the axis with the smaller observed range of motion. According to the literature, the typical horizontal field of view is approximately 180 degrees, and the vertical field is approximately 130 degrees. These field-of-view values may vary slightly due to individual parameters such as the field of view of a single eye, defects, ambient lighting, diseases, etc. In the context of this study, this is important because controlling the lateral axis of movement is easier than controlling the forward axis, unless the axis is reversed, which is inconsistent with the intuitive sense of space observed in the population. Therefore, this issue requires further detailed research, including in trail conditions, where additional stimuli cause significant differences in the observed ranges of movement. The table is therefore only an illustration of a significant problem in this matter.
The accuracy of the methods can also be tested by studying whether or not they ensure correct identification of the central eye position. Such tests are conducted using a static image of the eye or with a dummy featuring a display showing different eyes of different people in sequence. To this end, an eye database was developed and loaded into the pupil centre calculation procedures discussed in this article. This made it possible to measure the deviation of the actual eye position from the one established by measurements performed in line with a given method [
74].
The accuracy of the pupil centre identification algorithm was tested with reference to a collection of static stock-type images where a series of faces and eyes were collated to create a video [
74]. For each face, the deviation of the measured pupil position from the actual eye centre was recorded (the pictures corresponded to one another in this respect). Unfortunately, the measurement error reached several dozen pixels, which represented approximately 2%. The accuracy of the calculations depends on the test databases used in relation to the feature file, such as haarcascade_righteye_2splits.xml, haarcascade_eye.xml, or a similar file. It is best to prepare and customise such a file based on a selection of actual vehicle users with special needs.
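A minimal sketch of such an accuracy test is given below: for each annotated image, the centre of the detected eye region is compared with the ground-truth centre stored in the database. The file paths and the annotation format are hypothetical.

```python
# Sketch of measuring the pixel deviation of a cascade-based eye detection
# from an annotated ground-truth centre.
import cv2
import numpy as np

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def centre_deviation(image_path, true_centre):
    """true_centre: (x, y) annotated eye centre in pixels (annotation format assumed)."""
    grey = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detections = eye_cascade.detectMultiScale(grey, 1.1, 5)
    if len(detections) == 0:
        return None
    # Take the detection whose centre lies closest to the ground truth.
    centres = [(x + w / 2, y + h / 2) for (x, y, w, h) in detections]
    errors = [np.hypot(cx - true_centre[0], cy - true_centre[1]) for cx, cy in centres]
    return min(errors)   # deviation in pixels
```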
Every human eye is only broadly similar to others; this problem is addressed by biometrics. However, we are only interested in measures of eye position, which also pose certain issues when being determined. The eye position measurement can be affected by various anatomical features, such as eye folds, individual characteristics (congenital and acquired diseases), or even illumination, the humidity of the environment, etc. This is why we have been consistently building our own database of eye representations, matching diverse environmental conditions and aligned with the diagram provided in
Figure 19. It will be somewhat easier to restrict this database to the people who have consented to the processing of their personal data. Another solution available in this respect is to implement additional calibration procedures to be initiated each time before the vehicle is started.
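One possible form of such a calibration procedure, sketched below under the assumption of a small set of fixation targets recorded at start-up, is a least-squares affine mapping from raw pupil coordinates to a normalised control space.

```python
# Hedged sketch of a per-session calibration: the driver fixates a few known
# reference directions and an affine mapping is fitted.
import numpy as np

def fit_calibration(pupil_px, targets_norm):
    """pupil_px, targets_norm: arrays of shape (N, 2), N >= 3 fixation samples."""
    p = np.hstack([np.asarray(pupil_px, float), np.ones((len(pupil_px), 1))])
    A, *_ = np.linalg.lstsq(p, np.asarray(targets_norm, float), rcond=None)
    return A                      # 3x2 affine matrix

def apply_calibration(A, pupil_xy):
    return np.array([pupil_xy[0], pupil_xy[1], 1.0]) @ A
```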
10. Method Validation
Four validation procedures were applied: with mobile robots, with a commercial eye tracker, with a comparison of data from different cameras, and an analytical one. The validation of the method proposed, i.e., of the capacity to control a moving wheeled object in a designated environment, was conducted under laboratory conditions and on a test track. For this purpose, a group of mobile robots with wireless communication systems was prepared, enabling communication by means of the device’s microcontroller. Some of the robots subject to the tests have been shown in
Figure 20a.
Figure 20b depicts the robot selected for further testing,
Figure 20c shows the robot drive controller, and
Figure 20d shows the vision-controlled robot on the test track.
The chosen robot is described here in terms of its chassis, its separately supplied control and communication systems, and its sensor system. This robot (
Figure 20b) enables the dynamic testing of movement direction changes, which is why it was selected for further testing. The robot was built based on an intelligent chassis of the Robot Chassis series from Waveshare, Shenzhen, China (
Figure 20b). The advantage of the chassis is its shock-absorbing design, which is important for the measurement systems used in validation, especially the optical ones. The chassis is based on Mecanum wheels, a special type of wheel that allows the vehicle to move in any direction, including diagonally (see the kinematics sketch at the end of this paragraph). Such a wheel arrangement enables unconstrained movement in the horizontal plane. The chassis (
Figure 20b) is compatible with the Raspberry Pi 4B, Raspberry Pi Zero, and Jetson Nano microcomputers, although it can also be connected to other development boards. A unit from the same company, the General Driver Board (Waveshare), was chosen as the controller. It is a multifunctional controller designed for the control of mobile robots, based on the ESP32-WROOM-32 module (Espressif Systems, Shanghai, China). This board cooperates with the Raspberry Pi and Jetson Nano platforms (a Raspberry Pi 4B was used in the validation process). The general-purpose controller makes it possible to control DC motors both with and without encoders (in this case, the distances covered were established in a different manner). The board features standard terminals for connecting numerous typical robot components and accessories, such as OLED displays, sensors, WiFi antennas, and sub-assemblies for environment scanning, e.g., Lidar. The device used in the research project in question was a Lidar from the same company (Waveshare, Shenzhen, China), namely the D200 Developer Kit–LiDAR LD14P, offering 360-degree scanning and an operating range of up to 8 m [
75].
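The omnidirectional behaviour of the Mecanum chassis mentioned above can be illustrated with the standard inverse-kinematics relations below; the wheel numbering and geometry parameters are generic assumptions rather than Waveshare specification values.

```python
# Standard Mecanum-wheel inverse kinematics (X-configuration of rollers).
def mecanum_wheel_speeds(vx, vy, wz, lx=0.10, ly=0.10):
    """vx, vy in m/s, wz in rad/s; lx, ly: half wheelbase/track in metres (assumed).
    Returns (front_left, front_right, rear_left, rear_right) rim speeds in m/s."""
    k = lx + ly
    fl = vx - vy - k * wz
    fr = vx + vy + k * wz
    rl = vx + vy - k * wz
    rr = vx - vy + k * wz
    return fl, fr, rl, rr

# Example: pure sideways motion to the right at 0.2 m/s
# -> mecanum_wheel_speeds(0.0, 0.2, 0.0)
```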
Figure 20d shows the robot along with the person who controlled its movements on a test track designed for mountain bicycles. On such a track, overdriving the robot typically causes it to capsize. Nevertheless, this is a good testing ground for the problems at hand.
The robot controller supports 2.4 GHz WiFi, Bluetooth 4.2 BLE, and ESP-NOW wireless communication, and features a 9-axis QMI8658C IMU. Additionally, it has a built-in 40-pin GPIO connector, which is intended for connecting the Raspberry Pi, Jetson Nano, etc., boards. Communication with the system is provided through the serial/I2C port, which will make it possible to connect additional robot tracking and positioning sensors in future research.
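Purely as an illustration of the serial link between the Raspberry Pi and the driver board, a sketch is given below; the JSON command format used here is a hypothetical placeholder, and the board's actual protocol must be taken from its documentation.

```python
# Hedged sketch of sending a drive command over the serial port (pyserial).
import json
import serial  # pyserial

def send_drive_command(port, left, right):
    """left, right: normalised wheel-side speed setpoints; format is hypothetical."""
    with serial.Serial(port, baudrate=115200, timeout=1) as link:
        msg = {"cmd": "drive", "left": left, "right": right}   # placeholder protocol
        link.write((json.dumps(msg) + "\n").encode("ascii"))

# Example: send_drive_command("/dev/ttyAMA0", 0.3, 0.3)
```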
The results obtained over the course of the robot motion control process performed under laboratory conditions and on a test track have been provided and discussed below. They depend on the mobile robots in use as well as their sensory equipment. The robots operate with different spatial orientation systems, ranging from cameras, limit sensors, and lines to a 2D planar Lidar, as shown in
Figure 20b (a 3D variant will be examined in further research). In order to assess whether the method proposed was correctly validated in relation to the capacity to move along a selected route, the number of contacts between the test robot and barriers, resulting from erroneous control commands obtained from the eyes, was calculated. The number of misinterpreted messages has also been provided. All these results have been collated in
Table 2.
The results are specific to the route driven on the test track. They are also specific to the person conducting the tests, which calls for further research involving a large group of test drivers.
11. Conclusions
The study results that we have obtained provide an incentive to conduct detailed research in order to validate the method for controlling an off-road vehicle designed for people with special needs in the real conditions of selected mountain routes in Poland. Based on these results, obtained both under laboratory conditions and on a test track, it can be concluded that it is possible to control a moving wheeled object using the organ of vision in controlled conditions. Both methods applied to read eyeball positions make it possible to determine the intended direction of vehicle movement (other procedures are currently being tested). However, to ensure reliable data reading, auxiliary data must be used: from a silhouette camera, from other driver gestures (including facial expressions), or potentially from auxiliary markers. In this regard, a fuzzy control system for the PWSN vehicle drive will also be developed. Regardless of the results of further research, for safety reasons (considering mountain trail conditions), the vision-based control prototype must be supported by a GPS system or another system enabling the vehicle to be positioned on mountain trails. It is currently easier to implement such a system for the movement of cycles intended for PWSN in street network conditions on account of the numerous spatial references available. As a side effect of the system’s deployment, the organ of vision is subject to rehabilitation by way of an intentional effort made to stimulate the eye muscles. Further tests will be conducted under natural conditions on selected mountain trails. The database of research participants should also be expanded in order to obtain data representative of a broader population.
Although the OpenCV methods used are not new (some are around 20 years old), they are efficient, fast, and well documented in the literature. The accuracy of eye position measurement achieved with them is sufficient. Other methods, such as YOLO (based on CNNs), will be presented in subsequent publications.
To conclude the above elaboration, it should be highlighted that vision-based control is, in principle, not a safe method of steering any vehicle, due to the characteristics of human sight. The eye is an object which reaches relatively high velocities and accelerations.
Figure 21 illustrates some real-life characteristics of vision measured in vehicle drivers. They are based on the authors’ own data resources, collected on the actual road and rail network (not the data obtained by means of our own eye tracker, as presented in
Figure 17 and
Figure 18 above).
Figure 21a shows how the average velocity of the driver’s eye changes while driving. The data were acquired over the course of the eye tracking tests conducted by the authors in the years 2016–2018. For most of the time, these velocities remained relatively low. Unfortunately, during approx. 0.2% of the recording time, these velocities reached values that were one or even two orders of magnitude higher. These abrupt (sometimes drastic) increases need to be separated and suppressed if such data are to be used for vehicle control purposes. Otherwise, they would be dangerous from the perspective of proper vehicle control. Similar observations apply to the amplitude values presented in
Figure 21b. Even though no extreme peaks are observed in their case, the variability of this characteristic over the course of driving reaches several hundred per cent.
Figure 21c,d show the acceleration peaks and average acceleration, respectively. A certain regularity is observed in the average acceleration values, which cannot be said of the peak acceleration values. These driver vision characteristics demonstrate how difficult it is to ensure that the values retrieved from the organ of vision are adequately limited in order to steer the vehicle effectively. While imposing such limitations, one must take numerous elements into account, including the cross-section of the infrastructure where the vehicle is currently moving and the real-life ranges of eye movement in the context of these data. We believe that engineering a safe vehicle control system based on the organ of vision will make tourism, including mountain tourism, available to people who are currently denied such an opportunity. However, this type of vehicle control model requires a separate publication for a detailed presentation and discussion.
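The separation and suppression of the abrupt velocity increases mentioned above could, for instance, take the form sketched below (median filtering followed by clamping); the window length and the velocity cap are assumptions that would have to be tuned per driver.

```python
# Sketch: remove isolated velocity spikes with a median filter and clamp any
# residual values above an assumed cap before they reach the vehicle controller.
import numpy as np
from scipy.signal import medfilt

def suppress_spikes(eye_velocity, cap, kernel=5):
    smoothed = medfilt(np.asarray(eye_velocity, float), kernel_size=kernel)
    return np.clip(smoothed, 0.0, cap)
```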
The application of the system described in the paper still needs considerable effort. With that in mind, we have also constructed a digital twin of the vehicle for PWSN. It constitutes an independent system dedicated to this vehicle type, based on vehicle position measurements using GPS, accelerometric, and Lidar subsystems. It will prove particularly useful during implementation tests to be conducted on actual tourist and mountain trails [
75]. This system detects any deviations of the vehicle from a safe driving path in real time, and it has been illustrated in
Figure 22.
Monitoring of the area in front of the vehicle using cameras (
Figure 22b,c), when supported by data obtained from a high-speed Lidar, makes it possible to keep track of the position of the vision-controlled robot (the simulation equivalent of the PWSN vehicle) in line with the control signals retrieved from a person simulating the vehicle driver’s behaviour. Additionally, vibroacoustic signals are monitored in such a system using three to eight accelerometers (
Figure 22a). One of the tasks of the cameras covering the area in front of the robot is to keep it on the set route and to verify the parameters received from the eye tracker system.
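The deviation check performed by the digital twin can be illustrated by the cross-track-error sketch below; the safe path is assumed to be given as a polyline with distinct consecutive points, and the tolerance value is an assumption.

```python
# Sketch: distance from the current position to the nearest segment of the
# safe driving path (cross-track error).
import numpy as np

def cross_track_error(position, path_points):
    """position: (x, y); path_points: array (N, 2) of the safe path, N >= 2."""
    p = np.asarray(position, float)
    pts = np.asarray(path_points, float)
    a, b = pts[:-1], pts[1:]
    ab = b - a                                         # consecutive points assumed distinct
    t = np.clip(np.einsum("ij,ij->i", p - a, ab) / np.einsum("ij,ij->i", ab, ab), 0, 1)
    nearest = a + t[:, None] * ab
    return float(np.min(np.linalg.norm(nearest - p, axis=1)))

# e.g. trigger a warning when cross_track_error(pos, path) > 0.5  (metres, assumed)
```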
Finally, the innovative aspects of the proposed solution deserve to be emphasised. Firstly, what marks a clear change compared to the solutions applied to date is the extremely low cost of the vision-based control system (two orders of magnitude cheaper than off-the-shelf eye trackers). Secondly, the solution is highly flexible, making it possible to take the various needs of PWSN into account. The equipment does not integrate as tightly with the driver’s head as conventional eye trackers do; ultimately, it will operate with no contact with the driver at all, owing to the specific design of the vehicle’s vertical frame. Additional information delivered by markers set around the eye socket, as well as information extracted from the blinking process, will be used to control steering parameters, which is neither an obvious nor an easy solution to implement. Information must also be retrieved from diverse vision subsystems, although this is similar to solutions known from autonomous vehicles. Our approach is based not only on the position of the eye, but also on the position of its components, such as the pupil, which is not obvious given the equipment class and its simplicity. We also intend to use other kinds of information already known from biometrics, with regard to the biometric modalities of the eye, which will be the subject of subsequent articles. Lastly, it is noteworthy that the publicly available cascade databases, although universal, are of relatively poor quality, with effectiveness ranging from 70 to 80% (values not boosted by our own algorithm modifications). That is precisely why we are developing our own cascade databases, which are expected to bring new quality to the solution in question. The application of telemetry markers, IoT, and new cascade databases is a priority for our further research. We would also like to test the possibility of employing inexpensive substitutes for the EEG devices typically used in learning-based systems for controlling such vehicles, although this concept no longer falls under the category of visual control.
In the next phase, an algorithm that uses the YOLO convolutional neural network (CNN) will be tested. An alternative is the Apache-licensed library of the official RT-DETR (Real-Time DEtection TRansformer) [
76]. Ready-made tools can also be used for this purpose; for example, Ultralytics provides open-source computer vision tooling [
77]. It is worth noting that, since 2015, the YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy [
78]. Recently, DETRs have provided an alternative; hence RT-DETR, the first real-time end-to-end object detector, was proposed [
78]. These libraries and tools will be the subject of further research in terms of the possibility of their implementation in the project in question.
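As a preliminary illustration of the planned direction, the Ultralytics package can be driven as sketched below; the pretrained model name and the image path are generic examples, not a model trained on PWSN driver data.

```python
# Sketch of running a pretrained Ultralytics YOLO detector on a single frame.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                         # small pretrained detection model
results = model.predict("driver_frame.jpg", conf=0.25)  # placeholder image path
for r in results:
    for box in r.boxes:
        print(model.names[int(box.cls)], box.xyxy.tolist())
```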
Finally, it is worth mentioning that this vehicle is intended for mountain tourism, with trips planned using specialised trip planners. One of these is described in [
79]. Such a planner can accommodate the specific requirements of both the user and the vehicle.