End-to-End Automated Guided Modular Vehicle

Autonomous Vehicles (AVs) have caught people's attention in recent years, not only from an academic or developmental viewpoint but also because of the wide range of applications these vehicles enable, such as intelligent mobility and logistics, as well as industrial uses, among others. The open literature contains a variety of works on the subject, employing techniques ranging from probabilistic methods to ones based on Artificial Intelligence. The well-known increase in computing capacity has opened plentiful opportunities for the algorithmic processing these applications need, making way for the development of autonomous navigation, in many cases with astounding results. The following paper presents a low-cost but high-performance, minimal-sensor open architecture implemented in a modular vehicle. It was developed in a short period of time, surpassing many of the currently available solutions found in the literature. Diverse experiments carried out in the controlled and circumscribed environment of an autonomous-driving circuit demonstrate the efficiency and applicability of the developed solution.


Introduction
Advancements in computing power have allowed the development and implementation of more complex systems in vehicle development. The automotive industry has taken advantage of these advancements by creating and continuously improving advanced driver-assistance systems, such as automatic parking systems, road-assisted driving, and sometimes fully autonomous driving (AD) capacity [1]. Urban, commercial, and industrial vehicles have been enhanced with these systems and are commonly known as Automated Guided Vehicles (AGVs). Many automotive manufacturers have self-driving car development programs, including Tesla and Toyota. Additionally, other companies, such as Uber and Waymo, focus on the Transport as a Service (TaaS) industry. Finally, automated vehicles are also being employed inside warehouses or factories to automate distribution and optimize processes, e.g., at Amazon [2,3].
There is still a long way to go before this technology is incorporated into all industries and transport services. Controlling a vehicle is not an easy task; it is one that even humans sometimes find complicated to perform. Prototype self-driving car systems are required to meet industry and academic standards in terms of performance, safety, and trust [2,3]. The stand-alone navigation problem seemed to be solved after the Defense Advanced Research Projects Agency (DARPA) Grand and Urban Challenges [4]; however, new problems emerged. They generated interest in the following research topics: data management, adaptive mobility,

Background Overview
The development of technologies that automate driving in the last decade has structured specific definitions and methodologies that allow the effective implementation of vehicles with the capacity to navigate over complex environments. These definitions and methodologies are the basis of the current development of autonomous vehicles, and it is the same basis used for the development of the solution presented in this work. These concepts are levels of driving automation, approaches to achieving navigation, and learning methods for navigation.

Levels of Driving Automation
Since 2014, the Society of Automotive Engineers (SAE) has published yearly the "SAE J3016 Levels of Driving Automation", which defines six levels of automation ranging from 0, no automation, to 5, full vehicle autonomy [6][7][8]. Table 1 briefly summarizes them.

Table 1. Levels of driving automation based on the Society of Automotive Engineers (SAE) J3016 [6][7][8].

• Level 0: The driver has the main control even though some of the automated systems are engaged. The systems work only as momentary assistance providing warnings. Examples: blind-spot warnings, lane-departure warnings, parking assistance alerts.
• Level 1: The driver must supervise the environment and intervene as the system requires. The vehicle has partial control and assistance over the steering OR the brake/acceleration. Example: lane centering OR adaptive cruise control.
• Level 2: The vehicle controls the steering and brake/acceleration in specific circumstances. Example: lane centering AND adaptive cruise control at the same time.
• Level 3: Passengers are passive when automated driving is engaged, but the driver needs to intervene upon the system's request. The vehicle drives autonomously in limited conditions and will not operate unless all of those conditions are met. Example: traffic-jam and highway chauffeur.
• Level 4: Passengers are passive when automated driving is engaged. The vehicle drives autonomously in limited conditions and will not operate unless all of those conditions are met. Example: local driverless taxi (pedals or steering wheel are optional).
• Level 5: The passenger never takes over control. The vehicle has driving control under all conditions. Example: robotaxi or fully autonomous car (pedals and steering wheel are not required).

Navigation Approaches
By defining the levels of driving automation, it is possible to know the scope and limitations that a navigation system must have when driving the vehicle. The generation of navigation systems for AVs has been achieved in recent years through various approaches.
Two main scopes aim to solve the navigation problem:
• Traditional approach: It focuses on individual and specialized tasks for perception, navigation, control, and actuation. The environment and the ego vehicle are modeled with the gathered data. Thus, for highly dynamic environments this approach is limited by the computing power, model complexity, and sensor quality. Robotic systems have used this modular solution for many years. Its complexity tends to be directly proportional to the model's accuracy [2,3,9].
• End-to-end approach (E2E): This scope directly correlates the input data of a system with certain output data through the use of Deep Learning (DL). Artificial Intelligence has been a breakthrough for AGVs. Machine Learning (ML) serves to generate intelligent perception and control systems for robot navigation. Deep Learning is a state-of-the-art technique for creating models that resemble human behavior and decision-making [10][11][12][13][14].
DL models are trained with positive and negative examples using Deep Neural Networks (DNNs). There are several types of layers to choose from, including Convolution or Deconvolution layers, Recurrence or Memory layers, Fully Connected layers, and Encoding or Decoding layers. These layers form popular architectures, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Fully Convolutional Networks (FCNs), among others. Specific architectures have been generated by companies or development groups, such as LeNet, GoogLeNet, VGG, ResNet, Inception, and others [15,16]. The performance of each type of network will strongly depend on the data with which it is trained.
Data come mostly from the ego vehicle's front view, with little preprocessing [17,18]. Outputs are usually reference values given to closed-loop controllers for the steering, speed, and braking systems [14]. Among all the DNNs, convolutional neural networks (CNNs) have proven to be the most efficient for this type of task [19]. Their inner layers extract and filter the information with which they are trained. CNNs correlate input images to the control and actuation variables of the AGV. Thus, the training process efficiency is increased: only a few thousand data points are required to achieve acceptable accuracy [20][21][22].

Learning Methods
Taking into account the scope using neural networks, different learning algorithms have achieved effective navigation systems for AGVs in simulated and real environments. Each algorithm has its advantages and disadvantages in different types of development environments. The two learning algorithms for this kind of neural network are:
• Deep Reinforcement Learning (DRL): This method optimizes a cost function by maximizing a reward or minimizing penalties, depending on the parameters specified by the user. For example, a reward in an E2E approach to autonomous navigation could grant positive points if the platform remains inside a delimited lane mark and negative points if it drives outside this area. In this example, the cost function is composed of a set of rules of expected outcomes. A set of discrete randomized actions is tested in each scenario [13]. The DNN will iteratively try to optimize the robot's navigation within a lane [23][24][25], merging [24,26], or exploring unknown environments [27,28]. The algorithm is described in Figure 1. A mapping algorithm would need to be implemented to delimit the lane marks in the image fed to the DNN and to establish the reward rules and parameters. Moreover, since it is an iterative method, the DNN would need to be trained as the platform drives itself in the chosen area. This is a risky approach: no training has been carried out before the first iteration, so the platform would be driving itself around the area at the same time as it trains itself with the rewards and penalties. Such training is usually carried out in large areas, or simulated as a first approach and then taken to the real world.
This is dangerous, as in the first iterations the platform is going to drive outside the lane; it will brake and accelerate very slowly or very quickly. It can hurt somebody or even damage its own structure during the training process.
For this work, a safe area was chosen to acquire training data. No simulator was used. For that reason, DRL was not the approach used for the development of this work.
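The lane-keeping reward described above can be sketched as a simple function. The lane half-width, point values, and function names below are illustrative assumptions, not the exact rules used in the cited works:

```python
def lane_reward(lateral_offset_m, lane_half_width_m=0.5):
    """Grant +1 while the platform stays inside the lane, -1 otherwise.

    lateral_offset_m: distance from the lane center (assumed measurable);
    lane_half_width_m: illustrative half-width of the delimited lane.
    """
    return 1.0 if abs(lateral_offset_m) <= lane_half_width_m else -1.0

def episode_return(offsets):
    """Accumulate the reward over one driving episode."""
    return sum(lane_reward(o) for o in offsets)
```

The DNN's weights would then be adjusted, iteration after iteration, to choose actions that maximize this accumulated return; for instance, `episode_return([0.1, 0.2, 0.7])` scores two in-lane steps against one out-of-lane step.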


• Behavioral Cloning/Imitation Learning (BC/IL): These algorithms are applications of supervised learning. They are iterative processes in which a network modifies its weights when given a set of inputs with their corresponding outputs. The training and validation datasets contain information about the vehicle's surrounding environment and control variables: steering wheel angle, acceleration, and braking. Data are acquired by navigating several laps on a given track. The DNN generalizes an optimal solution after being trained with a valid dataset [29][30][31]. What constitutes a valid dataset depends on the complexity of the track [30]; usually tens of thousands of examples are needed [32]. A CNN is trained on these sets of images, performing sequential convolutions, batch normalizations, and max-pooling activations to extract important image descriptors. Such image features are commonly obtained with feature-extraction methods, such as Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and others [33]. These methods are often invariant to image rotation and have a low computational cost. For a BC/IL approach, a high number of features needs to be extracted and acquired in very short periods of time. CNNs follow the same approach as a conventional feature extractor, but they perform additional work on the color and intensity histograms. These features are labeled with the steering wheel angle, gas pedal intensity, brake force, or all three of them.
The system captures images and labels them with the aforementioned parameters while the users drive. CNNs extract features from the images and correlate them with the specified parameters: the features extracted from the camera are fed into the model, which outputs a label. The output data provide the control information for the vehicle. This approach requires large quantities of images but does not need in-field training as in the DRL approach. This implies that safety is only at stake when a model with 95%+ accuracy is tested.
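The labeling step described above can be sketched as follows. The steering-correction value for the lateral cameras, the sign convention, and the function name are hypothetical, chosen only for illustration:

```python
def label_frames(center_img, left_img, right_img, steering, correction=0.2):
    """Pair each camera frame with a steering label, as in BC/IL data collection.

    The left camera sees the road as if the car had drifted left, so its
    frame is labeled with a corrective steer back toward the center (the
    sign convention and correction magnitude here are assumptions).
    """
    return [
        (center_img, steering),             # recorded steering angle
        (left_img, steering + correction),  # corrective steer for the left view
        (right_img, steering - correction), # corrective steer for the right view
    ]
```

Running this over every synchronized triple of frames logged during the laps yields the (image, label) pairs used for supervised training.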
Both methods can be trained and tested in simulated or real-world environments. DRL converges faster in simulations because of the lower complexity of the environment [34]. After acceptable performance is reached, it can be tested in a real-world environment; however, extra safety measures must be considered, since inertia, friction, braking, and acceleration are among the factors that cannot be accurately simulated. BC and IL need human intervention to be trained. They offer a more straightforward solution; however, they require much more time to acquire enough data to function. The quality and amount of data, the DNN architecture, the reward functions, and the computing power determine how much time is needed to train either approach [19,32]. The ego vehicle will perform well if the testing scenario resembles the training data.

Related Work and State-of-the-Art for AD
AI and ML applied to robot navigation and autonomous vehicles have been trending for more than 20 years. However, computing power and memory usage have limited the performance of such algorithms. Current technology can train and deploy them in shorter periods, which allows more complex variations of these algorithms to be developed [19]. The first practical implementation of BC/IL was carried out in 1989 at Carnegie Mellon University. The Autonomous Land Vehicle in a Neural Network (ALVINN) used a three-layer NN, taking an input image of the front view and outputting a steering angle [35].
The Defense Advanced Research Projects Agency (DARPA) developed a vehicle in 2004. The DARPA Autonomous Vehicle (DAVE) used a six-layer CNN with a binocular camera in the front. Images labeled with steering angles provided by a human driver were used to train the DNN. The robot successfully navigated through an obstacle course [36]. In 2016, Nvidia developed DAVE-2. It used BC/IL with a nine-layer CNN. Data obtained from a front camera and two lateral ones were fed to the DNN in order to predict the steering wheel angle. This architecture improved the system's accuracy and prevented possible derailments [15,16].
The works mentioned above set the basis for state-of-the-art developments, like the one presented in this paper. Deep Learning state-of-the-art works and developments can be found in Table 2. All of them are applications in simulated environments, real environments, or a combination of both. Some have optimization purposes or test systems that allow them to assist driving at different levels of autonomy.
As seen in Table 2, there are gaps in the methodology and in proposals developed on low-cost modular vehicle platforms where the BC/IL algorithm is the basis for driving automation. From this, we can highlight the areas of opportunity and the contributions that this work seeks to develop. These are:

1. Few works on large-scale automated vehicles for the transport of people or cargo.
2. A low-cost modular AGV capable of autonomous driving on certain fixed routes.
3. A practical implementation of the BC/IL algorithm in a real environment.
4. Research, development, and test methodology implementation.
5. A quick development limited to four weeks of work.

BC/IL is mostly limited to simulated environments with simple vehicle models or to controlled environments with small-scale vehicles. Likewise, its implementation is developed or optimized through these simulated environments or by merging them with real data. There are very few works where the algorithms are embedded in a real-size vehicle platform or in a vehicle with the capacity to carry people or heavy material; most are developed on small-scale automotive platforms. This becomes a differentiator of this work: the development of a minimal, modular, and low-cost sensor architecture with a performance high enough to automate driving to SAE level 1 on an AGV platform that can carry material or people. Additionally, due to the AGV's modularity, it can be modified or adapted in size, sensors, and capacities as required. This is how the concept of the Automated Guided Modular Vehicle is created.
The following section details the capabilities of the platform used for this work, mentioning its technical and development characteristics.


Research Approach
To carry out this work, it was necessary to use a research and development methodology. The development had to fulfill the proposed work within a margin of four weeks. During this time, the research, development, and testing of the vehicle platform had to be carried out with the BC/IL algorithm. This methodology can be seen in Figure 2.

The methodology describes the stages of the development of this work. The first stages were to define and get to know the vehicle platform. This platform was already built, so this work was limited to knowing its characteristics and capabilities, which are described in the following section. For the instrumentation, the actuators were already defined on the platform, but it was necessary to propose the sensors required to implement the BC/IL algorithm for driving automation.
The next stage was to define the hardware and software for processing. This hardware needed the minimum capacity necessary to process DL algorithms and had to be able to communicate with the sensors and actuators installed on the platform. The Nvidia Jetson AGX Xavier met these requirements.
The stages of software development and integration came next. During these stages, the use of ROS and DL frameworks, such as Keras and TensorFlow, was fundamental. The testing stage was made up of several parts, such as data acquisition, neural network training, and the testing protocol. All of this focused on the proposed route for the vehicle to drive autonomously.
After obtaining results, it was necessary to analyze and define design improvements for all stages of the development of this work. These are reviewed in the analysis, conclusions, and future-work sections of this paper.
For all stages of the methodology, it was essential to support the work with information found in state-of-the-art developments. All the stages are detailed in the following sections.

Experimental Platform
Different add-ons enhance the bare-bones platform so that it can be used for diverse purposes, such as personal transportation or cargo. The electric vehicle's drive is rear-wheeled. A control box in the rear generates the signals for the open-loop transmission system. This system can set a speed level, rotational direction, and braking. The brakes are triggered with a hysteresis controller system that activates coupled relays.
This system communicates through an RS-232 serial protocol. The vehicle uses an Ackermann distribution with a 30° steering angle. The steering system uses the same communication protocol as the brakes, while a closed-loop system controls its position. It is also equipped with switches that detect the maximum position on each side. The modular platform can be controlled with an Xbox wireless controller with no electrical or mechanical modifications made to the vehicle. The general technical details of the platform are listed in Table 3, and an image of the modular platform is depicted in Figure 3.
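As a rough illustration of the Ackermann geometry mentioned above, the inner and outer front-wheel angles for a given turn radius can be computed as follows. The wheelbase and track values are hypothetical placeholders, since the platform's actual dimensions are those listed in Table 3:

```python
import math

def ackermann_angles(turn_radius_m, wheelbase_m=1.6, track_m=1.0):
    """Inner/outer front-wheel angles (degrees) for an Ackermann layout.

    The inner wheel follows a tighter circle than the outer one, so it
    must turn more. turn_radius_m is measured to the vehicle centerline;
    wheelbase_m and track_m are illustrative values, not the platform's.
    """
    inner = math.degrees(math.atan(wheelbase_m / (turn_radius_m - track_m / 2)))
    outer = math.degrees(math.atan(wheelbase_m / (turn_radius_m + track_m / 2)))
    return inner, outer
```

With these assumed dimensions, a 4 m turn radius yields roughly 24.6° inner and 19.6° outer, both within the platform's 30° limit.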

This is a minimal-system vehicle that can be used for passenger transportation or cargo. It can be further enhanced with sensors and processors to become an AGV.

Sensors and Instrumentation
The vehicle had three cameras covering the front view. The central processing hardware was a Nvidia Jetson AGX Xavier that oversaw acquiring data, processing it, and turning it into actuator primitives.
The components that were used to accomplish autonomous navigation can be grouped into sensors, processors, and control elements. Table 4 briefly summarizes these elements. Two types of sensors were used to acquire information from the environment: cameras and a LIDAR. The Logitech cameras used were middle/low range. They provided 720-1080p images at 30 fps with a 78° field of view (FOV). Their automatic light correction guarantees well-contrasted images, making them a good choice for different outdoor environments. Three of these cameras were used to generate a 150° FOV, as the images acquired from the left and right cameras overlapped the central one. The LIDAR, an active sensor manufactured by Hokuyo, uses a laser source to scan a 270° FOV divided into 1081 steps, with a maximum distance of 10 m. It has an angular resolution of 0.25° and a linear accuracy of 40 mm. A full scan takes around 25 ms. The LIDAR was positioned at the center of the vehicle's front. An F310 Logitech gamepad was used to tele-operate the vehicle's velocity and Ackermann steering angle [49]. These four sensors, along with the gamepad, were connected to the AGV's processor: a Nvidia Jetson AGX Xavier. The cameras were connected through USB-C 3.0 ports, the Hokuyo through an Ethernet connection, and the gamepad via USB 2.0.
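The 150° combined coverage quoted above is consistent with the lateral cameras being yawed outward by about 36°; that mounting angle is an inference made here for illustration, not a figure reported in the text:

```python
def combined_fov(camera_fov_deg, lateral_yaw_deg):
    """Total horizontal coverage of three cameras: one centered, two yawed.

    Each camera spans +/- camera_fov_deg / 2 around its optical axis, and
    the lateral cameras are rotated +/- lateral_yaw_deg from the center.
    """
    half = camera_fov_deg / 2
    # Rightmost edge of the right camera minus leftmost edge of the left one.
    return 2 * (lateral_yaw_deg + half)

def overlap_with_center(camera_fov_deg, lateral_yaw_deg):
    """Degrees by which each lateral camera's view overlaps the central one."""
    return camera_fov_deg - lateral_yaw_deg
```

Under this assumption, three 78° cameras yawed by ±36° cover 150° in total, with each lateral view overlapping the central one by 42°, matching the overlap the text describes.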
A Nvidia Jetson AGX Xavier and an ATmega 2560 were used as the high- and low-level controllers, respectively. The Nvidia Jetson AGX Xavier acquired and processed information from the environment and the user. The user's information was acquired through a mouse, a keyboard, and a gamepad, which were connected to a USB 3.0 hub. The CPU/GPU capabilities of this development kit enable it to perform operations at up to 1100 MHz; thus, ML algorithms can easily be implemented on this platform. The user can supervise the Nvidia Jetson AGX Xavier through an on-board screen. The developer kit translated its computations into low-level instructions that were communicated to a microcontroller using the RS-232 protocol. An ATmega 2560 microcontroller embedded in an Arduino Mega 2560 board was used. This microcontroller can handle intensive use of input-output (IO) peripherals and serial communication protocols, and it has enough computational power to implement basic open-loop and hysteresis control for the speed, braking, and steering systems. The speed system was open loop: basic digital outputs were enough to activate and deactivate it. Hysteresis control for the brakes and steering required an Analog-to-Digital Converter (ADC) and the RS-232 communication protocol. The brake and steering-angle positions were sensed with potentiometers. Typical H-bridges were used to drive the actuators, which could be halted by a kill switch that interrupted the Vcc. The kill switch was used as an emergency stop. Figure 4 shows the instrument connection diagram.
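The hysteresis control used for the brakes and steering can be sketched as a bang-bang controller with a deadband. The deadband width, normalization, and command encoding below are illustrative assumptions, since the paper does not list the actual firmware constants:

```python
def hysteresis_command(target, measured, deadband=0.05):
    """Bang-bang control with a deadband, as used for the brakes and steering.

    target/measured are normalized potentiometer positions (0.0-1.0, an
    assumed scaling of the ADC reading). Returns +1 to drive the actuator
    one way through the H-bridge, -1 to drive it the other way, and 0 to
    hold. The deadband keeps the H-bridge from chattering at the setpoint.
    """
    error = target - measured
    if error > deadband:
        return 1
    if error < -deadband:
        return -1
    return 0
```

On the microcontroller, the equivalent routine would run in the main loop, reading the potentiometer through the ADC and setting the H-bridge direction pins from the returned command.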

Software Architecture
The Robot Operating System (ROS) was chosen because of its ease of handling input and output devices and communication protocols, such as RS-232 or CAN, while simultaneously running scripts in different programming languages, such as Python or C++. ROS is based on nodes that oversee specific tasks, such as communicating with a microcontroller or processing information from a camera. Nodes subscribe to or publish information through topics. They can run in parallel, which allows an agile flow of information. Safety nodes were implemented so that whenever a publisher stopped working, the vehicle would make an emergency stop. ROS facilitated high- and low-level algorithm integration. SAE level 1 navigation was achieved with E2E BC/IL. Several expert users drove on the test track to generate the training data. Drivers used the vehicle at different times of the day to capture different light conditions. Twelve models were tested in a simulated testbed before being implemented in the vehicle. The AGV was tested on campus in a controlled environment. All the autonomous navigation runs were successful. The enhancement and automation were carried out in a four-week time span.
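The safety-node behavior described above, an emergency stop whenever a publisher goes silent, can be sketched without ROS as a plain watchdog. The 0.5 s timeout and the class name are hypothetical values chosen for illustration:

```python
class PublisherWatchdog:
    """Flag an emergency stop if no message arrives within the timeout.

    A ROS safety node would call beat() from the topic callback and poll
    expired() from a timer; timestamps are passed in explicitly here so
    the logic stays testable without a ROS installation.
    """

    def __init__(self, timeout_s=0.5):
        self.timeout_s = timeout_s
        self.last_beat = None

    def beat(self, now_s):
        """Record that the monitored publisher just produced a message."""
        self.last_beat = now_s

    def expired(self, now_s):
        # No message ever seen, or the last one too old: stop the vehicle.
        return self.last_beat is None or (now_s - self.last_beat) > self.timeout_s
```

When `expired()` returns true, the node would publish a zero-velocity, full-brake command, mirroring the emergency-stop behavior described above.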
The goal of this work was to implement the instrumentation, programming, intelligence, and integration of minimal system requirements to enhance an electric vehicle to an SAE level 1 AGV in less than four weeks. The BC/IL minimum instrumentation proposed in [15] was used as a basis. A LIDAR sensor was added in order to continuously monitor the distance to the surrounding elements and prevent a collision. This section will go into further detail of the instrumentation, the test track and the DNN architecture.

Software Architecture
The Robotics Operating System (ROS) was chosen because of its ease of handling input and output devices, communication protocols, such as RS232 or CAN, and simultaneously running scripts of different programming languages, like Python or C++. ROS is based on nodes that oversee specific tasks, such as communicating with a microcontroller or processing information from a camera. Nodes subscribe or publish information through topics. They can run in parallel, which allows an agile flow of information. Safety nodes were implemented so that whenever a publisher stopped working, the vehicle would make an emergency stop. The ROS facilitated high and low-level algorithm integration. SAE level 1 navigation was achieved with E2E BC/IL. Several expert users drove on the test track to generate the training data. Drivers used the vehicle at different times of the day to get different light conditions. Twelve models were tested in a simulated testbed before being implemented in the vehicle. The AGV was tested on campus in a controlled environment. All the autonomous navigations were successful. The enhancement and automation were carried out in a four-week time span.

Neural Network Architecture
A robust and fast CNN was used to solve the autonomous steering problem of this work. As mentioned in Section 2, DNNs are the most popular alternatives to train a BC/IL system [17]. This work's implementation was inspired by DAVE-2 [15,16]. In this model, an image is fed to a DNN, which yields a steering wheel angle value for the modular platform. Thirteen CNN models were tested before one had the desired driving behavior. The images used as inputs were acquired from the three cameras described in Section 5.1. The images coming from the front central camera were labeled with the steering wheel angle, whereas those coming from the lateral cameras were labeled with a correction of the steering wheel angle. Each image was an RGB 720 × 480 px. The images were resized and cropped to an 83 × 160 8-bit RGB. As the RGB data had a 0-255 range, they needed to be normalized to a −0.5 to 0.5 range before being fed to the NN. Normalization helped to reduce the processing time and memory usage. The following formula was used to achieve this:

x/255 − 0.5. (1)

The input layer was followed by five convolutional layers. The first three use 5 × 5-sized filters and the next two 3 × 3. The output was then vectorized to obtain an array of 2496 elements. Then, three fully connected neural layers of 200, 100, and 20 elements were alternated with dropout layers. The dropout layers prevent the system from overfitting [50]. Convolutional and fully connected layers were activated by Rectified Linear Unit (ReLU) functions, which helped with weight adjustment during the Adam-optimized training stage [51]. More details of the training process will be described in the following sections. Finally, the one-dimensional output layer's values from −1 to 1 were mapped to the predicted angle. Figure 5 shows a diagram of the CNN architecture used for this work. During the training process, different architectures were tested by modifying the convolution filter sizes and the number of neurons, optimizers, and activation functions in order to find the best model, which is the one described above.
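A minimal sketch of the input preprocessing is shown below. The paper specifies only the final 83 × 160 size and the normalization of Equation (1); the step-based downsampling and the lower-frame crop window are assumptions for illustration:

```python
import numpy as np

def preprocess(image):
    """Sketch of the image preprocessing described above.
    image: uint8 RGB array of shape (480, 720, 3)."""
    # Rough downsample by integer decimation (assumed method): (240, 180, 3)
    small = image[::2, ::4, :]
    # Crop to an 83 x 160 region of interest (assumed: lower part of the
    # frame, where the road is visible)
    roi = small[-83:, 10:170, :]
    # Normalize the 0-255 range to -0.5..0.5 as in Equation (1)
    return roi.astype(np.float32) / 255.0 - 0.5
```

The normalized array is what the CNN's input layer would consume.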

Proposed Route
Tecnológico de Monterrey, Campus Ciudad de México is in the south of Mexico City. The campus has internal roads and streets. Traffic and pedestrians on these roadways are almost non-existent; therefore, they represent an excellent option for a controlled and low-risk environment. The chosen 170 m flat road segment is located between the Cedetec and Legorreta buildings. When the round trip is considered, the street has two pronounced 90° left/right curves and a 70 m slightly left/right curved way. The roadway is generally in good condition, although it has a few bumps and potholes with poor lane separation. The sidewalk is separated from the road surface by a yellow line. The satellite view of the test track (red) is depicted in Figure 6. The data acquisition and test phases were held under restricted access to the area and expert supervision.



Development
This section will provide details on how the ROS integration, data acquisition, system training, and tests were developed.

ROS Integration
The hardware architecture described previously in Section 5.1 was managed by the Robot Operating System (ROS), a collection of open-source software. It provides a wide range of computational tools such as low-level device control, process communication, hardware abstraction, and package management. ROS is based on the computational graph model, which allows a wide range of devices to run the same programs if they comply with some basic hardware specifications. The communication between platforms is straightforward and simple. Each ROS process is called a node, which is jointly interconnected through topics. A node is a hardware or software element that requests or provides data, which can be a single value or clustered elements. Data are sent through topics. ROS Melodic Morenia was used on an Ubuntu 18.04 OS [52]. Each element described in Section 5.1 was a node.
The master node was a C++ or Python program used during the data acquisition and testing, respectively. This node subscribed to each input element: cameras (usb_cam), LIDAR (urg_node), and the gamepad (joy). It also published information via RS232 (rosserial) to an ATmega256. The microcontroller was also a node (rosserial_arduino). The master script used OpenCV, cv_bridge, image_transport, and sensor_msgs libraries in order to acquire and process the corresponding information. It receives information from the Nvidia Jetson AGX Xavier, and transforms this information into primitives for the actuators.
For the training and test phases, a template was created. It retrieves and preprocesses information from the input nodes. The camera's images were switched from an RGB to BGR color palette, then they were resized and cropped. The LIDAR data were continuously compared with a safety factor; if the current measurement was within ±10% of the safety factor, the vehicle would try to avoid the threat. However, if the measurement was not acceptable, the platform would stop and enter lock mode. The published data were sent using the RS232 communication protocol. The speed interval went from zero up to 30 km/h. It was normalized in a −4 to 9 interval. This range was chosen according to the documentation of the modular platform. The sign of the integer determined the motion direction: negative backward and positive forward. According to the documentation, the steering values of the platform range from −20° to 20°. The microcontroller produces a Pulse-Width Modulation (PWM) signal, with a duty cycle that is proportional to the steering angle. The minimum value maps to the leftmost steering value (−20°) with the value of 400, while the maximum is the rightmost (20°) with a value of 800. The brakes were activated with a Boolean. The data were gathered into a single string that contained the speed, steering, and braking setpoints. The microcontroller receives the string and parses it to acquire the desired setpoints. As the data were already in primitive values, the ATmega256 uses them as references for the open-loop or hysteresis control.
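The mapping from steering angle to PWM duty value and the assembly of the setpoint string can be sketched as follows. The comma-separated format is an assumption; the paper only states that the speed, steering, and braking setpoints travel in a single string:

```python
def steering_to_pwm(angle_deg):
    """Map a steering angle in [-20, 20] degrees to the PWM duty value
    in [400, 800] used by the microcontroller (-20 -> 400, 20 -> 800)."""
    angle_deg = max(-20.0, min(20.0, angle_deg))
    return int(round(400 + (angle_deg + 20.0) / 40.0 * 400))

def build_command(speed_setpoint, angle_deg, brake):
    """Assemble the single setpoint string sent over RS232
    (the field separator is an assumed detail)."""
    return f"{speed_setpoint},{steering_to_pwm(angle_deg)},{int(brake)}"

def parse_command(command):
    """Microcontroller-side parsing of the same string (illustrative)."""
    speed, pwm, brake = command.split(",")
    return int(speed), int(pwm), bool(int(brake))
```

For example, a centered steering wheel (0°) maps to a duty value of 600, halfway between the two extremes.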
For the training phase, synchronized data from each node were properly indexed and stored. This data was later used to train the CNN. The ROS low latency minimizes the time difference between samples coming from different devices. However, the process cannot be considered real-time. During the testing phase, data was added to the CNN and the output was published to the actuator. The trained model uses libraries for Python and C++ contained in Tensorflow and Keras [53]. The user supervises the proper performance of the system. A safety node was implemented using the gamepad, such that a button can be pressed to stop the software from executing and lock the modular platform. Figure 7 shows the overall ROS integrated system.
The initial tests were performed with a high-range computer with an Nvidia GPU. The ROS node integration worked smoothly. However, during the migration to the Nvidia Jetson AGX Xavier, several compatibility issues emerged. The first one to appear was with CUDA, a parallel computing platform and programming model from NVIDIA, which is optimized to work with a GPU or a CPU; our development kit, however, is a mixture of both. A lite version needed to be installed to solve the problem. Another issue was the interaction between Python versions: ROS uses Python 2.7, while all the AI software uses Python 3.7. Separate virtual environments were needed to make all the libraries work together.


Data Acquisition and Preprocessing
The hardware architecture and the ROS-based system were tested in the lab before testing them with the vehicle. The LIDAR response worked as expected; no further modifications were needed. The image acquisition system required several iterations for it to work correctly. During the initial tests, some images were missing or broken. The initial supposition was that ROS was unable to handle data acquisition from three cameras simultaneously. However, when testing the template with different cameras, this hypothesis was dismissed. Individual response time tests with RGB 720 × 480 images were performed. The results showed that the number of frames per second (fps) of each camera was different: 18, 20, and 60, respectively. The simplest solution was to sample all the cameras at the slowest rate (18 fps). This way, the same number of simultaneous images could be acquired and saved along with the current steering angle. The images were preprocessed to select an area of interest, and this area was taken into consideration when training the CNN. Such an area can be seen in Figure 8. In order to successfully train a CNN, enough training data is required. The data should be as diverse and complete as possible in order for the CNN to yield a useful result.
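One way to sample all three cameras at the slowest rate is to match frames by timestamp. The paper only states that the cameras were sampled at 18 fps; the nearest-timestamp strategy below is an assumed implementation detail:

```python
def nearest_frame_indices(slow_timestamps, fast_timestamps):
    """For each frame of the slowest camera (18 fps), pick the index of the
    temporally nearest frame from a faster camera, so every camera yields the
    same number of synchronized images."""
    indices = []
    for t in slow_timestamps:
        best = min(range(len(fast_timestamps)),
                   key=lambda i: abs(fast_timestamps[i] - t))
        indices.append(best)
    return indices
```

Applying this to the 60 fps camera keeps one frame out of roughly every three to four, aligned with the 18 fps stream.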
For this purpose, the test track was recorded on seven round trips. More than 100 thousand images were collected. Each image was bound with its respective steering angle. The navigation of the vehicle was made in right-hand traffic. To assure suitable training data for the CNN, the steering angles collected as raw data were normalized and augmented. A histogram of the raw data for these steering angles is shown in Figure 9. It can be seen that there is a slight bias in the histogram, which might lead to unsuccessful training.
In order to minimize the effect of this bias, prior to training we introduced data augmentation: we horizontally flipped all images and multiplied the normalized steering data by −1. As a result, we had a dataset twice as big as the original one (206,726 images with their corresponding steering angles). This way, the CNN could generalize more efficiently against any bias towards any given steering angle. The histogram of the data after augmentation and normalization can be seen in Figure 10. The histogram demonstrates how the augmentation and normalization of the data give greater symmetry to the amount of data per label. This reduces the probability of the system being biased towards a certain type of response and lets the model generalize better during training.
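The flip-and-negate augmentation described above can be expressed directly:

```python
import numpy as np

def augment(images, angles):
    """Double the dataset as described above: horizontally flip every image
    (reverse the column axis) and negate its normalized steering angle."""
    flipped = [img[:, ::-1, :] for img in images]
    mirrored_angles = [-a for a in angles]
    return images + flipped, angles + mirrored_angles
```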
A total of 103,363 synchronized data points were used for the training stage. This dataset was split in Keras into training (70%) and validation (30%) subsets. These percentages are commonly used in state-of-the-art works. The separation of the data can be seen in more detail in Table 5.
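The 70/30 split can be sketched as follows. Keras performed the actual split for this work; the shuffle and fixed seed here are assumptions added for reproducibility:

```python
import random

def train_val_split(samples, train_fraction=0.7, seed=42):
    """Shuffle a list of samples and split it into training (70%) and
    validation (30%) subsets, mirroring the split described above."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```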


Training Process
The training data contains individual images sampled from the video along with the corresponding steering command. Training with human driver data is not enough. In order to successfully navigate the road and avoid leaving the lane, the network must also learn how to recover from mistakes.
In order to facilitate this improved model, images of the car from different perspectives are added to the training data, including images from the center of the lane and images taken from different angles of the road.
A block diagram of the training system developed is shown in Figure 11. The images are fed into a CNN that calculates a proposed steering angle, which is compared to the desired angle for that image. The resulting error is used to update the weights of the CNN and to bring the CNN output closer to the desired output. The weight adjustment is performed using a backpropagation algorithm and optimizers such as Adam, as implemented in the Keras and TensorFlow libraries. It is important to note that, prior to training, all the data enter a normalization layer of the CNN, with the aim of making training more efficient and reducing the computer's memory usage. The training process required a careful selection of parameters, which included the optimizer, batch size, number of epochs, and separation of data between validation and training. Adam, Stochastic Gradient Descent (SGD), and Adadelta optimizers were used in the different models.
The loss is calculated for both training and validation. Unlike accuracy, loss is not a percentage; it is a summation of the errors made for each example in the training or validation sets. The loss value is usually lower on the training set. The important thing is to ensure there is no overfitting, indicated by a general increase in the validation loss instead of a decrease. The number of epochs varied depending on the loss drop difference, but all the models were run between 12 and 25 epochs before the loss value stopped decreasing. The lower the loss (the closer to 0), the better the model, unless the model is overfitted to the training data. The data separation was always the same, with 70% training and 30% validation.
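The stopping criterion described above (training until the loss stops decreasing) can be sketched as a simple plateau check. The `patience` and `min_delta` values are assumptions; the paper reports only that models ran between 12 and 25 epochs:

```python
def epochs_until_plateau(loss_history, patience=2, min_delta=1e-4):
    """Return the number of epochs to keep, stopping once the loss has not
    improved by at least min_delta for patience consecutive epochs."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(loss_history, start=1):
        if best - loss >= min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(loss_history)
```

This mirrors what Keras's `EarlyStopping` callback automates during training.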
Training times vary depending on the amount of data and their resolution, the CNN architecture, the computing power, the framework efficiency, and other parameters. The basic architecture of the navigation NN was based on the one proposed by Bojarski [15]. The training was carried out on a computer with the following specifications: 6th-generation Intel i7 processor, 32 GB of RAM, and an Nvidia GTX 1080 GPU with 8 GB of dedicated memory. The training was conducted on the GPU, and the training times were between 14 and 32 min. The Deep Learning framework used was Keras 2.3 integrated with TensorFlow 1.14 running in Python.


Algorithm Evaluation and Testing
The following protocol and criteria were used to validate the model performance.
2. The trained model will be given a video that is not part of the training or validation set ( Figure  12). a. The evaluator will check the quality of the predictions. b. The performance will be graded on the different stages of the test track. 3. The trained model will be tested in the modular platform.
c. The test route. d. A driver/evaluator will supervise the vehicle. e. The driver/evaluator will determine the driving speed. f. Autonomous steering will be done by the model. g. The manual emergency stop button should be activated in any risk situation. The information flow starts with image acquisition of the front camera. The next stages were performed in the Nvidia Jetson AGX Xavier preprocessing and model operation. The model will predict a steering angle, which will later be sent to the microcontroller. Figure 13 shows this process.  The information flow starts with image acquisition of the front camera. The next stages were performed in the Nvidia Jetson AGX Xavier preprocessing and model operation. The model will predict a steering angle, which will later be sent to the microcontroller. Figure 13 shows this process.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 17 of 26 GPU and the training times were between 14 and 32 min. The Deep Learning framework used was Keras 2.3 integrated with TensorFlow 1.14 running in Python.

Algorithm Evaluation and Testing
The following protocol and criteria were used to validate the model performance.
1. The accuracy of the trained model obtained during training will be evaluated.
2. The trained model will be given a video that is not part of the training or validation set (Figure 12).
   a. The evaluator will check the quality of the predictions.
   b. The performance will be graded on the different stages of the test track.
3. The trained model will be tested in the modular platform.
   c. A driver/evaluator will supervise the vehicle on the test route.
   d. The driver/evaluator will determine the driving speed.
   e. Autonomous steering will be done by the model.
   f. The manual emergency stop button should be activated in any risk situation.

The information flow starts with the image acquisition of the front camera. The next stages, preprocessing and model operation, are performed in the Nvidia Jetson AGX Xavier. The model predicts a steering angle, which is then sent to the microcontroller. Figure 13 shows this process.
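The camera-to-microcontroller flow can be sketched as one pipeline step with injected stages; all function names here are illustrative stubs, since the real system wires these stages together as ROS nodes:

```python
def drive_step(acquire_frame, preprocess, predict_angle, send_to_mcu):
    """One iteration of the inference pipeline described above: front-camera
    image -> preprocessing -> CNN prediction -> steering command to the
    microcontroller. Stages are injected so the flow can be tested with stubs."""
    frame = acquire_frame()
    model_input = preprocess(frame)
    angle = predict_angle(model_input)
    send_to_mcu(angle)
    return angle
```

In operation, this step runs continuously at the camera's frame rate, with the safety node able to interrupt it at any time.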

Results and Discussion
Throughout the training and data acquisition process, 13 different CNN models were trained. Each one was evaluated by the test protocol contained in Section 6.4. Table 6 shows the evaluation of every model. The accuracy is obtained from the final value that Keras prints during the training. This value is calculated through the Mean Absolute Error (MAE) [54], which is expressed by the equation:

MAE = (1/n) ∑ᵢ₌₁ⁿ |yᵢ − ŷᵢ|, (2)

where n is the number of samples, yᵢ is the recorded steering angle, and ŷᵢ is the predicted one. The final value is the percentage of MAE that the model has with all the training data (training + validation sets). The following stages of the protocol yield more qualitative results through the observation of the model's performance with the test video and the real trials on the test route.
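Equation (2) translates directly into code:

```python
def mean_absolute_error(y_true, y_pred):
    """Mean Absolute Error as in Equation (2): the average of the absolute
    differences between recorded and predicted steering angles."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
```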
In the four weeks of work, three data acquisition campaigns were performed to increase the amount of training data in search of an improvement in the prediction of the model. In the first capture, 15,276 images were acquired with their corresponding steering angles; the second reached 43,157 data points, and the last capture generated 103,363 data points. The efficiency of data capture improved across the three acquisitions thanks to software and hardware corrections in the acquisition systems. This improved the system performance and accuracy in predicting the angle.
Another important question in the development of the models was whether or not to use data augmentation, since it was considered that it could generate noise during training. However, after evaluation it was determined that the augmentation quantitatively and qualitatively improved the performance of the model.
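One common augmentation for steering datasets, consistent with the dataset doubling reported later (103,363 → 206,726 samples), is horizontal mirroring with angle negation. The paper does not state which transform it used, so this sketch is an illustrative assumption:

```python
import numpy as np

def augment_flip(images, angles):
    """Double a steering dataset by mirroring each image left-right
    and negating its steering angle (an assumed, common augmentation)."""
    flipped_imgs = [np.fliplr(img) for img in images]
    flipped_angles = [-a for a in angles]
    return images + flipped_imgs, angles + flipped_angles

# Toy dataset: one blank frame with a hypothetical steering angle.
imgs = [np.zeros((66, 200, 3), dtype=np.uint8)]
angs = [0.3]
all_imgs, all_angs = augment_flip(imgs, angs)
print(len(all_imgs), all_angs)   # 2 [0.3, -0.3]
```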
As mentioned in the previous section, there were three test stages, each described in Section 6.4, with the results shown in Table 6. Since the first four models did not meet the requirements of stage 1, the following stages were not performed for them. Tests in stage 2 of models 5 to 13 obtained the results shown in Figure 12. At this stage, it was evaluated whether the models made accurate movement predictions, with particular attention to the main curve of the in-campus circuit; this evaluation was qualitative in nature. In stage 3, it was decided to physically test the models considering the safety aspects mentioned in Section 6.4. Only the best three models passed this test. The objective was to finish the in-campus circuit with the least human intervention. As mentioned before, the speed was controlled at all times by the driver for safety purposes.

Best Model Results
CNN model 13 was the one that obtained the best results in the tests, especially in stage 3 of the test, where an automated steering direction of the entire in-campus circuit was achieved with almost no human intervention to modify the steering angle of the vehicle. The architecture of this model is the one presented in Section 5.3. This model was trained with the Adam optimizer for 14 epochs. The duration of training was 28 min and 37 s with a high capacity computer, whose characteristics are mentioned in Section 6.3. This model was trained with 206,726 images obtained by the augmentation of the 103,363 original data points with their respective steering wheel angles.
An important parameter to consider during training is loss, as shown in Figure 14 for all training epochs.

The training set efficiency was verified by the steadily decreasing loss values across the epochs. The validation set showed only a mild improvement, with a mostly constant performance throughout training; even so, its loss values remained below 0.02 units. As mentioned above, a loss value close to 0 with a decreasing trend indicates a well-trained model. That the validation loss is higher than the training loss is normal; the important thing is to protect the model from overfitting by keeping the loss as low as possible and on a decreasing trend.
The platform can thus be expected to perform as planned when the model runs as a node in the ROS topics, instead of being directed by the remote controller. From the loss plot, it is important to note that the loss value generally declines, since an increasing trend would suggest an overfitted model. Figure 15 shows the prediction of the steering wheel angle made by our CNN model against the ground truth obtained from the training data. This prediction was generated with one of the datasets on which the model was trained, in order to observe the precision obtained by the model.
Figure 15. Prediction of the steering angle in a sample dataset.
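The loss-trend criterion discussed above (steadily decreasing training loss is healthy; validation loss climbing while training loss falls signals overfitting) can be sketched as a simple check. The function name, patience threshold, and loss values are illustrative, not from the paper:

```python
def diverging(train_loss, val_loss, patience=3):
    """Flag potential overfitting: validation loss rises while training
    loss falls for `patience` consecutive epochs."""
    rising = 0
    for i in range(1, len(val_loss)):
        if val_loss[i] > val_loss[i - 1] and train_loss[i] < train_loss[i - 1]:
            rising += 1
            if rising >= patience:
                return True
        else:
            rising = 0
    return False

# Healthy run: both curves trend down (values are illustrative).
print(diverging([0.9, 0.5, 0.3, 0.2, 0.15], [0.8, 0.6, 0.5, 0.45, 0.4]))   # False
# Overfitting run: validation loss climbs while training loss drops.
print(diverging([0.9, 0.5, 0.3, 0.2, 0.15], [0.8, 0.6, 0.65, 0.7, 0.75]))  # True
```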
A certain "noise" can be observed in the prediction, which is reduced by applying a moving average filter when the predicted value is sent to the steering wheel actuator. This filter is expressed by the equation: where y is the new output, x is the predicted output, and M is the number of data points used. This model, as it is trained on a fixed route, presents excellent precision with the data in the tests. However, if a more generalized prediction for the model is desired, it is necessary to increase the variety and amount of training data. This model obtained a 94.5% accuracy with the training data. Although other models had a higher percent accuracy, this model performed better in the other stages of the evaluation protocol.

Shadow Effects
The training set was demonstrated to be well developed: in the field, the platform behaved very well under conditions similar to those used for training. During the week of training and data acquisition the weather was cloudy; however, it was sunny during the presentation. This condition negatively impacted the performance, because the shadows increased the amount of noise in the system. An example of these shadows can be seen in Figure 16.

During the presentation, the angle of the sun diminished the effect of the shadows. To correct for shadow effects, additional data must be collected at various times of day and under varying weather conditions, which was not possible during the week of data collection. The platform completed the entire proposed route shown in Figure 6, driving three laps with the only human interaction being that needed to accelerate the vehicle.

Conclusions
Technological advancements in computing power allow for the development of more efficient, more practical, and lower cost autonomous vehicles. Several of the algorithms used today, especially in the area of Artificial Intelligence, were developed more than 50 years ago, but technological advancements allow for their implementation in this particular application. Research and development are crucial to the improvement of robust systems that can mimic human behavior.
The development and improvement of electric and intelligent vehicles, whether for private or public use in industry or transportation services, is a disruptive trend nowadays. All development must be carried out around an intelligent electromobility ecosystem, and the development of various vehicle platforms for different purposes must be conducted as part of it. This is how the concept of the Automated Guided Modular Vehicle was created.
Driving a vehicle is a very complex task; expecting a machine to complete such a chore is therefore not simple. Although autonomous driving began to see efficient implementations in the previous decade, there are still areas of improvement that this work seeks to address.
The BC/IL algorithm made it possible to generate a practical and efficient solution within the time limit. The results were as expected: the modular AGV was able to navigate the proposed route. Even so, it is necessary to continue developing more complex models, with the option of using reinforcement learning or unsupervised learning, which can help generate a more complex and more efficient solution. It remains important to keep the safety criteria in mind, especially if the tests are carried out in a real environment with a platform that has the ability to cause physical damage.
Likewise, the use of a platform not designed in-house brought us both advantages and disadvantages. Several parts of the design could be considered plug and play, which was beneficial to the process. Nonetheless, further development and research are needed to improve these systems so that they mimic human behavior more precisely.
This work contains the successful implementation of a minimal system for an SAE Level 1 modular autonomous vehicle. This type of vehicle is becoming a differentiator in industry and in the implementation of personal transport systems. That is why this work also contributes to the generation and development of the concept of a low-cost modular AGV, capable of driving autonomously on certain fixed routes and of being part of the current trends in mobility services and smart cities. The system is based on ROS and has hardware and software elements. Data acquisition from hardware elements with different sampling rates was achieved. Minimal data curation was required thanks to the highly efficient data collection system. High- and low-level algorithms were implemented efficiently in time and computational resources. Our work contributes a practical implementation, in a controlled environment, of behavioral cloning and imitation learning algorithms. The vehicle successfully navigated a 300 m track, evaded vehicles and potholes, and drove in right-hand traffic.
The most significant contribution of this work is to show that an autonomous navigation system can be implemented on diverse hardware and vehicular platforms, as long as they share a standardized software platform. This opens the opportunity to generate a test environment for other researchers and to contribute an open HW-SW solution to the scientific community in the area.

Work Contributions
After the development of this work, different contribution points were defined. Analyzing the state of the art, it can be observed that there are not many implementations of automated vehicles for the transportation of people or cargo. The design of the vehicle, although not part of this work, meets the general concept of an AGV with level 1 autonomous driving capacity and, in future work, level 2. This type of vehicle is beginning to gain significance in industry through implementations in personal transport systems. This work contributes to the generation of a low-cost modular AGV capable of driving autonomously on certain fixed routes.
Likewise, our work serves as an example of a practical implementation of the BC/IL algorithm in a real environment. This algorithm has excellent capabilities that allowed the completion of the work within the 4-week limit. Even so, the algorithm needs to become more robust, or be complemented, for it to reach higher levels of autonomy, even if these complements lead to a higher cost solution.

Future Work
The development of this work had several limitations due to the 4-week development time. Future work to improve the design and functionality of the AGV is described below:
• Improve the autonomous driving algorithm so that it can deal with disturbances (shadows and lights in the image) that modify the performance of the algorithm.
• Optimize the code and hardware architecture to allow faster processing, incorporating other algorithm and model options with higher performance, for example, Deep Convolutional Inverse Graphics Network (DCIGN) architectures [55].
• The automation proposed in this work is limited to level 1 of the SAE scale. Future work will be able to go up to level 2, which automates the acceleration and braking of the vehicular model.
• Improve the longitudinal dynamics of the vehicle; the acceleration and braking stages are safety aspects that must be taken into account in the future design.
• Revise the vertical dynamics of the vehicle to improve the overall mechanical response of the autonomous system; in case of emergency evasion, the system must respond as fast as possible.
• Upgrade the on-board electronics for viable operation. A Printed Circuit Board (PCB) is proposed in order to dispose of the loose wires inside the vehicle's control box, since they add noise to the sensor measurements. The Arduino will no longer be used; instead, a Nucleo-144 board with the STM32L496ZGT6P microcontroller will provide more efficient signal control and processing, thanks to its ROS-compatible software and an ARM architecture capable of running at 80 MHz [56].
These are the highlights for improvement in the future work of the Automated Guided Modular Vehicle proposed and developed in this paper.