Cloud2Edge Elastic AI Framework for Prototyping and Deployment of AI Inference Engines in Autonomous Vehicles

Self-driving cars and autonomous vehicles are revolutionizing the automotive sector and shaping the future of mobility. Although the integration of novel technologies such as Artificial Intelligence (AI) and Cloud/Edge computing provides golden opportunities to improve autonomous driving applications, the whole prototyping and deployment cycle of AI components needs to be modernized accordingly. This paper proposes a novel framework for developing so-called AI Inference Engines for autonomous driving applications based on deep learning modules, where training tasks are deployed elastically over both Cloud and Edge resources, with the purpose of reducing the required network bandwidth, as well as mitigating privacy issues. Based on our proposed data driven V-Model, we introduce a simple yet elegant solution for the AI components development cycle, where prototyping takes place in the cloud according to the Software-in-the-Loop (SiL) paradigm, while deployment and evaluation on the target ECUs (Electronic Control Units) is performed as Hardware-in-the-Loop (HiL) testing. The effectiveness of the proposed framework is demonstrated using two real-world use-cases of AI inference engines for autonomous vehicles, namely environment perception and most probable path prediction.


Introduction
Among the technological trends that have emerged in the last decade, Autonomous Driving has gained a lot of attention, with significant effort and resources invested by both academia and enterprises. The breakthroughs in self-driving cars have been made possible by the emergence of novel algorithms and real-time computing systems in the fields of Artificial Intelligence (AI), deep learning (DL), Internet-of-Things (IoT) and Cloud computing [1].
An Autonomous Vehicle (AV) is an intelligent agent that observes its environment, makes decisions and performs actions based on these decisions. Autonomous driving applications aim to enable vehicles to control their behavior automatically. In conventional cars, these functions are the exclusive responsibility of the human driver, including understanding the driving scene, path prediction, behavioral planning and control. The development and deployment of accurate AI models that can automate these tasks is fundamental to the progress of autonomous driving. Addressing these challenges lies within the realm of Data Science, with AI and DL proving to be successful tools to generate sufficiently accurate representations

The EB-AI Framework
The key differences that the EB-AI framework provides with respect to tooling systems such as the ones listed above are:
• Ability to digest automotive specific input data such as video streams, LiDAR and/or radar data;
• Prototyping and development of AI Inference Engines in the Cloud using an SiL paradigm, agnostic of low-level AI libraries such as TensorFlow [7], PyTorch [8], or Caffe [9];
• Providing tunable state of the art DNN architectures for a broad spectrum of autonomous driving applications;
• Ability to deploy and evaluate the obtained AI Inference Engines via HiL testing on edge devices (e.g., target ECUs).
The diagram in Figure 1 illustrates the development workflow of an AI Inference Engine based on the EB-AI framework, according to our own data driven V-Model detailed in Section 3. Once the specific problem to solve has been defined and enough data has been collected, that data is ingested at a prototyping level by a DNN, which stands at the core of the inference engine. During this stage, the DNN is trained, evaluated and refined within the Cloud according to the Software-in-the-Loop (SiL) principles. Finally, the inference engine is deployed on a target Edge device inside a vehicle and evaluated again in real-world scenarios as Hardware-in-the-Loop (HiL). This process allows us to refine the engine even further, according to a continuous feedback loop aimed at improving applications over their entire life span. The novelty of the AI Inference Engine concept lies in the effective integration of the DNN's training, evaluation as SiL and testing using the HiL paradigm. A key aspect to consider in the EB-AI workflow is the huge amount of data required to train DNNs. This data can be either synthetically generated (first stage) or collected via real sensors mounted on test vehicles (last stage). In both cases, the data has to be made available at the training stage. Although the training of DNNs can be highly parallelized [10], with unprecedented levels of parallelization reached by leveraging the recent wide availability of GPUs, it is still prohibitively expensive to own enough computational power to process the amount of data required by autonomous driving applications. As a consequence, the major Cloud computing providers offer commoditized AI solutions (e.g., IBM's Watson AI, Amazon AI, Microsoft's Azure AI and Google's Cloud AI) as scalable training facilities.
However, the following limitations hold at the AI Inference Engine prototyping stage, when large quantities of data have to be uploaded to the Cloud: (i) data upload impracticality due to bandwidth bottlenecks and latency [11], (ii) difficult enforcement of privacy requirements (e.g., GDPR [12]), since part of the collected raw data can be sensitive and thus cannot be shared with cloud providers.

Contributions
We propose a solution to overcome the presented limitations by exploiting recent developments in Over-The-Air (OTA) systems for AVs [13,14], enabling the use of an AV as an additional computational device. These computing resources localized on the Edge ECUs, also referred to as Edge computing, can be integrated with Cloud resources to provide an Elastic distributed infrastructure in which the training can be carried out collaboratively [15,16]. Furthermore, Edge devices can limit the data transmitted to the Cloud by pre-processing local raw data or pre-training partial models [17], which in turn can also help mitigate privacy concerns, since raw data sharing can be avoided altogether. This kind of hybrid deployment over Cloud and Edge resources calls for a modular framework for AI-based autonomous driving applications. To this end, the proposed EB-AI architecture provides a modular toolchain that enables the deployment of autonomous driving applications across Edge and Cloud resources.
Therefore, the main contributions of the paper are:
• A simple yet elegant AI Inference Engine concept, based on the SiL and HiL principles;
• A data-driven V-Model approach guiding the design of AI-based autonomous driving applications;
• A modular Cloud2Edge AI framework for autonomous driving applications coined EB-AI;
• An elastic framework able to overcome network bandwidth and privacy concerns by dynamically deploying deep learning training tasks among Edge and Cloud;
• The development of two real-world AI Inference Engines for environment perception and most probable path prediction;
• A discussion on the advantages of such a hybrid deployment, in terms of training parallelization, privacy-preservation, fault tolerance and scalability.
Structure of the Paper. Section 2 reports the most relevant background concepts and related work. Section 3 presents the V-Model approach and the proposed EB-AI framework. Section 4 discusses the deployment of the architecture across Cloud and Edge devices. Finally, two relevant AI Inference Engines are detailed in Section 5, while the conclusions are stated in Section 6.

Background and Related Work
In the following, we overview the main concepts of DL, report on general DL tooling systems and highlight the main limitations of available tools and learning environments for automotive applications.

Deep Learning Overview
Deep learning is a branch of machine learning that leverages large DNN architectures to build a layered representation of the input data [18]. DNNs are universal non-linear function approximators, built using multiple hidden layers, and can be classified based on their architecture. Convolutional Neural Networks (CNNs) are mainly used for processing spatial information, such as images, and can be viewed as image feature extractors. Recurrent Neural Networks (RNNs) are especially good at processing temporal sequence data, such as text or video streams. Unlike conventional neural networks, an RNN contains a time dependent feedback loop in its memory cell. Due to the high number of layers, DNNs are difficult to train using vanilla backpropagation algorithms. As an example, in the case of auto-encoders [19], each layer is trained separately, then backpropagation is used to train the whole network. To avoid overfitting (i.e., trained models that cannot generalize or predict unseen values), common methods such as Dropout [20] are used to regularize the training.
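As an illustration of the regularization idea mentioned above, the following minimal sketch applies "inverted" dropout to a list of activations using only the Python standard library. It is not taken from EB-AI or from any particular DL library; the function name `dropout` and its parameters are assumptions made for this example.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` and
    rescale the survivors by 1 / (1 - rate), so the expected value of every
    activation is unchanged. At inference time the input passes through."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Illustrative usage with a seeded generator (hypothetical values).
rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, rate=0.5, rng=rng)
```

Because the surviving activations are rescaled during training, no extra scaling is needed when the network is later used for inference.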
DNNs are effective in domains characterized by a high number of features, such as computer vision and natural language processing. Training of DNNs critically requires large annotated datasets, which have been increasingly released in the last decade, especially in the computer vision field. One of the first large collections of annotated images is the ImageNet database [21], which contains 1.5 million images representing 1000 object classes in a variety of shapes, poses and illumination conditions. Although images relevant to autonomous driving systems are only a subset of the whole collection, the complete database was used to pre-train the first convolutional layers of object detectors used for driving scene perception [1].
In recent years, mainly due to the increasing research interest in autonomous vehicles, many driving datasets have been made public. These vary in size, sensory setup and data format, commonly comprising synchronized ego-data, video, LiDAR, Radar, ultrasonic, inertial measurements and GPS datastreams. Among others, popular ones are the KITTI Vision Benchmark dataset (KITTI) [22] by the Karlsruhe Institute of Technology (KIT), NuScenes [23] by Aptiv, or Cityscapes [24] by Daimler, Max Planck Institute and TU Darmstadt.
Since acquiring large training datasets is a demanding process, usually performed via manual annotation, alternative methods based on synthetic data generation and simulation engines have been explored. In our previous work we have proposed and patented a semi-parametric approach to one-shot learning, coined Generative One-Shot Learning (GOL) [25]. Given as input single one-shot objects (or generic patterns and templates), together with a small set of regularization samples, GOL uses these samples to drive a generative process that outputs new synthetic data. Specifically, it makes it possible to generalize to unseen data, while increasing the classification accuracy on synthetic data as much as possible.

Deep Learning Libraries and Tools
Some of the well-established low-level AI libraries are TensorFlow [7] by Google, PyTorch [8] by Facebook, the Cognitive Toolkit (CNTK) (CNTK-https://docs.microsoft.com/en-us/cognitive-toolkit/) by Microsoft, and Caffe [9] from UC Berkeley. The APIs are mainly available for C/C++ and Python, although other notable implementations exist, such as Weka [26], which is written in Java. To facilitate their usage and large scale adoption, these libraries are integrated into high-level frameworks (e.g., Lasagne (Lasagne-http://lasagne.readthedocs.org/en/latest/)).
For fast computation, training commonly takes place on GPUs interconnected as clusters, which are available as a service on the major Cloud platforms, for example, Microsoft Azure and Amazon AWS. After training, the DNN models are downloaded onto edge devices for inference. In this case, edge devices can be smartphones, personal computers, or target ECU devices that are part of automotive applications.
In order to exploit parallel computation, the training can be distributed based on data partition (e.g., among threads, different machines, GPUs) or model partition [27]. Scalable deployments are enabled by distributed computing platforms such as Storm [28] and Spark [29]. In particular, Trident-ML (Trident-ML-https://github.com/pmerienne/trident-ml) is a real-time library for machine learning upon Storm, while CaffeOnSpark [30] and TensorSpark [31] are Spark integrations of Caffe and TensorFlow, respectively.
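Data partitioning can be sketched in a few lines: each worker computes a gradient on its own shard, and the shard gradients are averaged before a single synchronous update. The toy model below (a one-parameter linear regression) and the names `gradient` and `data_parallel_step` are assumptions for illustration only and do not reflect the API of Storm, Spark or any of the cited integrations.

```python
def gradient(w, shard):
    """Mean-squared-error gradient of the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker computes a local gradient on its
    shard, the gradients are averaged (weighted by shard size) and the
    shared parameter is updated once."""
    total = sum(len(s) for s in shards)
    avg_grad = sum(gradient(w, s) * len(s) for s in shards) / total
    return w - lr * avg_grad

# Hypothetical data: samples of y = 3x, split across two workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0:4], data[4:8]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

Weighting by shard size makes the averaged gradient identical to the gradient computed on the unpartitioned data, so the partitioning changes where the work is done but not the result of each step.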
To overcome bandwidth and privacy issues when uploading data to the Cloud, federated learning [32] offers a technical framework to scale learning horizontally or vertically across devices [33]. Edge computing is therefore exploited to carry out pre-training, the results of which are integrated on the Cloud and then offloaded on the Edge for on-device inference [34].
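The aggregation step at the core of this scheme can be illustrated with a weighted average of locally trained parameters, in the spirit of federated averaging. This is a minimal sketch under simplifying assumptions (flat parameter vectors, a single aggregation round); the function name `federated_average` is hypothetical.

```python
def federated_average(client_updates):
    """Weighted average of client parameter vectors.

    `client_updates` is a list of (weights, num_samples) pairs, where
    `weights` is a flat list of floats trained locally on an edge device.
    Only these parameters, not the raw data, are sent to the aggregator."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total
            for i in range(dim)]

# Two hypothetical vehicles with 1 and 3 local samples respectively.
global_weights = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

The aggregated vector would then be sent back to the edge devices for on-device inference or for a further training round.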
The above-mentioned frameworks are designed solely for training purposes within the Cloud. In comparison, the main advantage of EB-AI is our proposed elastic approach, which considers both training and inference as a joint task distributed onto powerful GPU clusters residing in the Cloud and onto the Edge computing devices available in each autonomous vehicle belonging to a connected fleet of cars. Additionally, as opposed to the more general AI frameworks presented in this subsection, EB-AI has been designed specifically for automotive purposes, following the constraints of Automotive SPICE and the ISO 26262 [6] functional safety standard, along with a proposed data driven V-Model for building AI Inference Engines.

AI Frameworks in Autonomous Driving
The application of deep learning, and more generally of AI, to the automotive industry has grown significantly in the last few years. AI and deep learning are used to obtain human-like behaviors in automated driving [35], such as environment perception and scene understanding (e.g., detecting traffic signs, traffic participants, road obstacles), or trajectory planning and control. Deploying AI Inference Engines inside autonomous vehicles requires overcoming limitations of platform dependencies and limited computation resources. Therefore, new transportation vehicles are equipped with embedded ECU devices specialized for AI and exploited to ensure adequate performance and energy efficiency [36].
Overall, the increasing interest in AI Inference Engines for the automotive industry has been driven by the field of autonomous vehicles. While a number of AI-enabled functions are already deployed in cars and interacting with drivers [37], it is predicted that full-fledged AVs, that is SAE levels 4 and 5 [38], will require significant additional cost for developing the new required AI solutions [39]. As such, there is a need to optimize the design, deployment and maintenance of automotive AI Inference Engines by reducing their complexity and by improving accuracy via access to (real-time) data. Lu et al. proposed a work close to ours [40], where they introduced distributed learning based on self-driving cars as Edge computing devices. However, they only train local models on the cars and then aggregate the parameters of the neural network on a Parameter EdgeServer. Conversely, we propose a solution that splits training tasks between Cloud and Edge: tasks that process sensitive data are deployed only on the Edge, while the remaining tasks are deployed over both Cloud and Edge.
Car manufacturers and their suppliers are both developing their own in-house AI frameworks. Examples of such systems are Tesla's Full Self-Driving and Dojo computers, NVIDIA's DriveWorks platform, or Uber's Michelangelo training framework. However, these are either closed software platforms, built for in-house operations, or are designed to work only on dedicated hardware, as in the case of NVIDIA's software solutions. Based on Elektrobit's history as a key supplier of automotive grade software and real-time operating systems, we have designed EB-AI for interoperability, incorporating optimizations of AI Inference Engines for different embedded ECU devices, coupled with the necessary functional safety standards required by the automotive industry.

EB-AI Toolchain
In the following, we describe the proposed EB-AI toolchain used to design, train and evaluate the AI Inference Engines. The engines represent application deployment wrappers for trained DNN models, which are used to build driving functions inside EB Robinos, Elektrobit's autonomous driving system (EB Robinos-https://www.elektrobit.com/products/automated-driving/eb-robinos/).

V-Model and Workflow
Automotive software engineering still demands a robust and predictable development cycle. The software development process for the automotive sector is subject to several international standards, namely Automotive SPICE and ISO 26262 [6]. Accepted standards, as far as the software is concerned, rely conceptually on the traditional V-Model development lifecycle. It is important to approach deep learning from a more controlled V-Model perspective in order to address a lengthy list of challenges, such as the requirements for the training, validation and test datasets, the criteria for data definition and pre-processing, as well as the impact of hyperparameter tuning.
In Figure 2 we propose a data driven V-Model for prototyping and development within EB-AI. The first two steps regard data definition and data normalization and cleaning. These are used to define the AI inference engine architecture, where we properly configure the layers of the DNN and select the input training data according to the pre-processed datastreams. We then have the AI training and implementation step, where we train the DNN model to the requirements of the application. The trained model is converted automatically to a format that can be deployed within the autonomous vehicle. The converted model represents the AI Inference Engine, which is integrated and tested in different driving scenarios. According to the results obtained at different levels within the data driven V-Model, we can start the Verification and Validation process, where we refine the inference engine.
This circular approach allows the EB-AI architecture to employ continuous learning, where DNNs can improve their accuracy over time. To perform this task, we proposed a generative technique, based on the GOL algorithm [25], to compute synthetic data that can be used in the training process. Figure 3 details the workflow for AI Inference Engines design, deployment and testing based on the data driven V-Model. Phase 1 consists of defining the problem space and collecting the necessary data for the training process. In this phase, the data is pre-processed, annotated, normalized and filtered. In phase 2 we design the DNN architecture, activation functions, structure of the hidden layers and output nodes. Phase 3 deals with the tuning of the DNN, that is, all hyperparameters necessary for the training phase, such as the learning rate, loss function, regularization strategy and accuracy metrics.
Once the DNN setting has been completed, we deploy the DNNs across the cloud nodes in phase 4 and start the training in phase 5. In this phase we feed the DNN both with real sensory data and with synthetic data generated in a parallel workflow branch (phases 2.1-2.3). After training, the DNN is evaluated and redesigned if necessary (phase 6). The output of the training and evaluation phases is the AI Inference Engine (phase 7). This is then deployed on the edge device in the car (phase 8) and evaluated again with respect to in-car performance measures.

EB-AI Architecture
The EB-AI cloud architecture is depicted in Figure 4. It is based on two main components: (i) the EB-AI Workbench and (ii) the Inference Engines. The former generates and trains AI models, while the latter uses such models for prediction and inference on different embedded ECUs. Specifically, the EB-AI Workbench is responsible for prototyping, training, and development of the AI models and is composed of four main modules: (i) Data Processor, (ii) DNN Manager, (iii) Resource Manager and (iv) Toolbox.
Data Processor. It reads, pre-processes and analyzes the data provided by the vehicles' sensors. Data pre-processing typically deals with detecting outliers in the input data, visualizing it using feature space reduction techniques such as PCA (Principal Component Analysis) and t-SNE (t-distributed Stochastic Neighbor Embedding), checking its variance and standard deviation across its data distribution domain, splitting it into appropriate train-evaluation-test batches and finally normalizing it. The probable capacity of the DNN model is determined based on the quantity of training samples. Furthermore, this stage is used to generate synthetic data through our patented GOL algorithm [25]. This allows the framework to learn patterns of real data and generate artificial samples that augment the data available when training the DNN.
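A few of the pre-processing steps listed above (outlier detection, splitting and normalization) can be sketched with the Python standard library alone. The z-score threshold, the 70/15/15 split ratios and all function names below are assumptions chosen for illustration; they are not the settings or the interface of the EB-AI Data Processor.

```python
import random
import statistics

def remove_outliers(values, z_max=3.0):
    """Drop values lying more than `z_max` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= z_max]

def split(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and split samples into train / evaluation / test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_eval = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_eval],
            shuffled[n_train + n_eval:])

def normalize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]
```

In practice the normalization statistics would usually be computed on the training subset only and then reused for the evaluation and test subsets, so that no information leaks from held-out data.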
DNN Manager. It contains two modules: the DNN Config Manager handles the configuration of the DNN, while the DNN Training Coordinator (TC) manages the training process. Similar to the work in [2], DNN models are automatically configured based on the results obtained from pre-processing the training data. By analyzing the structure and type of input data, the DNN Config Manager automatically determines the input and output shapes of the DNN architecture, thus enabling the usage of existing network architectures from the workbench. The first step is to automatically define the input and output layers of the model based on the input-output data types. The input layer's shape is given by the structure of the training samples (e.g., images, LiDAR, radar, ultrasonic and IMU measurements, log traces, GPS tracks), while the output layer's shape is computed based on the labels' type. For example, if the training data is composed of image samples and 2D bounding boxes, then different network architectures for spatial object recognition are suggested, as shown in the use-case of Section 5.1. Similarly, as presented in Section 5.2, different RNN architectures are proposed if the data is structured as temporal sequences. In this first step, training samples and labels are aggregated into single input and output layers, respectively. The automatic configuration of the input-output layers can be manually tuned if the samples and labels are composed of multiple types of sensory measurements (e.g., images and LiDAR) and/or multiple prediction heads (e.g., 2D object detection and 3D object reconstruction).
Once the input-output shapes of the DNN model have been defined, we proceed to computing the architecture of the inner layers. We split the anatomy of a DNN architecture into three main blocks: (i) backbone model, (ii) feature extractor model and (iii) prediction heads. These are chosen from subsets of state-of-the-art DNN architectures, such as VGG16, MobileNetV2 and Darknet for the backbone, or YoloV3-V5, RetinaNet and SSD (Single Shot Detector) for the case of prediction heads in object detection. We compute the DNN structure depending on the model capacity determined at the Data Processor stage. Namely, in order to avoid overfitting if training data is scarce, we use a lightweight backbone, such as VGG16, coupled to prediction heads having a lower number of convolutional layers. Different DNN models are suggested based on this analysis. The models are subsequently adapted and modified in our SiL and HiL paradigm to obtain an optimal AI Inference Engine for the task at hand.
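The first configuration step can be pictured as a simple mapping from the structure of samples and labels to layer shapes and a suggested model family. The sketch below is purely hypothetical: the function `infer_io_config`, the label type strings and the family names are invented for this example and are not part of the EB-AI interface.

```python
def infer_io_config(sample_shape, label_shape, label_type):
    """Hypothetical sketch: derive input/output layer shapes from the data
    and suggest a model family based on the kind of labels provided."""
    families = {
        "bbox2d": "cnn_object_detector",    # spatial data, e.g. images with 2D boxes
        "sequence": "rnn_sequence_model",   # temporal data, e.g. trajectories
    }
    if label_type not in families:
        raise ValueError(f"unsupported label type: {label_type!r}")
    return {
        "input_shape": tuple(sample_shape),    # taken from the training samples
        "output_shape": tuple(label_shape),    # taken from the labels
        "family": families[label_type],
    }

# Illustrative values: a 416x416 RGB image with a 5-value box label.
config = infer_io_config([416, 416, 3], [5], "bbox2d")
```

A configuration obtained this way would only be a starting point, since the text notes that it can be tuned manually for multi-sensor inputs or multiple prediction heads.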
The TC employs libraries for DL on top of a Distributed Stream Processing System (dSPS), allowing us to run a distributed training application on the dSPS. Such an application is represented as a directed acyclic graph, where vertices represent operators and edges represent streams between pairs of operators. Each operator carries out a piece of the overall computation, which is a training task, and can be distributed over multiple instances.
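To make the graph view concrete, the sketch below describes a small training application as a directed acyclic graph and derives a valid execution order with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The operator names and the parallelism values are hypothetical and serve only as an example; they are not the operators used by EB-AI.

```python
from graphlib import TopologicalSorter

def execution_order(upstream):
    """Return the operators in an order where every operator appears after
    all operators whose output streams it consumes."""
    return list(TopologicalSorter(upstream).static_order())

# Each operator maps to the set of upstream operators feeding it.
upstream = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train_partial": {"preprocess"},
    "aggregate": {"train_partial"},
}

# Number of parallel instances per operator (illustrative values).
parallelism = {"ingest": 1, "preprocess": 2, "train_partial": 4, "aggregate": 1}

order = execution_order(upstream)
```

If the description accidentally contains a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which is a convenient check that the application really is acyclic.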
Resource Manager. It is the elastic manager that dynamically handles the deployment of the DNN training tasks over Cloud and Edge resources. Figure 5 shows an example of training tasks for a dSPS application composed of four operators (A, B, C, D) allocated over the cloud and two edge nodes. The Resource Manager (RM) is able to reconfigure over time (i) the number of edge/cloud resources, (ii) the operators' parallelism and (iii) the allocation of operator instances to available resources. Thus, the RM can dynamically redistribute the training tasks over a different pool of resources to maintain the desired performance under the current input workload (e.g., the input rate of values read by the vehicles). The operators of the dSPS application can be distributed among edge and cloud nodes according to the resources each task requires, as well as according to the sensitivity of the acquired training data. This ensures that sensitive data is processed where it is produced, without being propagated to other cloud or edge nodes, thus mitigating privacy concerns. Once the learning is completed, the trained model is replicated among all edge and cloud nodes so that vehicles immediately have the latest version available.
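The placement rule described above (sensitive tasks stay where the data is produced, other tasks go wherever resources allow) can be sketched as a small greedy allocator. This is an illustrative simplification with a single edge node and abstract resource units; the `Operator` class, the `place` function and the greedy policy are assumptions, not the actual Resource Manager algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    sensitive: bool   # True if the operator touches raw, privacy-relevant data
    cost: int         # abstract resource units required

def place(operators, edge_capacity):
    """Pin sensitive operators to the edge; place the others on the edge
    while capacity remains, otherwise offload them to the cloud."""
    placement = {}
    free = edge_capacity
    # Sensitive operators are handled first so they are never starved.
    for op in sorted(operators, key=lambda o: not o.sensitive):
        if op.sensitive and op.cost > free:
            raise RuntimeError(f"edge node cannot host sensitive operator {op.name}")
        if op.cost <= free:
            placement[op.name] = "edge"
            free -= op.cost
        else:
            placement[op.name] = "cloud"
    return placement

# Four operators as in the example above; costs and flags are made up.
ops = [Operator("A", True, 2), Operator("B", False, 1),
       Operator("C", False, 5), Operator("D", False, 1)]
placement = place(ops, edge_capacity=4)
```

A real elastic manager would re-run such a decision whenever the workload or the set of healthy nodes changes, and would also adjust the parallelism of each operator.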
Toolbox. It is a set of components specific for automotive AI applications. The training itself feeds the previously defined data to the network, allowing it to learn a new capability by reinforcing correct predictions and correcting the wrong ones. The module Driving Context Understanding classifies the driving context from grid-fusion information, the Behavior Arbitration understands the driving context and optimizes the strategy from real-world grid representations, while the Driving Simulation Environment trains, evaluates and tests AI algorithms in a virtual simulator, such as Microsoft's AirSim.
Inference Engine. It contains the trained AI models from the EB-AI Workbench and deployment wrappers (or applications) for the models. For example, such a wrapper can be built as a Robot Operating System (ROS) Node (ROS-https://www.ros.org/), or an EB Assist ADTF component. ADTF stands for Automotive Data and Time-Triggered Framework (ADTF-https://www.elektrobit.com/products/automated-driving/eb-assist/adtf/) and is a tool for the development, validation, visualization and testing of advanced driver assistance systems and automated driving features.
Interfaces. Both the EB-AI Workbench and the Inference Engine use an interface layer for low-level libraries (e.g., CUDA, TensorFlow, PyTorch, CNTK, or Caffe2), integration with the dSPS engine (e.g., Spark) and the APIs for cloud and edge nodes. Furthermore, this layer provides a specific interface to ADTF and ROS.

Hybrid Deployment Advantages
This section discusses the advantages brought by the proposed hybrid deployment. Specifically, it discusses how our solution preserves privacy (Section 4.1) and how it improves fault tolerance and scalability (Section 4.2). In Appendix A we also provide an experimental evaluation aimed to show the benefit of increased parallelism in DNN training.

Privacy-Preserving Techniques
Although the Cloud can offer a scalable training infrastructure, some training data should not flow to the Cloud due to data protection requirements, such as those on personal data enforced by the GDPR. Both data encryption and obfuscation solutions can be used to address these challenges.
Data obfuscation techniques like Differential Privacy (DP) have been successfully used both in centralized and federated learning [41]. Unlike pre-processing techniques, using DP on Edge devices as part of a federated learning approach guarantees better accuracy [15,32]. In particular, Edge devices carry out the training of a number of front layers by using DP on the sensitive data. Then, they transmit the intermediate results to the Cloud, which completes the training and disseminates the updated layers to the Edge.
Learning on encrypted data has also been proposed. While homomorphic encryption has been prototyped on centralized Cloud training [42], applications of Multi-Party Computation (MPC) offer distributed privacy-preserving training [43].
The EB-AI toolchain supports DP learning by relying on the TensorFlow Privacy library (TensorFlow Privacy-https://github.com/tensorflow/privacy). Unlike MPC learning, which must rely on an ad hoc computing framework (e.g., Sharemind [44]), DP learning can be managed by the EB-AI framework to take advantage of Edge devices on self-driving cars. The TensorFlow Privacy library relies on a differentially private stochastic gradient descent (SGD) algorithm [45], which provides privacy protection for deep neural networks while incurring a tolerable overhead in terms of training time.
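The core of a differentially private SGD step can be summarized as: clip each per-example gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. The sketch below illustrates this idea in plain Python; it does not use or reproduce the TensorFlow Privacy API, and the names `clip`, `dp_sgd_gradient`, `max_norm` and `noise_multiplier` are assumptions made for the example. It also omits the privacy accounting needed to state a formal guarantee.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip every per-example gradient, sum them, add Gaussian noise with
    standard deviation noise_multiplier * max_norm, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    std = noise_multiplier * max_norm
    noisy_sum = [sum(g[i] for g in clipped) + rng.gauss(0.0, std)
                 for i in range(dim)]
    return [s / len(clipped) for s in noisy_sum]

# Illustrative call with made-up gradients and a seeded noise source.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]
private_grad = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds the influence any single training example can have on the update, and the noise masks what remains, which is why the clipping norm and the noise scale are tied together.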

Fault Tolerance and Elastic Scalability
The EB-AI architecture can support flexible learning deployment across Cloud and Edge, offering scalable execution of learning tasks according to resource and privacy constraints.
Despite the advances in OTA software, the deployment of learning tasks must tolerate faults due to lack of connectivity with AVs. Distributed computing frameworks, such as Apache Spark, can be deployed to enable elastic fault tolerant federated learning via, for example, TensorSpark and CaffeOnSpark.
The EB-AI Resource Manager extends the ELYSIUM [46] autoscaler in order to rebalance the operators of the training application. Specifically, it elastically scales both the dSPS operators' parallelism and resources (cloud and edge nodes), allocating these operators according to workload changes or faults predicted and/or observed.
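A purely reactive version of such a scaling decision can be sketched as follows: the number of active nodes follows the input rate, capped by the nodes that are currently healthy. This is not the ELYSIUM algorithm, which is not detailed here; the function names and the per-node throughput of 100 tuples per second are hypothetical.

```python
import math

def nodes_needed(input_rate, per_node_rate):
    """Smallest number of nodes able to sustain `input_rate` tuples/s."""
    return max(1, math.ceil(input_rate / per_node_rate))

def scale_decision(input_rate, per_node_rate, healthy_nodes):
    """Allocate as many nodes as needed, limited by the healthy pool.
    `saturated` flags that the demand exceeds the available capacity."""
    needed = nodes_needed(input_rate, per_node_rate)
    return {"active": min(needed, healthy_nodes),
            "saturated": needed > healthy_nodes}

# Illustrative decisions for a low and a high input rate.
low = scale_decision(250, 100, healthy_nodes=16)
high = scale_decision(1200, 100, healthy_nodes=16)
degraded = scale_decision(1200, 100, healthy_nodes=8)
```

A predictive autoscaler would feed a forecast of the input rate, rather than its current value, into the same kind of calculation, so that resources are available before the load arrives.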
The flexibility of the task definition provided by existing dSPS platforms, together with their seamless integration with high-level AI frameworks, allows the EB-AI workbench to directly introduce new pre-processing or pre-training tasks at the Edge according to emerging privacy and learning constraints.

EB-AI Use Cases
State of the art autonomous driving systems are defined using modular perception-planning-action pipelines, in which the main problem is divided into smaller sub-problems, each module being designed to solve a specific task and deliver the outcome as input to the adjoining component [1].
In the following, we aim to highlight the functionalities of the proposed EB-AI framework in two autonomous driving application use-cases, namely driving environment perception and most probable path prediction. Since this article focuses on the underlying computing framework, as opposed to a specific algorithm, we will not focus on the accuracy of the two use-case methods, but on their design, deployment and evaluation using the proposed EB-AI framework.
In both use cases presented below, we use the proposed Elastic Cloud2Edge paradigm to perform data ingestion and the configuration of the DNNs using EB-AI's Workbench interface from Figure 6, while the DNN Manager and Resource Manager, detailed in Section 3, are used to jointly distribute the training on the Cloud and the AVs' Edge devices.

Use Case 1: Driving Environment Perception
The perception of the driving environment is a key part of any autonomous driving system, characterized as the ability of the inference engine to understand the image scene and accurately represent the dynamic and stationary objects (e.g., cars, pedestrians, traffic signs). The EB-AI framework offers ready to use DNN architectures, such as SegNet for traffic scene image segmentation [47], different versions of the Yolo detector [48] and our own Deep Grid Net (DGN) for driving context understanding [49]. For the use case of environment perception, the specific problem space of object detection in images will be considered.
For environment perception, we perform object detection and classification on RGB color images and XYZ point clouds, as illustrated in Figure 7. We used labeled data from the CamVid dataset [50], which we have converted into ROS bags. The ROS bags are uploaded into the EB-AI Cloud for pre-processing operations such as normalization, encoding or clipping. Figure 6 shows a couple of snapshots from the EB-AI Cloud interface, depicting the data upload, DNN architecture design and training as SiL, deployment and evaluation via HiL and models comparison. The first steps in building an AI inference engine are the definition of the requirements and the collection of the necessary data for training. The data are collected from two sources, namely vehicles' sensors and synthetic generation through our GOL algorithm [25]. This improves the patterns learned from real data and the ability of the trained network to generalize. A manual splitting ratio of the data between train, validation and test is available, as well as automatic k-fold variations.
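The k-fold variation mentioned above can be illustrated with a short index generator: the samples are divided into k contiguous folds, and each fold is used once for validation while the remaining folds are used for training. The function name `k_fold_indices` is an assumption for this sketch, and shuffling is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs that partition
    range(n_samples) into nearly equal contiguous validation folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, validation))
        start = stop
    return folds

# Illustrative usage: 10 samples split into 3 folds.
folds = k_fold_indices(10, 3)
```

Each sample appears in exactly one validation fold, so averaging the k evaluation scores uses all of the data for validation without ever validating on samples seen during the corresponding training run.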
By analyzing the input data, the DNN Config Manager automatically determines the input and output shape of the DNN architecture, thus enabling the usage of existing network architectures from the workbench. Additional adjustments can be made in order to improve the accuracy of the DNN, based on the task at hand. In this driving environment perception use-case, we use a Yolo V5 DNN trained on pairs of camera images and annotated 2D bounding boxes of objects of interest.
The training is managed by the DNN Training Coordinator, which handles the DL libraries on top of a distributed streaming processing system. The Resource Manager handles the available training resources over the cloud and/or edge device. In order to illustrate the fault tolerance and scalability of the training process, we have used different ECU configurations with 8, 12 and 16 nodes respectively.
The ECUs are: a baseline desktop computer equipped with an Intel Core i9 9900K CPU, 64 GB RAM and two high-performance NVIDIA GeForce RTX 2080 Ti graphics cards; an NVIDIA Jetson AGX platform; a Kalray Konic board; and a RaspberryPi 3B+ computer. Figure 8 shows how the training resources are used in the context of fault tolerance and elastic scalability. In order to evaluate the fault tolerance, specific resource nodes were deliberately blocked, while the impact on the overall training time was measured. As can be seen in Figure 8a, with a hardware setup comprising 16 nodes, the impact of one to four failing nodes on the training time is negligible. However, the impact becomes visible when four nodes (50% of the total available nodes) are blocked in a hardware setup of eight nodes. Figure 8b shows the number of nodes used in the training process while injecting an input stream curve of 250-1200 tuples (data subsets) per second. The EB-AI Resource Manager reactively allocates or deallocates nodes in order to compensate for the input demand. The performance of a DNN can be evaluated based on a series of metrics, depending on its architecture. Figure 9 shows different evaluation approaches for our object detection use-case, based on direct visualization of the DNN layers' activations, Intersection over Union (IoU), as well as a performance estimator, which aggregates the overall accuracy of the model into a five-star rating system.
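The reactive allocation behavior described for Figure 8b can be sketched as a simple threshold policy. This is only an illustration under stated assumptions: the paper does not specify the Resource Manager's policy, and the per-node capacity of 100 tuples per second, the node limits and the function names are hypothetical.

```python
import math


def nodes_required(input_rate, per_node_capacity, min_nodes=1, max_nodes=16):
    """Number of nodes needed to absorb input_rate (tuples per second).

    per_node_capacity is a hypothetical throughput per node; the result is
    clamped to the available range [min_nodes, max_nodes].
    """
    needed = math.ceil(input_rate / per_node_capacity)
    return max(min_nodes, min(max_nodes, needed))


def scaling_plan(rates, per_node_capacity, min_nodes=1, max_nodes=16):
    """Return (rate, target_nodes, delta) for each observed input rate.

    A positive delta means nodes are allocated, a negative one means
    nodes are deallocated relative to the previous step.
    """
    current = min_nodes
    plan = []
    for rate in rates:
        target = nodes_required(rate, per_node_capacity, min_nodes, max_nodes)
        plan.append((rate, target, target - current))
        current = target
    return plan
```

Under the assumed capacity of 100 tuples per second per node, the extremes of the input curve (250 and 1200 tuples per second) map to 3 and 12 nodes respectively.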
To assess the precision of object detection, we use the IoU quality measure, which is a percentage reflecting the amount of overlap between the predicted bounding box, as output by the AI inference engine, and the labeled bounding box. An accurate model has an IoU value close to 100%, which represents maximum accuracy. Figure 9b illustrates the IoU matrix determined for the trained Yolo V5 model. From the obtained IoU matrix, it can be observed that the majority of the detected objects overlap with the ground truth in a ratio of 80-90%. Because the results are easy to interpret over the testing data, the user can trace back the samples where the evaluation failed or was not accurate enough, while iteratively improving the neural network until the desired performance is achieved. The obtained AI Inference Engine contains the trained DNN model encapsulated in a deployment wrapper. In this case, it was deployed as an ADTF wrapper on Elektrobit's Automotive Data and Time Triggered Framework (ADTF). Figure 10 shows the measured performance indicators for the four considered target ECU devices. The metrics quantify the computation time in frames per second, the initialization time, as well as the maximum and minimum computation time, respectively. The obtained results are proportional to the computation power of each ECU, yielding low framerates for low-end devices, such as the RaspberryPi, as opposed to the high performance of automotive ECUs, such as the NVIDIA AGX platform. The metrics are logged in the ECU device for later import into the Cloud, where they are optimized by reconfiguring and retraining the DNN using the proposed data driven V-Model.
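The IoU measure used above can be computed for a pair of axis-aligned 2D bounding boxes as in the following minimal sketch. The box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions; the format used inside EB-AI is not stated in the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

For example, two 10 x 10 boxes shifted by half their width share an intersection of 50 and a union of 150, giving an IoU of one third, well below the 80-90% range reported for most detections.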

Use Case 2: Most Probable Path Prediction
In autonomous driving and driver assistance, in order to prepare the vehicle for upcoming hazards or to warn the driver, it is important to predict in advance the route that the vehicle will follow over a future time horizon. Such algorithms, illustrated in Figure 11, are called Most Probable Path (MPP) estimators. With an increasing level of map details, past trips of vehicles can be used to store information about the road paths that are likely to be reached within a specified distance using GPS observations. Using this information, the MPP algorithm can learn and further predict the probable road network ahead. As the vehicle travels forward, traveled road segments are left behind, while new ones are added to the MPP.
A map is a web of paths implemented using a tree structure, where each node in the tree represents a link between two segments of the road. The root node of the tree represents the link where the vehicle is currently located. Starting from a link, the vehicle can travel along a considerably large number of possible paths. A transition from one link to a reachable successor is represented by adding a child to the proper parent node in the tree. This is important because, as the vehicle travels forward, nodes that become unreachable are removed from the tree. Every time the vehicle transitions from one link to the next, at least one node becomes unreachable and is removed from the tree.
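The tree structure described above can be sketched as follows. This is a simplified illustration, not Elektrobit's data structure: the class name `LinkNode`, the string link identifiers and the `advance` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkNode:
    """A node of the path tree, representing one link of the road network."""
    link_id: str
    children: dict = field(default_factory=dict)

    def add_successor(self, link_id):
        """Add a reachable successor link as a child (no duplicates)."""
        if link_id not in self.children:
            self.children[link_id] = LinkNode(link_id)
        return self.children[link_id]

    def size(self):
        """Number of nodes in the subtree rooted at this node."""
        return 1 + sum(child.size() for child in self.children.values())


def advance(root, next_link_id):
    """Move the vehicle from the root link to one of its successors.

    The chosen child becomes the new root; the old root and all sibling
    subtrees are no longer referenced, i.e. they are pruned from the tree.
    """
    if next_link_id not in root.children:
        raise KeyError(f"link {next_link_id!r} is not reachable from {root.link_id!r}")
    return root.children[next_link_id]
```

In this sketch, moving from link A to its successor B drops A itself and every subtree rooted at a sibling of B, which matches the observation that at least one node becomes unreachable at each transition.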
In order to predict where the vehicle is heading, the MPP method needs to calculate the likelihood associated with each possible turn. In the classical approach, the weight associated with each turn, or transition, depends on many factors, including the turning angle and the road class of the following road segment. A smaller turning angle results in a higher probability of driving straight, whereas a bigger turning angle results in a smaller probability of following that specific road segment.
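To make the classical weighting concrete, the sketch below assigns each candidate successor a score that decreases with the turning angle and scales with a road-class weight, then normalizes the scores into probabilities. The scoring formula `(1 + cos(angle)) / 2` and the road-class weights are purely illustrative assumptions; the paper does not give the formula used in practice.

```python
import math


def turn_probabilities(candidates):
    """Map each candidate link to a transition probability.

    candidates: list of (link_id, turning_angle_deg, road_class_weight).
    The angle term (1 + cos(angle)) / 2 is a hypothetical choice: it equals
    1 for driving straight (0 degrees) and 0 for a full U-turn (180 degrees).
    """
    if not candidates:
        return {}
    scores = {}
    for link_id, angle_deg, class_weight in candidates:
        angle_term = (1.0 + math.cos(math.radians(abs(angle_deg)))) / 2.0
        scores[link_id] = class_weight * angle_term
    total = sum(scores.values())
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return {link_id: 1.0 / len(scores) for link_id in scores}
    return {link_id: score / total for link_id, score in scores.items()}
```

Because the score depends only on map attributes, the same junction always yields the same probabilities, regardless of the driver or the vehicle.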
The main drawback of the classical approach in computing the MPP is that in most cases these predictions are performed at a small distance (a few hundred meters) in front of the current position, since the computation time required for computing the MPP probabilities is high. Additionally, the result will always be similar, since the turning angle and road class will be constant on the same map, so there will never be any difference between different drivers or different cars.
In this use-case, we have trained and deployed an AI Inference Engine using GPS data collected with Elektrobit's autonomous test vehicle, shown in Figure 12. We have performed 25 trips of 40 km each. Using the EB-AI framework, we were able to enhance the links using the pre-processing operations. As in the environment perception use-case, the user can set the splitting ratio of the data between train, validation and test datasets. The DNN at the core of the AI Inference Engine is a Recurrent Neural Network (RNN) consisting of 256 Long Short-Term Memory (LSTM) units, followed by a Fully Connected layer (SoftMax activation), providing as output a list of predictions of the most probable paths. We have chosen an RNN architecture due to its robustness in processing time-dependent sequences, thus encoding the link relations in its memory cells. EB-AI was used to efficiently design and optimize the neural network. The obtained AI model, wrapped in the ADTF format, was deployed in a container ready to run on an NVIDIA Jetson AGX board mounted in the vehicle from Figure 12.
To evaluate the performance of the MPP approach, EB-AI provides the means to measure the similarity between two paths, namely the predicted path and the ground truth. The output of the DNN is a list of paths ordered by the probability of being the correct one. Figure 13 shows the average position of the most probable route as the trip advances. The y axis in the figure represents the ground-truth reference path and the DNN predicted paths, ordered by their probabilities. We show the top most probable paths estimated by the neural network. Ideally, the predicted path should be the same as the reference one. In the beginning, when little information is known about the vehicle's past route, the MPP has a lower accuracy. The accuracy increases gradually as the drive progresses, reaching the top match after kilometer 25.

Figure 13. The average correct route match relative to the traveled distance. At halfway through the 40 km trip, the predicted route is within the top 5 matches, while in the last 10 km the predicted route is the top match.
An additional performance metric is the correct prediction rate of the DNN. Figure 14 presents how often the MPP was correctly predicted across different trips. The leftmost graph presents the results obtained when always driving on new routes, while the rightmost graph illustrates the results obtained when driving repeatedly on the same routes.
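The two evaluation measures discussed for the MPP use-case, the position of the reference path in the ranked prediction list and the correct prediction rate, can be sketched as follows. Representing a path as a tuple of link identifiers and the function names `reference_rank` and `top_k_rate` are assumptions made for illustration; the path similarity measure provided by EB-AI is not detailed in the paper and is replaced here by exact equality.

```python
def reference_rank(ranked_paths, reference_path):
    """1-based position of the reference path in the ranked predictions.

    ranked_paths is ordered by decreasing probability; returns None when
    the reference path is not among the predictions.
    """
    for position, path in enumerate(ranked_paths, start=1):
        if path == reference_path:
            return position
    return None


def top_k_rate(predictions, references, k=1):
    """Fraction of samples whose reference path is within the top k.

    With k=1 this is the correct prediction rate of the most probable path.
    """
    if not references:
        return 0.0
    hits = sum(1 for ranked, ref in zip(predictions, references)
               if ref in ranked[:k])
    return hits / len(references)
```

A rank of 1 corresponds to the top match, while `top_k_rate(..., k=5)` counts the samples for which the reference path is within the top 5 matches.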

Conclusions
In this paper, we have proposed an elastic AI development framework for autonomous driving applications based on deep learning, which takes into account some of the advances brought about by AI, Cloud and Edge computing. A modular Elastic toolchain providing all the required DL components enables the deployment of training tasks over both Cloud and Edge resources, where the latter can be located on the vehicles themselves. Pre-processing raw data and/or training partial models at the Edge reduces the amount of data to upload to the Cloud and helps mitigate privacy issues. We showed the effectiveness of the proposed toolchain in two application cases implemented at Elektrobit Automotive, namely Environment Perception and Most Probable Path Prediction. Furthermore, we showed the convenience of exploiting a distributed setting to parallelize the training of the DNNs as much as possible.
As future work, we aim to prototype and deploy AI Inference Engines at a large scale using the proposed hybrid deployment strategy over a larger fleet of autonomous vehicles.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

[46] that can automatically apply the dropout to minimize overfitting. Figure A1 reports the accuracy results for the two networks during training. The CNN achieves better accuracy than the MLP, exceeding 98% within five epochs, while the MLP requires about 50 epochs. The maximum accuracy of the CNN is 99.21% versus 98.56% for the MLP. Figure A2a,b report, respectively for the MLP and the CNN, the time required to train an epoch on the different hardware devices. As expected, training on either GPU is much faster than on the CPU. Epoch training for the MLP (resp., CNN) on the Tesla and Quadro GPUs is ≈14 and ≈6 (resp., ≈73 and ≈38) times faster than on the CPU. Notice that the Tesla is about two times faster than the Quadro.