2.1. Materials
In this project, four open-source libraries were employed to achieve the desired outcomes. First, the vsomeip library, developed by the GENIVI project, was used to handle SOME/IP communication between VMs. Second, the OpenCV library was utilized to manage video read and write operations during communication. The Protocol Buffers library, developed by Google, was employed to serialize and deserialize video data. This step is crucial in the video transmission process, since video frames cannot be sent directly through the SOME/IP protocol; instead, they must be packaged as standard data types, such as 8-bit unsigned integers (uint8). Finally, the nlohmann/json library was used to define the default data paths from which videos are read or to which they are written, depending on the use-case scenario.
Within the scope of this project, the vsomeip library is integral to managing SOME/IP communication among virtual machines. This library, a product of the GENIVI initiative now maintained by COVESA (Connected Vehicle Systems Alliance), offers an optimized implementation of the SOME/IP protocol designed for automotive software applications [23]. It streamlines the development of in-car communication frameworks, providing a uniform interface and communication protocol across diverse in-vehicle software entities.
This library aims to provide efficient data transmission between software modules, which is vital in automotive control systems where prompt data reception is essential. Thanks to its adaptability to the distinct communication needs of various automotive software, it enables developers to fine-tune the communication dynamics to fit their specific applications. It ensures interoperability, permitting software components originating from independent developers to interact seamlessly, a necessity in the heterogeneous realm of automotive software development. Designed for scalability, vsomeip can handle communications from compact embedded systems to intricate automotive networks with ease. SOME/IP messages consist of two parts: a header and a payload.
Figure 1 illustrates a SOME/IP communication with a typical message frame.
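As background for the figure, the standardized header layout can be sketched as follows; this struct is purely illustrative and is not part of the vsomeip API, which manages these fields internally.

```cpp
#include <cstdint>

// Illustrative sketch of the standardized SOME/IP header fields that
// precede the payload (cf. Figure 1).
struct SomeIpHeader {
    uint16_t service_id;        // Message ID, upper half: requested service
    uint16_t method_id;         // Message ID, lower half: method or event
    uint32_t length;            // length of all bytes that follow, incl. payload
    uint16_t client_id;         // Request ID, upper half: calling client
    uint16_t session_id;        // Request ID, lower half: session counter
    uint8_t  protocol_version;  // currently 0x01
    uint8_t  interface_version; // major version of the service interface
    uint8_t  message_type;      // request, response, notification, error
    uint8_t  return_code;       // e.g., E_OK on success
};
// The payload of variable length follows directly after this header.
```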
Within the SOME/IP protocol, there are different methods of communication between ECUs [24]. The first, Request/Response, is a synchronous communication method in which a device asks for a particular service known to be available in another controller, and this service is executed by the service provider exactly once. The second, known as the Fire & Forget method, works like the first, with the difference that the service requestor does not track whether the service is carried out. In broadcast-subscription-based event communication, the offered service is executed sequentially, and information is sent to subscribers without waiting for a request each time. The last method, Event Group communication, categorizes events, letting components subscribe to whole sets of related events, which simplifies their management.
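These patterns correspond to distinct message-type codes in the SOME/IP specification; the following enumeration is a sketch of those codes (vsomeip exposes equivalent values via vsomeip::message_type_e).

```cpp
#include <cstdint>

// Message-type codes defined by the SOME/IP specification, matching the
// communication patterns described above.
enum class SomeIpMessageType : uint8_t {
    Request         = 0x00, // Request/Response: an answer is expected
    RequestNoReturn = 0x01, // Fire & Forget: no answer is tracked
    Notification    = 0x02, // event update pushed to subscribers
    Response        = 0x80, // answer to a previous request
    Error           = 0x81  // answer indicating a processing error
};
```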
OpenCV, which stands for “Open Source Computer Vision Library,” is an essential tool in the context of computer vision and image processing. It offers a comprehensive set of algorithms and resources that allow developers to work with visual data, supporting a wide variety of applications from recognizing and tracking objects to creating augmented reality experiences. Its open-source status has made OpenCV a popular choice among researchers, engineers, and enthusiasts exploring computer vision [25].
One of the key strengths of OpenCV is its versatility in working with different image and video formats, making it easy to use with a variety of data sources. The library includes numerous functions that streamline processes such as image filtering, feature detection, and geometric transformations, making the development of complex vision applications more accessible. The inclusion of a deep learning module within OpenCV expands its potential even further, allowing for the use of pre-trained neural networks for more complex image processing tasks like classification and segmentation.
For this project, OpenCV was utilized for all video processing activities, handling tasks ranging from frame capture and video encoding to the analysis of individual frames. To optimize video data handling in OpenCV, the MP4 (MPEG-4 Part 14) format was selected. Another major advantage of this library lies in its image processing capabilities, which often make it the right choice for image acquisition applications. Although actual image processing remains outside the scope of this project, future expansions of this research could leverage OpenCV for such tasks without needing to seek an alternative library.
Protocol Buffers, commonly known as Protobuf, stands out as a powerful tool for serializing data, especially when efficient exchange is needed across diverse systems. Created by Google, Protobuf is designed to be independent of programming languages, offering a consistent approach to structuring data that ensures clear communication across various systems and languages [26].
The core advantage of Protobuf is its straightforward and resourceful nature. Developers define their data schema in .proto files, and the Protobuf compiler then turns these definitions into source code for multiple languages. This auto-generated code provides convenient classes for handling data serialization with ease. Protobuf not only minimizes data size, which helps save network bandwidth, but also speeds up the serialization process, boosting performance. The working principle of Protobuf for an exemplary C++ project is illustrated in Figure 2.
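To make this workflow concrete, the following minimal sketch assumes a hypothetical schema file example.proto; the message, field, and file names are illustrative only.

```cpp
// Assumed schema (example.proto):
//
//   syntax = "proto3";
//   message Example { int32 width = 1; }
//
// Running  protoc --cpp_out=. example.proto  generates example.pb.h and
// example.pb.cc, which provide a ready-to-use C++ class:
#include <string>
#include "example.pb.h" // hypothetical generated header

int main() {
    Example msg;
    msg.set_width(640);            // auto-generated setter for field 1

    std::string wire;
    msg.SerializeToString(&wire);  // compact binary encoding

    Example decoded;
    decoded.ParseFromString(wire); // round trip back into an object
    return decoded.width() == 640 ? 0 : 1;
}
```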
For our project, Protobuf was instrumental in structuring complex data like video frames, detailing attributes such as dimensions and frame rates to maintain a uniform format for transferring data. The integration with C++ was seamless, allowing for the smooth serialization of video frames for transmission and equally efficient deserialization upon receipt. This efficient process was vital for enabling the real-time analysis and processing of video data, which is central to the applications built in this project.
In the dynamic world of software development, the ability to efficiently exchange data is critical. The nlohmann/json C++ library meets this demand by offering an efficient way to handle JSON (JavaScript Object Notation). With its human-friendly format, JSON has become the go-to data interchange standard across web services, IoT devices, and many software systems.
The nlohmann/json library streamlines the parsing and crafting of JSON within C++ programs. Its user-friendly syntax lets programmers easily create, access, and alter JSON data. The library effortlessly converts between C++ data types and JSON, providing a smooth and compatible user experience. Thanks to its strong error handling, nlohmann/json is particularly reliable for applications where data correctness is essential [27].
In this project, the primary role of the nlohmann/json library was to manage configurations without the need to recompile the entire software project. It enabled our software to dynamically adjust operations, such as path definitions for video data, by interpreting JSON configuration files. Initially, the nlohmann/json library was also considered for serializing data structures into std::string objects, due to its capability to handle this type. However, it was observed that the library struggles with invalid string characters, which frequently occur in video frames. Consequently, for this specific task, Google’s Protobuf library, introduced earlier, was employed.
These open-source libraries, integrated into our system, allowed for a flexible architecture that can transmit videos from one setting, such as a VM or control device, to another for further analysis and output. This new system grants the secondary environment access to video data it otherwise would not have, enabling effective utilization. The system’s adaptability makes it suitable for a range of uses, from advanced video processing in complex systems to basic video access in simpler ones. It abstracts away video and image processing complexities that often require sophisticated setups, thereby making such functionality accessible to simpler controller boards designed primarily for data reception and reducing hardware demands. In the following section, more detailed information on the implementation of the methods offered by these libraries will be provided.
2.2. Methods
This section will introduce the methods used to facilitate the entire process. First, it will detail the steps required to establish SOME/IP services for effective communication between VMs. Next, the processes of video reading and serialization, followed by deserialization and writing, will be clarified. Lastly, a demo GUI (Graphical User Interface) application will be presented, developed to demonstrate the potential of this project in transmitting detected objects from a video stream. It should be noted that image acquisition and object detection are outside the scope of this project. The demo GUI allows users to manually select objects and transmit them to another VM via SOME/IP, demonstrating the system’s ability to send regular updates similar to those in a genuine object detection application. In a system equipped with an actual object detection algorithm, the operational procedure would be identical, except that the demo program would naturally be replaced by the detection algorithm.
The first step in setting up a SOME/IP runtime environment is to create an application object. In this project, the video source VM is named “Server,” so a SOME/IP runtime object was initialized with this name. This object must be initialized before proceeding with further settings.
Before starting the application object, two additional steps must be completed for a service provider like our Server program. The first is called “register message handler,” a method for SOME/IP objects. This method registers a specific function to be executed when a particular request arrives, identified by a unique service ID, instance ID, and method ID. In the service-oriented architecture of SOME/IP, a service can have multiple instances running on different hosts, each potentially offering a variety of tasks identified by unique method IDs within the scope of the same service. In this project, there is only one task associated with the service ID, and consequently, only one instance ID and one method ID are used.
The second step is to offer the service. This method broadcasts the previously defined service and instance IDs across the network, allowing clients in need of this service to invoke it. Remote clients call the service using the SOME/IP Service Discovery protocol, which scans the entire network for the specified combination of service, instance, and method IDs. If this service is offered by a network participant ECU, the service provider’s network address is obtained through SOME/IP SD, and the service is requested from this provider. In response, the provider sends the output of this service. In our project, the request and response involve accessing a specific video stored in the Server’s memory and transferring this video byte-wise, respectively.
Figure 3 illustrates the flow diagram of setting up the vsomeip environment. After creating and initializing an application, the “register message handler” method is invoked. The last parameter of this method is a user-defined, application-dependent function. Given that only one method is employed in this work, a straightforward name was chosen for service execution. Nonetheless, in scenarios involving multiple methods and instances, this function’s name should ideally correspond to the specific method and instance required. This function is triggered when a matching service request is received by the service provider ECU; at this point, the SOME/IP runtime environment automatically executes it to adequately address the request.
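A minimal sketch of this server-side setup follows; the ID constants are placeholders, as the project’s actual identifiers are not reproduced here.

```cpp
#include <vsomeip/vsomeip.hpp>
#include <memory>

// Placeholder IDs; the project's actual service/instance/method IDs are
// not given in the text.
constexpr vsomeip::service_t  SERVICE_ID  = 0x1234;
constexpr vsomeip::instance_t INSTANCE_ID = 0x0001;
constexpr vsomeip::method_t   METHOD_ID   = 0x0421;

std::shared_ptr<vsomeip::application> app;

// Executed by the runtime when a matching request arrives; the payload
// preparation is shown in the next listing.
void on_message(const std::shared_ptr<vsomeip::message> &request);

int main() {
    // Create and initialize the application object named "Server".
    app = vsomeip::runtime::get()->create_application("Server");
    if (!app->init())
        return 1;

    // Register the handler for the service/instance/method ID combination ...
    app->register_message_handler(SERVICE_ID, INSTANCE_ID, METHOD_ID,
                                  on_message);
    // ... and broadcast the service on the network via SOME/IP SD.
    app->offer_service(SERVICE_ID, INSTANCE_ID);

    app->start(); // enter the vsomeip event loop (blocking)
    return 0;
}
```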
While this function is tailored to the application, it must prepare an appropriate payload for the request and dispatch it via the SOME/IP protocol. To this end, payload and response objects are instantiated. The payload object features a method titled “set data” for inputting the data. In this project, vectors of the data type “uint8_t” are used to fill the payload, as this format can be seamlessly converted to the SOME/IP payload data type “vsomeip::byte_t”. Once the payload object is furnished with the necessary data, it is transferred to the response object using the “set payload” method. Finally, this response object is conveyed to the active SOME/IP application, which then forwards the response to the client.
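A sketch of such a handler is given below; load_serialized_video() is a hypothetical helper standing in for the video serialization described later in this section.

```cpp
#include <vsomeip/vsomeip.hpp>
#include <memory>
#include <vector>

extern std::shared_ptr<vsomeip::application> app;     // from the listing above
std::vector<vsomeip::byte_t> load_serialized_video(); // hypothetical helper

void on_message(const std::shared_ptr<vsomeip::message> &request) {
    // vsomeip::byte_t is an alias for uint8_t, so a std::vector<uint8_t>
    // can be handed to the payload directly.
    std::vector<vsomeip::byte_t> data = load_serialized_video();

    std::shared_ptr<vsomeip::payload> pl =
        vsomeip::runtime::get()->create_payload();
    pl->set_data(data); // the "set data" method of the payload object

    // Creating the response from the request fills in the matching client,
    // session, and method information automatically.
    std::shared_ptr<vsomeip::message> response =
        vsomeip::runtime::get()->create_response(request);
    response->set_payload(pl); // the "set payload" method of the response

    app->send(response); // forward the response to the client
}
```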
On the receiver side, which we will refer to as the “Client” from now on, similar steps are undertaken to create an application. However, unlike on the Server side, two distinct SOME/IP runtime methods are invoked before the register_message_handler method is triggered: register_availability_handler and request_service. The former assigns a unique combination of service ID and instance ID to a user-defined, application-dependent function, which is activated as soon as the sought-after service is located anywhere on the network. In our implementation, this function is named “on_availability” by convention, although more specific names may be necessary in scenarios involving multiple instance IDs, as previously mentioned. The latter method registers the service and instance IDs with the SOME/IP runtime so that the SD protocol is initiated as soon as the application starts running. Subsequently, the SOME/IP SD protocol begins searching for this specific ID combination across the network. Finally, the register_message_handler method is invoked just as on the Server side. This method registers an “on message” function that performs specific actions as soon as a packet is received from the Server. In this work, the packet contains video data, and this function therefore encompasses the necessary steps for receiving, reading, and then storing the video in the Client’s memory. The workflow on the Client side is illustrated in Figure 4.
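The client-side counterpart can be sketched as follows, with the same placeholder IDs; in this sketch, the request is issued directly from the availability callback once SOME/IP SD has located the service.

```cpp
#include <vsomeip/vsomeip.hpp>
#include <memory>

constexpr vsomeip::service_t  SERVICE_ID  = 0x1234; // placeholders, as before
constexpr vsomeip::instance_t INSTANCE_ID = 0x0001;
constexpr vsomeip::method_t   METHOD_ID   = 0x0421;

std::shared_ptr<vsomeip::application> app;

void on_availability(vsomeip::service_t, vsomeip::instance_t,
                     bool is_available) {
    if (!is_available)
        return;
    // The service has been located: request the video.
    auto request = vsomeip::runtime::get()->create_request();
    request->set_service(SERVICE_ID);
    request->set_instance(INSTANCE_ID);
    request->set_method(METHOD_ID);
    app->send(request);
}

void on_message(const std::shared_ptr<vsomeip::message> &response) {
    auto pl = response->get_payload();
    // pl->get_data() / pl->get_length() yield the received bytes, which are
    // deserialized into a VideoData container and written to disk.
}

int main() {
    app = vsomeip::runtime::get()->create_application("Client");
    if (!app->init())
        return 1;
    app->register_availability_handler(SERVICE_ID, INSTANCE_ID,
                                       on_availability);
    app->request_service(SERVICE_ID, INSTANCE_ID); // triggers SOME/IP SD
    app->register_message_handler(SERVICE_ID, INSTANCE_ID, METHOD_ID,
                                  on_message);
    app->start();
    return 0;
}
```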
The management of events is handled in a manner quite similar to the simpler request/response methods described above. However, in this case, the Client offers a service that includes event-based updates, and the Server requests this service. Understandably, events have their own service, instance, and method IDs. In this study, events convey string-based information about detected objects: their type (such as vehicle or human), the number of detected objects, and the time at which these objects were detected. Each time a new object is detected, this information is transmitted as an event update from the Client to the Server. The overall process flow is illustrated in Figure 5.
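Event handling can be sketched as follows; the IDs are placeholders, and exact vsomeip signatures (e.g., of offer_event) vary slightly between library versions.

```cpp
#include <vsomeip/vsomeip.hpp>
#include <memory>
#include <set>
#include <string>
#include <vector>

constexpr vsomeip::service_t    EVT_SERVICE_ID  = 0x2345; // placeholders
constexpr vsomeip::instance_t   EVT_INSTANCE_ID = 0x0001;
constexpr vsomeip::event_t      EVENT_ID        = 0x8001;
constexpr vsomeip::eventgroup_t EVENTGROUP_ID   = 0x0001;

extern std::shared_ptr<vsomeip::application> app;

// On the Client, which acts as the event provider here.
void setup_events() {
    app->offer_service(EVT_SERVICE_ID, EVT_INSTANCE_ID);
    app->offer_event(EVT_SERVICE_ID, EVT_INSTANCE_ID, EVENT_ID,
                     {EVENTGROUP_ID});
}

// Push a string-based detection update to all subscribers; no per-update
// request is needed.
void publish_detection(const std::string &info) {
    std::vector<vsomeip::byte_t> bytes(info.begin(), info.end());
    auto pl = vsomeip::runtime::get()->create_payload();
    pl->set_data(bytes);
    app->notify(EVT_SERVICE_ID, EVT_INSTANCE_ID, EVENT_ID, pl);
}
```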
Once a running SOME/IP environment is established, video file operations must be properly implemented in the user-defined functions described in the previous section. To package a video stream in a container, a struct named “VideoData” is defined. This struct includes, among other things, a vector of cv::Mat objects, each capable of representing a single video frame, and a cv::Size object that holds the video data’s window size. In addition to these classes from OpenCV [28], a double-type variable is used to store the FPS (Frames Per Second) rate. Lastly, a flag is employed to indicate whether the FPS rate has been read from the input video; this flag is utilized solely to avoid the redundant operation of reading the FPS rate for every single frame. The full extent of the VideoData struct, as well as the function of each member, is summarized in Table 2.
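A sketch of this container is shown below; only “VideoFrames” is named explicitly in the text, so the remaining member names are assumptions (Table 2 gives the authoritative layout).

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch of the VideoData container; member names other than VideoFrames
// are assumptions.
struct VideoData {
    std::vector<cv::Mat> VideoFrames; // one cv::Mat per video frame
    cv::Size FrameSize;               // window size of the video data
    double Fps = 0.0;                 // FPS (Frames Per Second) rate
    bool FpsRead = false;             // set once FPS/frame size were captured
};
```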
To read video files from a known directory into the VideoData container, and to write the contents of such a container back to a video file, a dedicated class named “Video_Object” was designed. Apart from reading and writing, this class is also able to serialize video data into an 8-bit unsigned integer stream and to deserialize an existing stream into a VideoData container. As mentioned before, this step is critical when sending video packets through SOME/IP.
The video reading operation is conducted by the “VideoRead()” method of the Video_Object class. This method begins by instantiating a “cv::VideoCapture” object, which requires the directory of the video file to be known beforehand. Initially, the file in the specified directory is checked for its openability. If the video is successfully opened, the FPS rate is stored in the corresponding container attribute using the “get” method of the VideoCapture class. Subsequently, video frames are read sequentially. The read() method, used within a while-loop, continues to fetch frames as long as it returns true, with each current frame being stored in the VideoFrames attribute. During the first loop iteration, the frame size is recorded, and a flag indicating that the frame size has been captured is set to prevent redundant recordings. The pseudo-code for the VideoRead() method is depicted in Figure 6.
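The logic of Figure 6 can be sketched as follows, using the VideoData sketch above; the free-function form and the member names are assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <string>

bool VideoRead(const std::string &path, VideoData &out) {
    cv::VideoCapture cap(path);          // directory must be known beforehand
    if (!cap.isOpened())                 // check the file for openability
        return false;

    out.Fps = cap.get(cv::CAP_PROP_FPS); // store the FPS rate

    cv::Mat frame;
    while (cap.read(frame)) {            // fetch frames while read() succeeds
        if (!out.FpsRead) {              // first iteration: record frame size
            out.FrameSize = frame.size();
            out.FpsRead = true;          // flag prevents redundant recordings
        }
        // clone(): cv::VideoCapture reuses the frame buffer internally
        out.VideoFrames.push_back(frame.clone());
    }
    return true;
}
```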
Once the video file is fully read, its contents are stored in a VideoData container. This means that, given access to a properly populated VideoData container, it is possible to regenerate the exact same video stream without any access to the original data. To initiate the write operation, a data path is first required, with which a “cv::VideoWriter” object from the OpenCV library is initialized. Additional arguments for initializing this object include the codec code, the FPS rate, and the frame size. The codec was chosen as X264 in this project, which is compatible with MP4 files. The FPS rate and frame size are read from the VideoData container. Subsequently, a check is performed to ensure the output file can be created. Finally, the “write()” method of the “cv::VideoWriter” class is used to write the stored frames into the output file one by one, iterating through the frames. The pseudo-code for the VideoWrite operation is provided in Figure 7.
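The corresponding write operation of Figure 7, under the same naming assumptions:

```cpp
#include <opencv2/opencv.hpp>
#include <string>

bool VideoWrite(const std::string &path, const VideoData &in) {
    // X264 codec; FPS rate and frame size come from the container.
    cv::VideoWriter writer(path,
                           cv::VideoWriter::fourcc('X', '2', '6', '4'),
                           in.Fps, in.FrameSize);
    if (!writer.isOpened())    // ensure the output file can be created
        return false;

    for (const cv::Mat &frame : in.VideoFrames)
        writer.write(frame);   // write stored frames one by one
    return true;
}
```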
As previously mentioned, complex data structures like our VideoData container must be serialized into sequences of simpler data types, such as bytes, before being incorporated into the payload of a SOME/IP message. Accordingly, methods have been developed for this purpose. Initially, the contents of the VideoData struct must be accurately described in a “.proto” file. This message definition is illustrated in Figure 8. In this schema, each assignment indicates the field position of the corresponding data type; for instance, frames are situated in the first field, the width of a frame in the second, and so forth.
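A guess at what such a definition might look like is given below; the field names, the file name, and the package are assumptions, while the field positions follow the description above (the actual definition is shown in Figure 8).

```proto
// video.proto (hypothetical reconstruction of Figure 8)
syntax = "proto3";
package vd;

message VideoData {
  repeated bytes frames = 1; // JPG-encoded frames, one entry per frame
  int32  width  = 2;         // frame width in pixels
  int32  height = 3;         // frame height in pixels
  double fps    = 4;         // FPS rate
}
```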
Subsequently, “.pb.cc” and “.pb.h” files are generated using the proto compiler. These source files contain all the necessary classes and methods to populate the defined data types and to read byte sequences back into them. To convert a VideoData struct into byte sequences, a proto object generated by the proto compiler is initialized to execute all the necessary operations. The “set” method, specifically tailored to our data structure, is used for each content element. The exception is the frames, which must be handled differently because they are not a single variable but a vector of frames. For serializing a vector of video frames, a for-loop iterating through the frame vector is employed. In each iteration, an OpenCV function named “cv::imencode” encodes the current frame into a buffer, in the form of a vector of “uchar”, in JPG (Joint Photographic Experts Group) format. This vector is then appended to the video frames field of the serialized message format using the “add_elements” method, where “elements” stands for the name of the vector field in the “.proto” file. The operation of adding video frames to the proto class member’s repeated byte fields is summarized in Figure 9.
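Assuming the schema sketched above (so that the repeated field is called “frames” and protoc generates add_frames()), the frame-serialization loop of Figure 9 might look as follows.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include "video.pb.h" // hypothetical header generated by protoc

void SerializeFrames(const std::vector<cv::Mat> &video_frames,
                     vd::VideoData &msg) {
    std::vector<uchar> buffer;
    for (const cv::Mat &frame : video_frames) {
        buffer.clear();
        // Encode the current frame as JPG into a uchar buffer ...
        cv::imencode(".jpg", frame, buffer);
        // ... and append it to the repeated bytes field of the message.
        msg.add_frames(buffer.data(), buffer.size());
    }
}
```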
The final set of operations involves preparing the serialized message as a vector of bytes. Protobuf provides a method named “SerializeToString”, which writes the serialized message into a string object passed to the function. However, this string then needs to be reinterpreted as a vector of “uint8”, because this format is more compatible with the vsomeip library.
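This reinterpretation is a simple byte-wise copy, sketched here under the same naming assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include "video.pb.h" // hypothetical generated header, as above

std::vector<uint8_t> ToBytes(const vd::VideoData &msg) {
    std::string wire;
    msg.SerializeToString(&wire); // Protobuf's own serializer
    return std::vector<uint8_t>(wire.begin(), wire.end()); // byte-wise copy
}
```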
Once the serialization is complete, the entire VideoData container, including the video itself, is represented as a sequence of bytes. At the receiving end, this byte stream must be converted back into meaningful video data; in other words, it needs to be deserialized. In the function specifically designed for deserializing our VideoData format, almost every operation conducted during serialization is carried out in reverse. First, a proto object is initialized to perform the deserialization operations. A string object is initialized by reinterpreting the received vector of “uint8_t”. Then, the “ParseFromString” method is employed to extract the necessary information into the proto object. To read the video frames, the same logic used for writing them is applied, except that the encode function is replaced by “cv::imdecode”. As the name suggests, this function decodes vectors carrying information in JPG format back into the video frame format, or more specifically, into the “cv::Mat” format. The decoded video frames are then appended to the video frame vector. All other information carried by the serialized message is read into the appropriate variables using the corresponding get methods.
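A sketch of this reverse path, again under the naming assumptions introduced above:

```cpp
#include <opencv2/opencv.hpp>
#include <cstdint>
#include <string>
#include <vector>
#include "video.pb.h" // hypothetical generated header

// Deserializes received bytes into the VideoData struct sketched earlier.
bool DeserializeVideo(const std::vector<uint8_t> &bytes, VideoData &out) {
    // Reinterpret the received byte vector as a string for Protobuf.
    std::string wire(bytes.begin(), bytes.end());

    vd::VideoData msg;
    if (!msg.ParseFromString(wire))
        return false;

    for (int i = 0; i < msg.frames_size(); ++i) {
        const std::string &jpg = msg.frames(i);
        std::vector<uchar> buf(jpg.begin(), jpg.end());
        // Decode the JPG buffer back into the cv::Mat frame format.
        out.VideoFrames.push_back(cv::imdecode(buf, cv::IMREAD_COLOR));
    }
    out.FrameSize = cv::Size(msg.width(), msg.height()); // via get methods
    out.Fps = msg.fps();
    return true;
}
```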
The nlohmann/json library is an extremely efficient and versatile C++ library designed to process JSON (JavaScript Object Notation) data. JSON is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. The nlohmann/json library simplifies the integration of JSON processing into C++ applications by providing a simple and intuitive Application Programming Interface (API) that supports modern C++ features.
The configuration of video directories is managed on both ends using JSON configuration files. This setup specifies the directory from which the video file is sourced on the Server side and the directory in which received video data is stored on the Client side. The format of these configuration files is illustrated in Figure 10. As seen in the figure, the format has been kept as simple as possible, as there is no need for a complex structure for this purpose. It is important to note that the format is identical for both ends, the only difference being the directories used on the VMs. Understandably, “InputFile” is relevant only for the Server and “OutputFile” only for the Client; to keep the project as scalable as possible, both attributes are nevertheless used as-is on both VMs. To read the entries in the JSON files, an “nlohmann::json” object is created. Configuration files are opened with “std::ifstream” objects and then passed to the JSON objects. Subsequently, the JSON object can return the entries from the given file as string objects. Although this approach may seem redundant, it was implemented to enhance the scalability of the project and to eliminate the need to recompile the entire project each time the video source changes.
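A sketch of this mechanism is given below; the JSON layout in the comment is an assumption based on the attribute names mentioned above (the actual format is shown in Figure 10).

```cpp
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

// Assumed configuration file layout (cf. Figure 10):
//
//   { "InputFile": "<video source path>", "OutputFile": "<video target path>" }
//
std::string ReadPath(const std::string &config_path, const std::string &key) {
    std::ifstream file(config_path);          // open the configuration file
    nlohmann::json config;
    file >> config;                           // pass the stream to the JSON object
    return config.at(key).get<std::string>(); // entry returned as a string
}
```

A call such as ReadPath("config.json", "InputFile") then yields the video source directory on the Server side.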
The details of realizing the GUI application will not be provided here, since this part serves only as a demonstration, in contrast to the other parts of the project; therefore, only the design will be briefly introduced. This application is designed solely for the Client side, where object detection is supposed to take place. The GUI design for pseudo-object detection is illustrated in Figure 11.
The GUI was designed to be simple, as it does not play an essential role in the project. However, it was also considered important that it could be replaced by a real object detection algorithm with as little effort as possible. Therefore, the flexibility to select multiple objects was added to the design, a feature typical of real object detection algorithms. In a real algorithm, there may be both pre-defined and undefined objects; thus, there is an option to categorize undefined, or at least non-pre-defined, objects under “miscellaneous”. For exemplary purposes, three pre-determined objects were defined: human, vehicle, and animal. These were chosen because the likelihood of detecting one of them in a vehicle environment is considerable. In the case of undefined objects, the text field next to the miscellaneous category must be filled, and the string in this field is then sent as an object. In addition to the user-facing side of the program, there is a clock in the background that counts to 100 s and then restarts. When the user selects a set of objects and clicks the “Send objects” button, the current timestamp is recorded, and the corresponding detection time is sent along with the selected objects.
Given the use case, it would not make much sense if some objects were detected and sent before the video transmission is completed. Therefore, a safety mechanism is implemented to prevent objects from being sent before the video has been completely transmitted to the Client side.
By using the methods described in this section, a successfully running project environment was created using only the libraries introduced in this chapter, their dependencies where applicable, and two virtual machines running the Ubuntu Linux operating system. Some properties of these virtual machines are listed in Table 3 to highlight the system dependencies of this project. In the following section, the results obtained from this work will be summarized.
Each VM has been allocated 4096 MB of RAM and two CPU cores, ensuring sufficient performance for development and simulation tasks. The key features of these virtual machines include the ability to manage SOME/IP communication and video data processing operations, which are critical for the objectives of this project. The chipset is set to PIIX3, the video memory is set to 16 MB, and the VMSVGA graphics controller is used. These settings provide a stable and efficient environment for developing and testing the vsomeip library and related components.