A Survey on Video Streaming for Next-Generation Vehicular Networks

Huang, Chenn-Jung; Cheng, Hao-Wen; Lien, Yi-Hung; Jian, Mei-En

doi:10.3390/electronics13030649

Open Access Review


by Chenn-Jung Huang *, Hao-Wen Cheng, Yi-Hung Lien and Mei-En Jian

Department of Computer Science & Information Engineering, National Dong Hwa University, Shoufeng, Hualien 974301, Taiwan

* Author to whom correspondence should be addressed.

Electronics 2024, 13(3), 649; https://doi.org/10.3390/electronics13030649

Submission received: 17 January 2024 / Revised: 1 February 2024 / Accepted: 2 February 2024 / Published: 4 February 2024

(This article belongs to the Special Issue Featured Review Papers in Electrical and Autonomous Vehicles)


Abstract:

As assisted driving technology advances and vehicle entertainment systems rapidly develop, future vehicles will become mobile cinemas, where passengers can use various multimedia applications in the car. In recent years, the progress in multimedia technology has given rise to immersive video experiences. In addition to conventional 2D videos, 360° videos are gaining popularity, and volumetric videos, which can offer users a better immersive experience, have been discussed. However, these applications place high demands on network capabilities, leading to a dependence on next-generation wireless communication technology to address network bottlenecks. Therefore, this study provides an exhaustive overview of the latest advancements in video streaming over vehicular networks. First, we introduce related work and background knowledge, and provide an overview of recent developments in vehicular networking and video types. Next, we detail various video processing technologies, including the latest released standards. Detailed explanations are provided for network strategies and wireless communication technologies that can optimize video transmission in vehicular networks, paying special attention to the relevant literature regarding the current development of 6G technology that is applied to vehicle communication. Finally, we propose future research directions and discuss the associated challenges. Building upon the technologies introduced in this paper and considering diverse applications, we suggest a suitable vehicular network architecture for next-generation video transmission.

Keywords:

vehicular networks; 6G; video streaming; video processing technology; resource allocation

1. Introduction

The continuous evolution of technology is guiding us towards a new era, especially with the ongoing development in assisted driving technology. This advanced technology is not just aimed at improving driving safety and convenience; it also paves the way for exciting possibilities in vehicular entertainment experiences. Future autonomous vehicles are envisioned to operate independently, reducing the reliance on driver intervention. This autonomy is made possible through enhanced safety features and improved energy efficiency, ultimately contributing to a decrease in the environmental impact [1]. Furthermore, as drivers are relieved of the constant need to monitor the road, passengers within the vehicle will find themselves with increased leisure time for engaging in various activities. These may include participating in online meetings, watching videos, playing games, and listening to music, all without the concern for driving safety. This shift in dynamics is poised to revolutionize people’s perception of vehicles, turning them from traditional modes of transportation into versatile, multi-functional leisure and entertainment spaces. Assisted driving technology provides people with the opportunity to enjoy diverse entertainment activities while on the move, and this trend that integrates technology and entertainment will offer drivers a more enjoyable driving experience.

Due to the swift progression of mobile video technologies, encompassing advancements like 4K/8K, 3D, virtual reality (VR)/augmented reality (AR)/mixed reality (MR), high frame rate (HFR), and high dynamic range (HDR), there has been a notable surge in global data traffic in recent years. This upward trajectory is anticipated to persist in the decades ahead, with projections suggesting that global mobile data traffic will reach 282 EB per month by 2027. Video traffic is expected to account for nearly 79% of this volume. Thus, ensuring the provision of high-quality video services for users remains a matter of utmost significance [2].

The advancements in multimedia technology have propelled the growth of immersive video experiences. Particularly, VR technology has seen rapid progress in recent years, and 360° video, a pivotal type of VR content, has become increasingly prevalent in our daily lives, capturing widespread attention. The adoption of 360° video extends across various domains, including education, simulation, and gaming, earning praise for its ability to deliver immersive content [3]. Unlike traditional two-dimensional (2D) videos that cover a limited flat plane, 360° video envelops viewers, occupying their entire field of view (FoV). The emergence of commercial head-mounted displays (HMDs) grants audiences the freedom to adjust their viewpoint towards desired content by simply moving their heads, replicating a more natural human experience. When viewed on 2D flat screens, such as smartphones or computers, users retain flexibility in changing the viewing direction, allowing them to watch videos from their preferred angles.

Modern and forthcoming communication systems are structured to address the evolving landscape of multimedia applications, which introduces significant network challenges. For instance, 360° video, being more immersive than traditional video, demands increased bandwidth and comes with stringent latency requirements for quality streaming transmission. To address the bandwidth requirements of 360° video, tile-based solutions have become a standard approach. Projection techniques first transform the 360° video into a 2D format, and the projected video is then partitioned into numerous tiles, each of which can be downloaded as a separate entity. Through this approach, clients can allocate the majority of resources to the tiles within the user’s viewport (the ones that are most likely to be displayed), ensuring a consistent Quality of Experience (QoE), even if the resolution of tiles outside the viewport is lower or if they are not downloaded at all [4].
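The viewport-first idea behind tile-based streaming can be illustrated with a minimal sketch. It is not the scheme of any cited work: the function name `allocate_tile_bitrates`, the tile identifiers, the bitrate ladder, and the bandwidth budget are all hypothetical values chosen for illustration. Viewport tiles are upgraded one quality level at a time, and whatever budget remains is spent on out-of-view tiles at the lowest level.

```python
from typing import Dict, List, Set


def allocate_tile_bitrates(
    tiles: List[str],
    viewport: Set[str],
    ladder_kbps: List[int],
    budget_kbps: int,
) -> Dict[str, int]:
    """Greedy viewport-first tile allocation (illustrative assumption only)."""
    ladder = sorted(ladder_kbps)
    alloc = {t: 0 for t in tiles}  # 0 kbps means the tile is not downloaded
    remaining = budget_kbps
    in_view = [t for t in tiles if t in viewport]
    out_view = [t for t in tiles if t not in viewport]

    # Raise viewport tiles one ladder rung at a time, round-robin,
    # so quality stays roughly uniform inside the viewport.
    for rate in ladder:
        for t in in_view:
            extra = rate - alloc[t]
            if extra <= remaining:
                alloc[t] = rate
                remaining -= extra

    # Spend leftover budget on out-of-view tiles at the lowest rung.
    for t in out_view:
        if ladder[0] <= remaining:
            alloc[t] = ladder[0]
            remaining -= ladder[0]
    return alloc


if __name__ == "__main__":
    plan = allocate_tile_bitrates(
        tiles=["t0", "t1", "t2", "t3"],
        viewport={"t1", "t2"},
        ladder_kbps=[500, 1500, 4000],
        budget_kbps=4000,
    )
    print(plan)
```

A real client would also weight tiles by predicted viewport probability and account for buffer state; this sketch only shows how a fixed budget can be skewed towards the viewport.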

In addition to 360° videos, volumetric videos are expected to become the next generation of videos, providing a six-degrees-of-freedom (6DoF) immersive viewing experience. They are poised to emerge as a pivotal application in 6G cellular networks, demonstrating promising potential in sectors like entertainment, healthcare, and education [5]. Because volumetric videos carry more multidimensional information than conventional videos and 360° videos, transmitting them requires a substantial amount of bandwidth. The most significant challenge is the massive volume of frames, requiring Gbps-scale ultra-high bandwidth, which surpasses the capabilities of current 5G networks [6]. Hence, several studies propose network solutions to optimize immersive video streaming by improving the bandwidth, reducing transmission delays, and mitigating computational demands [7].
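A back-of-the-envelope calculation shows why uncompressed volumetric video reaches Gbps-scale rates. The figures below are assumptions for illustration, not values taken from the cited studies: one million points per frame, 15 bytes per point (three 32-bit coordinates plus 3 bytes of RGB colour), and 30 frames per second.

```python
def raw_point_cloud_bitrate_gbps(
    points_per_frame: int, bytes_per_point: int, fps: int
) -> float:
    """Uncompressed point-cloud bitrate in Gbps (1 Gbps = 1e9 bit/s)."""
    bits_per_second = points_per_frame * bytes_per_point * 8 * fps
    return bits_per_second / 1e9


if __name__ == "__main__":
    # Hypothetical capture: 1,000,000 points, 15 bytes each, 30 fps.
    rate = raw_point_cloud_bitrate_gbps(1_000_000, 15, 30)
    print(f"{rate:.1f} Gbps")
```

Under these assumptions the raw stream is about 3.6 Gbps, which is why compression and viewport-aware delivery are treated as prerequisites for volumetric streaming.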

1.1. Distinguishing Video Transmission between a Typical Network and a Vehicular Network

The future development of vehicular networks will profoundly impact the automotive industry, with one key application being video transmission. Traditional methods of video transmission typically involve downloading over mobile networks or Wi-Fi. However, the video transmission provided by future vehicular networks will exhibit distinctive features, setting it apart from conventional video transmission methods.

Firstly, future vehicular networks will place greater emphasis on real-time performance and stability. Vehicle communication systems must ensure the smooth transmission of data, even when vehicles are traveling at high speeds, to provide passengers with a seamless experience. This necessitates vehicular networks possessing low latency and high bandwidth characteristics to ensure that the videos that users watch are not interrupted or affected by buffering.

Secondly, security is a pivotal concern in the prospective transmission of vehicular network videos. While a vehicle is in operation, communication systems within the vehicle must secure data to prevent unauthorized access and potential attacks, ensuring the effective protection of video data throughout the transmission process.

Within the vehicular network, the communicating entities exhibit diverse characteristics, including stationary elements like roadside units (RSUs), slow-moving vehicles navigating through traffic jams and road intersections, and swiftly moving entities such as vehicles on less populated highways. This spectrum of node mobility introduces various challenges to vehicular communication. Furthermore, in scenarios that are characterized by high environmental density, despite the considerable data transfer capabilities, numerous issues emerge due to elevated road traffic density. In the realm of vehicular networks, challenges include encountering data collisions, channel fading, bandwidth limitations, instances of packet dropping, and complications arising from signal interference [8].

The transmission of video in vehicular networks places a heightened focus on real-time performance, low latency, high reliability, and effective mobility management. This emphasis is crucial to meet the specific requirements of the dynamic vehicular environment. Owing to the swift progress in chip technology, contemporary vehicles can be equipped with potent processors and sizable hard drives. This endows them with substantial computational, storage, and communication capabilities. Consequently, vehicles can engage in communication with a variety of entities, such as pedestrians, RSUs, satellites, networks, the cloud, and other vehicles. By leveraging the distinctive advantages of vehicle mobility and communication, this technological integration effectively tackles the issue of network congestion in video transmission among vehicles.

1.2. Related Surveys and Our Contributions

Table 1 reflects a comprehensive exploration of our literature review, encompassing various aspects of video transmission [7,9,10,11,12,13,14,15,16,17]. The symbol ● in the table indicates a publication that is within the domain’s scope; ◐ denotes that the topic is covered but in less depth; ○ marks papers that do not directly address that area, but in which readers may find related insights; and × indicates that the cited paper does not cover the content within that field.

The Table 1 analysis highlights a dearth of research addressing vehicular network video transmission. A notable contribution comes from Jiang et al. [9], who systematically explored resource allocation for video streaming in vehicular networks, emphasizing the integration of video communication, caching, and computation. However, this paper briefly introduced encoding techniques, lacking coverage of other video processing technologies and 6G-related content. Khan et al. [16] examined caching strategies, computation offloading, and collaborative caching in video streaming with Mobile Edge Computing (MEC). While addressing challenges associated with 360° video transmission, they briefly touched upon related research on vehicular networks. However, their analysis lacked comprehensive discussions on volumetric video, video processing technologies, and the implications of 6G technology.

In a different perspective, Xu et al. [12] conducted a review on visual attention modeling and explored visual quality assessment for 360° video. However, their study did not delve into volumetric video or introduce encoding techniques. Yaqoob et al. [13] proposed adaptive solutions for 360° video, covering viewport-dependent, viewport-independent, and tile-based schemes, and briefly touched on 6DoF, but their discussion lacked a comprehensive exploration of the latest advancements in video processing and 6G technology. Ruan and Xie [10] provided a summary of QoE technologies for 360° video but did not delve into volumetric video and coding standards. Tang et al. [11] surveyed recent machine learning-based network optimization for Quality of Service (QoS) and QoE, primarily focusing on ABR-related research, while not extensively reviewing other video processing technologies. Additionally, their introduction to the technologies used in 6G was not exhaustive.

Turning to Cai et al. [15], they provided an introduction to 360° video, exploring projection and coding techniques, yet their research did not encompass volumetric video and other optimization technologies. Wong et al. [7] discussed streaming methods, MEC, 5G, and 6G networks in the context of 360° video optimization. Unfortunately, the study overlooked volumetric video, encoding technologies, and 6G bands. In contrast, Van der Hooft et al. [14] offered an overview by introducing both 360° and volumetric video, covering developments in compression, transmission, content capturing, and quality perception. Meanwhile, Mahmoud et al. [17] explored caching and multicast for mobile 360° video transmission, introducing encoding and projection technologies for 360° videos. However, their coverage was limited, lacking information on other image processing technologies, volumetric video, and 6G technology.

The review of recent research, as summarized in Table 1, underscores a distinct shortage of research on video transmission within vehicular networks in recent years. This deficiency is especially evident in the absence of surveys that delve into the integration of immersive video and the evolving landscape of wireless networking technologies over next-generation vehicular networks. Therefore, filling in the gaps summarized in Table 1, we conducted a systematic literature review in this paper. The main contributions of our work are outlined below:

(1) In this paper, we provide a comprehensive introduction to and survey of video transmission in vehicular networks, consolidating existing literature surveys relating to Vehicle-to-Everything (V2X). We emphasize the current inadequacy in applying next-generation communication technologies to vehicular networks. Additionally, this work introduces the latest developments in video technology, encompassing not only traditional flat videos but also emerging immersive applications, including 360° videos and volumetric video.

(2) We introduce various compression and optimization technologies that are aimed at conserving video transmission bandwidth. This includes the latest encoding standards and compression techniques that are specifically designed for immersive video. This paper also organizes and compares common encoding technologies. Furthermore, technologies for enhancing video resolution on the client side are discussed, along with a comprehensive review of adaptive bitrate techniques that dynamically adjust video resolution based on user network conditions.

(3) This paper introduces network technologies that are applicable to vehicular networks, with the goal of enhancing the efficiency of video transmission. These technologies encompass transmission strategies to reduce latency, bandwidth-saving transmission techniques, and the implementation of current 5G wireless communication technology. Additionally, a detailed overview of future 6G wireless communication technology is provided. The suggested resolution for the bottlenecks that are encountered in video transmission involves leveraging multiple new frequency bands and employing advanced communication techniques.

(4) Finally, the paper integrates various emerging technologies and proposes a future architecture for vehicular network video transmission. It adopts distinct video optimization techniques that are tailored to the characteristics of different video types and leverages the unique advantages of vehicular communication in conjunction with wireless communication technologies to address bandwidth congestion issues.

1.3. Work Organization

The subsequent sections of this article are organized as follows: Section 2 will delve into research issues relating to vehicular networks. Utilizing existing survey literature on V2X, this section will address the current challenges that are faced by vehicular communication. Section 3 will provide an overview of various video types, ranging from traditional flat videos to the latest immersive videos. In Section 4, a detailed exploration of technologies that are aimed at reducing video bandwidth will be presented, covering common encoding standards and compression techniques. Additionally, this section will introduce technologies for enhancing video resolution and adaptive adjustment of video resolution and bandwidth. Advancing to Section 5, strategies and communication technologies that optimize vehicular network video transmission will be introduced. This section will discuss relevant existing research and provide a detailed overview of actively researched 6G technology, exploring how 6G communication technology in new frequency bands is applied to vehicular communication. Finally, Section 6 will integrate various emerging technologies and propose a future architecture for vehicular network video transmission. It will also discuss future research directions and potential challenges that may be encountered. The overview of each individual section is illustrated in Figure 1.

2. Characteristics of Vehicular Networks

Amidst the swift progress in networking and communication technologies, the modern Intelligent Transportation System (ITS) is flourishing, leading to improvements in transportation quality across various aspects. Vehicular networks, as a foundational and vital component of diverse intelligent transportation scenarios, have the capacity to provide effective vehicle functionalities and services. Consequently, this ensures enhanced traffic management and improved vehicular interactions, which is particularly crucial for densely populated metropolitan areas and congested regions facing significant traffic pressure [18].

Modern automobiles have transcended their conventional role as thermo-mechanical devices, evolving into a fusion of advanced hardware and software elements. Equipped with integrated GPS facilities, entertainment systems, wireless communication devices, sophisticated sensors, visual assistance, automated alarm systems, and robust data processing and connectivity features, contemporary vehicles have attained a high level of technological sophistication. As multiple vehicles traverse the same road segment, the next logical step in the technological evolution is to enable individual vehicles to communicate and coordinate with each other. Specifically, vehicular networks utilize vehicular communications to facilitate instant communication among vehicles, roadside infrastructure, and pedestrians. In this context, a connected vehicular network holds the potential to offer features such as traffic management, route scheduling, data exchange, entertainment, and much more [19]. The primary objective is not only to enhance road safety services but also to accommodate time-sensitive applications within the Internet of Things (IoT) [20].

V2X technology has emerged as a crucial driving force for the future development of vehicle applications. By facilitating communication between vehicles and all surrounding devices, V2X aims to enhance driver safety, reduce congestion, improve traffic efficiency, and provide diverse vehicular entertainment information and applications. V2X encompasses various communication types, including Vehicle-to-Vehicle (V2V), Vehicle-to-Network (V2N), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Pedestrian (V2P) [21]. Given that V2P typically focuses on predicting collision risks between pedestrians and vehicles [22], with lower relevance to video transmission, this section will focus on the introduction of V2I, V2V, and V2N.

2.1. V2I

V2I denotes the exchange of information between vehicles and traffic infrastructure. This communication technology empowers vehicles to participate in two-way communication with the infrastructure along the road. The applications of V2I communication are extensive, covering both safety- and non-safety-related functions. Safety applications include features like collision warning, emergency vehicle priority, driver assistance, road hazard warnings, and speed and intersection warning messages, all designed to prevent road accidents and enhance overall mobility. Conversely, non-safety applications are geared towards traffic efficiency and optimization, remote vehicle diagnostics, air pollution monitoring, and onboard infotainment. These functionalities depend on diverse elements, technologies, and data formats that are interconnected within V2I ecosystems, enabled by Dedicated Short-Range Communication (DSRC) and Cellular V2X technologies [23].

Numerous investigations into V2I have focused on enhancing the quality of transmission data, employing beamforming techniques, conducting simulations in various environments, and performing practical measurements. Suleman et al. [23] conducted a study to determine the optimal IoT protocols for V2I communications. Their research focused on maintaining data quality across application, transport, and internet layers. The authors suggested a systematic approach to structure and tailor the IoT protocol stack, making it easier to identify areas for adjustment and optimization. In the work by Lopukhova et al. [24], a smart beam-steering algorithm was presented, leveraging vehicle positioning data. This algorithm enhances the generalization capability of the employed machine learning (ML) algorithm, mitigating the impact of the received signal power parameters on system performance and bolstering the system’s resilience against multipath propagation. Ding et al. [25] introduced a context-aware beam update scheme that is compatible with standards. This scheme assists the base station in determining when to initiate beam sweeping by utilizing noisy and quantized feedback on beam-specific layer-1 reference-signal-received power from the vehicle. Yan et al. [26] devised a specialized mobile network system for vehicular communication, conducting field tests to scrutinize channel characteristics during handover and exploring a suitable handover mechanism for the mobile network system. Qiong et al. [27] developed an intelligent vehicular node to assess the dynamic environment and forecast the optimal minimum contention window, ensuring fairness in age. Through the use of a learning algorithm, they assigned the optimal minimum contention window for the vehicular node, enabling informed decisions based on historical data. Additionally, they addressed the challenge of determining beam sweeping decisions for millimeter-wave (mmWave) V2I communications.
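To make the idea of feedback-driven beam updates more concrete, the following sketch shows one simple way a base station could decide when to trigger beam sweeping from a sequence of reported RSRP values. It is a deliberately simplified stand-in, not the scheme of Ding et al. [25]: the function name `should_sweep`, the exponential smoothing, the smoothing factor, and the -100 dBm threshold are all assumptions made for illustration.

```python
from typing import Sequence


def should_sweep(
    rsrp_reports_dbm: Sequence[float],
    threshold_dbm: float = -100.0,
    alpha: float = 0.5,
) -> bool:
    """Return True when the smoothed RSRP of the serving beam is too low.

    Exponential smoothing damps the noise in individual reports so that a
    single bad sample does not trigger a costly sweep.
    """
    if not rsrp_reports_dbm:
        raise ValueError("at least one RSRP report is required")
    smoothed = rsrp_reports_dbm[0]
    for report in rsrp_reports_dbm[1:]:
        smoothed = alpha * report + (1 - alpha) * smoothed
    return smoothed < threshold_dbm


if __name__ == "__main__":
    print(should_sweep([-90.0, -92.0, -91.0]))    # stable link
    print(should_sweep([-95.0, -105.0, -110.0]))  # degrading link
```

The design choice here is to trade reaction time for robustness: a larger `alpha` reacts faster to blockage but is more sensitive to measurement noise.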

The literature has also delved into resource allocation issues in V2I. For instance, Guo et al. [28] proposed an interference management strategy based on dual graph coloring for full-duplex V2I communication. This approach tackles challenges relating to resource allocation and power control, specifically addressing issues stemming from excessive self-interference at the full-duplex base station and interference in V2I uplink and downlink sharing the same resource block. Jin et al. [29] proposed a reinforcement learning methodology to address the resource allocation challenge in V2I communications. Within this framework, a reinforcement learning agent situated in a base station allocates a two-dimensional resource block to each vehicle, ensuring QoS guarantees and maximizing the total achievable data quantity.
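Graph coloring is a natural fit for interference-aware resource allocation: links become vertices, an edge joins two links that would interfere if they shared a resource block, and colors correspond to resource blocks. The sketch below uses a plain greedy coloring with a highest-degree-first ordering. It is not the dual graph coloring strategy of Guo et al. [28], and the interference graph and link names are hypothetical.

```python
from typing import Dict, Set


def greedy_rb_assignment(interference: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each link the lowest resource-block index unused by its neighbours."""
    assignment: Dict[str, int] = {}
    # Handle the most constrained links first (ties broken by name).
    order = sorted(interference, key=lambda link: (-len(interference[link]), link))
    for link in order:
        used = {assignment[n] for n in interference[link] if n in assignment}
        rb = 0
        while rb in used:
            rb += 1
        assignment[link] = rb
    return assignment


if __name__ == "__main__":
    graph = {
        "v1": {"v2", "v3"},
        "v2": {"v1", "v3"},
        "v3": {"v1", "v2", "v4"},
        "v4": {"v3"},
    }
    print(greedy_rb_assignment(graph))
```

Greedy coloring does not guarantee the minimum number of resource blocks in general, but it never assigns the same block to two interfering links.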

As vehicular networks progress and accommodate diverse video transmission demands for vehicle passengers, challenges associated with resource allocation in V2I communication are anticipated to intensify. Fulfilling the requirements of varied video services, including entertainment, remote medical consultations, and other interactive experiences, will demand sophisticated resource allocation strategies to guarantee efficient and reliable data transmission. This complexity will likely necessitate the implementation of advanced techniques and optimizations to balance the diverse needs of various applications while upholding the overall QoS in vehicular networks.

2.2. V2V

V2V communication serves as an information-sharing technology, with a specific focus on preventing collisions, detecting vehicle speeds, tracking locations, and monitoring vehicle movements within the context of connected vehicles. It functions by wirelessly transmitting data between vehicles, aiming to prevent accidents through the exchange of vital information like speed, positions, and heading. Within ITS, V2V communication plays a pivotal role in the development of various vehicle applications, contributing significantly to vehicle safety and the reduction in traffic congestion. Notably, V2V offers dedicated and continuous communication services, irrespective of the availability of internet facilities [30].

In recent years, numerous studies have explored the application of V2V for data transmission. Das et al. [30] introduced a secure blockchain-enabled V2V communication system to enhance vehicle security and ensure secure data sharing and communication among vehicles. Wang et al. [31] proposed bus-based clustering and mixed data scheduling, employing buses as cluster heads. The authors conducted a thorough analysis of multiple characteristics, strategically utilizing them to achieve efficient data dissemination from buses to ordinary vehicles. Mollah et al. [32] concentrated on the approach to mmWave communications in vehicular standardization activities. They addressed connected and autonomous vehicle use cases, along with deployment challenges in achieving fully connected settings in the future. The study also included a comprehensive performance assessment of mmWave-enabled V2V cooperative perception. Jiang et al. [33] created a hybrid far- and near-field 3D non-stationary Multiple-Input Multiple-Output (MIMO) end-to-end channel model designed for Reconfigurable Intelligent Surface (RIS)-assisted V2V communications involving a blocking line-of-sight (LoS) path. The model incorporated a dynamic sub-array partitioning scheme, and the researchers conducted a comparison of its channel modeling performances against traditional methods.

Furthermore, researchers have explored the potential of utilizing V2V for video transmission. For example, Wang et al. [34] introduced a cooperative V2V video alert dissemination mechanism, specifically designed for transmitting accident videos in the highway scenario of the internet of vehicles. The proposed strategy involves bidirectional cooperative transmission, forming clusters of vehicles that are moving in a common direction to enable communication within these clusters. Vehicles in the opposite direction select relay vehicles to aid in the rapid and reliable spread of videos. Chowdhury et al. [35] proposed a heterogeneous network selection scheme that supports real-time video streaming services while offloading cellular traffic. Their network selection problem considers three options for content sharing: the cellular network, Wi-Fi, and DSRC with V2V collaboration.

2.3. V2N

V2N communication utilizes standard cellular wireless links to facilitate communication between vehicles, service providers, wireless access networks, core networks, and remote edge/cloud infrastructure [36]. Cellular networks are predominantly employed for V2N connections to facilitate services encompassing entertainment, comfort, and traffic management [19]. This communication model allows vehicles to create strong real-time links with service providers and central networks, making it easier to use applications like live traffic updates, smart navigation, and monitoring the health of vehicles. Apart from offering useful traffic and vehicle-related details, V2N communication through cellular networks can also be used for updating software, smartly adjusting traffic signals, optimizing routes, and more. By leveraging the capabilities of cellular wireless technology, V2N communication connects vehicles to a broader network ecosystem, provides users with a richer driving experience, and drives continuous innovation in smart transportation and automotive technology.

In the realm of transportation and automotive technology, researchers have proposed various applications and conducted performance analyses of V2N communication. In a study led by Sandeep et al. [37], the authors introduced a relay architecture for V2N, outlining a system that establishes a connection between vehicles and a base station through a two-phase process. This entails using transceiver-equipped roadside infrastructures, including signboards, streetlights, and traffic lights as intermediate relay nodes.

Hasegawa and Okamoto [38] presented a scheme that is aimed at alleviating network congestion to prevent the degradation of network performance. They devised a method to independently enhance the network performance of uplink V2N communication by adaptively suspending low-priority data transmission for each vehicle based on advanced QoS notifications.
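The principle of suspending low-priority uplink traffic when capacity is predicted to drop can be sketched as follows. This is an illustrative simplification rather than the method of Hasegawa and Okamoto [38]: the flow tuples, the priority numbering (lower means more important), and the predicted capacity value are hypothetical.

```python
from typing import List, Tuple

# (flow name, priority where 0 is most important, required rate in kbps)
Flow = Tuple[str, int, int]


def schedule_uplink(
    flows: List[Flow], predicted_capacity_kbps: int
) -> Tuple[List[str], List[str]]:
    """Admit flows in priority order; suspend those that no longer fit."""
    sent: List[str] = []
    suspended: List[str] = []
    remaining = predicted_capacity_kbps
    for name, _priority, rate_kbps in sorted(flows, key=lambda f: f[1]):
        if rate_kbps <= remaining:
            sent.append(name)
            remaining -= rate_kbps
        else:
            suspended.append(name)
    return sent, suspended


if __name__ == "__main__":
    flows = [
        ("telemetry", 2, 200),
        ("safety", 0, 100),
        ("video", 1, 3000),
        ("logs", 3, 500),
    ]
    print(schedule_uplink(flows, predicted_capacity_kbps=3500))
```

Each vehicle can run such a rule independently, which matches the idea of improving uplink performance without central coordination.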

In the context of integrating existing 5G networks with V2N communication, Lucas-Estañ et al. [39] proposed an analytical model for estimating 5G wireless network layer latency. They identified 5G radio configurations and scenarios meeting latency and reliability requirements for V2X services through V2N communication. Addressing resource allocation and data packet sampling rate optimization, He et al. [40] sought to maximize the total throughput capacity of V2N links while ensuring the age of information interruption probability for V2V links in vehicle networks. Coll-Perales et al. [19] calculated the end-to-end latency for V2N connections across various deployment scenarios and 5G configurations, aiming to identify setups that are suitable for meeting V2X application latency needs.
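The kind of feasibility check that such latency analyses enable can be sketched as a simple budget: the end-to-end latency is taken as the sum of per-segment delays and compared with the requirement of the target service. The segment names and millisecond values below are hypothetical and do not reproduce the analytical models in [19] or [39].

```python
from typing import Dict, Tuple


def check_latency_budget(
    segments_ms: Dict[str, float], requirement_ms: float
) -> Tuple[float, bool]:
    """Return the summed end-to-end latency and whether it meets the requirement."""
    total_ms = sum(segments_ms.values())
    return total_ms, total_ms <= requirement_ms


if __name__ == "__main__":
    # Hypothetical V2N path: vehicle -> base station -> core -> edge server -> vehicle.
    path = {
        "radio_uplink": 4.0,
        "transport": 2.5,
        "core": 1.5,
        "edge_server": 3.0,
        "radio_downlink": 4.0,
    }
    print(check_latency_budget(path, requirement_ms=20.0))
```

A summed budget ignores queueing variability and retransmissions, so it is best read as a first screening step before a detailed model or measurement campaign.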

Exploring multicast and unicast in V2N, Jang [41] demonstrated the feasibility of achieving cellular V2X communication through V2N in 5G networks. Considering the increasing popularity of video applications among vehicular passengers, Khalid et al. [42] conducted a comprehensive performance analysis of video transmission through V2N in existing 5G networks. Their analysis primarily emphasized crucial factors, with a specific focus on the throughput and latency of uplink data transmission for V2N applications.

This literature review highlights a gap in current studies on vehicular communication for video transmission. Existing research often fails to consider the diverse nature of video content and overlooks potential optimization through advanced video processing techniques. As immersive applications like 360° videos gain popularity, it is crucial to incorporate a broader range of video types within the vehicular network to address user needs. In response to this, Section 3 of this paper will thoroughly investigate the latest video types and their distinctive characteristics. Following that, Section 4 will delve into emerging video processing techniques that can enhance video transmission in vehicular networks.

Considering the anticipated exponential growth in bandwidth requirements for next-generation vehicular networks, there is a pressing need for a breakthrough in V2X leveraging 6G technologies to meet the future demands of extensive network bandwidth usage. Notably, a gap exists in the current literature regarding surveys that explore the application of the latest 6G technologies in V2X, as is evident from Table 2. Therefore, the upcoming Section 5 will provide a detailed overview of the latest 6G technologies and their application in research relating to next-generation vehicular networks.

3. Types of Video Streaming

The evolution of video streaming technology has not only significantly impacted the entertainment industry but has also transformed various sectors, reshaping our daily lives. This technology is rapidly expanding into industries such as enterprise, transportation, education, tourism, sales, and healthcare, bringing about unprecedented innovation and benefits in these fields.

In the entertainment industry, individuals have become accustomed to watching videos or engaging in online gaming through video streaming during their leisure time. Within the corporate sector, video streaming is widely used for meetings and collaboration tools, enabling teams to achieve real-time interaction and information sharing. In education, video streaming facilitates remote teaching, providing increased learning opportunities. The tourism industry leverages real-time video streaming to offer virtual tours and previews to travelers, allowing them to experience a genuine journey before reaching their destination. The transportation sector employs video streaming technology for real-time traffic monitoring to enhance traffic flow and safety. In sales, video streaming enables consumers to remotely participate in virtual fitting room experiences, enhancing the enjoyment and convenience of online shopping. Within the healthcare field, video streaming of remote medical consultations and medical imaging allows doctors and patients to achieve more convenient healthcare.

As self-driving technology advances, services from these diverse industries are expected to become part of the daily lives of passengers in autonomous vehicles. For instance, in-vehicle entertainment is expected to shift from listening to music or the radio to watching videos [49]. In line with the global trend, Cisco anticipates that video will account for a substantial portion of infotainment application traffic in the foreseeable future. Consequently, onboard video infotainment applications, including live sports streaming and video conferences, are poised to become pivotal applications within vehicular networks [50].

Anticipating the growing demand for future video services within vehicles, numerous manufacturers are strategically focusing on the application and innovation of in-vehicle displays. Notable mass-produced models, such as Tesla vehicles featuring a 17-inch screen and the Toyota Crown equipped with a color display, mark the onset of the large-screen era [51]. Automobile manufacturers aim to design extensively integrated cockpits within the cabin, bringing together entertainment, social, and office functions. A deep convergence of vehicles, people, and traffic infrastructure is attainable through internet-based interconnection, providing immersive interactive experiences for vehicle passengers. An exemplary initiative in this direction is the “Cockpit of the Future”, which seeks to revolutionize the in-vehicle infotainment experience for occupants by offering immersive digital services through the cloud platform [52].

The evolution of V2X technologies introduces the potential for providing enhanced video services to vehicle passengers in diverse industry sectors. This advancement not only facilitates the consumption of 8K or higher-resolution videos within vehicles but also encourages interactive video services, such as immersive in-vehicle entertainment experiences and remote medical consultations [53].

For instance, Zhang et al. [54] highlighted that infotainment videos can offer more realistic notices and advertisements of nearby gas stations and shopping malls, thereby enhancing the overall driving experience. Emphasizing the necessity for real-time transmission during critical situations, Yu et al. [55] stressed the importance of high-definition (HD) audio and video, ultrasound images, emergency maps, and large-screen announcements in the remote consultation of an ambulance. Illustrating innovative possibilities, Yu and Lee [56] showcased the remote control of a vehicle by a driver wearing a 360° device. In the realm of reducing distractions on the road, Charissis et al. [57] proposed an AR-based approach that is projected onto the windshield to enhance driver performance, particularly when using infotainment systems. In emergency scenarios, Yu et al. [51] conducted holographic meetings inside the vehicle, projecting virtual three-dimensional video from multi-view cameras onto the cockpit’s central screen and side windows.

To address the evolving in-vehicle demands for the various types of video usage mentioned above, the following subsections will delve into the characteristics and distinctions among different video types, spanning from traditional flat videos to the latest immersive experiences.

3.1. Video on Demand

Video on Demand (VoD) services typically encompass a variety of content, including movies, TV shows, documentaries, and drama series. This diverse selection ensures that viewers have a wide range of choices, catering to different interests. With the continuous improvement of internet speed and technology, VoD has become one of the mainstream forms of audiovisual entertainment. Audiences can easily enjoy high-quality video content through various devices such as computers, smartphones, and tablets. This convenience and flexibility make VoD an ideal choice for modern audiences seeking personalized and diverse entertainment experiences. Netflix [58] and YouTube [59] are two examples of currently popular applications of VoD streaming worldwide.

The primary feature of VoD is that viewers can choose content based on their own time and preferences. This eliminates the need to wait for scheduled viewing times, enabling users to watch videos at their convenience. With the ability to pause, play, and repeat videos, all that is required is a stable internet connection. This autonomous nature provides viewers with greater flexibility, giving them better control over their overall viewing experience.

3.2. Live Video

Live video is a form of real-time transmission of video content, serving as both entertainment and communication. It enables viewers to instantly watch events as they unfold, spanning gaming live streams, musical performances, news reporting, and educational lectures. The immediacy of live video allows audiences to immerse themselves in a genuine and live atmosphere, participating in real-time interactions, such as sending comments, offering gifts, or posing questions.

Numerous online streaming and social media platforms, including Twitch [60] and Facebook [61], offer live streaming capabilities, enabling users to effortlessly share aspects of their lives, professional skills, or hobbies. These platforms often include interactive features between viewers and streamers, promoting real-time communication and community development.

By breaking the constraints of time and space, live video enables audiences worldwide to participate in the same event simultaneously, further narrowing the distance between individuals. This also provides brands, artists, creators, and others with a new avenue for promotion and interaction, expanding the reach of their content.

In the past few years, the live video streaming industry has undergone significant growth. Apart from the inherent need for improved video quality, reduced rebuffering, and minimized quality switches, live streaming clients have specific QoE demands for low latency under dynamic network conditions, which distinguishes them from traditional VoD services [62]. Unlike VoD, live streaming generates segments progressively as the live event unfolds. The server continually updates its set of segments, discarding the oldest ones while generating new ones to keep the client synchronized with the real-time moment.
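The segment turnover described above can be illustrated with a minimal sketch. The class name `LiveSegmentWindow`, its methods, and the window size are hypothetical and chosen only for illustration; real live-streaming servers manage manifests and media files rather than bare indices.

```python
from collections import deque


class LiveSegmentWindow:
    """Sliding window of the most recent live segments (hypothetical helper)."""

    def __init__(self, max_segments):
        # A deque with maxlen drops the oldest entry automatically on append
        self._segments = deque(maxlen=max_segments)
        self._next_index = 0

    def publish(self):
        """Generate the next segment; the oldest one is evicted if the window is full."""
        index = self._next_index
        self._segments.append(index)
        self._next_index += 1
        return index

    def available(self):
        """Segment indices a client may currently request, oldest first."""
        return list(self._segments)


window = LiveSegmentWindow(max_segments=3)
for _ in range(5):
    window.publish()
print(window.available())  # [2, 3, 4]
```

A client that joins late can only fetch the segments still inside the window, which is what keeps playback close to the live edge.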

In real-time media streaming, latency denotes the time gap between the moment an event is captured by a camera (or a play-out server emits a real-time signal) and the moment the end user views the content on their device screen. This gap has traditionally spanned from 30 s to 60 s or more, depending on the capabilities of viewers’ devices and the video workflow in use. Presently, the primary challenge for online live streaming services is to reduce latency to approximately that of linear broadcast signals, specifically to a low latency of 3 to 5 s and, in some cases, to ultra-low latency and near-real-time streaming. With a growing number of devices being capable of low-latency playback for audiovisual content, there is a heightened focus on research and technological advancements in this field [63].

3.3. 360° Video

Unlike conventional 2D videos, 360° videos provide users with an immersive and interactive experience by offering a navigable panoramic view. Within the realm of three degrees of freedom (3DoF), users have the ability to rotate their heads along the pitch, yaw, and roll axes, enabling them to explore the entirety of the spherical scene.

However, 360° videos require higher resolutions due to their expanded FoV, resulting in an increased demand for data representation. To deliver an immersive user experience, 360° videos mandate elevated resolutions exceeding 3840 × 2160/4K, higher frame rates surpassing 40 frames per second (FPS), and increased bitrates above 10 Mbps. Consequently, 360° videos exhibit significantly larger data sizes compared to fixed-viewpoint videos [64]. Additionally, when users rotate their heads, anticipating content from different FoVs, managing motion-to-photon (MTP) latency becomes crucial to prevent discomfort arising from disparities between the virtual and actual motion. It is essential to maintain a delay of around 10 ms between user movement and video playback [65]. In the case of 360° live video, with its real-time nature, more stringent requirements on end-to-end latency exist. The captured content needs prompt release at the earliest possible moment. Therefore, achieving a satisfactory level of QoE presents significant challenges in network bandwidth, requiring a delicate balance between limited bandwidth and user experience.

The creation of a comprehensive omnidirectional video typically involves the utilization of multiple cameras that are strategically positioned on a spherical structure. The perspectives captured by each camera are then seamlessly stitched together to form a cohesive spherical representation [14]. Processing panoramic videos entails leveraging existing flat image encoding standards. Preceding encoding and compression, the video undergoes transformation into a 2D rectangular image, which is transmitted to the client, decoded, and subsequently inversely projected and rendered onto the user’s head-mounted display.

Several projection methods have been proposed, with a general aim of achieving a more consistent sampling density across the spherical projection. This approach enhances post-projection content continuity, resulting in reduced distortion and improved coding efficiency in the 2D plane after projection [15]. Commonly employed projection methods include Equirectangular Projection (ERP) and Cubemap Projection (CMP) [66]. ERP maps the longitude and latitude lines of panoramic videos onto evenly spaced vertical and horizontal lines, offering a comprehensive view. However, it introduces distortions towards the poles, with more pixels being concentrated at the poles than at the equator. This may lead to additional bandwidth consumption in areas that may be less interesting to the audience [13].
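As a rough illustration of the ERP mapping and its polar oversampling, the sketch below converts a viewing direction to a pixel position and estimates how strongly a given row is oversampled relative to the equator. The function names, the longitude range of [-180°, 180°], and the choice of placing the north pole at row 0 are assumptions made for this example, not conventions taken from the cited works.

```python
import math


def erp_pixel(lon_deg, lat_deg, width, height):
    """Map longitude [-180, 180] and latitude [-90, 90] to an ERP (column, row)."""
    u = (lon_deg + 180.0) / 360.0   # 0 at the left edge, 1 at the right edge
    v = (90.0 - lat_deg) / 180.0    # 0 at the north pole, 1 at the south pole
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return col, row


def erp_row_oversampling(row, height):
    """Horizontal oversampling of a row relative to the equator: 1 / cos(latitude)."""
    lat = math.pi * (0.5 - (row + 0.5) / height)  # latitude of the row centre, radians
    return 1.0 / math.cos(lat)


print(erp_pixel(0.0, 0.0, 3840, 1920))  # (1920, 960)
```

Every ERP row holds the same number of pixels although the circle of latitude it represents shrinks with the cosine of the latitude, which is why rows near the poles carry far more samples per unit of spherical area than rows near the equator.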

On the other hand, CMP maps the sphere onto the six faces of a cube, reducing the video size by 25% compared to ERP [17]. This reduction results in decreased distortion. However, challenges such as discontinuities at the edges of adjacent faces and a limited FoV may arise [67]. Additionally, the pixel distribution remains uneven. Mapping the projection onto more faces can improve sampling uniformity, as seen in methods like Octahedral Snyder Equal Area Projection and Icosahedral Snyder Equal Area Projection. Nevertheless, as the number of faces increases, seamlessly stitching multiple faces into a rectangular plane becomes more challenging, leading to more discontinuous regions after stitching [15]. Introducing padding between discontinuous faces may reduce the encoding efficiency. Hence, connecting these faces into a rectangular plane while preserving content continuity becomes crucial.
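The core step of CMP, choosing the cube face hit by a viewing direction, can be sketched as follows. The face names, the axis orientation (+x right, +y up, +z front), and the function name `cube_face` are illustrative assumptions; practical CMP layouts additionally define per-face orientation and the packing of the six faces into a rectangle, both of which are omitted here.

```python
def cube_face(x, y, z):
    """Return (face, u, v) for a viewing direction; u and v lie in [-1, 1].

    Assumed axes for this sketch: +x right, +y up, +z front.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:   # dominant x axis: right or left face
        face = "right" if x > 0 else "left"
        u, v = z / ax, y / ax
    elif ay >= az:              # dominant y axis: top or bottom face
        face = "top" if y > 0 else "bottom"
        u, v = x / ay, z / ay
    else:                       # dominant z axis: front or back face
        face = "front" if z > 0 else "back"
        u, v = x / az, y / az
    return face, u, v


print(cube_face(1.0, 0.5, 0.0))  # ('right', 0.0, 0.5)
```

Directions near a face boundary land on different faces after an arbitrarily small change, which is the source of the edge discontinuities mentioned above.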

To achieve a more uniform distribution of pixels, an additional method involves incorporating a mapping process for the pixels on the cube surface. Various techniques, such as Equal-Area Cubic Projection, Equal-Distance Cubic Projection, and Parallel-to-Axis Uniform Cubemap Projection, are designed to fulfill this objective. The Hybrid Equi-Angular Cubemap Projection (HEC) is specifically designed to optimize the continuity between adjacent faces by employing different mapping functions for the equatorial (front, back, left, right) and vertical faces. Going a step further, the Content-Aware HEC utilizes adaptive mapping functions to determine parameters, minimizing projection distortion and enhancing encoding efficiency [68]. Hussain and Kwon [69] focused on identifying the format that introduces the least distortion during format conversion.

Pyramid projection stands out as a viewport-dependent method, effectively reducing bitrate in non-viewport areas [70]. In this method, the region within the FoV is projected onto the base of the pyramid at full resolution, while the remaining portions are distributed across the four sides of the pyramid with a gradual decrease in quality. A challenge with viewport-specific projections lies in the necessity to provide a substantial number of projections simultaneously to seamlessly align with any user orientation. This ensures a smooth quality transition when switching between viewing directions. Consequently, this approach involves a notable additional cost for rendering on the content generation end, as well as for encoding and transmission [14].

3.4. Volumetric Video

Volumetric video technology captures objects and environments in complete 3D [70], enabling viewers to experience 6DoF movement during playback [71,72]. The primary goal of this technology is to deliver a heightened level of realism and immersion in the viewing experience. It is frequently integrated with other technologies, including VR or AR [73]. In vehicular networks, volumetric video can be leveraged for remote healthcare applications or engagement in scenarios requiring rich 3D information and interactivity, such as remote diagnostics and maintenance within the vehicle.

In addition to the 3DoF orientations, 6DoF introduces translational movement in the horizontal, vertical, and depth directions. This translation capability enables interactive motion parallax, in which the relative positions of scene geometry shift with the viewer’s position, providing viewers with natural visual cues and enhancing their perception of the surrounding volume. An immersive 6DoF representation, unlike its 3DoF counterpart, therefore offers a larger viewing space, giving viewers the freedom to move both horizontally and vertically.

However, spherical panoramic video lacks the information needed to support such translational movements, so additional data are required. Typically, two types of technologies are considered for capturing such content [14]: volumetric video-based and image-based solutions. Volumetric video relies on volume-based representations such as voxels [74], point clouds [75], or light fields [76]. The unconstrained nature of such content makes it challenging to capture and render intricate volumetric representations such as meshes and point clouds. Moreover, volumetric video carries far more multidimensional information than traditional video, including 360° video, resulting in significant bandwidth requirements [77]. The size of an individual frame in volumetric video can exceed 15 MB, and delivering it in an uncompressed fashion would require a transmission rate of 3.6 Gbps [78].
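The 3.6 Gbps figure can be reproduced with simple arithmetic if a playback rate of 30 FPS is assumed; the frame rate is not stated in the text above and is an assumption of this sketch, as is the decimal definition 1 MB = 10^6 bytes.

```python
def uncompressed_bitrate_gbps(frame_size_mb, fps):
    """Raw bitrate in Gbps for frames of frame_size_mb megabytes (1 MB = 10**6 bytes)."""
    bits_per_second = frame_size_mb * 1e6 * 8 * fps
    return bits_per_second / 1e9


# 15 MB per frame at an assumed 30 FPS
print(uncompressed_bitrate_gbps(15, 30))  # 3.6
```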

In recent research, numerous researchers have delved into bandwidth considerations that are associated with volumetric video. Hu et al. [79] presented a comprehensive live volumetric video streaming system implemented on standard hardware, capable of delivering volumetric video at 24 FPS with a latency of under 350 ms. This demonstration validated its potential for supporting interactive live streaming applications. Gül et al. [80] proposed a viable solution for rendering a 2D view from volumetric data at an edge/cloud server and streaming the data as a 2D video. They also developed a 6DoF user movement prediction model for ultra-low-latency streaming services, showcasing that the model, on average, reduces position rendering errors that are caused by motion-to-photon delays. Liu et al. [81] introduced a cache-assisted viewport adaptive-volumetric video streaming (CaV3) framework, designed to leverage prior information for the identification of popular volumetric video tiles. Their simulation results underscored the superior predictive capabilities, cache adaptation, and overall utility of CaV3 across various scenarios.

4. Video Processing Technology

The recent flourishing of VoD and real-time streaming media applications, coupled with a significant surge in the internet media data volume, has imposed substantial pressure on global networks [81]. Empirical data also indicate that users show increased engagement and prolonged viewing times when videos are presented at higher bitrates. However, users may face constraints in viewing videos at the highest bitrate due to limitations in available bandwidth between the video server and the video player on the user’s device. Opting for a bitrate that is higher than the available network bandwidth may result in video stuttering during playback, as the played bitrate exceeds the download bitrate. Uninterrupted movie playback, without rebuffering, plays a pivotal role in shaping users’ perception of the QoE [82,83].

Despite the substantial bandwidth capability that is offered by wired network backbones, vehicles in current vehicular networks are limited to wireless transmission with base stations. Given the limited bandwidth of wireless transmission technology, if a large number of vehicles congest a road segment, base stations may struggle to cope with the bandwidth requests of numerous applications. This can result in video playback being unable to proceed smoothly. Therefore, for video applications with substantial bandwidth requirements, various video processing techniques must be employed for compression and optimization to reduce the required bandwidth during transmission. Video compression technology addresses the challenges that are posed by high storage and transmission bandwidth requirements in videos by reducing the file size while maintaining acceptable visual quality. Moreover, video compression plays a pivotal role in vehicular networks [9]. In addition to conventional video coding standards, several innovative encoding techniques have emerged to further enhance compression efficiency.

One such technique is Low-Complexity Enhancement Video Coding (LCEVC), which aims to improve the efficiency of both existing and future codecs. LCEVC achieves this with a two-layer approach: the base layer is a lower-resolution version of the video encoded by any existing codec, such as Advanced Video Coding (AVC/H.264), and the LCEVC codec then encodes the enhancement layer on top of it. This approach improves performance while minimizing computational complexity.

Various 2D video coding standards can serve as the base layer for LCEVC. Researchers have compared LCEVC-enhanced videos with their unenhanced counterparts and reported improvements in visual quality and efficiency.
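The two-layer idea can be illustrated with a toy example on a one-dimensional signal. This is only a conceptual sketch of layered coding, not the LCEVC algorithm: the function names are hypothetical, the base codec is replaced by coarse quantization as a stand-in, and the residual is kept lossless here, whereas a real enhancement layer is itself transformed and compressed.

```python
def downsample(samples):
    """Halve the resolution by averaging neighbouring pairs (even length assumed)."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]


def upsample(samples):
    """Double the resolution by repeating each sample."""
    return [s for s in samples for _ in range(2)]


def base_codec(samples, step=8):
    """Stand-in for a lossy base codec such as AVC: coarse quantization only."""
    return [round(s / step) * step for s in samples]


def encode(samples):
    base = base_codec(downsample(samples))   # low-resolution base layer
    prediction = upsample(base)              # what a base-only decoder would show
    residual = [o - p for o, p in zip(samples, prediction)]  # enhancement layer
    return base, residual


def decode(base, residual):
    return [p + r for p, r in zip(upsample(base), residual)]


original = [10, 12, 40, 44, 90, 94, 30, 26]
base, residual = encode(original)
print(decode(base, residual) == original)  # True
```

A decoder that understands only the base layer still obtains a usable lower-resolution picture, while a decoder that also applies the enhancement layer recovers the detail lost by downsampling and base-layer coding.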

Adaptive Bitrate (ABR) is another vital technology that focuses on dynamically adjusting video streaming bitrate based on the user’s network environment and device characteristics. This adaptive approach ensures an optimal viewing experience by tailoring the video quality to the available bandwidth, preventing buffering issues and enhancing overall streaming performance.
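A minimal throughput-based selection rule gives a feel for how such adaptation can work. The function name, the bitrate ladder, and the safety factor of 0.8 are illustrative assumptions; deployed ABR algorithms typically also consider buffer occupancy and device capabilities.

```python
def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest ladder rung that fits within a fraction of measured throughput."""
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    # Fall back to the lowest rung when even that exceeds the budget
    return candidates[-1] if candidates else min(ladder_kbps)


ladder = [500, 1500, 3000, 6000, 12000]   # hypothetical encoding ladder in kbps
print(select_bitrate(ladder, 5000))  # 3000
print(select_bitrate(ladder, 300))   # 500
```

Keeping the selected bitrate below the measured throughput leaves headroom for throughput fluctuations, which is particularly relevant for vehicles moving between cells.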

Super resolution (SR), as an image processing technique, aims to reconstruct or generate high-resolution images from lower-resolution sources. This technology enhances visual details and clarity, finding applications in improving visual quality, enlarging images, and various domains relating to image processing.

In summary, video processing technologies such as LCEVC, ABR, and SR play crucial roles in overcoming the storage and bandwidth challenges of video. These advancements contribute to more efficient compression, adaptive streaming, and enhanced visual quality in various applications.

4.1. QoE Assessment and Prediction

Due to the continuous growth in video traffic, service providers are required to deliver high-quality streaming experiences to meet user expectations. Nowadays, QoE has become a key metric guiding resource and service management approaches, serving to assess the perceived service quality from the end user’s perspective. QoE is the degree of delight or annoyance of the user of an application or service [84]. It covers all elements of the end-to-end system (client, terminal, network, service infrastructure, etc.). It can be impacted by user expectations and contextual factors, including the surrounding environment, economic considerations, and user preferences.

QoS, in contrast, refers to the comprehensive set of characteristics of a telecommunications service, as specified by the International Telecommunication Union (ITU) [85]. The QoS determines the service’s ability to fulfill both stated and implied user needs. Over time, within the framework of the ITU’s definition, QoS has evolved into the metrics that are used for evaluating system performance, particularly in networks. However, QoE is a more comprehensive concept, taking into account user expectations, emotions, and overall satisfaction, rather than being limited to technical indicators. While QoS metrics serve as system factors that impact user experiences, relying exclusively on them is inadequate for precisely capturing real user experiences. Human factors such as personal history, age, gender, individual characteristics, and tendencies can significantly influence user experiences. Additionally, non-system context factors, including location and application-related considerations, play a crucial role in shaping the overall user experience. Therefore, optimizing the system based solely on QoS metrics is not sufficient. Recent works have focused on studying the QoE [86], serving a dual purpose of extracting valuable insights for optimizing audiovisual systems and identifying potential drawbacks that may degrade the user experience and impede the success of emerging technologies [87], such as 360° video or online video delivery in connected vehicle networks.

The assessment methods for QoE are primarily categorized into two main approaches: subjective and objective assessments. The former is measured by the end user and may vary from one user to another. The most commonly used measure for QoE is the Mean Opinion Score (MOS), derived by averaging subjective ratings from multiple users to quantify overall satisfaction. For 360° videos, one of the differences in the viewing experience compared to traditional videos is cybersickness. Anwar et al. [88] proposed a predictive model taking into account two QoE aspects: perceptual quality and cybersickness. Additionally, they introduced two new QoE-affecting factors, the user’s familiarity with VR and the user’s interest in 360° video, to assess their impact on user dizziness. Gutierrez et al. [87] employed short 360° sequences (ranging from 10 to 30 s) to assess audiovisual quality, simulator sickness symptoms, and exploration behavior. They analyzed the impact of various factors, including sequence duration, HMD device, coding degradations (uniform and non-uniform), and methods for assessing simulator sickness. These findings played a crucial role in the development of ITU Telecommunication Standardization Sector (ITU-T) Recommendation P.919.
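Computing a MOS is a plain average of the collected ratings, as the following sketch shows. The five-point scale from 1 (bad) to 5 (excellent) is an assumption of this example, since the text above does not specify the rating scale, and the function name is hypothetical.

```python
def mean_opinion_score(ratings):
    """Average of subjective ratings on an assumed 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)


print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```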

Objective assessment methods utilize mathematical tools to provide objectivity and real-time capabilities for measuring image quality. They achieve this by mathematically interpreting objective characteristics of images. These methods are primarily rooted in reference-based classification methods or the human visual system. A conventional approach for evaluating video quality often involves a direct comparison with the original video, incorporating metrics such as the structural similarity index measure (SSIM), the video quality model (VQM), and the peak signal-to-noise ratio (PSNR) [10]. In the algorithm proposed by Taha et al. [89], the Groups of Pictures (GOPs) received at the client are compared with the original GOPs at the server. They select several crucial full-reference video assessment metrics for comparison, such as the Mean Sum of Absolute Differences (MSAD), VQM, SSIM, and average PSNR, to evaluate the user’s QoE.
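As an example of a full-reference metric, PSNR follows directly from the mean squared error between the reference and the received frame. The sketch below assumes 8-bit samples (peak value 255) and represents frames as flat lists of pixel values; the function name is hypothetical.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf   # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


print(psnr([0, 0], [255, 255]))  # 0.0
```

Higher values indicate that the received frame is closer to the original; identical frames yield an infinite PSNR.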

Considering the spherical characteristics of 360° videos, specialized metrics have been introduced, including spherical PSNR (S-PSNR), Weighted-to-Spherically-uniform PSNR (WS-PSNR) [90], and multiscale weighted-sphere uniform SSIM [91], among others. Dziembowski et al. [92] introduced a novel objective quality metric called immersive video PSNR, which incorporates two techniques for assessing quality loss in common immersive video distortions: the corresponding pixel shift and the global component difference. Zhou et al. [93] proposed a unified architecture integrating user perception and an efficient transformer that is specifically designed for 360° no-reference image quality assessment, using 360° CMP images as input.
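To show how a sphere-aware metric differs from plain PSNR, the sketch below weights the squared error of each ERP row by the cosine of its latitude before averaging, which is the idea behind the weighted spherical PSNR for ERP content. Frames are given as lists of rows, 8-bit samples are assumed, and the function name is hypothetical.

```python
import math


def ws_psnr_erp(reference, distorted, max_value=255):
    """Latitude-weighted PSNR in dB for ERP frames given as lists of rows."""
    height = len(reference)
    weighted_error = 0.0
    weight_sum = 0.0
    for j, (ref_row, dis_row) in enumerate(zip(reference, distorted)):
        # Row weight shrinks towards the poles, where ERP oversamples the sphere
        w = math.cos((j + 0.5 - height / 2) * math.pi / height)
        for r, d in zip(ref_row, dis_row):
            weighted_error += w * (r - d) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum
    return math.inf if wmse == 0 else 10 * math.log10(max_value ** 2 / wmse)


reference = [[100, 100]] * 4
pole_error = [[110, 110]] + [[100, 100]] * 3                   # error in the top row
equator_error = [[100, 100], [110, 110]] + [[100, 100]] * 2    # error near the equator
print(ws_psnr_erp(reference, pole_error) > ws_psnr_erp(reference, equator_error))  # True
```

The same pixel error is penalized less near the poles than near the equator, reflecting the smaller spherical area that polar ERP pixels represent.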

Eye-tracking data reflect the patterns of user gaze and gaze movement when viewing visual content, and thus, some studies have utilized eye-tracking data to assess the user’s QoE. Cha et al. [94] introduced an assessment method based on the user’s gaze for QoE relating to no-reference video gaming. By utilizing the variation in delay requirements and gaze patterns across different game genres, the proposed model extracts video features and evaluates video quality in real-time.

Zhu et al. [95], using an eye-tracking device on an HMD, modeled eye-based cues such as blinks, fixations, pupillometry, saccades, and eye gaze as graphs. They developed a graph convolutional network-based classifier to generate QoE assessments by extracting intrinsic features from graph-structured data. Moreover, the QoE is influenced by QoS metrics, which depend primarily on network performance and availability. Key factors include packet loss, jitter, and delay. Each parameter, individually or in combination, can result in artifacts such as staircase noise, degraded quality, jerkiness, blurring, blocking, basic pattern, and frame stalling in the streamed video [89].

Assessing and predicting multimedia streaming QoE for end users is the first step in optimizing streaming service delivery and implementing efficient QoE management. QoE prediction models are established through statistical methods by combining various objective assessment indicators and other relevant features. These models aim to predict users’ subjective perceptions of specific services or content. The substantial growth in multimedia content in communication networks necessitates real-time, accurate, and adaptive QoE management for terminal devices and network conditions. In recent years, numerous machine-learning-based prediction models for multimedia streaming QoE have been proposed. Kougioumtzidis et al. [96] conducted an analysis of QoE management objectives for multimedia services, focusing on application-oriented, ML-based QoE prediction models. Furthermore, they reviewed state-of-the-art ML-based QoE predictive models and innovative techniques, along with challenges relating to multimedia service quality assessments. Miranda et al. [97] estimated VoD QoE using widely supported Internet Control Message Protocol (ICMP) probing. Measured network conditions were used as input by a machine learning model to estimate the QoE. The MOS was estimated for a catalog of 25 different videos, and the model’s generalization performance across the entire catalog was evaluated by training on the shortest video session, resulting in an average root mean square error of 1.05 for MOS estimation. Rao et al. [90] introduced a versatile, bitstream-based video quality model that is applicable in various contexts, including video service monitoring, video encoding quality evaluation, gaming video QoE assessment, and omnidirectional video quality evaluation. Dinaki et al. [98] used indicators such as playtime, average buffering length, buffering frequency, average bitrate, and happiness score to predict video QoE before displaying the effects on the client’s screen. This approach provides the management system with additional time to prevent QoE degradation and reduce customer dissatisfaction. Ruan et al. [10] provided a comprehensive overview of the current state of QoE technologies, applied to 360° video streaming.
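Prediction accuracy for MOS estimation, such as the root mean square error mentioned above, can be computed as in the following sketch. The function name and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and observed MOS values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


print(rmse([3, 4], [4, 5]))  # 1.0
```

On a five-point MOS scale, an RMSE of about 1 means that predictions deviate from the observed scores by roughly one rating category on average.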

To enhance the provision of multimedia services and improve the end user’s QoE, extensive research has been conducted. Many commonly employed strategies rely on quality-based network resource allocation and quality-based routing, which can be viewed as a network optimization process or client-based adaptive video streaming [96]. By understanding user expectations and perceptions, service providers can strategically improve their offerings to deliver a better user experience. Sultan et al. [99] utilized a buffer-based adaptive video streaming algorithm to assess the QoE of real-time video streaming on 5G cellular networks. They proposed a framework that implements personalized multimedia systems through spectrum allocation, content caching, and prediction models, aiming to enhance the QoE for mobile users.

Song et al. [100] explored the impact of class-based user interest and QoE on the storage and update of the RSU cache. They proposed a QoE-driven edge caching method for vehicle networks based on DRL, specifically tailored to short video systems. Benmir et al. [101] introduced a QoE-aware geographic routing protocol designed for video streaming over vehicular networks. In this protocol, relaying vehicles play a crucial role in ensuring high-quality video delivery. Khan et al. [82] introduced a method for joint resource allocation and video quality selection to improve the QoE for vehicular devices. The presented approach leverages the queuing dynamics and channel states of vehicular devices to maximize the QoE, ensuring smooth video playback for end users with a high probability.

4.2. Basic Video Coding

As 8K/4K video becomes more prevalent, the bandwidth and device requirements for video and real-time video have escalated. The efficient transmission of video to user devices has become paramount. The Moving Picture Experts Group (MPEG), serving as the standard-setting organization for video coding, persistently introduces innovations to address the growing demand for video compression and enhanced video quality. AVC, High-Efficiency Video Coding (HEVC/H.265), and Versatile Video Coding (VVC/H.266) stand out as the primary forces in contemporary video compression. These three technologies play crucial roles in the realm of video coding. Streaming services and live platforms have also established diverse minimum requirements and bandwidth specifications that are tailored to different video types.

4.2.1. AVC

AVC/H.264 was developed by the Joint Video Team (JVT), a group established in 2001 that comprised experts from the ITU-T Video Coding Experts Group (VCEG) and MPEG. Shortly after its introduction, AVC emerged as the preferred standard for high-definition optical disc formats such as Blu-ray Disc and HD DVD, as well as television broadcasting.

AVC incorporates several features that are designed to enhance video compression efficiency. Noteworthy among these features are intra-prediction within intra-frames, support for multiple reference frames, quarter-pixel interpolation, in-loop deblocking filtering, and flexible macroblock ordering. Intra-prediction estimates each block from previously coded neighboring samples in the same frame, so that only the prediction residual needs to be stored; inter-prediction similarly stores only the differences between successive frames of the video, rather than each individual frame. Ivanov and Moloney [102] proposed a novel method for compressing reference frames. This strategy incorporates predictive pattern encoding, resembling AVC intra-coding but executed on more compact patterns. Their experiments indicate an acceptable degradation in quality with relatively low complexity. Kuo and Lu [103] suggested a straightforward yet effective mechanism for selecting appropriate reference frames for AVC motion estimation, enabling intelligent reference frame selection that is compatible with any existing motion search algorithm. Experimental results demonstrate that the proposed algorithm significantly reduces the complexity of motion estimation at the encoder end. Xie et al. [104] proposed an error recovery method for real-time video streaming in vehicular networks, employing scalable video coding (SVC), an extension of AVC, as the video encoding standard.

4.2.2. HEVC

Addressing the escalating demand for bandwidth involves achieving higher compression efficiency in the source encoding process without compromising visual quality, thereby preserving the end user’s QoE. The HEVC codec stands out as a notable solution in addressing these challenges. Developed through a collaboration between two prominent standardization organizations, MPEG and ITU-T VCEG, it emerged as a successor to both MPEG-2 and AVC. These organizations worked together under the Joint Collaborative Team on Video Coding (JCT-VC) [105], jointly contributing to the advancement and establishment of the HEVC standard. In 2013, the ITU-T formally ratified HEVC version 1, designating it as H.265. The primary goal behind its development was to notably enhance encoding efficiency compared to established standards like AVC. The objective was to achieve a substantial reduction in the required bitrate, aiming for a 50% decrease without compromising video quality.

The enhanced coding efficiency of HEVC facilitated the broadcasting and streaming of 4K video with heightened image quality, incorporating features such as HDR and a wide color gamut (WCG). The HEVC codec achieves nearly a 50% reduction in bitrate compared to AVC, while maintaining similar quality. The HEVC standard attains superior coding efficiency over previous standards, such as AVC, through the implementation of new coding tools, including the quadtree coding structure. This structure organizes pixels into coding units (CUs), prediction units, and transform units, with customizable sizes at each level following a tree configuration. Although these tools offer highly flexible data representation, they also introduce considerable computational complexity.
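The recursive nature of the quadtree coding structure can be illustrated with a minimal sketch. It assumes a simple variance test as a stand-in for the encoder's rate-distortion decision; the function names, the threshold, and the minimum CU size are hypothetical and are not taken from the HEVC specification.

```python
def variance(block):
    """Population variance of a 2D list of pixel values."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def quadtree_partition(frame, x, y, size, min_size=8, threshold=100.0):
    """Return the leaf coding units, as (x, y, size), covering one square block.

    The variance threshold is a hypothetical substitute for the
    rate-distortion check that a real encoder would perform.
    """
    block = [row[x:x + size] for row in frame[y:y + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(x, y, size)]          # smooth enough: keep as one CU
    half = size // 2
    leaves = []
    for dy in (0, half):               # otherwise split into four quadrants
        for dx in (0, half):
            leaves += quadtree_partition(frame, x + dx, y + dy, half,
                                         min_size, threshold)
    return leaves


# A 16x16 frame that is flat except for a bright 8x8 top-left corner.
frame = [[200 if (r < 8 and c < 8) else 10 for c in range(16)]
         for r in range(16)]
cus = quadtree_partition(frame, 0, 0, 16, min_size=4)
```

In this toy frame the 16x16 block is split once, and each of the four uniform 8x8 quadrants is kept as a single CU. Every additional level of the tree multiplies the number of candidate partitions the encoder has to evaluate, which is the source of the complexity noted above.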

Notably, the real-time performance and responsiveness of HEVC codecs may be suboptimal due to their inherent high latency resulting from the interframe nature of the codec [106]. Despite this drawback, research has been conducted to mitigate the complexity of HEVC encoding. Xu et al. [107] proposed a long short-term memory (LSTM) framework utilizing a deep learning approach to predict inter-mode CU partition by capturing its temporal dependencies. Their simulation results demonstrated a significant reduction in the encoding complexity of inter-mode HEVC. Jiménez-Moreno et al. [108] introduced an effective complexity control (CC) algorithm based on a hierarchical approach, achieving up to a 60% target complexity reduction compared to full exploration, while maintaining significant accuracy and minimal impact on coding performance. Deng et al. [109] suggested a hierarchical complexity control approach for HEVC, adjusting the maximum depths of the largest coding unit (LCU) in a frame to meet the target complexity. Their approach enables rapid adjustments to abrupt changes in the target complexity during the encoding process.

In the realm of vehicular networks, Chan et al. [110] adjusted the algorithm for autonomous vehicles’ sensing data based on different compression ratios of AVC and HEVC to compare the stability of the algorithm. The study concluded that the algorithm’s performance began to decline when the compression ratio reached a certain threshold. Labiod et al. [111] proposed a cross-layer mechanism to improve HEVC video streaming in vehicular networks while adhering to low delay constraints. Simulated assessments conducted in both suburban and urban vehicular environments showcased substantial enhancements in video quality at the receiver, coupled with notable improvements in end-to-end latency, which were attributed to the proposed mechanism. Jiang et al. [112] presented an approach to address the low latency and low complexity demands of HEVC in vehicular networks. This solution combines a coding tree unit depth decision algorithm with a Bayesian classifier. Their research suggests that this strategy significantly decreases the complexity and encoding time of HEVC, rendering it well suited for video codecs on mobile vehicles.

4.2.3. VVC

With the significant expansion of broadband internet service’s coverage and speed, the required data rates for content delivery have escalated, pushing the boundaries of broadband capacity. This underscores the necessity for even more efficient compression methods than what the current HEVC standard offers—an imperative that is addressed by the emergence of VVC [113].

VVC was developed collaboratively by experts from ITU-T Study Group 16 and MPEG, collectively forming the Joint Video Experts Team (JVET). It represents a hybrid, block-based video coding standard which is primarily grounded in technologies that are akin to those in HEVC. The development of VVC aimed to cater to a broad spectrum of video applications, encompassing standard and HD with a Standard Dynamic Range (SDR) up to 8K or beyond, videos featuring HDR and WCG, computer-generated and screen-captured content, as well as applications requiring ultra-low latency, such as online gaming and wireless display. Additionally, VVC introduces support for partitioning in subpictures and the establishment of virtual boundaries, providing valuable tools for immersive applications deploying 360° video content in immersive and augmented reality settings.

Compared to HEVC, a notable enhancement in VVC lies in the updated block partitioning structure. In contrast to HEVC’s quadtree, VVC introduces increased flexibility with binary tree and ternary tree partition shapes. This enhancement allows for rectangular block sizes, offering versatility in mode selection, transformation coding, intra-frame prediction, and inter-frame prediction [114]. However, the notable improvement in coding efficiency comes at the cost of increased encoding complexity, as the rate-distortion optimization process involves assessing additional partitioning options.
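To make the additional partition shapes concrete, the sketch below lists the sub-block sizes produced by a single split. The mode names are illustrative labels, and the 1:2:1 ratio used for ternary splits reflects how the multi-type tree is commonly described rather than anything stated in this section, so it should be read as an assumption.

```python
def split_block(width, height, mode):
    """Return the (width, height) of each sub-block produced by one split.

    Mode names are illustrative labels, not identifiers from the VVC
    specification.
    """
    if mode == "quad":                 # four equal quadrants, as in HEVC
        return [(width // 2, height // 2)] * 4
    if mode == "binary_vertical":      # two side-by-side halves
        return [(width // 2, height)] * 2
    if mode == "binary_horizontal":    # two stacked halves
        return [(width, height // 2)] * 2
    if mode == "ternary_vertical":     # assumed 1:2:1 split of the width
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "ternary_horizontal":   # assumed 1:2:1 split of the height
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")


MODES = ["quad", "binary_vertical", "binary_horizontal",
         "ternary_vertical", "ternary_horizontal"]
shapes = {mode: split_block(32, 32, mode) for mode in MODES}
```

For a 32x32 block, only the quad split yields square children; the other four modes yield rectangles such as 16x32 or 8x32. An encoder that considers all five options at every node has many more candidates to evaluate than a quadtree-only encoder.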

Numerous research efforts are currently underway to enhance the compression performance of VVC. Tissier et al. [115] introduced a two-stage learning-based approach aimed at addressing the overhead complexity associated with the Multi-Type Tree block partitioning structure in VVC intra-encoders. By leveraging Convolutional Neural Networks (CNNs) and decision trees to predict each block, their findings demonstrate that the proposed algorithm effectively reduces the complexity of VVC. Jiang et al. [116] proposed an algorithm for transmitting real-time video streaming over vehicular networks by utilizing CU texture complexity and a gray-level co-occurrence matrix to partition CUs. This algorithm effectively reduces the encoding time and complexity of VVC encoders.

The reduction in bitrate that is achieved by VVC does come with a trade-off of heightened computational complexity, posing challenges for real-time software and hardware applications [117]. The increased encoding complexity makes VVC less suitable for real-time applications [118]. While the coding tools implemented in the VVC test model (VTM) 3.0 software enhance the average coding efficiency by 21.08% compared to the HEVC Test Model (HM) 16.19 in the all-intra coding configuration, these improvements are achieved at the cost of a significant increase in encoding complexity [119].

Given the noted rise in complexity with VTM3.0, ongoing efforts to reduce encoding complexity are expected to remain an active focus during VVC standardization. In terms of runtime, utilizing the commonly employed random access configuration, the encoding time and decoding time of the VTM are approximately eight and two times longer than those of the HM, respectively.

The VVenC software encoder, built on a streamlined version of VTM, offers an openly available optimized realization. Without using multi-threading, it runs in 46% of the time of the VVC test model VTM, thus improving the efficiency of VVC. This allows a VVC encoder to balance bitrate savings and encoder runtime [120,121]. Wieckowski et al. [122] compared the search space of HM and VTM with the optimized VVenC, introducing an empirical metric to measure the encoder’s search space given a specific search algorithm. While the improved implementation enables VVenC to operate more than two times faster than VTM, it comes at the cost of compression efficiency.

Jialu and Qiang [123] proposed a fast CU partitioning algorithm based on temporal and spatial information, which decides whether to traverse to the next depth using the predicted correlations between a CU and neighboring CUs of the same size. Compared with the VTM encoder, the proposed method saves 39.28% of encoding time with a 1.62% increase in the Bjøntegaard delta bitrate (BD-BR) and a 0.05 dB increase in the PSNR in random access mode. Saldanha et al. [114] presented a configurable approach for fast block partitioning in VVC intra-frame prediction, leveraging a Light Gradient Boosting Machine. This solution reduces encoding time by 61.34% at the cost of a 2.43% loss in coding efficiency. Li et al. [124] proposed a deep learning approach to predict the quadtree plus multi-type tree-based CU partition to accelerate VVC encoding in intra-mode. Their deep multi-stage exit CNN model determines the CU partition, reducing encoding time by an average of 44.65% to 66.88%, with a 1.322% to 3.188% increase in BD-BR on video sequences.

4.2.4. Comparison of Three Coding Standards

The three aforementioned standards have played pivotal roles in distinct time periods and application scenarios, with each generation introducing improvements in compression efficiency and delivering superior video quality. Consequently, we will conduct a comparative analysis of these encoding standards, referencing relevant research papers to achieve a comprehensive understanding of their technical intricacies and application landscapes.

In real-time applications, minimizing latency is crucial. In the realm of video coding, both HEVC and VVC offer the advantage of high compression efficiency. However, as the compression efficiency of these two standards increases, so do the encoding time and complexity of videos. In contrast, AVC presents a relatively shorter encoding time, making it a more suitable choice for real-time applications. In a comparison study by Nguyen and Marpe [125] on the compression efficiency of HEVC, VVC, and AV1, it was found that the VTM decoder requires 82% more computational resources than the HM decoder, while the average encoding runtime of VTM is 959% higher than that of HM. Additionally, Petreski and Kartalov [126] evaluated the video quality of AVC, HEVC, VVC, and AOMedia Video 1 (AV1) using metrics such as PSNR, Mean Structural Similarity (MSSIM), and Video Multi-Method Assessment Fusion (VMAF). The study also provided a summary of encoding times. At the same video quality, the encoding times of VVC, AV1, and HEVC were 135, 18, and 8 times higher than that of the widely used AVC, respectively.

In VoD scenarios, video quality emerges as the most critical factor, and videos in VoD are generally less time-sensitive compared to real-time applications. This is attributed to the flexibility of VoD, allowing for the utilization of techniques such as device caching. Consequently, the goal is to minimize data traffic while upholding high video quality standards. VVC is acknowledged as the successor standard to HEVC, representing the next generation of video coding standards. Numerous studies and evaluations have been conducted to compare VVC with HEVC in various contexts. Validation tests [113] indicate that VVC has achieved a 50% average bitrate reduction for HD and ultra-high-definition (UHD) content compared to HEVC, with an even more significant reduction in the average bitrate for 360° videos. Researchers have extensively explored the compression performance of three codecs: HEVC, VVC, and AV1 [127,128]. The evaluations consistently indicate that VVC outperforms the other two encoders across the resolutions tested. In a comparative subjective quality evaluation conducted by Bonnineau et al. [129] for 8K resolution videos, it was concluded that, at the same visual quality, VVC achieves a bitrate reduction of 41.11% compared to HEVC.

In the realm of in-vehicle entertainment, diverse video applications can be realized using various encoding standards to cater to the distinct needs of both vehicles and users. Vehicles might prioritize swift imagery for road sensing, while users seek high-resolution videos with lower bandwidth consumption for in-car viewing. By harnessing the technologies of vehicular networks and leveraging the advantages of different encoding standards, we can effectively meet the demands of vehicular networks.

In Table 3, we have compiled relevant studies on the three encoding standards that are mentioned above, including improvements in compression efficiency, optimizations in complexity, and comparisons among multiple encoding standards. Additionally, we have summarized the evaluation metrics that are employed in various studies.

4.3. Enhanced Video Coding

The enhanced compression efficiencies come at the expense of increased complexity and an encoding delay, posing challenges for delay-sensitive applications. Consequently, there is a growing demand for swifter and more streamlined codecs to maintain superior encoding performance. Addressing the demand for more efficient video compression, LCEVC has been introduced to complement existing video codecs such as AVC, HEVC, and VVC, as well as future codecs. LCEVC utilizes encoder-driven upsampling and specialized tools for encoding “residuals”. By employing a limited set of dedicated enhancement tools, LCEVC aims to enhance compression efficiency and minimize the overall computational complexity for coding specific resolutions and bit depths. In doing so, LCEVC achieves a balance between low complexity and high rate distortion performance [130].

LCEVC operates by encoding a lower-resolution version, potentially with a lower bit depth, of a source video using an existing codec, known as the “base codec”, to generate the BL. Subsequently, the disparities between the lower-resolution video and the full-resolution source are encoded, potentially employing mathematically lossless coding, using a distinct compression method referred to as “enhancement”. The enhanced quality is achieved by integrating details that are encoded through an enhancement layer into a lower-resolution version of the same video, encoded through a base layer. These enhancement layers have the capability to introduce features like HDR imaging to any underlying codec, including 8-bit-based codecs, thus addressing concerns relating to backward compatibility [131]. LCEVC data are encoded following the base layer, enabling them to partially compensate for accuracy issues in the encoded size of the BL or base layer stripe. This attribute proves advantageous, especially in applications requiring real-time low latency [130].
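The layered principle described above can be sketched on a one-dimensional signal. This is a toy illustration under stated assumptions, not the LCEVC toolset: the "base codec" is a hypothetical coarse quantizer, the upsampler simply repeats samples, the residuals are stored losslessly, and the input is assumed to have an even number of samples.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring sample pairs."""
    return [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [s for s in signal for _ in range(2)]


def base_codec(signal, step=16):
    """Hypothetical lossy base codec: coarse uniform quantisation."""
    return [round(s / step) * step for s in signal]


def encode(source):
    """Produce a low-resolution base layer plus full-resolution residuals."""
    base = base_codec(downsample(source))
    predicted = upsample(base)
    residuals = [s - p for s, p in zip(source, predicted)]
    return base, residuals


def decode(base, residuals):
    """Upsample the base layer and add the enhancement residuals."""
    return [p + r for p, r in zip(upsample(base), residuals)]


source = [10, 12, 50, 54, 200, 210, 90, 80]
base, residuals = encode(source)
reconstructed = decode(base, residuals)
```

Because the residuals are computed against the decoded base layer, they also absorb the base codec's quantization error; with lossless residuals the reconstruction matches the source exactly. A practical enhancement layer would quantize and entropy-code the residuals instead.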

Because LCEVC effectively reduces video encoding complexity, various studies have reported significant bitrate and encoding time gains compared to the original encoders [132]. Ferrara et al. [131] validated the benefits of LCEVC through experimentation, demonstrating a significant decrease in the overall complexity of the end-to-end process. In comparison to the original encoder, LCEVC enhancement can reduce encoding time by up to 30% and decoding time by up to 15%. Barman et al. [133] conducted an evaluation of LCEVC for live gaming video streaming applications, revealing that LCEVC outperforms both AVC and HEVC codecs in bitrate savings, achieving up to 42% and 38%, respectively, as measured by VMAF. Subjective evaluations suggest that LCEVC outperforms the corresponding base codecs, particularly in scenarios with low bitrates. Additionally, Ferrara et al. [131] suggested the potential use of LCEVC to enhance VVC in MPEG immersive video (MIV), emphasizing that LCEVC is an improvement that can be applied to any codec format and implementation. Notably, LCEVC has been incorporated into a significant next-generation television system, TV 3.0 in Brazil [134], and is currently being implemented across a diverse range of applications, spanning from broadcast to broadband.

4.4. MPEG Immersive Media Standards

The representation of 3D data for volumetric videos includes depth images, point clouds, meshes, and volumetric grids. A point cloud consists of individual points that are defined by geometric coordinates and additional attribute information in 3D space, such as color and reflectance [135]. Point clouds not only enable free viewpoint rendering, such as through splat rendering, but they also support composite rendering in a synthetic 3D scene, providing comprehensive information on 3D geometry coordinates. They offer precise and reliable 3D geometric information, enabling autonomous vehicles to perceive 3D space more accurately. However, realistically reconstructed point clouds may encompass an extensive number of points, containing more multidimensional information than typical videos, including 360° videos. As a result, the bandwidth demand for transmitting volumetric video is considerable.

Currently, there are two main approaches to compressing volumetric videos: directly compressing the 3D data and transforming the 3D data into a 2D format. In this context, MPEG has proposed the Visual Volumetric Video-Based Coding (V3C) standard [136]. The V3C standard suggests utilizing the efficiency and widespread applicability of traditional 2D video coding techniques to compress volumetric video. To accomplish this, every volumetric frame undergoes a conversion from its 3D representation into multiple 2D representations and related metadata, known as atlas data in the V3C specification. Following the transformation from 3D to 2D, the resulting 2D representations are then compressed using conventional video codecs. MPEG has designated two applications utilizing V3C: Video-Based Point Cloud Compression (V-PCC) and MIV [137].

In the study conducted by Li et al. [77], the authors used a standard 30 FPS video as an example. With an approximate count of 760,000 points per frame, the uncompressed data amount to about 96 MB. This indicates that the bandwidth demand for volumetric video has the potential to reach as high as 2.9 Gbps. However, they noted that while point cloud video codecs can compress volumetric video data, this comes at the expense of increased computational complexity. Balancing computational and communication resources in volumetric video streaming systems remains a significant challenge.
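The 2.9 Gbps figure can be checked with simple arithmetic if the quoted size is read as roughly 96 megabits (about 12 megabytes) per frame. That reading is an assumption made here for illustration, since the text gives the figure in MB and does not state whether it refers to one frame or to a longer interval.

```python
FRAMES_PER_SECOND = 30
POINTS_PER_FRAME = 760_000
MEGABITS_PER_FRAME = 96  # assumption: the quoted "96" read as megabits per frame

# Uncompressed bandwidth: 30 frames/s x 96 Mbit/frame = 2880 Mbit/s.
bandwidth_gbps = FRAMES_PER_SECOND * MEGABITS_PER_FRAME / 1000

# Implied storage per point under the same assumption.
bytes_per_point = MEGABITS_PER_FRAME * 1_000_000 / 8 / POINTS_PER_FRAME

print(f"{bandwidth_gbps:.2f} Gbps, {bytes_per_point:.1f} bytes per point")
```

Under this reading the rate is about 2.88 Gbps, in line with the quoted value, and each point occupies just under 16 bytes, which is plausible for three coordinates plus color attributes.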

4.4.1. Point Cloud Compression

In 2020, MPEG approved the Final Draft International Standard for two point cloud compression codecs: V-PCC and Geometry-Based Point Cloud Compression (G-PCC) [135]. The two codecs are tailored to different types of point clouds. V-PCC adopts the V3C specification. It generates 3D surface segments through the segmentation of the point cloud into connected regions that are referred to as 3D patches. Following this, each 3D patch undergoes independent projection into a 2D patch. This method adeptly tackles projection challenges, encompassing issues like self-occlusions and hidden surfaces, presenting a pragmatic resolution to the point cloud conversion problem. V-PCC is also suitable for transmitting point cloud videos over networks with limited bandwidth. On the other hand, G-PCC directly encodes content in 3D space, making it suitable for sparse point clouds, such as those from Light Detection and Ranging (LiDAR) sensors and applications like heritage preservation [135]. A growing number of research articles have delved into point cloud compression, proposing new methods or enhancements to both V-PCC and G-PCC [138,139]. Others have analyzed and discussed various compression approaches and tools within V-PCC and G-PCC [140].
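The idea of converting 3D content into 2D images that a conventional codec can compress may be illustrated with a single orthographic projection. This is a deliberately simplified sketch and not the V-PCC patch generation process: the function name is hypothetical, points are assumed to have integer coordinates, and one projection direction (along the z axis) is used for the whole cloud.

```python
def project_to_depth_map(points, width, height):
    """Orthographically project integer (x, y, z) points along the z axis.

    Returns (depth, occupancy). When several points land on the same
    pixel, only the nearest one (smallest z) is kept.
    """
    depth = [[0] * width for _ in range(height)]
    occupancy = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if not (0 <= x < width and 0 <= y < height):
            continue                      # point falls outside the image
        if occupancy[y][x] == 0 or z < depth[y][x]:
            depth[y][x] = z
            occupancy[y][x] = 1
    return depth, occupancy


points = [(0, 0, 5), (1, 0, 7), (1, 0, 3), (2, 1, 9)]
depth, occupancy = project_to_depth_map(points, width=3, height=2)
```

The point (1, 0, 7) is hidden behind (1, 0, 3) and is lost in this single projection. This is the kind of occlusion that the segmentation into multiple independently projected patches, described above, is intended to handle.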

Point clouds are gaining increasing prominence as the preferred data capture format across a diverse range of applications, including AR, virtual reality, autonomous vehicles, and telecommunications. This growing adoption reflects the versatility and richness of information that point clouds provide, making them essential in various fields where detailed and accurate spatial data are crucial. This trend is expected to continue as technological advancements further enhance the capabilities and accessibility of point cloud data for a wide array of applications.

4.4.2. MPEG Immersive Video

The MIV standard, developed by MPEG, is designed to cater to the needs of virtual and extended reality applications that demand 6DoF visual interaction with the rendered scene.

Due to the significant technical similarities between V-PCC and MIV, the MIV specification adopts a common bitstream format, V3C. MIV normatively references V3C and introduces enhancements to accommodate its unique requirements. While both MIV and V-PCC share commonalities, they exhibit distinctions in their source and output configurations. V-PCC takes a temporal sequence of point clouds as input, where each point cloud comprises a set of points with scene coordinates and optional attributes per point. In contrast, the source content for MIV is encoded in a multi-view video plus depth (MVD) format [141]. MVD utilizes videos that are captured from multiple perspectives and pieces of depth information to present visual content in a more comprehensive and three-dimensional manner. This approach typically involves using multiple cameras or viewpoints to capture the video, while providing depth information for each frame.

Before the proposal of the MIV standard, several video coding standards aimed to compress MVD formats, including 3D-HEVC [142,143,144] and MV-HEVC [145]. MV-HEVC, as an extension of HEVC, utilizes the established coding techniques of HEVC and introduces multi-view-specific coding methods. However, MV-HEVC faces challenges in handling unstructured multiple videos and ensuring compatibility [146]. These limitations make it less suitable for modern immersive video applications.

The MIV standard utilizes conventional 2D video codecs such as HEVC or VVC to efficiently encode volumetric scenes. The process involves initially processing the video feeds capturing the scene to identify a set of basic views. These views are then enriched with additional information from other perspectives. The data are intelligently organized into atlases and substantially compressed using a chosen existing 2D video codec [72].

Simulcast coding is a direct method for compressing a multi-view representation [147], where each view is compressed independently. However, simulcasting does not account for inter-view redundancies, leading to potential high bitrate penalties. In contrast, MIV addresses inter-view redundancies during coding, providing improved compression by leveraging geometry information through depth maps. Recent studies [148,149] have focused on enhancing the quality of depth maps in MIV. Lee et al. [150] proposed a group-based adaptive rendering method for volumetric video streaming. When compared to the Test Model for Immersive Video (TMIV), their proposed method achieves an average savings of 37.26% in the BD rate in terms of the PSNR. Jeong et al. [146] introduced a bitstream-merging and geometry-packing algorithm for volumetric video streaming, building on the foundations of MIV and VVC. Their algorithm involves dividing and packing geometry atlases, leading to improvements in Immersive Video-PSNR (IV-PSNR) [92], BD rate gain, and reduced decoding runtime compared to the master profile of MIV.

4.5. Super Resolution

Video super resolution (VSR) technology aims to recover high-resolution videos from multiple low-resolution frames. It can be viewed as an extension of image SR and, in its simplest form, applies image SR algorithms frame by frame. Image SR, a well-studied task in computer vision and image processing, focuses on improving the visual perception of low-quality images by transforming blurry, low-resolution images with coarse details into clear, high-resolution images with enhanced details [151]. However, frame-by-frame SR performance is not always satisfactory, as it may introduce artifacts and glitches, leading to undesired temporal incoherence across frames [152].

VSR is specifically crafted to restore high-resolution video frames from corresponding low-resolution frames and adjacent low-resolution frames. This technology utilizes complementary information from one or more low-resolution frames to generate missing high-frequency details during the image reconstruction process. Given the interconnected nature of each frame in VSR, spatial information extends beyond individual frames to neighboring frames. In addition to spatial information at the pixel level, each frame is closely linked in the temporal sequence [153]. Most VSR techniques adopt either time-based or space-based approaches to leverage temporal data. Time-based methods [154] treat frames as time-series data, transmitting each frame sequentially through the network. However, this approach can only harness a restricted amount of temporal data from preceding frames and may operate at a slower speed due to the inability to parallelize the processing of multiple frames. In contrast, space-based methods [155] enhance the resolution of low-resolution frames by jointly considering multiple adjacent frames as supplementary data. This method preserves more inter-frame temporal correlations and benefits from parallel computing advantages [156]. Moreover, multiple frames offer a wealth of scene information, covering not only intra-frame spatial dependencies but also inter-frame temporal dependencies such as brightness, color changes, and motions. Consequently, existing works predominantly concentrate on improving the utilization of spatio-temporal dependencies, including explicit motion compensation methods, such as learning-based and optical flow-based approaches, as well as recurrent methods [157].

With the remarkable success of deep learning in various areas [158], VSR algorithms based on deep learning have undergone extensive study. Many VSR methods based on deep neural networks, such as recurrent neural networks (RNNs), generative adversarial networks (GANs), and CNNs have been proposed. Typically, these methods utilize a substantial number of both low-resolution and high-resolution video sequences as input to the neural network. The network performs inter-frame alignment, feature extraction, and fusion, and subsequently generates high-resolution sequences corresponding to the given low-resolution video sequences. The standard pipeline of most VSR methods comprises an alignment module, a feature extraction and fusion module, and a reconstruction module [152].

In methods involving alignment for VSR, many utilize motion estimation and motion compensation techniques. Motion estimation is designed to extract inter-frame motion information, and motion compensation is employed to execute the warping operation between frames based on this inter-frame motion information. This process aligns one frame with another. After alignment, relevant features are extracted from the aligned frames, including textures, edges, or other high-frequency components that are essential for reconstructing high-resolution details. Fusion involves combining information from multiple frames to enhance the overall feature representation [159]. Reconstruction’s goal is to generate a high-resolution output from the aligned and feature-enhanced frames, leveraging the extracted features to reconstruct missing details and enhance the overall resolution of the video.
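The alignment step can be illustrated with classical block matching, one of the simplest forms of motion estimation and compensation. The sketch below is an assumption-laden toy: it uses an exhaustive search with the sum of absolute differences (SAD) as the matching cost, whereas the VSR methods surveyed here typically rely on optical flow or learned alignment. All function names are hypothetical.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))


def estimate_motion(reference, current, x, y, size, search=2):
    """Find the offset into `reference` that best matches a block of `current`."""
    height, width = len(reference), len(reference[0])
    best, best_cost = (0, 0), sad(current, reference, x, y, x, y, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, reference, x, y, rx, ry, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


def compensate(reference, x, y, size, vector):
    """Warp: fetch the reference block displaced by the motion vector."""
    dx, dy = vector
    return [row[x + dx:x + dx + size] for row in reference[y + dy:y + dy + size]]


def make_frame(size, left, top):
    """Black frame with a bright 2x2 patch whose top-left corner is (left, top)."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 255
    return frame


reference = make_frame(6, left=1, top=1)
current = make_frame(6, left=2, top=2)   # the patch moved by (+1, +1)
vector, cost = estimate_motion(reference, current, x=2, y=2, size=2)
predicted = compensate(reference, x=2, y=2, size=2, vector=vector)
```

Here the search recovers the offset (-1, -1) with zero residual cost, and the compensated block matches the block in the current frame. After such alignment, features from neighboring frames refer to the same scene content and can be fused.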

Numerous researchers have proposed distinct VSR techniques. Wen et al. [160] introduced an end-to-end deep convolutional network that dynamically generates spatial adaptive filters for aligning frames and utilizes the aligned frames to recover high-resolution frames. Wang et al. [161] presented a novel low-complexity VSR method that is designed for live video applications. By efficiently leveraging spatial information from the image SR model and the inherent temporal information in the video, their method minimizes computational redundancy through the recycling of intermediate features, achieving real-time inference speed. Li et al. [162] proposed a framework for video coding and uploading that employs VSR technology to enhance the quality of real-time video streaming in scenarios with a constrained upload bandwidth.

In addition to traditional flat videos, VSR technology can be applied to 360° videos. Baniya et al. [163] proposed a 360° VSR deep learning model that is not constrained by traditional VSR techniques such as alignment. They addressed spherical distortion issues through special feature extractors and a novel loss function. Deng et al. [164] introduced a latitude-aware upscaling network that considers the characteristics of 360° videos. Different latitude bands in 360° videos can learn to adopt different upscaling factors, significantly improving VSR efficiency and saving computational resources.

Utilizing VSR technology can lower the bandwidth demands for video transmission. This involves converting high-resolution videos into low-resolution videos. The resultant low-resolution video, along with the VSR model, is sent to the client. Upon reception, the client reconstructs them into high-resolution videos, thus reducing the required bandwidth for video transmission [165]. In the context of vehicular network environments, it becomes feasible to send the VSR model and low-resolution videos to the vehicle, enabling their restoration into high-resolution videos. This method helps minimize the bandwidth that is needed when vehicle users request high-resolution videos.
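Since the VSR model itself must be transmitted, the bandwidth saving only materializes once enough video has been played to amortize the model download. A minimal sketch of this break-even point follows; the function name and all numeric values are hypothetical and are not taken from the cited work.

```python
def break_even_seconds(high_res_mbps, low_res_mbps, model_megabits):
    """Playback time after which low-res video plus model costs less data.

    Solves low_res * t + model = high_res * t for t.
    """
    saving_per_second = high_res_mbps - low_res_mbps
    if saving_per_second <= 0:
        raise ValueError("low-resolution stream must use less bandwidth")
    return model_megabits / saving_per_second


# Hypothetical values: 20 Mbps high-res stream, 5 Mbps low-res stream,
# and a 60 Mbit super-resolution model.
seconds = break_even_seconds(20.0, 5.0, 60.0)
```

With these illustrative numbers, the approach uses less data than direct high-resolution streaming after 4 seconds of playback. A larger model or a smaller bitrate gap pushes the break-even point later.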

4.6. Adaptive Bitrate

HTTP adaptive streaming (HAS) is a primary method for delivering video data. In HAS, videos undergo encoding at various quality levels on the server side, each associated with a specific bitrate. These videos are then temporally segmented into consistent-duration segments. On the client side, the ABR algorithm is responsible for selecting and requesting the appropriate bitrate for each segment. This adaptive process aims to address network throughput fluctuations while striving to provide the optimal QoE [166].

The video is segmented into several consecutive segments, and each segment is encoded at various bitrates. ABR algorithms typically determine the quality level for upcoming segments based on user context conditions, such as buffer occupancy and network conditions [167]. Viewers can then request the suitable bitrate version for each segment, taking into account factors like screen resolution and both current and predicted channel conditions [168].
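As a minimal sketch of the client-side decision described above, the following rule combines a throughput estimate with buffer occupancy. It is a simple heuristic written for illustration, not the algorithm of any cited work; the function name, the 5 s low-buffer threshold, and the 0.8 safety factor are assumptions.

```python
def select_bitrate(ladder_kbps, throughput_kbps, buffer_s,
                   low_buffer_s=5.0, safety=0.8):
    """Pick a bitrate for the next segment.

    ladder_kbps: available encodings of the segment, in any order.
    throughput_kbps: estimated network throughput.
    buffer_s: seconds of video currently buffered at the client.
    """
    ladder = sorted(ladder_kbps)
    # With a nearly empty buffer, fall back to the lowest rung to avoid a stall.
    if buffer_s < low_buffer_s:
        return ladder[0]
    # Otherwise take the highest rung that fits a discounted throughput estimate.
    budget = safety * throughput_kbps
    feasible = [b for b in ladder if b <= budget]
    return feasible[-1] if feasible else ladder[0]


# Hypothetical ladder and measurements.
choice = select_bitrate([500, 1000, 2500, 5000], throughput_kbps=4000, buffer_s=12)
```

The safety factor deliberately under-uses the estimated throughput, which trades some video quality for a lower risk of rebuffering when the channel fluctuates.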

Numerous researchers [62,169,170,171,172,173,174] have put forth diverse ABR algorithms, with many of these studies incorporating DRL. Ma et al. [62] proposed a novel QoE-aware adaptive video bitrate aggregation scheme designed for HTTP live streaming and based on smart edge computing. Their scheme oversees the traffic that is requested by clients for the same live streaming service, adjusting their bitrates according to network conditions, client states, and video characteristics. Wang et al. [172] presented a framework based on DRL to decide the streamer’s encoding bitrate, uploading power, as well as edge transcoding frequency and bitrates. Li et al. [174] introduced an ABR scheme based on reinforcement learning that integrates the processing of transport and video application layers to directly optimize the QoE. This optimization is formulated as a weighted function considering video quality, delay, and stalling rate.
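A weighted QoE objective of the kind mentioned above can be sketched as follows. The exact formulation used in [174] is not reproduced here; the linear form, the weights, and the definition of the stalling rate as the fraction of stalled segments are assumptions made purely for illustration.

```python
def qoe_score(qualities, delays_s, stall_s,
              w_quality=1.0, w_delay=0.5, w_stall=4.0):
    """Weighted QoE over a session: reward quality, penalize delay and stalling.

    qualities: per-segment quality values (for example, bitrate in Mbps).
    delays_s: per-segment delivery delay in seconds.
    stall_s: per-segment stall (rebuffering) time in seconds.
    """
    n = len(qualities)
    if n == 0 or len(delays_s) != n or len(stall_s) != n:
        raise ValueError("inputs must be non-empty and equally long")
    avg_quality = sum(qualities) / n
    avg_delay = sum(delays_s) / n
    # Stalling rate: fraction of segments that experienced any stall.
    stall_rate = sum(1 for s in stall_s if s > 0) / n
    return w_quality * avg_quality - w_delay * avg_delay - w_stall * stall_rate


# Hypothetical four-segment session with one stall.
score = qoe_score([2, 4, 4, 2], [1, 1, 1, 1], [0, 0, 2, 0])
```

In a reinforcement learning setting, a score of this form would typically serve as the reward signal, so the choice of weights directly shapes how aggressively the agent trades quality against stalls.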

Compared to traditional video, 360° video requires a larger amount of data to maintain user experience, leading to higher bandwidth requirements. Transmitting the entire video in high quality is evidently not wise, as users are typically limited to viewing only a small portion of the 360° video. Viewport-adaptive streaming is a direct method for decreasing transmission bandwidth, dynamically adjusting the scene quality in real time based on the user’s FoV [175]. After projection, 360° videos are spatially divided into equally or unequally sized rectangles called tiles. These tiles can undergo independent encoding, compression, and transmission at varying bitrates [176]. Considering this characteristic, tile-based adaptive streaming has gained widespread acceptance among viewport-based approaches. It facilitates the transmission of quality-variable tiles, allowing for adjustments without compromising the visual quality in response to user interaction.

In the context of traditional 2D video, ABR algorithms mainly adjust the bitrates of downloaded segments based on the available client-server bandwidth. This ensures continuous video playback without rebuffering, commonly referred to as freezes. ABR algorithms for 360° videos are notably more intricate, because they must simultaneously perform two types of adaptations. The algorithm first needs to execute view adaptation by predicting the user’s head position in advance and determining which tiles the user might view in the future. Secondly, the algorithm needs to adapt the bitrate by determining suitable rates for downloading segments of the predicted tiles. It is crucial to optimize both adaptation types jointly, ensuring that tiles that are more likely to be in the user’s viewport are downloaded at higher bitrates [177].
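The second adaptation step, rate selection for the predicted tiles, can be sketched with a simple greedy allocation. This is an illustrative heuristic rather than the method of [177]; the function name, the tile identifiers, and the bitrate ladder are hypothetical, and the viewing probabilities are assumed to come from a separate viewport predictor.

```python
def allocate_tile_bitrates(view_prob, ladder_kbps, budget_kbps):
    """Greedy tile-rate allocation under a total bitrate budget.

    view_prob: dict tile_id -> predicted probability the tile is in the viewport.
    ladder_kbps: per-tile bitrate options.
    budget_kbps: total bitrate available for the segment.
    Returns a dict tile_id -> chosen bitrate.
    """
    ladder = sorted(ladder_kbps)
    # Every tile gets the lowest rung so a mispredicted viewport is never blank.
    level = {t: 0 for t in view_prob}
    spent = ladder[0] * len(view_prob)
    if spent > budget_kbps:
        raise ValueError("budget cannot cover the lowest rung for all tiles")
    # Upgrade the most likely tiles first, as far as the budget allows.
    for t in sorted(view_prob, key=view_prob.get, reverse=True):
        while level[t] + 1 < len(ladder):
            cost = ladder[level[t] + 1] - ladder[level[t]]
            if spent + cost > budget_kbps:
                break
            level[t] += 1
            spent += cost
    return {t: ladder[level[t]] for t in view_prob}


# Hypothetical four-tile segment.
plan = allocate_tile_bitrates(
    {"a": 0.9, "b": 0.6, "c": 0.1, "d": 0.05}, [200, 800, 2000], budget_kbps=4000
)
```

Keeping a low-quality version of every tile is a common safeguard against prediction errors; a joint optimization would additionally weigh the expected quality gain of each upgrade rather than relying on probability order alone.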

Viewport adaptation relies on the accuracy of FoV prediction. Existing FoV prediction methods are typically designed using linear regression, motion estimation, machine learning methods, user clustering, or hybrid prediction mechanisms [64]. Dong et al. [178] combined the historical motion trajectory of the target user with the future trajectory of other comparable users, proposing a Gaze-based FoV prediction model to enhance the accuracy of long-term FoV prediction. Pang [176] enhanced the precision of predicting the user’s future viewport by incorporating object movement with the trajectory of the user’s head movement and the video saliency map. Nguyen et al. [179] incorporated a Gated Recurrent Unit block in front of the LSTM to expedite input data processing, thereby enhancing accuracy in the initial seconds compared to an algorithm solely utilizing LSTM. In order to eliminate projection errors, Li et al. [180] employed a spherical CNN to optimize FoV prediction, replacing the traditional 2D CNN. Some studies have also focused on the real-time demands in live streaming scenarios. In [181], clustered federated learning and reinforced variational inference were utilized to optimize FoV prediction and content delivery in live streaming. Peng et al. [182] employed CNN for extracting spatial characteristics from video frames and LSTM to understand the temporal characteristics of user viewport trajectories. Their proposed viewport prediction model shows a potential reduction in transmission bandwidth by approximately 50%. Sun et al. [183] explored streaming flock, allowing users to experience diverse playback latencies based on their network conditions within a short range. The viewing directions of users with a shorter playback latency are leveraged as valuable inputs to anticipate the viewing directions of users with a longer playback latency.
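The simplest of the prediction families listed above, linear regression, can be sketched as a least-squares extrapolation of recent head yaw samples. This is a minimal illustration and not the model of any cited study; the function name and the sample trace are hypothetical, and only the yaw axis is considered.

```python
def predict_yaw(timestamps_s, yaw_deg, horizon_s):
    """Least-squares linear extrapolation of head yaw.

    timestamps_s: sample times in seconds (at least two distinct values).
    yaw_deg: yaw angles in degrees, in [0, 360).
    horizon_s: how far beyond the last sample to predict.
    """
    if len(timestamps_s) != len(yaw_deg) or len(timestamps_s) < 2:
        raise ValueError("need at least two paired samples")
    # Unwrap so that a move from 350 to 10 degrees reads as +20, not -340.
    unwrapped = [yaw_deg[0]]
    for y in yaw_deg[1:]:
        step = (y - unwrapped[-1] + 180.0) % 360.0 - 180.0
        unwrapped.append(unwrapped[-1] + step)
    n = len(timestamps_s)
    mean_t = sum(timestamps_s) / n
    mean_y = sum(unwrapped) / n
    var_t = sum((t - mean_t) ** 2 for t in timestamps_s)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sum((t - mean_t) * (y - mean_y)
                for t, y in zip(timestamps_s, unwrapped)) / var_t
    target_t = timestamps_s[-1] + horizon_s
    # Wrap the prediction back into [0, 360).
    return (mean_y + slope * (target_t - mean_t)) % 360.0


# Hypothetical trace: the user turns steadily across the 0-degree seam.
yaw = predict_yaw([0.0, 0.1, 0.2, 0.3], [340, 350, 0, 10], horizon_s=0.2)
```

Linear extrapolation of this kind is generally adequate only for short horizons, which is one reason the studies above turn to learned models and cross-user information for long-term prediction.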

As mentioned earlier, transmitting the tiles within the user’s FoV at high bitrates can effectively reduce the video’s bandwidth requirements. Common segmentation methods involve dividing the spatial aspects of the video into equally sized rectangles. However, this is not the most efficient approach, considering the bandwidth waste in areas that are delivered in high quality but remain unwatched. On the other hand, subdividing all tiles into smaller granularity would increase the encoding pressure. Consequently, recent research explores dynamic adjustments to tile size [184,185,186] or methods that involve unevenly sized tile partition [187,188,189].

In the method proposed by [188], the tile sizes adapt to the content of the video. A higher tiling granularity is allocated to areas that attract more visual attention, while larger tiles are used in less critical regions to maintain constant tile overhead. When dealing with tiles at the edges of the FoV, the transmission of the entire tile at the highest quality, even if the FoV only occupies a small part of it, results in bandwidth waste. Kan et al. [187] introduced a viewport-aware adaptive tiling scheme that employs fine-grained tiles to cover the borders of the video’s viewport, marginal, and invisible regions. Their scheme helps reduce transmission redundancy while displaying other areas using coarse-grained or medium-grained tiles to enhance compression efficiency. In [189], the authors considered the impact of projection distortion on the accuracy of dynamic tiling systems. They analyzed spherical viewport trajectories to determine the optimal tile strategy.

Zhang et al. [184] determined the tiling configuration based on the recent performance of viewport prediction. They explored the effects of various tiling options on tile selection and decoding complexity, demonstrating that adaptive tiling can accommodate errors in viewport prediction and contribute to maintaining the quality of tile selection. In [185], the efficiency of encoding and distortion in VR content was assessed concerning various tiling patterns. An edge server that was co-located at a cellular base station divided its generated VR content into several tiles based on the selected tiling pattern. A weighted-to-spherically-uniform quality model was employed to evaluate the impact of different tiling patterns on the QoE. The simulation results demonstrated that the proposed solution effectively reduces the QoE-aware cost for VR content transmission compared to other baseline algorithms.

Yaqoob et al. [175] considered two viewport prediction mechanisms when selecting tiles, adjusting the streaming region based on changes in content complexity and positional information. This adjustment ensures the availability of the viewport, providing a seamless perspective for 360° video users. Wei et al. [64] utilized distinct rate distortion models for different regions of the video to enhance the user’s QoE. They achieved this by de-emphasizing imperceptible video regions, ensuring efficient 360° interactive transmission across various network conditions. Gao et al. [186] concentrated on selecting optimal encoding parameters, such as tile sizes and quantization steps, to strike a balance between transmission and encoding efficiency. Their objective was to minimize the video distortion displayed in the user’s viewport within a given transmission capacity. For live streaming, Chen et al. [190] segregated the transmission tasks for upstream and downstream, optimizing them asynchronously. Wang et al. [191] studied a framework that significantly reduces the uplink bandwidth consumption without compromising the QoE.

Table 4 summarizes adaptive streaming methods that are used in flat videos and immersive videos in recent studies. It can be observed that there is less research on 360° live streaming compared to 360° VoD. Many studies utilize DRL as the algorithm for adaptive bitrate allocation, predicting the FoV based on head movement trajectories and employing the most common tile strategy.

Presently, extensive research is dedicated to the application of ABR technology in video transmission for vehicles. Yun et al. [193] developed a real-time solution to the dynamic adaptive video streaming problem in mmWave vehicular networks. The solution determines which content, at what quality, and how many data chunks will be transmitted from the macro base station to the micro base station. This scenario is modeled as a controlled Markov decision process. DRL is employed to address this challenge, dealing with numerous network states and a continuous state space. Han et al. [194] dynamically calculated the optimal video bitrate based on the current available bandwidth and buffer occupancy, with the ability to dynamically adjust multiple buffers in real time. By considering the bandwidth conditions, buffer size, and client’s area of interest, they aim to enhance the QoE and bandwidth utilization. In [195], the RSU collects the information on resource blocks and available computational resources, and then makes a centralized decision on bitrate selection, fog computational resource allocation, private vehicle scheduling, and resource block allocation. Dai et al. [196] proposed an MEC-based architecture for ABR-based multimedia streaming in connected vehicles, where each multimedia file is split into multiple chunks, encoded using different bitrate levels. The bandwidth allocation and quality level of the block are determined based on a benefit function that combines the quality level, available play time, and freeze delay.

5. Optimizing Video Transmission Involves Network Technologies and Next-Generation Wireless Communication Technologies

From the standpoint of a network operator, ensuring a satisfactory live streaming experience over a robust network is of the utmost importance. This entails minimizing the latency between video recording and playback. In today’s digital landscape, internet traffic is primarily dominated by the transmission of video streaming applications via the wireless/cellular interfaces of mobile devices. Despite the ability to transmit UHD videos via mobile devices, the perceived video quality often falls below user expectations because of distance-based high latency between content sources and end users [197]. Therefore, the imperative is to reduce transmission latency.

Concurrently, 4G wireless networks have witnessed widespread adoption as a dependable communication technology over the past decade. Numerous industrial applications have benefited from the support of 4G and Wi-Fi [198]. However, given the escalating demands of users in terms of reliability, latency, and throughput performance, 5G and, looking ahead, 6G technologies are anticipated to replace 4G to meet these heightened requirements. Accordingly, this section presents strategies for minimizing transmission latency, technologies for reducing the required bandwidth, and insights into next-generation wireless communication technologies. The most recent research on their application to vehicular network communication is also explored, with the aim of elevating the viewing experience for passengers engaging with multimedia applications.

5.1. Caching

With the rapid advancement of communication technology, the growing demand for mobile devices, extensive network traffic, and the rising expectations for real-time and low-latency services present substantial challenges for communication and computing. MEC has emerged as a pivotal technology to tackle these issues, originating from cloud computing and progressing towards deploying computing resources at nodes near users. This approach significantly reduces communication and computing costs [199]. Edge devices now possess ample processing power to perform specific computing tasks locally, eliminating the necessity for transmitting all data to remote servers for processing. This results in bandwidth savings, reduced latency, and overall system performance improvements. In the MEC environment, edge servers are positioned at base stations or connected to access points near end users [200]. Application providers can lease resources such as memory, CPU, storage space, and bandwidth on edge servers to host their applications, enabling users to access services with low latency [201].

In cache-aided wireless cellular networks, frequently accessed data can be stored by edge facilities to relieve the strain on capacity-limited backhaul links and the demand for wireless transmissions. This leads to the realization of highly effective networks with reduced delays, improved spectrum efficiency, and enhanced energy efficiency [202]. Additionally, caching offers advantages such as convenient deployment, low cost, reduced backhaul link burden, and decreased data traffic, making it a focal point of network deployment [203]. Two typical caching strategies in wireless networks are Largest Content Diversity (LCD) and Most Popular Content (MPC). MPC involves storing the files with the highest demand in network nodes to maximize collaborative gains between nodes, while LCD ensures that each file is stored in at most one node, achieving maximum caching diversity [204]. Machine learning can be employed to predict the content to be cached and the computing resources required in the network. MEC units can be integrated into network edge devices, enabling them to perform storage and computing functions [205].
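The difference between the two placement strategies described above can be made concrete with a small sketch. The function names, the round-robin assignment used for LCD, and the popularity counts are assumptions made for illustration; [204] may define the placements differently in detail.

```python
def mpc_placement(popularity, num_nodes, capacity):
    """Most popular content: every node caches the same top-`capacity` files."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return [ranked[:capacity] for _ in range(num_nodes)]


def lcd_placement(popularity, num_nodes, capacity):
    """Largest content diversity: each file is cached on at most one node."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    top = ranked[:num_nodes * capacity]
    # Deal files round-robin so popular files are spread across the nodes.
    return [top[i::num_nodes] for i in range(num_nodes)]


def cluster_hit_ratio(placement, popularity):
    """Share of requests that at least one node in the cluster can serve."""
    cached = {f for node in placement for f in node}
    return sum(popularity[f] for f in cached) / sum(popularity.values())


# Hypothetical request counts for six videos, two nodes, two slots each.
demand = {"v1": 40, "v2": 25, "v3": 15, "v4": 10, "v5": 6, "v6": 4}
mpc = mpc_placement(demand, num_nodes=2, capacity=2)
lcd = lcd_placement(demand, num_nodes=2, capacity=2)
```

With these assumed numbers, the diverse placement covers 90% of requests somewhere in the cluster against 65% for the replicated placement, but a diverse placement only helps a given user if the file can be fetched from a neighboring node, whereas replication lets any node serve the most popular files directly.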

Currently, researchers have applied caching technology to video transmission and vehicular networks. Wu et al. [206] considered vehicle mobility and proposed a cooperative caching scheme for vehicular edge computing, based on DRL and asynchronous federated learning, that predicts popular content and then identifies the optimal cooperative caching location for it. Liu et al. [207] presented an algorithm that jointly addresses caching, computation, and power allocation. Their algorithm takes into account various video sizes and playback rate requirements, with the goal of minimizing adaptive video streaming transmission delay and energy consumption. Xi et al. [2] used drone-assisted cellular networks and jointly optimized adaptive bitrate video caching and user association to enhance users’ video viewing experience and reduce transmission latency. Zhou et al. [199] explored multi-user caching in vehicular MEC networks, where edge servers with caching and computing capabilities support task computing from vehicle users. Ma and Sun [208] proposed an architecture incorporating end-edge collaboration, edge caching technology, and network resource deployment optimization for video transmission in vehicles. Their architecture uses 5G base stations and drone-mounted mobile base stations to transmit SVC videos, and analyzes user mobility to obtain improved edge caching strategies. Wang and Grace [205] suggested two proactive caching algorithms based on multi-armed bandit learning, using reinforcement learning to predict the next RSU that a vehicle will pass through to achieve proactive caching in vehicular networks. Zhang et al. [209] studied the vehicular edge caching challenge in real vehicle scenarios, designing an adaptive caching algorithm to more effectively utilize caching resources, increase caching prediction success rates, and reduce interruptions in caching services. Fu [210] proposed caching update and pricing algorithms, considering the transcoding of various video versions. Base stations take into account energy consumption when caching video segments. Pricing algorithms based on different usage scenarios of caching and base station resources are adopted to enhance the flexibility of network resource utilization.

Within the vehicular network, the caching approach introduced by [206] can be implemented. This entails forecasting popular video content by leveraging the global model to predict the content preferences of individual vehicle users. Following this, the local RSU or base station can employ the dueling DQN algorithm to establish optimal cooperative caching. This method involves caching the video content that is most engaging to users at the local RSU or base station, with the goal of reducing content transmission delays.

5.2. Multicasting

Multicasting is a communication technique that is employed in networks to concurrently transmit data to multiple destination addresses. Unlike unicast and broadcast, multicast delivers data exclusively to a specific group of receivers rather than broadcasting to all receivers. The transmitter sends the data stream only once, avoiding duplication and saving significant bandwidth resources, thus providing higher real-time performance. This is particularly advantageous in group communication scenarios where the same information needs to be shared, such as in video conferences, multicast Internet Protocol television, and online streaming. In the face of rapidly increasing video traffic in vehicular networks, when multiple users request the same tile, multicast can effectively alleviate bandwidth pressure and offer higher scalability.

Nguyen et al. introduced a novel framework for delivering 360° live video to multiple users through 4G mobile networks [211]. The framework utilizes SVC to encode tiles. In cases where multiple users request the same layer, such as the base layer, the framework employs multicast transmission, resulting in substantial bandwidth savings. For individual user requests pertaining to enhancement layers, the delivery is conducted via unicast, thereby optimizing bandwidth usage in the process. However, multicast capacity is constrained by users experiencing the weakest channel gain. In scenarios where the base station antennas are fixed, the system may face challenges meeting the growing demand for multicast communication as the user count increases, leading to performance degradation. In [212], Device-to-Device (D2D) communication was employed to enhance multicast efficiency. A proposed 4G-based sidelink-assisted multicast system selects users with poorer channel conditions to receive services through a sidelink multicast. Two different decoding scenarios, independent decoding and joint decoding, were adopted in the algorithm design to optimize the entire system and enhance the user experience. Chen et al. [213] implemented a differential QoE provision for numerous video users with time-varying and heterogeneous channel conditions in an SVC-based multicast system. By dynamically adjusting video bitrates and wireless resources, the system optimizes the long-term QoE for all users. Ouyang et al. [214] explored D2D multicast for scalable videos, incorporating potential users into the multicast group and considering inter-user channel quality, social parameters, and user preferences for video quality. They devised a cluster head selection algorithm to enhance the service quality. Chowdhury et al. [50] investigated minimizing the gateway placement for video streaming in vehicular networks, aiming to reduce service costs.
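Two points from the paragraph above lend themselves to a short numerical sketch: the bandwidth saved by multicasting the SVC base layer, and the constraint that a multicast group is served at the rate of its weakest member. The function names and the traffic figures are hypothetical and serve only as an illustration, not as a model of any cited system.

```python
def multicast_rate(user_rates_mbps):
    """A multicast group can only be served at the rate of its weakest member."""
    return min(user_rates_mbps)


def downlink_load(num_users, base_mbps, enh_mbps, enh_subscribers, use_multicast):
    """Total downlink load for one SVC-encoded tile.

    With multicast, the base layer is sent once for the whole group;
    enhancement layers are always unicast to the users that request them.
    """
    base_copies = 1 if use_multicast else num_users
    return base_copies * base_mbps + enh_subscribers * enh_mbps


# Hypothetical cell: 20 viewers, 5 of whom also request the enhancement layer.
with_multicast = downlink_load(20, base_mbps=2, enh_mbps=3,
                               enh_subscribers=5, use_multicast=True)
unicast_only = downlink_load(20, base_mbps=2, enh_mbps=3,
                             enh_subscribers=5, use_multicast=False)
group_rate = multicast_rate([12.0, 7.5, 3.2])
```

Under these assumptions the load drops from 55 Mbps to 17 Mbps, while the group rate is pinned to the 3.2 Mbps user. The latter effect motivates the sidelink-assisted and subgrouping approaches discussed in this subsection.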

Xiao et al. [215] explored Non-Orthogonal Multiple Access (NOMA)-based edge multicast in the delivery phase, approaching the video delivery as a combinatorial problem involving the base station and multicast group. They introduced a two-layer matching mechanism based on NOMA to address collaboration issues between macro base stations and small base stations, as well as a NOMA-enabled real-time SVC multicasting framework using UAV relays. In the context of NOMA-based multi-layer multicast systems, Dani et al. developed a collaborative power allocation and subgrouping approach to optimize the overall multicast rate [216].

Li et al. [217] explored multicast optimization problems in high-dynamic Low Earth Orbit (LEO) satellite ground communication scenarios. They proposed a joint power and bandwidth resource allocation strategy utilizing a two-stage iterative method. Their simulation results demonstrated that the capacity was at least 10% higher than that of the baseline algorithm. Zhong et al. [218] introduced a decentralized approach for adaptive multicast video streaming with caching assistance. This enables each video client to optimize their bitrate without the need for coordination with other clients. Pan et al. [219] designed a low-latency video streaming system based on a Bit Index Explicit Replication multicast. Upon comparison, it substantially reduced the usage of external network bandwidth.

Multicast provides the same multimedia content to multiple users through the same frequency band. Significant improvements in channel capacity and energy efficiency will enhance the potential of multicast in V2X scenarios, where vehicles can receive multicast information from other vehicles, the internet, and roadside infrastructure [213]. As mentioned in [50], selecting a small number of vehicles as mobile network gateways allows them to obtain video content from the network via V2I communication and distribute the content among other peer vehicles through V2V communication. The studies reviewed above indicate that scalable video is well suited to multicast delivery, and this holds true for vehicular networks as well. Therefore, similar ideas can be applied in our proposed architecture for vehicular network video transmission. The base layer can be multicast to subgroups formed from users with similar channel conditions, and users within each multicast group can subscribe to appropriate enhancement layers based on their individual channel conditions [213]. For vehicles with poorer network conditions, V2V communication via multicast can be employed to enhance their reception quality [212].

5.3. Artificial Intelligence in Vehicular Networks

With the advent of vehicle intelligence, automobiles are evolving from mere transportation tools to intelligent terminals. Moreover, the diversity and quantity of onboard equipment are on the rise, and people’s expectations for the quality of automobile services continue to grow. In the era of vehicular networks, intelligent modules within vehicles can offer traffic management, intelligent vehicle control, navigation capabilities, accident prevention [220], as well as mobile internet applications, rich multimedia services, and various emerging interactive applications. These advancements aim to reduce operating costs, foster a secure driving environment, and enhance the overall user experience. Artificial intelligence (AI) plays a pivotal role in significantly boosting the cognitive performance and intelligence of vehicular networks. This contribution is vital for optimal resource allocation in problems that are characterized by time-varying complexities and differences [221].

AI is recognized as a pivotal technology to handle the substantial data volume in vehicular networks [222,223]. Despite being in its early stages, the application of AI in vehicular network systems exhibits promising potential. AI holds a crucial role in real-time applications, including location-based services, vehicle platooning, congestion control, and traffic flow management. This becomes particularly significant as the count of vehicles that are equipped with computing capabilities and sensors continues to increase, and as the significance of AI in enhancing privacy protection becomes increasingly apparent [224]. Alladi et al. [225] presented an MEC-enhanced intrusion detection framework, leveraging AI, to detect and categorize diverse cybersecurity attacks within the vehicular network. Applying AI to collective sensor data is a significant contributor to enriching applications and driving the development of vehicular networks [47].

These AI technologies also play a crucial role in Advanced Driver Assistance Systems (ADASs), providing drivers with real-time safety alerts and assistance features such as blind-spot monitoring and adaptive cruise control. Kuutti et al. [226] analyzed the various benefits of AI’s deep learning for vehicle control. The ability to self-optimize and adapt to new scenarios based on data makes it particularly well suited for addressing challenges in controlling dynamic and complex environments. AI can also identify the surrounding environment through sensor-based visual systems, oversee the real-time condition of vehicles through monitoring systems, predict potential faults, optimize routes [227], and manage traffic flow. Furthermore, it intelligently conserves network bandwidth and reduces latency to improve the user experience [228,229].

5.4. Blockchain

In Section 2, it was highlighted that, in the era of 6G, the convergence of V2X and ITS will result in an unprecedented surge in data traffic, a substantial increase in the number of highly mobile nodes, and the imposition of low-latency requirements. The substantial influx of data brings forth notable challenges in terms of security, privacy, and trust. The heightened exchange of data and communication within the connected vehicle network raises the vulnerability to potential hacker attacks, exposing risks such as malicious code, data theft, and remote intrusions [230].

In contrast to conventional vehicles, connected vehicles exhibit efficient communication capabilities, enabling a real-time exchange of crucial safety-related information among neighboring vehicles and the surrounding infrastructure [231]. However, this enhanced connectivity also amplifies the susceptibility of vehicles to network threats and the risk of cyberattacks, thereby impacting both the vehicles and drivers [232]. Presently, security solutions for vehicular networks incorporate diverse technologies, including cryptography, blockchain, and machine learning algorithms [233].

Ensuring trust among nodes in the vehicular network poses a challenge for secure and credible message dissemination. An unfamiliar message is inherently treated as untrusted, because a malicious node could potentially create a false message regarding an incident that did not genuinely occur [234,235]. False messages can lead to misjudgments in the vehicular network and may result in issues such as tracking of vehicle privacy or network attacks. Therefore, it is crucial to manage and monitor the source and accuracy of data. To address these security and privacy concerns, blockchain technology, which is characterized by a decentralized architecture, immutability, and encryption features, is poised to enhance the security of video transmission within vehicular networks.

Blockchain is a decentralized distributed data storage and management technology. It links data blocks together in chronological order, forming a continuously growing chain. Each block contains the hash value of the previous block, making the data on the entire chain immutable [236]. Its decentralized framework is well suited to large-scale networks. An additional use of blockchain in trusted vehicular networks involves assessing message credibility and facilitating decentralized dissemination. Utilizing blockchain consensus mechanisms for transaction validation can be applied to verify messages originating from untrusted nodes.
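The hash-linking property described above can be sketched in a few lines. This is a deliberately minimal illustration: it omits consensus, signatures, and timestamps, and the block layout and function names are assumptions rather than the design of any particular blockchain platform.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Append a block that stores the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash, "data": data})
    return chain


def is_valid(chain):
    """Check that every block still points at the true hash of the one before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


# Hypothetical vehicular messages recorded in order.
chain = []
for message in ["accident at km 12", "lane closed", "lane reopened"]:
    append_block(chain, message)
```

Altering the data of an earlier block changes its hash, so the following block's stored `prev_hash` no longer matches and `is_valid` returns `False`. In a real deployment, the consensus mechanism is what prevents an attacker from simply recomputing all subsequent hashes.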

Numerous studies [237,238,239] have investigated the application of blockchain in vehicular networks, all acknowledging that blockchain’s characteristics provide a certain level of security and usability in the era of big data. The application perspective emphasizes the utilization of secure blockchain-based vehicular networks in diverse applications, including transportation, data sharing/trading, and resource sharing [240]. Examples include real-time data authentication among vehicles and checking for any vulnerabilities that might compromise communication between vehicles.

Consequently, we believe that blockchain holds significant potential in the future 6G landscape of vehicular networks. Ayaz et al. [241] proposed a blockchain-based federated learning solution and its integration to enhance security and privacy in vehicular networks, reducing the failure rate in the presence of malicious nodes. They also highlighted the alignment of blockchain and federated learning with the trends in 6G, suggesting their applicability in future technologies. Kamal et al. [242] presented a blockchain-based data sharing mechanism that achieves real-time data authentication for V2V communication with lower complexity and utilizing existing protocols. Cui et al. [243] proposed a blockchain-based approach to achieve secure, efficient, and anonymous V2V data sharing, preventing unauthorized data sharing without the need for RSUs.

5.5. 5G

Starting from 2020, the 5G wireless communication network has been in the process of standardization and is currently undergoing commercial deployment. The key focus of 5G revolves around three typical application scenarios: ultra-reliable low-latency communications (uRLLC), massive Machine-Type Communications (mMTC), and enhanced Mobile Broadband (eMBB). This follows the classification by the International Telecommunication Union Radiocommunication Sector (ITU-R) [244].

The 5G spectrum is allocated across low- (sub-1 GHz), mid- (1–6 GHz), and mmWave (30–100 GHz) frequency ranges, offering distinct performance and coverage characteristics. Among these, the mid-frequency spectrum of 3.3–4.2 GHz is considered an optimal choice for 5G signals, striking a balance between bandwidth, distance coverage, and building penetration [245]. The prevalent frequency bands for 5G wireless communication are mmWave and sub-6 GHz. mmWave communication employs wireless frequencies ranging from 30 to 100 GHz to transmit information, enabling the provision of data at rates reaching several gigabits per second due to its abundant bandwidth [246]. However, mmWave has a limited coverage range owing to its high-frequency nature. Furthermore, the implementation of frequencies that are sub-6 GHz can leverage existing 4G bands, making it a suitable option for 5G communication [247].

Researchers have recently implemented sub-6 GHz and mmWave technologies in the context of vehicular networks. Ikram et al. [248] utilized multi-beam, multi-polarized, and MIMO techniques in sub-6 GHz and mmWave frequency bands to realize 5G-V2X applications. Their system provides communication capabilities with 360-degree coverage, addressing the communication needs of vehicles to interact with other devices in diverse scenarios. He et al. [246] introduced an innovative sub-6 GHz V2X-assisted synchronous mmWave communication scheduling system, which takes into account the significance and timeliness of data to establish the communication schedule, thereby enhancing network efficiency across mmWave links. Their proposed scheduler was applied in typical straight-line highway scenarios.

Zhang et al. [49] investigated a 5G information-centric networking-based vehicular network designed for autonomous vehicle users, confirming the viability of integrating 5G into the vehicular network for video transmission. In a related investigation, Noh et al. [249] presented an mmWave-based V2I communication system. They then explored essential technologies to overcome performance challenges in the mmWave-based system arising from high mobility. The validation results illustrate that the 5G-based mmWave system is both feasible and efficient in achieving high data rate vehicular communication on a highway. Thus, in scenarios with a sufficient base station bandwidth, video transmission to vehicle users can occur through the 5G spectrum.

5.6. 6G

As we approach the year 2030, global mobile traffic is anticipated to surge to 670 times the levels that were recorded in 2010. Concurrently, the limitations of 5G are becoming apparent, as it struggles to cope with the network congestion caused by immersive video services during peak usage periods. The advent of 6G is poised not only to address network congestion but also to usher in features such as seamless and ubiquitous connectivity, a fully immersive experience, support for massive simultaneous user connections, ultra-low latency, and ultra-high capacity and reliability, along with heightened security measures [250].

In the mmWave range of 30–100 GHz, which 5G leverages, achieving the envisaged high speeds is constrained by current transceiver architectures and limitations in digital modulation methods. Challenges include non-linear power amplifiers, phase noise, and suboptimal analog-to-digital converter resolution [251]. Consequently, 6G is anticipated to surpass these limitations by embracing even higher frequencies, incorporating the Terahertz (THz) and visible light spectrum [252]. Designed to fully leverage the frequency spectrum, 6G aims to cover the entire existing spectrum [253]. The following subsections offer an overview of the key frequency bands that are employed in 6G communication.

5.6.1. Terahertz

THz band communications are anticipated to serve as a fundamental technology for 6G and future networks, playing a central role in wireless infrastructure, with the potential to drive numerous promising applications [253]. Occupying the frequency range of 0.1–10 THz in the electromagnetic spectrum, equivalent to wavelengths from 30 μm to 3 mm [254], THz communication offers an extensive bandwidth spanning tens to a hundred GHz and remarkably short wavelengths. This presents significant opportunities to tackle spectrum scarcity and overcome the capacity limitations of 5G networks. This advancement opens avenues for supporting emerging applications with substantial data requirements, including holographic telepresence, ultra-high-speed wireless backhaul, and extended reality [255].
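The wavelength figures quoted above follow directly from the relation λ = c/f. The short check below is only an illustrative sketch; the function name `wavelength_m` is our own and does not come from the cited works.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_m(frequency_hz: float) -> float:
    """Return the free-space wavelength in metres for a given frequency."""
    return C / frequency_hz

# Edges of the THz band as stated in the text: 0.1 THz and 10 THz.
low_edge = wavelength_m(0.1e12)
high_edge = wavelength_m(10e12)

print(f"0.1 THz -> {low_edge * 1e3:.2f} mm")
print(f"10 THz  -> {high_edge * 1e6:.1f} um")
```

The lower band edge maps to a wavelength of about 3 mm and the upper edge to about 30 μm, matching the range given in [254].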

Researchers have made efforts to implement THz technology in vehicular network communication. Lin et al. [256] introduced a method utilizing the Unscented Kalman Filter in THz V2I communication networks, enabling RSUs to track multiple vehicles. They proposed a low-complexity resource allocation method to enhance vehicle speed. Lin et al. [257] focused on enhancing the energy efficiency of wireless networks and presented a channel power gain estimation method using GAN. This was applied in THz V2I communication to determine the optimal THz transmission frequency and power. Li et al. [258] conducted an analysis of THz communication in urban environments for vehicular networks, providing a detailed characterization of features such as path loss and shadow fading.

While the THz frequency range offers numerous advantages, it also introduces novel challenges that are not encountered in frequencies below 6 GHz and mmWave frequencies. Beyond severe spreading loss and high channel sparsity, the THz wavelength is comparable to the size of particles in the atmosphere, leading to exacerbated losses due to even more severe molecular absorption [259]. Additionally, the short wavelength of the THz range makes signals susceptible to blockages, resulting in distinctive transmission characteristics and relatively short transmission distances [255].
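To give a rough sense of the spreading loss mentioned above, the sketch below evaluates the standard free-space path loss formula, FSPL = 20 log10(4πdf/c), for three carrier frequencies. The frequencies and the 100 m distance are illustrative assumptions, and the formula deliberately ignores molecular absorption and blockage, which would add further loss in the THz band.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def fspl_db(distance_m: float, frequency_hz: float) -> float:
    """Free-space path loss in dB (spreading loss only, no molecular absorption)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / C)

# Illustrative carriers for sub-6 GHz, mmWave, and THz at the same 100 m distance.
for label, freq in [("3 GHz", 3e9), ("30 GHz", 30e9), ("300 GHz", 300e9)]:
    print(f"{label:>8}: {fspl_db(100.0, freq):.1f} dB")
```

Because the loss grows with 20 log10(f), every tenfold increase in carrier frequency adds 20 dB of spreading loss at a fixed distance, before any absorption is taken into account.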

In the study by Chaccour et al. [260], three methods were employed to address the aforementioned challenges in THz transmission. The first approach is to leverage large-scale MIMO small base stations for network densification to ensure continuous communication coverage. Nevertheless, the relatively high cost of THz equipment poses a challenge [261]. The deployment of a significant number of THz base stations entails substantial expenses, making large-scale implementation difficult. In the second approach, Cooperative RISs were introduced. RIS is a planar surface comprising numerous quasi-passive and cost-effective reflecting elements. Each element can independently impose phase shifts/amplitudes on incoming electromagnetic signals in a fully customizable manner [262]. Multiple RISs can be employed to establish connections with narrow THz beams, providing uninterrupted communication links to users. Several studies have delved into the application of RIS in wireless communication. Yan et al. [263] explored issues relating to beam split and beamforming design in THz and RIS communication, analyzing the beam split effect based on various RIS shapes, sizes, and deployments. Zarini et al. [264] utilized RIS to assist THz communication systems in mitigating bandwidth shortages. They introduced a resource management algorithm to optimize the reflectivity of RIS elements, base station signal transmission power, and THz resource block allocation. Fu et al. [265] scrutinized the transmission channel model of RIS-assisted THz wireless communication, performing simulations to evaluate the effectiveness of RIS-assisted transmission.

Since the integration of the THz band with lower frequency bands such as sub-6 GHz and mmWave presents an opportunity to overcome the limited transmission distance of THz, Humadi et al. [266] delved into a user-centric dynamic clustering design for a hybrid network incorporating THz, mmWave, and sub-6 GHz base stations. Their study highlighted that a significant presence of THz base stations could notably enhance data transmission rates, albeit potentially compromising coverage performance. Thus, thoughtfully designed networks that facilitate user-centric base station cooperation in hybrid wireless systems have the potential to offer both ultra-high transmission rates and ample coverage range. Chukhno et al. [267] explored optimization strategies for the performance of mmWave and THz systems, examining multicast techniques for 5G/6G systems. This investigation encompassed technologies such as RIS, 5G sidelink technology, and mobile edge enhancements. Moltchanov et al. [268] considered diverse environmental conditions in urban environments, incorporating base stations operating at frequencies below 6 GHz, mmWave, and THz. They addressed transmission and deployment challenges that are specific to mmWave and THz. In a hybrid Radio Frequency (RF) and THz relay network, Lou et al. [261] introduced a dual-hop decode-and-forward routing protocol named hybrid relay selection. This protocol prioritizes THz relays for higher data rates or shorter distances between the source and destination. For lower data rates or longer distances between the source and destination, RF relays are utilized.
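The relay preference described for the hybrid RF/THz network can be sketched as a simple decision rule. This is not the protocol of Lou et al. [261]: the thresholds are hypothetical placeholders, and the way the rate and distance criteria are combined (THz if either criterion favors it, RF otherwise) is our own assumed reading of the description.

```python
def select_relay_band(required_rate_gbps: float, hop_distance_m: float,
                      rate_threshold_gbps: float = 10.0,
                      distance_threshold_m: float = 50.0) -> str:
    """Pick a relay type for one source-destination pair.

    Thresholds are hypothetical. Assumed reading: use a THz relay when the
    rate demand is high or the distance is short, and an RF relay otherwise.
    """
    if required_rate_gbps >= rate_threshold_gbps or hop_distance_m <= distance_threshold_m:
        return "THz"
    return "RF"

print(select_relay_band(20.0, 200.0))  # high rate demand
print(select_relay_band(1.0, 30.0))    # short distance
print(select_relay_band(1.0, 200.0))   # low rate and long distance
```

A practical scheme would also need to check that a THz link is actually reachable at the given distance, which this sketch leaves out.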

Research on incorporating THz technology into vehicle communication is also under way. Rasheed and Hu [269] introduced an approach that combines software-defined networking control with cognitive radio-enabled V2X routing to achieve ultra-high data rates. Their strategy uses predictive V2X routing to switch intelligently between mmWave and THz frequencies, achieving a data rate of 1 Gbps using mmWave within a 300 m range. The THz bands can reach 100 Gbps within a 50 m range, which is particularly advantageous when a vehicle is waiting at traffic lights and concurrently downloading data from a nearby THz base station.
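The rate and range figures reported for [269] allow a quick back-of-the-envelope estimate of how much data a stopped vehicle could receive. The sketch below is only illustrative: the distance-based band choice is a simplification of the predictive routing described above, the 30 s stop duration is an assumption, and the rates are treated as sustained peak rates with no protocol overhead.

```python
def pick_band(distance_m: float):
    """Return (band, peak rate in Gbps) using the ranges quoted from [269]."""
    if distance_m <= 50.0:
        return ("THz", 100.0)
    if distance_m <= 300.0:
        return ("mmWave", 1.0)
    return (None, 0.0)  # out of range of both bands in this simplified model

def downloadable_gigabytes(rate_gbps: float, seconds: float) -> float:
    """Convert a sustained rate in gigabits per second into gigabytes received."""
    return rate_gbps * seconds / 8.0

stop_s = 30.0  # assumed waiting time at a traffic light
for distance in (40.0, 200.0):
    band, rate = pick_band(distance)
    print(band, downloadable_gigabytes(rate, stop_s), "GB")
```

Under these idealized assumptions, a 30 s stop within THz range corresponds to 375 GB, compared with 3.75 GB over mmWave, which illustrates why short THz contact windows are attractive for bulk prefetching.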

5.6.2. Free Space Optical Communication

In addition to its integration with currently deployed sub-6 GHz and mmWave frequency bands, THz can collaborate with 6G’s Free Space Optical Communication (FSOC). FSOC, known for its cost-effectiveness, compactness, lightweight nature, and energy efficiency [270], taps into extensive unlicensed bandwidth, enabling a high data rate of transmission over considerable distances [271]. This characteristic makes it well suited for co-deployment with THz base stations [272]. However, FSOC encounters propagation impairments, where even short-distance channels may suffer from notable signal attenuation due to factors like fog, rain, snow, and turbulence. Since RF is less affected by turbulence, snow, and fog, several scholars have proposed hybrid communication systems integrating FSOC and RF.

Vishwakarma and Swaminathan [273] introduced an adaptive switching system between FSOC and RF in a hybrid setup, comparing its performance with other communication systems. Wu et al. [274] presented a parallel transmission hybrid FSOC/RF communication system, enabling simultaneous data transmission through FSOC and RF. They employed maximal ratio combining to merge signals from both transmission channels, mitigating channel fading. Sandeep et al. [37] employed roadside infrastructure as a relay node to connect vehicles to base stations, utilizing a hybrid FSOC/RF technology for communication. Lu et al. [275] implemented an FSOC-THz link, achieving a high total data rate of 86.112 Gbps with a bit error rate < $3.8 \times 10^{-3}$ and an error vector magnitude < 10%. Li et al. [276] investigated the performance of a hybrid THz/FSOC wireless transmission system. Their system extended the transmission range and facilitated faster and more secure signal transmission. Additionally, as THz and FSOC links utilize separate frequency bands, they can circumvent any form of interference.
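Maximal ratio combining, as used in parallel FSOC/RF transmission, can be illustrated with its textbook property: under ideal channel knowledge and independent noise on each branch, the combined SNR equals the sum of the branch SNRs in linear scale. The sketch below shows only this idealized relation and is not the system model of [274]; the branch SNR values are made up for illustration.

```python
import math

def db_to_linear(value_db: float) -> float:
    return 10.0 ** (value_db / 10.0)

def linear_to_db(value: float) -> float:
    return 10.0 * math.log10(value)

def mrc_output_snr_db(branch_snrs_db):
    """Ideal MRC: output SNR is the sum of the branch SNRs in linear scale."""
    return linear_to_db(sum(db_to_linear(s) for s in branch_snrs_db))

# Hypothetical example: FSOC branch degraded by fog, RF branch unaffected.
fsoc_snr_db, rf_snr_db = 4.0, 10.0
combined = mrc_output_snr_db([fsoc_snr_db, rf_snr_db])
print(f"combined SNR: {combined:.2f} dB")
```

The combined SNR is never lower than that of the better branch, which is why a faded FSOC link still contributes usefully when the RF link carries the same data in parallel.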

FSOC’s limitation lies in requiring an LoS transmission channel between the transmitter and the receiver, restricting it to point-to-point transmission. Non-Line of Sight Free Space Optical Communication (NLoS-FSOC) can be adopted to address this limitation [277]. NLoS-FSOC uses a diffuse reflector to scatter the incident light in all outward directions [278], enabling signal transmission beyond point-to-point scenarios. Esubonteng and Rojas-Cessa [279] applied NLoS-FSOC in a vehicle communication scenario and proposed a heuristic algorithm to optimize the tilt angle of the diffuse reflector for enhanced transmission speed.

In the realm of vehicular networks, Niu et al. [280] introduced an application of FSOC, leveraging highly collimated beams to enhance V2X connectivity. Their strategy incorporates a low-rate control link alongside multiple Gbps-assisted FSOC links operating in parallel. Initially, a control link is utilized for the exchange of sensor data relating to vehicle attitude dynamics, enabling FSOC beam tracking. Subsequently, the latter FSOC link establishes an ultra-reliable high data rate connection. The collaborative efforts of local and distributed processing guarantee continuous and precise pointing, ensuring a stable transmission.

5.6.3. Visible Light Communication

The visible light spectrum spans a wide, unregulated frequency range from 400 to 800 THz [281]. Because it is license-free, visible light communication (VLC) can exploit abundant spectrum resources, amounting to roughly 400 THz of bandwidth, and thus shows substantial potential for enhancing wireless communication capacity [4]. VLC also demonstrates significantly improved energy efficiency compared to RF communications [282]. In VLC, Light-Emitting Diodes (LEDs) serve as highly energy-efficient light sources, catering to both communication and illumination needs, while photodiodes function as receivers. Communication through LEDs is recognized as environmentally friendly and can operate under typical lighting conditions without causing radiation harm to humans [283].

Considering the influence of vehicle density on the latency performance of RF technology in vehicular applications, VLC stands out as a leading choice for deploying safety-critical protocols that require rapid communication among vehicles [284]. This is facilitated by the widespread deployment of LED lighting systems on modern automobiles and motorcycles, coupled with the highly directional nature of optical pathways, allowing VLC to encode information into light carriers through intensity modulation schemes [285].

Aly et al. [286] investigated the use of dual photodetectors within vehicle VLC systems, implementing selection combining to improve signal reception in the presence of diverse mobility conditions. They measured the electrically received signal-to-noise ratio (SNR) for each specific photodetector and at the output of the selection combiner on straight and curved roads. Eldeeb et al. [287] addressed VLC LoS limitations using an optical RIS and studied the impact of asymmetric headlight intensity distribution, sunlight, and weather on system performance. Alsalami et al. [288] examined fluctuating inter-vehicle distances and ambient noise levels at different instances to simulate and validate the dynamic V2V-VLC channels. Refas [289] proposed a multi-hop relay system, employing vehicles as wireless relays to maintain LoS transmission channels for visible light. They analyzed the impact of transceiver parameters and the number of relay points on system performance. Memed and Dressler [290] utilized headlight modules with LED array technology to implement spatial division techniques [291]. These techniques enable different lighting functions such as low beam, high beam, and turning illumination. Individual LEDs in the matrix lighting module have separate irradiation modes, allowing for the division of irradiation modes to improve the VLC spectrum’s efficiency and reduce interference. Aly et al. [292] employed VLC to facilitate communication between large vehicles such as trucks, utilizing the low beams of vehicles as transmitters. They proposed a path loss model considering vertical displacement and oscillation effects and studied the optimal position of photodetectors to improve the error rate performance. Sharda and Bhatnagar [293] modeled the impact of actual outdoor propagation characteristics to present LoS and NLoS V2V-VLC models and analyzed the path loss and data rate of these models.
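Selection combining with two photodetectors, as mentioned at the start of this paragraph, amounts to keeping whichever branch currently has the higher SNR. The sketch below is a generic illustration rather than the receiver of [286], and the sample SNR traces are invented for the example.

```python
def selection_combine(snr_pd1_db, snr_pd2_db):
    """Per time sample, keep the higher SNR of the two photodetector branches."""
    return [max(a, b) for a, b in zip(snr_pd1_db, snr_pd2_db)]

# Hypothetical SNR traces (dB) as a vehicle drives through a curve:
# one photodetector loses alignment while the other gains it.
pd_left = [12.0, 9.0, 5.0, 3.0]
pd_right = [4.0, 8.0, 11.0, 13.0]

print(selection_combine(pd_left, pd_right))
```

Unlike maximal ratio combining, this scheme uses only one branch at a time, which keeps the receiver simple at the cost of some achievable SNR.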

Given the widespread use of LED lights in contemporary vehicles, they can function as transmitters for VLC. VLC, possessing multiple advantages over RF, is regarded as a superior relay communication technology in the realm of vehicle communication. In instances where the direct link between a vehicle and a base station faces interference, VLC can be applied for data transmission by utilizing other vehicles as relay points [294].

6. Architectural Framework, Open Challenges, and Future Research Directions for Video Transmission in Next-Generation Vehicular Networks

With the evolution of multimedia, people are increasingly looking for immersive experiences, such as volumetric video, 360° videos, and high-resolution content. This trend has created a pressing demand for low latency and high bandwidth. To meet these requirements, video processing and optimization technologies have become crucial. However, as the demand for connected vehicles grows, this trend faces greater challenges. During peak traffic hours, network congestion issues may arise, especially in areas with a heavy traffic flow. Addressing these challenges requires the adoption of advanced networking technologies and transmission strategies to ensure stable video streaming in various scenarios.

The future architecture for video transmission will not only focus on immersive experiences but will also need to provide solutions for complex network environments and the ever-growing demands of connected vehicles. This will depend on innovative technologies in video processing and networking to ensure that users can enjoy high-quality video content anytime, anywhere. Therefore, this section will propose a vehicular network architecture for future video transmission based on the technologies that we introduced in the preceding sections and discuss the challenges that we face.

6.1. Architectural Framework for Video Transmission in Next-Generation Vehicular Networks

As depicted in Figure 2, the architecture for video transmission in next-generation vehicular networks will employ a distributed computing structure to reduce computational complexity relative to the traditional centralized control architecture.

In our future video content landscape, we anticipate six main user categories: 2D VoD, 2D Live Video, 360° VoD, 360° Live Video, Volumetric VoD, and Volumetric Live Video. Each category demands unique processing approaches based on its distinctive characteristics.

For 2D Live Video, 360° Live Video, and Volumetric Live Video with high latency sensitivity, our focus will be on optimizing encoding and decoding times. As the foundation, we will employ the AVC encoding standard [102]. Conversely, for 2D VoD, 360° VoD, and Volumetric VoD, where bandwidth conservation is paramount, we will leverage the VVC encoding standard [113], recognized for its superior compression efficiency. Enhancements to these standards will be implemented using LCEVC [130] to streamline the encoding complexity and improve compression efficiency. The video content will be encoded into a BL and multiple ELs, allowing for adaptive transmission based on the user’s available bandwidth.
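The codec assignment and layered delivery described above can be sketched as follows. The category-to-codec mapping mirrors the text, while the layer bitrates and the greedy rule for adding ELs are hypothetical assumptions made for illustration only.

```python
LIVE = {"2D Live Video", "360° Live Video", "Volumetric Live Video"}
VOD = {"2D VoD", "360° VoD", "Volumetric VoD"}

def base_codec(category: str) -> str:
    """Latency-sensitive live content uses AVC; bandwidth-sensitive VoD uses VVC."""
    if category in LIVE:
        return "AVC"
    if category in VOD:
        return "VVC"
    raise ValueError(f"unknown category: {category}")

def layers_to_send(available_mbps: float, bl_mbps: float, el_mbps: list) -> list:
    """Always send the BL, then add ELs in order while the bandwidth allows.

    Assumption: each EL depends on all lower layers, so selection stops at
    the first EL that no longer fits.
    """
    sent, used = ["BL"], bl_mbps
    for index, rate in enumerate(el_mbps, start=1):
        if used + rate > available_mbps:
            break
        used += rate
        sent.append(f"EL{index}")
    return sent

print(base_codec("360° Live Video"), layers_to_send(10.0, 4.0, [3.0, 3.0, 5.0]))
```

With the hypothetical bitrates shown, a user with 10 Mbps available receives the BL and the first two ELs, while the third EL is withheld until more bandwidth becomes available.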

Volumetric videos manifest in various representation formats, such as point clouds and 2D video synthesis. Given the substantially larger bandwidth requirements of volumetric videos compared to traditional 2D videos, the compression of point clouds and methods for 2D video synthesis have become predominant research directions. The chosen format for volumetric videos depends on their application scenarios. For interactive volumetric videos, such as experimental demonstrations or remote surgeries, point cloud compression in V-PCC [135] is preferred because of its ability to support collision detection with 3D geometric shapes [72]. V-PCC specifically processes the point clouds within the user’s FoV, leading to a substantial reduction in bandwidth requirements during transmission.

Non-interactive volumetric videos, such as live sports events, employ the MIV encoding standard [137], utilizing 2D video synthesis. MIV enables seamless switching between 6DoF and 2D video at any time, allowing for selective transmission of specific angles based on bandwidth allocation strategies, such as focusing on the central area of a sports field to achieve bandwidth savings. Additionally, MIV provides a backward compatibility option, supporting 360° video to adapt to different network or device conditions.

All videos will incorporate VSR technology [161]. Initially, low-resolution versions of the videos and VSR models will be transmitted to the user’s end. Leveraging the computational capabilities within the device, the VSR model will then restore the low-resolution versions of the videos into high resolution, thereby economizing on the bandwidth that is required to transmit high-resolution videos.

Finally, all videos will incorporate ABR technology, dynamically adjusting the video resolution based on the user’s current available bandwidth. This adaptive approach aims to address issues such as video interruptions or insufficient video quality. For 360° VoD, 360° Live Video, Volumetric VoD, and Volumetric Live Video, we will integrate FoV prediction technology [180]. In cases where there is a higher tolerance for latency, such as in 360° VoD and Volumetric VoD, pyramid projection [13] will be applied to videos with minimal FoV changes within a specified time period to reduce the data volume.

For videos with significant FoV changes, including 360° Live Video and Volumetric Live Video, CMP projection [67] will be employed to ensure a smoother viewing experience. Following FoV prediction, the resolution for the tiles that users are most likely to watch will be dynamically adjusted based on the available bandwidth, while the remaining tiles will be transmitted using the lowest resolution.
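A minimal sketch of the tile-level rate allocation described above is given below. The function name, the bitrate ladder, and the rule of giving all FoV tiles the same rung are assumptions made for illustration; they are not taken from the cited works.

```python
def allocate_tile_bitrates(num_tiles, fov_tiles, ladder_mbps, budget_mbps):
    """Assign a per-tile bitrate.

    Tiles outside the predicted FoV get the lowest rung of the ladder.
    Tiles inside the predicted FoV share the highest rung that still fits
    into the remaining bandwidth budget (hypothetical policy).
    ladder_mbps must be sorted in ascending order.
    """
    lowest = ladder_mbps[0]
    outside = num_tiles - len(fov_tiles)
    remaining = budget_mbps - outside * lowest
    chosen = lowest
    for rate in ladder_mbps:
        if rate * len(fov_tiles) <= remaining:
            chosen = rate
    return {t: (chosen if t in fov_tiles else lowest) for t in range(num_tiles)}

# Hypothetical example: 6 tiles, tiles 1 and 2 predicted to be in view.
plan = allocate_tile_bitrates(6, {1, 2}, [1.0, 2.0, 4.0, 8.0], 13.0)
print(plan)
```

In this example the four out-of-view tiles consume 4 Mbps at the lowest rung, leaving 9 Mbps, so the two in-view tiles are each served at 4 Mbps rather than 8 Mbps.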

Acknowledging the inherent inaccuracy of FoV prediction, especially when users move their heads frequently, tiling adaptation can be used to refine the video quality. This process involves appropriately extending the size of tiles to alleviate the prediction errors that are associated with the FoV, ensuring that any tile that intersects the predicted viewport is selected. Importantly, if the prediction error does not cross a tile boundary, it will not affect the tile selection result [184]. This refinement contributes to a more accurate and seamless video viewing experience.
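The effect of prediction error on tile selection can be illustrated with a one-dimensional sketch that considers only the yaw angle. This is a simplification of our own: here the safety margin is applied by widening the predicted viewport rather than the tiles themselves, and the tile count, FoV width, and margin are hypothetical values.

```python
import math

def select_tiles(center_deg, fov_deg, margin_deg, num_tiles):
    """Return the indices of all yaw tiles that intersect the widened viewport.

    The 360° ring is split into num_tiles equal tiles; indices wrap around.
    """
    tile_width = 360.0 / num_tiles
    start = center_deg - fov_deg / 2.0 - margin_deg
    end = center_deg + fov_deg / 2.0 + margin_deg
    first = math.floor(start / tile_width)
    last = math.ceil(end / tile_width) - 1  # a tile that only touches `end` is excluded
    return sorted({i % num_tiles for i in range(first, last + 1)})

predicted = select_tiles(100.0, 90.0, 0.0, 8)      # predicted viewport
small_error = select_tiles(105.0, 90.0, 0.0, 8)    # actual view, 5° off
large_error = select_tiles(140.0, 90.0, 0.0, 8)    # actual view, 40° off
with_margin = select_tiles(100.0, 90.0, 40.0, 8)   # prediction plus 40° margin
print(predicted, small_error, large_error, with_margin)
```

With eight 45° tiles, a 5° error leaves the selected tiles unchanged, whereas a 40° error requires a tile that the unextended prediction did not select. Adding a 40° margin covers that case at the cost of transmitting more tiles.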

In vehicular communication, the landscape is categorized into V2N, V2I, and V2V. Given the substantial bandwidth requirements for future UHD and immersive videos, a diverse range of high-bandwidth technologies is essential to support this demand. We anticipate using all communication bands that are supported by 5G and 6G, covering sub-6 GHz, mmWave, THz, and NLoS-FSOC base stations. Initially, servers will be strategically placed adjacent to all base stations to store videos, reducing video transmission latency. The incorporation of THz and NLoS-FSOC for ultra-high transmission rates, combined with the use of sub-6 GHz and mmWave for effective communication coverage, aims to establish a vehicular network environment that meets the requirements of both transmission speed and communication range, covering all road segments. Furthermore, leveraging the considerable bandwidth that is provided by THz and NLoS-FSOC allows for the realization of an “information shower” [260], facilitating the rapid provision of substantial bandwidth to users. This enables 2D VoD, 360° VoD, and Volumetric VoD to buffer more video segments in a short time, significantly reducing the likelihood of video interruptions.

Moreover, V2I technology can be employed to enable all autonomous vehicles to engage in information transmission with RSUs along their routes. This integration supports the cloud-based traffic information system in overseeing the route planning of all autonomous vehicles. Additionally, 6G’s VLC technology can be harnessed for V2V communication. In scenarios where the video services for autonomous vehicle users face challenges meeting bandwidth demands, especially in congested road segments, the autonomous vehicle can request assistance from RSUs to allocate the required bandwidth. Through VLC-V2V technology, other autonomous vehicles can collaborate in bandwidth scheduling. Furthermore, by leveraging other vehicles that are traveling on the same route and base stations in uncongested segments, pre-downloading video segments becomes feasible. Upon the two vehicles reaching the same road segment, VLC-V2V technology is employed to transmit the pre-downloaded video segments.

In the realm of network technology, caching techniques [205] can be employed to pre-store video segments that autonomous vehicle users may potentially watch. These segments are strategically stored in servers near the upcoming base stations that the vehicle is about to pass through, effectively reducing the latency of video transmission. Moreover, these video segments can be cached in other autonomous vehicles and transmitted to the target vehicle via VLC-V2V technology. Expanding beyond video segment caching, edge servers can also cache previously computed results from VSR models. This facilitates the handling of computation requests with similar characteristics. Additionally, to achieve swifter responses to user queries, pre-trained models can be cached at the edge to perform resource-intensive inference tasks [295].
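The idea of pre-storing segments at the base stations a vehicle is about to pass can be sketched as a simple planning step. The route format, base station identifiers, and timing values below are hypothetical, and the sketch assumes uninterrupted playback and perfectly known coverage times, which a real system would have to predict.

```python
import math

def plan_prefetch(route, playback_offset_s, segment_s):
    """Map each upcoming base station to the segment indices to cache there.

    route: list of (base_station_id, enter_s, leave_s), times relative to now.
    playback_offset_s: current playback position within the video.
    segment_s: duration of one video segment in seconds.
    """
    plan = {}
    for station, enter_s, leave_s in route:
        first = math.floor((playback_offset_s + enter_s) / segment_s)
        last = math.ceil((playback_offset_s + leave_s) / segment_s) - 1
        plan[station] = list(range(first, last + 1))
    return plan

# Hypothetical route: 20 s under BS-A, then 30 s under BS-B, 4 s segments.
route = [("BS-A", 0.0, 20.0), ("BS-B", 20.0, 50.0)]
print(plan_prefetch(route, 0.0, 4.0))
```

The same plan could be used to decide which segments a neighboring vehicle should carry for later VLC-V2V delivery, although that extension is not shown here.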

Furthermore, if two or more autonomous vehicles are within the communication range of the same base station and are watching the same video content, multicast can be employed to reduce overall bandwidth requirements. Vehicles requesting the same content are divided into multiple groups based on the current network conditions. The video is then broadcast to all vehicles as the basic layer through multicast, while vehicles with better network conditions can receive an enhanced layer for higher video quality. Vehicles with good channel conditions may belong to multiple groups, independently receiving data from different layers, which can later be combined. Through this approach, the “basic”-quality content is multicast to all vehicles, while the “enhanced”-quality content is transmitted only to vehicles with better channel conditions [296]. The system can dynamically adjust groups based on the current network conditions of vehicles in real time to ensure optimal performance. In addition, vehicles with better network conditions can even use V2V communication to multicast the received video to those vehicles that are located at the edge of base station coverage or experiencing weaker signals, thereby improving their reception quality.
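The layered multicast grouping described above can be sketched as follows. The per-vehicle achievable rates, the layer bitrates, and the single threshold rule are hypothetical simplifications, and a real scheduler would re-run the grouping as channel conditions change.

```python
def assign_multicast_groups(vehicle_rates_mbps, base_mbps, enhanced_mbps):
    """Split vehicles requesting the same content into layer groups.

    Every vehicle joins the base-layer group. Vehicles whose achievable rate
    covers both layers additionally join the enhanced-layer group.
    """
    groups = {"base": [], "enhanced": []}
    for vehicle, rate in vehicle_rates_mbps.items():
        groups["base"].append(vehicle)
        if rate >= base_mbps + enhanced_mbps:
            groups["enhanced"].append(vehicle)
    return groups

# Hypothetical achievable rates for three vehicles under one base station.
rates = {"v1": 5.0, "v2": 12.0, "v3": 8.0}
print(assign_multicast_groups(rates, base_mbps=4.0, enhanced_mbps=4.0))
```

Here all three vehicles receive the base layer through a single multicast stream, and only the two vehicles with sufficient rate also receive the enhanced layer.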

6.2. Open Challenges and Future Research Directions

In the preceding subsection, we presented the architecture and scenarios that we designed to facilitate video transmission in next-generation vehicular networks. This architecture seamlessly integrates various emerging technologies with vehicular networks, considering the distinct characteristics of both vehicles and fixed nodes. However, several challenges arise, and numerous issues require further consideration. Consequently, this subsection aims to delve into the challenges that are encountered within this framework and outline potential directions for future research.

6.2.1. Challenges in Vehicular Networks

The inherent challenges in vehicular networks encompass various aspects, including the characteristics of their wireless channels, intricacies of resource management, diverse interfaces, dynamic topology, efficient routing, trajectory design, congestion policy, and critical considerations of security and reliability. Furthermore, the optimization of sensing, control, and communications collectively poses a significant challenge [297].

The distinctive attributes of vehicular networks, characterized by high node mobility and heterogeneity, create hurdles for routing performance in vehicular communications. While cluster-based dual-phase routing methods, leveraging fog computing, offer scalability and flexibility, there remains a demand for more robust routing protocols that are capable of adapting to the dynamic topology and swift node movements in vehicular networks [45].

6.2.2. V2V Communication during High-Speed Vehicle Movement

While several studies advocate for using car lights as signal transmitters for VLC in V2V communication, the escalating speed of vehicle movement introduces challenges. The swift motion may result in unstable communication connections, and in high-velocity scenarios, the probability of successful communication between vehicular network entities is diminished due to the limited transmission range [298]. Similarly, in moderately fast scenarios, vehicular network communication may encounter issues relating to the Doppler effect, frequent link disruptions, and increased end-to-end latency [8]. Kamiya et al. [299] demonstrated the effectiveness of vehicles in receiving visible light communication signals when moving at a speed of 40 km per hour. However, on highways, vehicle speeds far exceed 40 km per hour, presenting challenges that need to be addressed to enable the use of VLC for V2V communication in the future.

6.2.3. RF–Optical Heterogeneous and Hybrid Networks

In realizing the spectrum versatility of the next-generation vehicular network, there is a need for the integration of systems across various frequency bands, encompassing both RF and optical wireless segments to address heterogeneity. Additionally, supporting mixed systems and networks that cover the entire spectrum is a crucial aspect of advancement.

However, bringing RF–optical heterogeneous systems and networks into existence encounters a set of challenges. These challenges include aspects such as mobility management and network switching, the design of transmission network protocols, load balancing, the synchronization of diverse networks, resource allocation, improvement in energy efficiency, spectrum allocation [300], and the coordination of access points and power distribution [301].

Moreover, given the significant frequency difference between the RF and optical bands, RF–optical heterogeneous networks encounter specific challenges. First, the heterogeneity of access points leads to frequent switching when users are on the move [302]. Second, integrating optical and RF hardware systems to meet the diverse bandwidth requirements of different transmission media is difficult for hybrid systems [303]. Third, optical wireless networks often have network selection standards that differ from one another and from those of existing RF communication networks, so it is essential to consider how to design optimal network selection strategies [304]. Lastly, the security challenges that are brought about by heterogeneous systems should not be underestimated [253].

6.2.4. Multicasting

Presently, the predominant focus of research lies in addressing challenges relating to traditional wireless network multicast grouping and resource optimization. However, the inherent mobility of nodes in vehicular networks introduces a dynamic and complex network topology, coupled with unpredictable user density. These networks are highly influenced by surrounding environments and intermittent connections, rendering existing solutions insufficient for the evolving landscape of vehicular networks.

Given these challenges, future research on vehicular network multicast must prioritize enhancing adaptability. Innovative solutions should effectively tackle node mobility and the dynamic topology in vehicular environments, accounting for unpredictable variations in user density. Developing more intelligent routing protocols is essential to handling the rapidly changing network topology. Moreover, proposing adaptive multicast mechanisms is crucial to ensuring the efficient utilization of resources.

In addition, considering the dynamic nature of mobile nodes in vehicular networks, exploring predictive technologies, such as machine learning-based prediction models, can better anticipate node behavior and forecast network changes. To summarize, future multicast systems in vehicular networks need to evolve towards greater intelligence, adaptability, and reliability to effectively address the intricate and dynamic nature of vehicular network environments.

6.2.5. Video Compression Efficiency and Encoding Delay

Despite the significant advantages offered by VVC over HEVC, it encounters challenges such as device compatibility issues, longer encoding times, and increased complexity. As highlighted in [305], while AVC continues to dominate in most VoD and live streaming content, there is a discernible uptick in the adoption of VVC. This indicates a likely increase in the prevalence of devices supporting VVC in the coming years.

Moreover, the use of efficient compression algorithms and high-quality encoding standards, while beneficial, results in extended encoding times and heightened video complexity, posing challenges for real-time applications. Although LCEVC can alleviate the complexity of VVC, limitations persist in specific real-time communication and video streaming scenarios. Hence, it is imperative to address the challenges that are associated with VVC’s complexity and encoding time.

6.2.6. 360° Video Streaming Optimization

In the context of future 360° video streaming, the adoption of variable-sized tiles and alternative projection schemes presents a solution to alleviate bandwidth pressure. However, this advantage comes with additional storage and computational costs, demanding careful consideration in system design. When implementing transmission methods, addressing uncertainties or failures in FoV predictions is essential in order to enable users to smoothly transition to the correct view in case of prediction inaccuracies, ensuring a continuous viewing experience. In such situations, the development of efficient and intelligent adaptive algorithms is imperative. These algorithms should account for user heterogeneity, taking into consideration factors such as the device performance, network bandwidth, and viewing preferences. Achieving more precise predictions and adaptive adjustments may involve leveraging machine learning and establishing user models.

Within the dynamic and unstable vehicular network environment, maintaining a good video streaming experience under varying conditions is crucial. Furthermore, to enhance exploration experiences, the seamless transition of the FoV should be prioritized. This involves switching fluidly between video content and dynamically adjusting FoV predictions based on user behavior, so that transitions feel natural as users rotate their heads.

6.2.7. Volumetric Video Optimization

Data formats for volumetric videos vary substantially, and each format has its own advantages and disadvantages. For example, volumetric data types such as point clouds and meshes offer a more precise representation of the scene but come with increased encoding complexity. Conversely, MVD data reduce processing complexity at the expense of reconstruction quality.

To enhance encoding quality, a proposed strategy involves leveraging point cloud data to capture and encode intricate and essential objects in the scene, while employing MVD data for simpler and less critical elements. This approach capitalizes on the distinctive characteristics of both data types to elevate the overall encoding quality. Consequently, there is a need to explore more efficient and high-quality unified encoding frameworks that can seamlessly handle different data formats and effectively exploit relationships between dissimilar data types.
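The hybrid strategy described above can be sketched as a simple per-object decision rule. In the following Python example, the `SceneObject` fields, the importance and complexity scores, and the thresholds are hypothetical placeholders chosen for illustration; a real system would have to derive them from, for example, saliency or geometry analysis.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    importance: float   # assumed score in [0, 1], e.g. from saliency analysis
    complexity: float   # assumed geometric complexity score in [0, 1]


def choose_format(obj: SceneObject, importance_threshold: float = 0.6,
                  complexity_threshold: float = 0.5) -> str:
    """Pick 'point_cloud' for intricate, essential objects and 'mvd' otherwise."""
    if (obj.importance >= importance_threshold
            and obj.complexity >= complexity_threshold):
        return "point_cloud"
    return "mvd"


# Hypothetical scene used only to exercise the rule.
scene = [
    SceneObject("presenter", importance=0.9, complexity=0.8),
    SceneObject("table", importance=0.4, complexity=0.3),
    SceneObject("back_wall", importance=0.1, complexity=0.1),
]
assignment = {obj.name: choose_format(obj) for obj in scene}
```

Here only the presenter is encoded as a point cloud, while the table and the back wall fall back to MVD. A unified encoding framework would additionally need to exploit the correlations between the two representations, which this sketch does not attempt.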

6.2.8. UAV-Assisted Communication

Due to their extensive coverage capabilities, UAVs can function as aerial radio access points in the 6G-V2X network. UAVs offer various services for vehicle users, including relaying, caching, and computation. Particularly in congested driving environments, UAVs can collaborate with stationary network nodes, such as base stations, to manage wireless networks and enhance the user experience. With their nearly unrestricted 3D movements, UAVs serve as flying agents, enabling a range of unique V2X applications.

Despite advancements in UAV technology, challenges persist in the UAV V2X system domain. Maintaining reliable and fast wireless communication between UAVs and ground vehicles proves challenging due to the dynamic channel characteristics resulting from the movement of both entities. While LoS links are anticipated in UAV–ground vehicle channels, ongoing efforts are directed at measuring and modeling such channels. Other critical challenges include ensuring safety and compliance with regulations, smooth integration with existing networks, and addressing the limited battery life of UAVs [306].

6.2.9. Satellite Network

Due to their extensive coverage capabilities, satellite networks can operate across challenging terrains, making LEO satellites a valuable asset for supporting computation. LEO satellites extend and enhance ground wireless networks, playing a pivotal role in providing computing and transmission resources. For instance, LEO technology can be integrated into the edge servers of ground networks. The authors of [307] suggest leveraging LEO satellites to directly handle the substantial data produced by IoT devices, eliminating dependence on ground-based cloud servers. This approach effectively reduces IoT network traffic, leading to decreased power consumption in dedicated ground servers.

While LEO satellites have the potential to enhance network communication efficiency, they face various technical challenges compared to ground network edge computing. These challenges include the cost of individual satellites, adverse conditions in space, and the rapid movement of LEO satellites. Moreover, devices connecting to LEO satellites often have to switch between satellites, which can delay or interrupt data transmission.

6.2.10. Challenges of Blockchain in Vehicular Networks

A smart contract is a specialized protocol that is used for establishing contracts within the blockchain. In the future, these contracts may play a role in facilitating payments within vehicular networks, such as managing charging payments or optimizing available bandwidth in vehicles. This potential integration could bring greater flexibility to resource allocation across the entire vehicular network. However, incorporating blockchain into the IoV presents challenges and limitations.

Firstly, while blockchain may integrate with technologies like MEC and cloud computing in vehicular networks, the execution of smart contracts requires substantial computing resources, potentially leading to performance issues [238]. Secondly, as the number of transactions and the scale of the data increase, concerns arise about the ability of blockchain systems to process these data efficiently. As the blockchain grows, nodes must store and synchronize the entire blockchain history, resulting in a significant increase in storage requirements and synchronization time [308]. Therefore, minimizing processing latency is crucial in addressing these challenges.
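A back-of-the-envelope calculation shows why storage and synchronization scale with transaction volume. The Python sketch below assumes a ledger that grows linearly with the number of transactions and ignores block headers and protocol overhead. The transaction rate, transaction size, and link capacity are hypothetical values chosen for illustration, not measurements from [308].

```python
def ledger_growth_bytes(tx_per_second: float, avg_tx_bytes: float,
                        seconds: float) -> float:
    """Bytes appended to the chain, ignoring block headers and other overhead."""
    return tx_per_second * avg_tx_bytes * seconds


def sync_time_seconds(ledger_bytes: float, link_bits_per_second: float) -> float:
    """Time to download the ledger over a link of the given raw capacity."""
    return ledger_bytes * 8 / link_bits_per_second


SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical workload: 100 transactions per second of 250 bytes each.
daily_growth = ledger_growth_bytes(tx_per_second=100, avg_tx_bytes=250,
                                   seconds=SECONDS_PER_DAY)

# Hypothetical 10 Mbit/s link available to a newly joining vehicle.
daily_sync = sync_time_seconds(daily_growth, link_bits_per_second=10_000_000)
```

Under these assumptions the ledger grows by 2.16 GB per day, and a node needs 1728 s (about 29 minutes) of download time to catch up on each day it missed, which illustrates why full-history synchronization becomes costly for intermittently connected vehicles.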

6.2.11. Redefining QoE

In the realm of multimedia technology, enhancing the QoE has grown significantly in importance, emerging as a key indicator for technological development and applications. QoE encompasses not only the transmission and delivery of multimedia content but also various aspects of user perception. Accordingly, users’ QoE needs to be redefined for volumetric video. Achieving this requires interdisciplinary collaboration that connects network demands with how users perceive visuals. For instance, research indicates that the human eye cannot detect images that are shown for durations under 13 milliseconds [309], which provides a perceptual reference point when setting network timing requirements.

Presently, there is a noticeable gap in QoE research that is specifically dedicated to volumetric video. Achieving high-quality volumetric video transmission requires a concerted effort to enhance research on QoE that is specific to this domain. This may entail the development of new encoding standards, transmission protocols, and decoding technologies to ensure audiences receive an exceptional visual experience in this highly immersive video environment. In-depth research in this field will play a crucial role in advancing the future development of multimedia technology, ultimately providing more captivating and engaging video viewing experiences.

6.2.12. Pricing Strategy for Vehicle Computing and Transmission

The rise of edge computing and transmission has significantly improved performance and efficiency across various applications. Nevertheless, challenges persist when integrating this technology into video streaming systems. Firstly, video processing and transmission tasks may increase the computational load and energy consumption of in-vehicle computing facilities, and users may be hesitant to contribute computing resources unless the system offers them some return. Secondly, because bandwidth and computing resources are limited, achieving a balanced allocation is crucial [310]. In this context, developing sensible pricing strategies becomes essential to reward users appropriately and to strike a balance between energy utilization and cost-effectiveness.
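The trade-off between energy cost and pricing can be illustrated with a toy uniform-price rule. The Python sketch below assumes that each vehicle accepts a computing task only if the offered price covers its own energy cost, and that the operator wants the lowest single price that recruits a given number of vehicles. The function name, the cost values, and the acceptance rule are hypothetical and are not drawn from [310].

```python
def min_price_to_recruit(energy_costs, vehicles_needed):
    """Lowest uniform price per task at which enough vehicles are willing to help.

    Each vehicle is assumed to accept a task only if the price covers its own
    energy cost. Returns None when there are not enough vehicles at any price.
    """
    if vehicles_needed <= 0:
        return 0.0
    if vehicles_needed > len(energy_costs):
        return None
    # The k-th cheapest vehicle sets the price needed to recruit k vehicles.
    return sorted(energy_costs)[vehicles_needed - 1]


costs = [0.8, 0.3, 0.5, 1.2]  # hypothetical per-task energy cost of each vehicle
price = min_price_to_recruit(costs, vehicles_needed=2)
participants = [c for c in costs if c <= price]
```

With these values the operator must offer 0.5 per task, and the two cheapest vehicles participate. Practical schemes would also need to account for bandwidth limits, task deadlines, and strategic behavior, none of which this sketch models.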

7. Conclusions

With continuous advancements in autonomous driving and multimedia technologies, we are entering a new era in which multimedia applications dominate vehicular networks. In the forthcoming years, vehicles will be equipped with a diverse array of multimedia applications, with immersive videos taking the lead. However, the widespread adoption of immersive multimedia applications in the vehicular environment brings challenges posed by both vehicular networks and immersive videos, such as node mobility and the need for high bandwidth and low latency. In this dynamic setting, it is imperative to leverage the distinctive characteristics of vehicular networks and implement cutting-edge network technologies to address these challenges and ensure the seamless operation of multimedia applications.

This survey is dedicated to examining video streaming within vehicular networks, with a notable focus on advancements in video processing technologies and next-generation network technologies. This survey begins by outlining the essential features of video transmission in vehicular networks and tracing the evolution of vehicular communications. It offers insights into the distinctions and characteristics of different video types.

Following this introduction, this survey thoroughly explores contemporary streaming approaches. It commences with fundamental video coding principles and advances to processing methods and coding standards that are specifically tailored to immersive videos. The discussion encompasses AI-based VSR techniques, bandwidth-adaptive transmission technologies, and immersive video FoV prediction, designed to alleviate computational load and bandwidth requirements. Furthermore, the survey introduces the incorporation of caching and multicasting technologies that are relevant to vehicular networks, aiming to further diminish transmission latency and reduce bandwidth consumption.

Acknowledging the existing research gaps in video streaming within next-generation vehicular networks, this survey outlines various challenges and suggests future research directions. These encompass the optimization of vehicular communication technologies, the development of more efficient adaptive transmission strategies, and the integration and application of emerging technologies. The survey not only contributes to improving the in-car experience for users but also provides guidance for video equipment suppliers and vehicle manufacturers in their development endeavors. Video equipment suppliers are urged to provide equipment that aligns with the latest video processing technologies, while vehicle manufacturers can design corresponding software and hardware by comprehending users’ preferences for in-car video viewing. For example, as highlighted in [51], features like automatic seat movement and glass screen projection can be incorporated during holographic meetings, offering users a more immersive experience. In essence, this survey aims to propel advancements in the field of video transmission within vehicular networks.

Author Contributions

Conceptualization and methodology, C.-J.H.; writing—reviewing and editing, H.-W.C., Y.-H.L. and M.-E.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, Taiwan, under Contract Number NSTC 112-2221-E-259-006.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hakak, S.; Gadekallu, T.R.; Maddikunta, P.K.R.; Ramu, S.P.; Parimala, M.; De Alwis, C.; Liyanage, M. Autonomous Vehicles in 5G and beyond: A Survey. Veh. Commun. 2023, 39, 100551. [Google Scholar] [CrossRef]
Xie, J.; Wang, Z.; Chen, Y. Joint Caching and User Association Optimization for Adaptive Bitrate Video Streaming in UAV-Assisted Cellular Networks. IEEE Access 2022, 10, 106275–106285. [Google Scholar] [CrossRef]
Kim, J.; Kim, K.; Kim, W. Impact of immersive virtual reality content using 360-degree videos in undergraduate education. IEEE Trans. Learn. Technol. 2022, 15, 137–149. [Google Scholar] [CrossRef]
Chiariotti, F. A survey on 360-degree video: Coding, quality of experience and streaming. Comput. Commun. 2021, 177, 133–155. [Google Scholar] [CrossRef]
Liu, Z.; Li, Q.; Chen, X.; Wu, C.; Ishihara, S.; Li, J.; Ji, Y. Point cloud video streaming: Challenges and solutions. IEEE Netw. 2021, 35, 202–209. [Google Scholar] [CrossRef]
Zhu, Y.; Huang, Y.; Qiao, X.; Tan, Z.; Bai, B.; Ma, H.; Dustdar, S. A semantic-aware transmission with adaptive control scheme for volumetric video service. IEEE Trans. Multimed. 2022, 25, 7160–7172. [Google Scholar] [CrossRef]
Wong, E.S.; Wahab, N.H.A.; Saeed, F.; Alharbi, N. 360-Degree Video Bandwidth Reduction: Technique and Approaches Comprehensive Review. Appl. Sci. 2022, 12, 7581. [Google Scholar] [CrossRef]
Hussein, N.H.; Yaw, C.T.; Koh, S.P.; Tiong, S.K.; Chong, K.H. A comprehensive survey on vehicular networking: Communications, applications, challenges, and upcoming research directions. IEEE Access 2022, 10, 86127–86180. [Google Scholar] [CrossRef]
Jiang, X.; Yu, F.R.; Song, T.; Leung, V.C. Resource allocation of video streaming over vehicular networks: A survey, some research issues and challenges. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5955–5975. [Google Scholar] [CrossRef]
Ruan, J.; Xie, D. A survey on QoE-oriented VR video streaming: Some research issues and challenges. Electronics 2021, 10, 2155. [Google Scholar] [CrossRef]
Tang, F.; Mao, B.; Kawamoto, Y.; Kato, N. Survey on machine learning for intelligent end-to-end communication toward 6G: From network access, routing to traffic control and streaming adaption. IEEE Commun. Surv. Tutor. 2021, 23, 1578–1598. [Google Scholar] [CrossRef]
Xu, M.; Li, C.; Zhang, S.; Le Callet, P. State-of-the-art in 360 video/image processing: Perception, assessment and compression. IEEE J. Sel. Top. Signal Process. 2020, 14, 5–26. [Google Scholar] [CrossRef]
Yaqoob, A.; Bi, T.; Muntean, G.M. A survey on adaptive 360 video streaming: Solutions, challenges and opportunities. IEEE Commun. Surv. Tutor. 2020, 22, 2801–2838. [Google Scholar] [CrossRef]
van der Hooft, J.; Amirpour, H.; Vega, M.T.; Sanchez, Y.; Schatz, R.; Schierl, T.; Timmerer, C. A Tutorial on Immersive Video Delivery: From Omnidirectional Video to Holography. IEEE Commun. Surv. Tutor. 2023, 25, 1336–1375. [Google Scholar] [CrossRef]
Cai, Y.; Li, X.; Wang, Y.; Wang, R. An overview of panoramic video projection schemes in the IEEE 1857.9 standard for immersive visual content coding. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6400–6413. [Google Scholar] [CrossRef]
Khan, M.A.; Baccour, E.; Chkirbene, Z.; Erbad, A.; Hamila, R.; Hamdi, M.; Gabbouj, M. A survey on mobile edge computing for video streaming: Opportunities and challenges. IEEE Access 2022, 10, 120514–120550. [Google Scholar] [CrossRef]
Mahmoud, M.; Rizou, S.; Panayides, A.S.; Kantartzis, N.V.; Karagiannidis, G.K.; Lazaridis, P.I.; Zaharis, Z.D. A Survey on Optimizing Mobile Delivery of 360° Videos: Edge Caching and Multicasting. IEEE Access 2023, 11, 68925–68942. [Google Scholar] [CrossRef]
Wang, C.; Shen, J.; Vijayakumar, P.; Gupta, B.B. Attribute-based secure data aggregation for isolated IoT-enabled maritime transportation systems. IEEE Trans. Intell. Transp. Syst. 2021, 24, 2608–2617. [Google Scholar] [CrossRef]
Coll-Perales, B.; Lucas-Estañ, M.C.; Shimizu, T.; Gozalvez, J.; Higuchi, T.; Avedisov, S.; Altintas, O.; Sepulcre, M. End-to-end V2X latency modeling and analysis in 5G networks. IEEE Trans. Veh. Technol. 2022, 72, 5094–5109. [Google Scholar] [CrossRef]
Sehla, K.; Nguyen, T.M.T.; Pujolle, G.; Velloso, P.B. Resource allocation modes in C-V2X: From LTE-V2X to 5G-V2X. IEEE Internet Things J. 2022, 9, 8291–8314. [Google Scholar] [CrossRef]
Chen, S.; Hu, J.; Shi, Y.; Zhao, L.; Li, W. A vision of C-V2X: Technologies, field testing, and challenges with Chinese development. IEEE Internet Things J. 2020, 7, 3872–3881. [Google Scholar] [CrossRef]
Pan, R.; Jie, L.; Zhao, X.; Wang, H.; Yang, J.; Song, J. Active Obstacle Avoidance Trajectory Planning for Vehicles Based on Obstacle Potential Field and MPC in V2P Scenario. Sensors 2023, 23, 3248. [Google Scholar] [CrossRef]
Suleman, D.; Shibl, R.; Ansari, K. Investigation of Data Quality Assurance across IoT Protocol Stack for V2I Interactions. Smart Cities 2023, 6, 2680–2705. [Google Scholar] [CrossRef]
Lopukhova, E.; Abdulnagimov, A.; Voronkov, G.; Kutluyarov, R.; Grakhova, E. Universal Learning Approach of an Intelligent Algorithm for Non-GNSS Assisted Beamsteering in V2I Systems. Information 2023, 14, 86. [Google Scholar] [CrossRef]
Ding, H.; Shin, K.G. Context-aware beam tracking for 5G mmWave V2I communications. IEEE Trans. Mob. Comput. 2021, 22, 3257–3269. [Google Scholar] [CrossRef]
Yan, D.; Li, Z.; Guan, K.; He, D.; Cheng, X.; Kim, J.; Chung, H.; Zhong, Z. Modeling and Analysis of V2I Links for the Handover Situations At Mmwave Band. IEEE Trans. Veh. Technol. 2023, 72, 12450–12463. [Google Scholar] [CrossRef]
Qiong, W.; Shuai, S.; Ziyang, W.; Qiang, F.; Pingyi, F.; Cui, Z. Towards V2I age-aware fairness access: A DQN based intelligent vehicular node training and test method. Chin. J. Electron. 2023, 32, 1230–1244. [Google Scholar] [CrossRef]
Guo, S.; Hu, B.J.; Wen, Q. Joint resource allocation and power control for full-duplex V2I communication in high-density vehicular network. IEEE Trans. Wirel. Commun. 2022, 21, 9497–9508. [Google Scholar] [CrossRef]
Jin, H.; Seo, J.; Park, J.; Kim, S.C. A Deep Reinforcement Learning-based Two-dimensional Resource Allocation Technique for V2I communications. IEEE Access 2023, 11, 78867–78878. [Google Scholar] [CrossRef]
Das, D.; Banerjee, S.; Chatterjee, P.; Ghosh, U.; Biswas, U. A secure blockchain enabled V2V communication system using smart contracts. IEEE Trans. Intell. Transp. Syst. 2022, 24, 4651–4660. [Google Scholar] [CrossRef]
Wang, S.; Chen, G.; Jiang, Y.; You, X. A cluster-based V2V approach for mixed data dissemination in urban scenario of IoVs. IEEE Trans. Veh. Technol. 2022, 72, 2907–2920. [Google Scholar] [CrossRef]
Mollah, M.B.; Wang, H.; Karim, M.A.; Fang, H. mmWave Enabled Connected Autonomous Vehicles: A Use Case with V2V Cooperative Perception. IEEE Netw. 2023. [Google Scholar] [CrossRef]
Jiang, H.; Xiong, B.; Zhang, H.; Basar, E. Hybrid Far-and Near-field Modeling for Reconfigurable Intelligent Surface Assisted V2V Channels: A Sub-Array Partition Based Approach. IEEE Trans. Wirel. Commun. 2023, 22, 8290–8303. [Google Scholar] [CrossRef]
Wang, S.; Zhang, Q.; Chen, G. V2V-CoVAD: A vehicle-to-vehicle cooperative video alert dissemination mechanism for Internet of Vehicles in a highway environment. Veh. Commun. 2022, 33, 100418. [Google Scholar] [CrossRef]
Chowdhury, D.R.; Nandi, S.; Goswami, D. Cost-effective live video streaming for Internet of Connected Vehicles using heterogeneous networks. Ad Hoc Netw. 2024, 153, 103334. [Google Scholar] [CrossRef]
Kanavos, A.; Barmpounakis, S.; Kaloxylos, A. An Adaptive Scheduling Mechanism Optimized for V2N Communications over Future Cellular Networks. Telecom 2023, 4, 378–392. [Google Scholar] [CrossRef]
Sandeep, V.S.V.; Gurjar, D.S.; Yadav, S.; Pattanayak, P.; Jiang, Y. On the Performance Analysis of V2N Mixed RF and Hybrid FSO/RF Communication System. IEEE Photonics J. 2022, 14, 7361114. [Google Scholar] [CrossRef]
Hasegawa, R.; Okamoto, E. Adaptive Transmission Suspension of V2N Uplink Communication Based on In-Advanced Quality of Service Notification. Vehicles 2023, 5, 203–222. [Google Scholar] [CrossRef]
Lucas-Estañ, M.C.; Coll-Perales, B.; Shimizu, T.; Gozalvez, J.; Higuchi, T.; Avedisov, S.; Altintas, O.; Sepulcre, M. An analytical latency model and evaluation of the capacity of 5G NR to support V2X services using V2N2V communications. IEEE Trans. Veh. Technol. 2022, 72, 2293–2306. [Google Scholar] [CrossRef]
He, W.; Guo, C.; Wang, X. Age of information aware resource allocation and packet sampling control in vehicular networks. IEEE Wirel. Commun. Lett. 2022, 11, 2245–2249. [Google Scholar] [CrossRef]
Jang, W.M. The 5G Cellular Downlink V2X Implementation Using V2N With Spatial Modulation. IEEE Access 2022, 10, 129105–129115. [Google Scholar] [CrossRef]
Khalid, S.; Abidin, H.Z.; Mazalan, L.; Abdullah, S.A.C. Optimising Video Transmission Performance in 5G New Radio Technology for Vehicle-to-Network (V2N) Application: A Comprehensive Analysis. In Proceedings of the 2023 11th International Conference on Information and Communication Technology (ICoICT), Melaka, Malaysia, 3–24 August 2023; pp. 487–492. [Google Scholar]
Hajisami, A.; Lansford, J.; Dingankar, A.; Misener, J. A Tutorial on the LTE-V2X Direct Communication. IEEE Open J. Veh. Technol. 2022, 3, 388–398. [Google Scholar] [CrossRef]
Moradi-Pari, E.; Tian, D.; Bahramgiri, M.; Rajab, S.; Bai, S. DSRC versus LTE-V2X: Empirical performance analysis of direct vehicular communication technologies. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4889–4903. [Google Scholar] [CrossRef]
Nurkahfi, G.N.; Triwinarko, A.; Prawara, B.; Armi, N.; Juhana, T.; Syambas, N.R.; Mulyana, E.; Dogheche, E.; Dayoub, I. On SDN to Support The IEEE 802.11 and C-V2X based Vehicular Communications Use-Cases and Performance: A Comprehensive Survey. IEEE Access 2023. [Google Scholar] [CrossRef]
Garcia, M.H.C.; Molina-Galan, A.; Boban, M.; Gozalvez, J.; Coll-Perales, B.; Şahin, T.; Kousaridas, A. A tutorial on 5G NR V2X communications. IEEE Commun. Surv. Tutor. 2021, 23, 1972–2026. [Google Scholar] [CrossRef]
Alalewi, A.; Dayoub, I.; Cherkaoui, S. On 5G-V2X use cases and enabling technologies: A comprehensive survey. IEEE Access 2021, 9, 107710–107737. [Google Scholar] [CrossRef]
Gyawali, S.; Xu, S.; Qian, Y.; Hu, R.Q. Challenges and Solutions for Cellular-Based V2X Communications. IEEE Commun. Surv. Tutor. 2020, 23, 222–255. [Google Scholar] [CrossRef]
Zhang, Z.; Lung, C.H.; St-Hilaire, M.; Lambadaris, I. Smart proactive caching: Empower the video delivery for autonomous vehicles in ICN-based networks. IEEE Trans. Veh. Technol. 2020, 69, 7955–7965. [Google Scholar] [CrossRef]
Chowdhury, D.R.; Nandi, S.; Goswami, D. Distributed Gateway Selection for Video Streaming in VANET Using IP Multicast. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2022, 18, 1–24. [Google Scholar] [CrossRef]
Yu, Z.; Jin, D.; Song, X.; Zhai, C.; Wang, D. Internet of vehicle empowered mobile media scenarios: In-vehicle infotainment solutions for the mobility as a service (MaaS). Sustainability 2020, 12, 7448. [Google Scholar] [CrossRef]
Faurecia. Faurecia to Collaborate with Microsoft for Digital Services Inside the Cockpit of the Future. Available online: https://www.faurecia.com/en/newsroom/faurecia-and-microsoft-collaborate-digital-services-inside-cockpit-future (accessed on 28 January 2024).
Nayak, S.; Patgiri, R. 6G communication: A vision on the potential applications. In Edge Analytics: Select Proceedings of 26th International Conference—ADCOM 2020; Springer: Singapore, 2020; pp. 203–218. [Google Scholar]
Zhang, X.; Zhong, H.; Cui, J.; Gu, C.; Bolodurina, I.; Liu, L. AC-SDVN: An Access Control Protocol for Video Multicast in Software Defined Vehicular Networks. IEEE Trans. Mob. Comput. 2022, 22, 5657–5674. [Google Scholar] [CrossRef]
Yu, S.; Yi, F.; Qiulin, X.; Liya, S. A framework of 5G mobile-health services for ambulances. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 528–532. [Google Scholar]
Yu, Y.; Lee, S. Remote driving control with real-time video streaming over wireless networks: Design and evaluation. IEEE Access 2022, 10, 64920–64932. [Google Scholar] [CrossRef]
Charissis, V.; Falah, J.; Lagoo, R.; Alfalah, S.F.; Khan, S.; Wang, S.; Altarteer, S.; Larbi, K.B.; Drikakis, D. Employing emerging technologies to develop and evaluate in-vehicle intelligent systems for driver support: Infotainment AR HUD case study. Appl. Sci. 2021, 11, 1397. [Google Scholar] [CrossRef]
Netflix. Available online: https://www.netflix.com/ (accessed on 6 December 2023).
YouTube. Available online: https://www.youtube.com/ (accessed on 6 December 2023).
Twitch. Available online: https://www.twitch.tv/ (accessed on 6 December 2023).
Facebook Live. Available online: https://www.facebook.com/watch/live/ (accessed on 6 December 2023).
Ma, X.; Li, Q.; Zou, L.; Peng, J.; Zhou, J.; Chai, J.; Jiang, Y.; Muntean, G.M. QAVA: QoE-aware adaptive video bitrate aggregation for HTTP live streaming based on smart edge computing. IEEE Trans. Broadcast. 2022, 68, 661–676. [Google Scholar] [CrossRef]
Taraghi, B.; Hellwagner, H.; Timmerer, C. LLL-CAdViSE: Live Low-Latency Cloud-Based Adaptive Video Streaming Evaluation Framework. IEEE Access 2023, 11, 25723–25734. [Google Scholar] [CrossRef]
Wei, X.; Zhou, M.; Jia, W. Towards low-latency and high-quality adaptive 360-degree streaming. IEEE Trans. Ind. Inform. 2022, 19, 6326–6336. [Google Scholar] [CrossRef]
Chen, C.Y.; Hsieh, H.Y. Cross-Frame Resource Allocation with Context-Aware QoE Estimation for 360° Video Streaming in Wireless Virtual Reality. IEEE Trans. Wirel. Commun. 2023, 22, 7887–7901. [Google Scholar] [CrossRef]
Jiang, H.; Sheng, Z.; Zhu, S.; Dong, Z.; Huang, R. Unifuse: Unidirectional fusion for 360 panorama depth estimation. IEEE Robot. Autom. Lett. 2021, 6, 1519–1526. [Google Scholar] [CrossRef]
Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.R. Overview of the versatile video coding (VVC) standard and its applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764. [Google Scholar] [CrossRef]
Pi, J.; Zhang, Y.; Zhu, L.; Lin, J.; Ho, Y.S. Texture-Aware Spherical Rotation for High Efficiency Omnidirectional Intra Video Coding. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8768–8780. [Google Scholar] [CrossRef]
Hussain, I.; Kwon, O.J. Evaluation of 360° Image Projection Formats; Comparing Format Conversion Distortion Using Objective Quality Metrics. J. Imaging 2021, 7, 137. [Google Scholar] [CrossRef]
Xiong, H. Digital Twin Oriented Visual Saliency Analysis on 360-Degree Panoramic Image. In Proceedings of the 2022 IEEE 12th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 15–17 July 2022; pp. 220–223. [Google Scholar]
Jin, Y.; Hu, K.; Liu, J.; Wang, F.; Liu, X. From Capture to Display: A Survey on Volumetric Video. arXiv 2023, arXiv:2309.05658. [Google Scholar]
Vadakital, V.K.M.; Dziembowski, A.; Lafruit, G.; Thudor, F.; Lee, G.; Alface, P.R. The MPEG Immersive Video Standard—Current Status and Future Outlook. IEEE Multimed. 2022, 29, 101–111. [Google Scholar] [CrossRef]
Eisert, P.; Schreer, O.; Feldmann, I.; Hellge, C.; Hilsmann, A. Volumetric video–acquisition, interaction, streaming and rendering. In Immersive Video Technologies; Academic Press: Cambridge, MA, USA, 2023; pp. 289–326. [Google Scholar]
Wang, Y.; Xiao, Y.; Xiong, F.; Jiang, W.; Cao, Z.; Zhou, J.T.; Yuan, J. 3DV: 3D dynamic voxel for action recognition in depth video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 511–520. [Google Scholar]
Mekuria, R.; Blom, K.; Cesar, P. Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 828–842. [Google Scholar] [CrossRef]
Bonatto, D.; Fachada, S.; Rogge, S.; Munteanu, A.; Lafruit, G. Real-time depth video-based rendering for 6-DoF HMD navigation and light field displays. IEEE Access 2021, 9, 146868–146887. [Google Scholar] [CrossRef]
Li, J.; Zhang, C.; Liu, Z.; Hong, R.; Hu, H. Optimal volumetric video streaming with hybrid saliency based tiling. IEEE Trans. Multimed. 2022, 25, 2939–2953. [Google Scholar] [CrossRef]
Lee, K.; Yi, J.; Lee, Y.; Choi, S.; Kim, Y.M. GROOT: A real-time streaming system of high-fidelity volumetric videos. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, London, UK, 21–25 September 2020; pp. 1–14. [Google Scholar]
Hu, K.; Chen, Y.; Han, K.; Liu, J.; Yang, H.; Jin, Y.; Li, B.; Wang, F. LiveVV: Human-Centered Live Volumetric Video Streaming System. arXiv 2023, arXiv:2310.08205. [Google Scholar]
Gül, S.; Podborski, D.; Buchholz, T.; Schierl, T.; Hellge, C. Low latency volumetric video edge cloud streaming. arXiv 2020, arXiv:2001.06466. [Google Scholar]
Liu, J.; Zhu, B.; Wang, F.; Jin, Y.; Zhang, W.; Xu, Z.; Cui, S. CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming. In Proceedings of the 2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR), Shanghai, China, 25–29 March 2023; pp. 173–183. [Google Scholar]
Khan, H.; Samarakoon, S.; Bennis, M. Enhancing Video Streaming in Vehicular Networks via Resource Slicing. IEEE Trans. Veh. Technol. 2020, 69, 3513–3522. [Google Scholar] [CrossRef]
Spiteri, K.; Urgaonkar, R.; Sitaraman, R.K. BOLA: Near-optimal bitrate adaptation for online videos. IEEE/ACM Trans. Netw. 2020, 28, 1698–1711. [Google Scholar] [CrossRef]
Brunnström, K.; Beker, S.A.; Moor, K.D.; Dooms, A.; Egger, S.; Garcia, M.; Hossfeld, T.; Jumisko-Pyykkö, S.; Keimel, C.; Larabi, M.; et al. Qualinet White Paper on Definitions of Quality of Experience. 2013. Available online: https://hal.archives-ouvertes.fr/hal-00977812/document (accessed on 23 January 2024).
ITU-T Recommendation P.10/G.100 (11/2017). Vocabulary for Performance, Quality of Service and Quality of Experience. 2017. Available online: https://www.itu.int/rec/T-REC-P.10 (accessed on 24 January 2024).
Saovapakhiran, B.; Naruephiphat, W.; Charnsripinyo, C.; Baydere, S.; Özdemir, S. QoE-driven IoT architecture: A comprehensive review on system and resource management. IEEE Access 2022, 10, 84579–84621. [Google Scholar] [CrossRef]
Gutierrez, J.; Perez, P.; Orduna, M.; Singla, A.; Cortes, C.; Mazumdar, P.; Viola, I.; Brunnström, K.; Battisti, F.; Garcia, N.; et al. Subjective Evaluation of Visual Quality and Simulator Sickness of Short 360° Videos: ITU-T Rec. P.919. IEEE Trans. Multimed. 2021, 24, 3087–3100. [Google Scholar] [CrossRef]
Anwar, M.S.; Wang, J.; Khan, W.; Ullah, A.; Ahmad, S.; Fei, Z. Subjective QoE of 360-degree virtual reality videos and machine learning predictions. IEEE Access 2020, 8, 148084–148099. [Google Scholar] [CrossRef]
Taha, M.; Canovas, A.; Lloret, J.; Ali, A. A QoE adaptive management system for high definition video streaming over wireless networks. Telecommun. Syst. 2021, 77, 63–81. [Google Scholar] [CrossRef]
Rao, R.R.R.; Göring, S.; Raake, A. AVQBits—Adaptive Video Quality Model Based on Bitstream Information for Various Video Applications. IEEE Access 2022, 10, 80321–80351. [Google Scholar]
Liu, X.; An, P.; Meng, C.; Yang, C.; Huang, X. Multiscale WS-SSIM for panoramic video quality assessment. Optoelectron. Imaging Multimed. Technol. VII 2020, 11550, 96–101. [Google Scholar]
Dziembowski, A.; Mieloch, D.; Stankowski, J.; Grzelka, A. IV-PSNR—The objective quality metric for immersive video applications. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7575–7591. [Google Scholar] [CrossRef]
Zhou, M.; Chen, L.; Wei, X.; Liao, X.; Mao, Q.; Wang, H.; Pu, H.; Luo, J.; Xiang, T.; Fang, B. Perception-Oriented U-Shaped Transformer Network for 360-Degree No-Reference Image Quality Assessment. IEEE Trans. Broadcast. 2023, 69, 396–405. [Google Scholar] [CrossRef]
Cha, E.Y.; Jalil Piran, M.; Suh, D.Y. A Gaze-based Real-time and Low Complexity No-reference Video Quality Assessment Technique for Video Gaming. Multimed. Tools Appl. 2023, 1–20. [Google Scholar] [CrossRef]
Zhu, H.; Li, T.; Wang, C.; Jin, W.; Murali, S.; Xiao, M.; Ye, D.; Li, M. EyeQoE: A novel QoE assessment model for 360-degree videos using ocular behaviors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–26. [Google Scholar] [CrossRef]
Kougioumtzidis, G.; Poulkov, V.; Zaharis, Z.D.; Lazaridis, P.I. A survey on multimedia services QoE assessment and machine learning-based prediction. IEEE Access 2022, 10, 19507–19538. [Google Scholar] [CrossRef]
Miranda, G.; Macedo, D.F.; Marquez-Barja, J.M. Estimating video on demand QoE from network QoS through ICMP probes. IEEE Trans. Netw. Serv. Manag. 2021, 19, 1890–1902. [Google Scholar] [CrossRef]
Dinaki, H.E.; Shirmohammadi, S.; Janulewicz, E.; Côté, D. Forecasting video QoE with deep learning from multivariate time-series. IEEE Open J. Signal Process. 2021, 2, 512–521. [Google Scholar] [CrossRef]
Sultan, M.T.; El Sayed, H. QoE-Aware Analysis and Management of Multimedia Services in 5G and Beyond Heterogeneous Networks. IEEE Access 2023, 11, 77679–77688. [Google Scholar] [CrossRef]
Song, C.; Xu, W.; Wu, T.; Yu, S.; Zeng, P.; Zhang, N. QoE-driven edge caching in vehicle networks based on deep reinforcement learning. IEEE Trans. Veh. Technol. 2021, 70, 5286–5295. [Google Scholar] [CrossRef]
Benmir, A.; Korichi, A.; Bourouis, A.; Alreshoodi, M.; Al-Jobouri, L. GeoQoE-Vanet: QoE-aware geographic routing protocol for video streaming over vehicular ad-hoc networks. Computers 2020, 9, 45. [Google Scholar] [CrossRef]
Ivanov, Y.V.; Moloney, D. Reference frame compression using embedded reconstruction patterns for H.264/AVC decoder. In Proceedings of the 2008 Third International Conference on Digital Telecommunications (ICDT 2008), Bucharest, Romania, 29 June–5 July 2008; pp. 168–173. [Google Scholar]
Kuo, T.Y.; Lu, H.J. Efficient reference frame selector for H.264. IEEE Trans. Circuits Syst. Video Technol. 2008, 18, 400–405. [Google Scholar]
Xie, H.; Boukerche, A.; Loureiro, A.A. MERVS: A novel multichannel error recovery video streaming protocol for vehicle ad hoc networks. IEEE Trans. Veh. Technol. 2015, 65, 923–935. [Google Scholar] [CrossRef]
Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
Strukov, R.; Athitsos, V. Evaluation of Video Compression Methods for Network Transmission on Diverse Data: A Case Study. In Proceedings of the 16th International Conference on Pervasive Technologies Related to Assistive Environments, Corfu, Greece, 5–7 July 2023; pp. 300–305. [Google Scholar]
Xu, M.; Li, T.; Wang, Z.; Deng, X.; Yang, R.; Guan, Z. Reducing complexity of HEVC: A deep learning approach. IEEE Trans. Image Process. 2018, 27, 5044–5059. [Google Scholar] [CrossRef] [PubMed]
Jiménez-Moreno, A.; Martínez-Enríquez, E.; Díaz-de-María, F. Complexity control based on a fast coding unit decision method in the HEVC video coding standard. IEEE Trans. Multimed. 2016, 18, 563–575. [Google Scholar] [CrossRef]
Deng, X.; Xu, M.; Li, C. Hierarchical complexity control of HEVC for live video encoding. IEEE Access 2016, 4, 7014–7027. [Google Scholar] [CrossRef]
Chan, P.H.; Huggett, A.; Souvalioti, G.; Jennings, P.; Donzella, V. Influence of AVC and HEVC compression on detection of vehicles through Faster R-CNN. IEEE Trans. Intell. Transp. Syst. 2023, 25, 203–213. [Google Scholar] [CrossRef]
Labiod, M.A.; Gharbi, M.; Coudoux, F.X.; Corlay, P.; Doghmane, N. Enhanced adaptive cross-layer scheme for low latency HEVC streaming over Vehicular Ad-hoc Networks (VANETs). Veh. Commun. 2019, 15, 28–39. [Google Scholar] [CrossRef]
Jiang, X.; Feng, J.; Song, T.; Katayama, T. Low-complexity and hardware-friendly H.265/HEVC encoder for vehicular ad-hoc networks. Sensors 2019, 19, 1927. [Google Scholar] [CrossRef]
Bross, B.; Chen, J.; Ohm, J.R.; Sullivan, G.J.; Wang, Y.K. Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC). Proc. IEEE 2021, 109, 1463–1493. [Google Scholar] [CrossRef]
Saldanha, M.; Sanchez, G.; Marcon, C.; Agostini, L. Configurable fast block partitioning for VVC intra coding using light gradient boosting machine. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3947–3960. [Google Scholar] [CrossRef]
Tissier, A.; Hamidouche, W.; Mdalsi, S.B.D.; Vanne, J.; Galpin, F.; Menard, D. Machine learning based efficient QT-MTT partitioning scheme for VVC intra encoders. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4279–4293. [Google Scholar] [CrossRef]
Jiang, X.; Li, W.; Song, T. Low-complexity enhancement VVC encoder for vehicular networks. EURASIP J. Adv. Signal Process. 2023, 2023, 122. [Google Scholar] [CrossRef]
Choi, K. A Study on Fast and Low-Complexity Algorithms for Versatile Video Coding. Sensors 2022, 22, 8990. [Google Scholar] [CrossRef]
Wang, D.; Chen, L.; Lu, X.; Dufaux, F.; Li, W.; Zhu, C. Fast Learning-Based Split Type Prediction Algorithm for VVC. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 2315–2319. [Google Scholar]
Bossen, F.; Sühring, K.; Wieckowski, A.; Liu, S. VVC Complexity and Software Implementation Analysis. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3765–3778. [Google Scholar] [CrossRef]
Wieckowski, A.; Brandenburg, J.; Hinz, T.; Bartnik, C.; George, V.; Hege, G.; Helmrich, C.; Henkel, A.; Lehmann, C.; Stoffers, C.; et al. VVenC: An open and optimized VVC encoder implementation. In Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shenzhen, China, 5–9 July 2021; pp. 1–2. [Google Scholar]
Brandenburg, J.; Wieckowski, A.; Hinz, T.; Henkel, A.; George, V.; Zupancic, I.; Stoffers, C.; Bross, B.; Schwarz, H.; Marpe, D. Towards fast and efficient VVC encoding. In Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 21–24 September 2020; pp. 1–6. [Google Scholar]
Wieckowski, A.; Brandenburg, J.; Bross, B.; Marpe, D. VVC search space analysis including an open, optimized implementation. IEEE Trans. Consum. Electron. 2022, 68, 127–138. [Google Scholar] [CrossRef]
Chu, J.; Li, Q. Fast CU Partition Algorithm for VVC Inter Coding. J. Comput. Eng. Appl. 2022, 58, 249. [Google Scholar]
Li, T.; Xu, M.; Tang, R.; Chen, Y.; Xing, Q. DeepQTMT: A deep learning approach for fast QTMT-based CU partition of intra-mode VVC. IEEE Trans. Image Process. 2021, 30, 5377–5390. [Google Scholar] [CrossRef]
Nguyen, T.; Marpe, D. Compression efficiency analysis of AV1, VVC, and HEVC for random access applications. APSIPA Trans. Signal Inf. Process. 2021, 10, e11. [Google Scholar] [CrossRef]
Petreski, D.; Kartalov, T. Next Generation Video Compression Standards–Performance Overview. In Proceedings of the 2023 30th International Conference on Systems, Signals and Image Processing (IWSSIP), Ohrid, North Macedonia, 27–29 June 2023; pp. 1–5. [Google Scholar]
Le Tanou, J.; Blestel, M. Analysis of Emerging Video Codecs: Coding Tools, Compression Efficiency. SMPTE Motion Imaging J. 2019, 128, 14–24. [Google Scholar] [CrossRef]
Zhang, F.; Katsenou, A.V.; Afonso, M.; Dimitrov, G.; Bull, D.R. Comparing VVC, HEVC and AV1 using objective and subjective assessments. arXiv 2020, arXiv:2003.10282. [Google Scholar]
Bonnineau, C.; Hamidouche, W.; Fournier, J.; Sidaty, N.; Travers, J.F.; Déforges, O. Perceptual quality assessment of HEVC and VVC standards for 8K video. IEEE Trans. Broadcast. 2022, 68, 246–253. [Google Scholar] [CrossRef]
Meardi, G.; Ferrara, S.; Ciccarelli, L.; Cobianchi, G.; Poularakis, S.; Maurer, F.; Battista, S.; Byagowi, A. MPEG-5 part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation. Appl. Digit. Image Process. XLIII 2020, 11510, 238–257. [Google Scholar]
Ferrara, S.; Ciccarelli, L.; Moreno, A.J.; Zhao, S.; Joshi, Y.; Meardi, G.; Battista, S. The Next Frontier For MPEG-5 LCEVC: From HDR and Immersive Video to the Metaverse. IEEE MultiMedia 2022, 29, 111–122. [Google Scholar] [CrossRef]
Battista, S.; Meardi, G.; Ferrara, S.; Ciccarelli, L.; Maurer, F.; Conti, M.; Orcioni, S. Overview of the low complexity enhancement video coding (LCEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7983–7995. [Google Scholar] [CrossRef]
Barman, N.; Schmidt, S.; Zadtootaghaj, S.; Martini, M.G. Codec Compression Efficiency Evaluation of MPEG-5 part 2 (LCEVC) using Objective and Subjective Quality Assessment. arXiv 2022, arXiv:2204.05580. [Google Scholar]
Ciccarelli, L.; Ferrara, S.; Maurer, F. MPEG-5 LCEVC for 3.0 next generation digital TV in Brazil. Front. Signal Process. 2022, 2, 884254. [Google Scholar] [CrossRef]
Graziosi, D.; Nakagami, O.; Kuma, S.; Zaghetto, A.; Suzuki, T.; Tabatabai, A. An overview of on-going point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC). APSIPA Trans. Signal Inf. Process. 2020, 9, e13. [Google Scholar] [CrossRef]
Ilola, L.; Kondrad, L.; Schwarz, S.; Hamza, A. An Overview of the MPEG Standard for Storage and Transport of Visual Volumetric Video-Based Coding. Front. Signal Process. 2022, 2, 883943. [Google Scholar] [CrossRef]
Garus, P.; Milovanović, M.; Jung, J.; Cagnazzo, M. MPEG Immersive Video. In Immersive Video Technologies, 1st ed.; Valenzise, G., Alain, M., Zerman, E., Ozcinar, C., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 327–356. [Google Scholar]
Cao, K.; Cosman, P. Denoising and inpainting for point clouds compressed by V-PCC. IEEE Access 2021, 9, 107688–107700. [Google Scholar] [CrossRef]
Guede, C.; Andrivon, P.; Marvie, J.E.; Ricard, J.; Redmann, B.; Chevet, J.C. V-PCC performance evaluation of the first MPEG point codec. SMPTE Motion Imaging J. 2021, 130, 36–52. [Google Scholar] [CrossRef]
Gao, P.; Zhang, L.; Lei, L.; Xiang, W. Point Cloud Compression Based on Joint Optimization of Graph Transform and Entropy Coding for Efficient Data Broadcasting. IEEE Trans. Broadcast. 2023, 69, 727–739. [Google Scholar] [CrossRef]
Hussain, T.; Muhammad, K.; Ding, W.; Lloret, J.; Baik, S.W.; de Albuquerque, V.H.C. A comprehensive survey of multi-view video summarization. Pattern Recognit. 2021, 109, 107567. [Google Scholar] [CrossRef]
Park, C.S. Edge-based intramode selection for depth-map coding in 3D-HEVC. IEEE Trans. Image Process. 2014, 24, 155–162. [Google Scholar] [CrossRef]
Mora, E.G.; Jung, J.; Cagnazzo, M.; Pesquet-Popescu, B. Initialization, limitation, and predictive coding of the depth and texture quadtree in 3D-HEVC. IEEE Trans. Circuits Syst. Video Technol. 2013, 24, 1554–1565. [Google Scholar] [CrossRef]
Shen, L.; An, P.; Zhang, Z.; Hu, Q.; Chen, Z. A 3D-HEVC fast mode decision algorithm for real-time applications. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2015, 11, 1–23. [Google Scholar] [CrossRef]
Khan, S.N.; Khan, K.; Muhammad, N.; Mahmood, Z. Efficient prediction mode decisions for low complexity MV-HEVC. IEEE Access 2021, 9, 150234–150251. [Google Scholar] [CrossRef]
Jeong, J.B.; Lee, S.; Ryu, E.S. VVC subpicture-based frame packing for MPEG immersive video. IEEE Access 2022, 10, 103781–103792. [Google Scholar] [CrossRef]
Chao, G.; Sun, S.; Bi, J. A survey on multiview clustering. IEEE Trans. Artif. Intell. 2021, 2, 146–168. [Google Scholar] [CrossRef] [PubMed]
Mieloch, D.; Dziembowski, A.; Domański, M. Depth map refinement for immersive video. IEEE Access 2021, 9, 10778–10788. [Google Scholar] [CrossRef]
Park, D.; Lim, S.G.; Oh, K.J.; Lee, G.; Kim, J.G. Nonlinear depth quantization using piecewise linear scaling for immersive video coding. IEEE Access 2022, 10, 4483–4494. [Google Scholar] [CrossRef]
Lee, S.; Jeong, J.B.; Ryu, E.S. Group-Based Adaptive Rendering System for 6DoF Immersive Video Streaming. IEEE Access 2022, 10, 102691–102700. [Google Scholar] [CrossRef]
Lepcha, D.C.; Goyal, B.; Dogra, A.; Goyal, V. Image super-resolution: A comprehensive review, recent trends, challenges and applications. Inf. Fusion 2023, 91, 230–260. [Google Scholar] [CrossRef]
Liu, H.; Ruan, Z.; Zhao, P.; Dong, C.; Shang, F.; Liu, Y.; Timofte, R. Video super-resolution based on deep learning: A comprehensive survey. Artif. Intell. Rev. 2022, 55, 5981–6035. [Google Scholar] [CrossRef]
Zhang, H.; Cao, Y.; Cai, J.; Cai, X.; Zhang, W. Dual feature enhanced video super-resolution network based on low-light scenarios. Signal Process. Image Commun. 2023, 115, 116984. [Google Scholar] [CrossRef]
Lai, Q.; Nie, Y.; Sun, H.; Xu, Q.; Zhang, Z.; Xiao, M. Video super-resolution via pre-frame constrained and deep-feature enhanced sparse reconstruction. Pattern Recognit. 2020, 100, 107139. [Google Scholar] [CrossRef]
Haris, M.; Shakhnarovich, G.; Ukita, N. Recurrent back-projection network for video super-resolution. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3897–3906. [Google Scholar]
Sun, W.; Gong, D.; Shi, J.Q.; van den Hengel, A.; Zhang, Y. Video super-resolution via mixed spatial-temporal convolution and selective fusion. Pattern Recognit. 2022, 126, 108577. [Google Scholar] [CrossRef]
Wang, Z.; Chen, J.; Hoi, S.C. Deep learning for image super-resolution: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3365–3387. [Google Scholar] [CrossRef]
Zhang, W.; Li, H.; Li, Y.; Liu, H.; Chen, Y.; Ding, X. Application of deep learning algorithms in geotechnical engineering: A short critical review. Artif. Intell. Rev. 2021, 54, 5633–5673. [Google Scholar] [CrossRef]
Liu, C.; Gang, R.; Li, J.; Fang, J.; Yu, H. An Overview of Video Super-Resolution Algorithms. In Proceedings of the Journal of Physics: Conference Series, Beijing, China, 29–31 July 2021; Volume 2025, p. 012051. [Google Scholar]
Wen, W.; Ren, W.; Shi, Y.; Nie, Y.; Zhang, J.; Cao, X. Video super-resolution via a spatio-temporal alignment network. IEEE Trans. Image Process. 2022, 31, 1761–1773. [Google Scholar] [CrossRef] [PubMed]
Wang, W.; Liu, Z.; Lu, H.; Lan, R.; Zhang, Z. Real-Time Video Super-Resolution with Spatio-Temporal Modeling and Redundancy-Aware Inference. Sensors 2023, 23, 7880. [Google Scholar] [CrossRef] [PubMed]
Li, Q.; Chen, Y.; Zhang, A.; Jiang, Y.; Zou, L.; Xu, Z.; Muntean, G.M. A Super-Resolution Flexible Video Coding Solution for Improving Live Streaming Quality. IEEE Trans. Multimed. 2022, 25, 6341–6355. [Google Scholar] [CrossRef]
Baniya, A.A.; Lee, T.K.; Eklund, P.W.; Aryal, S. Omnidirectional Video Super-Resolution using Deep Learning. IEEE Trans. Multimed. 2023, 26, 540–554. [Google Scholar] [CrossRef]
Deng, X.; Wang, H.; Xu, M.; Li, L.; Wang, Z. Omnidirectional image super-resolution via latitude adaptive network. IEEE Trans. Multimed. 2022, 25, 4108–4120. [Google Scholar] [CrossRef]
Luo, Z.; Chai, B.; Wang, Z.; Hu, M.; Wu, D. Masked360: Enabling Robust 360-Degree Video Streaming with Ultra Low Bandwidth Consumption. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2690–2699. [Google Scholar] [CrossRef]
Taraghi, B.; Nguyen, M.; Amirpour, H.; Timmerer, C. Intense: In-depth studies on stall events and quality switches and their impact on the quality of experience in HTTP adaptive streaming. IEEE Access 2021, 9, 118087–118098. [Google Scholar] [CrossRef]
Nguyen, M.; Lorenzi, D.; Tashtarian, F.; Hellwagner, H.; Timmerer, C. DoFP+: An HTTP/3-Based Adaptive Bitrate Approach Using Retransmission Techniques. IEEE Access 2022, 10, 109565–109579. [Google Scholar] [CrossRef]
Wang, S.; Bi, S.; Zhang, Y.J.A. Adaptive wireless video streaming: Joint transcoding and transmission resource allocation. IEEE Trans. Wirel. Commun. 2021, 21, 3208–3221. [Google Scholar] [CrossRef]
Yu, J.; Wen, H.; Pan, G.; Zhang, S.; Chen, X.; Xu, S. Quality of experience oriented adaptive video streaming for edge assisted cellular networks. IEEE Wirel. Commun. Lett. 2022, 11, 2305–2309. [Google Scholar] [CrossRef]
Cheng, S.; Hu, H.; Zhang, X. ABRF: Adaptive BitRate-FEC Joint Control for Real-Time Video Streaming. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5212–5226. [Google Scholar] [CrossRef]
Li, W.; Huang, J.; Wang, S.; Wu, C.; Liu, S.; Wang, J. An apprenticeship learning approach for adaptive video streaming based on chunk quality and user preference. IEEE Trans. Multimed. 2022, 25, 2488–2502. [Google Scholar] [CrossRef]
Wang, S.; Bi, S.; Zhang, Y.J.A. Deep reinforcement learning with communication transformer for adaptive live streaming in wireless edge networks. IEEE J. Sel. Areas Commun. 2021, 40, 308–322. [Google Scholar] [CrossRef]
Li, W.; Li, X.; Xu, Y.; Yang, Y.; Lu, S. MetaABR: A Meta-Learning Approach on Adaptative Bitrate Selection for Video Streaming. To appear in IEEE Trans. Mob. Comput. 2023. [Google Scholar] [CrossRef]
Li, Y.; Chen, H.; Xu, B.; Zhang, Z.; Ma, Z. Improving Adaptive Real-Time Video Communication Via Cross-layer Optimization. arXiv 2023, arXiv:2304.03505. [Google Scholar] [CrossRef]
Yaqoob, A.; Togou, M.A.; Muntean, G.M. Dynamic viewport selection-based prioritized bitrate adaptation for tile-based 360° video streaming. IEEE Access 2022, 10, 29377–29392. [Google Scholar] [CrossRef]
Pang, Z. VATP360: Viewport Adaptive 360-Degree Video Streaming based on Tile Priority. arXiv 2023, arXiv:2307.15984. [Google Scholar]
Zeynali, A.; Hajiesmaili, M.; Sitaraman, R.K. BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming. arXiv 2023, arXiv:2309.04023. [Google Scholar]
Dong, P.; Shen, R.; Xie, X.; Li, Y.; Zuo, Y.; Zhang, L. Predicting Long-term Field of View in 360-degree Video Streaming. IEEE Netw. 2022, 37, 26–33. [Google Scholar] [CrossRef]
Nguyen, H.; Dao, T.N.; Pham, N.S.; Dang, T.L.; Nguyen, T.D.; Truong, T.H. An Accurate Viewport Estimation Method for 360 Video Streaming using Deep Learning. EAI Endorsed Trans. Ind. Netw. Intell. Syst. 2022, 9, e2. [Google Scholar] [CrossRef]
Li, J.; Han, L.; Zhang, C.; Li, Q.; Liu, Z. Spherical convolution empowered viewport prediction in 360 video multicast with limited FoV feedback. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–23. [Google Scholar] [CrossRef]
Chen, X.; Wang, M.; Xu, C.; Zhao, Y.; Yang, S.; Jiang, K.; Li, Q.; Zhong, L.; Muntean, G.M. FedLive: A Federated Transmission Framework for Panoramic Livecast with Reinforced Variational Inference. IEEE Trans. Multimed. 2023, 25, 8471–8486. [Google Scholar] [CrossRef]
Peng, S.; Hu, J.; Xiao, H.; Yang, S.; Xu, C. Viewport-Driven Adaptive 360° Live Streaming Optimization Framework. J. Netw. Netw. Appl. 2022, 1, 139–149. [Google Scholar] [CrossRef]
Sun, L.; Mao, Y.; Zong, T.; Liu, Y.; Wang, Y. Live 360° Video Delivery based on User Collaboration in a Streaming Flock. IEEE Trans. Multimed. 2022, 25, 2636–2647. [Google Scholar] [CrossRef]
Zhang, L.; Suo, Y.; Wu, X.; Wang, F.; Chen, Y.; Cui, L.; Liu, J.; Ming, Z. TBRA: Tiling and bitrate adaptation for mobile 360-degree video streaming. In Proceedings of the 29th ACM International Conference on Multimedia, New York, NY, USA, 20–24 October 2021; pp. 4007–4015. [Google Scholar]
Li, Y.; Dou, C.; Wu, Y.; Jia, W.; Lu, R. NOMA Assisted Two-Tier VR Content Transmission: A Tile-based Approach for QoE Optimization. To appear in IEEE Trans. Mob. Comput. 2023. [Google Scholar] [CrossRef]
Gao, W.; Li, C.; Lv, H.; Dai, W.; Zou, J.; Xiong, H.; Pan, X.; Wang, H. Optimal Tile-Based Encoding for 360-Degree Video Streaming. In Proceedings of the 2022 Picture Coding Symposium (PCS), San Jose, CA, USA, 7–9 December 2022; pp. 295–299. [Google Scholar]
Kan, N.; Zou, J.; Li, C.; Dai, W.; Xiong, H. RAPT360: Reinforcement learning-based rate adaptation for 360-degree video streaming with adaptive prediction and tiling. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1607–1623. [Google Scholar] [CrossRef]
Carreira, J.; de Faria, S.M.; Tavora, L.M.; Navarro, A.; Assuncao, P.A. 360° Video Coding using Adaptive Tile Partitioning. In Proceedings of the 2021 Telecoms Conference (ConfTELE), Leiria, Portugal, 11–12 February 2021; pp. 1–6. [Google Scholar]
Li, Z.; Wang, Y.; Liu, Y. SAD360: Spherical Viewport-Aware Dynamic Tiling for 360-Degree Video Streaming. In Proceedings of the 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP), Suzhou, China, 13–16 December 2022; pp. 1–5. [Google Scholar]
Chen, J.; Luo, Z.; Wang, Z.; Hu, M.; Wu, D. Live360: Viewport-Aware Transmission Optimization in Live 360-Degree Video Streaming. IEEE Trans. Broadcast. 2023, 69, 85–96. [Google Scholar] [CrossRef]
Wang, Z.; Luo, Z.; Hu, M.; Chen, M.; Wu, D. Vaser: Optimizing 360-Degree Live Video Ingest via Viewport-Aware Neural Enhancement. IEEE Trans. Broadcast. 2023, 69, 927–940. [Google Scholar] [CrossRef]
Zhang, Y.; Wang, Z.; Liu, J.; Du, H.; Zheng, Q.; Zhang, W. Deep Reinforcement Learning Based Adaptive 360-degree Video Streaming with Field of View Joint Prediction. In Proceedings of the 2022 IEEE Symposium on Computers and Communications (ISCC), Rhodes, Greece, 30 June–3 July 2022; pp. 1–8. [Google Scholar]
Yun, W.J.; Kwon, D.; Choi, M.; Kim, J.; Caire, G.; Molisch, A.F. Quality-Aware Deep Reinforcement Learning for Streaming in Infrastructure-Assisted Connected Vehicles. IEEE Trans. Veh. Technol. 2021, 71, 2002–2017. [Google Scholar] [CrossRef]
Han, Y.; Aldaif, A.A.; Yuan, H.; Zhong, Y.; Zheng, Y.; Liao, Y.; Li, Q. QoE-aware 360-degree Video Streaming for Autonomous Vehicles. In Proceedings of the 2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring), Florence, Italy, 20–23 June 2023; pp. 1–5. [Google Scholar]
Fu, F.; Kang, Y.; Zhang, Z.; Yu, F.R.; Wu, T. Soft actor–critic DRL for live transcoding and streaming in vehicular fog-computing-enabled IoV. IEEE Internet Things J. 2020, 8, 1308–1321. [Google Scholar] [CrossRef]
Dai, P.; Song, F.; Liu, K.; Dai, Y.; Zhou, P.; Guo, S. Edge intelligence for adaptive multimedia streaming in heterogeneous internet of vehicles. IEEE Trans. Mob. Comput. 2021, 22, 1464–1478. [Google Scholar] [CrossRef]
Tuysuz, M.F.; Aydin, M.E. QoE-based mobility-aware collaborative video streaming on the edge of 5G. IEEE Trans. Ind. Inform. 2020, 16, 7115–7125. [Google Scholar] [CrossRef]
Khan, B.S.; Jangsher, S.; Ahmed, A.; Al-Dweik, A. URLLC and eMBB in 5G industrial IoT: A survey. IEEE Open J. Commun. Soc. 2022, 3, 1134–1163. [Google Scholar] [CrossRef]
Zhou, W.; Xia, J.; Zhou, F.; Fan, L.; Lei, X.; Nallanathan, A.; Karagiannidis, G.K. Profit maximization for cache-enabled vehicular mobile edge computing networks. IEEE Trans. Veh. Technol. 2023, 72, 13793–13798. [Google Scholar] [CrossRef]
He, Q.; Wang, C.; Cui, G.; Li, B.; Zhou, R.; Zhou, Q.; Xiang, Y.; Jin, H.; Yang, Y. A game-theoretical approach for mitigating edge DDoS attack. IEEE Trans. Dependable Secur. Comput. 2021, 19, 2333–2348. [Google Scholar] [CrossRef]
Zhou, J.; Chen, F.; He, Q.; Xia, X.; Wang, R.; Xiang, Y. Data Caching Optimization With Fairness in Mobile Edge Computing. IEEE Trans. Serv. Comput. 2022, 16, 1750–1762. [Google Scholar] [CrossRef]
Fu, Y.; Liu, J.; Ke, J.; Chui, J.K.T.; Hung, K.K.F. Optimal and Suboptimal Dynamic Cache Update Algorithms for Wireless Cellular Networks. IEEE Wirel. Commun. Lett. 2022, 11, 2610–2614. [Google Scholar] [CrossRef]
Sheraz, M.; Ahmed, M.; Hou, X.; Li, Y.; Jin, D.; Han, Z.; Jiang, T. Artificial intelligence for wireless caching: Schemes, performance, and challenges. IEEE Commun. Surv. Tutor. 2020, 23, 631–661. [Google Scholar] [CrossRef]
Tang, S.; He, K.; Chen, L.; Fan, L.; Lei, X.; Hu, R.Q. Collaborative cache-aided relaying networks: Performance evaluation and system optimization. IEEE J. Sel. Areas Commun. 2023, 41, 706–719. [Google Scholar] [CrossRef]
Wang, Q.; Grace, D. Proactive edge caching in vehicular networks: An online bandit learning approach. IEEE Access 2022, 10, 131246–131263. [Google Scholar] [CrossRef]
Wu, Q.; Zhao, Y.; Fan, Q.; Fan, P.; Wang, J.; Zhang, C. Mobility-aware cooperative caching in vehicular edge computing based on asynchronous federated and deep reinforcement learning. IEEE J. Sel. Top. Signal Process. 2022, 17, 66–81. [Google Scholar] [CrossRef]
Liu, W.; Zhang, H.; Ding, H.; Yuan, D. Delay and energy minimization for adaptive video streaming: A joint edge caching, computing and power allocation approach. IEEE Trans. Veh. Technol. 2022, 71, 9602–9612. [Google Scholar] [CrossRef]
Ma, Z.; Sun, S. Research on vehicle-to-road collaboration and end-to-end collaboration for multimedia services in the Internet of Vehicles. IEEE Access 2021, 10, 18146–18155. [Google Scholar] [CrossRef]
Zhang, Y.; Li, C.; Luan, T.H.; Yuen, C.; Fu, Y.; Wang, H.; Wu, W. Towards hit-interruption tradeoff in vehicular edge caching: Algorithm and analysis. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5198–5210. [Google Scholar] [CrossRef]
Fu, W. Optimization of Caching Update and Pricing Algorithm Based on Stochastic Geometry Theory in Video Service. IEEE Access 2022, 10, 85470–85482. [Google Scholar] [CrossRef]
Nguyen, D.; Hung, N.V.; Phong, N.T.; Huong, T.T.; Thang, T.C. Scalable multicast for live 360-degree video streaming over mobile networks. IEEE Access 2022, 10, 38802–38812. [Google Scholar] [CrossRef]
Dai, J.; Yue, G.; Mao, S.; Liu, D. Sidelink-aided multiquality tiled 360° virtual reality video multicast. IEEE Internet Things J. 2021, 9, 4584–4597. [Google Scholar] [CrossRef]
Chen, S.; Yang, B.; Yang, J.; Hanzo, L. Dynamic resource allocation for scalable video multirate multicast over wireless networks. IEEE Trans. Veh. Technol. 2020, 69, 10227–10241. [Google Scholar] [CrossRef]
Ouyang, R.; Xiong, X.; Fu, M.; Wang, J.; Chen, S.; Alfarraj, O. A Scalable Video Multicast Scheme Based on User Demand Perception and D2D Communication. Sensors 2023, 23, 7325. [Google Scholar] [CrossRef] [PubMed]
Xiao, H.; Xu, C.; Feng, Z.; Ding, R.; Yang, S.; Zhong, L.; Liang, J.; Muntean, G.M. A transcoding-enabled 360 VR video caching and delivery framework for edge-enhanced next-generation wireless networks. IEEE J. Sel. Areas Commun. 2022, 40, 1615–1631. [Google Scholar] [CrossRef]
Dani, M.N.; So, D.K.; Tang, J.; Ding, Z. Resource allocation for layered multicast video streaming in NOMA systems. IEEE Trans. Veh. Technol. 2022, 71, 11379–11394. [Google Scholar] [CrossRef]
Li, Y.; Zhu, S.; Dai, J. Joint User Grouping and Resource Allocation for LEO Satellite Multicast. IEEE Syst. J. 2023, 17, 4695–4702. [Google Scholar] [CrossRef]
Zhong, L.; Wang, M.; Xu, C.; Yang, S.; Muntean, G.M. Decentralized Optimization for Multicast Adaptive Video Streaming in Edge Cache-Assisted Networks. IEEE Trans. Broadcast. 2023, 69, 812–822. [Google Scholar] [CrossRef]
Pan, Q.; Zeng, Q.; Zhuang, Y.; Chen, G. A BIER Multicast-based Low Latency Live Streaming System. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 188–193. [Google Scholar]
Tong, W.; Hussain, A.; Bo, W.X.; Maharjan, S. Artificial intelligence for vehicle-to-everything: A survey. IEEE Access 2019, 7, 10823–10843. [Google Scholar] [CrossRef]
Ji, H.; Alfarraj, O.; Tolba, A. Artificial intelligence-empowered edge of vehicles: Architecture, enabling technologies, and applications. IEEE Access 2020, 8, 61020–61034. [Google Scholar] [CrossRef]
Liu, B.; Han, C.; Liu, X.; Li, W. Vehicle artificial intelligence system based on intelligent image analysis and 5G network. Int. J. Wirel. Inf. Netw. 2023, 30, 86–102. [Google Scholar] [CrossRef]
Chen, L.; Li, Y.; Huang, C.; Li, B.; Xing, Y.; Tian, D.; Li, L.; Hu, Z.; Na, X.; Li, Z.; et al. Milestones in autonomous driving and intelligent vehicles: Survey of surveys. IEEE Trans. Intell. Veh. 2022, 8, 1046–1056. [Google Scholar] [CrossRef]
Li, T.; Xie, S.; Zeng, Z.; Dong, M.; Liu, A. ATPS: An AI based trust-aware and privacy-preserving system for vehicle managements in sustainable VANETs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19837–19851. [Google Scholar] [CrossRef]
Alladi, T.; Kohli, V.; Chamola, V.; Yu, F.R.; Guizani, M. Artificial intelligence (AI)-empowered intrusion detection architecture for the internet of vehicles. IEEE Wirel. Commun. 2021, 28, 144–149. [Google Scholar] [CrossRef]
Kuutti, S.; Bowden, R.; Jin, Y.; Barber, P.; Fallah, S. A survey of deep learning applications to autonomous vehicle control. IEEE Trans. Intell. Transp. Syst. 2020, 22, 712–733. [Google Scholar] [CrossRef]
Rigas, E.S.; Ramchurn, S.D.; Bassiliades, N. Managing electric vehicles in the smart grid using artificial intelligence: A survey. IEEE Trans. Intell. Transp. Syst. 2014, 16, 1619–1635. [Google Scholar] [CrossRef]
Liu, L.; Hu, H.; Luo, Y.; Wen, Y. When wireless video streaming meets AI: A deep learning approach. IEEE Wirel. Commun. 2019, 27, 127–133. [Google Scholar] [CrossRef]
Zhang, J.; Zhang, W.; Xu, J. Bandwidth-efficient multi-task AI inference with dynamic task importance for the Internet of Things in edge computing. Comput. Netw. 2022, 216, 109262. [Google Scholar] [CrossRef]
Wu, W.; Li, R.; Xie, G.; An, J.; Bai, Y.; Zhou, J.; Li, K. A Survey of Intrusion Detection for In-Vehicle Networks. IEEE Trans. Intell. Transp. Syst. 2019, 21, 919–933. [Google Scholar] [CrossRef]
Hu, Q.; Luo, F. Review of Secure Communication Approaches for In-Vehicle Network. Int. J. Automot. Technol. 2018, 19, 879–894. [Google Scholar] [CrossRef]
Elkhail, A.A.; Refat, R.U.D.; Habre, R.; Hafeez, A.; Bacha, A.; Malik, H. Vehicle Security: A Survey of Security Issues and Vulnerabilities, Malware Attacks and Defenses. IEEE Access 2021, 9, 162401–162437. [Google Scholar] [CrossRef]
Rathore, R.S.; Hewage, C.; Kaiwartya, O.; Lloret, J. In-Vehicle Communication Cyber Security: Challenges and Solutions. Sensors 2022, 22, 6679. [Google Scholar] [CrossRef] [PubMed]
Wang, C.; Cheng, X.; Li, J.; He, Y.; Xiao, K. A survey: Applications of blockchain in the internet of vehicles. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 1–16. [Google Scholar] [CrossRef]
Wang, X.; Zeng, P.; Patterson, N.; Jiang, F.; Doss, R. An improved authentication scheme for the internet of vehicles based on blockchain technology. IEEE Access 2019, 7, 45061–45072. [Google Scholar] [CrossRef]
Elagin, V.; Spirkina, A.; Buinevich, M.; Vladyko, A. Technological aspects of blockchain application for vehicle-to-network. Information 2020, 11, 465. [Google Scholar] [CrossRef]
Zuo, Y.; Guo, J.; Gao, N.; Zhu, Y.; Jin, S.; Li, X. A survey of blockchain and artificial intelligence for 6G wireless communications. IEEE Commun. Surv. Tutor. 2023, 25, 2494–2528. [Google Scholar] [CrossRef]
Jain, S.; Ahuja, N.J.; Srikanth, P.; Bhadane, K.V.; Nagaiah, B.; Kumar, A.; Konstantinou, C. Blockchain and autonomous vehicles: Recent advances and future directions. IEEE Access 2021, 9, 130264–130328. [Google Scholar] [CrossRef]
Mollah, M.B.; Zhao, J.; Niyato, D.; Guan, Y.L.; Yuen, C.; Sun, S.; Lam, K.Y.; Koh, L.H. Blockchain for the internet of vehicles towards intelligent transportation systems: A survey. IEEE Internet Things J. 2020, 8, 4157–4185. [Google Scholar] [CrossRef]
Alladi, T.; Chamola, V.; Sahu, N.; Venkatesh, V.; Goyal, A.; Guizani, M. A comprehensive survey on the applications of blockchain for securing vehicular networks. IEEE Commun. Surv. Tutor. 2022, 24, 1212–1239. [Google Scholar] [CrossRef]
Ayaz, F.; Sheng, Z.; Tian, D.; Nekovee, M.; Saeed, N. Blockchain-empowered AI for 6G-enabled Internet of Vehicles. Electronics 2022, 11, 3339. [Google Scholar] [CrossRef]
Kamal, M.; Srivastava, G.; Tariq, M. Blockchain-based lightweight and secured V2V communication in the internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3997–4004. [Google Scholar] [CrossRef]
Cui, J.; Ouyang, F.; Ying, Z.; Wei, L.; Zhong, H. Secure and efficient data sharing among vehicles based on consortium blockchain. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8857–8867. [Google Scholar] [CrossRef]
Cheng, X.; Huang, Z.; Bai, L. Channel nonstationarity and consistency for beyond 5G and 6G: A survey. IEEE Commun. Surv. Tutor. 2022, 24, 1634–1669.
Mahmood, A.; Abedin, S.F.; Sauter, T.; Gidlund, M.; Landernäs, K. Factory 5G: A review of industry-centric features and deployment options. IEEE Ind. Electron. Mag. 2022, 16, 24–34.
He, C.; Wan, Y.; Zhao, L.; Lu, H.; Shimizu, T. Sub-6 GHz V2X-Assisted Synchronous Millimeter Wave Scheduler for Vehicle-to-Vehicle Communications. IEEE Trans. Veh. Technol. 2022, 71, 11717–11728.
John, D.M.; Vincent, S.; Pathan, S.; Kumar, P.; Ali, T. Flexible Antennas for a Sub-6 GHz 5G Band: A Comprehensive Review. Sensors 2022, 22, 7615.
Ikram, M.; Sultan, K.S.; Abbosh, A.M.; Nguyen-Trong, N. Sub-6 GHz and mm-Wave 5G Vehicle-to-Everything (5G-V2X) MIMO Antenna Array. IEEE Access 2022, 10, 49688–49695.
Noh, G.; Kim, J.; Choi, S.; Lee, N.; Chung, H.; Kim, I. Feasibility validation of a 5G-enabled mmWave vehicular communication system on a highway. IEEE Access 2021, 9, 36535–36546.
Tang, F.; Chen, X.; Zhao, M.; Kato, N. The Roadmap of Communication and Networking in 6G for the Metaverse. IEEE Wirel. Commun. 2022, 30, 72–81.
Salameh, A.I.; El Tarhuni, M. From 5G to 6G—Challenges, Technologies, and Applications. Future Internet 2022, 14, 117.
Pei, J.; Li, S.; Yu, Z.; Ho, L.; Liu, W.; Wang, L. Federated Learning Encounters 6G Wireless Communication in the Scenario of Internet of Things. IEEE Commun. Stand. Mag. 2023, 7, 94–100.
Wang, C.X.; You, X.; Gao, X.; Zhu, X.; Li, Z.; Zhang, C.; Wang, H.; Huang, Y.; Chen, Y.; Haas, H.; et al. On the Road to 6G: Visions, Requirements, Key Technologies and Testbeds. IEEE Commun. Surv. Tutor. 2023, 25, 905–974.
Han, C.; Wang, Y.; Li, Y.; Chen, Y.; Abbasi, N.A.; Kürner, T.; Molisch, A.F. Terahertz Wireless Channels: A Holistic Survey on Measurement, Modeling, and Analysis. IEEE Commun. Surv. Tutor. 2022, 24, 1670–1707.
Shafie, A.; Yang, N.; Han, C.; Jornet, J.M.; Juntti, M.; Kürner, T. Terahertz Communications for 6G and Beyond Wireless Networks: Challenges, Key Advancements, and Opportunities. IEEE Netw. 2022, 37, 162–169.
Lin, Z.; Wang, L.; Ding, J.; Xu, Y.; Tan, B. Tracking and Transmission Design in Terahertz V2I Networks. IEEE Trans. Wirel. Commun. 2022, 22, 3586–3598.
Lin, Z.; Wang, L.; Ding, J.; Tan, B.; Jin, S. Channel Power Gain Estimation for Terahertz Vehicle-to-Infrastructure Networks. IEEE Commun. Lett. 2022, 27, 155–159.
Li, Y.; Chen, Y.; Yan, D.; Guan, K.; Han, C. Channel Characterization and Ray-Tracing Assisted Stochastic Modeling for Urban Vehicle-to-Infrastructure Terahertz Communications. IEEE Trans. Veh. Technol. 2022, 72, 2748–2763.
Azari, M.M.; Solanki, S.; Chatzinotas, S.; Bennis, M. THz-Empowered UAVs in 6G: Opportunities, Challenges, and Trade-Offs. IEEE Commun. Mag. 2022, 60, 24–30.
Chaccour, C.; Soorki, M.N.; Saad, W.; Bennis, M.; Popovski, P.; Debbah, M. Seven Defining Features of Terahertz (THz) Wireless Systems: A Fellowship of Communication and Sensing. IEEE Commun. Surv. Tutor. 2022, 24, 967–993.
Lou, Z.; Belmekki, B.E.Y.; Alouini, M.S. Coverage Analysis of Hybrid RF/THz Networks with Best Relay Selection. IEEE Commun. Lett. 2023, 27, 1634–1638.
Pan, C.; Zhou, G.; Zhi, K.; Hong, S.; Wu, T.; Pan, Y.; Ren, H.; Di Renzo, M.; Swindlehurst, A.L.; Zhang, R.; et al. An Overview of Signal Processing Techniques for RIS/IRS-Aided Wireless Systems. IEEE J. Sel. Top. Signal Process. 2022, 16, 883–917.
Yan, W.; Hao, W.; Huang, C.; Sun, G.; Muta, O.; Gacanin, H.; Yuen, C. Beamforming Analysis and Design for Wideband THz Reconfigurable Intelligent Surface Communications. IEEE J. Sel. Areas Commun. 2023, 41, 2306–2320.
Zarini, H.; Gholipoor, N.; Mili, M.R.; Rasti, M.; Tabassum, H.; Hossain, E. Resource Management for Multiplexing eMBB and URLLC Services over RIS-Aided THz Communication. IEEE Trans. Commun. 2023, 71, 1207–1225.
Fu, X.; Peng, R.; Liu, G.; Wang, J.; Yuan, W.; Kadoch, M. Channel Modeling for RIS-Assisted 6G Communications. Electronics 2022, 11, 2977.
Humadi, K.; Trigui, I.; Zhu, W.P.; Ajib, W. User-Centric Cluster Design and Analysis for Hybrid Sub-6GHz-mmWave-THz Dense Networks. IEEE Trans. Veh. Technol. 2022, 71, 7585–7598.
Chukhno, N.; Chukhno, O.; Moltchanov, D.; Pizzi, S.; Gaydamaka, A.; Samuylov, A.; Molinaro, A.; Koucheryavy, Y.; Iera, A.; Araniti, G. Models, Methods, and Solutions for Multicasting in 5G/6G mmWave and Sub-THz Systems. IEEE Commun. Surv. Tutor. 2023, to appear.
Moltchanov, D.; Sopin, E.; Begishev, V.; Samuylov, A.; Koucheryavy, Y.; Samouylov, K. A Tutorial on Mathematical Modeling of 5G/6G Millimeter Wave and Terahertz Cellular Systems. IEEE Commun. Surv. Tutor. 2022, 24, 1072–1116.
Rasheed, I.; Hu, F. Intelligent super-fast Vehicle-to-Everything 5G communications with predictive switching between mmWave and THz links. Veh. Commun. 2021, 27, 100303.
Aboelala, O.; Lee, I.E.; Chung, G.C. A Survey of Hybrid Free Space Optics (FSO) Communication Networks to Achieve 5G Connectivity for Backhauling. Entropy 2022, 24, 1573.
Le, H.D.; Pham, A.T. Link-Layer Retransmission-Based Error-Control Protocols in FSO Communications: A Survey. IEEE Commun. Surv. Tutor. 2022, 24, 1602–1633.
Singya, P.K.; Makki, B.; D’Errico, A.; Alouini, M.S. Hybrid FSO/THz-Based Backhaul Network for mmWave Terrestrial Communication. IEEE Trans. Wirel. Commun. 2022, 22, 4342–4359.
Vishwakarma, N.; Swaminathan, R. On the Capacity Performance of Hybrid FSO/RF System with Adaptive Combining over Generalized Distributions. IEEE Photonics J. 2021, 14, 1–12.
Wu, S.; Li, S.; Lin, Y.; Zhou, H. Performance Analysis of Hybrid FSO/RF Transmission Assisted Airborne Free-Space Optical Communication System. J. Commun. Inf. Netw. 2022, 7, 252–258.
Lu, H.H.; Li, C.Y.; Tsai, W.S.; Lin, R.D.; Tang, Y.S.; Chen, Y.X.; Lin, Y.S.; Fan, W.C. An Integrated Fiber-FSO-5G NR Sub-THz Link With 86.112 Gbps High Aggregate Data Rates. J. Light. Technol. 2022, 40, 7790–7798.
Li, S.; Yang, L.; Zhang, J.; Bithas, P.S.; Tsiftsis, T.A.; Alouini, M.S. Mixed THz/FSO Relaying Systems: Statistical Analysis and Performance Evaluation. IEEE Trans. Wirel. Commun. 2022, 21, 10996–11010.
Esubonteng, P.K.; Nguyen, H.P.T.; Rojas-Cessa, R. STAR: A Carrier Sense Agnostic MAC Scheme for a Crowded NLoS-FSOC Optical LAN. J. Opt. Commun. Netw. 2022, 14, 815–827.
Esubonteng, P.K.; Rojas-Cessa, R. Effect of the Incident Angle of a Transmitting Laser Light on the Coverage of a NLOS-FSO Network. Comput. Netw. 2023, 220, 109504.
Esubonteng, P.K.; Rojas-Cessa, R. Orientation of a Diffuse Reflector for Improved Coverage in ID-FSOC for Vehicular Communications. Veh. Commun. 2022, 36, 100493.
Niu, M.; Huang, X.; Wang, H. Vehicle-To-Anything: The Trend of Internet of Vehicles in Future Smart Cities. In Intelligent Electronics and Circuits: Terahertz, ITS, and Beyond, 1st ed.; InTechOpen: Rijeka, Croatia, 2022; p. 107.
Hashima, S.; Fouda, M.M.; Sakib, S.; Fadlullah, Z.M.; Hatano, K.; Mohamed, E.M.; Shen, X. Energy-aware hybrid RF-VLC multiband selection in D2D communication: A stochastic multiarmed bandit approach. IEEE Internet Things J. 2022, 9, 18002–18014.
Sun, S.; Yang, F.; Song, J.; Zhang, R. Intelligent reflecting surface for MIMO VLC: Joint design of surface configuration and transceiver signal processing. IEEE Trans. Wirel. Commun. 2023, 22, 5785–5799.
Sejan, M.A.S.; Chung, W.Y. Secure VLC for wide-area indoor IoT connectivity. IEEE Internet Things J. 2022, 10, 180–193.
Caputo, S.; Mucchi, L.; Cataliotti, F.; Seminara, M.; Nawaz, T.; Catani, J. Measurement-based VLC channel characterization for I2V communications in a real urban scenario. Veh. Commun. 2021, 28, 100305.
Caputo, S.; Mucchi, L.; Umair, M.A.; Meucci, M.; Seminara, M.; Catani, J. The role of bidirectional VLC systems in low-latency 6G vehicular networks and comparison with IEEE802.11p and LTE/5G C-V2X. Sensors 2022, 22, 8618.
Aly, B.; Elamassie, M.; Uysal, M. Vehicular VLC system with selection combining. IEEE Trans. Veh. Technol. 2022, 71, 12350–12355.
Eldeeb, H.B.; Naser, S.; Bariah, L.; Muhaidat, S. Energy and Spectral Efficiency Analysis for RIS-Aided V2V-Visible Light Communication. IEEE Commun. Lett. 2023, 27, 2373–2377.
Alsalami, F.M.; Benkhelifa, F.; Ashour, D.; Ghassemlooy, Z.; Haas, O.C.; Ahmad, Z.; Rajbhandari, S. Average channel capacity bounds of a dynamic vehicle-to-vehicle visible light communication system. IEEE Trans. Veh. Technol. 2023.
Refas, S.; Acheli, D.; Yahia, S.; Meraihi, Y.; Ramdane-Cherif, A.; Van, N.V.; Ho, T.D. Performance Analysis of Bidirectional Multi-Hop Vehicle-to-Vehicle Visible Light Communication. IEEE Access 2023, 11, 129436–129448.
Memedi, A.; Dressler, F. A location-aware RF-assisted MAC protocol for sectorized vehicular visible light communications. Comput. Commun. 2023, 197, 151–158.
Tebruegge, C.; Memedi, A.; Dressler, F. Reduced multiuser-interference for vehicular VLC using SDMA and matrix headlights. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
Aly, B.; Elamassie, M.; Uysal, M. Vehicular Visible Light Communication with Low Beam Transmitters in the Presence of Vertical Oscillation. IEEE Trans. Veh. Technol. 2023, 72, 9692–9703.
Sharda, P.; Bhatnagar, M.R. Vehicular Visible Light Communication System: Modeling and Visualizing Critical Outdoor Propagation Characteristics. IEEE Trans. Veh. Technol. 2023, 72, 14317–14329.
Nauryzbayev, G.; Abdallah, M.; Al-Dhahir, N. Outage analysis of cognitive electric vehicular networks over mixed RF/VLC channels. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 1096–1107.
Lim, W.Y.B.; Xiong, Z.; Niyato, D.; Cao, X.; Miao, C.; Sun, S.; Yang, Q. Realizing the metaverse with edge intelligence: A match made in heaven. IEEE Wirel. Commun. 2022, 30, 64–71.
Fu, F.; Xue, B.; Cai, L.; Yang, L.T.; Zhang, Z.; Luo, J.; Wang, C. Live Traffic Video Multicasting Services in UAVs-assisted Intelligent Transport Systems: A Multi-Actor Attention Critic Approach. IEEE Internet Things J. 2023, 10, 19740–19752.
Shen, L.H.; Feng, K.T.; Hanzo, L. Five facets of 6G: Research challenges and opportunities. ACM Comput. Surv. 2023, 55, 1–39.
Ullah, A.; Yao, X.; Shaheen, S.; Ning, H. Advances in position-based routing towards ITS enabled FoG-oriented VANET–A survey. IEEE Trans. Intell. Transp. Syst. 2019, 21, 828–840.
Kamiya, S.; Tang, Z.; Yamazato, T.; Kinoshita, M.; Kamakura, K.; Arai, S.; Yendo, T.; Fujii, T. Achieving Successful VLC Signal Reception Using a Rolling Shutter Image Sensor While Driving at 40 km/h. IEEE Photonics J. 2023, 15, 7302811.
Kim, S. Hybrid RF/VLC network spectrum allocation scheme using bargaining solutions. IEEE Access 2022, 10, 20019–20028.
Aboagye, S.; Ngatched, T.M.; Dobre, O.A.; Ibrahim, A. Joint access point assignment and power allocation in multi-tier hybrid RF/VLC HetNets. IEEE Trans. Wirel. Commun. 2021, 20, 6329–6342.
Arshad, R.; Lampe, L. Stochastic geometry analysis of user mobility in RF/VLC hybrid networks. IEEE Trans. Wirel. Commun. 2021, 20, 7404–7419.
Wang, C.X.; Lv, Z.; Gao, X.; You, X.; Hao, Y.; Haas, H. Pervasive wireless channel modeling theory and applications to 6G GBSMs for all frequency bands and all scenarios. IEEE Trans. Veh. Technol. 2022, 71, 9159–9173.
Chowdhury, M.Z.; Hasan, M.K.; Shahjalal, M.; Hossan, M.T.; Jang, Y.M. Optical wireless hybrid networks: Trends, opportunities, challenges, and research directions. IEEE Commun. Surv. Tutor. 2020, 22, 930–966.
Bitmovin. Available online: https://bitmovin.com/video-developer-report (accessed on 25 December 2023).
Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Moessner, K.; Saad, W.; Poor, H.V. 6G for vehicle-to-everything (V2X) communications: Enabling technologies, challenges, and opportunities. Proc. IEEE 2022, 110, 712–734.
Li, C.; Zhang, Y.; Xie, R.; Hao, X.; Huang, T. Integrating edge computing into low earth orbit satellite networks: Architecture and prototype. IEEE Access 2021, 9, 39126–39137.
Jabbar, R.; Dhib, E.; Said, A.B.; Krichen, M.; Fetais, N.; Zaidan, E.; Barkaoui, K. Blockchain technology for intelligent transportation systems: A systematic literature review. IEEE Access 2022, 10, 20995–21031.
Popovski, P.; Chiariotti, F.; Croisfelt, V.; Kalør, A.E.; Leyva-Mayorga, I.; Marchegiani, L.; Pandey, S.R.; Soret, B. Internet of Things (IoT) connectivity in 6G: An interplay of time, space, intelligence, and value. arXiv 2021, arXiv:2111.05811.
Yuan, S.; Li, J.; Chen, H.; Han, Z.; Wu, C.; Zhang, Y. JIRA: Joint Incentive Design and Resource Allocation for Edge-Based Real-Time Video Streaming Systems. IEEE Trans. Wirel. Commun. 2022, 22, 2901–2916.

Figure 1. Overview of this survey paper.

Figure 2. Schematic of video transmission in a next-generation vehicular network environment.

Table 1. Comparative study of this paper and existing surveys on video streaming.

Year	Authors	Immersive Video	Video Processing Technology	Next-Gen Network Technology	Vehicular Network
2020	Xu et al. [12]	◐	◐	×	×
2020	Yaqoob et al. [13]	●	◐	●	×
2021	Ruan and Xie [10]	◐	◐	×	×
2021	Tang et al. [11]	×	○	◐	×
2021	Jiang et al. [9]	×	○	○	●
2022	Cai et al. [15]	◐	◐	×	×
2022	Khan et al. [16]	◐	○	◐	○
2022	Wong et al. [7]	◐	○	◐	×
2023	van der Hooft et al. [14]	●	●	×	×
2023	Mahmoud et al. [17]	◐	◐	○	×
	Our survey	●	●	●	●

Table 2. Comparative study of existing surveys on V2X.

Paper	Focus	Generations of Mobile Communication
[43]	Overview of the operational aspects of 4G-V2X in real-world driving scenarios.	4G
[44]	Thorough analysis of DSRC and 4G-V2X, with a focus on highlighting the limitations of each in supporting V2X applications.	4G
[20]	Overview of the vital technical aspects of 4G-V2X and 5G-V2X, highlighting challenges, unresolved issues, and upcoming technical trends relating to resource allocation in V2X.	4G and 5G
[45]	Comparisons of performance and coexistence considerations between IEEE 802.11 and Cellular V2X standards.	5G
[46]	Exploration of key components in 5G V2X, covering aspects such as resource allocation, QoS management, and mobility management for V2N communications.	4G and 5G
[47]	Unveiling requirements and investigating diverse 5G technologies for vehicular communications.	5G
[48]	A survey covering 4G V2X architecture and operational scenarios and addressing challenges and potential solutions for both 4G- and 5G-based vehicular communications.	4G and 5G

Table 3. Comprehensive review of encoding standards.

Paper	Video Coding Standard	Main Techniques	Results	Application Scenarios	Evaluation Indicators
[103]	AVC	Efficient reference frame selection mechanism	Complexity reduction in motion estimation for AVC encoder	×	×
[102]	AVC	Lossy inter-reference frame recompression	Increased compression efficiency in AVC with 5–15% rise in complexity	×	×
[107]	HEVC	CNN, LSTM	Reduced HEVC encoding complexity	×	×
[108]	HEVC	Hierarchical complexity control approach	Reduced HEVC encoding complexity	×	×
[109]	HEVC	Hierarchical complexity control approach	High-precision HEVC complexity control	Live video coding	×
[115]	VVC	CNN, decision trees	Reduced VVC encoding complexity with a slight increase in bitrate	VoD	×
[123]	VVC	Multi-stage early termination CNN	Reduced VVC encoding time with a slight increase in BD-BR	×	×
[114]	VVC	Light Gradient Boosting Machine	Reduced VVC encoding time with a slight increase in BD-BR	×	×
[124]	VVC	Multi-stage exit CNN	Reduced VVC encoding time with a slight increase in BD-BR	×	×
[125]	AV1, VVC, HEVC	×	Higher bit rates, extended encoding time, and increased decoding time associated with VVC	×	PSNR, VMAF, multiscale structural similarity (MS-SSIM)
[126]	AV1, VVC, HEVC, AVC	×	Compression efficiency directly proportional to encoding time	×	PSNR, VMAF, MS-SSIM
[113]	VVC, HEVC	×	VVC video compression satisfying the requirements for applications such as 360° video and point cloud	2D video, 360° video	PSNR
[127,128]	AV1, VVC, HEVC	×	VVC achieving significantly superior performance compared to HEVC and AV1	×	PSNR, SSIM, VMAF
[129]	VVC, HEVC	×	Superior compression efficiency for 8K Video with VVC	×	VMAF, MS-SSIM

Table 4. Comparing ABR proposals across different papers.

Paper	Video Type	Algorithms or Techniques Used by ABR	FoV Prediction Method	Tiling Method	FoV Algorithm
[62]	Live video	DRL	×	×	×
[169]	VoD	DQN	×	×	×
[170]	Live video	Adaptive Forward Error Correction	×	×	×
[172]	Live video	DRL	×	×	×
[171]	VoD	Apprenticeship learning	×	×	×
[173]	VoD	Meta learning	×	×	×
[174]	VoD	DRL	×	×	×
[187]	360° VoD	DRL	User’s head rotation	Covering different areas of the video using tiles of three different sizes	CNN
[176]	360° VoD	DRL-A3C	Object motion tracking and ROI (user head motion trajectories and video saliency maps)	Traditional method	LSTM, YOLOv3
[189]	360° VoD	RL-A3C	User viewport trajectories	Finding the minimum rectangles (tiles) for both the FoV and the non-FoV	Spherical kernel density estimation
[64]	360° VoD	Jointly optimizing the rate distortion model and global bitrate allocation strategy	Current viewing trajectory and video content	Traditional method	XGBoost
[192]	360° VoD	DRL-A3C	User attention and historical viewing trajectory of multiple users	Traditional method	LSTM, Density-Based Spatial Clustering of Applications with Noise
[182]	360° Live	Modifications based on DASH	Joint temporal and spatial characteristics of the user’s viewport trajectory	Traditional method	CNN LSTM

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, C.-J.; Cheng, H.-W.; Lien, Y.-H.; Jian, M.-E. A Survey on Video Streaming for Next-Generation Vehicular Networks. Electronics 2024, 13, 649. https://doi.org/10.3390/electronics13030649

