Intelligent Caching for Mobile Video Streaming in Vehicular Networks with Deep Reinforcement Learning

Caching-enabled multi-access edge computing (MEC) has attracted wide attention as a means of supporting future intelligent vehicular networks, especially for delivering high-definition videos in the internet of vehicles with limited backhaul capacity. However, factors such as the constrained storage capacity of MEC servers and the mobility of vehicles pose challenges to caching reliability, particularly when caching multiple-bitrate video streams while achieving a high quality of experience (QoE). Motivated by these challenges, in this paper we propose an intelligent caching strategy that takes into account vehicle mobility, time-varying content popularity, and backhaul capability to improve the QoE of vehicle users effectively. First, based on the mobile video mean opinion score (MV-MOS), we design an average download percentage (ADP) weighted QoE evaluation model. Then, the video content caching problem is formulated as a Markov decision process (MDP) to maximize the ADP weighted MV-MOS. Because prior knowledge of video content popularity and channel state information may not be available at the road side unit in practical scenarios, we propose a deep reinforcement learning (DRL)-based caching strategy to solve the problem while achieving a maximum ADP weighted MV-MOS. To accelerate its convergence, we further integrate the prioritized experience replay, dueling, and double deep Q-network technologies, which improve the performance of the DRL algorithm. Numerical results demonstrate that the proposed DRL-based caching strategy significantly improves QoE and achieves better video delivery reliability compared to existing non-learning approaches.


Introduction
The advent of the 5G era and the rapid development of cellular vehicle-to-everything (C-V2X) technologies have led to a massive number of vehicles equipped with advanced intelligent devices (e.g., high-definition (HD) players, 3D navigation equipment, vehicle mixed reality (MR) glasses), which bring an explosion of mobile data traffic and services [1]. Furthermore, Cisco reports that global mobile data traffic is expected to reach 77 exabytes per month by 2022, with 79 percent of this data coming from mobile video [2]. Additionally, innovative mobile video technologies have inspired a wide range of video streaming applications in vehicular networks, making video streaming one of the most popular and indispensable services.
Generally, a massive amount of HD mobile video data is downloaded through the backhaul network of a macro base station (MBS), resulting in congested traffic loads. Consequently, the limited backhaul bandwidth capacity poses one of the most significant challenges to HD video data delivery services in the internet of vehicles (IoV) [3]. To effectively alleviate the backhaul traffic load, reduce the content delivery latency, and meet the diverse quality of service (QoS) data transmission requirements [4], multi-access edge computing (MEC) has been proposed as an efficient paradigm [5], which enables data caching at the edge of mobile networks and thus offers fast video delivery.
The main contributions of this paper are as follows.
1. We propose a mobile video caching framework for vehicular networks, in which vehicle users can access high-resolution video directly from the MEC server (MECS) deployed on the road side unit (RSU) without occupying the backhaul network, which releases bandwidth resources and reduces the data transmission delay.
2. We design an average download percentage (ADP) weighted mobile video mean opinion score (MV-MOS) model for vehicle users. The video content caching problem is then formulated as a Markov decision process (MDP) to maximize the ADP weighted MV-MOS.
3. Based on a deep Q-learning network (DQN), we propose an intelligent caching algorithm to solve the problem while achieving a maximum ADP weighted MV-MOS. Furthermore, we propose the P3DVQC caching scheme for vehicular networks, which improves the performance of the algorithm by integrating the prioritized experience replay, dueling, and double DQN technologies.
The remainder of this paper is organized as follows. Related work is presented in Section 2. We describe the system model in Section 3. In Section 4, we formulate the intelligent caching problem for mobile video. Section 5 presents the proposed solution. Numerical results and discussion are presented in Section 6. Finally, conclusions are drawn in Section 7.

Related Work
Considerable research has been conducted on the design of mobile edge caching algorithms for mobile networks. The authors in [17] proposed proactive caching of popular content during off-peak periods to reduce peak traffic demands. The authors in [18] conducted a comprehensive survey of different aspects of mobile edge caching and discussed caching schemes based on different caching locations and performance criteria. The works in [19,20] leveraged edge nodes (e.g., small cells) to store popular content, such as multimedia files, to reduce latency and improve the performance of 5G networks. The authors in [21] proposed a deep learning-based proactive caching framework for cellular networks that achieves higher backhaul offloading and user satisfaction. Along this line of mobile edge caching, video placement has been studied over heterogeneous networks. Because user devices, preferences, and needs for specific videos may vary, adaptive bitrate (ABR) streaming has become a pivotal technique for improving the quality of video delivered over networks. The authors in [22] envisioned a collaborative joint caching and processing strategy for multiple-bitrate video delivery that adapts to the heterogeneity of user capabilities and wireless communication conditions. The authors in [23] considered scalable video coding (SVC)-based video services and formulated a joint video quality selection and caching problem to maximize the vehicular users' QoE. SVC-based solutions built on dynamic adaptive streaming over HTTP (DASH) to ensure the quality of streaming media services have also been proposed in recent studies [24].
Many works focus on improving the efficiency of video caching and reducing the video delivery delay in vehicular networks. The work in [25] proposed a cooperative transmission strategy for video transmission in small-cell networks with caching. The authors in [26] investigated the problem of cooperative mobile edge caching for scalable video streaming in HetNets. Adaptive video technologies, such as Microsoft Smooth Streaming, Adobe's HDS, and Apple's HLS, are applied by popular streaming services such as YouTube, Netflix, and Youku to provide smooth streaming and improve quality [27]. These streaming services encode videos into multiple versions with discrete bitrates. The authors in [28] proposed an MEC-based mechanism that caches only the highest available bitrate version of the video content and converts it to the requested lower bitrate version using the available processing power of the MEC. Many recent works focus on adaptive bitrate streaming to cope with the time-varying channels caused by the high mobility of vehicular users in the IoV. In [29], the authors proposed a technique that effectively combines ABR streaming and BS caching in vehicular networks with high channel variations. Quality of experience serves as a direct evaluation of vehicle users' experience in mobile video transmission, and thus the authors in [30] proposed a deep learning-based QoE prediction approach with a large-scale QoE dataset for mobile video transmission. The work in [31] proposed to simultaneously optimize energy consumption and QoE metrics in video streaming over software-defined mobile networks (SDMN) combined with MEC.
Zhao et al. [32] considered the interaction between video encoding and edge caching and proposed a QoE-driven cross-layer optimization scheme for secure video transmission over the backhaul links in cloud-edge networks. Liang et al. [33] proposed QoE-aware wireless edge caching with bandwidth provisioning in software-defined wireless networks; the proposed scheme decreases latency and improves cache utilization. Huang et al. [34] proposed a joint cache allocation and video delivery scheme for video streaming systems based on the video popularity and the wireless resource conditions of the network. Alberto et al. [35] demonstrated the possibility of developing a DRL-based quality optimization framework that can guarantee an adequate QoE. Li et al. [36] studied a QoE-driven mobile edge caching placement optimization problem for dynamic adaptive video streaming: by optimally placing cached representations of multiple videos, they maximize the aggregate average video distortion reduction of all users while minimizing the additional download cost. Qiao et al. [37] proposed a deep deterministic policy gradient (DDPG)-based cooperative caching scheme to jointly optimize content delivery and content placement in vehicular networks.

System Model
As shown in Figure 1, we consider a highway vehicular network scenario that includes an MBS, several RSUs, and vehicle users (VUs). The MBS connects to the core network (CN) through the backhaul link. Let M = {1, . . . , M} and U = {1, . . . , U} denote the set of RSUs and the set of vehicle users, respectively. Each RSU is equipped with an MECS of size ϕ_MEC that stores a number of popular video replicas, which helps reduce the content delivery delay and improve the QoE of VUs. Vehicle users can access a nearby RSU or the MBS and download videos from the MECS or the CN. We assume that each vehicle user requests one video of interest once the vehicle enters the service coverage area of an RSU, and experiences a higher download rate if the requested video is pre-cached in the MECS. The number of arriving vehicle users follows a Poisson distribution whose parameter takes one of B values and evolves as a Markov process; the parameter at time slot τ is denoted λ(τ) ∈ {λ_1, λ_2, . . . , λ_B}. Then, at time slot τ, the probability that the number of vehicle users U(τ) in the service coverage equals k is expressed as

$$P\{U(\tau) = k\} = \frac{\lambda(\tau)^{k}}{k!}\, e^{-\lambda(\tau)}, \quad k = 0, 1, 2, \ldots$$

We define ϑ_{i,j} as the probability that the arrival parameter transfers from λ(τ) = λ_i, i ∈ {1, . . . , B}, at time slot τ to λ(τ + 1) = λ_j, j ∈ {1, . . . , B}, at time slot τ + 1. The transition matrix Γ of the parameter λ is then expressed as

$$\Gamma = \left[\vartheta_{i,j}\right]_{B \times B}, \qquad \sum_{j=1}^{B} \vartheta_{i,j} = 1 .$$
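The arrival model described above can be illustrated with a minimal sketch, assuming the standard Poisson probability mass function and a row-stochastic transition matrix for the arrival parameter. The function names and the numeric values of the rates and of the matrix are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) random variable equals k."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def next_rate_index(i: int, gamma: list[list[float]], rng: random.Random) -> int:
    """Sample the next arrival-rate index from row i of the transition matrix."""
    return rng.choices(range(len(gamma[i])), weights=gamma[i], k=1)[0]


# Hypothetical values: B = 2 arrival rates and a row-stochastic matrix.
rates = [2.0, 5.0]
gamma = [[0.9, 0.1],
         [0.3, 0.7]]

rng = random.Random(0)
state = 0
for slot in range(3):
    # Probability that no vehicle user is in coverage during this slot.
    p_empty = poisson_pmf(0, rates[state])
    print(f"slot {slot}: lambda={rates[state]}, P(U=0)={p_empty:.4f}")
    state = next_rate_index(state, gamma, rng)
```

Each row of the matrix plays the role of one row of Γ, so a row must sum to one for the sampled chain to be a valid Markov process.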

Caching Model
The video content popularity reflects the statistics of the videos requested by vehicle users over a period. As a result, the time scale on which video content popularity varies is much larger than that of cache content updates at the MECS. Furthermore, the time scale of the video content delivery process is much smaller than that of cache content updates at the MECS. Hence, mobile video caching in the vehicular network can be modeled as a multi-time-scale model, as shown in Figure 2.

Popularity variation time scale: We assume that one video popularity variation time slot contains K_CN caching placement time slots. The popularity variation time slots are defined as t_x, x ∈ {1, 2, 3, . . . , K_CG}, so that there are K_CG such slots. With t_y^x denoting one MECS caching update period, the length of t_x is the sum of the lengths of the K_CN caching update periods it contains.

Caching placement time scale: The caching placement time slots are defined as t_y^x, y ∈ {1, 2, 3, . . . , K_CN}, and the video content cached at the MECS is updated once in every time slot t_y^x. Suppose D is the length of the service coverage; the length of t_y^x at vehicle speed v(t_y^x) is then D / v(t_y^x).

Video delivery time scale: The rapid movement of the vehicle causes its geographic location and channel status to change, so the transmission rate of the vehicle user is time-varying. To simplify the model, t_y^x is discretized into K_SN segments, and the video delivery time slots are defined as t_{y,z}^x, z ∈ {1, 2, 3, . . . , K_SN}. The communication rate can be considered constant within each content distribution period t_{y,z}^x, whose length is the length of t_y^x divided by K_SN.

The videos requested by vehicle users (such as 4K/8K high-definition movies) come from a video content library F = {f_1, f_2, . . . , f_F}, which contains F video files.
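The three nested time scales can be made concrete with a short sketch. It assumes that a caching placement slot lasts as long as a vehicle needs to cross the coverage length D at constant speed, that this slot is split into K_SN equal delivery segments, and that a popularity slot is the sum of its caching slots. The function names and the example numbers are hypothetical.

```python
def caching_slot_length(coverage_m: float, speed_mps: float) -> float:
    """Length of one caching placement slot: time to cross the coverage area."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return coverage_m / speed_mps


def delivery_slot_length(caching_slot_s: float, k_sn: int) -> float:
    """Length of one video delivery slot: the caching slot split into k_sn equal parts."""
    return caching_slot_s / k_sn


def popularity_slot_length(caching_slot_lengths: list[float]) -> float:
    """Length of one popularity slot: the sum of the caching slots it contains."""
    return sum(caching_slot_lengths)


# Hypothetical example: 500 m of coverage crossed at 25 m/s, 10 delivery segments.
caching_s = caching_slot_length(500.0, 25.0)
print(caching_s, delivery_slot_length(caching_s, 10))
```

Because the vehicle speed may differ between caching slots, the popularity slot is computed as a sum rather than as a fixed multiple of one caching slot.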
We assume that the request popularity of the videos in F obeys Zipf's law [27], and that the videos in F are arranged in descending order of content popularity. Constant bitrate (CBR) technology is used to encode each video in F into replicas at L bitrate levels, L = {1, 2, . . . , L}, arranged in descending order of bitrate, so that level 1 is the replica with the highest bitrate. Without loss of generality, we assume that all videos in F have the same duration; after CBR encoding, all video copies of the same bitrate level in F therefore have the same size. The replica size g_j of a video at bitrate level j is expressed as

Hence, the caching state matrix X(t_y^x) of the MECS at time slot t_y^x is defined as

where F is the number of videos and L is the number of bitrate levels. X(t_y^x) can be further expressed as X(t_y^x) ∈ {0, 1}^{F×L}. The caching state variable of video f_i with bitrate level j at time slot t_y^x is expressed as

The video caching state vector is defined as

Formula (9) is the constraint that the MECS can cache only one bitrate copy of the same video at a time. Additionally, existing research shows that the MECS can obtain low-bitrate video copies by transcoding high-bitrate copies. Hence, caching multiple copies of the same video simultaneously would heavily waste caching resources [34]. The constraint on the caching capacity of the MECS is therefore expressed as

Based on Zipf's law, the request probability of the videos in F is denoted by P_x. The probability that video f_i is requested by a vehicle user at time slot t_x can be expressed as

where µ(t_x) is the popularity parameter at time slot t_x, which reflects the shape of the video popularity distribution. Without loss of generality, suppose µ(t_x) obeys a Markov process with µ(t_x) ∈ {µ_1, µ_2, . . . , µ_G}, which contains G parameters.
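The popularity model and the two caching constraints can be sketched as follows. The sketch assumes the standard Zipf form, in which the request probability of the video ranked i is proportional to i^(−µ); a replica size equal to bitrate times duration; and a binary F × L caching matrix with at most one cached copy per video and a total size bounded by the MECS capacity. All function names and numbers are hypothetical.

```python
def zipf_popularity(num_videos: int, mu: float) -> list[float]:
    """Request probabilities for videos ranked 1..num_videos under Zipf's law."""
    weights = [rank ** -mu for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def replica_sizes(bitrates: list[float], duration: float) -> list[float]:
    """Size of a replica at each bitrate level (bitrate times duration)."""
    return [b * duration for b in bitrates]


def is_feasible(cache: list[list[int]], sizes: list[float], capacity: float) -> bool:
    """Check the one-copy-per-video and storage-capacity constraints."""
    one_copy = all(sum(row) <= 1 for row in cache)
    used = sum(x * sizes[j] for row in cache for j, x in enumerate(row))
    return one_copy and used <= capacity


# Hypothetical example: 4 videos, 3 bitrate levels in descending order.
popularity = zipf_popularity(4, 1.0)
sizes = replica_sizes([8.0, 4.0, 2.0], 10.0)
cache = [[1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(popularity, is_feasible(cache, sizes, 100.0))
```

A larger µ concentrates requests on the top-ranked videos, which makes a small cache more effective.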
Let φ_{i,j} denote the transition probability from popularity parameter µ_i at time slot t_x to µ_j at time slot t_{x+1}. The state transition probability matrix Φ of the video popularity parameter µ is then calculated as

Hence, the cumulative distribution function of the top S videos in F requested by the VUs is expressed as

When the caching state matrix is X(t_y^x) and the caching vector is χ_F(t_y^x), the caching hit rate at time slot t_y^x is expressed as

Hence, H(t_y^x) can be rewritten as

The caching loss rate at time slot t_y^x is then calculated as
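A minimal sketch of the hit rate and loss rate follows. It assumes that a request counts as a hit when any bitrate copy of the requested video is cached, that the hit rate is the total request probability of the cached videos, and that the loss rate is its complement. The function names are hypothetical.

```python
def cache_hit_rate(cache: list[list[int]], popularity: list[float]) -> float:
    """Total request probability of videos with at least one cached copy."""
    return sum(p for row, p in zip(cache, popularity) if any(row))


def cache_loss_rate(cache: list[list[int]], popularity: list[float]) -> float:
    """Probability that a requested video is not cached at any bitrate level."""
    return 1.0 - cache_hit_rate(cache, popularity)


# Hypothetical example: videos 1 and 3 are cached, video 2 is not.
cache = [[1, 0], [0, 0], [0, 1]]
popularity = [0.5, 0.3, 0.2]
print(cache_hit_rate(cache, popularity), cache_loss_rate(cache, popularity))
```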

Communication Model
We assume that the access system of the vehicular network is based on orthogonal frequency division multiple access (OFDMA). Therefore, every vehicle linked to an RSU or the MBS is assigned an orthogonal subchannel, and we do not consider interference among different links. For simplicity, the channel gains are assumed to remain constant during one video delivery period. Since they have the same distribution, it is sufficient to concentrate on one vehicle user to study the performance of interest. The transmission rate between the vehicle and the RSU at time slot t_{y,z}^x can be calculated by

where W_r is the channel bandwidth allocated by the RSU to vehicle users, P_r is the signal transmit power of the RSU, δ^2 is the noise power, and h_r(t_{y,z}^x) is the RSU channel gain at time slot t_{y,z}^x, which can be expressed as

where the expression involves the path loss coefficient; h_0(t_{y,z}^x) ∼ CN(0, 1) is a complex Gaussian random variable modeling Rayleigh channel fading; |·| is the absolute value operation; G_r is the antenna gain coefficient of the RSU; and d_{1,u}(t_{y,z}^x) represents the distance between the vehicle user and the RSU.

If the MECS does not cache any bitrate copy of the video requested by the vehicle user, the MBS takes over the service of the vehicle user through the backhaul network and distributes the video copy with the highest bitrate allowed by the constrained backhaul bandwidth to the VU. The transmission rate between the vehicle user and the MBS is expressed as

where B_0 is the bandwidth that the MBS allocates to each vehicle user and P_bs is the MBS transmit power. The MBS channel gain at time slot t_{y,z}^x is calculated as

where G_bs is the antenna gain coefficient of the MBS and d_{0,u}(t_{y,z}^x) represents the distance between the vehicle user and the MBS.
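The link model above can be sketched with the Shannon capacity form, W log2(1 + P h / δ^2), which is assumed here because it matches the listed parameters (bandwidth, transmit power, channel gain, noise power). The channel gain helper is an additional assumption: it multiplies an antenna gain, a unit-mean exponential fading power (the squared magnitude of a CN(0, 1) variable), and a distance-based path loss. Function names and all numeric values are hypothetical.

```python
import math
import random


def shannon_rate(bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_power_w: float) -> float:
    """Achievable rate in bit/s on an interference-free subchannel."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


def rayleigh_channel_gain(antenna_gain: float, distance_m: float,
                          path_loss_exp: float, rng: random.Random) -> float:
    """Assumed gain model: antenna gain x fading power x distance path loss."""
    fading_power = rng.expovariate(1.0)  # |h0|^2 for h0 ~ CN(0, 1)
    return antenna_gain * fading_power * distance_m ** -path_loss_exp


# Hypothetical example: 1 MHz subchannel, 1 W transmit power, 100 m distance.
rng = random.Random(0)
gain = rayleigh_channel_gain(1.0, 100.0, 3.0, rng)
print(shannon_rate(1e6, 1.0, gain, 1e-9))
```

The same `shannon_rate` helper applies to the RSU link and to the wireless part of the MBS link; only the bandwidth, power, and gain arguments change.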
Because R_MBS is limited by both the wireless access network and the backhaul bandwidth C_BM, fairness is taken into consideration when the number of vehicle users N(t_{y,z}^x) connected to the MBS at time slot t_{y,z}^x has not reached the maximum number of users N_Max that the MBS can bear. The bandwidth resource required for caching updates is taken into account, and a backhaul bandwidth of size C_MBS is allocated to each vehicle user. N_Max is defined as

where ⌊·⌋ is the rounding down (floor) operation. When the number of vehicle users connected to the MBS exceeds N_Max, the backhaul resources are allocated equally among the users, and the transmission rate R_Back(t_{y,z}^x) can be expressed as

Hence, the service state probability can be expressed as

where U(t_{y,z}^x) is the number of vehicle users at time slot t_{y,z}^x and N(t_{y,z}^x) is the number of vehicle users who download the video through the backhaul link. The average transmission rate over the backhaul link at time slot t_{y,z}^x is calculated as

where E is the expectation operator. The transmission rate of the MBS can then be rewritten as
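The backhaul sharing rule can be sketched as follows, under three assumptions that are not confirmed by the text: N_Max is the floor of the backhaul capacity divided by the per-user allocation, users beyond N_Max cause the whole backhaul capacity to be split equally, and the effective MBS rate is the smaller of the wireless rate and the backhaul rate. The function names and numbers are hypothetical, and any bandwidth reserved for caching updates is ignored.

```python
import math


def max_full_rate_users(backhaul_capacity: float, per_user_rate: float) -> int:
    """Largest number of users that can each receive the full per-user allocation."""
    return math.floor(backhaul_capacity / per_user_rate)


def backhaul_rate_per_user(num_users: int, backhaul_capacity: float,
                           per_user_rate: float) -> float:
    """Backhaul rate seen by each user connected to the MBS."""
    if num_users <= 0:
        return 0.0
    if num_users <= max_full_rate_users(backhaul_capacity, per_user_rate):
        return per_user_rate
    return backhaul_capacity / num_users  # equal split when oversubscribed


def mbs_rate(wireless_rate: float, backhaul_rate: float) -> float:
    """End-to-end MBS rate, limited by the slower of the two segments."""
    return min(wireless_rate, backhaul_rate)


# Hypothetical example: 100 Mbit/s backhaul, 30 Mbit/s per user.
for n in (2, 5):
    print(n, backhaul_rate_per_user(n, 100.0, 30.0))
```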

ADP Weighted MV-MOS Model
In summary, the multiple-bitrate video distribution strategy is as follows: if the video requested by the vehicle user has been pre-cached in the MECS, the video file with the highest bitrate within the MECS is sent to the vehicle user directly. Otherwise, VUs request the lowest-bitrate replicas of videos via the MBS. The intelligent caching problem in the IoV aims to maximize vehicle users' QoE. The MV-MOS is a device-level mobile video experience metric that has been widely used in the QoE evaluation of mobile video (e.g., [33,34,38]). It is expressed as

where R is the required data transmission rate, j is the video definition level, and R_j is the minimum bitrate requirement corresponding to video definition level j. The mobile video mean opinion score for a vehicle user obtaining the video from the MECS at time slot t_{y,z}^x is then calculated as

Similarly, the mobile video mean opinion score for a vehicle user obtaining a video from the MBS at time slot t_{y,z}^x can be expressed as

When the number of users in the service coverage area at time slot t_{y,z}^x is U(t_{y,z}^x) and the number of VUs downloading videos through the MBS is N(t_{y,z}^x), the mobile video mean opinion score of the vehicle users who obtain their videos from the MECS is expressed as

The mobile video mean opinion score of the N(t_{y,z}^x) VUs who obtain videos from the MBS through the backhaul network at time slot t_{y,z}^x is expressed as

When the number of vehicle users served at time slot t_{y,z}^x is U(t_{y,z}^x) and the number of vehicle users downloading videos through the MBS is N(t_{y,z}^x), the average mobile video mean opinion score at time slot t_{y,z}^x is expressed as

Hence, the average mobile video mean opinion score at time slot t_{y,z}^x is expressed as

Therefore, the average MV-MOS of vehicle users over one period t_y^x is expressed as

The bitrate level of a mobile video affects not only the resolution of the video but also the size of the video file.
Because the service time of an RSU for a vehicle user is limited, the vehicle user may complete only a small part of the download task while within the RSU service range, which means poor QoE. When the vehicle user leaves the RSU service coverage area, the MBS continues to distribute the unfinished part of the video replica. Because of the low MBS transmission rate, it is then very likely that the QoE requirements of the corresponding bitrate replica cannot be met, resulting in serious video service failures. To deal with these problems, we introduce the ADP of vehicle users as an evaluation index into the QoE evaluation model and propose an ADP weighted MV-MOS model that comprehensively considers the completion of the video download task. The ADP for mobile video is calculated as

where R_a(t_y^x) is the transmission rate and g_j is the file size of the video at bitrate level j. When vehicle users obtain the video from the MECS, P_MEC(t_y^x) can be expressed as

When a vehicle user obtains a video from the MBS through the backhaul at time slot t_y^x, P_BH(t_y^x) is calculated as

Hence, the average ADP at time slot t_y^x is calculated as

In summary, the ADP weighted MV-MOS at time slot t_y^x is expressed as
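The idea behind the ADP weighting can be illustrated with a small sketch. It assumes that the download percentage is the fraction of the file that can be transferred at a constant rate during the service time, capped at one, and that the weighted score is the product of this fraction and the MV-MOS value. The MV-MOS value itself is taken as an input, and the averaging over MECS and backhaul users is omitted. Function names and numbers are hypothetical.

```python
def download_percentage(rate: float, service_time: float, file_size: float) -> float:
    """Fraction of the file downloaded within the service time, capped at 1."""
    if file_size <= 0:
        raise ValueError("file size must be positive")
    return min(1.0, rate * service_time / file_size)


def adp_weighted_score(adp: float, mv_mos: float) -> float:
    """Assumed weighting: scale the opinion score by the completed fraction."""
    return adp * mv_mos


# Hypothetical example: a high-bitrate replica scores 4.5 but only half of it
# arrives, while a low-bitrate replica scores 3.0 and arrives completely.
high = adp_weighted_score(download_percentage(10.0, 20.0, 400.0), 4.5)
low = adp_weighted_score(download_percentage(10.0, 20.0, 100.0), 3.0)
print(high, low)
```

In this example the low-bitrate replica obtains the higher weighted score, which is the effect the weighting is meant to capture: a high opinion score is of little value if the download cannot finish inside the coverage area.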

ADP Weighted Mobile Video Mean Opinion Score
In this section, we focus on the problem of intelligent caching for mobile video streaming in the IoV. In conventional mobile networks, the mobile user's location is considered static, so the corresponding QoE evaluation method cannot be applied to the dynamic scenario of the vehicular network. To adapt to the fast mobility of vehicles, we formulate the problem of mobile video caching in the IoV as an optimization problem under multiple constraints that maximizes the ADP weighted MV-MOS. The objective function can be formulated as

where (39b) is the restriction on multiple bitrate copies: in the same time slot, the MECS can cache only one replica of the same video content. (39c) is the restriction on the storage capacity of the MECS, (39d) is the transmission rate constraint, and (39e) is the bandwidth constraint. The optimization problem (39a) is a dynamic optimization problem under multi-dimensional constraints.

Average Mobile Video Mean Opinion Score
For better experimental comparison, this section formulates a sub-problem P1 of the optimization problem (39a): the ADP weighting term is stripped from (39a), so that the optimization objective is replaced by Formula (33). The resulting multi-constraint problem can be formulated as

P1 is thus a simplification of P: its constraints are the same as those of P, and only the optimization objective differs. Hence, the parameter definitions and constraints of (40) are the same as those of (39).

Deep Reinforcement Learning-Based Caching Solution
In this section, the video content caching problem is formulated as an MDP. To solve this complex problem, we propose a novel DQN-based caching scheme that achieves the maximum ADP weighted MV-MOS. Furthermore, we improve the convergence speed of the DQN and enhance the performance of the proposed algorithm. The MDP can be represented by the 4-tuple <S, A, E, R>, where S is the set of environment states, A denotes the set of agent actions, E represents the state transition probability, and R indicates the reward function.

Markov Decision Process
At the beginning, the agent observes the environment information. The environment state is represented as

The action a(t_y^x) is represented as

After an action a(t_y^x) is taken, the reward R(s(t_y^x), a(t_y^x)) is represented as

where ω(t_y^x) ∈ (0, 1] is the discount factor. Hence, the optimal caching strategy π* in the IoV is expressed as

In summary, the agent perceives the environment state s(t_y^x), then selects and executes an action a(t_y^x), after which the system environment feeds back an immediate reward R(s(t_y^x), a(t_y^x)), represented as

Otherwise, the system environment feeds back a cost (e.g., −1).

DRL-Based Caching Algorithm
The high mobility of vehicle users causes the environmental information in vehicular networks to change dynamically. To cope with this challenge, we consider an algorithm that combines deep learning and reinforcement learning to handle the excessive state and action spaces of a dynamic environment and thereby solve the complex caching problem. In particular, the update rule of the DQN does not require knowledge of the transition and reward functions. Therefore, we propose the P3DVQC to solve the caching problem. The update formula for the parameters of the P3DVQC main network is expressed as

where α is the learning rate, γ is the discount factor, and ∇ is the gradient operator. The P3DVQC integrates prioritized experience replay (PER) technology, which achieves priority sampling by changing the sampling distribution, to improve the performance of the DQN. The P3DVQC sampling weight formula is expressed as

where ρ_i^PER is the priority parameter of sample i in the experience pool, which can be expressed as

where ε is the disturbance coefficient and E_TD(t_{y+1}^x) is the distance between the output and the target value for the transition <s(t_y^x), a(t_y^x), R(t_y^x), s(t_{y+1}^x)>. It is calculated as

where θ(t_y^x) is the parameter of the P3DVQC main network, θ^−(t_y^x) is the parameter of the P3DVQC target network, and |·| is the absolute value operation. In addition, the P3DVQC integrates dueling technology to improve the deep Q-learning algorithm. The framework of the proposed P3DVQC is shown in Figure 3. The target network output function is expressed as

where α(t_y^x) is the parameter of the advantage function A(s(t_y^x), a(t_y^x); θ(t_y^x), α(t_y^x)) and β(t_y^x) is the parameter of the value function V(s(t_y^x); θ(t_y^x), β(t_y^x)). The pseudocode of the proposed DRL-based caching algorithm is provided in Algorithm 1.
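The three DQN extensions named above can each be illustrated in a few lines. This is a sketch of the commonly used textbook forms, which may differ in detail from the formulas of the P3DVQC: proportional prioritization (priority equal to the absolute TD error plus a small constant, raised to an exponent and normalized), the dueling aggregation Q = V + (A − mean A), and the double DQN target, in which the main network selects the next action and the target network evaluates it. Function names, the `exponent` parameter, and the numeric values are hypothetical.

```python
def per_probabilities(td_errors: list[float], epsilon: float = 0.01,
                      exponent: float = 0.6) -> list[float]:
    """Sampling probabilities proportional to (|TD error| + epsilon) ** exponent."""
    priorities = [(abs(e) + epsilon) ** exponent for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def dueling_q_values(state_value: float, advantages: list[float]) -> list[float]:
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float, q_main_next: list[float],
                      q_target_next: list[float], done: bool = False) -> float:
    """Target value: main network picks the action, target network scores it."""
    if done:
        return reward
    best_action = max(range(len(q_main_next)), key=q_main_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical example values.
print(per_probabilities([0.5, -2.0, 0.1]))
print(dueling_q_values(1.0, [1.0, 3.0]))
print(double_dqn_target(1.0, 0.9, [0.2, 0.9], [4.0, 2.0]))
```

Decoupling action selection from action evaluation in the target is what reduces the over-estimation of a plain DQN, while prioritization makes transitions with large TD errors more likely to be replayed.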

Algorithm 1 Prioritized experience replay Dueling Double DQN IoV QoE Caching (P3DVQC)
1. Initialize:
  s(t_y^x) ∈ S, a(t_y^x) ∈ A, α, γ, the minibatch size G, and the memory pool size N
  Parameters θ(t_y^x) and θ^−(t_y^x)
  Parameters E_TD(t_y^x)
  Parameters V(s(t_y^x); θ(t_y^x), β(t_y^x)) and A(s(t_y^x), a(t_y^x); θ(t_y^x), α(t_y^x))
2. Learning:
  for t_y^x ∈ K_c do
    Choose π(t_y^x) by the ε-greedy policy
    Observe the environment, evaluate and estimate s(t_y^x)
    Perform a(t_y^x) according to π and observe the feedback
    while t_{y,z}^x ∈ T_s do
      Observe: environment and communication state
    if ERP_t > N_max then
      for j ∈ N_F do
        Calculate: ρ_i^PER, sort by SumTree, and sample <s(t_y^x), a(t_y^x), R(t_y^x), s(t_{y+1}^x)>
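The ε-greedy selection step used in the learning loop of Algorithm 1 can be sketched as below. With probability ε the agent explores by picking a random action; otherwise it exploits the action with the largest estimated Q-value. The function name and the example Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical example: three candidate caching actions.
rng = random.Random(0)
q = [0.1, 0.9, 0.3]
print([epsilon_greedy(q, 0.2, rng) for _ in range(5)])
```

In practice ε is often decayed over the training slots, so that the agent explores broadly at first and then settles on the learned policy.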

Numerical Results and Discussion
The simulations of the caching scheme are carried out in this section, and the performance of the P3DVQC is compared with baseline schemes. The simulation environment for the mobile video caching system in the IoV was programmed in Python. In addition, the TensorFlow platform was used to implement the P3DVQC caching scheme, based on an open-source convolutional neural network package. The main system parameters used in the simulations are summarized in Table 1. For performance comparison, five benchmark schemes are considered:
(1) Resolution Optimal Caching Scheme (ROCS): The scheme always gives priority to caching the video copies with the highest bitrate level, to realize the rapid delivery of high-quality video content and improve the quality of experience of vehicle users.
(2) Fluency Optimal Caching Scheme (FOCS): The scheme caches the video copies with the lowest bitrate so that as many videos as possible are cached, which increases the diversity of the cache and improves the cache hit rate.
(3) Random Caching Scheme (RCS): The scheme selects the video copies randomly, that is, each copy is chosen for caching with equal probability until the maximum cache capacity of the MECS is reached.
(4) Cost Efficient Scheme (CES): The scheme depends only on the backhaul network for the delivery of video files and does not use MECS caching equipment, to minimize device cost and energy consumption.
(5) Brute Force Scheme (BFS): The scheme is optimal, but it is obtained under ideal conditions in which all system information is known. In actual scenarios, system information such as wireless channel state information cannot be obtained in advance. Therefore, the BFS cannot be carried out in real vehicular network scenarios.

Figure 4 shows the performance of the two proposed DRL-based caching algorithms. One is the deep Q-network IoV caching algorithm (DVQC), which is based on the traditional DQN, and the other is the P3DVQC algorithm, which integrates the prioritized experience replay, dueling, and double deep Q-network technologies. The size of the experience replay pool is 3000 in the simulation. Figure 4 also shows that the P3DVQC converges after nearly 5000 time slots, while the DVQC converges after about 6000 time slots. This is because the P3DVQC employs the PER technology, which improves the training efficiency of the deep neural network. In addition, the P3DVQC is more stable than the DVQC once converged. This is because the P3DVQC integrates the DDQN and dueling technologies, which effectively eliminate the over-estimation of the DQN and avoid unnecessary misselections, so that the algorithm converges rapidly to the optimal caching strategy.

Figure 5 shows the convergence performance comparison between the P3DVQC and the DVQC. The first 3000 time slots are the initialization stage of the experience replay pool. After 3000 time slots, the P3DVQC achieves a faster convergence rate than the DVQC. This is because the P3DVQC employs the PER to weight the experience pool samples by their TD errors, which makes the training of the deep neural network more effective. The P3DVQC algorithm determines the priority of the samples selected for training according to the size of their TD errors, which improves the training effect and speeds up convergence. In contrast, the DVQC algorithm selects training samples through a uniform sampling strategy, so its convergence is slower than that of the P3DVQC.
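The three cache-filling baselines described above (ROCS, FOCS, and RCS) can be sketched as simple placement routines. The sketch assumes that videos are indexed in descending order of popularity, that bitrate levels are indexed from highest to lowest, and that the random scheme keeps at most one copy per video and stops adding copies once nothing else fits. The function names and example sizes are hypothetical.

```python
import random


def fixed_level_cache(num_videos: int, level: int, sizes: list[float],
                      capacity: float) -> list[list[int]]:
    """Cache one bitrate level for videos in popularity order until the cache is full."""
    cache = [[0] * len(sizes) for _ in range(num_videos)]
    used = 0.0
    for i in range(num_videos):
        if used + sizes[level] > capacity:
            break
        cache[i][level] = 1
        used += sizes[level]
    return cache


def rocs(num_videos: int, sizes: list[float], capacity: float) -> list[list[int]]:
    """Resolution-optimal baseline: always cache the highest bitrate (level 0)."""
    return fixed_level_cache(num_videos, 0, sizes, capacity)


def focs(num_videos: int, sizes: list[float], capacity: float) -> list[list[int]]:
    """Fluency-optimal baseline: always cache the lowest bitrate (last level)."""
    return fixed_level_cache(num_videos, len(sizes) - 1, sizes, capacity)


def rcs(num_videos: int, sizes: list[float], capacity: float,
        rng: random.Random) -> list[list[int]]:
    """Random baseline: try copies in uniformly random order while they fit."""
    cache = [[0] * len(sizes) for _ in range(num_videos)]
    candidates = [(i, j) for i in range(num_videos) for j in range(len(sizes))]
    rng.shuffle(candidates)
    used = 0.0
    for i, j in candidates:
        if any(cache[i]) or used + sizes[j] > capacity:
            continue  # keep one copy per video and respect the capacity
        cache[i][j] = 1
        used += sizes[j]
    return cache


sizes = [80.0, 40.0, 20.0]  # hypothetical replica sizes, highest bitrate first
print(rocs(4, sizes, 100.0))
print(focs(4, sizes, 100.0))
```

With these example sizes, the highest-bitrate scheme fits a single video while the lowest-bitrate scheme fits four, which illustrates the trade-off between video quality and cache hit rate.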
Figure 6 shows the average MV-MOS performance comparison between the P3DVQC scheme and the five baseline schemes. The BFS is the ideal upper bound of the experiment. It is obtained by traversing the entire solution space and requires knowledge of all the system information, so it incurs a large time overhead and can hardly be realized in actual scenarios. The CES scheme does not use any caching technology, so its performance is the lowest. The RCS chooses the cached video content randomly, which makes its performance very unstable. The FOCS caches only the lowest-bitrate videos. Because the lowest-bitrate video files are small, the FOCS can cache a greater variety of video content, but it cannot effectively provide high-definition video. The ROCS caches as many high-bitrate HD videos as possible, so vehicle users can achieve a high average MV-MOS if the required HD video is pre-cached in the MECS. However, high-bitrate video content is larger, so the higher storage overhead leads to a lower cache hit rate and lower performance. The P3DVQC proposed in this paper comprehensively considers the multiple-bitrate video quality, the backhaul bandwidth, and the caching capacity to maximize the objective. Therefore, the caching strategy obtained by the P3DVQC caches a mixture of low and high bitrates. Figure 6 shows that the proposed scheme converges to the upper bound and that its performance is better than that of the other benchmark algorithms.

Figure 7 shows the ADP weighted MV-MOS performance of the different caching schemes. The ROCS scheme achieves poor ADP weighted MV-MOS performance, almost the same as that of the CES algorithm. This is because a video's bitrate is directly proportional to its file size: a high bitrate means a large storage cost.
Hence, when the service time is fixed, the ROCS can complete only a small proportion of each download task, so vehicle users enjoy the high-definition video for only a short time within the service area, and playback is interrupted as soon as the vehicle leaves the caching service area. Therefore, a model based only on the average MV-MOS cannot be directly applied to vehicular networks. Figure 7 also shows that the ADP weighted MV-MOS performance of the P3DVQC converges to that of the optimal BFS and is better than that of the other benchmark algorithms.

The BFS is a straight line in Figure 7, so its cumulative gain is meaningless. To increase the caching hit rate, the FOCS caches only the lowest-bitrate videos, so its solution space is much smaller than that of the P3DVQC and the cumulative return it obtains in the early stage is larger. However, as the P3DVQC converges to the optimal strategy, it achieves a greater cumulative return than the FOCS. The other benchmark algorithms can be assessed from the analysis of Figure 7; their performance is lower than that of the P3DVQC.

Conclusions
In this paper, a DRL-based P3DVQC algorithm is proposed to solve the mobile video caching problem while achieving the maximum ADP weighted MV-MOS. The numerical results show that, compared with the other benchmark schemes, the proposed algorithm converges faster and achieves significantly better performance.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: