Distance-Based Resource Allocation for Vehicle-to-Pedestrian Safety Communication

Abstract: Cellular Vehicle to Everything (V2X) has redefined the vehicular communication architecture as one that requires an ultra-reliable link, high capacity, and fast message delivery in vehicular networks. The V2X scenarios are broadly categorized as Vehicle to Vehicle (V2V), Vehicle to Infrastructure (V2I), Vehicle to Pedestrian (V2P), and Vehicle to Network (V2N). Vulnerable pedestrians belong to the V2P category and hence require an ultra-reliable link and fast message delivery when a moving vehicle is in close proximity of the pedestrian. However, congestion in the network calls for optimized resource allocation that allows a fast and secure connection between a vehicle and the pedestrian. In this paper, we propose a distance-based resource allocation that classifies the pedestrians into different categories, performs a one-to-many weighted bipartite matching, and finally applies a reinforcement learning based power allocation.

According to the survey, V2X communication is still in its preliminary stage and requires extensive research in the areas of RRM, effective handoff mechanisms, and reliable channel state information.
The V2X technology is broadly categorized into V2V, V2I, V2P, and V2N scenarios. The V2V scenario considers two vehicles in the vicinity of each other that share safety information in order to avoid accidents or to alert the other vehicle about traffic jams. The V2I scenario is simply a vehicle communicating with the infrastructure. Like V2V, the V2P scenario is intended for safety communication between a vehicle and a pedestrian, whereas V2N refers to a vehicle accessing the network for cloud-based services. V2V and V2I require ultra-reliability and high capacity, respectively. Other aspects of V2X communication, such as vehicle-to-sensors (V2S), vehicle-to-grid (V2G), and vehicle-to-human (V2H), are studied in [4] as emerging technologies in current R&D.
V2P is one of the most crucial scenarios, as it concerns vulnerable road users (VRUs), i.e., users that are in close proximity of a moving vehicle and are prone to an accident. Pedestrians on the verge of crossing a road, a cyclist nearby or on the road, and pedestrians on a busy street all qualify as VRUs. A recent survey on VRUs in [5] shows the increasing number of fatalities in different countries. The author classifies the VRUs into categories such as cyclists, motorized two-wheelers, and pedestrians. Furthermore, the author performs a case study of pre-crash scenarios under different technologies.
While most of the research considers scenarios that include V2I and V2V users, we aim to present a roadside scenario that specifically focuses on the V2P links. In our scenario, vehicular user equipment (VUE) shares resources with cellular user equipment (CUE). A VUE and a CUE form a D2D pair when they are within a certain range of each other; this pair is referred to as V2P. A V2P link, similar to a V2V link, is intended for highly reliable safety communication. However, our aim here is to categorize V2P users based on their vulnerability using a distance-based classification, and then allocate resources to them accordingly. Unlike traditional network scenarios, our scenario focuses only on the safety communication of VRUs. A user in the vicinity of the D2D range should receive information regarding the approaching vehicle. The feasible range for D2D communication is studied in [6]. In this paper, we have taken the range to be 50 meters, which means that any user lying within a 50 m radius of another user can engage in D2D communication.
In this paper, we propose a distance-based resource allocation scheme in a D2D-based V2P scenario. Our major contributions are:

• Optimum power allocation using Q-learning based reinforcement learning with different discount factors.
• A one-to-many weighted bipartite matching scheme (with maximum flow) to create connections between users for frequency assignment.
The rest of the paper is organized as follows: Section 2 reviews current research trends in this area. Section 3 describes the network and system model. Section 4 presents the classification of the VRUs. Section 5 covers optimal power assignment based on the classification of users in Section 4. Section 6 introduces a one-to-many bipartite matching scheme. Section 7 presents the simulation setup and results. Finally, Sections 8 and 9 discuss future directions and conclusions, respectively.

Related Work
Keeping in mind the differentiated quality of service (QoS) of the different communication scenarios, resource allocation among users becomes a major milestone. Resource allocation in LTE is the distribution of resources like power and frequency among the users, such that the method is optimum and solves the network congestion problem. A survey on radio resource allocation for V2X communication [7] thoroughly investigated the communication modes and radio resource management techniques, and discussed different methodologies. Resource allocation for D2D-based vehicular communication has been deeply researched in the recent past, with techniques such as traditional optimization [8], matching [9], graph theory [10], and machine learning [11] being used to investigate the performance of the network under varying scenarios. A resource allocation based on an immune algorithm for D2D-based vehicular networks has been investigated in [12], where the authors incorporated an adaptive cloning and population update strategy to provide a highly efficient method for solving resource sharing among vehicles. Power control using a distributed deep deterministic policy gradient method [13] introduced two models to solve a multi-agent energy-efficient power allocation problem. The authors proposed a model that employs neural networks to overcome problems with existing approaches.
Resource allocation with low-latency vehicular communication has been studied by several authors. Low latency with packet retransmission is investigated in [14], where the authors presented a queuing analysis and then derived an expression for the average packet sojourn time. A twin-timescale scheme for low-latency vehicular networks [15] aimed at reducing the maximum transmission latency through a two-stage process: first minimizing the worst-case transmission latency, and then having the base station allocate the total power at a short-term timescale. The idea of using a twin timescale is to provide a more realistic approach that avoids frequent exchange of near-instantaneous channel state information. A hybrid strategy, whereby cellular-based V2V communication is introduced into IEEE 802.11p-based vehicular networks [16] to improve network latency, has also been studied.
Matching and graph theory are among the most researched areas in D2D-based vehicular communication. Both techniques have primarily been used for frequency assignment problems. Frequency resource blocks (RBs) are shared among users in the cellular network. In order to avoid communication impairment due to interference, the RBs are assigned in an optimum manner. Graph theory-based resource assignment has been studied in both two and three dimensions. Interference is usually taken as the metric that defines the edges between the vertices in a graph. Interference hypergraph-based 3D matching resource allocation for V2X [17], using a weighted 3-partite interference hypergraph with greedy and iterative matching, has been shown to improve the network throughput.
Dynamic proximity-aware resource allocation in V2V communications [18] employs zone formation based on vehicular traffic patterns. A matching game is then proposed to allocate resources to V2V pairs within each zone. The zone formation reduces interference and signaling overhead and hence satisfies the quality of service in terms of SINR. Distance-based power control schemes have been employed using stochastic geometry [19] in a D2D setting with distance-dependent path loss parameters. Location information has been used in [20] to monitor the QoS requirements and to dynamically modify the resource assignment of interfering links.
Most of the research has focused on creating a network scenario with V2V and V2I users, where the problem is formulated using an objective function and constraints. The constraints usually include power, latency, SINR, and outage probability. The formulated problem is then solved using the techniques mentioned above. While matching and graph theory have specifically been used for frequency assignment problems, other optimization techniques, along with machine learning, have been used to solve a wide variety of resource sharing problems.

Network and System Model
We consider a single-cell network scenario with one evolved Node B (eNB) in the center of the cell, a road that runs through the middle of the cell, vehicular users on that road, and cellular users distributed randomly throughout the cell. Let c = {1, . . . , C} be the set of CUEs involved in direct communication with the infrastructure and let v = {1, . . . , V} be the set of V2P users (each a pair consisting of a cellular and a vehicular user).

Network Architecture
The network architecture consists of a vehicle surrounded by several road-side users. The vehicle is involved in direct D2D communication with the road users in the vicinity (V2P). In this network, we also have some cellular users involved in direct communication with the infrastructure. Figure 1 shows the vehicle with road-side users. If the distance between a cellular user and the vehicle is more than 50 meters, the cellular user is involved in direct communication with the infrastructure (CUE). When the distance becomes 50 meters or less, V2P communication starts.

System & Channel Model
Since the vehicle is assumed to be moving, it will encounter both small- and large-scale fading and will therefore have a different channel model compared to a cellular user, which is static or moving very slowly. The SINR of the two types of users is given by Equations (1) and (2):

$$\mathrm{SINR}_{c}^{x} = \frac{P_c\, h_{c,B}}{\sigma^2 + P_v\, h_{v,c}} \qquad (1)$$

$$\mathrm{SINR}_{v}^{x} = \frac{P_v\, h_{v}}{\sigma^2 + P_c\, h_{c,v}} \qquad (2)$$

where P_c and P_v are the transmit powers of the CUE and the V2P transmitter, respectively, and σ² is the noise power. The channel power gains are denoted by h in Equations (1) and (2). The index 'x' denotes the frequency resource block (RB) shared among users. We assume that the RB of a V2P user can be shared by a CUE, but resource blocks cannot be shared between VUEs. The gain h_{c,B} is the desired channel gain from a CUE to the base station, h_{v,c} is the interference gain from the D2D pair to the CUE, h_v is the desired gain of the D2D-based V2P pair, and h_{c,v} is the interference gain from the CUE to the D2D pair. The channel gain, in turn, depends on the path loss, shadowing, distance, and small-scale fading component, as given by Equation (3):

$$h = g \cdot Y \cdot PL \cdot \ell^{-\gamma} \qquad (3)$$

where PL is the path loss, Y is the log-normal shadowing component, ℓ is the distance between transmitter and receiver, γ is the decay exponent, and g is the small-scale fading gain. The path loss models for V2P and CUE are given in Table 1.
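As a numerical illustration, the sketch below evaluates the standard underlay SINR expressions (desired power over noise-plus-interference) in the spirit of Equations (1)-(3). The function and parameter names are our own, and the values plugged in are illustrative, not the simulation parameters of Table 4.

```python
def channel_gain(pl_db, shadow_db, dist_m, gamma, g=1.0):
    """Channel power gain per Equation (3): path loss (dB), a log-normal
    shadowing sample (dB), distance decay, and small-scale fading gain g."""
    pl = 10 ** (-pl_db / 10)          # path loss as a linear gain
    shadow = 10 ** (shadow_db / 10)   # shadowing sample as a linear factor
    return g * shadow * pl * dist_m ** (-gamma)

def sinr_cue(p_c, h_cB, p_v, h_vc, noise):
    """Equation (1): CUE SINR at the base station, interfered by the
    V2P pair sharing the same resource block."""
    return (p_c * h_cB) / (noise + p_v * h_vc)

def sinr_v2p(p_v, h_v, p_c, h_cv, noise):
    """Equation (2): V2P-link SINR, interfered by the CUE."""
    return (p_v * h_v) / (noise + p_c * h_cv)
```

All quantities are in linear scale; dB inputs are converted inside `channel_gain`.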


Problem Formulation
In this section, we define the optimization problem: the objective function and the constraints. Traditionally, most works [9] assign high capacity to the V2I link and high reliability to the V2V link. However, our scenario is particularly focused on the V2P links, so our aim is to provide a high-capacity and reliable link to the V2P users, while providing a minimum threshold SINR to the CUEs:

$$\max_{P_v} \sum_{v=1}^{V} \log_2\!\left(1 + \mathrm{SINR}_v\right) \quad \text{s.t.} \quad \mathrm{SINR}_c \geq \mathrm{SINR}_{th} \;\; \forall c, \qquad 0 \leq P_v \leq P_{max} \qquad (4)$$

Equation (4) shows the objective function along with its constraints. In the next section, we further decompose Equation (4) into three different parts, one for each category of user, while all the CUEs get a uniform power depending on the threshold SINR. The maximum allowed power is 23 dBm. Table 2 summarizes the list of symbols with their respective definitions.

Classification of Vulnerable Road Users
In this section, we classify the V2P users into three different clusters, as shown in Table 3. Type 1 (very critical) V2P users are in the most crucial state, which requires an ultra-reliable link; Type 2 (critical) users are slightly less vulnerable; and Type 3 (normal) users are the least vulnerable. The classification is distance dependent (the distance between the vehicle and the cellular user). A 50-meter distance initiates the V2P communication, and thereafter the users are classified accordingly. In the next two sections, we will use this classification for resource allocation. In Figure 2, we can see the V2P users classified into three types, depending on their vulnerability. Users involved in direct communication with the infrastructure are referred to as CUEs. The shortest distance between two users is found using their coordinates during simulation.
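The distance-based classification can be sketched as a simple threshold rule. The 50 m D2D range is taken from the paper; the 10 m and 25 m type boundaries below are assumed placeholders, since the exact cut-offs live in Table 3.

```python
def classify_vru(distance_m, d2d_range=50.0, very_critical=10.0, critical=25.0):
    """Classify a road user by its distance to the vehicle.
    The 10 m / 25 m boundaries are assumed stand-ins for Table 3."""
    if distance_m > d2d_range:
        return "CUE"      # outside D2D range: ordinary cellular user
    if distance_m <= very_critical:
        return "Type 1"   # very critical: ultra-reliable link needed
    if distance_m <= critical:
        return "Type 2"   # critical
    return "Type 3"       # normal V2P user
```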
In a real-life scenario, a number of methods can be used to calculate the distance between devices. The first and foremost is the global positioning system (GPS), which can be used to find the coordinates. However, the accuracy of this method is still low, especially when we talk about safety communication. Secondly, the received signal strength indicator (RSSI) can be used to estimate the distance between two devices. The author in [22] has used the RSSI-based technique, assuming the absence of GPS, to find the distances between vehicles for short-range communication. In our scenario, the distance is calculated between two users once they enter the D2D range. User equipment (UE) location reporting is defined in the 3GPP TS 23.303 standard, where a UE reports its location on a periodic basis to its corresponding server [23]. This information is available at the network level.
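As an illustration of RSSI-based ranging, the log-distance path-loss model can be inverted to recover a distance estimate. This is a generic textbook model, not necessarily the exact scheme of [22], and the reference power and path-loss exponent below are assumed, environment-specific values.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-40.0, path_loss_exp=2.7):
    """Estimate transmitter-receiver distance from RSSI via the
    log-distance model: RSSI = P0 - 10*n*log10(d/d0), with d0 = 1 m.
    P0 (the RSSI at 1 m) and the exponent n are assumed values."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))
```

For example, with these assumed parameters an RSSI of -67 dBm maps to roughly 10 m.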

Reinforcement Learning Based Power Allocation
Reinforcement learning (a type of machine learning) has recently been used in the domain of resource allocation in wireless networks. Q-learning, a subtype of reinforcement learning, has found its way into wireless communication problems. In the D2D communication domain specifically, the author in [24] has used deep reinforcement learning for allocating resources to vehicles in a network scenario composed of V2V links with stringent latency requirements and V2I links with high capacity requirements. The author in [25] has used a distributed reinforcement learning method, where each agent in the network keeps its own table of Q-values.
Q-learning involves agents, actions, states, and rewards. The learning in this method is based on interaction with the environment. It is represented by a tuple < S, A, T, R(s, a) >, where S is the set of states in the environment, A is the set of actions, T is the state transition probability, and R(s, a) is the reward function.

Learning Process
An agent interacts with the environment by sensing the current state and taking an action (according to a policy). This results in the environment transitioning to a new state, and the agent is rewarded accordingly. The process is repeated until an optimal policy is obtained. The learning process under Q-learning [26] is given by Equations (5)-(10).
Under a policy π, a state is given by Equation (5):

$$V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \eta^{t} R(s_t, a_t)\right] \qquad (5)$$

where η is the discount factor, with a value ranging between 0 and 1. The discount factor helps to determine the future reward: a discount factor of 0 makes the agent strive only for current rewards, whereas a discount factor near 1 makes the agent strive for long-term rewards. V^π(s) is the expected discounted reward. There exists at least one optimal policy π*, such that:

$$V^{\pi^{*}}(s) = \max_{\pi} V^{\pi}(s), \quad \forall s \in S \qquad (6)$$

and the optimal action-value function Q*(s, a) is given by:

$$Q^{*}(s, a) = R(s, a) + \eta \sum_{s'} T(s, a, s') \max_{a'} Q^{*}(s', a') \qquad (7)$$

and so we get:

$$V^{*}(s) = \max_{a} Q^{*}(s, a) \qquad (8)$$

$$\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a) \qquad (9)$$

The update rule that drives the Q-learning algorithm (by adjusting the Q-values) is given by:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R(s, a) + \eta \max_{a'} Q(s', a') - Q(s, a) \right] \qquad (10)$$

Here, α denotes the learning rate, with values between 0 and 1. A learning rate of 0 means that the agent learns nothing, while a learning rate of 1 means the agent only considers the most recent information. The update rule in Equation (10) is applied at each step taken by an agent on its way to a terminal state. The Q-values are adjusted based on the difference between the discounted new values and the old values; the discount factor η discounts the new values, and the step size is controlled by the learning rate α.

Proposed Methodology
In this section, we solve the optimization problem presented in Section 3 using the distance-based classification of Section 4 to find the optimum power. A decentralized approach, where each user maintains its own Q-value table, has been used, since a centralized approach would cause excessive overhead [23]. First, we need to define the states, agents, actions, and reward function in our scenario.
Agent: The V2P users act as agents. We have V users engaged in V2P communication; therefore, there are V agents in the network, indexed 1 ≤ v ≤ V.
State: User v is in a state S on resource block x at time t with an interference level of I_t^x:

$$S_t^{v} = \left\{ I_t^{x} \right\} \qquad (11)$$

The interference level is bounded by:

$$I_t^{x} \leq \frac{P_c\, h_{c,B}}{\mathrm{SINR}_{th}} - \sigma^2 \qquad (12)$$

The equation above guarantees the minimum required SINR for the cellular users. Action: The actions consist of the discrete uplink power levels of the V2P users:

$$A = \left\{ p_1, p_2, \ldots, p_n \right\} \qquad (13)$$
where n is the number of available power levels. Reward Function: The reward function consists of our objective function, as defined in Section 3.
Learning Rate and Discount Factors: We have used a constant learning rate α = 0.5, and three different discount factors (η = 0.8, 0.5, 0.3) based on the vulnerability of the users.
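Putting the pieces together, a minimal tabular Q-learning update (the rule of Equation (10)) with the learning rate and the vulnerability-dependent discount factors above might look as follows; the state and action encodings are toy stand-ins for the interference levels and power levels.

```python
DISCOUNT = {"very_critical": 0.8, "critical": 0.5, "normal": 0.3}
ALPHA = 0.5  # constant learning rate used in the paper

def q_update(q, state, action, reward, next_state, actions, eta):
    """One tabular Q-learning step, Equation (10):
    Q(s,a) <- Q(s,a) + alpha * (r + eta * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + eta * best_next - old)

# toy illustration: one update for a "very critical" V2P agent whose
# actions index a discrete set of uplink power levels
actions = [0, 1, 2]
q = {}
q_update(q, state=0, action=1, reward=1.0, next_state=0,
         actions=actions, eta=DISCOUNT["very_critical"])
# q[(0, 1)] is now 0.5 * (1.0 + 0.8 * 0.0) = 0.5
```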

One-to-Many Bipartite Matching
In the last section, distance-based optimum power was assigned to the users. However, since the users share the uplink frequency resource blocks, interference can degrade the performance of the system. Therefore, we aim to perform a pair-matching technique that can mitigate the interference. Matching has been used extensively in various wireless network problems [27], mostly in problems related to resource sharing. Most matching problems are two-dimensional and employ one-to-one matching, like the Hungarian algorithm used in [9], while some authors have used three-dimensional matching for resource allocation in more complex problems [28,29]. In this paper, we use a variant of bipartite matching known as one-to-many bipartite matching with maximum flow. The motivation behind using one-to-many bipartite matching is inspired by the coalition formation for multi-robot tasks in [30], where a single task is assigned to multiple robots. We have taken the key idea from there to create connections and assign weights.

Graph Structure
A single vehicle communicates with multiple cellular users. For the sake of simplicity, we assume that one vehicle can communicate with up to five cellular users at a given time. Let V = {v1, v2, . . . , vn} be the set of vehicles and C = {c1, c2, . . . , cn} be the set of cellular users. A cellular user is connected to a vehicle when it enters a 50 m radius around the vehicle, forming an edge. Let G = ({X, Y}, E, W) be a bipartite graph with X vertices corresponding to cellular users and Y vertices corresponding to vehicles. E is an edge set that consists of pairs with one vertex from each of X and Y. The weight W of an edge is defined by the distance between the vehicle and the cellular user. A cellular user is allocated to at most one vehicle at a time. Our main objective is to assign multiple edges to a single vertex (vehicle). The idea is illustrated in Figure 3: the vertex v1 is shared by c1, . . . , c5, and the edges between the vehicle and the cellular users are denoted by the blue lines.
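A sketch of the graph construction, assuming 2D coordinates for users and the paper's 50 m D2D range; the weight convention (closer user → larger weight) is our reading of "max weight (lowest distance)".

```python
import math

def build_bipartite(cues, vehicles, d2d_range=50.0):
    """Build edge weights W(C, V) for the bipartite graph: an edge exists
    when a CUE lies within the D2D range of a vehicle, and closer users
    receive a larger weight (assumed convention)."""
    weights = {}
    for ci, (cx, cy) in enumerate(cues):
        for vi, (vx, vy) in enumerate(vehicles):
            d = math.hypot(cx - vx, cy - vy)
            if d <= d2d_range:
                weights[(ci, vi)] = d2d_range - d  # closer -> heavier edge
    return weights

# one vehicle at the origin; only the CUE 10 m away gets an edge
w = build_bipartite([(0, 10), (0, 60)], [(0, 0)])
```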

Algorithm
The algorithm takes a bipartite graph W(C,V) as input. The output is an array that maps a vehicle to its cellular users. For each available V, a C is connected to that V (if the vehicle has fewer than 5 connections). A group is complete once a V connects to five Cs at a given time. We decomposed the bipartite matching problem by converting it into a maximum flow problem using the greedy Ford-Fulkerson maximum flow method [31]. The pseudo code is shown in Algorithm 1.

Algorithm 1. One-to-many bipartite matching with maximum flow.
Procedure: createConnections
Inputs: W(C,V): distance relation between all V and C
  for each C unit do
    CALL "assignVeh(C)" to connect C with optimum V unit
Procedure: assignVeh(C)
  for each available V (W(C,V) > 0 and V not visited) do
    if Conn_V < min_connection then add V to Group 1
    else if Conn_V < max_connection then add V to Group 2
    else add V to Group 3
  if a group is full and C is more vulnerable than a current member then
    replace the least vulnerable member of the group with C
  sort the members of each full group by distance
  connect C to V_optimum with max weight (lowest distance from current C)

Figure 3. One-to-many bipartite matching.

A vehicular user acts as a central controller with a W(C,V) matrix. The rows and columns of this matrix encode the distance relationship between the cellular users and the vehicular users. The matching algorithm runs on each vehicle. For every cellular user C, the assignVeh(C) function is called. A vehicle V is available (V_available) if the W(C,V) matrix has an entry greater than zero and the vehicle has not been visited before. Each available vehicle is then put into one of three groups based on the number of connections (Conn_v) it has with cellular users. If the number of connections is less than 1 (min_connection), the available vehicle is assigned to Group 1. If the number of connections is at least 1 and less than 5 (max_connection), the available vehicle is assigned to Group 2. If the number of connections is greater than or equal to 5, the available vehicle is assigned to Group 3. If a group is full but there is a cellular user C' that is more vulnerable than any of the current members of the group, then C' is assigned to that group with a replacement weight, replacing the least vulnerable user of that group. Once the vehicles are all assigned to groups, C is connected to the optimum vehicle (V_optimum). Once each group is full (maximum connections), distance-wise sorting is done, and the maximum weight is assigned to the C closest in distance to the vehicle.
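The grouping logic above can be sketched as follows. The replacement of a less vulnerable group member is omitted for brevity, and the data layout (a weight dictionary and a per-vehicle connection list) is our own choice.

```python
MIN_CONNECTION, MAX_CONNECTION = 1, 5

def assign_veh(c, weights, connections):
    """Assign cellular user c to the best available vehicle, following the
    grouping described in the text (vulnerability-based replacement omitted).
    weights[(c, v)] > 0 marks an available edge; connections maps each
    vehicle to the list of CUEs currently connected to it."""
    groups = {1: [], 2: [], 3: []}
    for (ci, v), w in weights.items():
        if ci != c or w <= 0:
            continue
        n = len(connections.get(v, []))
        if n < MIN_CONNECTION:
            groups[1].append((w, v))
        elif n < MAX_CONNECTION:
            groups[2].append((w, v))
        else:
            groups[3].append((w, v))  # full: would need replacement logic
    # prefer less-loaded vehicles; within a group, pick the max-weight edge
    for g in (1, 2):
        if groups[g]:
            _, v_opt = max(groups[g])
            connections.setdefault(v_opt, []).append(c)
            return v_opt
    return None  # all nearby vehicles are full
```

With two candidate vehicles and a heavier (closer) edge to vehicle 0, user 0 is attached to vehicle 0; a subsequent user then prefers the still-unloaded vehicle 1.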


Complexity Analysis
The motivation behind using this method was its time complexity. The Ford-Fulkerson method decomposes the bipartite matching problem into a maximum flow problem and can find a maximum matching in a bipartite graph in O(EF) time, where E is the number of edges and F is the maximum flow. Finding one augmenting path costs O(E), and at most F augmenting paths are needed, so the maximum bipartite matching can be found in O(EF) time by reducing the problem to network flow.

Simulations and Results
In this section, we evaluate the performance of the methodologies used in this paper. In order to ensure the reliability of users according to their distances, we have assigned three different discount factors in the Q-learning based power control: very critical (η = 0.8), critical (η = 0.5), and normal (η = 0.3). We evaluate the performance in terms of vehicular speed, number of vehicular users, and the threshold SINR mentioned above. Table 4 shows the simulation parameters used in our setup. The simulations were carried out in MATLAB and Python.

Vehicular Speed vs V2P Capacity
For the total V2P throughput, we take the SINR of each V2P link and calculate the sum over all V2P UEs, using the equation below:

$$C_{V2P} = \sum_{v=1}^{V} W \log_2\!\left(1 + \mathrm{SINR}_v\right) \qquad (15)$$

where W is the bandwidth of a resource block. An increasing speed of vehicles lowers the throughput of the V2P links. Figure 4 shows the performance under increasing speed. The very critical and critical pedestrians still get a higher capacity compared to a normal pedestrian. The benchmark technique used for comparison is the method proposed in [9], where the pair matching is done using the Hungarian algorithm and the optimum power is found using bisection search. We have applied that method to our scenario and compared its results with our proposed method.


Number of Vehicles vs. System Throughput under Variable SINR
The system throughput (Mbps) is calculated as the sum of the total throughput of the V2P users and the total throughput of the CUEs, given by Equation (16):

$$T_{sys} = \sum_{c=1}^{C} W \log_2\!\left(1 + \mathrm{SINR}_c\right) + \sum_{v=1}^{V} W \log_2\!\left(1 + \mathrm{SINR}_v\right) \qquad (16)$$

Cellular users with a greater SINR threshold contribute more to the overall system throughput, as evident from the formula and shown in Figure 5.
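A small sketch of the sum-rate computation behind Equation (16); the 180 kHz resource-block bandwidth is an assumption about the setup, not a value taken from Table 4.

```python
import math

def sum_throughput(sinrs, bandwidth_hz=180e3):
    """Shannon sum rate over a set of links (SINRs in linear scale).
    180 kHz is the LTE resource-block bandwidth (assumed here)."""
    return sum(bandwidth_hz * math.log2(1 + s) for s in sinrs)

def system_throughput(cue_sinrs, v2p_sinrs, bandwidth_hz=180e3):
    """System throughput: total CUE throughput plus total V2P throughput."""
    return (sum_throughput(cue_sinrs, bandwidth_hz)
            + sum_throughput(v2p_sinrs, bandwidth_hz))
```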

Number of Vehicles vs. V2P Throughput
Increasing the number of vehicles in the network improves the V2P throughput; the Q-learning algorithm performs especially well for the critical users, as shown in Figure 6.




Variable Threshold SINRs of CUEs
By varying the SINR threshold, we look at the V2P link capacity under a variable vehicular speed. At high speeds, the V2P capacity decreases. An SINR of 5 dB is therefore chosen as the base threshold, since the capacity is higher compared to the other SINR thresholds, as shown in Figure 7.


Number of VUEs and Reliability
In order to ensure the reliability of the VUEs, we define the SINR probability as the number of VUEs with an SINR of more than 5 dB divided by the total number of VUEs:

SINR Probability = (number of VUEs with SINR > 5 dB) / (total number of VUEs)

Figure 8 shows that a higher number of VUEs results in a better SINR probability.
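The reliability metric is a one-line computation; SINR values are assumed to be given in dB.

```python
def sinr_probability(sinrs_db, threshold_db=5.0):
    """Fraction of VUEs whose SINR exceeds the 5 dB threshold
    used in the paper; returns 0.0 for an empty set."""
    if not sinrs_db:
        return 0.0
    return sum(1 for s in sinrs_db if s > threshold_db) / len(sinrs_db)
```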





Discussion and Future Work
The objective of this paper was to establish a roadside scenario where vulnerable road users are classified according to their distances and allocated resources accordingly. Simulation results show that the performance of our method is efficient compared to a one-to-one matching scheme. We have aimed to show results under varying vehicular speed and number of users. Under varying vehicular speed, our method has performed better than the benchmark scheme [9]. A higher threshold SINR for cellular users can result in a better overall system throughput, but it increases the interference to the V2P links and hence decreases the capacity of these links; therefore, an SINR threshold of 5 dB was chosen. Finally, a higher number of VUEs contributes to a better SINR probability of the V2P links, which ensures a better capacity of the links. In the future, we want to extend this approach to other crucial scenarios, such as out-of-coverage or relay-assisted scenarios. The integration of this scenario with mobile edge computing (MEC) technology will bring newer challenges while overcoming the latency issues. These issues include interoperability between different standardization organizations, deployment, handover, and offloading decisions [32].
Resource allocation in D2D-based vehicular networks has seen tremendous research interest in the recent past. Methods ranging from heuristic optimization [12] to machine learning [33] have been used under varying sets of users and requirements. Machine learning is currently a hot topic in this domain: centralized vs. distributed reinforcement learning, multi-agent deep reinforcement learning [34], and other methods could provide faster convergence and better performance than traditional methods. Relay-assisted [35] and out-of-coverage scenarios [36], as well as energy-efficient resource allocation [37], are currently being researched. Resource allocation in UAV-enabled vehicular communication is another new area of research, being investigated for emergency coverage [38]. Resource allocation for virtual reality (VR) content sharing in D2D multicast communication [39] is another interesting area, where the authors have created D2D-based multicast clusters for video content sharing; this approach could be integrated into a vehicular network scenario in the future. Hybrid network architectures [40], including the already in-service DSRC along with emerging LTE V2X, could call for more complex resource management. Channel modeling for V2X needs more research due to the delay in the availability of channel state information (CSI) at the base station (caused by high vehicular speed). Moreover, interference mitigation techniques to overcome interference from adjacent frequencies and co-existing technologies need further research and consideration. Nevertheless, there is vast room for research in the area of cellular V2X, which will facilitate the implementation of advanced applications and become a major enabler for intelligent transportation systems (ITS) and autonomous driving.

Conclusions
In this paper, we have proposed a D2D-based V2P resource allocation based on the distance between vehicles and road-side pedestrians. After classifying the users on the basis of their vulnerability, we performed two separate tasks: firstly, Q-learning-based reinforcement learning for power allocation; secondly, a one-to-many bipartite matching for pair assignment. The matching allows the users to form connections based on their weights (depending on distance), and the reinforcement learning allocates the optimum power by giving a higher discount factor to the most vulnerable pedestrians. Users outside the D2D range are treated as normal cellular users with a minimum guaranteed quality of service. The results have shown that our method provides a higher throughput to the most vulnerable links while taking other factors, such as the threshold SINR, vehicular speed, and number of users, into account.