# On the Impact of the Rules on Autonomous Drive Learning

## Abstract

## 1. Introduction

## 2. Related Works

## 3. Model

#### 3.1. Road Graph

#### 3.2. Cars

#### 3.3. Drivers

#### 3.3.1. Observation

#### 3.3.2. Action

#### 3.4. Rules

#### 3.4.1. Intersection Rule

#### 3.4.2. Distance Rule

#### 3.4.3. Right Lane Rule

#### 3.5. Reward

#### 3.6. Policy Learning

## 4. Experiments

- (a) cars are kept with status $s=\mathrm{dead}$ in the road graph for ${t}_{\mathrm{dead}}$ time steps, and then are removed; and
- (b) cars are kept with status $s=\mathrm{dead}$ in the road graph for ${t}_{\mathrm{dead}}$ time steps, and then their status is changed back to $s=\mathrm{alive}$.
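
The two variants (a) and (b) differ only in what happens once a car has spent ${t}_{\mathrm{dead}}$ time steps with status $s=\mathrm{dead}$. The following is a minimal sketch of that bookkeeping, not the authors' implementation: the `Car` class, the `dead_steps` counter, the function `step_dead_cars`, and the variant names `"remove"` and `"restore"` are all hypothetical, and how a car becomes dead in the first place is left out.

```python
from dataclasses import dataclass

T_DEAD = 20  # t_dead, taken from the parameter table


@dataclass
class Car:
    # Hypothetical minimal car record: only the fields needed for this sketch.
    status: str = "alive"  # "alive" or "dead"
    dead_steps: int = 0    # time steps spent so far with status "dead"


def step_dead_cars(cars, variant, t_dead=T_DEAD):
    """Advance the dead-car bookkeeping by one time step.

    variant "remove"  -> (a): drop the car after t_dead steps.
    variant "restore" -> (b): set the car back to "alive" after t_dead steps.
    Returns the list of cars still in the road graph.
    """
    if variant not in ("remove", "restore"):
        raise ValueError(f"unknown variant: {variant!r}")
    kept = []
    for car in cars:
        if car.status == "dead":
            car.dead_steps += 1
            if car.dead_steps >= t_dead:
                if variant == "remove":
                    continue  # (a): the car leaves the road graph
                car.status = "alive"  # (b): the car resumes driving
                car.dead_steps = 0
        kept.append(car)
    return kept
```

Under variant (a) the number of cars in the road graph shrinks as collisions accumulate, whereas under variant (b) it stays constant for the whole simulation.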

## 5. Results

#### Robustness to Traffic Level

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 4.** Training results with cars removed after ${t}_{\mathrm{dead}}$ time steps. Here, we draw the training values of R, E, and C at a certain training episode, averaged over ${n}_{\mathrm{trial}}$ experiments. Solid lines indicate the mean of R, E, and C among the ${n}_{\mathrm{car}}$ vehicles, and shaded areas indicate their standard deviation among the ${n}_{\mathrm{car}}$ vehicles.

**Figure 5.** Training results with cars restored after ${t}_{\mathrm{dead}}$ time steps. Here, we draw the training values of R, E, and C at a certain training episode, averaged over ${n}_{\mathrm{trial}}$ experiments. Solid lines indicate the mean of R, E, and C among the ${n}_{\mathrm{car}}$ vehicles, and shaded areas indicate their standard deviation among the ${n}_{\mathrm{car}}$ vehicles.

**Figure 6.** Overall number of collisions in the simulation against the overall traveled distance in the simulation, averaged across simulations with the same ${n}_{\mathrm{car}}$. Each dot is drawn from the sum of the values computed on the ${n}_{\mathrm{car}}$ vehicles.

Param | Meaning | Value |
---|---|---|
${l}_{\mathrm{car}}$ | Car length | 7 |
${t}_{\mathrm{coll}}$ | Impact duration | 10 |
${t}_{\mathrm{dead}}$ | Collision duration | 20 |
${d}_{\mathrm{view}}$ | Driver’s view distance | 50 |
${v}_{\mathrm{max}}$ | Driver’s maximum speed | 50 |
${a}_{\mathrm{max}}$ | Driver’s acceleration (deceleration) | 2 |
$\Delta t$ | Time step duration | 0.2 |
$\lvert S\rvert$ | Number of road sections | 12 |
$\lvert I\rvert$ | Number of road intersections | 9 |
$w\left(p\right),p\in G$ | Number of lanes | $\in \{1,2\}$ |
$l\left(p\right),p\in S$ | Section length | 100 |
${n}_{\mathrm{car}}$ | Cars in the simulation | 40 |
$T$ | Simulation time steps | 500 |
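
For reference, the simulation parameters above can be gathered into a single configuration object. This is only a sketch under stated assumptions: the class name `SimulationParams`, its field names, and the two helper methods are hypothetical, and the table does not state units, so the derived durations are expressed in whatever time unit $\Delta t$ uses. The number of lanes is omitted because it varies per road element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParams:
    # Values copied from the parameter table; field names are illustrative.
    l_car: float = 7             # car length
    t_coll: int = 10             # impact duration, in time steps
    t_dead: int = 20             # collision duration, in time steps
    d_view: float = 50           # driver's view distance
    v_max: float = 50            # driver's maximum speed
    a_max: float = 2             # driver's acceleration (deceleration)
    delta_t: float = 0.2         # time step duration
    n_sections: int = 12         # |S|, number of road sections
    n_intersections: int = 9     # |I|, number of road intersections
    section_length: float = 100  # l(p) for every section p in S
    n_car: int = 40              # cars in the simulation
    n_steps: int = 500           # T, simulation time steps

    def simulated_time(self) -> float:
        """Total simulated time, T * delta_t."""
        return self.n_steps * self.delta_t

    def dead_time(self) -> float:
        """Time a car spends with status dead, t_dead * delta_t."""
        return self.t_dead * self.delta_t


params = SimulationParams()
```

With these values, one simulation spans $T\cdot \Delta t = 500\cdot 0.2 = 100$ time units, and a dead car stays in that state for ${t}_{\mathrm{dead}}\cdot \Delta t = 20\cdot 0.2 = 4$ time units.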

Param | Meaning | Value |
---|---|---|
${n}_{\mathrm{trial}}$ | Number of trials | 20 |
${n}_{\mathrm{train}}$ | Training iterations | 500 |
${n}_{\mathrm{car}}$ | Cars in the simulation | 40 |
$\gamma $ | Discount factor | 0.999 |

