# Ensemble-Based Hybrid Context-Aware Misbehavior Detection Model for Vehicular Ad Hoc Network

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Materials and Methods

#### 3.1. Data Collection Phase

#### 3.2. Features Derivation Phase

#### 3.2.1. Data-Centric-Based Features

#### 3.2.2. Behavioral-Based Features Set

#### 3.3. Context Representation Phase

#### 3.4. Multifaceted Vehicles Evaluation and Pre-Detection Phase

#### 3.5. Ensemble Learning Phase

#### 3.6. Decision Phase

## 4. Performance Evaluation

#### 4.1. Experimental Setup

#### 4.1.1. Sample Selection

#### 4.1.2. Simulation of Environmental Noises

#### 4.1.3. Simulation of Message Losses

#### 4.1.4. Simulation of Misbehaving Vehicles

#### 4.2. Performance Metrics

## 5. Results

## 6. Discussion

## 7. Conclusions and Future Work

## References

**Figure 1.** The proposed misbehavior detection system model (ensemble-based hybrid context-aware misbehavior detection system (EHCA-MDS)).

**Figure 5.** Pseudocode for features derivation, context reference model construction, and the set of classification rules (HCA-MDS and DCA-MDS).

**Figure 7.** Average effectiveness of EHCA-MDS, HCA-MDS, CA-MDS, ECT-MDS and MDS baseline models in terms of: (**a**) Accuracy, (**b**) False Positive Rate (FPR), (**c**) Detection Rate (DR), and (**d**) F-Measure.

**Figure 8.** Detailed comparison of EHCA-EC-MDS in terms of: (**a**) Accuracy, (**b**) False Positive Rate (FPR), (**c**) Detection Rate (DR), and (**d**) F-Measure.

DFS# | Feature | Type | Name | Description |
---|---|---|---|---|
DC1 | ${f}_{1}$ | Consistency | Latitude prediction error | The difference between the received latitude and the latitude predicted by a Kalman filter in the DSA-ABR algorithm. |
DC2 | ${f}_{2}$ | Consistency | Longitude prediction error | The difference between the received longitude and the longitude predicted by a Kalman filter in the DSA-ABR algorithm. |
DC3 | ${f}_{3}$ | Consistency | Latitude speed prediction error | Speed error towards latitude: the rate of change of the latitude prediction error. |
DC4 | ${f}_{4}$ | Consistency | Longitude speed prediction error | Speed error towards longitude: the rate of change of the longitude prediction error. |
DC5 | ${f}_{5}$ | Plausibility | Communication range | The distance between the sender and the receiver, which can be calculated using a distance function (e.g., Euclidean distance) [72]. |
DC6 | ${f}_{6}$ | Plausibility | Vehicle appearance distance | Similar to ${f}_{5}$, but calculated once, when a new vehicle enters the communication range of the subject vehicle. |
DC7 | ${f}_{7}$ | Plausibility | Overlapping frequency | The number of times the vehicle overlaps with neighboring vehicles. |
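To make the data-centric features above more concrete, the following minimal sketch computes a prediction error (received value minus predicted value), the communication-range distance ${f}_{5}$, and an overlap count in the spirit of ${f}_{7}$. It assumes planar coordinates in metres and takes the predicted value as an input instead of running a Kalman filter. The function names and the `d_min` value are hypothetical and are not taken from the paper; the overlap count here covers a single time epoch only.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float]  # (x, y) in metres; an assumption for illustration


def prediction_error(received: float, predicted: float) -> float:
    """Consistency-style feature: received value minus the predicted value.

    In the paper the prediction comes from a Kalman filter; here it is
    simply passed in as an argument.
    """
    return received - predicted


def communication_range(sender: Position, receiver: Position) -> float:
    """Plausibility-style feature: Euclidean distance between two positions."""
    return math.hypot(sender[0] - receiver[0], sender[1] - receiver[1])


def overlap_count(sender: Position, neighbors: Sequence[Position], d_min: float) -> int:
    """Count neighbors that are closer to the sender than d_min.

    d_min plays the role of the minimum accepted distance between two
    vehicles; its value is a hypothetical parameter.
    """
    return sum(1 for p in neighbors if communication_range(sender, p) < d_min)


if __name__ == "__main__":
    print(prediction_error(10.5, 10.0))
    print(communication_range((0.0, 0.0), (3.0, 4.0)))
    print(overlap_count((0.0, 0.0), [(1.0, 0.0), (10.0, 0.0), (0.0, 2.0)], d_min=2.5))
```

Accumulating `overlap_count` over successive epochs would give a frequency rather than a single-epoch count.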

BF# | Feature | Name | Description |
---|---|---|---|
BF1 | ${f}_{8}$ | Connection length | The time difference between the first received mobility message from each neighboring vehicle and the current time epoch. |
BF2 | ${f}_{9}$ | Received messages | The total number of messages that are received from each neighbor. |
BF3 | ${f}_{10}$ | Broadcasting rate | The moving average of the number of received messages divided by the connection length. |
BF4 | ${f}_{11}$ | Broadcasting delay | The moving average of the differences between the time of receiving the messages by the neighboring vehicles and their creation time in the subject vehicle. |
BF5 | ${f}_{12}$ | Jerk acceleration | The rate of change of acceleration. |
BF6 | ${f}_{13}$ | Speed deviation | The divergence between the sender’s average speed and the median speed of all neighboring vehicles. |
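The behavioral features in the table above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: timestamps are in seconds, each message is a hypothetical `(creation_time, receive_time)` pair, and plain averages are used where the table specifies moving averages. All function names are made up for this example.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def connection_length(receive_times: Sequence[float], now: float) -> float:
    """Time between the first received message and the current epoch."""
    return now - min(receive_times)


def broadcasting_rate(receive_times: Sequence[float], now: float) -> float:
    """Received-message count divided by the connection length."""
    length = connection_length(receive_times, now)
    return len(receive_times) / length if length > 0 else 0.0


def broadcasting_delay(messages: Sequence[Tuple[float, float]]) -> float:
    """Average of (receive_time - creation_time) over all messages."""
    return mean(received - created for created, received in messages)


def jerk(accelerations: Sequence[float], dt: float) -> List[float]:
    """Rate of change of acceleration between consecutive samples."""
    return [(b - a) / dt for a, b in zip(accelerations, accelerations[1:])]


def speed_deviation(sender_speeds: Sequence[float],
                    neighbor_avg_speeds: Sequence[float]) -> float:
    """Sender's average speed minus the median speed of the neighbors."""
    return mean(sender_speeds) - median(neighbor_avg_speeds)


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    print(connection_length(times, now=5.0))
    print(broadcasting_rate(times, now=5.0))
    print(speed_deviation([20.0, 22.0], [18.0, 20.0, 25.0]))
```

Replacing the plain averages with a windowed or exponentially weighted average would bring the sketch closer to the moving-average wording in the table.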

Symbol | Description |
---|---|
${f}_{e\left(k\right)}^{{v}_{i}}$ | Consistency features: innovation errors of vehicle ${v}_{i}$ at time epoch $k$ |
${y}_{k}^{{v}_{i}}$ | Received mobility data |
${\check{y}}_{k\mid k-\tau \left(i\right)}^{{v}_{i}}$ | Predicted mobility data by Kalman filter |
${d}_{\left(i,j\right)}$ | The distance between the vehicle ${v}_{i}$ at position ${p}_{i}\left({x}_{i},{y}_{i}\right)$ and vehicle ${v}_{j}$ at position ${p}_{j}\left({x}_{j},{y}_{j}\right)$ |
$\tau \left(i\right)$ | The time epoch of last received mobility data |
${f}_{cr\left(k\right)}^{{v}_{i}}$ | Range-based features: the distance between the current vehicle and the neighboring vehicle |
$o{s}_{k}^{{v}_{i}}$ | Overlapping-based feature |
${d}_{min\left(i,j\right)}$ | The minimum accepted distance between vehicles ${v}_{i}$ and ${v}_{j}$ |
${f}_{j\left(k\right)}^{{v}_{i}}$ | Feature number $j$ for vehicle ${v}_{i}$ at time epoch $k$ |
${f}_{b\left(k\right)}^{{v}_{i}}$ | Behavioral features of the vehicle ${v}_{i}$ at time epoch $k$ |
$CR{M}_{\left(k\right)}$ | Context-reference model parameters |
${\varnothing}_{k}$ | Median of the vehicles’ temporal summaries |
${\delta}_{k}$ | The median absolute deviation of vehicles’ temporal summaries |
$HU{B}_{k}$ | Hampel upper bound, also called the context-reference upper bound $CRU{B}_{\left(k\right)}$ |
$HL{B}_{k}$ | Hampel lower bound, also called the context-reference lower bound $CRL{B}_{\left(k\right)}$ |
$\beta $ | Tuning parameters |
${z}_{{f}_{j}\left(k\right)}^{{v}_{i}}$ | Vehicle ${v}_{i}$ Hampel-based Z-score with respect to feature ${f}_{j\left(k\right)}^{{v}_{i}}$ at time epoch $k$ |
${o}_{{f}_{j}\left(k\right)}^{{v}_{i}}$ | Classification rule |
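The Hampel-related symbols above (${\varnothing}_{k}$, ${\delta}_{k}$, $HU{B}_{k}$, $HL{B}_{k}$, $\beta$, and the Z-score) can be illustrated with the textbook form of the Hampel identifier. The sketch below is an assumption about the general shape of the computation, not the paper's exact formulas: it places the bounds at the median plus or minus $\beta$ times the median absolute deviation, omits any scaling constant, and uses a hypothetical default of `beta=3.0`.

```python
from statistics import median
from typing import Sequence, Tuple


def hampel_bounds(values: Sequence[float], beta: float = 3.0) -> Tuple[float, float, float, float]:
    """Return (median, MAD, lower bound, upper bound) for one feature.

    `values` holds one temporal summary per neighboring vehicle.
    beta is the tuning parameter; 3.0 is an illustrative default.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad, med - beta * mad, med + beta * mad


def hampel_z_score(value: float, med: float, mad: float) -> float:
    """Robust Z-score: distance from the median in units of MAD."""
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return (value - med) / mad


def is_outlier(value: float, lower: float, upper: float) -> bool:
    """Simple classification rule: flag values outside [lower, upper]."""
    return value < lower or value > upper


if __name__ == "__main__":
    summaries = [1.0, 2.0, 3.0, 4.0, 100.0]
    med, mad, lower, upper = hampel_bounds(summaries)
    for s in summaries:
        print(s, hampel_z_score(s, med, mad), is_outlier(s, lower, upper))
```

Because both the median and the median absolute deviation are robust statistics, a single extreme summary such as `100.0` barely moves the bounds, which is the property that makes this style of reference model attractive when some neighbors misbehave.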

Dataset | Host Vehicle ID | Vehicle Regime | Average Speed (m/s) | Duration (s) | Dataset Size (CAMs) | Total Neighbors |
---|---|---|---|---|---|---|
DS1 | 13 | Free-Flow | 16.8 | 94.8 | 113,258 | 177 |
DS2 | 252 | Free-Flow | 23.3 | 55.8 | 155,908 | 255 |
DS3 | 455 | Free-Flow | 21.6 | 60.3 | 145,904 | 260 |
DS4 | 2280 | Free-Flow | 24.0 | 54.1 | 197,511 | 270 |
DS5 | 5 | Lane-Change | 22.4 | 70.2 | 80,568 | 119 |
DS6 | 1133 | Lane-Change | 31.7 | 39.3 | 107,565 | 214 |
DS7 | 1687 | Lane-Change | 22.4 | 76.5 | 110,051 | 314 |
DS8 | 1 | Lane-Change | 18.0 | 88.4 | 88,971 | 134 |
DS9 | 268 | Flowing-Mode | 26.2 | 58.5 | 156,941 | 255 |
DS10 | 1066 | Flowing-Mode | 33.0 | 47.5 | 111,305 | 225 |
DS11 | 1964 | Flowing-Mode | 21.5 | 72.9 | 223,211 | 317 |
DS12 | 7 | Flowing-Mode | 22.5 | 71.1 | 85,244 | 127 |
DS13 | 1593 | Flowing-Mode | 21.1 | 74.2 | 186,260 | 294 |
DS14 | 2885 | Random-Flow | 16.5 | 94.5 | 150,127 | 200 |
DS15 | 1899 | Random-Flow | 19.9 | 78.8 | 231,867 | 331 |
DS16 | RSU | Mixed | 28.0 | 57.3 | 479,823 | 284 |

Model | Accuracy (%) | FPR (%) | DR (%) | Precision (%) | Recall (%) | F-Measure (%) |
---|---|---|---|---|---|---|
EHCA-MDS (Proposed) | 97.01 | 1.19 | 90.45 | 95.32 | 90.45 | 92.82 |
HCA-MDS (Proposed) | 93.51 | 4.45 | 86.11 | 85.00 | 86.11 | 84.44 |
DCA-MDS (Proposed) | 90.98 | 2.33 | 66.18 | 89.19 | 66.18 | 75.05 |
Bissmeyer’s ECT-MDS [20] | 74.79 | 2.98 | 30.65 | 83.50 | 30.65 | 44.49 |
Stübing’s MDS [58] | 87.37 | 4.79 | 62.55 | 86.91 | 62.55 | 71.60 |
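For reference, the metrics reported in the table above can be computed from confusion-matrix counts using their standard definitions, with the detection rate taken as equal to recall (the two columns in the table are identical). The sketch below is illustrative only; the counts in the usage example are made up and are not from the experiments.

```python
from typing import Dict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard detection metrics as fractions in [0, 1].

    tp: misbehaving samples flagged, fn: misbehaving samples missed,
    fp: benign samples flagged,      tn: benign samples passed.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also reported as the detection rate (DR)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),
        "dr": recall,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(detection_metrics(tp=90, fp=5, tn=95, fn=10))
    # Harmonic mean of the EHCA-MDS precision and recall from the table.
    print(round(f_measure(95.32, 90.45), 2))
```

Applied to the EHCA-MDS precision (95.32) and recall (90.45), the harmonic mean rounds to 92.82, which matches the F-Measure reported for that model.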

Model | Accuracy | FPR | DR | Precision | Recall | F-Measure |
---|---|---|---|---|---|---|
HCA-MDS (Proposed) | 3.5 (3.74%) | −3.26 (73.26%) | 4.34 (5.04%) | 10.32 (12.14%) | 4.34 (5.04%) | 8.38 (9.92%) |
DCA-MDS (Proposed) | 6.03 (36.67%) | −1.14 (36.67%) | 24.27 (36.67%) | 6.13 (36.67%) | 24.27 (36.67%) | 17.77 (36.67%) |
Bissmeyer’s ECT-MDS [20] | 22.22 (29.71%) | −1.79 (60.07%) | 59.80 (195.11%) | 11.82 (14.16%) | 59.80 (195.11%) | 48.33 (108.63%) |
Stübing’s MDS baseline [58] | 9.64 (11.03%) | −3.60 (75.16%) | 27.90 (44.60%) | 8.41 (9.68%) | 27.90 (44.60%) | 21.22 (29.64%) |
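The entries in the improvement table above appear to be the EHCA-MDS value minus the compared model's value, with the magnitude of that difference relative to the compared model given in parentheses. The sketch below reproduces this arithmetic under that assumption; the function name is hypothetical.

```python
from typing import Tuple


def improvement(proposed: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute difference, relative change in percent).

    The difference is proposed minus baseline, so a lower FPR shows up
    as a negative number. The relative change is the magnitude of the
    difference divided by the baseline value.
    """
    diff = proposed - baseline
    relative = abs(diff) / baseline * 100
    return round(diff, 2), round(relative, 2)


if __name__ == "__main__":
    # EHCA-MDS versus HCA-MDS, using the reported percentages.
    print("accuracy:", improvement(97.01, 93.51))
    print("fpr:", improvement(1.19, 4.45))
    # EHCA-MDS versus DCA-MDS.
    print("dr:", improvement(90.45, 66.18))
    print("accuracy:", improvement(97.01, 90.98))
```

For the HCA-MDS row this gives 3.5 (3.74%) for accuracy and −3.26 (73.26%) for FPR, in agreement with the table. For the DCA-MDS row, the recomputed relative value matches the listed 36.67% only for DR and Recall; for accuracy, for example, it comes out at about 6.63%.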

Model | Ensemble | Hybrid * | Context-Aware | Data-Centric | Trust-Based | Performance |
---|---|---|---|---|---|---|
EHCA-MDS | √ | √ | √ | √ | 92.82 | |
HCA-MDS | √ | √ | √ | 84.44 | ||
DCA-MDS | √ | √ | √ | 75.05 | ||
ECT-MDS | √ | √ | √ | 71.60 | ||
MDS Baseline | √ | 44.49 |

