Intruder Detection in VANET Data Streams Using Federated Learning for Smart City Environments

Arya, Monika; Sastry, Hanumat; Dewangan, Bhupesh Kumar; Rahmani, Mohammad Khalid Imam; Bhatia, Surbhi; Muzaffar, Abdul Wahab; Bivi, Mariyam Aysha

doi:10.3390/electronics12040894


by Monika Arya ¹, Hanumat Sastry ², Bhupesh Kumar Dewangan ³, Mohammad Khalid Imam Rahmani ⁴,*, Surbhi Bhatia ⁵, Abdul Wahab Muzaffar ⁴,* and Mariyam Aysha Bivi ⁶

¹ Department of Computer Science and Engineering, Bhilai Institute of Technology, Durg 496001, India

² School of Computer Science, University of Petroleum and Energy Studies, Dehradun 248007, India

³ Department of Computer Science and Engineering, OP Jindal University, Raigarh 469109, India

⁴ College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia

⁵ Department of Information Systems, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia

⁶ Department of Computer Science, College of Computer Science, King Khalid University, Gregar, Abha 62529, Saudi Arabia

* Authors to whom correspondence should be addressed.

Electronics 2023, 12(4), 894; https://doi.org/10.3390/electronics12040894

Submission received: 9 January 2023 / Revised: 30 January 2023 / Accepted: 1 February 2023 / Published: 9 February 2023

(This article belongs to the Section Artificial Intelligence)


Abstract:

Vehicular networks improve quality of life, security, and safety, making them crucial to smart city development. With the rapid advancement of intelligent vehicles, the confidentiality and security concerns surrounding vehicular ad hoc networks (VANETs) have garnered considerable attention. VANETs are intrinsically more vulnerable to attacks than wired networks due to high mobility, a shared network medium, and the lack of centrally managed security services. Intrusion detection (ID) servers are the first protection layer against cyberattacks in this digital age. The most frequently used mechanisms in VANETs are intrusion detection systems (IDSs), which rely on vehicle collaboration to identify attackers. Regrettably, existing cooperative IDSs can be corrupted, which causes them to operate abnormally. This article presents an approach to intrusion detection based on the distributed federated learning (FL) of heterogeneous neural networks for smart cities. It saves time and resources by using the most efficient intruder detection approach. First, vehicles use a federated learning technique to develop local, deep learning-based IDS classifiers for VANET data streams. They then share their locally learned classifiers upon request, significantly reducing communication overhead with neighboring vehicles. Then, an ensemble of federated heterogeneous neural networks is constructed for each vehicle, including locally and remotely trained classifiers. Finally, the global ensemble model is shared back with the local devices so that they can be updated. The effectiveness of the suggested method for intrusion detection in VANETs is evaluated using performance indicators such as attack detection rates, classification accuracy, precision, recall, and F1 scores over a ToN-IoT data stream. The ID model shows 0.994 training and 0.981 testing accuracy.

Keywords:

smart city; deep learning; machine learning; VANETs; intrusion detection system; data streams; federated learning; classification


1. Introduction

As an integral component of smart cities, vehicular ad hoc networks (VANETs) are a fast-emerging research field in which the research community and industry collaborate. Vehicular networks improve quality of life, security, and safety, making them crucial to smart city development [1]. The primary objective is to provide cutting-edge communication technologies and infrastructures for automobiles to communicate with one another [2,3]. The concept of VANETs has only lately been presented and implemented as a subcategory of wireless sensor networks to provide intelligent solutions for traffic control in a modernized world [4]. Intelligent vehicles and mobile ad hoc networks (MANETs) come together in VANETs as intelligent technology. Vehicle-to-vehicle communication is the goal of VANETs, which are networks in which the communication units (vehicles) move constantly. VANETs rely on the distribution of information to facilitate the establishment of cooperative relationships between vehicles. Collaboration and information exchange is required to ensure safety and other VANET applications [5]. Spatiotemporal difficulties resulting from VANETs affect their effectiveness in traffic management systems, affecting accuracy and the user experience [4]. Multiple data streams, both homogeneous and heterogeneous, all across the globe are processed by VANET systems. By analyzing these data streams, they can be automatically forecasted, controlled, and used to make decisions. Thus, analyzing raw data from VANETs to identify activity patterns, outliers, clustering, and classification is a new research application for constructing intelligent traffic management systems.

The idea of VANETs was first proposed in the early 2000s. VANET applications fall into two common categories: safety-related and non-safety. Safety applications include, for example, the transmission of information on traffic jams and other hazards, such as accidents and sharp turns. Infotainment and traffic management are the primary emphases of non-safety applications. VANETs are ad hoc networks that form spontaneously when vehicles interact, allowing two-way communication. In the simplest case, an ad hoc wireless network is built when a vehicle communicates with another vehicle without any infrastructure support; this type of contact is termed vehicle-to-vehicle interaction. On-board units (OBUs) and application units (AUs) are installed in vehicles, allowing them to connect and form a network. OBUs are hardware devices, whereas AUs feature a software entity to manage communications. Roadside units (RSUs) are used in VANETs to communicate with the infrastructure, as shown in Figure 1.

RSUs connect cars and infrastructure, as well as other networks, by providing communications. RSUs can communicate with each other and generate a global picture of the route. Even though VANETs offer a meaningful solution to current transportation and traffic concerns, there are many obstacles to overcome.

VANETs have inherent characteristics that cause these difficulties: high mobility, intermittent connectivity, a large network, wireless communication, and time-bound message delivery. Because of these characteristics, VANETs confront challenges in security, network administration, and environmental interference, among others. They also face a variety of social and economic issues.

Threats come hand in hand with technological advancement. VANETs are susceptible to various security threats, just like other wireless networks. Security in VANETs is a major concern that has piqued the interest of many researchers and academics. When it comes to VANETs, a little security compromise can have a significant impact because human lives are at stake in this situation. VANETs deploy intrusion detection systems (IDSs) to track down any malicious activity occurring within the network [6]. An IDS analyzes the network and detects any attempts or actual intrusions into the network, so appropriate measures can be performed to avoid damage from occurring.

However, the heterogeneity of VANET data sources poses many challenges for the ID system. One such challenge is that datasets collected and stored from various sources may be incomplete. To detect intruder patterns in VANETs, data-driven approaches such as ML require a large dataset, and an incomplete dataset can result in poor model performance. To address these issues, distributed learning methods that integrate several learners in the same environment to enhance the dataset are becoming a promising approach; currently, FL is a critical methodology [7]. FL was created to maximize the utilization of distributed data among learners while maintaining the privacy of each learner’s data [8]. This decentralized way of training models is beneficial for privacy, security, regulatory, and economic reasons [9]. FL is often divided into three steps: initialization, local training, and global aggregation. During initialization, the FL server establishes the training set, data requirements, and participants. During local training, each participant trains a local model on its own data using the information received at initialization. Finally, in the global aggregation step, each participant uploads the parameters of its local model, such as weights, to the FL server. The server then averages the models and sends the resulting global model back to the participants.
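The three FL steps described above (initialization, local training, global aggregation) can be illustrated with a minimal sketch. It assumes FedAvg-style aggregation weighted by local sample counts and uses a toy linear model; the function names `local_train`, `aggregate`, and `run_rounds`, as well as the data, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Weights = List[float]
Sample = Tuple[List[float], float]  # (features, target)


def local_train(global_w: Weights, data: List[Sample],
                lr: float = 0.1, epochs: int = 5) -> Weights:
    """Local training: start from the global weights, run SGD on private data."""
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w is err * x.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(client_ws: List[Weights], sizes: List[int]) -> Weights:
    """Global aggregation: average client weights, weighted by dataset size."""
    total = sum(sizes)
    dim = len(client_ws[0])
    return [sum(w[i] * n for w, n in zip(client_ws, sizes)) / total
            for i in range(dim)]


def run_rounds(clients: List[List[Sample]], rounds: int = 30, dim: int = 2) -> Weights:
    w = [0.0] * dim  # initialization: the server fixes the starting model
    for _ in range(rounds):
        local_ws = [local_train(w, data) for data in clients]  # raw data never leaves a client
        w = aggregate(local_ws, [len(data) for data in clients])
    return w


# Two clients whose private samples both follow y = 2*x0 + 1*x1 (toy data).
clients = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)],
    [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([1.0, 2.0], 4.0)],
]
global_w = run_rounds(clients)
print(global_w)
```

Only model parameters cross the network in this sketch; each client's samples stay local, which is the privacy property the paragraph describes.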

In this work, we incorporated the idea of federated learning, where the ML algorithm is trained on distributed local devices or servers. These local edge devices retain local data, eliminating the need to exchange data. The most prominent advantage of federated learning is that identifiable data remain on the remote edge devices, ensuring data privacy throughout learning. In the proposed work, a 1D convolutional neural network (CNN)-based information extractor is used as the initial component of the federated learning-based CNN framework. CNNs have chiefly been employed in computer vision applications such as picture categorization and target identification, as they have a high capacity to learn the spatial features of individual pixels in a picture [10]. Therefore, we employ a CNN to identify features that assist machines in learning from a wide range of spatial data. As the original traffic structure is a hierarchically ordered sequence rather than an image, we chose a 1D CNN over the popular 2D version. Related work on CNN-based traffic classification demonstrates that a 1D CNN is more accurate than a 2D CNN. High communication costs and data heterogeneity are common problems for current FL methods, as they employ a traditional loss function for local model updating and treat each local model equally when aggregating them into a global model [11]. In the traditional FL approach, the same weight is applied to all the clients of the FL model when averaging them for the server federated model. However, as the dataset used to train each client may contain a different quantity of data for each output class than the global training dataset, it is appropriate to weight each model accordingly [8]; the same weight should not be applied to all models when averaging them. So far, present approaches cannot combine data from heterogeneous models. To overcome this limitation, in this work, instead of averaging the weights as in the traditional FL approach, the weights are optimized using a nature-inspired particle swarm optimization (PSO) algorithm, which is capable of solving complex optimization problems. This weight optimization provides a federated learning approach that can merge models with heterogeneous structures.

The predictions of the clients’ FL models are integrated using an ensemble technique, which aims to provide more accurate predictions than a single model.

1.1. Motivation

Vehicular traffic is expanding every day at an alarming rate, and the demand for intelligent transportation systems has steadily increased as a result. Unfortunately, as vehicular traffic increases, so do the accident rate, continuous heavy traffic, public investment, fuel consumption, pollutant emissions, and other negative consequences. A well-managed system is required to address these issues, and researchers have suggested VANETs as one potential answer. VANETs belong to the family of ad hoc networks and use automobiles as nodes that connect to other nodes.

1.2. Contribution and Novelty

We propose a VANET ID system based on DL and ensemble FL. We incorporate the idea of FL, where the ML algorithm is trained on distributed local devices or servers.
The nature-inspired PSO algorithm is used to optimize the weight of the server in the FL approach.
In addition, to improve the accuracy of the proposed framework, predictions from client FL models are added using the ensemble learning method.
We use a realistic data stream called ToN-IoT, because most existing studies rely on the NSL-KDD and KDD-CUP99 datasets, which do not include recent attacks. In contrast, the ToN-IoT data stream was compiled from an IoT network of varying sizes and complexity.

1.3. Organization of the Paper

The remaining sections of the paper are organized as follows: Section 2 discusses related works of the ID system used for the VANET. The stepwise methodology for the suggested approach is explained in Section 3. The experiments and their outcomes are presented in Section 4. In the last section, the suggested approach is summarized along with future research directions.

2. Related Work

ML techniques were used in [12] to build a highly secure ID-based network; the authors used ToN-IoT data collected from an extensive, diverse network of IoT devices. In [10], a CNN-based framework was established in which the network learns features at multiple resolutions and integrates them by combining regional and global information effectively and adaptively, allowing it to be used in various applications. A combination of correlation-based feature selection and ensemble classifiers was used in [13] to improve the performance of an intrusion detection system. The authors of [14] proposed a hybrid algorithm for VANET intrusion detection based on the SVM kernel. Several data mining techniques for analyzing and mining VANET data streams were reviewed in [15]. The authors propose trust-based collaborative intrusion detection, in which the cars collect real-time network traffic and analyze it using the local IDS agent in each vehicle; a nonlinear k-nearest neighbors (K-NN) classifier is included. An ID framework called AECFV was presented in [6] that tries to protect a network from the most harmful attacks possible. Node mobility and network vulnerability are considered when forming clusters in this system. When constructing a cluster, care must be taken to ensure it is both stable and well-connected. Cluster-heads (CHs) are chosen depending on the trustworthiness of the vehicle and the mobility of the nodes within it. In [16], a systematic and adaptive ID system is developed to identify and categorize unanticipated hostile attacks using robust classical ML classifiers built on Spark MLlib (the machine learning library). A machine learning technique using KNN and SVM algorithms was described in [17] to group and classify VANET intrusions. It is formulated by analyzing the offset ratio and time gap between CAN message requests and the CAN messages sent in response. Table 1 compares recent and related work in intrusion detection in VANETs.

The research gaps identified by studying the relevant and recent works can be summarized as follows:

Due to their high mobility, common network medium, and lack of centrally managed security services provided by dedicated equipment such as firewalls and authentication servers, the data streams generated by VANETs are intrinsically more vulnerable to attacks than wired networks.
Current IDSs can only detect unusual activity within a network’s subnets, not the full VANET.
IDSs continue to face a significant problem in managing the ever-increasing volume of vehicle-related data in urban environments.
VANETs and their integration with critical systems that need to store, send, archive, and obtain data from networks quickly are still affected by network security issues in a big way.
Inadequate privacy protection mechanisms make it difficult for users to share information and prevent nodes from working cooperatively.
Due to the distinctive characteristics of VANETs, such as a wide geographic scope and significant node mobility, it takes a long time to query and update the reputation score, and it is challenging to meet real-time criteria for intrusion detection.

3. Proposed Framework

In the proposed framework, an approach to intrusion detection based on the distributed FL of heterogeneous neural networks for smart cities is suggested. In the initial stage, vehicles use a federated learning technique to develop local, deep learning-based IDS classifiers for VANET data streams. Mathematically, it can be expressed as:

\arg \min L(x, y, w) = \sum_{k} p_{k} L_{k}(x, y, w) \qquad (1)

where p_{k} is the weight of the kth client in a federated, decentralized scenario, and F₁, F₂, …, F_k represent the multiple users holding datasets D₁, D₂, …, D_k, respectively. The vehicles then share their locally learned classifiers upon request, significantly reducing communication overhead with neighboring vehicles. Then, an ensemble of federated heterogeneous neural networks is constructed for each vehicle, including locally and remotely trained classifiers. Finally, the global ensemble model is again shared with local devices for updating.
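As a small numeric illustration of the weighted objective in Equation (1), the sketch below combines per-client losses into a global loss. The paper does not state how p_{k} is chosen; the choice p_k = n_k / n (each client's share of the total samples) is an assumption borrowed from common FedAvg practice, and `global_loss` and the numbers are hypothetical.

```python
def global_loss(client_losses, client_sizes):
    """Weighted sum of client losses, with assumed weights p_k = n_k / n."""
    total = sum(client_sizes)
    p = [n / total for n in client_sizes]  # assumption: weights follow data share
    return sum(pk * lk for pk, lk in zip(p, client_losses))


# Three hypothetical clients: local losses and local dataset sizes.
loss = global_loss([0.20, 0.50, 0.35], [1000, 500, 500])
print(loss)
```

With these weights the client holding half of the data contributes half of the global objective, so minimizing the sum favors models that fit the larger local datasets.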

There are four major stages in the proposed ID system. Pre-processing is the initial stage of the proposed method. The second stage is federated learning, which permits the training of a high-performance shared CNN utilizing a distributed learning approach on decentralized data. Further, the PSO optimization technique is employed to obtain optimal parameters for a centralized federated server. The next step in the suggested strategy is to combine heterogeneous models into a single model. As there is no evident mapping between the parameters in the distinct models, a simple averaging of the weights of the client models cannot yield the weight of the server model. We use a weighted average ensemble for this purpose. The federated ensemble CNN model is then used in the third stage of the process to look for intrusions. Finally, the alarm module in this proposed technique analyzes the output of the proposed ID system and flags any potentially harmful incoming network data.

The detailed descriptions of each stage are as follows:

3.1. Stage 1—Pre-Processing the Heterogeneous Data for Individual Clients of the FL Model

Data preparation is critical before feeding data into machine learning systems to achieve good performance [25]. We encountered numerous difficulties with the dataset in our trials, including incomplete data, categorical features, and class imbalance. In addition, some superfluous features may degrade the performance of the machine learning approach employed. Based on the literature, several preparation strategies were applied and tested for the selected machine learning methods, using combinations of different preparation and standardization approaches.
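The difficulties named above (incomplete data, categorical features, class imbalance) can each be handled with a standard technique. The sketch below is illustrative only: the paper does not specify which methods were used, so mean imputation, one-hot encoding, min-max scaling, and random oversampling are assumptions, and the field names `duration`, `proto`, and `label` are hypothetical.

```python
import random
from collections import Counter


def impute_mean(rows, col):
    """Replace missing values (None) in a numeric column by the column mean."""
    vals = [r[col] for r in rows if r[col] is not None]
    mean = sum(vals) / len(vals)
    for r in rows:
        if r[col] is None:
            r[col] = mean


def one_hot(rows, col):
    """Replace a categorical column by one 0/1 column per category."""
    cats = sorted({r[col] for r in rows})
    for r in rows:
        val = r.pop(col)
        for c in cats:
            r[f"{col}={c}"] = 1.0 if val == c else 0.0


def min_max(rows, col):
    """Scale a numeric column to the range [0, 1]."""
    lo = min(r[col] for r in rows)
    hi = max(r[col] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r in rows:
        r[col] = (r[col] - lo) / span


def oversample(rows, label, seed=0):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(r[label] for r in rows)
    target = max(counts.values())
    out = list(rows)
    for cls, n in counts.items():
        pool = [r for r in rows if r[label] == cls]
        out.extend(dict(rng.choice(pool)) for _ in range(target - n))
    return out


records = [
    {"duration": 1.0, "proto": "tcp", "label": 0},
    {"duration": None, "proto": "udp", "label": 0},
    {"duration": 3.0, "proto": "tcp", "label": 0},
    {"duration": 5.0, "proto": "udp", "label": 1},
]
impute_mean(records, "duration")
one_hot(records, "proto")
min_max(records, "duration")
balanced = oversample(records, "label")
print(Counter(r["label"] for r in balanced))
```

In a federated setting each client would run such steps on its own local data, so statistics such as the mean or the min and max may differ between clients.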

3.2. Stage 2—Training of Different Client Models on Edge Devices

In the proposed approach, a 1D CNN is used as the client model for FL. The clients are trained on heterogeneous local datasets. The trained model inspects traffic and categorizes it as legitimate or intrusive. The suggested solution combines the advantages of a CNN and distributed federated learning. A CNN is composed of two primary units, as shown in Figure 2: one for extracting features and the other for classifying. The feature extraction unit consists of two kinds of layers, convolutional and pooling, and provides input to the classification unit. The encoder uses convolutional layers, and the decoder uses deconvolutional layers. Upon completion of the decoding process, a SoftMax classifier is applied to obtain a class probability distribution. After training, the model is evaluated on the test set to determine whether the input traffic is malicious. The weights of the trained client models are then aggregated to update the weight of the centralized server model.
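To make the feature extraction and classification units concrete, the following sketch runs a single forward pass through one 1D convolution, a ReLU, a max-pooling layer, and a dense SoftMax layer. It is a toy illustration with hand-picked, hypothetical weights, not the authors' architecture; it omits the deconvolutional decoder and all training logic.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def max_pool1d(v, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def dense(v, weights, biases):
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def forward(x, kernel, weights, biases):
    features = max_pool1d(relu(conv1d(x, kernel)))   # feature extraction unit
    return softmax(dense(features, weights, biases))  # classification unit


x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]    # one flow record as a 1D sequence
kernel = [1.0, -1.0, 1.0]             # hypothetical learned filter
weights = [[0.5, -0.5], [-0.5, 0.5]]  # 2 pooled features -> 2 classes
biases = [0.0, 0.0]
probs = forward(x, kernel, weights, biases)
print(probs)  # class probabilities, e.g. [P(normal), P(intrusive)]
```

Because each traffic record is a sequence of feature values rather than an image, the kernel slides along one axis only, which is the reason the text gives for preferring a 1D CNN.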

3.3. Stage 3—Weighted Ensemble-Based Aggregation of Client Models

The weighted ensemble-based aggregation approach is applied to the heterogeneous client models in this step. As no evident mapping exists between the parameters of the various models, traditional approaches for aggregation, such as federated average (FedAVG) and federated stochastic gradient descent (FedSGD), cannot be used. Instead, a weighted average combines all of the models; note that this can lead to less-than-ideal performance if the local training datasets are biased and spread out across the edge devices. Equation (2) represents the process.

y = \sum_{l = 1}^{N} \sum_{m = 1}^{C} α_{l m} \cdot M_{l}(x) \qquad (2)

where α_{l m} is the weight for the lth model and mth class, x is the input, and y is the final output.
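One way to read Equation (2) is as a per-class weighted vote: each model l outputs a class-probability vector M_{l}(x), and the score for class m is the sum over models of α_{l m} times the mth entry of M_{l}(x). The sketch below implements that reading; this per-class interpretation, the function name `weighted_ensemble`, and all numbers are assumptions for illustration.

```python
def weighted_ensemble(outputs, alpha):
    """Combine per-model class probabilities with per-model, per-class weights.

    outputs[l][m]: probability that model l assigns to class m
    alpha[l][m]:   weight of model l for class m
    """
    n_models = len(outputs)
    n_classes = len(outputs[0])
    return [sum(alpha[l][m] * outputs[l][m] for l in range(n_models))
            for m in range(n_classes)]


# Three hypothetical client models scoring one input over two classes.
outputs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
alpha = [[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]]

scores = weighted_ensemble(outputs, alpha)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

Since only model outputs are combined, the client models do not need to share an architecture, which is what allows heterogeneous networks to be merged.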

3.4. Stage 4—Tuning Weight of Server Model Using PSO Optimization Algorithm

The PSO optimization algorithm is used to fine-tune the weight α of the weighted average ensemble [26]. During this process, α is set to a fixed value. Then, based on the past data of parameters and obtained accuracy, the PSO optimizer suggests a new candidate for α. The final output is calculated by multiplying the weight α with the output vectors M_{i} from each model. At last, the ensemble model’s accuracy is computed, and the results are fed back into the optimizer. This process is carried out a predetermined number of times. The overall proposed federated learning process is shown in Figure 3.
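The tuning loop described above can be sketched with a basic global-best PSO whose fitness is the ensemble accuracy on held-out data. This is a simplified illustration, not the authors' implementation: it uses one weight per model instead of one per model and class, and the inertia and acceleration coefficients, the function names, and the validation data are all hypothetical.

```python
import random


def ensemble_accuracy(alpha, model_probs, labels):
    """Accuracy of the weighted ensemble; model_probs[l][s] holds class probabilities."""
    correct = 0
    for s, y in enumerate(labels):
        n_classes = len(model_probs[0][s])
        scores = [sum(a * probs[s][c] for a, probs in zip(alpha, model_probs))
                  for c in range(n_classes)]
        correct += int(max(range(n_classes), key=scores.__getitem__) == y)
    return correct / len(labels)


def pso(fitness, dim, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes `fitness` over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1.0 / dim] * dim  # one particle starts at uniform weights
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])  # accuracy is fed back to the optimizer
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical validation outputs of three client models on four samples.
model_probs = [
    [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.7]],  # reliable model
    [[0.4, 0.6], [0.7, 0.3], [0.7, 0.3], [0.4, 0.6]],  # wrong on samples 0 and 1
    [[0.4, 0.6], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]],  # wrong on samples 0 and 1
]
labels = [0, 1, 0, 1]

uniform_acc = ensemble_accuracy([1 / 3] * 3, model_probs, labels)
best_alpha, best_acc = pso(lambda a: ensemble_accuracy(a, model_probs, labels), dim=3)
print(uniform_acc, best_acc)
```

Seeding one particle with uniform weights guarantees that the tuned ensemble is never worse than plain averaging on the validation data, which makes the comparison with the untuned ensemble straightforward.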

3.5. Stage 5—Alarm

The alert module is the last level of the suggested ID system. The final ID system component alerts the administrator or end-user to a problem in the network.

3.6. Stage 6—Model Evaluation

The effectiveness of the suggested model is assessed using metrics such as accuracy, precision, recall, F1 scores, and false-positive rates. Figure 4 summarizes the above processes for training the federated learning-based CNN model for intrusion detection in VANETs using ToN-IoT datasets.
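The metrics listed above follow directly from the confusion-matrix counts. The sketch below computes them for a binary attack/normal labelling; the function name `binary_metrics` and the example labels are hypothetical, and a multi-class evaluation would need per-class or averaged variants.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 score, and false-positive rate (1 = attack)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For an IDS, the false-positive rate matters alongside recall, because every false alarm is passed on to the alert stage.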

Figure 4 shows a flowchart for the presented framework.

The overall algorithm to implement the proposed framework is given in Algorithm 1.

Algorithm 1: Algorithm for intrusion detection based on the distributed FL of heterogeneous neural networks

Input: w_{t}, N (number of epochs), n (number of batches)
Output: w_{t + 1}^{k} (updated weight)
1: Receive w_{t} from server
2: w_{t + 1}^{k} \leftarrow w_{t}
3: for e ← 1 to N do
4:     for b ← 1 to n do
5:         w_{t + 1}^{k} \leftarrow w_{t + 1}^{k} - η g_{k}
6:     end
7: end
8: Send w_{t + 1}^{k} to server
9: for i ← 1 to N do
10:     for j ← 1 to C do
11:         y = \sum α_{i, j} \cdot m_{i}(x)
12:     end
13: end
14: Optimize the weighted ensemble weights using the PSO optimization algorithm
15: Update the server using w_{t + 1}^{k}
16: Send the updated weight w_{t + 1}^{k} to all the clients
17: end

4. Experiments and Results

4.1. Dataset Description

The effectiveness of the suggested method for intrusion detection in VANETs is evaluated in this section using the ToN-IoT data stream [7]. ToN-IoT collects data from linked devices, Linux and Windows operating systems, and IoT network traffic; the data came from a medium-sized IoT network. A label column representing attack or normal behaviour was added to the ToN-IoT datasets, with subcategories for the attack type, such as ransomware, password attack, and man-in-the-middle (MITM). These attacks targeted numerous IoT sensors on the IoT network. Suitable evaluation parameters are needed to assess the model reliably.

4.2. Experimental Setup

All experiments in this work were run using Python version 3.8 on a Windows 10 operating system with a 7th generation Core processor and 16 GB of RAM.

4.3. Results and Discussion

The results obtained from the proposed framework using the ToN-IoT data stream and their comparison with the benchmark ML algorithms, such as linear regression (LR), naïve Bayes (NB), decision tree (DT), random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM), are presented in Table 2. The results obtained from the proposed approaches were utilized to evaluate their efficiency and effectiveness.

Figure 5 shows the performance comparison of benchmark algorithms with the proposed work over the ToN-IoT data stream.

The graph shows that the proposed framework outperforms the existing benchmark techniques for intrusion detection in VANETs. The graph in Figure 6 compares the training and testing accuracy.

The graph shows that the training and testing accuracy of the proposed framework is nearly equal, which indicates that the model is not overfitting.

The proposed approach is also compared with some recent state-of-the-art methods for intrusion detection in terms of accuracy in Table 3.

Figure 7 shows the performance comparison of state-of-the-art methods with the proposed work over the ToN-IoT data stream.

Training the ML models with the federated learning approach and integrating the predictions of the various client models with the ensemble approach have significantly improved the performance of the proposed intrusion detection system compared to the benchmark algorithms as well as the state-of-the-art methods.

5. Conclusions and Future Scope

A new IDS for VANETs based on federated learning and a CNN was presented in this paper and evaluated on the ToN-IoT dataset. This dataset includes network records with class imbalance and missing values. The proposed approach can cover a larger number of attacks than previous work that used traditional datasets such as KDD-CUP99. The suggested model was compared with state-of-the-art machine learning algorithms using a range of assessment measures, including recall, precision, training and testing accuracy, F1 scores, and FPR. After examining the various machine learning approaches, it can be concluded that the proposed framework outperformed all the other machine learning methods considered. In future work, the model could be deployed on Apache Spark and Kafka to achieve better performance. It is also possible to apply nature-inspired optimization methods for dimensionality reduction.

Author Contributions

Conceptualization, M.A., H.S. and B.K.D.; methodology, M.A., M.K.I.R. and S.B.; software, H.S., M.K.I.R., A.W.M. and B.K.D.; validation, M.A., S.B. and M.A.B.; formal analysis, M.A., H.S., S.B. and B.K.D.; investigation, M.A., H.S., A.W.M., M.K.I.R. and B.K.D.; resources, S.B., A.W.M. and M.A.B.; data curation, B.K.D., M.K.I.R. and S.B.; writing—original draft preparation, M.A., H.S. and B.K.D.; writing—review and editing, S.B., M.K.I.R., A.W.M. and M.A.; visualization, S.B. and M.A.B.; funding acquisition, A.W.M. and M.K.I.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Links to online repositories for dataset: https://research.unsw.edu.au/projects/toniot-datasets (accessed on 3 March 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Reis, J.; Marques, P.A.; Marques, P.C. Where Are Smart Cities Heading? A Meta-Review and Guidelines for Future Research. Appl. Sci. 2022, 12, 8328.
2. Esashika, D.; Masiero, G.; Mauger, Y. An investigation into the elusive concept of smart cities: A systematic review and meta-synthesis. Technol. Anal. Strateg. Manag. 2021, 33, 957–969.
3. Soyturk, M.; Muhammad, K.N.; Avcil, M.N.; Kantarci, B.; Matthews, J. From Vehicular Networks to Vehicular Clouds in Smart Cities; Elsevier Inc.: Amsterdam, The Netherlands, 2016; ISBN 9780128034637.
4. Liang, W.; Li, Z.; Zhang, H.; Wang, S.; Bie, R. Vehicular Ad Hoc networks: Architectures, research issues, methodologies, challenges, and trends. Int. J. Distrib. Sens. Networks 2015, 2015, 745303.
5. Pattnaik, O.; Pattanayak, B.K. Security in vehicular ad hoc network based on intrusion detection system. Am. J. Appl. Sci. 2014, 11, 337–346.
6. Sedjelmaci, H.; Senouci, S.M. An accurate and efficient collaborative intrusion detection framework to secure vehicular networks. Comput. Electr. Eng. 2015, 43, 33–47.
7. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. Proc. 20th Int. Conf. Artif. Intell. Stat. 2017, 54, 1273–1282.
8. Uddin, M.P.; Xiang, Y.; Lu, X.; Yearwood, J.; Gao, L. Mutual Information Driven Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1526–1538.
9. Zeng, Y.; Mu, Y.; Yuan, J.; Teng, S.; Zhang, J.; Wan, J.; Ren, Y.; Zhang, Y. Adaptive Federated Learning with Non-IID Data. Comput. J. 2022.
10. Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. High-Resolution Aerial Image Labeling with Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7092–7103.
11. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. Proc. Mach. Learn. Syst. 2018, 2, 429–450.
12. Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion Detection System Using Machine Learning for Vehicular Ad Hoc Networks Based on ToN-IoT Dataset. IEEE Access 2021, 9, 142206–142217.
13. Qi, H.; Xiao, S.; Shi, R.; Ward, M.O.; Chen, Y.; Tu, W.; Su, Q.; Wang, W.; Wang, X.; Zhang, Z. Enhanced Reader.pdf. Nature 2018, 388, 539–547.
14. Adhikary, K.; Bhushan, S.; Kumar, S.; Dutta, K. Hybrid Algorithm to Detect DDoS Attacks in VANETs. Wirel. Pers. Commun. 2020, 114, 3613–3634.
15. Mohammed AL Zamil, S.S. Applications of Data Mining Techniques for Vehicular Ad hoc Networks. arXiv 2018, arXiv:1807.02564.
16. Khan, M.A.; Kim, J. Toward developing efficient Conv-AE-based intrusion detection system using heterogeneous dataset. Electronics 2020, 9, 1771.
17. Alshammari, A.; Zohdy, M.A.; Debnath, D.; Corser, G. Classification Approach for Intrusion Detection in Vehicle Systems. Wirel. Eng. Technol. 2018, 9, 79–94.
18. Shu, J.; Zhou, L.; Zhang, W.; Du, X.; Guizani, M. Collaborative Intrusion Detection for VANETs: A Deep Learning-Based Distributed SDN Approach. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4519–4530.
19. Bangui, H.; Ge, M.; Buhnova, B. A hybrid machine learning model for intrusion detection in VANET. Computing 2022, 104, 503–531.
20. Bangui, H.; Ge, M.; Buhnova, B. A hybrid data-driven model for intrusion detection in VANET. Procedia Comput. Sci. 2021, 184, 516–523.
21. Zhang, T.; Zhu, Q. Distributed Privacy-Preserving Collaborative Intrusion Detection Systems for VANETs. IEEE Trans. Signal Inf. Process. Networks 2018, 4, 148–161.
22. Zeng, Y.; Qiu, M.; Ming, Z.; Liu, M. Senior2Local: A Machine Learning Based Intrusion Detection Method for VANETs; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; Volume 11344, ISBN 9783030057541.
23. Zeng, Y.; Qiu, M.; Zhu, D.; Xue, Z.; Xiong, J.; Liu, M. DeepVCM: A Deep Learning Based Intrusion Detection Method in VANET. In Proceedings of the 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), Washington, DC, USA, 27–29 May 2019; pp. 288–293.
24. Yu, Y.; Zeng, X.; Xue, X.; Ma, J. LSTM-Based Intrusion Detection System for VANETs: A Time Series Classification Approach to False Message Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23906–23918.
25. Braga, D.C. Field Drilling Data Cleaning and Preparation for Data Analytics Applications. Master’s Thesis, Louisiana State University and Agricultural & Mechanical College, Baton Rouge, LA, USA, 2019.
26. Jana, G.; Mitra, A.; Pan, S.; Sural, S.; Chattaraj, P.K. Modified Particle Swarm Optimization Algorithms for the Generation of Stable Structures of Carbon Clusters, C_n (n = 3–6, 10). Front. Chem. 2019, 7, 485.

Figure 1. Communication units in VANETs [5].

Figure 2. Major components of the 1D CNN.

Figure 3. The proposed federated learning approach.

Figure 4. Flowchart for the presented framework.

Figure 5. Performance comparison.

Figure 6. Comparison of training and testing accuracy.

Figure 7. Comparison of accuracy.

Table 1. Comparison of recent works.

Evaluation parameters are listed in the last five columns (Y = reported, - = not reported).
Author	Year	Dataset Used	Methodology	Accuracy	Precision	Recall	F1 Score	Others
Shu, J. et al. [18]	2020	KDD99 dataset	The authors proposed installing a distributed SDN controller on each base station to build a cooperative intrusion detection system based on distributed SDN.	Y	Y	Y	Y	Y
Bangui, H. et al. [19]	2022	CICIDS2017 dataset	To address real-time attack detection in VANETs, the authors proposed a hybrid machine learning model for intrusion detection. The model uses coreset-based unsupervised clustering to filter out unknown attacks and a random forest classifier to identify known attacks.	Y	Y	Y	Y	-
Bangui, H. et al. [20]	2021	CICIDS2017	The authors proposed a hybrid machine learning technique for thorough and efficient intrusion detection in VANETs. The approach combines coreset-based clustering with data classification, using coresets to reduce computational overhead and improve IDS inference capabilities in VANETs.	Y	-	-	Y	Y
Zhang, T. et al. [21]	2018	NSL-KDD data	The authors proposed a machine-learning-based collaborative IDS (PML-CIDS) for VANETs that protects user privacy. The algorithm applies the alternating direction method of multipliers to a class of empirical risk minimization problems and trains a classifier to recognize intrusions in VANETs.	Y	-	-	-	Y
Zeng, Y. et al. [22]	2018	-	The authors presented Senior2Local, an ML-based intrusion detection approach for VANETs. They used game theory to build a trust system for RSUs. An ANN, based on a model of trusted RSUs, is used to secure CHs. After malicious CHs are removed, a lightweight SVM detects malicious MPRs between clusters.	Y	-	-	-	Y
Zeng, Y. et al. [23]	2019	NS-3 VANET simulated dataset and ISCX 2012 IDS dataset	The authors proposed a deep learning (DL)-based end-to-end intrusion detection system that automatically detects malware traffic for OBUs. Unlike earlier intrusion detection techniques, the method needs only raw traffic rather than private-information features extracted by humans.	-	Y	Y	Y	-
Yu, Y. et al. [24]	2022	Time series dataset	To improve false emergency message detection, the authors presented an IDS based on time series classification and deep learning. A long short-term memory (LSTM) classifier is built and trained to determine whether an emergency message is authentic.	Y	-	-	Y	Y

Table 2. Comparison with benchmark algorithms.

Model	Training Accuracy	Testing Accuracy	Precision	Recall	F1 Score	FPR
Linear Regression	0.868	0.766	0.760	0.856	0.833	0.12
Naïve Bayes	0.668	0.388	0.412	0.946	0.661	0.78
Decision Tree	0.981	0.791	0.893	0.967	0.942	0.03
Random Forest	0.980	0.854	0.941	0.935	0.942	0.04
K-NN	0.989	0.86	0.897	0.971	0.955	0.005
Support Vector Machine	0.869	0.665	0.854	0.853	0.813	0.135
Proposed Approach	0.994	0.981	0.974	0.995	0.987	0.008
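
The columns of Table 2 are standard binary-classification metrics. As a minimal sketch, assuming the usual confusion-matrix definitions with "attack" as the positive class (this part of the article does not restate its formulas), they could be computed from raw counts as shown below. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common detection metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, with "attack"
    treated as the positive class (an assumption of this sketch).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)   # flagged samples that are real attacks
    recall = safe_div(tp, tp + fn)      # real attacks that were flagged
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
        # False positive rate: benign samples wrongly flagged as attacks.
        "fpr": safe_div(fp, fp + tn),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick sanity check when reading a row of such a table.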

Table 3. Comparison with state-of-the-art methods.

Model	Accuracy
CIDS [18]	0.9675
Hybrid ML [19]	0.9693
Senior2Local [22]	0.985
LSTM-Based ID [24]	0.9708
Proposed Approach	0.994

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

