Article

Enhancing Security and Accountability in Autonomous Vehicles through Robust Speaker Identification and Blockchain-Based Event Recording

by
Judith Nkechinyere Njoku
1,*,
Cosmas Ifeanyi Nwakanma
2,
Jae-Min Lee
1 and
Dong-Seong Kim
1,*
1
Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
2
ICT-Convergence Research Center, Kumoh National Institute of Technology, Gumi 39177, Republic of Korea
*
Authors to whom correspondence should be addressed.
Electronics 2023, 12(24), 4998; https://doi.org/10.3390/electronics12244998
Submission received: 7 November 2023 / Revised: 6 December 2023 / Accepted: 12 December 2023 / Published: 13 December 2023

Abstract:
As the deployment of Autonomous Vehicles (AVs) gains momentum, ensuring both security and accountability becomes paramount. This paper proposes a comprehensive approach to address these concerns. With the increasing importance of speaker identification, our first contribution lies in implementing a robust mechanism for identifying authorized users within AVs, enhancing security. To counter the threat of voice spoofing, an ensemble-based approach leveraging speaker verification techniques is presented, ensuring the authenticity of user commands. Furthermore, in scenarios of accidents involving AVs, the need for accurate accountability and liability allocation arises. To address this, we introduce a novel application of blockchain technology, enabling an event recording system that ensures transparent and tamper-proof records. The proposed system enhances AV security and establishes a framework for reliable accident investigation using speakers’ records. In addition, this paper presents an innovative concept where vehicles act as impartial judges during accidents, utilizing location-based identification. Results show the viability of the proposed solution for accident investigation and analysis.

1. Introduction

Autonomous and self-driving vehicles have emerged as transformative technologies with the potential to reshape the transportation landscape [1]. These vehicles offer many advantages, including enhanced road safety, improved user experiences, expanded mobility, and proactive driving capabilities [2]. As depicted in Figure 1, autonomous vehicles encompass diverse levels of automation, with a predominant presence at levels 2 to 4 [3]. The heightened automation at these levels accentuates the significance of seamless communication between vehicles and users. Advancements in autonomous vehicle (AV) technology have seen significant milestones, such as a blind man traveling unaccompanied in a Google self-driving car in Austin [4]. These accomplishments, among others, prove the potential of AVs to provide increased mobility and independence to users with disabilities.

1.1. Background and Motivation

Amidst the many benefits of AVs, concerns arise over the safety of these vehicles on the road. There is a risk of AVs causing road accidents when the system malfunctions. According to data from the National Highway Traffic Safety Administration (NHTSA), 130 crashes involving AVs occurred between July 2021 and May 2022 [5]. Several other notable accidents involving AVs gained significant public attention. In 2016, a Google self-driving car hit a bus, despite traveling at a low speed of 2 mph (3 km/h), while the bus was advancing at 15 mph (24 km/h) [6]. In 2017, there was another incident involving a Tesla Model S, which collided with an 80-year-old cyclist [7]. These accidents highlight the urgency of developing reliable solutions for AVs that can prevent such incidents and ensure the safety of all road users [8]. Furthermore, research has shown that these incidents have caused a negative public perception of the technology [9].
Another issue related to AVs is the problem of responsibility. If an AV is involved in an accident (whether with other AVs, conventional cars, or pedestrians), it can be hard to determine who was at fault, which can lead to confusion and disputes about liability. Sometimes, parties might even fabricate evidence to avoid being blamed. When there is an accident involving AVs, how can liability be apportioned appropriately? Reliable accident forensics are essential to help decide where the liability lies [10]. Is there a way to record these incidents for later investigation, so that we can establish what happened? Moreover, how can we ensure these records are accurate and cannot be tampered with? This paper describes an ensemble speaker identification system and a blockchain-inspired accident and event recording method that together ensure verifiable and reliable forensics.
Speaker identification involves identifying the users who control an AV using voice [11]. By harnessing distinctive spectral attributes inherent in user voices, speaker identification systems distinguish and categorize individuals [12]. The accuracy and resilience of speaker identification within AVs are pivotal to addressing these concerns and ensuring secure and efficient human–vehicle interactions [13,14]. As depicted in Figure 2, speaker identification enables vehicles to accurately discern and differentiate speakers (passengers, drivers, or external parties engaged in interaction), facilitates exclusive access for authorized individuals, and can limit the number of authorized users for an autonomous vehicle. The relevance of robust speaker identification systems in autonomous vehicles has become even more apparent, especially given the alarming number of accidents and incidents involving self-driving cars [15]. These incidents also emphasize how critical it is for speaker identification systems to be robust enough to enable seamless and reliable interactions between users and the autonomous vehicle.
One challenge with speaker identification systems is that they are prone to voice spoofing attacks, whereby an attacker tries to mimic the voice of a genuine speaker, either by speech synthesis or voice conversion, to issue commands to the vehicle. This could pose a significant risk to both AV and road users. Such a scenario is illustrated in Figure 3.
In autonomous vehicles, security becomes an imperative concern beyond user interactions. Blockchain technology, initially popularized by cryptocurrencies, has found applications in enhancing the security and accountability of autonomous vehicles [16,17]. Blockchain’s decentralized and tamper-resistant nature lends itself to establishing secure and transparent data records, mitigating risks associated with data tampering and unauthorized access [18]. Leveraging blockchain technology for secure data storage and access control bolsters the trustworthiness of autonomous vehicle systems, addressing potential vulnerabilities and enhancing overall system reliability [6,19].

1.2. Contributions

Expanding upon the foundation of prior research, this study introduces a framework for proactive accident event recording by incorporating ensemble learning, witness vehicle voting, and blockchain concepts.
This paper makes the following specific contributions:
  • We developed an ensemble-based speaker identification system to overcome the challenge of voice spoofing.
  • We proposed an event recording system for the identified speakers to enable proactive forensics research.
  • We developed an accident monitoring and recording algorithm that allows witnesses and accident vehicles to vote on the conditions of the accident and provide verifiable claims for liability.
  • We conducted extensive experiments to show some proof for the proposed solutions.
The proposed system addresses challenges in accurate identification and acknowledges the pressing need for security against voice spoofing attacks. Additionally, the study embraces the potential of blockchain to ensure traceability, accountability, and security in the context of autonomous vehicle operations. The remainder of this paper is organized as follows: The related works section is presented in Section 2. In Section 3, we present the proposed system and all corresponding architectures and algorithms. We conduct extensive experiments to validate the proposed system in Section 4 and provide corresponding results. Section 5 discusses and analyzes the potential challenges, while Section 6 concludes the paper. A list of abbreviations in this work is included at the end of this paper.

2. Related Works

This section presents a literature review on speaker identification techniques tailored to AVs and in-vehicle communication systems. We also review previous studies on voice spoofing and blockchain-aided event recording systems for AVs.

2.1. Speaker Identification

The speaker identification problem has been extensively studied in various domains but poses unique challenges [13,14,20]. One critical challenge is creating highly accurate and computationally efficient models for speaker identification [21]. Consequently, researchers have explored different speech features and leveraged machine learning (ML) techniques with impressive results [20,22,23]. Speaker identification has also been explored in the realm of AVs. Ref. [24] highlighted the use of microphone arrays for speaker localization and identification. Another study explored using classical ML algorithms for speaker identification in AVs [25]. In [26], an artificial neural network (ANN) algorithm was explored for speaker identification in AVs. All of these studies highlight the critical need for robust speaker identification models for AVs.
Developing a robust ML model for speaker identification requires considering several factors, such as the type of data, the ML algorithm used, and the feature extraction method. Distinctive speech features play a crucial role in speaker identification, considering factors such as gender and language characteristics [13]. Mel frequency cepstral coefficients (MFCCs) are the most commonly used features in speech recognition and speaker identification tasks. They are highly robust and successfully applied in various speech-related tasks, including speech recognition, emotion recognition, and speaker identification. Several studies, such as [14,20,22,27,28], have demonstrated the effectiveness of MFCC features in speaker identification systems.
However, gammatone cepstral coefficients (GTCCs) [29] are more resilient to noise than MFCCs [12], making them a preferred choice for speaker identification tasks [12,13]. Another commonly applied feature for speaker identification is pitch [23], which was used to enhance speaker identification performance in [11], where a hybrid feature comprising pitch and MFCCs yielded a more robust speaker identification system. It is thus necessary to identify the features best suited to speaker identification. Furthermore, finding the most robust ML model for speaker identification remains a significant concern [21]. Models with low computational complexity and low memory footprints are necessary to implement a real-time speaker identification system.

2.2. Voice Spoofing

Prior research endeavors related to speaker identification have predominantly concentrated on refining features, ML models, or resource-intensive methodologies. Nevertheless, a pivotal facet that has often been relegated to the sidelines is the susceptibility of these systems to voice spoofing attacks. This vulnerability can potentially compromise security and accountability [30]. Many research investigations have attempted to counteract spoofing by exploring various strategies [31]. In certain instances, the spotlight has fallen on post-processing the speech samples using neural networks. These networks have been fine-tuned to minimize the divergence between the features characterizing counterfeit and authentic utterances [32].
An alternative approach involves the exploration of artifact estimations, a result that arises when an impostor tries to transform their speech into a genuine version [33]. This study was based on the assumption that all manipulated speech samples would exhibit artifacts. The work undertaken by researchers in [34] focused on subsampling voiced frames before initiating spoofing detection procedures. Similarly, ref. [35] charted a comparable course by translating speech features into an innovative ensemble of attributes that facilitates the identification of multiple spoofing attacks. ML-based methodologies have also found application in countering spoofing. In the context of [36], a one-class learning approach was used to enhance the detection accuracy for unidentified voice spoofing attacks in real-world scenarios. Moreover, the concept of ensemble learning has been investigated, as shown by studies found in [37,38,39]. In [37], an ensemble learning model composed of deep neural networks and traditional ML models was developed, whose predictions are combined using logistic regression. Similarly, ref. [38] developed an ensemble of ML models for differentiating voice spoofing attacks to speaker identification systems. A comparative analysis of ML classifiers such as support vector machines (SVMs) and K-nearest neighbor (KNN) to an ensemble model as a countermeasure against voice replay attacks was developed in [39]. The solution employed within our study draws inspiration from this approach, where the premise revolves around amalgamating a collection of models that have demonstrated robustness in their ability to discern fraudulent voice samples. This ensemble of models is curated to complement each other’s strengths and collectively enhance overall performance.

2.3. Safety and Accountability of AVs

The safety and accountability of AVs are crucial concerns in the era of autonomous driving. AVs are equipped with various sensors, such as LIDAR, radar, and cameras, to perceive their environment accurately. Researchers continually improve sensor fusion techniques to enhance AVs’ detection and prediction capabilities, thereby reducing the risk of accidents caused by system failures or environmental factors [40]. AI plays a pivotal role in the decision-making processes of AVs. The development of more sophisticated ML algorithms can improve the vehicles’ ability to navigate complex traffic scenarios safely [41]. Ongoing research focuses on enhancing AI’s predictive analytics to foresee and avoid potential hazards [42]. The shift from driver-controlled to autonomous vehicles raises questions about liability in accidents. Legal scholars and policymakers are exploring frameworks where manufacturers, software developers, and other stakeholders share responsibility, depending on the cause of the accident [43].
AVs must make split-second decisions in critical situations. Integrating ethical decision-making models into AV software is an area of ongoing research [41]. These models aim to ensure the vehicle’s actions in unavoidable accident scenarios align with societal values and ethics. Governments and international bodies are working on establishing comprehensive legal and regulatory frameworks to govern the operation of AVs. These frameworks aim to define safety, testing, and liability standards, providing clear guidelines for manufacturers and users. Similar to black boxes in airplanes, event data recorders (EDRs) in AVs record crucial data related to vehicle operation. These data can be used to reconstruct events leading up to an accident, providing valuable information for adjudication purposes. Post-accident analysis often relies on interpreting visual and sensor data to determine the sequence of events. Researchers are developing sophisticated algorithms to analyze these data more accurately, which can be crucial in legal proceedings and insurance claims [43].

2.4. Blockchain Solutions for AVs

The integration of blockchain technology [44] has demonstrated its utility in addressing various challenges within AVs. Many of these solutions are fundamentally centered around bolstering the security of AV operations [45,46]. In the work presented by [47], the emphasis lies on enhancing cybersecurity measures for AVs, culminating in the proposition of tracking systems designed to document vehicular actions meticulously. On a parallel trajectory, the study conducted by [48] set its sights on safeguarding the identities and privacy of individuals who witness accidents, focusing on introducing what they termed as randomizable signatures. The concept introduced by [49] took a collaborative approach, suggesting that AVs could acquire knowledge from the collective experiences of their counterparts by interfacing with a publicly accessible ledger. This approach, in turn, would relieve manufacturers of the onus of individually training each AV. A framework centered around proof-of-event recording was meticulously developed and documented in [16,50].
This innovative system, designed to establish verifiable forensic records during accidents, is a stepping stone for our proposition. Expanding on this premise, we introduce an event-recording mechanism that enables the creation of verifiable accident forensics and facilitates the streamlined identification of accident causality. While the domain largely lacks solutions addressing speaker identification in AVs, a study by [51] endeavors to fortify user authentication via blockchain technology. A caveat to this approach is the absence of a robust anti-spoofing mechanism, which merits further exploration and refinement.

3. Methodology

In this section, we discuss the entire system as summarized in Figure 4. We introduce three main components: (a) the vehicular network, (b) the ensemble model, and (c) the proposed solution for accident monitoring and blockchain integration.

3.1. Vehicular Network

For this study, we consider a cellular vehicular network scenario. Within this context, all vehicles $V_{nm}$ are served by a common base station denoted as $B_{nm}$, operating within a specific cell denoted by $N_{nm}$. This arrangement is illustrated in Figure 5. In this depiction, $n$ signifies the number of cells, while $m$ represents the number of vehicles within a cell. All vehicles within a given cell play a role in the event of an accident, either as the accident-involved vehicle or as a witness to the accident.
An accident-involved vehicle refers to one that directly participates in a collision with another vehicle. Conversely, a witness vehicle is merely present and operational within the network during the incident. To summarize the assumptions made in this study:
  • We consider a cellular vehicular network with blockchain integration, where vehicles within a cell share a standard blockchain for secure and transparent data management.
  • We assume the identification of the network and AVs through GPS longitudes, ensuring precise location-based tracking and communication.
  • We posit that all AVs have registered license plates with the Department of Motor Vehicles (DMV), enabling regulatory compliance and traceability.
In communication protocols, AVs primarily employ the IEEE 802.11 standard [52] for dedicated short-range communications (DSRCs) to transmit and receive information, including event generation requests. Furthermore, AVs maintain connectivity to the cellular network, facilitating the broadcasting and verification of event data within the same network.

3.2. Ensemble Model for Anti-Spoofing

In this subsection, we introduce the ensemble speaker identification model summarized in Figure 6. Specifically, we discuss six primary components of this research, including (a) problem formulation, (b) feature extraction, (c) feature fusion and selection, (d) ML model selection, (e) ensemble model selection, and (f) the dataset.

3.2.1. Problem Formulation

Consider an autonomous vehicle (AV) designed to respond to voice commands from a set of authorized users ($U$). The challenge lies in preventing voice spoofing attempts by unauthorized users ($V$), who might mimic the voices of authorized users to manipulate the AV's actions.
Both $U$ and $V$ can issue voice commands represented as signals ($x$), falling within a specific frequency range ($H$). We define a closed-set speaker identification task by extracting distinctive features ($F$) from these signals. The objective is to develop an algorithm ($A$) that, after training on optimal features from set $F$ obtained only from the legitimate speakers in set $N$, accurately and swiftly distinguishes speakers within $N$. The intent is to design a robust model impervious to unauthorized users ($V$) attempting to imitate the voices of users in $U$. This model will serve as a platform to test countermeasures and enhance the AV's speaker identification system against intentional threats.

3.2.2. Feature Extraction

Feature extraction aims to convert the speech signals to a format that clearly shows the distinctive attributes of the speaker’s voice. The feature extractor transforms the speech signals into feature vectors, which are numerical samples that will be fed to the training model. In this paper, five different features were employed.
  • Mel Frequency Cepstral Coefficients (MFCCs): MFCCs are critical features for automatic speaker identification (ASI) tasks. They are extracted through a cepstral analysis of the speech signal, which separates the signal into its excitation-source and vocal-tract components. To compute the MFCCs, the speech signal is first framed and windowed, after which the Fourier transform of the output is taken to obtain the magnitude spectrum. In this paper, the Hamming window ($w$) was used, whose coefficients are given by:
    $$w(k) = 0.54 - 0.46 \cos\left(\frac{2\pi k}{n-1}\right),$$
    where $n$ represents the length of the filter and $k = 0, 1, \ldots, n-1$.
    This resultant spectrum is then transformed to the Mel scale using the Mel filterbanks, a non-linear scale that closely approximates the pitch to how it sounds to humans. The result is a spectrum in different frequency bands.
  • Gammatone Cepstral Coefficients (GTCCs): The gammatone (GT) filter banks are an improvement over the Mel-scale triangular filters. The GT filter is a linear filter represented by an impulse response $g(t)$ given by:
    $$g(t) = a t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \phi),$$
    where $\phi$ represents the phase of the carrier in radians, $f$ denotes the center frequency in Hz, $n$ is the order of the filter, $a$ is the amplitude, $t$ is the time, and $b$ is the bandwidth of the filter in Hz.
  • Pitch: Speech can be tonal or non-tonal, voiced or unvoiced. Such classes arise from the modulation of air from the lungs; when speech is voiced, the resulting sound oscillates with a reasonably low frequency, which represents the pitch. The excitation produced by voiced speech is quasiperiodic. However, when speech is unvoiced, the excitation is noise-like because the air from the lungs becomes turbulent as it is constricted in the vocal tract. Pitch can be estimated by first applying a pre-emphasis filter to the speech signal, which enhances its high-frequency content. The filtered signal is then divided into overlapping frames of about 20 to 30 ms, and a pitch detection algorithm, such as cepstrum analysis or autocorrelation, is applied to each frame to obtain the pitch period.
  • Short-term Energy (STE): The short-time energy is a feature that helps to differentiate speech from silence. When the energy of a frame falls below a certain predefined threshold, the frame is declared to be silent. Otherwise, it is a speech. The short-term energy can be obtained by first converting the speech signal to a discrete-time signal, after sampling at a high rate (16 kHz), then breaking these signals into overlapping frames with fixed length (20 ms with a 10 ms overlap). Afterward, square each speech sample in a frame and sum the squares. This sum is then divided by the total number of samples in a frame to obtain the short-term energy.
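To make the windowing and energy computations above concrete, the following sketch (Python/NumPy; the function names are ours, and the 320-sample frame with 160-sample hop assumes the 20 ms/10 ms framing at 16 kHz described in the text) computes the Hamming window coefficients and the per-frame short-term energy:

```python
import numpy as np

def hamming_window(n):
    # w(k) = 0.54 - 0.46 * cos(2*pi*k / (n - 1)), for k = 0, 1, ..., n-1
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (n - 1))

def short_term_energy(signal, frame_len=320, hop=160):
    # Frame the signal (320 samples = 20 ms at 16 kHz; a 160-sample hop
    # gives a 10 ms overlap), then average the squared samples per frame.
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.sum(frame ** 2) / frame_len)
    return np.array(energies)
```

Frames whose energy falls below a small threshold (0.005 is the value used later in this paper) would then be declared silent.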

3.2.3. Zero-Crossing Rate (ZCR)

The zero-crossing rate (ZCR) is another crucial feature that helps to distinguish unvoiced speech from voiced speech. When there are many zero crossings, we can infer that low-frequency oscillations are not dominant. If the zero-crossing rate for a frame is above a predefined threshold, then the frame can be declared unvoiced speech. The ZCR can be obtained by first converting the speech signal to a discrete-time signal, after sampling at a high rate (16 kHz), and then counting the number of sign changes between consecutive samples and dividing by the total number of sample pairs. The ZCR is given by:
$$ZCR = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{I}\{x_t x_{t-1} < 0\},$$
where $x$ denotes the signal of length $T$, and the indicator function $\mathbb{I}\{B\}$ equals 1 if $B$ is true and 0 otherwise.
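As a sketch, this computation maps directly to a few lines of Python (NumPy; the function name is ours):

```python
import numpy as np

def zero_crossing_rate(x):
    # ZCR = (1 / (T-1)) * sum over t of I{ x_t * x_{t-1} < 0 }:
    # a negative product of consecutive samples marks a sign change.
    x = np.asarray(x, dtype=float)
    sign_changes = np.sum(x[1:] * x[:-1] < 0)
    return sign_changes / (len(x) - 1)
```

An alternating signal gives a ZCR of 1, while a constant signal gives 0; a frame whose ZCR exceeds the threshold (0.2 in this paper) would be declared unvoiced.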
This study conducted experiments on different sets of fused features for speaker identification. The performance of five different single and fused feature configurations was compared based on the accuracy and time of validation and testing. The following paragraphs detail the different hybrid feature configurations.
  • GTCC features: The GTCC features were extracted using a Hamming window of length 0.03 × sample rate and an overlap length of 0.025 × sample rate.
  • MFCC features: The MFCC features were extracted using a Hamming window of length 0.03 × sample rate and an overlap length of 0.025 × sample rate.
  • GTCC-Pitch features: This hybrid feature consists of the GTCC, pitch, short-time energy, and ZCR features. The GTCC features were extracted using a Hamming window of length 0.03 × sample rate and an overlap length of 0.025 × sample rate. The short-time energy features were extracted with a threshold of 0.005, while the ZCR features were extracted with a threshold of 0.2.
  • MFCC-Pitch features: This hybrid feature consists of the MFCC, pitch, short-time energy, and ZCR features. The MFCC features were extracted using a Hamming window of length 0.03 × sample rate and an overlap length of 0.025 × sample rate. The energy features were extracted with a threshold of 0.005, while the ZCR features were extracted with a threshold of 0.2.
  • GTCC-MFCC-Pitch features: This hybrid feature comprised GTCC, MFCC, and Pitch features along with the Z C R and energy features.
Since the ZCR and short-time energy features are used to decide when to use the pitch feature, they were employed only for the hybrid features involving pitch.

3.2.4. Feature Fusion and Selection

  • Normalization: To prevent bias in the classifier, all features were normalized by subtracting the mean and dividing by the standard deviation, ensuring they were all on the same scale.
  • Concatenation: All feature vectors were concatenated into a single matrix, which serves as the input to a K-nearest neighbor (KNN) classifier. Each column of this matrix corresponds to a particular feature.
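The two steps above can be sketched as follows (Python/NumPy; the function name is illustrative):

```python
import numpy as np

def normalize_and_fuse(feature_blocks):
    # Z-score each feature block (subtract the mean, divide by the
    # standard deviation), then concatenate the blocks column-wise
    # into a single feature matrix for the classifier.
    normed = [(f - f.mean(axis=0)) / f.std(axis=0) for f in feature_blocks]
    return np.concatenate(normed, axis=1)
```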
The KNN algorithm is a nonparametric classifier that uses a supervised learning approach to perform its classification tasks. KNN is used to identify the nearest neighbors to a given point based on distance metrics. This study adopted the Euclidean distance as denoted below.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}.$$
We also used k = 5 neighbors and a squared inverse distance weight. All feature configurations were used to train and test the KNN algorithm. The best-performing configuration was then employed for the next stage.
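A minimal NumPy sketch of this classifier ($k = 5$, Euclidean distance, squared-inverse-distance vote weights; the small epsilon guarding division by zero for exact matches is our addition):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distances d(x, y) = sqrt(sum_i (y_i - x_i)^2)
    d = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = np.argsort(d)[:k]
    # Squared inverse distance weights; epsilon avoids division by zero
    w = 1.0 / (d[nearest] ** 2 + 1e-12)
    # Accumulate weighted votes per class label
    votes = {}
    for label, weight in zip(y_train[nearest], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```

Nearby neighbors thus dominate the vote, which makes the prediction robust to a few distant samples entering the top-k set.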

3.2.5. ML Model Selection

After selecting the best feature configuration, we trained several ML classifiers. The primary objective was identifying the best-performing classifiers based on specific evaluation metrics. Below is a discussion of each of the classifiers employed:
  • K-Nearest Neighbors (KNN): KNN is a simple yet effective algorithm that classifies data points based on the majority class among their nearest neighbors. It is known for its simplicity and ease of implementation.
  • Support Vector Machine (SVM): SVM is a robust classifier that aims to find a hyperplane that best separates data points of different classes. It is particularly effective in high-dimensional spaces where the data are not linearly separable.
  • Random Forest (RF): RF is an ensemble learning method that combines multiple decision trees to improve accuracy and reduce overfitting. It is robust and performs well on a variety of data types.
  • Naive Bayes (NB): NB is a probabilistic classifier based on Bayes’ theorem. It is simple, efficient, and works well for tasks where the independence assumption holds.
  • Decision Tree (DT): DT is a tree-like structure where internal nodes represent features, branches represent decisions, and leaf nodes represent outcomes. It is interpretable and can handle both categorical and numerical data.
  • AdaBoost: AdaBoost is an ensemble learning method combining weak and robust learners. It focuses on samples misclassified by previous weak classifiers, improving overall performance.
  • AdaBoost with Decision Tree: This is an extension of AdaBoost where decision trees are used as weak learners. It often performs well when combined with AdaBoost’s boosting mechanism.
  • Linear Discriminant Analysis (LDA): LDA is a dimensionality reduction and classification technique that aims to find linear combinations of features that best separate different classes.
  • Quadratic Discriminant Analysis (QDA): QDA is similar to LDA but relaxes the assumption of equal covariance matrices for different classes, making it more flexible when the covariance structures vary significantly.
Each classifier was evaluated based on specific criteria, and the best-performing models were selected for further refinement and integration into the ensemble model.

3.2.6. Ensemble Model Selection

We explored various ensemble configurations in developing a robust ensemble model while carefully considering the individual models’ performances within a predefined threshold. The ensemble strategy employed here was majority voting, which involves aggregating the predictions of all models within an ensemble category.
Mathematically, the majority voting process can be represented as follows:
Let $E$ be the set of individual models within an ensemble category, and let $F$ represent the total number of models in this set. For a given classification task, each model $e_i$ in $E$ generates a prediction $P_i$ corresponding to the class label assigned to an input sample. The majority voting process combines these individual predictions into a final decision by selecting the class label with the most votes. It can be expressed as:
$$\text{Ensemble Prediction} = \arg\max_{j} \sum_{i=1}^{F} \delta(P_i, j),$$
where Ensemble Prediction is the final prediction made by the ensemble model, arg max j selects the class label j with the highest total votes, and δ ( P i , j ) is a function that equals 1 if P i matches class label j, and 0 otherwise.
In essence, the ensemble model leverages the wisdom of multiple individual models, and by employing majority voting, it aggregates their predictions to make a final decision that often leads to improved classification performance.
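The voting rule can be sketched in a few lines of plain Python (ties fall to the label seen first, which is an implementation choice of ours):

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one class label per model in the ensemble.
    # Returns argmax_j sum_i delta(P_i, j), i.e., the most-voted label.
    return Counter(predictions).most_common(1)[0][0]
```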

3.2.7. Dataset

The dataset employed was obtained from Mozilla's "Common Voice" dataset and consists of 10 speakers, evenly split between males and females [53]. Each speaker utters short sentences recorded at a sampling frequency of 48 kHz. This dataset is suitable for this work as it covers different races, sexes, and accents. As shown in Table 1, a total of 28,844 speech samples were used in the experiment.

3.3. Event Recording with Blockchain

To fast-track a forensic investigation after an accident, the available data must be correct and reliable. Such data can be recorded by bystanders or by those involved in the accident. This subsection introduces the blockchain-based event recording solution. Specifically, it discusses (i) the accident monitoring algorithm, (ii) the AV’s incident lifecycle, and (iii) the blockchain solution.

3.3.1. Accident Monitoring Algorithm

In this paper, we propose two steps towards accomplishing this goal: first, recording the IDs of the speakers that control the AV, to trace the cause of an accident; and second, gathering trustworthy accident data based on observations by those involved. The steps are enumerated in Algorithm 1. All vehicles are subject to standard speed limits, where the maximum allowed speed is defined as the threshold. Vehicles driving above this speed are deemed to have broken the rule and can be held liable in the event of an accident. In Algorithm 1, witness vehicles are those within range of the accident, and accident vehicles are those directly involved in a collision. Each witness vehicle votes on which accident vehicle is at fault by comparing its own speed with that of the accident vehicle. If excess speed is detected, the witness vehicle votes True; otherwise, False.
Algorithm 1 Excessive Speed Detection with Weighted Votes
Inputs:
    Witness vehicle speed (S_vw) in m/s
    Accident vehicle speed (S_vn) in m/s
    Excessive speed threshold (S) in m/s
    Distance from witness vehicle to accident site (D_w) in meters
Outputs:
    Excessive speed vote (V) as a boolean (true or false)
    Weight of each vote based on distance (W)
1: procedure DetectExcessiveSpeed(S_vw, S_vn, S, D_w)
2:     S_r ← |S_vw − S_vn|            ▷ Relative speed
3:     vote ← false                   ▷ Initialize vote
4:     W ← 1/D_w                      ▷ Calculate weight based on distance
5:     if S_r > S then
6:         vote ← true                ▷ Excess speed detected
7:     end if
8:     return vote, W
9: end procedure
Final Vote Aggregation: collect all votes and weights from the N witness vehicles and compute
    \text{Final Vote} = \arg\max_{j} \sum_{i=1}^{N} \delta(P_i, j) \cdot W_i.
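A minimal Python sketch of Algorithm 1, assuming speeds in m/s and a strictly positive witness-to-accident distance:

```python
def detect_excessive_speed(s_vw, s_vn, threshold, d_w):
    """Algorithm 1 sketch: return (vote, weight) for one witness vehicle.

    s_vw, s_vn, and threshold are speeds in m/s; d_w is the witness's
    distance to the accident site in meters (assumed > 0).
    """
    s_r = abs(s_vw - s_vn)   # relative speed S_r
    vote = s_r > threshold   # True => excess speed detected
    weight = 1.0 / d_w       # closer witnesses carry more weight
    return vote, weight

# A witness at 10 m/s observing an accident vehicle at 35 m/s from 20 m,
# with a 16.7 m/s (~60 km/h) threshold:
print(detect_excessive_speed(10.0, 35.0, 16.7, 20.0))  # (True, 0.05)
```

Each witness returns its boolean vote together with the inverse-distance weight, which the aggregation step then combines.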

3.3.2. AV Incident Lifecycle

The “AV Incident Lifecycle” refers to the series of stages that an AV goes through, from the onset of a driving session to the resolution and analysis of any incidents or accidents that may occur. This subsection discusses three sessions: a driving session, an accident session, and a forensics session.
First Session: Driving
  • One or more users U_v in AV V_{nm} issue instructions to control the AV.
  • V_{nm} executes each instruction and identifies the speaker for every executed instruction.
  • V_{nm} records the predicted and identified user ID in a hash along with time stamps and location, as illustrated in Figure 7.
Second Session: Accident
  • Two randomly selected AVs, V_{n1} and V_{n2}, collide in an accident and broadcast ’event generation’ requests to the witness vehicles.
  • All witness vehicles V_{n,w} cast a vote to determine which vehicle was at fault based on the excessive speed detection of Algorithm 1. The weight of each vote (W_i) is determined by the inverse of the distance (D_i) between the witness vehicle and the accident site:
    W_i = \frac{1}{D_i}.
  • AVs V_{n1} and V_{n2} also vote to determine the cause of the accident. Similarly, the weight of each vote is based on the inverse of the distance (D_1 and D_2, respectively) of the AVs to the accident site:
    W_1 = \frac{1}{D_1}, \quad W_2 = \frac{1}{D_2}.
  • All votes are collated and aggregated by weighted majority voting, where the weight of each vote (W_i) is taken into account:
    \text{Final Vote} = \arg\max_{j} \sum_{i=1}^{N} \delta(P_i, j) \cdot W_i,
    where Final Vote is the final verdict regarding the accident’s cause, \arg\max_{j} selects the class label j with the highest total weighted votes, and \delta(P_i, j) equals 1 if P_i matches class label j, and 0 otherwise.
  • The final vote is recorded in a hash, along with time stamps and location, as illustrated in Figure 7.
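The event record described above (vote, time stamp, GPS location) can be hashed before broadcast along the following lines; the field names and serialization are illustrative assumptions, not the paper's actual schema:

```python
import hashlib
import json

def event_digest(vehicle_id, vote, timestamp, lat, lon):
    """Serialize an event record deterministically and return its SHA-256 digest."""
    event = {
        "vehicle_id": vehicle_id,   # hypothetical field names for illustration
        "vote": vote,
        "timestamp": timestamp,
        "gps": [lat, lon],
    }
    # sort_keys makes the serialization, and hence the digest, deterministic
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

digest = event_digest("AV-3", True, "2023-11-07T09:15:00Z", 36.146, 128.393)
print(len(digest))  # 64 hex characters
```

Because the serialization is deterministic, any vehicle that receives the same event data can recompute and verify the digest.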
Third Session: Forensics
  • The forensics investigator queries the blockchain records with the specific date of the accident.
  • Investigator examines final vote from accident session.
  • Investigator examines records from driving session.
Figure 8 gives an overview of what happens after an accident occurs. Accident vehicles 1 and 2 collide in an accident. Witness vehicles 3, 4, 5, and 6 within the communication range receive the ’event generation’ requests and cast their votes. Accident vehicles also cast their votes. All vehicles then generate their event data consisting of votes, time stamps, GPS location, and the corresponding hash digest and broadcast within the vehicular network. All the broadcast events will be verified and saved in a new block based on a consensus.
Every newly generated accident block is saved with the DMV for record keeping. During a forensic investigation, the authorities can review the DMV record for the speaker and accident event data stored in the blockchain.

3.3.3. Blockchain Solution

To record AV accident data and speaker information on the blockchain, a structured process is required. This study was based on a private Ethereum blockchain network. This section introduces the following discussions: (i) connection with the Ethereum node, (ii) developing the smart contract, (iii) formal verification of the smart contract, (iv) executing the smart contract, and (v) interacting with the smart contract.
  • Connection with the Ethereum node: To connect with the Ethereum node, the Web3.py library is used. This Python library interfaces with the Ethereum blockchain. An HTTP provider connects to a local Ethereum node using its URL, for instance: web3 = Web3(Web3.HTTPProvider('http://127.0.0.1:8545')).
  • Developing the smart contract: A smart contract is a set of predefined rules and functions written in a programming language (such as Solidity for Ethereum) that runs on the blockchain [54]. It automatically executes and enforces the terms of a contract when certain conditions are met. This study uses a smart contract on the Ethereum blockchain to record AV accident data. Each AV is equipped with a unique identifier, and in the event of an accident, the vehicle’s system records key data, such as the time, location, and relevant sensor information. These data are then transmitted to the blockchain via a smart contract function executed from a decentralized application (DApp). The transaction includes the vehicle’s ID and a unique speaker ID for the accident. The smart contract ensures that each accident record is unique to prevent duplicate entries. The system records the transaction time, allowing for the calculation of latency, the time taken for the data to be recorded on the blockchain after the accident. As illustrated in Figure 9, each vehicle in the network and within proximity of the accident votes on the liable vehicle. The smart contract exposes two functions: recording accident votes and recording speaker IDs. Within the contract, a function is defined to handle the recording of AV accident data. This function takes the parameters vehicle_id, speaker_id, time, and location and records them on the blockchain. The data are stored in line with blockchain principles, making them immutable and transparent. The function runs validation checks to ensure a speaker ID has not yet been recorded for a given vehicle. If all checks pass, the function records the data on the blockchain, which may involve updating a mapping or array within the smart contract. The function then emits an event to notify the system that a new record has been added.
  • Formal verification of the smart contract: To guarantee the robustness and accuracy of our smart contract, we utilized formal verification and analysis techniques in conjunction with the Remix IDE and Solhint. Formal verification within Remix entails meticulously examining the smart contract code to ensure its conformance with its specifications [55]. This process is essential in detecting potential vulnerabilities, logical errors, and inefficiencies. Remix also provides integrated testing and debugging tools, allowing developers to interactively test their contracts and pinpoint specific code segments where issues may arise [56]. Solhint is a linter offering security and style guide validations for Solidity code. It is critical in enhancing code quality and ensuring adherence to best practices. By employing Solhint, the smart contract code was scanned for known vulnerabilities and anti-patterns. This is crucial in the blockchain domain, where security breaches can lead to significant financial and reputational losses. Solhint also ensures that the smart contract adheres to established coding standards, which are vital for maintaining the code’s readability, consistency, and maintainability, especially in collaborative development environments. The integration of Remix and Solhint provided a comprehensive analysis workflow: Remix allowed for real-time interaction with the smart contract, while Solhint offered an automated, rule-based analysis. The formal verification and analysis using Remix and Solhint were instrumental in ensuring the integrity and security of the smart contract used for recording AV accident data. This rigorous approach to smart contract development not only improved the reliability of the application but also instilled confidence in its usage for sensitive data handling in AV accident scenarios.
  • Executing the smart contract: The smart contract’s address and application binary interface (ABI) are required to interact with it. The ABI is a JSON representation of the contract, which tells the Web3.py library how to interact with the contract’s functions. A transaction is built and then sent to the Ethereum network. This involves specifying the sender’s address (from), the nonce (a counter to ensure each transaction is unique), and gas price. The transaction is then signed using the sender’s private key. Keeping the private key secure is crucial, as it grants control over the sender’s funds and the ability to execute transactions. The signed transaction is sent to the network, and its hash is returned. This hash uniquely identifies the transaction on the blockchain. The code waits for the transaction to be mined and confirmed by the network, after which a receipt is received. This receipt contains details about the transaction, such as its success or failure, gas used, and the event logs, if any.
  • Interacting with the smart contract: To identify the AV at fault or the speaker ID, it is necessary to query the smart contract. A function queries the smart contract to retrieve the vote count for each vehicle ID. To measure the efficiency of this interaction, the function records the start and end times of the call, computing the latency as the difference between these timestamps. Similarly, another function retrieves the speaker ID associated with a specific vehicle ID.
Figure 10 illustrates the sequence of events in the system from the event of an accident to the execution of the contract and then the data recording in the blockchain.

4. Experiments and Evaluation

4.1. Metrics

4.1.1. Speaker Identification Metrics

The results of the speaker identification models were evaluated using three metrics: F_1 score, Jaccard index, and execution time. The F_1 score measures classification accuracy on the validation and test sets, while the Jaccard similarity index estimates the similarity between two sets, in this case the predicted and actual labels (speakers). The F_1 score is calculated using the standard formula:
F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.
The Jaccard index score is calculated using the following formula:
J(y, \hat{y}) = \frac{|y \cap \hat{y}|}{|y| + |\hat{y}| - |y \cap \hat{y}|} = \frac{\hat{y}}{2y - \hat{y}},
where y is the total number of samples, and \hat{y} is the number of correctly predicted samples.
Computation time refers to the time in seconds a model takes to predict the speaker’s identity. Execution time is the time in seconds required for each simulation of the blockchain network; it measures the entire system’s efficiency in processing data and making decisions.
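For concreteness, both metrics can be computed from true-positive, false-positive, and false-negative counts; the counts below are invented for illustration, not results from this study:

```python
def f1_and_jaccard(tp, fp, fn):
    """Compute the F1 score and Jaccard index from prediction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)   # |y ∩ ŷ| / |y ∪ ŷ| in set terms
    return f1, jaccard

f1, jac = f1_and_jaccard(tp=80, fp=10, fn=10)
print(round(f1, 4), round(jac, 4))  # 0.8889 0.8
```

The two metrics are monotonically related: J = F_1 / (2 − F_1), so a ranking of models by F_1 matches their ranking by Jaccard index.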

4.1.2. Blockchain-Based Event Recording Metrics

In the context of blockchain transactions, particularly those involving Ethereum or similar blockchain platforms, “latency” and “gas cost” are two critical metrics that are often used to evaluate the performance and efficiency of transactions. The following are explanations for each:
  • Latency: In blockchain terms, latency refers to the time taken for a transaction to be completed and confirmed on the blockchain network. It is the duration from when a transaction is initiated (sent to the network) to when it is confirmed (included in a block and validated by the network).
L = T_{\text{confirmed}} - T_{\text{initiated}},
    where L is the latency, T_{\text{confirmed}} is the timestamp when the transaction is confirmed, and T_{\text{initiated}} is the timestamp when the transaction was initiated.
  • Gas Cost: Gas cost in Ethereum and similar blockchains refers to the fee required to conduct a transaction or execute a smart contract on the network. Gas is a unit that measures the computational effort required to execute operations.
    G = Gas Used × Gas Price ,
    where G is the total gas cost, Gas Used is the number of gas units the transaction consumes, and Gas Price is the price per gas unit, typically denominated in Gwei (1 Gwei = 10^{-9} Ether).
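Both metrics reduce to simple arithmetic; the sketch below uses the standard conversion 1 Gwei = 10^{-9} Ether (the 21,000-gas transfer at 20 Gwei is a generic Ethereum example, not a figure from this study):

```python
def tx_latency(t_initiated, t_confirmed):
    """Latency L = T_confirmed - T_initiated, in seconds."""
    return t_confirmed - t_initiated

def gas_cost_ether(gas_used, gas_price_gwei):
    """Total cost G = Gas Used x Gas Price, converted to Ether
    (1 Gwei = 1e-9 Ether)."""
    return gas_used * gas_price_gwei * 1e-9

# A generic simple-transfer transaction: 21,000 gas priced at 20 Gwei.
print(round(gas_cost_ether(21_000, 20), 6))  # 0.00042
```

In practice the timestamps and gas figures would come from the transaction receipt returned by the node after mining.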

4.2. Simulation Details

4.2.1. Speaker Identification Models

The speaker identification models were developed using MATLAB 2023a, a widely used tool for numerical computing. The models were tested on a machine equipped with a 2.90 GHz CPU, 8 GB of RAM, and a 64-bit system architecture, ensuring efficient processing and multitasking. The machine’s balanced energy consumption also made it a suitable choice for the development and testing phases.

4.2.2. Ensemble Model Reproduction and Accident Simulation

The ensemble model, initially developed in MATLAB, was reproduced in Python using a Jupyter Notebook environment. This environment allows interactive data manipulation and visualization, making it suitable for simulating accident scenarios and the associated voting system. Using the same Python environment for both tasks enabled a seamless transition between model development and practical application simulation.

4.2.3. Blockchain Simulation and Smart Contract Development

For simulating the Ethereum network, Ganache was utilized. Ganache is a personal blockchain simulator that enables Ethereum blockchain development by providing a test environment for deploying contracts, developing applications, and running tests [19]. It is an essential tool for developing DApps without incurring real-world costs. The smart contract was developed in the Remix IDE, an open-source tool for writing Solidity contracts directly in the browser. Remix simplifies smart contract development, compilation, and testing. The Web3.py library was employed in Python to facilitate interaction with the developed smart contract, enabling transaction creation, contract deployment, and data retrieval on the Ethereum blockchain. The Remix IDE and Ganache are illustrated in Figure 11.

4.3. Feature and Model Selection Results

Table 2 presents the outcomes of various feature configurations used for speaker identification. The results were assessed based on the previously defined metrics. The MFCC, GTCC, Pitch, STE, and ZCR hybrid features demonstrated the best performance, making them the preferred choice for further analysis.
Table 3 illustrates the results of several machine learning models, aiming to identify the most effective models for ensemble-based speaker identification. The objective is to find models with solid individual performance, making them suitable candidates for ensemble modeling. The KNN model achieved a remarkable validation accuracy of 98.82%, showcasing its strong ability to predict speaker IDs accurately. However, its test accuracy dropped to 85.38%, suggesting some overfitting. The SVM model demonstrated robust and consistent performance, making it a strong candidate for ensemble inclusion. The RF model displayed competitive individual performance and remained a viable option for ensemble modeling. The AdaDT model emerged as the top performer, combining high validation accuracy with robust and consistent results. The QDA model also delivered solid individual results, qualifying it as a compelling candidate for ensemble modeling. Ultimately, the exceptional performance of the AdaDT model makes it a solid foundation for constructing an ensemble model with enhanced predictive capabilities.

4.4. Ensemble Model Results

The selection criteria for ensemble fusion were determined by systematically examining various feature fusion configurations, as depicted in Table 4. The primary aim was to identify model combinations exhibiting high individual performance and complementary fusion outcomes. Several criteria were assessed, each focusing on specific aspects of model performance.
The first criterion, “Models Jac > 60”, emphasized models achieving a Jaccard Index greater than 60%. This criterion resulted in an ensemble with a test accuracy of 89.36% and a noteworthy Jaccard Index of 79.27%. It demonstrated the potential of models that excel in similarity measurement.
The subsequent criterion, “Models Jac > 65”, further increased the Jaccard Index threshold, favoring models with even higher similarity measurement capabilities. This ensemble configuration achieved a test accuracy of 89.85% and an exceptional Jaccard Index of 79.87%, further underscoring the models’ effectiveness with robust similarity metrics.
In the “Models Val Acc 80–90%” criterion, the focus shifted to models with validation accuracies ranging from 80% to 90%. This ensemble exhibited a test accuracy of 84.86% and a Jaccard Index of 72.52%, showcasing the importance of considering models with diverse validation accuracy ranges.
The criterion “Models Val Acc >80%” favored models with validation accuracies exceeding 80%. This ensemble configuration achieved a test accuracy of 89.58% and a Jaccard Index of 79.90%, indicating that models with high validation accuracy contributed significantly to the ensemble’s performance.
For the “All Models” criterion, a combination of all available models was explored. While this ensemble achieved a solid test accuracy of 88.66%, its Jaccard Index reached 78.41%, demonstrating that aggregating diverse models could still yield strong performance.
Lastly, the “Models Val Acc > 90%” criterion emphasized models with validation accuracies surpassing 90%. This ensemble configuration achieved the highest test accuracy of 90.07% and a Jaccard Index of 80.52%, underscoring the significant impact of models with exceptional validation accuracy.

4.5. Accident Simulation and Event Recording Results

4.5.1. Accident Simulation

To better analyze the results of this experiment, different scenarios based on speed limits were created, including (i) school zones and residential areas with 30 km/h limits, (ii) suburban roads with 60 km/h limits, and (iii) highways and motorways with 80 km/h limits.
School zones are sensitive areas with reduced speed limits of 30 km/h and increased pedestrian activity, particularly among children. To simulate this scenario, the vehicle speed threshold is set to 30 km/h, with a total of 10 vehicles, 5 witness vehicles, and 2 accident vehicles. Figure 12 illustrates the simulation with AVs 1 and 8 as the accident vehicles and 5 witness vehicles. Suburban roads connect residential neighborhoods to main city roads and typically maintain a speed limit of 60 km/h. To simulate this scenario, the speed threshold is set to 60 km/h, with a total of 10 vehicles, 6 witness vehicles, and 2 accident vehicles. Highways and motorways traverse rural areas, are used for faster travel, and are less likely to have intersections and pedestrian traffic. To simulate such a scenario, the speed threshold is set to 80 km/h, with a total of 10 vehicles, 8 witness vehicles, and 2 accident vehicles. Figure 13 illustrates the simulation results of these different scenarios.
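The three scenarios can be mimicked with a small harness that combines the speed check of Algorithm 1 with the weighted vote; the speed and distance distributions below are illustrative assumptions (all speeds in km/h), not the paper's simulation parameters:

```python
import random

def detect(s_vw, s_vn, threshold, d_w):
    # Algorithm 1: vote True if the relative speed exceeds the threshold;
    # the weight is the inverse of the distance to the accident site.
    return abs(s_vw - s_vn) > threshold, 1.0 / d_w

def run_scenario(speed_limit, n_witnesses, accident_speed, seed=0):
    """Weighted verdict of n witnesses on whether the accident vehicle sped."""
    rng = random.Random(seed)
    tally = {True: 0.0, False: 0.0}
    for _ in range(n_witnesses):
        witness_speed = rng.uniform(0, speed_limit)   # law-abiding witness
        distance = rng.uniform(5, 100)                # meters to the crash site
        vote, weight = detect(witness_speed, accident_speed, speed_limit, distance)
        tally[vote] += weight
    return max(tally, key=tally.get)                  # arg max over weighted votes

# School zone (30 km/h), suburban (60 km/h), and highway (80 km/h) limits,
# with the accident vehicle travelling at twice the limit:
for limit, witnesses in [(30, 5), (60, 6), (80, 8)]:
    print(limit, "km/h ->", run_scenario(limit, witnesses, accident_speed=2 * limit))
```

With the accident vehicle at twice the limit, every witness observes a relative speed above the threshold, so the weighted verdict is True in each scenario.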

4.5.2. Event Recording

The event recording results of all the simulated scenarios are illustrated in Figure 14. As shown in these results, the vote for each vehicle was recorded in the blockchain with a unique transaction hash. The transaction latency and gas cost in Ether were also recorded for each transaction. From observation, the gas cost for initiating the first vote for a vehicle was 4.4714 × 10^{-13}. Subsequent votes for that vehicle cost a reduced 2.7614 × 10^{-13}, while recording the speaker ID cost 4.7078 × 10^{-13}.

4.6. Query Results

The result of a typical query by an insurance officer is illustrated in Figure 15. As illustrated, the query produces the vehicle with the most votes, the number of votes, the query latency, the recorded speaker ID, and its query latency.
Figure 16 compares simulations with different numbers of witnesses at various speed limits and with various numbers of vehicles. Latency exhibits no specific trend; however, it is generally low and adequate for recording and querying AV accident data.

5. Discussion and Potential Challenges

5.1. Utility and Applicability

The proposed blockchain solution integrates seamlessly with the autonomous vehicle ecosystem, promising a framework that is not only secure and transparent but also autonomous in analyzing and establishing the circumstances leading to an accident. It leverages real-time data, ensuring an objective, fact-based assessment.

5.2. Legal and Ethical Considerations

In the continuum of legal discourse, introducing a decentralized, autonomous consensus mechanism poses significant questions. The solution introduces a potential shift in how liability is determined in road traffic accidents, veering towards a community consensus grounded in data rather than relying on individual testimonies or police reports, which can sometimes be subjective. Furthermore, ethical considerations surrounding data privacy and the potential for system manipulation necessitate robust cryptographic measures to safeguard the integrity of the voting process.

5.3. Insurance Implications

From an insurance perspective, the solution stands to streamline claims processing significantly. Insurance companies can potentially access a transparent, immutable record of events, expediting claims assessments and reducing opportunities for fraud. Moreover, it opens avenues for more dynamic insurance policies, where premiums are calculated based on real-time data and safer driving habits are incentivized.

5.4. Technological Advancements and Future Directions

As technology evolves, the potential for integrating more sophisticated sensors in autonomous vehicles could enhance the solution’s efficacy. Future iterations could incorporate a more comprehensive array of data, including weather conditions, road quality, and traffic patterns, fostering a more nuanced understanding of accidents.
Moreover, further research could explore integrating intelligent city infrastructures, creating a cohesive network that leverages data from various sources to foster safer, more efficient urban transport ecosystems.

5.5. Challenges and Limitations

While promising, the solution has its challenges. Ensuring system security against potential hacks and unauthorized manipulation is paramount. Furthermore, the solution requires a high degree of cooperation and standardization among AV manufacturers to facilitate seamless data exchange and consensus formation. One main implementation challenge is the activation of EDRs capable of sensing the speeds of other AVs and estimating whether an AV is driving within the prescribed limit; instituting such technology would make implementing our blockchain solution easier. Another challenge lies in the AVs’ communication with the blockchain network, which future research will focus on achieving. There may also be technical challenges in ensuring the accuracy and reliability of the sensors involved in data collection, and the system would require regular maintenance and updates to ensure optimal functioning.

6. Conclusions

In conclusion, this paper addresses the growing concerns surrounding AVs by proposing a multifaceted solution that enhances security, accountability, and accident adjudication. The key contributions of this work are threefold. Firstly, it introduces a robust mechanism for speaker identification within AVs to bolster security and prevent unauthorized access. Secondly, an ensemble-based approach leveraging speaker verification techniques is presented to combat voice spoofing, ensuring the authenticity of user commands. Finally, in the context of accidents involving AVs, this paper introduces an application of blockchain technology for transparent and tamper-proof event recording, enabling accurate accountability and liability allocation. This holistic approach fortifies AV security and establishes a robust framework for reliable accident investigation, ultimately fostering public trust in AVs.

Author Contributions

Conceptualization, J.N.N. and C.I.N.; Methodology, J.N.N.; Software, J.N.N.; Investigation, J.N.N.; Writing—original draft, J.N.N.; Writing—review & editing, J.N.N. and C.I.N.; Supervision, C.I.N., J.-M.L. and D.-S.K.; Project administration, J.-M.L. and D.-S.K.; Funding acquisition, J.-M.L. and D.-S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Priority Research Centers Program through the National research foundation (NRF) funded by the Ministry of Education, Science, and Technology (MEST) (2018R1A6A1A03024003), and by the Ministry of Science and ICT (MSIT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (IITP-2024-2020-0-01612) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations were used within this manuscript.
ABI: Application Binary Interface
ANN: Artificial Neural Network
AVs: Autonomous Vehicles
DApp: Decentralized Application
DMV: Department of Motor Vehicles
DSRCs: Dedicated Short-Range Communications
DT: Decision Tree
EDRs: Event Data Recorders
GT: Gamma Tone
GTCCs: Gamma Tone Cepstral Coefficients
IDE: Integrated Development Environment
KNN: K-Nearest Neighbor
LDA: Linear Discriminant Analysis
MFCCs: Mel Frequency Cepstral Coefficients
ML: Machine Learning
NB: Naive Bayes
NHTSA: National Highway Traffic Safety Administration
QDA: Quadratic Discriminant Analysis
RF: Random Forest
STE: Short-Term Energy
SVMs: Support Vector Machines
ZCR: Zero-Crossing Rate

References

  1. Njoku, J.N.; Nwakanma, C.I.; Kim, D.S. Evaluation of Spectrograms for Keyword Spotting in Control of Autonomous Vehicles for The Metaverse. In Proceedings of the Conference of the Korean Institute of Communications and Information Sciences (KICS), Seoul, Republic of Korea, 20–25 May 2022; Volume 78, pp. 1777–1778. [Google Scholar]
  2. Njoku, J.N.; Anyanwu, G.O.; Igboanusi, I.S.; Nwakanma, C.I.; Lee, J.M.; Kim, D.S. State-of-the-Art Object Detectors for Vehicle, Pedestrian, and Traffic Sign Detection for Smart Parking Systems. In Proceedings of the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; pp. 1585–1590. [Google Scholar] [CrossRef]
  3. Rajabli, N.; Flammini, F.; Nardone, R.; Vittorini, V. Software Verification and Validation of Safe Autonomous Cars: A Systematic Literature Review. IEEE Access 2021, 9, 4797–4819. [Google Scholar] [CrossRef]
  4. Halsey, A., III; Laris, M. Blind Man Sets Out Alone in Google’s Driverless Car; The Washington Post: Washington, DC, USA, 2016. [Google Scholar]
  5. Pitts, W. 12 Self-Driving Cars Crashed in Arizona in the Last Year; 12News: Phoenix, AZ, USA, 2022. [Google Scholar]
  6. Lee, D. Google Self-Driving Car Hits a Bus; BBC: London, UK, 2016. [Google Scholar]
  7. Bevilacqua, M. Cyclist Killed by Tesla Car with Self-Driving Features; The Bicycling: London, UK, 2017. [Google Scholar]
  8. Chougule, A.; Chamola, V.; Sam, A.; Yu, F.R.; Sikdar, B. A Comprehensive Review on Limitations of Autonomous Driving and its Impact on Accidents and Collisions. IEEE Open J. Veh. Technol. 2023, 1–20. [Google Scholar] [CrossRef]
  9. Penmetsa, P.; Sheinidashtegol, P.; Musaev, A.; Adanu, E.K.; Hudnall, M. Effects of the autonomous vehicle crashes on public perception of the technology. IATSS Res. 2021, 45, 485–492. [Google Scholar] [CrossRef]
  10. Oham, C.; Michelin, R.A.; Jurdak, R.; Kanhere, S.S.; Jha, S. WIDE: A witness-based data priority mechanism for vehicular forensics. Blockchain Res. Appl. 2022, 3, 100050. [Google Scholar] [CrossRef]
11. Nasr, M.A.; Abd-Elnaby, M.; El-Fishawy, A.S.; El-Rabaie, S.; Abd El-Samie, F.E. Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients. Int. J. Speech Technol. 2018, 21, 941–951.
12. Zhao, X.; Wang, D. Analyzing noise robustness of MFCC and GFCC features in speaker identification. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 7204–7208.
13. Ayoub, B.; Jamal, K.; Arsalane, Z. Gammatone frequency cepstral coefficients for speaker identification over VoIP networks. In Proceedings of the 2016 International Conference on Information Technology for Organizations Development (IT4OD), Fez, Morocco, 30 March–1 April 2016; pp. 1–5.
14. Leu, F.Y.; Lin, G.L. An MFCC-Based Speaker Identification System. In Proceedings of the 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, 27–29 March 2017; pp. 1055–1062.
15. Totakura, V.; Vuribindi, B.R.; Reddy, E.M. Improved Safety of Self-Driving Car using Voice Recognition through CNN. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1022, 012079.
16. Guo, H.; Meamari, E.; Shen, C.C. Blockchain-inspired Event Recording System for Autonomous Vehicles. In Proceedings of the 2018 1st IEEE International Conference on Hot Information-Centric Networking (HotICN), Shenzhen, China, 15–17 August 2018; pp. 218–222.
17. Sun, S.; Tang, H.; Du, R. A Novel Blockchain-Based IoT Data Provenance Model. In Proceedings of the 2022 2nd International Conference on Computer Science and Blockchain (CCSB), Wuhan, China, 28–30 October 2022; pp. 46–52.
18. Rewatkar, H.R.; Agarwal, D.; Khandelwal, A.; Upadhyay, S. Decentralized Voting Application Using Blockchain. In Proceedings of the 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, 18–19 June 2021; pp. 735–739.
19. Ahamed, N.N.; Vignesh, R. A Build and Deploy Ethereum Smart Contract for Food Supply Chain Management in Truffle—Ganache Framework. In Proceedings of the 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 17–18 March 2023; Volume 1, pp. 36–40.
20. Sharan, R.V.; Abeyratne, U.R.; Swarnkar, V.R.; Porter, P. Automatic Croup Diagnosis Using Cough Sound Recognition. IEEE Trans. Biomed. Eng. 2019, 66, 485–495.
21. Jahangir, R.; Teh, Y.W.; Nweke, H.F.; Mujtaba, G.; Al-Garadi, M.A.; Ali, I. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Syst. Appl. 2021, 171, 114591.
22. Sardar, V.M.; Shirbahadurkar, S.D. Speaker identification of whispering speech: An investigation on selected timbrel features and KNN distance measures. Int. J. Speech Technol. 2021, 21, 545–553.
23. Guglani, J.; Mishra, A. Automatic speech recognition system with pitch dependent features for Punjabi language on KALDI toolkit. Appl. Acoust. 2020, 167, 107386.
24. Marques, I.; Sousa, J.; Sá, B.; Costa, D.; Sousa, P.; Pereira, S.; Santos, A.; Lima, C.; Hammerschmidt, N.; Pinto, S.; et al. Microphone Array for Speaker Localization and Identification in Shared Autonomous Vehicles. Electronics 2022, 11, 766.
25. Njoku, J.N.; Nwakanma, C.I.; Lee, J.-M.; Kim, D.S. Multi-Feature Concatenation for Speech Dependent Automatic Speaker Identification in Maritime Autonomous Vehicles. In Proceedings of the 2nd International Conference on Maritime IT Convergence (ICMIC 2023), Jeju Island, Republic of Korea, 23–25 August 2023; Volume 2, pp. 103–106.
26. Pfalzgraf, A.M.; Sullivan, C.; Sánchez, D.S. Autonomous Vehicle Speaker Verification System; Bradley University: Peoria, IL, USA, 2014.
27. Sardar, V.M.; Shirbahadurkar, S.D. Timbre features for speaker identification of whispering speech: Selection of optimal audio descriptors. Int. J. Comput. Appl. 2019, 43, 1047–1053.
28. Soleymanpour, M.; Marvi, H. Text-independent speaker identification based on selection of the most similar feature vectors. Int. J. Speech Technol. 2017, 20, 99–108.
29. Valero, X.; Alias, F. Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification. IEEE Trans. Multimed. 2012, 14, 1684–1689.
30. Ren, Y.; Peng, H.; Li, L.; Xue, X.; Lan, Y.; Yang, Y. Generalized Voice Spoofing Detection via Integral Knowledge Amalgamation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 2461–2475.
31. Khan, A.; Malik, K.M.; Ryan, J.; Saravanan, M. Battling voice spoofing: A review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures. Artif. Intell. Rev. 2023, 56, 513–566.
32. Ding, Y.Y.; Zhang, J.X.; Liu, L.J.; Jiang, Y.; Hu, Y.; Ling, Z.H. Adversarial Post-Processing of Voice Conversion against Spoofing Detection. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, 7–10 December 2020; pp. 556–560.
33. Hemavathi, R.; Kumaraswamy, R. Voice conversion spoofing detection by exploring artifacts estimates. Multimed. Tools Appl. 2021, 80, 23561–23580.
34. Muttathu Sivasankara Pillai, A.S.; De Leon, P.L.; Roedig, U. Detection of Voice Conversion Spoofing Attacks Using Voiced Speech. In Secure IT Systems; Reiser, H.P., Kyas, M., Eds.; Springer: Cham, Switzerland, 2022; pp. 159–175.
35. Javed, A.; Malik, K.M.; Malik, H.; Irtaza, A. Voice spoofing detector: A unified anti-spoofing framework. Expert Syst. Appl. 2022, 198, 116770.
36. Zhang, Y.; Jiang, F.; Duan, Z. One-Class Learning Towards Synthetic Voice Spoofing Detection. IEEE Signal Process. Lett. 2021, 28, 937–941.
37. Chettri, B.; Stoller, D.; Morfi, V.; Ramírez, M.A.M.; Benetos, E.; Sturm, B.L. Ensemble Models for Spoofing Detection in Automatic Speaker Verification. arXiv 2019, arXiv:1904.04589.
38. Zhou, J.; Hai, T.; Jawawi, D.N.A.; Wang, D.; Ibeke, E.; Biamba, C. Voice spoofing countermeasure for voice replay attacks using deep learning. J. Cloud Comput. 2022, 11, 51.
39. Monteiro, J.; Alam, J.; Falk, T.H. An Ensemble Based Approach for Generalized Detection of Spoofing Attacks to Automatic Speaker Recognizers. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6599–6603.
40. Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140.
41. Sana, F.; Azad, N.L.; Raahemifar, K. Autonomous Vehicle Decision-Making and Control in Complex and Unconventional Scenarios—A Review. Machines 2023, 11, 676.
42. Wäschle, M.; Thaler, F.; Berres, A.; Pölzlbauer, F.; Albers, A. A review on AI Safety in highly automated driving. Front. Artif. Intell. 2022, 5, 952773.
43. Kropka, C. "Cruise"ing for "Waymo" Lawsuits: Liability in Autonomous Vehicle Crashes; Richmond: Richmond, VA, USA, 2016.
44. De Brito Gonçalves, J.P.; Spelta, G.; da Silva Villaça, R.; Gomes, R.L. IoT Data Storage on a Blockchain Using Smart Contracts and IPFS. In Proceedings of the 2022 IEEE International Conference on Blockchain (Blockchain), Espoo, Finland, 22–25 August 2022; pp. 508–511.
45. Rajendar, S.; Thangavel, U.; Devendran, S.; Selvi, V.; Muthumanickam, S.S. Blockchain for Securing Autonomous Vehicles. In Proceedings of the 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2–4 March 2023; pp. 713–717.
46. Aishwarya, R.; Vivek Anand, M. Blockchain Framework For Securing Autonomous Vehicles. In Proceedings of the 2023 2nd International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 16–17 June 2023; pp. 1–5.
47. Narbayeva, S.; Bakibayev, T.; Abeshev, K.; Makarova, I.; Shubenkova, K.; Pashkevich, A. Blockchain Technology on the Way of Autonomous Vehicles Development. Transp. Res. Procedia 2020, 44, 168–175.
48. Tyagi, R.; Sharma, S.; Mohan, S. Blockchain Enabled Intelligent Digital Forensics System for Autonomous Connected Vehicles. In Proceedings of the 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT), Chennai, India, 10–11 March 2022; pp. 1–6.
49. Gandhi, G.M.; Salvi. Artificial Intelligence Integrated Blockchain for Training Autonomous Cars. In Proceedings of the 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Chennai, India, 14–15 March 2019; Volume 1, pp. 157–161.
50. Guo, H.; Li, W.; Nejad, M.; Shen, C.C. Proof-of-Event Recording System for Autonomous Vehicles: A Blockchain-Based Solution. IEEE Access 2020, 8, 182776–182786.
51. Kara, M.; Merzeh, H.R.; Aydın, M.A.; Balık, H.H. VoIPChain: A decentralized identity authentication in Voice over IP using Blockchain. Comput. Commun. 2023, 198, 247–261.
52. IEEE Std 802.11-2020; IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems—Local and Metropolitan Area Networks–Specific Requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE: Piscataway, NJ, USA, 2023; pp. 1–4379.
53. Ardila, R.; Branson, M.; Davis, K.; Henretty, M.; Kohler, M.; Meyer, J.; Morais, R.; Saunders, L.; Tyers, F.M.; Weber, G. Common Voice: A Massively-Multilingual Speech Corpus. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Online, 11–16 May 2020; pp. 4211–4215.
54. Alnavar, K.; Babu, C. Blockchain-based Smart Contract with Machine Learning for Insurance Claim Verification. In Proceedings of the 2021 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), Mysuru, India, 10–11 December 2021; pp. 247–252.
55. Krichen, M.; Lahami, M.; Al-Haija, Q.A. Formal Methods for the Verification of Smart Contracts: A Review. In Proceedings of the 2022 15th International Conference on Security of Information and Networks (SIN), Sousse, Tunisia, 11–13 November 2022; pp. 1–8.
56. Abdellatif, T.; Brousmiche, K.L. Formal Verification of Smart Contracts Based on Users and Blockchain Behaviors Models. In Proceedings of the 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 26–28 February 2018; pp. 1–5.
Figure 1. Levels of vehicle automation.
Figure 2. A speaker identification system used to reduce the number of authorized users of an autonomous vehicle.
Figure 3. How voice spoofing affects autonomous vehicles.
Figure 4. Process flow for the development of the proposed system.
Figure 5. Cellular vehicular network.
Figure 6. Speaker identification system model.
Figure 7. Hash digest of speaker ID and accident votes data.
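The digest pictured in Figure 7 can be reproduced with any standard cryptographic hash. Below is a minimal sketch, assuming SHA-256 and a hypothetical record layout (a speaker ID plus a map of accident votes); it is illustrative only, not the paper's exact on-chain schema.

```python
import hashlib
import json

def record_digest(speaker_id: str, votes: dict) -> str:
    """Hash a speaker-ID/accident-vote record into a fixed-length hex digest.

    The field names "speaker_id" and "votes" are assumed for illustration.
    """
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    payload = json.dumps({"speaker_id": speaker_id, "votes": votes},
                         sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

digest = record_digest("speaker-07", {"AV-1": "accident", "AV-2": "accident"})
print(digest)  # 64 hex characters; any change to the record changes the digest
```

Because the digest changes whenever any field of the record changes, storing it on-chain makes later tampering with the off-chain record detectable.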
Figure 8. When an accident happens, accident and witness vehicles vote on the accident event.
Figure 9. An illustration of the smart contract employed.
Figure 10. Sequence diagram for accident and witness vehicles.
Figure 11. Illustration of the platforms used for blockchain simulation. (a) Remix IDE. (b) Ganache.
Figure 12. Simulation showing accident, witness, and normal AVs.
Figure 13. Simulation results for (a) 30 km/h, (b) 60 km/h, and (c) 80 km/h.
Figure 14. Event recording results for (a) 30 km/h, (b) 60 km/h, and (c) 80 km/h.
Figure 15. Query results for (a) 30 km/h, (b) 60 km/h, and (c) 80 km/h.
Figure 16. Comparison of latency results.
Table 1. Statistics of dataset.

Dataset/Speaker Index    1     2     3     4     5     6     7     8     9     10    Total
Validation data          3094  1424  4339  2128  2661  2904  2290  968   1857  1456  23,121
Test data                756   441   1029  525   732   673   461   226   481   399   5723
Total dataset            3850  1835  5368  2653  3393  3577  2751  1194  2338  1855  28,844
Table 2. F1 score, Jaccard index, and computation time results of all feature fusion configurations.

Features                                        F1 Score (%)          Jaccard Index         Computation Time (s)
                                                Validation  Testing   Validation  Testing   Validation  Testing
GTCC [12,13]                                    96.99       69.29     0.94        0.56      0.15        0.251
MFCC [14]                                       97.66       81.50     0.95        0.71      1.56        0.26
GTCC + Pitch + Short-term energy + ZCR          96.82       71.23     0.94        0.58      0.18        0.26
MFCC + Pitch + Short-term energy + ZCR [11]     97.73       82.23     0.96        0.73      0.44        0.29
MFCC + GTCC + Pitch + Short-term energy + ZCR   98.76       84.54     0.98        0.76      0.29        0.50
Table 3. F1 score, Jaccard index, and computation time results of all models.

Models    Validation Accuracy (%)   Test Accuracy (%)   Test F1 Score (%)   Jaccard Index
KNN       98.82                     85.38               85.38               72.62
SVM       93.91                     87.16               87.16               75.04
RF        92.10                     83.71               83.71               69.83
NB        76.57                     74.07               74.07               58.37
DT        84.36                     69.12               69.12               50.95
Ada       77.44                     72.81               72.81               56.35
AdaDT     88.50                     81.60               81.60               66.91
ImpKNN    99.17                     84.22               84.22               70.85
LDA       84.65                     82.62               82.62               69.65
QDA       92.60                     86.55               86.55               74.97
Table 4. Test accuracy and Jaccard index of all ensemble model configurations.

Ensemble Criteria        Test Accuracy (%)   Jaccard Index
Models Jac > 60          89.36               79.27
Models Jac > 65          89.85               79.87
Models Val Acc 80–90%    84.86               72.52
Models Val Acc > 80%     89.58               79.90
All Models               88.66               78.41
Models Val Acc > 90%     90.07               80.52
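The criteria in Table 4 suggest that base models were pooled by a validation-accuracy or Jaccard threshold and then combined. A hedged sketch of one such selection-then-majority-vote ensemble follows; the model names echo Table 3, but the thresholding and combination rule shown here are an illustration, not the paper's exact procedure.

```python
from collections import Counter

def build_ensemble(models, val_acc, threshold):
    """Keep only models whose validation accuracy exceeds the threshold."""
    return [m for m in models if val_acc[m] > threshold]

def majority_vote(predictions):
    """predictions: dict mapping model name -> predicted label for one sample."""
    return Counter(predictions.values()).most_common(1)[0][0]

# Validation accuracies taken from Table 3 (subset of models, for brevity).
val_acc = {"KNN": 98.82, "SVM": 93.91, "RF": 92.10, "NB": 76.57, "QDA": 92.60}
ensemble = build_ensemble(val_acc.keys(), val_acc, 90.0)  # the "Val Acc > 90%" criterion
print(ensemble)  # ['KNN', 'SVM', 'RF', 'QDA']

# Hypothetical per-model predictions for one test utterance.
sample_preds = {"KNN": "speaker3", "SVM": "speaker3", "RF": "speaker1", "QDA": "speaker3"}
print(majority_vote({m: sample_preds[m] for m in ensemble}))  # speaker3
```

Restricting the vote to the strongest validators is consistent with Table 4's best row ("Models Val Acc > 90%") outperforming the all-models ensemble.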
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
