An Enhanced Model for Detecting and Classifying Emergency Vehicles Using a Generative Adversarial Network (GAN)

Abstract: The rise of autonomous vehicles increasingly affects road networks and driving conditions. Cameras and sensors allow these vehicles to gather the characteristics of their surrounding traffic. One crucial factor in this environment is the appearance of emergency vehicles, which require special rules and priorities. Machine learning and deep learning techniques are used to develop intelligent models for detecting emergency vehicles from images. Vehicles use such models to analyze regularly captured photos of the road environment, which requires swift action for safety on road networks. In this work, we developed a Generative Adversarial Network (GAN) model that generates new emergency vehicle images. The goal is to provide a comprehensive expanded dataset that assists emergency vehicle detection and classification. Then, using Convolutional Neural Networks (CNNs), we constructed a vehicle detection model demonstrating satisfactory performance in identifying emergency vehicles. The detection model yielded an accuracy of 90.9% using the newly generated dataset. To ensure the reliability of the dataset, we employed 10-fold cross-validation, achieving accuracy exceeding 87%. Our work highlights the significance of accurate datasets in developing intelligent models for emergency vehicle detection. Finally, we validated the accuracy of our model using an external dataset. We compared our proposed model's performance against four other online models, all evaluated using the same external dataset. Our proposed model achieved an accuracy of 85% on the external dataset.


Introduction
Smart cities are being widely developed and have recently been studied from several perspectives. Technological development helps to build and monitor these cities. They aim to improve quality of life by improving services such as education, healthcare, and transportation, and these services have been linked to technological innovation [1][2][3]. Traffic crises are among the most critical challenges in traditional cities, especially crowded ones. Modern technologies provide solutions, particularly for enhancing safety conditions on road networks. Emergency vehicle management is one of the most critical problems requiring sensitive, real-time solutions. Special driving rules apply to vehicles around emergency vehicles, such as clearing the way for them and giving them the highest priority at signalized road intersections.
Detecting emergency vehicles on the road network is the first step in reacting according to their location, speed, and other parameters. Traditionally, drivers see these emergency vehicles and respond accordingly. In more crowded and faster road scenarios, emergency vehicles sound sirens to alert the surrounding drivers. Some drivers panic in reaction to these sudden sirens. Sometimes they fail to determine the emergency vehicle's exact location, speed, or other characteristics. Thus, they fail to respond promptly and adequately, threatening the safety of the road network. Given the advanced equipment and technologies in modern vehicles, it is necessary to develop an intelligent emergency vehicle detection system. Models that use artificial intelligence can improve the efficiency of logistics and transportation services, reduce response time, choose the healthiest and fastest routes, and alert surrounding vehicles to clear the way [4][5][6][7].
Artificial intelligence (AI) offers many varied solutions in this field. It has excellent potential in Internet applications to monitor and manage traffic and to predict future events through machine learning (ML). Many studies have addressed this topic by seeking ways to detect and classify vehicles on the roads. Some specifically studied detecting emergency vehicles in photos and videos to inform and alert the concerned authorities [8,9]. These previous studies have reported problems concerning the method used, the dataset tested, or the accuracy of the resulting model.
Several studies in the literature have introduced intelligent emergency vehicle detection models to classify vehicles on road networks. Their main weakness is the suitability of the datasets used. An unbalanced dataset that rarely includes emergency vehicles is not the best choice for training machine learning models to predict the presence of emergency vehicles [10]. We have noticed that some datasets are unrealistic in several respects: their photos or videos are not taken from road scenarios (e.g., toys or vehicles at a car agency) [11][12][13]. In addition, accuracy is not always the most appropriate criterion for accepting the results [14]. Sheng et al. [15] proposed a learning-oriented strategy addressing video temporal coherence, motivated by the inadequacy of recently devised techniques (specifically, filters designed for improving, restoring, editing, and analyzing static images) when applied to video clips. The inherent distinction is that a video is more than a mere sequence of individual images. This disparity poses a challenge for effectively detecting emergency vehicles in transit on roads.
The main obstacle in applying the proposed model is the dataset on which the model is trained. Several conditions must be met by the dataset to obtain precise and accurate detection of emergency vehicles. Among these conditions are the size of the dataset, the quality of the images, and the balance between classes [16]. Consequently, this work aims to find a model that achieves these goals with the required accuracy standards. We also plan to produce a suitable new dataset that achieves acceptable accuracy using Generative Adversarial Networks (GANs), owing to their proven ability to enhance dataset quality and address specific challenges in vehicle detection. GANs have been successfully used to enhance datasets in many fields [17][18][19][20]. They have also been used in image translation models to improve nighttime images and thereby increase the accuracy of vehicle detection under poor lighting conditions [21,22].
In addition, some image restoration models using GANs have proven effective in mitigating the negative effects of occlusion on vehicle detection [23]. This confirms the benefit of GANs for creating artificial images that enhance the dataset and simulate reality in various difficult circumstances while maintaining dataset quality, which ultimately leads to more accurate detection of emergency vehicles. Finally, we aim to test and verify the generated dataset. Figure 1 illustrates the general steps of the proposed work, encompassing all the stages from obtaining the initial dataset and refining it using GANs to the evaluation and validation of the newly generated dataset.
The remainder of this paper is organized as follows: Section 2 reviews previous work in this field of research. Then, Section 3 investigates the main characteristics of existing datasets that have been used to detect emergency vehicles on road networks and clarifies the main weaknesses and problems of each dataset. Section 4 presents the steps of gathering, preparing, and augmenting the images for the dataset, and shows the steps of generating the new images using GANs. The details of the testing, verification, and comparison processes are explained in Section 6. Finally, Section 7 concludes the paper and recommends some future studies.

Related work
The significant increase in the number of vehicles and the traffic crises that may disrupt the mobility of emergency vehicles have encouraged several researchers to work in this field. Using new technology, such as the Internet of Things (IoT), to propose new tools or models should help reduce the effects of these problems and assist emergency vehicles on the road network. This section first reviews previous studies of vehicle and object detection methods on the road network. Then, we review the use of GANs to improve or augment existing datasets and their role in improving vehicle detection.

Vehicle and Object Detection Methods
In our survey of object detection studies, we identified numerous works, particularly those focusing on feature extraction from images. For instance, Jiang et al. [24] sought to combine empirical aesthetic rules with conventional machine learning algorithms and deep learning techniques for feature extraction. This integration was effective for object detection; however, the long time required to extract features from images makes it unsuitable for real-time emergency vehicle detection. Several other studies have leveraged advanced artificial intelligence and machine learning technologies to detect and categorize vehicles within the road network. Shuvendu Roy and Sakif Rahman [9] proposed an automated system to detect emergency vehicles from closed-circuit television (CCTV) cameras using a neural network. They used a pre-trained object detection model covering 80 categories (e.g., bike, car, and motorcycle). The study aimed to classify the detected vehicles into emergency and non-emergency vehicles. A Convolutional Neural Network (CNN) was then used for emergency vehicle classification based on the detected images. They augmented the dataset by changing image properties, such as mirroring, zooming in, zooming out, and rotating [14]. After that, they used the YOLOv3 algorithm to detect emergency vehicles [25]. In the reported results, the vehicle classification accuracy exceeds 97%, which is better than the traditional system previously in use. However, the study did not report the size of the dataset before or after augmentation. Moreover, the confusion matrix was not examined to show whether the data are balanced and whether accuracy is an adequate metric or needs to be supplemented.
On the other hand, two recent works were proposed by D. Ganesh [26] and G. Punyavathi et al. [27]. The first study [26] presented an automated system to detect emergency vehicles using machine learning and CNNs. The author standardized the dataset with several algorithms and adjusted the images. The study developed a new neural network model based on the pre-trained "ResNet152 CNN Model". The author explained how the model works and the mechanism required to build it, but did not present an actual application of the model or report any results or accuracy measures for the detected objects. The latter study, by Punyavathi et al. [27], investigated various vehicle detection methods, including traditional statistical methods, a novel algorithm for vehicle classification, and deep learning using the YOLOv3 algorithm. The study demonstrated the effectiveness of YOLOv3 for vehicle detection and tracking, achieving an average precision of 95.8% and a recall of 97.5% on a dataset of vehicles captured under varying lighting and weather conditions. The authors collected the dataset using cameras mounted on vehicles and at fixed locations. While the study did not explicitly state whether the dataset was balanced, the confusion matrix presented in their research suggests that it is unbalanced. Therefore, as previously noted, relying on accuracy and recall as measures may not be appropriate.
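To illustrate why accuracy alone can mislead on unbalanced data, the following minimal sketch computes several metrics from a binary confusion matrix. The counts are hypothetical (they are not taken from [27] or from any other study cited here), and the helper name `confusion_metrics` is introduced only for illustration.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    The positive class is "emergency vehicle".
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # share of emergency vehicles found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # share of other vehicles kept negative
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical unbalanced test set: 50 emergency and 950 non-emergency images.
# The classifier finds only 10 of the 50 emergency vehicles.
metrics = confusion_metrics(tp=10, fp=0, fn=40, tn=950)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, accuracy is 0.96 even though only 20% of the emergency vehicles are detected; the balanced accuracy of 0.60 exposes the gap.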
Moreover, Pillai and Valles [28] designed a new model to classify and detect vehicles by type and color using CNN and YOLO algorithms. The researchers first designed two CNN models to classify vehicle types and colors. Then, they designed a faster model to detect vehicles using YOLO. In this study, the images were improved by removing fog, enhancing contrast, and removing noise using various tools, including DCP [29] and CLAHE [30]. This process improved the accuracy of vehicle detection and classification. The object was extracted in several ways, such as a colored graph, HOG, and a pre-trained CNN, using the coordinates of the four corners of the image perimeter and the proportion of the vehicle's presence relative to the square. The other proposed methodology uses YOLO9000 to classify the vehicle by detecting both its type and its color. Despite the precise sequence of the work method, this study presented only a proposal for the detection method without testing it or discussing experimental results. Table 1 summarizes some previous studies that mainly aimed to detect emergency vehicles on the road network. The main objectives, algorithms, datasets, obtained results, and drawbacks of each study are given in the table.

Table 1 (only the following entries were recovered).
Study not recovered. Dataset: the source of the dataset is not mentioned. Results: accuracy > 97%. Drawbacks: according to the counts in the confusion matrix, the data were not balanced, so accuracy and recall could not be adopted as real accuracy indicators.
Pillai and Valles [28]. Objective: classification and detection of the type and color of vehicles. Algorithm: new model for classification and detection of the type and color of vehicles using two algorithms. Drawbacks: the authors presented only a proposal without offering any experimental results.

Using Generative Adversarial Networks (GANs) to Generate Augmented Datasets
While numerous studies have proposed creating improved datasets to enhance results in this field, few have focused specifically on emergency vehicles. Notably, Agrawal and Choudhary [17] recently used Generative Adversarial Networks (GANs) in the medical domain, specifically to generate chest images. This study highlights the importance of GANs in overcoming challenges related to limited or rare images.
Tanaka and Aranha [18] used GANs to generate new artificial datasets. They then used the generated datasets to train machine learning algorithms, which were evaluated to demonstrate the usefulness of GANs for generating new datasets. The training did not use all the synthetic datasets, and a decision tree classifier was used to examine the quality of the training datasets. The GAN-generated datasets performed better than the original dataset, obtaining better accuracy and recall. The results confirm the value of oversampling data using GANs, specifically for images.
V. Kukreja et al. [19] proposed a new GAN-based model to identify vehicle plates with high accuracy and speed. After collecting and resizing the images, they discarded many images smaller than the size required by the experiments, and therefore needed GANs to oversample the image dataset. In the proposed model, the camera captures images of the vehicles and sends them to the GAN, which generates new artificial images accordingly. Then, a CNN uses deep learning to detect the plates in the images. The generator produces new images similar to authentic ones, and the discriminator (classifier) tries to differentiate between fake and real images. The dataset was taken from Pascal and extended with other datasets and camera-captured images, and StarGAN was used for the implementation. The classifier is trained on the dataset, and a noisy sample subset of size m is created. The proposed method achieves an accuracy of up to 99% in recognizing car plates compared to other studies that did not use GANs to enlarge the dataset. Furthermore, Y. Gao et al. [20] showed that data augmentation using GANs is suitable for generating new images of unmanned aerial vehicles (UAVs). Infrared data collection is the primary method for all-weather UAV detection through CNNs. To obtain better detection with higher accuracy, the researchers enlarged the infrared UAV dataset using a generative adversarial network. The work was divided into two parts. First, the GAN was trained on the original set of images to generate new, realistic images. Second, after testing, the generated images were combined with the authentic images to train a CNN to detect UAVs. After that, the GAN dataset was tested, and UAV detection was examined on the trained dataset. The results obtained from the artificially generated dataset are better than those obtained from the original dataset in terms of accuracy, recall, and F-score.
Some image restoration models using GANs have also proven effective in alleviating the negative effects of occlusion on vehicle detection. Xu et al. [23] proposed a framework for accurately determining the spatial and temporal distribution of vehicle wheel loads in scenarios involving occlusion. They used object detection models and key points to identify the load of the vehicle and the wheels, respectively. A binary image classification model was used for the occluded vehicles, followed by a novel image mapping approach using GANs. This method aims to convert images of occluded vehicles into unoccluded images, which facilitates the accurate determination of wheel-bearing locations, and it has been validated through field tests.
Kandasamy and D. Rajamanickam [21] combined YOLO-based vehicle detection with GANs to improve night images and increase the accuracy of detecting vehicles in the dark. Night images are fed to the GAN, which translates them into daytime images using CycleGAN, improving image and vehicle features under poor lighting and bad weather. Two GAN models are used in this work, each with one generator and one discriminator. The first generator translates nighttime images into daytime images, and the first discriminator compares the generated (fake) daytime images against real daytime images. The second generator takes real daytime images and generates nighttime images, and the second discriminator checks whether its output is a real or fake image. CycleGAN is the first proposed model, used for improving the images. A second model then detects vehicles in the tested images using a neural network and YOLOv5.
Finally, a similar successful study used GANs specifically to limit the loss of accuracy of vehicle detection models in low-light conditions at night. Wu and Yixun [22] proposed a model to improve detection accuracy using image translation with CycleGAN. They relied on the BDD and UA-DETRAC training datasets and on custom nighttime vehicle images. They converted well-established daytime vehicle datasets into nighttime equivalents to enhance the training sets for detection models based on YOLOv5. This approach significantly improved detection accuracy, with a 10.4% increase in the PR curve area and a 9% rise in peak F1 score. The results highlight the potential of image translation to enhance detection accuracy at night, albeit with some practical limitations and computational requirements.
As we can see, all the previous research shows that using GANs to pre-process data enhances or augments the images, or balances the datasets.

Available Traffic Datasets Containing Emergency Vehicles
This section analyzes various datasets containing images of emergency vehicles, specifically ambulances, fire trucks, and police vehicles. We assess the datasets based on the type, quantity, and quality of images, emphasizing their realism and whether they were captured in real-world scenarios. Furthermore, we investigate the image augmentation techniques used in these datasets, how researchers used them, and for what purposes. We aim to identify the limitations and shortcomings of the available datasets. Moreover, we validate our findings by testing previously proposed models in this field on real-world images to determine their effectiveness.

Emergency Vehicle Detection
The "emergency vehicle detection" dataset [13] from Roboflow [31] contains 365 training images and 158 testing and validation images with a medium quality of 640 × 640 pixels. However, all the images in the training set are of ambulance vehicles, and there are no other emergency vehicles, such as police vehicles or fire trucks. It is an unbalanced dataset; only a few images contain ambulance vehicles. Additionally, no image augmentation techniques were applied. Dissanayake et al. [32] used this dataset with the YOLOv3 detection algorithm. The dataset was divided into 80% training and 20% testing and validation, aiming to detect emergency vehicles upon their arrival at a traffic light and give them a higher priority to pass through the signalized intersection. This study obtained an accuracy of 82%. When we tested the online model available on Roboflow [13] with several realistic images of emergency vehicles, its detection performance was poor. This suggests that the model is not trained to detect the many types of emergency vehicles outside its limited dataset.

"JanataHack_AV_ComputerVision" and "Emergency vs. Non-Emergency Vehicle Classification"
The second dataset is available in multiple locations on Kaggle, as "JanataHack_AV_ComputerVision" [11] and as "Emergency vs. Non-Emergency Vehicle Classification" [33]. It contains approximately 3300 images, including 1000 emergency vehicles such as ambulances, fire trucks, and police vehicles. It also contains around 1300 images of other vehicles, making it a nearly balanced dataset. However, the images are of poor quality, with a resolution of 224 × 224 pixels. Kherraki and Ouazzani [34] used this dataset for emergency vehicle classification, achieving over 90% accuracy. Still, the primary issues remain the quality of the images and the limited use of data augmentation.

Ambulance Regression
The "Ambulance Regression" dataset [35] contains 307 images of ambulances in the YOLOv8 format, with 294 training images and 13 testing images. This dataset applies only standard augmentation techniques such as rotation, cropping, and brightness adjustment. However, the dataset lacks real road images, and there are very few test images.

Ambulans
The "Ambulans" dataset [36] contains 2134 images of ambulances in YOLOv8 format and uses only standard augmentation techniques, including rotation, cropping, brightness adjustment, exposure adjustment, and Gaussian blur. However, some images are of low quality due to augmentation. In addition, many of the images are not from real roads but from exhibitions or the Internet.

Ambulance_detect
The "ambulance_detect" dataset [37] contains 1400 images of ambulances in the YOLOv8 format without any augmentation techniques applied. The images are divided into 980 training images, 140 test images, and 280 validation images. The main challenge is the lack of real road images: some images are from car exhibitions, while others were taken in real-world scenarios. In addition, the number of test images is relatively small compared to the number of training images, which may affect the model's ability to generalize.

Emergency Vehicle Detection
The "Emergency Vehicle Detection" dataset [38] contains 1680 vehicle images in the YOLOv8 format and relies only on standard augmentation techniques such as horizontal flip, random cropping, and salt noise. The dataset is divided into 1470 training images, 71 test images, and 139 validation images. The challenge here is that the images do not focus solely on emergency vehicles for training purposes. Most of the images were captured using video cameras placed in specific locations, which may limit the ability of the dataset to train a model that detects emergency vehicles in different locations. In addition, some of the captured images may not show the detailed features of emergency vehicles because the cameras were far away, which makes it harder for the model to classify and detect emergency vehicles accurately.

FALCK
The "FALCK" dataset [12] contains 176 images of ambulances and firefighting vehicles in the YOLOv8 format, with no augmentation techniques applied. The dataset has 140 training images, one testing image, and 35 validation images. The images are of good quality, but there is a lack of real road images, and the number of test images is too small relative to the number of training images.

Sirens
Similarly, the "Sirens" dataset [39] comprises 213 medium-quality images of ambulances, firefighting vehicles, and police vehicles in the YOLOv8 format, including 145 training images, 22 testing images, and 44 validation images. However, this dataset is very small, which limits its ability to support high-accuracy and realistic detection models. There is also a lack of real-world road images in this dataset.

Smart Car
The "Smart car" dataset [40] was designed for detecting emergency vehicles, including ambulances, firefighting vehicles, and police vehicles. It includes 1152 medium-quality images pre-processed with auto-orientation and resizing to 640 × 640 (stretch), with no augmentation techniques. The dataset is split into 921 training and 231 testing images. However, most of the images were not taken on the road, and some are unrealistic. Furthermore, the dataset contains no images of vehicles from other classes.
The datasets we reviewed exhibit various limitations and inadequacies.

1. Many available datasets lack realism. They comprise images not obtained from real-world scenarios. This may affect the models' ability to generalize to practical situations.
2. Available datasets often suffer from class imbalance. Some types of emergency vehicles have a disproportionate number of images compared to others. Consequently, the models' performance may be biased toward certain classes and suboptimal for others.
3. Most datasets have limited test data, making it challenging to assess the models' performance accurately. Poor test data quality also makes it difficult to develop models that work effectively in real-world scenarios.
4. Some datasets have few images and lack comprehensive data augmentation techniques, hindering the model's generalization to different scenarios. Excessive augmentation may also reduce image quality.
5. The limited usage of these datasets in published research suggests that they are not widely recognized or effective for ambulance detection.
Table 3 summarizes the main findings from the datasets examined. Having defined the main limitations of the available datasets, we explored the potential of Generative Adversarial Networks (GANs) for augmentation to address them, as discussed in the next section.

Generating a Newly Balanced Dataset
In this section, we gather images from available datasets and augment them with realistic images captured on public streets. We then use Generative Adversarial Networks (GANs) to generate new images, which are integrated into the existing dataset to enhance its diversity and balance. This addresses the limitations of previous datasets and improves the model's ability to detect emergency vehicles accurately. Providing a large and balanced dataset that includes realistic images of all categories of emergency vehicles (ambulances, fire engines, and police vehicles) as well as other vehicles supports training an effective and accurate model for emergency vehicle detection.

Gathering a Dataset for Emergency Vehicle Detection
Building a new dataset for emergency vehicle detection requires going through the following steps:

Initial Data Collection
In the first step, we gathered an initial dataset of emergency and non-emergency vehicle images by identifying relevant sources and collecting representative images. We obtained about 500 images of emergency vehicles from previously available datasets [35,36,39,40] and approximately 500 images of non-emergency vehicles from the available dataset [41]. This enabled us to curate a diverse and realistic set of images for training our model.
Moreover, we conducted a more extensive search for datasets containing at least one class of emergency vehicles. We came across the Firetruck Dataset [42] and the Police Cars Dataset [43]. These two datasets allowed us to expand our dataset by around 700 additional images. Thus, the total number of images became approximately 1700.

Image Collection through Video Recording and Frame Selection
Here, we used our own cameras to capture additional images. We conducted multiple tours across Jordan, specifically in Irbid, and recorded videos of emergency and non-emergency vehicles on public streets. These videos were converted into individual frames, and we selected appropriate frames that captured the vehicles at various stages. This added approximately 2900 images of emergency and non-emergency vehicles to our dataset, increasing the total number of images to more than 4600.
To further expand our dataset, we searched for publicly available videos on YouTube featuring emergency vehicles, such as the ones in [44]. We downloaded these videos and extracted individual frames, carefully selecting the ones that captured emergency vehicles. We obtained 6000 images of both emergency and non-emergency vehicles from this step: 3200 images of emergency vehicles, and the remainder of non-emergency vehicles. Figure 2 displays a sample of the images obtained through this step.

Dataset Pre-processing
To prepare the dataset for analysis, several pre-processing steps are required. The first step resizes the images to a uniform size of 416 × 416 pixels. After that, the pixel values of the images are scaled to the range between 0 and 1. These steps standardize the data. Fake images are removed from the dataset, as they can adversely affect the results. Duplicates are also removed to avoid redundancy and ensure that each data point is unique. Together, these steps clean and standardize the dataset for further analysis.
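The resizing, scaling, and duplicate-removal steps above can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' pipeline: the nearest-neighbour resize is a simple stand-in for a library resize, duplicates are detected only when they are pixel-identical after resizing, and the function names `resize_nearest` and `preprocess` are hypothetical. Removing fake images requires manual or model-based review and is not shown.

```python
import hashlib

import numpy as np

TARGET_SIZE = (416, 416)  # uniform image size stated in the text


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of an H x W x C array (illustrative stand-in)."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]  # source row for each output row
    cols = np.arange(size[1]) * width // size[1]   # source column for each output column
    return image[rows][:, cols]


def preprocess(images):
    """Resize, drop exact duplicates, and scale pixel values to [0, 1]."""
    seen = set()
    cleaned = []
    for image in images:
        resized = resize_nearest(image)
        digest = hashlib.sha256(resized.tobytes()).hexdigest()
        if digest in seen:  # pixel-identical to an earlier image
            continue
        seen.add(digest)
        cleaned.append(resized.astype(np.float32) / 255.0)
    return np.stack(cleaned)  # assumes at least one image remains


# Usage: two identical black images and one white image (synthetic placeholders).
black = np.zeros((100, 200, 3), dtype=np.uint8)
white = np.full((100, 200, 3), 255, dtype=np.uint8)
batch = preprocess([black, black.copy(), white])
print(batch.shape)  # (2, 416, 416, 3)
```

Hashing the resized array keeps duplicate detection cheap; near-duplicates, such as consecutive video frames, would need a perceptual similarity measure instead.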

Dataset Augmentation
Dataset augmentation aims to increase the dataset size using a data generator from TensorFlow [45]. This tool generates new variations of the data. In our model, an image data generator takes existing images as input and applies random transformations to generate diverse images. The transformations include rotation, shifting, shearing, zooming, and flipping. A callback function saves newly generated images only if validation accuracy exceeds 0.7, ensuring high-quality augmented images. This enhances the detection model's performance during training. By applying these techniques, the dataset grew from around 6000 to 18,000 images, boosting the quantity and diversity of the training data.
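The idea of generating random variants of each image can be sketched without TensorFlow as follows. This is a simplified NumPy stand-in that covers only flipping and shifting; it is not the TensorFlow data generator used in this work, it omits rotation, shearing, zooming, and the validation-accuracy callback, and the names `augment` and `expand_dataset` are hypothetical. Two variants per image are assumed here because that triples the dataset, which matches the reported growth from about 6000 to 18,000 images; the actual number of variants per image is not stated.

```python
import numpy as np


def augment(image, rng, max_shift=0.1):
    """Return a randomly flipped and shifted copy of an H x W x C image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    height, width = out.shape[:2]
    limit_y, limit_x = int(max_shift * height), int(max_shift * width)
    dy = int(rng.integers(-limit_y, limit_y + 1))
    dx = int(rng.integers(-limit_x, limit_x + 1))
    # np.roll wraps pixels around the border; a real pipeline would usually pad instead.
    return np.roll(out, shift=(dy, dx), axis=(0, 1))


def expand_dataset(images, copies_per_image=2, seed=0):
    """Keep every original image and add `copies_per_image` random variants of each."""
    rng = np.random.default_rng(seed)
    variants = [augment(img, rng) for img in images for _ in range(copies_per_image)]
    return list(images) + variants


# Usage with three synthetic placeholder images.
originals = [np.full((64, 64, 3), value, dtype=np.uint8) for value in (0, 128, 255)]
expanded = expand_dataset(originals)
print(len(originals), "->", len(expanded))  # 3 -> 9
```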

Generating New Images Using GANs
Generative Adversarial Networks (GANs) comprise a generator and a discriminator, making them adept at generating images resembling real ones [46]. The generator produces synthetic data akin to real data, while the discriminator discerns between real and synthetic data [47]. The Deep Convolutional Generative Adversarial Network (DCGAN) is a type of GAN that employs CNNs in both the generator and discriminator networks, allowing it to capture spatial dependencies in images and generate realistic images [48]. The DCGAN model can help generate new images similar to real ones. The generated images can be added to the existing dataset, increasing its diversity and size and improving the performance of machine learning models trained on the dataset.
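For reference, the interplay between the two networks is commonly written as the following minimax objective from the original GAN formulation. It is given here as background under the assumption that the standard formulation applies; the symbols G (generator), D (discriminator), x (real image), and z (random noise vector) are notation introduced for this illustration.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator maximizes V by assigning high probability to real images and low probability to generated ones, while the generator minimizes V by producing images that the discriminator accepts as real.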
Figure 3 visually represents the image generation process using GANs.In this process, the generator initiates by taking random noise as input and progressively transforms it through multiple layers, ultimately producing the desired image.On the other hand, the discriminator plays a crucial role in assessing the authenticity of the generated image.It takes the generated image as input and passes it through its layers to determine its realism by comparing it to the original dataset.As we discussed earlier, the GANs model consists of two networks (i.e., generator and discriminator).Figure 4 illustrates the general architecture of the designed GANs in our work.The exact architectures of each included network are presented in this section.First, the generator network contains four hidden layers and one output layer as shown in Figure 5.These layers are explained in detail here: • Hidden Layer 1: The input to the generator is a random noise vector of size latent_dim.This layer has n_nodes nodes, calculated as 16 × 16 × 128.The reshape layer is used to reshape the output of this layer into a 4D tensor of shape (16,16,128).On the other hand, the discriminator network has three hidden layers and one output layer as shown in Figure 6.The details of these layers are explained here:

• Hidden Layer 1: This layer uses a convolutional layer (Conv2D) with 64 filters and a kernel size of (4, 4). It has a stride of (2, 2) and uses the LeakyReLU activation function with a negative slope of 0.2.
• Hidden Layer 2: This layer uses a convolutional layer (Conv2D) with 128 filters and a kernel size of (4, 4). It has a stride of (4, 4) and uses the LeakyReLU activation function with a negative slope of 0.2.
• Hidden Layer 3: This layer uses a convolutional layer (Conv2D) with 128 filters and a kernel size of (4, 4). It has a stride of (2, 2) and uses the LeakyReLU activation function with a negative slope of 0.2.

• Flatten and Output Layer: This layer flattens the output from the previous layer into a 1D tensor and applies a dropout layer to drop some connections for better generalization. The final output layer has a single node with a sigmoid activation function that outputs a probability between 0 and 1, indicating whether the input image is real or fake.
The discriminator model is trained to classify the input images as real or fake, so the loss function used during training is binary cross-entropy. The key hyperparameters to be adjusted in the designed GAN model for training and generating the desired images are latent_dim, learning rate, and batch size. The latent_dim hyperparameter is specifically related to the generator model. It determines the size of its input and is set to 128 based on preliminary experiments and the literature, which suggests that this value provides a good balance between the diversity and quality of generated images [49]. The learning rate was set to 0.0002 for the Adam (Adaptive Moment Estimation) optimizer [50] used by the discriminator model. Adam is a popular optimization algorithm that adapts the learning rate during training. The value 0.0002 was chosen because it is widely used to stabilize GAN training and provides effective convergence rates; it controls the step size Adam uses to update the model's parameters. Additionally, the batch size, which defines the number of samples processed together in each training iteration, was set to 32, considering that smaller batch sizes can help reduce overfitting while maintaining manageable memory usage [51]. Moreover, the models were trained for 20,000 epochs (complete passes through the training dataset) to ensure thorough learning without overfitting. The dataset used in the experiment consisted of 1000 images with a shape of (256, 256, 3), and the pixel values were scaled to be between −1 and 1. Additional parameters such as beta_1 (set to 0.5) and dropout_rate (0.3), as well as filters, kernel_sizes, and strides, were selected based on their proven effectiveness in improving GAN performance in image generation [52]. Table 4 lists the parameters of the designed GAN model.
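The layer specifications listed for the two networks can be sanity-checked with simple parameter arithmetic. The sketch below rests on assumptions that are not stated in the text: every layer has a bias term, there are no batch-normalization layers, all convolutions use "same" padding, the generator output layer has a stride of 1, and the discriminator input is the (256, 256, 3) training image. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D or Conv2DTranspose layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


LATENT_DIM = 128

# Generator: Dense -> reshape to (16, 16, 128) -> three transposed convolutions
# -> Conv2D output layer. Tuples are (kernel, in_channels, out_channels, stride).
gen_layers = [(4, 128, 128, 2), (4, 128, 256, 4), (4, 256, 512, 2)]
gen_params = dense_params(LATENT_DIM, 16 * 16 * 128)
gen_size = 16
for kernel, c_in, c_out, stride in gen_layers:
    gen_params += conv_params(kernel, c_in, c_out)
    gen_size *= stride  # assumed "same" padding: size is multiplied by the stride
gen_params += conv_params(5, 512, 3)  # output layer with three colour channels

# Discriminator: three strided convolutions -> flatten -> single sigmoid node.
disc_layers = [(4, 3, 64, 2), (4, 64, 128, 4), (4, 128, 128, 2)]
disc_params = 0
disc_size = 256  # assumed input side length, from the (256, 256, 3) images
for kernel, c_in, c_out, stride in disc_layers:
    disc_params += conv_params(kernel, c_in, c_out)
    disc_size //= stride  # assumed "same" padding: size is divided by the stride
disc_params += dense_params(disc_size * disc_size * 128, 1)

total_params = gen_params + disc_params
```

Under these assumptions the arithmetic gives 7,149,955 generator and 429,377 discriminator parameters, which agrees with the trainable-parameter counts reported for the two networks. The stated strides also take the 16 × 16 feature map to 256 × 256, which matches the (256, 256, 3) training images rather than the 416 × 416 size quoted for Hidden Layer 4, so the intermediate sizes in the generator layer list are worth checking against Table 4.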
The DCGAN model was trained using an adversarial approach, where the generator produces images intended to deceive the discriminator while the discriminator learns to differentiate between real and fake images. During training, we employed binary cross-entropy as the loss function. We recorded each epoch's discriminator and generator loss to monitor the training progress. The discriminator loss reflects how well the discriminator classifies real and fake images, while the generator loss measures how well the generator can deceive the discriminator. Additionally, after each training epoch, we evaluated the performance of the discriminator on both real and generated images. The discriminator accuracy on real images indicates its ability to recognize real images as real. In contrast, the discriminator accuracy on generated images reflects how often the generator fails to deceive it. The experimental study utilized a DCGAN model with 7,579,332 parameters in total, of which 7,149,955 were trainable in the combined model. The discriminator and generator components had 429,377 and 7,149,955 trainable parameters, respectively. The model's performance was evaluated during the training process after 300 epochs.
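The losses and accuracies monitored during training can be made concrete with a small worked example. The sketch below is illustrative only: the discriminator outputs are made-up numbers, the discriminator loss is taken as the average of its losses on the real and generated batches, and the generator loss uses "real" targets for generated images, which is a common formulation assumed here rather than confirmed by the text.

```python
import math


def bce(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


# Hypothetical discriminator outputs (probability of "real") for one batch.
d_on_real = [0.9, 0.8, 0.4, 0.7]
d_on_fake = [0.2, 0.6, 0.1, 0.3]
ones, zeros = [1] * 4, [0] * 4

d_loss = 0.5 * (bce(ones, d_on_real) + bce(zeros, d_on_fake))
g_loss = bce(ones, d_on_fake)       # generator wants fakes to be scored as real
acc_real = accuracy(ones, d_on_real)   # real images recognised as real
acc_fake = accuracy(zeros, d_on_fake)  # generated images recognised as fake
```

In this toy batch the discriminator is right on three of four images in each group, so the generator loss is well above the discriminator loss. As the generator improves, accuracy on generated images would be expected to fall toward chance.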
The results showed that as the model continued to train, the discriminator and generator losses decreased, while the discriminator's accuracy on real and fake images increased. For instance, at epoch 300, the generator loss was 1.749, and the discriminator accuracy on real and fake images was 0.4 and 0.8, respectively. After 12,000 epochs, the generator loss decreased to approximately 1, and the discriminator accuracy on real and fake images improved to 0.94. These outcomes demonstrate the effectiveness of the DCGAN architecture in generating new images of emergency vehicles. The model's performance gradually improved throughout the training process, although with some degree of fluctuation. A set of sample images generated during various epochs of the training is displayed in Figure 7. Figure 8 displays sample images illustrating how the quality of the GAN-generated images improves as the number of epochs increases. Despite producing many images during training, the model still requires refinement, particularly when dealing with the existing dataset. Further improvement is anticipated through continued training over more epochs, which could necessitate using a supercomputer or cloud computing to accelerate the training process.
At the end of this stage, a new dataset containing approximately 20,000 images was obtained. The new dataset is well balanced, comprising 10,000 images of emergency vehicles and 10,000 images of non-emergency vehicles. Most of the images in the dataset were taken from real-life scenarios and therefore reflect what can be encountered on real roads.

Performance Evaluation: Test and Validate the New Dataset
As previously mentioned, the lack of a model with accurate detection capabilities for emergency vehicles can be attributed to the characteristics and realism of the datasets used. After obtaining a new dataset with approximately 20,000 images, we aimed to test and validate the dataset and develop an effective model for emergency vehicle detection. The first step involved developing an emergency vehicle detection model using the Convolutional Neural Network (CNN) algorithm. Then, 10-fold cross-validation was applied to validate the performance of the proposed detection model. Finally, we used a small external dataset collected using our smartphones to compare the performance of our proposed model with previously developed emergency vehicle detection models that are available online.
We employed several standard evaluation metrics to measure the performance of our emergency vehicle detection system. These metrics include accuracy, precision, recall, and F1-score [53]. The definitions of these metrics, their meanings, and how to compute each metric are given below:
• F1-score: It provides a balanced measure of precision and recall, considering both metrics to evaluate the system's performance: F1-score = 2 × (Precision × Recall)/(Precision + Recall) (4)
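The four metrics can be computed directly from the entries of a binary confusion matrix. The sketch below follows the standard definitions; the function name and the confusion counts in the example are hypothetical and are not taken from the paper's tables. Division by zero (for example, when no positives are predicted) is not handled.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 100 emergency and 100 non-emergency test images.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Here "positive" denotes the emergency vehicle class, so recall measures how many emergency vehicles are found and precision how many flagged vehicles really are emergency vehicles.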

Emergency Vehicle Detection Model Using CNN
The detection model utilized in this study employed the Convolutional Neural Network (CNN) algorithm, a highly effective model for deep learning and particularly suitable for tasks involving image classification. The classification process focused on distinguishing between emergency and non-emergency vehicle images, and it was conducted in two stages. The initial stage involved 6000 images before augmentation operations, with 3200 images for emergency vehicles and 2800 for non-emergency vehicles. Subsequently, the second stage incorporated 20,000 images after augmentation. The CNN model was constructed using the TensorFlow-Keras framework. The model architecture encompassed various layers, including convolutional, max-pooling, flattening, and fully connected layers. The key parameters included a kernel size of (3, 3) for the four convolutional layers, a pool size of (2, 2) for the max-pooling layers, and a dropout rate of 0.5 in the first dense layer. The training involved 10 epochs and a batch size of 32.
The initial dataset comprised 6000 images collected from the previous datasets and taken by our camera, with no augmentation applied. It achieved an accuracy of 90.3%, precision of 91.1%, recall of 90.0%, and an F1-score of 90.5%. The evaluation results for the training, validation, and testing datasets are presented in Table 5, and the confusion matrix for the testing dataset is shown in Table 6. Figure 9 shows the training and learning results on the initial dataset.
The final dataset of 20,000 images, which includes the images generated using Generative Adversarial Networks (GANs), was then employed. Each category consisted of 10,000 images. The results showed an improvement, with accuracy reaching 90.9%, precision 93.0%, recall 88.2%, and an F1-score of 90.5%. The evaluation results for the training, validation, and testing datasets are presented in Table 7, and Table 8 shows the confusion matrix for the testing dataset. Figure 10a illustrates the training accuracy results for the final dataset across different epochs, whereas Figure 10b depicts the learning results for the final dataset during training and validation. Figure 11 graphically compares the results of the emergency vehicle detection model on the initial and final datasets. When the number of images is increased from 6000 to 20,000, accuracy rises from 90.3% to 90.9% and precision from 91.1% to 93.0%, while the F1-score remains at 90.5%. However, the recall decreases slightly (90.0% to 88.2%). This may be attributed to the increased diversity of the augmented dataset, which challenges the model to handle a wider range of vehicle instances.
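The reported F1-scores can be checked against the reported precision and recall using the standard harmonic-mean definition. The short sketch below uses only the percentages quoted in the text; the function name is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (any consistent unit, here percent)."""
    return 2 * precision * recall / (precision + recall)


f1_initial = f1_from(91.1, 90.0)  # initial dataset, 6000 images
f1_final = f1_from(93.0, 88.2)    # final dataset, 20,000 images
```

Both values round to 90.5, consistent with the reported scores: on the final dataset the gain in precision is offset by the drop in recall, so the F1-score is essentially unchanged.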

10-Fold Cross-Validation
After training the classification model using the CNN algorithm, the new balanced dataset had to be validated. The 10-fold cross-validation technique was used on the two datasets: 6000 images in the initial dataset and 20,000 images in the final dataset. On the initial dataset of 6000 images before augmentation, the 10-fold cross-validation showed an average accuracy of 83.32%, precision of 83.44%, recall of 79.82%, and F1-score of 80.99%. Table 9 displays the results of all folds. The final dataset, consisting of 20,000 images (10,000 for each category), significantly improved the model's performance. Accuracy clearly improved, exceeding 87%, indicating better identification of positive cases. Recall also increased, so more positive cases are captured, although further improvements are possible. The F1-score increased accordingly, reflecting a well-balanced performance between precision and recall. Augmentation and balancing positively affected the ability of the model to detect both positive and negative cases. Table 10 presents the results of the 10-fold cross-validation on the final dataset. Figure 12 illustrates the improvement before and after augmentation.
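The splitting logic behind 10-fold cross-validation can be sketched as follows. This is a generic, library-free illustration rather than the code used in this work: the function names, the shuffling seed, and the way folds are formed are assumptions, and the model training and evaluation inside each fold are left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def mean(values):
    """Average of per-fold scores, as reported for each metric."""
    return sum(values) / len(values)


# With 20,000 images, every fold trains on 18,000 and validates on 2000.
splits = list(k_fold_indices(20000, k=10))
```

In each fold a fresh model would be trained on the training indices and scored on the validation indices, and the per-fold scores averaged with `mean`. Averaging over folds reduces the dependence of the reported score on any single train/test split.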

Comparative Performance Study
In this section, we first test the performance of our developed model on an external dataset. Then, we compare the performance of our proposed emergency vehicle detection model to previous models in this field. An external dataset of 200 images, none of which were included in the training process, was used to ensure an unbiased evaluation. Our developed model achieved an accuracy of over 85% on the external dataset, which is considered satisfactory. After confirming the model's effectiveness in detecting emergency vehicles, we conducted a brief comparison with some other emergency vehicle detection models available online [35,[54][55][56]. This comparison was carried out using the same external dataset of 200 images. The objective was to assess how our model performed compared to these existing models.
The results of this comparison are presented in Table 11. The table indicates that our new model outperformed the other models in terms of accuracy and overall performance. It demonstrated superior ability to detect emergency vehicles when presented with unseen images. It is essential to highlight that the primary obstacles faced by existing models stem from the quality and sufficiency of the datasets employed during their training. Our proposed model stands out because it was trained on a more extensive and realistic dataset, which contributes to its superior emergency vehicle detection performance. Figure 13 graphically compares the performance of our new model with that of the previously discussed models. As the figure shows, our proposed model performed better than the previous models in all the measured metrics. This suggests that the generated dataset has substantially improved the performance of the detection mechanism.

Conclusions
In conclusion, the study provided a comprehensive overview of the necessity of having a realistic model capable of detecting emergency vehicles. We started with the main challenges in the current datasets and proposed a solution by creating a new dataset in several stages, the most important of which was the use of GANs, and we ultimately collected a new dataset consisting of 20,000 images. The new dataset was verified using 10-fold cross-validation, and a new CNN-based model for emergency vehicle detection was built using it. The proposed model achieved an accuracy of over 85% on the external dataset, outperforming several existing models with which it was compared, which supports the dataset's realism and the possibility of using it for practical purposes such as traffic management. We believe that our study with the new dataset provides a valuable contribution to computer vision, confirms the effectiveness of using GANs to generate realistic images, and can inspire future research to improve the accuracy of object detection models. With the new dataset available, future research can more easily address other areas, such as determining the actual location of emergency vehicles or linking the model with traffic lights to open or close roads for emergency vehicles.

Figure 1. Methodology for developing the emergency vehicle detection model.

Figure 2. Sample images collected during video recording and frame selection.

• Hidden Layer 2: This layer uses a transposed convolutional layer (Conv2DTranspose) to upsample the input from the previous layer to a size of 52 × 52 pixels. It has 128 filters with a kernel size of (4, 4) and a stride of (2, 2). The ReLU activation function is used to introduce non-linearity.
• Hidden Layer 3: This layer further upsamples the input to a size of 104 × 104 pixels. It has 256 filters with a kernel size of (4, 4) and a stride of (4, 4). The ReLU activation function is again used.
• Hidden Layer 4: This layer upsamples the input to a size of 416 × 416 pixels. It has 512 filters with a kernel size of (4, 4) and a stride of (2, 2). The ReLU activation function is used.
• Output Layer: The final output layer uses a convolutional layer (Conv2D) with three filters (for the three color channels in the image) and a kernel size of (5, 5). The tanh activation function generates pixel values in the range [−1, 1].

Figure 4. The architecture of the designed GANs.

• Accuracy: It measures the overall correctness of the system's predictions by calculating the ratio of correctly classified instances to the total instances: Accuracy = (True Positives + True Negatives)/(Total number of predictions) (1)
• Precision: It quantifies the proportion of correctly predicted emergency vehicle instances out of all those predicted as emergency vehicles: Precision = (True Positives)/(True Positives + False Positives) (2)
• Recall: It measures the proportion of correctly predicted emergency vehicle instances out of all the actual emergency vehicle instances: Recall = (True Positives)/(True Positives + False Negatives) (3)

Figure 9. Training and learning results on the initial dataset. (a) The training learning curve of the initial dataset; (b) the training accuracy learning curve of the initial dataset.

Figure 10. Training and learning results on the final dataset. (a) The training learning curve of the final dataset; (b) the training accuracy learning curve of the final dataset.

Figure 11. Performance evaluation of the detection model.

Figure 13. Comparison of emergency vehicle detection models.

Table 1. Main characteristics of previous emergency vehicle detection mechanisms.
Table 2 summarizes the main previous studies that have used GANs to generate new datasets. It illustrates the main objective, used algorithm, original dataset, obtained results, drawbacks, and limitations.

Table 2. Main characteristics of previous studies that used GANs.

Table 3. Summary of the most important findings from the available datasets.

Table 5. CNN results on the initial dataset.

Table 6. CNN on initial dataset training confusion matrix.

Table 7. CNN results on the final dataset.

Table 8. CNN on final dataset training confusion matrix.

Table 9. The 10-fold cross-validation results on the initial dataset.

Table 11. Emergency vehicle detection model comparison.