1. Introduction
Aerial images captured by unmanned aerial vehicles (UAVs) are used for several applications such as urban planning, real-estate management, disaster evaluation, traffic congestion management, road network detection, and vehicle detection [1]. Analyzing aerial images can be useful for pattern recognition and decision-making. Recent advances in machine learning (ML) and deep learning (DL) can be used to extract meaningful data from aerial images. Traditional ML methods involve steps such as feature selection and extraction. These steps were manual processes in traditional ML and are automated in DL-based algorithms [2,3]. This automation is regarded as a benefit of DL networks, but it has the drawback that DL networks require larger training datasets. The most commonly used technique in DL-based image classification studies is the convolutional neural network (CNN). CNN-based DL is a rapidly emerging technology and is becoming a widespread approach for image classification studies [4]. A renowned image classification competition in recent years is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The highest-ranked participants in ILSVRC employed CNN-based approaches. CNNs have two core structures: the convolution and pooling layers [5,6]. The convolution layer contains feature maps that are connected to each other through a series of weights and that feed activation functions such as the rectified linear unit (ReLU).
Generally, a UAV carries an LTE or Wi-Fi communication device, a flight control computer, and a mission computer for exchanging drone data with the drone operating system or ground control station [7]. Aerial images, however, often hold data about confidential sites and therefore need secure processing. It is of utmost importance to secure the transmission and processing of images acquired by aerial photography, as these images may contain crucial data concerning national security. Image encryption is a major and effective means of protecting image information [8]. Drones without security functions can be exploited for criminal acts such as information theft, invasion of privacy, and safety and security threats. To prevent such cybersecurity vulnerabilities, special-purpose drones deployed by the military or state agencies for attack, surveillance, and reconnaissance require data encryption technology that protects not only transmitted data but also sensitive data stored on the drone [9,10].
The encryption process can be executed by any device within the UAV, depending on the type of data that needs encryption [11]. Flight control or communication data can be encrypted in the flight control computer, and mission transmission data can be encrypted in the mission computer. In particular, when the drone performs encryption in the transmission device, all transmitted data can be encrypted irrespective of data type [12]. Drone data can be encrypted either by installing encryption software on particular devices of the drone or by attaching hardware modules that provide an encryption function. The software encryption technique depends on the hardware characteristics and the operating system of the mission and flight control computers. Moreover, many drone producers offer data encryption in a simple way that does not alter the source code of existing drone flight control systems [13].
This study develops an Artificial Intelligence-based Secure Communication and Classification for Drones Enabled Emergency Monitoring Systems (AISCC-DE2MS). The proposed AISCC-DE2MS technique applies an artificial gorilla troops optimizer (AGTO) algorithm with an ECC-Based ElGamal Encryption technique to assure security. For emergency situation classification, the AISCC-DE2MS model encompasses densely connected network (DenseNet) feature extraction, penguin search optimization (PESO)-based hyperparameter tuning, and long short-term memory (LSTM)-based classification. The simulation analysis of the AISCC-DE2MS model is tested using the AIDER dataset.
2. Literature Review
Sarkar et al. [14] introduced a new Energy-aware Secure Internet of Drone (ESIoD) structure. A key research problem tackled by that study is how to speed up on-board processing and reduce battery use so as to extend flight duration while retaining the security of drone-captured images. In particular, UAV-captured images were encrypted using either RSA or AES and offloaded by the on-board computer to a server for cognitive processing with Faster R-CNN and standard Haar cascade classifiers. Ismael [15] modelled an authentication and security mechanism based on a stream cipher built from the lightweight HIGHT method and chaotic maps. The presented mechanism aims to reduce the computational load on UAV resources and handles one drone and one ground control station (GCS) across several flight sessions. In [16], the author presents the drone map planner, a fog-based and service-oriented drone management mechanism that communicates with, controls, and monitors drones across the network. The planner permits interaction with many UAVs over the internet, which can be controlled anytime and anywhere without distance limits. It also gives UAVs access to fog computing resources for heavy computation.
Bera et al. [17] devised a new access control technique for unauthorized drone detection and mitigation in an IoD environment, named ACSUD-IoD. Through the blockchain (BC)-based solution incorporated in ACSUD-IoD, the genuine and normal secure transaction data sent from drones to the Ground Station Server (GSS), together with the abnormal data that the GSS uses to detect unauthorized drones, are stored in a private BC. Ajmal [18] presents an enhanced version of HEVC for encrypting multimedia data based on motion and texture energy estimation. For every video frame, the quantized and transformed coefficients are computed along with the motion vectors. The energy of every block is computed and compared against threshold values to select a suitable encryption method. The author employs the Advanced Encryption Standard (AES) and encrypts the most significant bits (MSB), as most of the information resides in the MSB.
Zhang et al. [19] propose a lightweight authentication and key agreement (AKA) method that uses only bitwise XOR operations and secure one-way hash functions when users and drones mutually authenticate one another. The presented method achieves AKA security in the random oracle model and withstands several well-known attacks. A security comparison further indicates that the method offers better security than related schemes. Khan et al. [20] designed an identity-based proxy signcryption method for addressing such complexities. During data transfer among UAVs and to cloud servers, the devised method supports member revocation and outsourced decryption. The technique relies on Hyper-Elliptic-Curve Cryptography (HECC), which improves the efficiency of network computations. The author employs the Random Oracle Model (ROM) along with a formal security analysis to assess security strength. Although several methods are available in the literature, effective models for drone communication security and classification are still needed. Besides, existing models do not take the hyperparameter tuning of DL models into account. Therefore, this work uses the PESO algorithm for the hyperparameter tuning of the DenseNet model.
3. Materials and Methods
In this study, a new AISCC-DE2MS technique has been developed for secure communication and classification of drone-based emergency monitoring systems. The proposed AISCC-DE2MS technique performed image encryption at the preliminary stage using the AGTO algorithm with an ECC-Based ElGamal Encryption technique in which the optimal key generation procedure takes place via the AGTO algorithm. To classify the images, the AISCC-DE2MS model encompasses DenseNet feature extraction, PESO-based hyperparameter tuning, and LSTM-based classification.
Figure 1 depicts the block diagram of the AISCC-DE2MS approach.
3.1. Data Used
In this work, the AISCC-DE2MS model is experimentally validated using the AIDER dataset [21]. The aerial images of disaster events were gathered from several online sources (for instance, news agencies, YouTube, Google Images, Bing Images, and other web sites) using the keywords “UAV”, “Drone”, or “Aerial View” together with events such as “Earthquake”, “Highway accident”, and “Fire”. The images were initially of different sizes and were standardized prior to training. Every image was inspected to confirm that it contains the event concerned. Next, the event was centered in the image so that geometric transformations applied during augmentation would not remove it from view. The collected disaster events cover different resolutions and varied conditions in terms of viewpoint and illumination. In this study, a total of 8540 images under five classes are used for experimentation.
Table 1 provides a detailed description of the dataset.
3.2. Image Encryption Using AGTO-ElGamal Technique
In this study, the AISCC-DE2MS technique makes use of AGTO with the ECC-based ElGamal encryption technique to accomplish security. The ECC-based ElGamal encryption with several parameters and steps utilized are given as follows [
22].
The additive homomorphic property is given by Equation (1):
$$E(m_1) + E(m_2) = E(m_1 + m_2) \quad (1)$$
where the + symbol denotes the additive homomorphic operation and $E$ denotes encryption under the public key. In ECC, additive homomorphic encryption can be achieved. ECC-based ElGamal is defined on the algebraic structure of elliptic curves (ECs) over finite fields. Finite fields are of two types: prime fields and binary fields. In this analysis, ECs over prime fields are considered. The special class of EC used is given in Equation (2):
$$y^2 \equiv x^3 + ax + b \pmod{p} \quad (2)$$
Here, $E_p(a, b)$ refers to the resultant curve, where the modulus is $p$ and the coefficients of the equation are $a$ and $b$. The value of $x$ ranges from $0$ to $p - 1$, but not every value of $x$ yields a point on the curve. Even with a smaller key size, ECC provides a similar security level with lower processing overhead than RSA and other homomorphic techniques. For the optimal key generation process, the AGTO algorithm is used in this study.
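To make the encryption steps concrete, the following is a minimal sketch of ElGamal encryption over an elliptic curve on a prime field, including the additive homomorphic property. It is an illustration under stated assumptions, not the authors' implementation: the tiny textbook curve $y^2 = x^3 + 2x + 2 \pmod{17}$ with base point (5, 1), the key values, and all function names (`ec_add`, `ec_mul`, `encrypt`, `decrypt`, `add_ciphertexts`) are hypothetical choices, and the AGTO-based key selection is not shown. A curve this small offers no security.

```python
# Toy EC-ElGamal over a tiny prime field (illustration only, NOT secure).
# Assumed curve: y^2 = x^3 + 2x + 2 (mod 17), assumed base point G = (5, 1).
P_MOD, A, B = 17, 2, 2
G = (5, 1)
INF = None  # point at infinity (identity element of the group)


def ec_add(p1, p2):
    """Add two curve points using the chord-and-tangent rule."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P) = identity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_neg(point):
    """Return the additive inverse of a point."""
    return INF if point is INF else (point[0], (-point[1]) % P_MOD)


def ec_mul(k, point):
    """Scalar multiplication k * point by double-and-add."""
    result = INF
    while k > 0:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def encrypt(m, public_key, k):
    """Encrypt a small integer m, encoded as the point m*G, with ephemeral scalar k."""
    return (ec_mul(k, G), ec_add(ec_mul(m, G), ec_mul(k, public_key)))


def decrypt(cipher, private_key):
    """Return the point m*G; recovering m itself needs a small discrete-log search."""
    c1, c2 = cipher
    return ec_add(c2, ec_neg(ec_mul(private_key, c1)))


def add_ciphertexts(ca, cb):
    """Component-wise addition: the result decrypts to the sum of the plaintexts."""
    return (ec_add(ca[0], cb[0]), ec_add(ca[1], cb[1]))


private_key = 7                       # hypothetical secret scalar
public_key = ec_mul(private_key, G)   # Q = d * G
c_a = encrypt(3, public_key, k=5)
c_b = encrypt(4, public_key, k=11)
c_sum = add_ciphertexts(c_a, c_b)     # encrypts 3 + 4 without decrypting
```

Because the ephemeral scalar `k` is passed in explicitly, the example is deterministic; a real system would draw `k` from a cryptographically secure random source and use a standardized curve.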
The AGTO approach consists of exploration and exploitation phases [23]. Equations (3)–(15) describe the major concepts of the model. The exploration phase is used primarily to perform a global search of the space. It makes use of three mechanisms: moving to the position of other gorillas, migrating to an unknown position, and migrating to a known position. The exploration phase can be expressed as follows:
$$GX(t+1) = \begin{cases} (UB - LB) \times r_1 + LB, & rand < p \\ (r_2 - C) \times X_r(t) + L \times H, & rand \ge 0.5 \\ X(t) - L \times \left( L \times \left( X(t) - GX_r(t) \right) + r_3 \times \left( X(t) - GX_r(t) \right) \right), & rand < 0.5 \end{cases} \quad (3)$$
Here, $X(t)$ is the gorilla’s existing location, and $GX(t+1)$ represents the gorilla’s location in the $(t+1)$th iteration. $p$ defines a parameter in [0, 1] that determines the migration mechanism to be selected. $LB$ and $UB$ refer to the lower and upper limits, correspondingly. $X_r$ stands for a randomly chosen gorilla member from the population and $GX_r$ denotes the location vector of a randomly chosen gorilla candidate. $r_1$, $r_2$, $r_3$, and $rand$ are random values within [0, 1] updated in every iteration. Furthermore, C, L, and H are evaluated as follows:
$$C = F \times \left( 1 - \frac{It}{MaxIt} \right) \quad (4)$$
$$F = \cos(2 \times r_4) + 1 \quad (5)$$
$$L = C \times l \quad (6)$$
$$H = Z \times X(t) \quad (7)$$
$$Z = [-C, C] \quad (8)$$
In these expressions, $It$ stands for the current iteration count and $MaxIt$ refers to the total number of iterations. In Equations (5) and (6), $r_4$ and $l$ denote random values within [0, 1] updated in every iteration. In Equation (8), $Z$ is a random number that lies within −C to C. At the end of the exploration stage, the algorithm evaluates the fitness value of each solution; if the fitness value of GX(t) is lower than that of X(t), that is, GX(t) < X(t), the X(t) solution is replaced with the GX(t) solution.
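The exploration step can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: the function names (`agto_coefficients`, `explore`, `greedy_replace`) are hypothetical, positions are plain lists, the random candidate $GX_r$ is drawn from the current population for brevity, and candidates are clipped to the search bounds, which the text does not specify.

```python
import math
import random


def agto_coefficients(it, max_it, rng):
    """Eqs. (4)-(6): F = cos(2*r4) + 1, C = F * (1 - It/MaxIt), L = C * l."""
    f = math.cos(2 * rng.random()) + 1
    c = f * (1 - it / max_it)
    l_coef = c * rng.random()  # l is drawn from [0, 1], as stated in the text
    return c, l_coef


def explore(pop, idx, lb, ub, c, l_coef, p, rng):
    """Eq. (3): build the candidate GX(t+1) for the gorilla at index idx."""
    x = pop[idx]
    dim = len(x)
    r = rng.random()
    if r < p:
        # Migrate to an unknown position inside the bounds.
        cand = [(ub - lb) * rng.random() + lb for _ in range(dim)]
    elif r >= 0.5:
        # Move toward another gorilla: (r2 - C) * X_r + L * H, with H = Z * X.
        xr = pop[rng.randrange(len(pop))]
        r2 = rng.random()
        cand = [(r2 - c) * xr[d] + l_coef * rng.uniform(-c, c) * x[d]
                for d in range(dim)]
    else:
        # Migrate to a known position (assumption: GX_r taken from pop).
        gxr = pop[rng.randrange(len(pop))]
        r3 = rng.random()
        cand = [x[d] - l_coef * (l_coef * (x[d] - gxr[d]) + r3 * (x[d] - gxr[d]))
                for d in range(dim)]
    # Assumption: keep candidates inside [lb, ub].
    return [min(max(v, lb), ub) for v in cand]


def greedy_replace(pop, idx, cand, fitness):
    """Replace X(t) with GX(t) only when the candidate has lower fitness."""
    if fitness(cand) < fitness(pop[idx]):
        pop[idx] = cand
```

In a full optimizer, `explore` and `greedy_replace` would be called for every gorilla in every iteration, with `fitness` set to the objective being minimized (for PSNR maximization, for example, the negative PSNR).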
The exploitation stage of AGTO makes use of two strategies: following the silverback gorilla and competing for adult female gorillas. The strategy is chosen by comparing the value of C with a preset parameter W. When C ≥ W, AGTO applies the following-the-silverback strategy; when C < W, competition for adult female gorillas is preferred. Following the silverback gorilla is formulated as follows:
$$GX(t+1) = L \times M \times \left( X(t) - X_{silverback} \right) + X(t) \quad (9)$$
$$M = \left( \left| \frac{1}{N} \sum_{i=1}^{N} GX_i(t) \right|^{g} \right)^{\frac{1}{g}} \quad (10)$$
$$g = 2^{L} \quad (11)$$
In Equation (9), $X_{silverback}$ denotes the silverback gorilla location. In Equation (10), $GX_i(t)$ indicates the location of every candidate gorilla in the $t$th iteration and $N$ refers to the total number of gorillas. Competition for adult female gorillas is expressed below.
$$GX(i) = X_{silverback} - \left( X_{silverback} \times Q - X(t) \times Q \right) \times A \quad (12)$$
$$Q = 2 \times r_5 - 1 \quad (13)$$
$$A = \beta \times E \quad (14)$$
$$E = \begin{cases} N_1, & rand \ge 0.5 \\ N_2, & rand < 0.5 \end{cases} \quad (15)$$
In Equation (13), $r_5$ denotes a random value within [0, 1] updated in every iteration. In Equation (14), $\beta$ refers to a preset parameter. In Equation (15), when rand ≥ 0.5, E is a vector of random numbers drawn from the normal distribution with the dimension of the problem; when rand < 0.5, E is a single random number drawn from the normal distribution. At the end of the exploitation stage, the algorithm evaluates the fitness value of each solution. If GX(t) < X(t), the X(t) solution is replaced with the GX(t) solution, and the best solution in the whole population is taken as the silverback gorilla.
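The two exploitation strategies can likewise be sketched as a single function. This is a hedged illustration rather than the authors' implementation: the name `exploit` and its argument layout are hypothetical, and the population mean in Equation (10) is taken per dimension over the current population.

```python
import random


def exploit(pop, idx, silverback, c, l_coef, w, beta, rng):
    """Return the exploitation candidate for the gorilla at index idx.

    c >= w : follow the silverback, Eqs. (9)-(11).
    c <  w : compete for adult females, Eqs. (12)-(15).
    """
    x = pop[idx]
    dim = len(x)
    n = len(pop)
    if c >= w:
        g = 2 ** l_coef
        cand = []
        for d in range(dim):
            mean_d = sum(ind[d] for ind in pop) / n
            m = (abs(mean_d) ** g) ** (1 / g)          # Eq. (10)
            cand.append(l_coef * m * (x[d] - silverback[d]) + x[d])  # Eq. (9)
    else:
        q = 2 * rng.random() - 1                        # Eq. (13)
        if rng.random() >= 0.5:
            # E is a vector of normal samples, one per dimension.
            e = [rng.gauss(0, 1) for _ in range(dim)]
        else:
            # E is a single normal sample shared by all dimensions.
            e = [rng.gauss(0, 1)] * dim
        cand = [silverback[d] - (silverback[d] * q - x[d] * q) * (beta * e[d])
                for d in range(dim)]                    # Eqs. (12) and (14)
    return cand
```

A useful sanity check on both branches is that the silverback itself is a fixed point: when `pop[idx]` equals `silverback`, the difference terms vanish and the candidate equals the current position.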
In this study, the AGTO algorithm derives a fitness function for the maximization of the PSNR value.
3.3. Image Classification Model
In this study, the AISCC-DE2MS model performs image classification via three major processes such as DenseNet feature extraction, PESO-based hyperparameter tuning, and LSTM-based classification.
3.3.1. DenseNet Feature Extraction
DenseNet is a recent addition to the neural networks (NNs) applied to visual object recognition. DenseNet169 is a variant of DenseNet [24,25]. DenseNet is intended for implementing an image classifier. DenseNet169 has more layers than the shallower DenseNet variants. The model is pretrained on the ImageNet image database, and the saved pretrained model is loaded and reused rather than retrained on ImageNet. In DenseNet, the output of each layer is concatenated with the input of the subsequent layers. DenseNet was designed to counter the loss of accuracy in deep NNs caused by vanishing gradients: when the path between the input and output layers is long, information can vanish before reaching its target. DenseNet is a type of convolutional network. Recent findings indicate that convolutional networks are more accurate and more efficient to train when they contain shorter connections between layers close to the input and layers close to the output. DenseNet therefore connects each layer to every other layer in a feedforward manner. A traditional convolutional network with $L$ layers has $L$ connections, that is, one connection between each layer and its subsequent layer.
DenseNet, in contrast, has L(L + 1)/2 direct connections. Each layer uses the feature maps (FMs) of all preceding layers as input, and its own FMs are used as input to all subsequent layers. DenseNet offers several advantages: it alleviates the vanishing gradient problem, strengthens feature propagation, encourages feature reuse, and reduces the number of parameters. The model is evaluated on the highly competitive ImageNet image recognition benchmark, and the pretrained model is loaded from saved weights. Layers can be combined by addition or concatenation only when their FM dimensions match. DenseNet is therefore divided into DenseBlocks with different numbers of filters, while the FM dimensions remain the same within a block. Between DenseBlocks, transition layers perform down-sampling and apply batch normalization (BN), which is a crucial step in CNNs. The growth of the channel dimension within a DenseBlock is governed by the growth rate, denoted as $k$, which determines how much information each layer contributes. The number of input FMs of the $l$th layer is evaluated as follows:
$$k_l = k_0 + k \times (l - 1)$$
where $k_0$ is the number of channels in the input to the DenseBlock.
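As a small worked example of the two counts discussed here, the sketch below computes the number of direct connections, L(L + 1)/2, and the number of input feature maps of a layer under the standard DenseNet relation $k_l = k_0 + k \times (l - 1)$. The function names and the sample values (64 input channels, growth rate 32) are illustrative assumptions rather than settings reported in this study.

```python
def direct_connections(num_layers):
    """Number of direct connections in a densely connected block: L(L + 1)/2."""
    return num_layers * (num_layers + 1) // 2


def input_feature_maps(layer_index, k0, growth_rate):
    """Input channels of the l-th layer: k_l = k0 + k * (l - 1), with l starting at 1."""
    if layer_index < 1:
        raise ValueError("layer_index must be >= 1")
    return k0 + growth_rate * (layer_index - 1)


# Illustrative values only: a 6-layer block with 64 input channels and k = 32.
connections = direct_connections(6)
channels_per_layer = [input_feature_maps(l, 64, 32) for l in range(1, 7)]
```

For this hypothetical block, each layer receives 32 more input channels than the one before it, which is why the growth rate can stay small while later layers still see all earlier features.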
3.3.2. Hyperparameter Tuning Using PESO Algorithm
For optimal tuning of the hyperparameters of the DenseNet model, the PESO algorithm is used. A beneficial search activity of an animal is one in which the energy gained is high relative to the energy consumed [26]. For penguins, breathing capacity is a basic factor while diving, since the dive relies on reserve oxygen. The more oxygen they consume, the greater the depth and speed they gain, and the trip time begins to decrease. The amount of food needed by a group differs according to the availability of food, the species, and the age of the penguins within the area of concern. An optimization algorithm has been developed directly from this hunting strategy of penguins. To design the model, the penguin position within the area of concern is represented by “
”. The distribution of the group is accomplished by the existence of food sources within the area of interest. Fundamental steps are established by employing these rules and are summarized as a pseudo-code demonstrated in Algorithm 1.
By considering the search space as a multi-dimension search space, the optimal solution is proposed by utilizing the food distribution probability to accomplish an optimum value and attain the maximal food quantity. Each member of a group utilizes a similar solution within the searching space. Generally, all the groups perform different dives on the basis of the probability of food expediency and the amount of reserve oxygen within the search space.
Penguins tend to exchange essential information to determine an optimum solution and relocate the groups after carrying out multiple dives within the area of concern. Reaching the global optimum without being trapped in local optima after multiple iterations allows PESO to perform better than traditional population-based techniques.
Algorithm 1: Penguin Search Optimization Algorithm (PESO)
1: Produce a random population of P penguins in groups
2: Initialize the probability of the existence of fish in the levels and holes
3: For i = 1 to the number of generations
4:     For every individual
5:         While oxygen reserves are not exhausted
6:             Take a random step
7:             Enhance the penguin position using the location upgrade formula
8:             Upgrade the quantity of fish eaten using this penguin
9:         End While
10:     End For
11:     Upgrade the quantity of eaten fish in the levels, holes, and best groups
12:     Reallocate the probability of penguins in the levels and holes
13:     Upgrade the optimal solution
14: End For
The PESO method makes a derivation of a fitness function (FF) that results in enhanced classifier performance. In this article, the reduction of the classifier error rate can be regarded as the FF, as presented in Equation (18).
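A fitness function of this kind can be sketched as the percentage of misclassified samples, with the tuner keeping the candidate that minimizes it. The sketch below is an assumption-laden illustration: `classifier_error_rate` and `select_best` are hypothetical helper names, and the exact form of Equation (18) is not reproduced here.

```python
def classifier_error_rate(y_true, y_pred):
    """Percentage of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


def select_best(candidates, evaluate):
    """Return the candidate hyperparameter setting with the lowest fitness.

    `evaluate` is a caller-supplied function mapping a candidate to its
    error rate, for example by training and validating the classifier.
    """
    return min(candidates, key=evaluate)


# Example: one of four predictions is wrong, so the error rate is 25%.
example_error = classifier_error_rate([0, 1, 2, 2], [0, 1, 1, 2])
```

In practice `evaluate` would wrap a full train-and-validate cycle, which is the expensive part; the optimizer only decides which candidates are worth evaluating next.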
3.3.3. LSTM Based Classification
Finally, the LSTM model is applied to allocate the proper class labels, namely Collapsed Building, Fire/Smoke, Flood, Traffic Accident, and Normal. LSTM is a gradient-based recurrent neural network (RNN) structure that resolves the vanishing gradient problem. Long-term dependency is a problem that arises during the training of a traditional RNN, where the gradients in backpropagation through time tend to vanish or, in rare situations, explode exponentially.
Figure 2 demonstrates the architecture of LSTM, which is capable of accurately managing long-term dependencies. Every LSTM cell is composed of forget, input, and output gates. Every gate is composed of a sigmoid neural network layer and a pointwise multiplication operation. The current input and the hidden state of the preceding cell are first passed to the forget gate, with $X$ as the input vector at time $t$ and a given number of LSTM cells in the forward pass, to decide whether to discard the data with an output of 0 or store it with an output of 1 (Equation (19)). The main objective of the forget gate is to decide whether or not to forget knowledge. The sigmoid function output $\sigma$ of the product between the weights $W_f$ and the input $[h_{t-1}, x_t]$, which encompasses the input from the preceding state $h_{t-1}$ and the existing input $x_t$, plus the bias $b_f$, is the forget value $f_t$ that lies within [0, 1].
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right) \quad (19)$$
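The next step is to update the cell state ($C$) based on Equation (20):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (20)$$
Let $\tilde{C}_t$ be the output of the $\tanh$ function that uses $W_c$, $[h_{t-1}, x_t]$, and $b_c$, expressed by
$$\tilde{C}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right) \quad (21)$$
In Equation (20), $i_t$ denotes the output of the sigmoid layer of the input gate as follows:
$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right) \quad (22)$$
Then, the sigmoid activation output ($O$) is calculated using Equation (23), based on the bias $b_o$, the existing input $x_t$, the prior state $h_{t-1}$, and the weights $W_o$:
$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right) \quad (23)$$
The next step is to update the hidden state ($h$) using the formula below.
$$h_t = o_t \odot \tanh(C_t) \quad (24)$$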
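The gate computations can be traced with a single forward step of a standard LSTM cell. The sketch below uses plain Python lists so that it runs without dependencies; the function names and the parameter dictionary layout (`W_f`, `b_f`, and so on) are hypothetical, and the equations are the standard LSTM formulation, which is assumed to match the one used in this study.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Compute W . vec + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state C_t."""
    z = h_prev + x_t  # list concatenation implements [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]      # forget gate
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]      # input gate
    c_hat = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], z)]  # candidate state
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]      # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]     # cell update
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                       # hidden update
    return h_t, c_t


# Illustrative cell with hidden size 1 and input size 2; all parameters zero,
# so every gate outputs 0.5 and the candidate state is 0.
zero_params = {name: [[0.0, 0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
zero_params.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([0.3, -0.2], [0.1], [2.0], zero_params)
```

With all-zero parameters the forget gate passes exactly half of the previous cell state, which makes the step easy to check by hand before trained weights are substituted.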
4. Results and Discussion
This section inspects the encryption and classification performance of the AISCC-DE2MS model [27]. A few sample images are shown in Figure 3. The encryption results are investigated in terms of mean square error (MSE) and peak signal-to-noise ratio (PSNR). PSNR is defined as the ratio of the maximum possible power of a signal to the power of the distorting noise that affects the quality of its representation.
Table 2 offers a detailed encryption outcome of the AISCC-DE2MS model with other optimization algorithms.
Figure 4 provides a comparative MSE examination of the AISCC-DE2MS model with other models. The experimental values show that the AISCC-DE2MS model attained the lowest MSE values on all images. For instance, on image-1, the AISCC-DE2MS model obtained a lower MSE of 0.042, whereas the genetic algorithm (GA), crow search algorithm (CSA), and artificial bee colony (ABC) algorithms attained higher MSE values of 0.060, 0.120, and 0.165, respectively. Additionally, on image-2, the AISCC-DE2MS approach achieved a lower MSE of 0.086, whereas the GA, CSA, and ABC algorithms attained higher MSE values of 0.096, 0.153, and 0.202, respectively. Likewise, on image-3, the AISCC-DE2MS approach reached a minimal MSE of 0.063, whereas the GA, CSA, and ABC algorithms attained higher MSE values of 0.078, 0.099, and 0.166, respectively.
A comprehensive PSNR comparison of the AISCC-DE2MS model with recent models on diverse images is given in Figure 5. The results show that the AISCC-DE2MS model attained higher PSNR values on every image. For instance, on image-1, the AISCC-DE2MS model reached a maximum PSNR of 61.898 dB, whereas the GA, CSA, and ABC algorithms exhibited lower PSNR values of 60.349 dB, 57.339 dB, and 55.956 dB, respectively. Moreover, on image-2, the AISCC-DE2MS approach reached a maximal PSNR of 58.786 dB, whereas the GA, CSA, and ABC algorithms showed lower PSNR values of 58.308 dB, 56.2849 dB, and 55.077 dB, respectively. Finally, on image-3, the AISCC-DE2MS approach reached a maximum PSNR of 60.137 dB, whereas the GA, CSA, and ABC algorithms yielded lower PSNR values of 59.210 dB, 58.174 dB, and 55.930 dB, respectively.
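The relation between the two encryption measures can be checked with a short calculation. The sketch below assumes 8-bit images, so the peak signal value is 255 and PSNR = 10 log10(255² / MSE); the function names `mse` and `psnr` are illustrative. Under this assumption, the reported MSE values of 0.042 and 0.086 correspond to about 61.898 dB and 58.786 dB, in line with the PSNR values quoted for image-1 and image-2.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized images given as nested lists."""
    flat_a = [v for row in original for v in row]
    flat_b = [v for row in reconstructed for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


def psnr(mse_value, max_value=255):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit pixels."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse_value)


# A single pixel differing by 2 in a 2x2 image gives an MSE of 1.0.
example_mse = mse([[0, 0], [0, 0]], [[0, 2], [0, 0]])
example_psnr = psnr(0.042)
```

Because PSNR is a monotonically decreasing function of MSE, maximizing PSNR as the optimizer's fitness is equivalent to minimizing the reconstruction error.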
The confusion matrices created by the AISCC-DE2MS model during the classification process are demonstrated in
Figure 6. The figures reported that the AISCC-DE2MS model accurately classified all the samples into different classes under all runs.
Table 3 and Figure 7 exhibit the overall classification performance of the AISCC-DE2MS model under distinct runs. The experimental outcomes highlighted that the AISCC-DE2MS model reached enhanced results under all classes. For instance, on run-1, the AISCC-DE2MS model obtained average values of 95.24%, 83.91%, 74.66%, 78.56%, and 84.66% for the five evaluation measures listed in Table 3, in order. On run-2, the corresponding average values were 93.90%, 82.16%, 66.28%, 72.38%, and 79.38%. On run-3, the corresponding average values were 92.23%, 82.10%, 54.49%, 61.75%, and 71.97%. Finally, on run-4, the corresponding average values were 92.94%, 81.30%, 59.62%, 66.44%, and 75.24%.
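Average measures of this kind are typically derived from a confusion matrix. The sketch below computes overall accuracy together with macro-averaged precision, recall, and F-score; it is an illustration only, since the exact set of measures and the averaging scheme used for Table 3 are not restated here, and the function name `macro_metrics` and the sample matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F-score.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f_scores = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Hypothetical two-class confusion matrix, not data from this study.
example = macro_metrics([[8, 2], [1, 9]])
```

Macro averaging weights every class equally, so a class with few samples and low recall can pull the average recall well below the overall accuracy, a pattern that also appears when classes are imbalanced.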
The training accuracy (TRA) and validation accuracy (VLA) acquired by the AISCC-DE2MS approach on the test dataset are shown in Figure 8. The experimental results show that the AISCC-DE2MS system attained high values of TRA and VLA. Specifically, the VLA was higher than the TRA.
The training loss (TRL) and validation loss (VLL) of the AISCC-DE2MS approach on the test dataset are shown in Figure 9. The experimental results show that the AISCC-DE2MS system reached low values of TRL and VLL. Notably, the VLL is lower than the TRL.
A precision-recall analysis of the AISCC-DE2MS system on the test dataset is presented in Figure 10. The figure shows that the AISCC-DE2MS algorithm attained high precision-recall values under all classes.
A detailed ROC examination of the AISCC-DE2MS algorithm on the test dataset is shown in Figure 11. The outcomes show that the AISCC-DE2MS system is capable of classifying the various classes of the test dataset.
Finally, a detailed comparative study of the AISCC-DE2MS model with other existing models is offered in Table 4 [28]. The experimental values show that the SCNet, SCFCNet, baseNet, and MobileNet models performed worse than the other models. The ERNet and VGG16 models demonstrated improved outcomes.
However, the AISCC-DE2MS model achieved better performance than the other models, with a higher value of 95.24% and a processing time of 11.13 ms. Therefore, the presented AISCC-DE2MS model can be employed as an effective classification model in the drone environment.