Human Pathogenic Monkeypox Disease Recognition Using Q-Learning Approach

While the world is working quietly to repair the damage caused by COVID-19's widespread transmission, the monkeypox virus threatens to become a global pandemic. Several nations report new monkeypox cases daily, even though the virus is less deadly and less contagious than COVID-19. Monkeypox disease may be detected using artificial intelligence techniques. This paper suggests two strategies for improving monkeypox image classification precision. Both build on feature extraction followed by classification and rely on reinforcement learning and parameter optimization for multi-layer neural networks: the Q-learning algorithm estimates the value of taking an action in a particular state, and Malneural networks are binary hybrid algorithms that improve the parameters of neural networks. The algorithms are evaluated using an openly available dataset. Interpretation criteria were used to analyze the proposed optimization feature selection for monkeypox classification, and a series of numerical tests was conducted to evaluate the efficiency, significance, and robustness of the suggested algorithms. Monkeypox disease was detected with 95% precision, 95% recall, and a 96% F1 score, which is a higher accuracy than that of traditional learning methods. The overall macro average was around 0.95, and the overall weighted average was around 0.96. When compared with the benchmark algorithms DDQN, Policy Gradient, and Actor-Critic, the Malneural network had the highest accuracy (around 0.985). In comparison with traditional methods, the proposed methods were found to be more effective. Clinicians can use this proposal to treat monkeypox patients, and administration agencies can use it to observe the origin and current status of the disease.


Introduction
Monkeypox reports for 2022 indicate yet another worldwide virus outbreak following the COVID-19 epidemic that shook the world in 2020 [1]. The virus is closely related to the smallpox and cowpox viruses. The main carriers of the disease are rats and monkeys; however, human-to-human transmission is also common [2]. The virus was originally found in monkeys at a laboratory in Copenhagen, Denmark, in 1958 [3]. Monkeypox was reported in
1. A detailed survey relevant to the classification of monkeypox diseases was carried out. The authors' contributions, limitations, and future scope are discussed;
2. The proposed work is developed to recognize the monkeypox virus with respect to four classes;
3. The performance of the model is measured with evaluation metrics, namely, AUC, CA, F1, precision, and recall. The DQN approach achieves a classification accuracy (C.A.) of 0.975;
4. The proposed work is compared with the benchmark algorithms, namely, DQN, DDQN, Policy Gradient, and Actor-Critic. Compared with other state-of-the-art methods, the proposed DQN outperforms the others with higher accuracy and AUC.
The organization of the paper is as follows: Section 2 explains the literature review and the main contribution of the work; Section 3 describes the proposed work, and the subsection includes the dataset, pre-processing, and reinforcement method; Section 4 describes the results and discussion of the proposed method; and Section 5 explains the conclusion and future work.

Related Works
Deep learning and machine learning have shown themselves to be quite helpful in the diagnosis and treatment of medical conditions. To forecast illnesses, researchers have developed systems using ML and DL. For Alzheimer's disease, there is presently no accurate diagnostic procedure. The authors searched EEG epochs for characteristics that would distinguish Alzheimer's patients from controls with the help of an ML technique called the Support Vector Machine (SVM) [22]. The accuracy of the research was good since it took into account how each patient's diagnosis was made.
Heart disease is one of the top five leading causes of mortality in the world today, and predicting cardiovascular disease is one of the biggest problems in medical detection. Machine learning has been shown to be capable of sifting the data generated by the healthcare sector to find relevant information, and studies have only begun to scratch the surface of its potential for heart disease prediction. The authors of [23] proposed a technique to improve cardiovascular disease identification by identifying key variables using ML techniques. The prediction model examined a variety of feature combinations and well-known classification techniques [24]. A diagnosis of Parkinson's disease (P.D.) is frequently reached following extensive medical evaluation and examination of clinical indications [25]. To complete this assessment, a range of motor symptoms must frequently be characterized. However, conventional diagnostic techniques rely on the subjective assessment of movements that can be challenging to spot [26]. Machine learning algorithms may help identify relevant traits that are underused in the clinical assessment of Parkinson's disease and that may be used to identify P.D. Fatty liver disease (FLD) is prevalent in medical settings and is linked to a higher risk of death. Early diagnosis of FLD patients makes it possible to develop a practical strategy for prevention, early diagnosis, and treatment. The authors proposed a machine learning system to forecast the onset and course of the illness to help with the identification of at-risk people, the diagnosis itself, and FLD prevention and care. For the purpose of predicting FLD, a number of classification models, including logistic regression (L.R.), random forest (R.F.), naive Bayes (N.B.), and an artificial neural network (ANN), were created. The effectiveness of the four models was evaluated using the area under the receiver operating characteristic (ROC) curve.
Four classification algorithms that accurately diagnose fatty liver disease were created and studied by the authors of [27]; among them, the R.F. model performed best. The use of a random forest model in the clinical setting may be advantageous for the early treatment of patients' liver conditions. Chronic kidney disease (CKD), a severe danger to health and well-being, affects an alarmingly rising percentage of people worldwide. Early-stage CKD frequently has no symptoms; hence, its presence is frequently overlooked. Medication that slows CKD works best when it is administered promptly upon diagnosis, and the quick and precise detection provided by ML models can help clinicians achieve this goal. The authors suggested an ML approach for diagnosing CKD [28]. The CKD dataset was obtained from the UCI machine learning repository and contains a large number of missing values [29]. For a variety of reasons, patients may forget or be unable to provide some measurements, so gaps in the data are typical in clinical practice. Once the missing data had been handled, six ML algorithms were utilized to create the models. Compared with the other models, the R.F. model had the greatest diagnostic accuracy (99.75%). After examining the errors of the earlier models, a hybrid model that combined logistic regression and random forest by means of a perceptron was proposed, reaching an accuracy of 99.83%.
The authors suggested a novel ML method to accurately detect coronary artery disease (CAD). Ten well-established machine learning methods were taken into account, and data standardization and pre-processing increased their efficacy [30]. The authors combined stratified ten-fold cross-validation with particle swarm optimization, a population-based metaheuristic, allowing for simultaneous optimization of feature selection and classifier parameters. According to the experimental data, the recommended technique significantly improved the accuracy of the machine learning models employed. Verified occurrences of monkeypox have now been reported in 75 nations outside of Africa, making it a serious global health problem. Because the rash resembles that of measles and chickenpox, it can be difficult to diagnose monkeypox early in the course of the illness. Deep learning systems have been found to be successful in automatically detecting skin lesions when given enough training data. Because monkeypox was so uncommon, there was already a knowledge gap across the globe prior to the current epidemic. Researchers are encouraged by the accomplishments of supervised machine learning in the identification of COVID-19. However, the scarcity of monkeypox skin photographs makes it difficult to use machine learning to identify the disease from images of patients' skin. The authors of [31] provided the largest archive of monkeypox skin images, collected by web scraping; it contains both healthy and diseased skin, with photographs showing the symptoms of measles, cowpox, chickenpox, smallpox, and monkeypox. The Monkeypox Skin Lesion Dataset (MSLD) was assembled by the authors of [32] using images of measles, chickenpox, and monkeypox skin lesions. Most of these images originated from web pages that were open to the public.
The first stage enlarged the sample with additional data and set up a 3-fold cross-validation experiment. The second stage involved categorizing ailments (e.g., monkeypox) using pre-trained deep learning models, including VGG-16, ResNet50, and InceptionV3; ResNet50 achieved the highest overall accuracy. The authors of [33] suggested a DL model for identifying monkeypox illness that relies on image data acquisition and a modified version of VGG16. Since the data repository is created by assembling images from many open-source publications and websites, it is safer to use and distribute it for creating and deploying any machine learning model. The modified VGG16 model was utilized in two distinct studies. According to the results of both studies, this model may successfully identify individuals who have monkeypox. The model's capacity to anticipate and extract such properties enables a greater understanding of the characteristics of the monkeypox virus. Existing systems lack the ability to detect A.D. diseases from prior knowledge, and the model sometimes fails to converge properly. The proposed approach can reduce the convergence problem by tuning the neurons and can be used to find meaningful patterns within the data, eventually helping identify patterns for diseases other than A.D. Our conclusions are supported by our explainable artificial intelligence (XAI) techniques. Table 1 summarizes the data analysis for monkeypox detection with the feature extraction models; here, the DenseNet-169 model obtained an accuracy of around 84.24%, which is higher than that of the remaining approaches. Similarly, its F1 score, around 83.83%, is higher than that of the other models. Table 2 summarizes the data analysis for monkeypox detection in the classification process, where the reinforcement learning models are compared; the results show that the Actor-Critic learning model obtained the highest accuracy, around 89%, compared with the other approaches.

Dataset
This major outbreak of monkeypox infections has raised public health concerns because of its rapid expansion to over 65 nations. Timely diagnosis is essential to halting its rapid progression. However, Polymerase Chain Reaction (PCR) tests and other biochemical assays are not easily accessible in sufficient quantities [4]. Monkeypox detection from skin lesion photos using computer vision techniques may be useful in this situation. However, no such dataset was previously accessible. As a result, the "Monkeypox Skin Lesion Dataset (MSLD)" was made by gathering and processing pictures from various websites, news portals, and publicly available case reports. The "Monkeypox Image Lesion Dataset" was produced with the primary goal of separating monkeypox patients from related non-monkeypox instances. To build a classifier, lesion pictures of "Chickenpox" and "Measles" were therefore included alongside the "Monkeypox" category because of their similarity to the monkeypox rash and initial-state pustules. The dataset contains a total of 228 photos, of which 102 are under the "Monkeypox" label and the remaining 126 are under the "Others" label, which includes the non-monkeypox cases (such as chickenpox and measles) (https://www.kaggle.com/datasets/nafin59/monkeypox-skin-lesion-dataset). Figure 1 illustrates the sample raw data.

Augmented Images
Numerous image augmentation techniques, including rotation, translation, reflection, shear, hue, saturation, contrast and brightness jitter, noise, scaling, etc., were implemented in MATLAB R2020a to help with the classification problem. Although Image Generator and other image augmenters make this simple to perform on the fly, the augmented images are provided in a separate folder to ensure the reproducibility of the results. The number of photos rose around 14-fold after augmentation: there are 1428 and 1764 photos in the "Monkeypox" and "Others" classes, respectively.
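The augmentation step can be illustrated with a minimal sketch. It is not the authors' MATLAB pipeline: the function names `augment` and `augment_dataset`, the jitter ranges, and the restriction to reflection, 90-degree rotation, brightness jitter, and noise are assumptions chosen only to keep the example short. It does show how 14 augmented copies per image reproduce the reported counts (102 × 14 = 1428 and 126 × 14 = 1764).

```python
import numpy as np

def augment(image, rng):
    """Return one randomly augmented copy of an H x W x 3 float image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                      # horizontal reflection
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))          # rotation by a multiple of 90 degrees
    out = out * rng.uniform(0.8, 1.2)              # brightness jitter (assumed range)
    out = out + rng.normal(0.0, 0.02, out.shape)   # additive Gaussian noise (assumed scale)
    return np.clip(out, 0.0, 1.0)

def augment_dataset(images, copies, seed=0):
    """Produce `copies` augmented versions of every image, with a fixed seed for reproducibility."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(copies)]

if __name__ == "__main__":
    toy = [np.full((8, 8, 3), 0.5) for _ in range(3)]
    print(len(augment_dataset(toy, copies=14)))
```

Fixing the seed plays the same role as shipping the augmented folder: the same augmented set is obtained on every run.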

Fold1
Three-fold cross-validation was carried out in order to remove bias from the training process. With patient independence preserved, the original photos were divided about 70:10:20 into training, validation, and test sets. Following the widely accepted method of data preparation, only the training and validation sets were augmented. Users can choose to use the folds directly or to start from the original data and apply other algorithms to it.
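A patient-independent split of this kind can be sketched as follows. The function name `patient_independent_split` and the `(patient_id, image_path)` record layout are hypothetical; the dataset's actual file organization is not described here. The point of the sketch is that patients, not images, are shuffled and divided, so that no patient contributes images to more than one subset and the 70:10:20 ratio holds only approximately at the image level.

```python
import random
from collections import defaultdict

def patient_independent_split(records, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split (patient_id, image_path) records so that each patient falls in exactly one subset."""
    by_patient = defaultdict(list)
    for patient_id, path in records:
        by_patient[patient_id].append(path)

    patients = sorted(by_patient)            # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder, roughly the last 20%
    }
    return {
        name: [(p, path) for p in ids for path in by_patient[p]]
        for name, ids in groups.items()
    }
```

Repeating the call with three different seeds would give three folds in the spirit of the procedure described above, under the same assumptions.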

Reinforcement Learning
The main focus of this paper is the detection of monkeypox disease using the Q-learning approach. Two strategies are suggested for improving monkeypox image classification precision. Both build on feature extraction followed by classification and rely on reinforcement learning and parameter optimization for multi-layer neural networks. The Q-learning algorithm estimates the value of taking an action in a particular state. Malneural networks are binary hybrid algorithms that improve the parameters of neural networks.
Reinforcement learning (RL) is a subfield of machine learning in which the agent learns a fine-tuned policy by trial and error while interacting with the environment. In practice, this kind of approach is utilized in robotics, self-driving cars, etc. A Markov process is described by a conditional probability distribution in which the future state depends only on the current state. When actions and rewards are introduced into the Markov process, it is called a Markov decision process (MDP). In the MDP, the outcome depends not only on the current state but also on the action that leads to the future state. The trajectory distribution is defined over these quantities: W represents the length of the trajectory, $b_t$, $q_t$, and $q_{t+1}$ denote the action, the state, and the next state observed at step t, and T represents the transition probability function.
The aim of RL is to identify the optimal policy. The expected reward to be maximized is calculated by a formula wherein π represents the policy, and the discounted expected reward can be written as Equation (4). The Bellman expectation equation can be written as Equation (5), where q represents the state, π represents the policy, and val(q(t)) represents the state value function. Using the transition probability, the value function can be written as in Equation (7): $\mathrm{val}(q_t) = \mathbb{E}\left[r_t + \gamma\,\mathrm{val}(q_{t+1})\right]$. This equation is known as the Bellman equation. The agent's choice of action depends upon the optimal policy; the corresponding Bellman equation is given in Equation (8), where val* denotes the optimal value function, and the quality function is defined from it.
Figure 2 illustrates the framework of the reinforcement learning approach. Here, the environment plays an important role in extracting the features and performing the surgical data sequence. The agent acts through the policy network, and actions are taken based on the situation; the available actions are to move and to classify. The policy is updated from the environment to the policy network. Figure 3 illustrates the data creation for the various models: the training dataset contains $s_0$ up to $a_n$, and the mini-batch contains $s_t$ to $s_{t+1}$.
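For reference, the quantities named in this section can be written in their standard textbook forms using the paper's symbols (state q, action b, policy π, transition function T, trajectory length W, discount factor γ). These are assumed reconstructions, not the authors' typeset Equations (1) to (9); in particular, the initial-state distribution $p(q_1)$, the objective symbol $J$, and the reward index $r_t$ are notational assumptions.

```latex
\begin{align}
  % trajectory distribution of a rollout of length W under policy \pi
  p_\pi(\tau) &= p(q_1) \prod_{t=1}^{W} \pi(b_t \mid q_t)\, T(q_{t+1} \mid q_t, b_t) \\
  % discounted expected reward and the optimal policy
  J(\pi) &= \mathbb{E}_{\tau \sim p_\pi}\!\left[ \sum_{t=1}^{W} \gamma^{\,t-1} r_t \right],
  \qquad \pi^{*} = \arg\max_{\pi} J(\pi) \\
  % Bellman expectation equation for the state value function
  \mathrm{val}_\pi(q_t) &= \mathbb{E}_\pi\!\left[ r_t + \gamma\, \mathrm{val}_\pi(q_{t+1}) \right] \\
  % Bellman optimality equation
  \mathrm{val}^{*}(q_t) &= \max_{b}\; \mathbb{E}\!\left[ r_t + \gamma\, \mathrm{val}^{*}(q_{t+1}) \mid q_t, b \right] \\
  % quality (action-value) function
  Q^{*}(q_t, b_t) &= \mathbb{E}\!\left[ r_t + \gamma \max_{b'} Q^{*}(q_{t+1}, b') \right]
\end{align}
```

The third line matches the Bellman equation quoted in the text, with the policy made explicit as a subscript.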

System Model DQN
The deep Q-learning network (DQN) takes the high-dimensional image as its input. Treating the task as a regression problem, m represents the regression target for the input pair (q, b). The loss function is written in terms of θ, where θ represents the weight vector, $\theta \in z^{|q||z|}$, and the loss is minimized with respect to θ. Figure 4 illustrates the framework of the DQN model. The genetic samples are drawn from the dataset and are represented as $s_t$ up to $s_{t+1}$. They are connected to the prediction, reward, and policy components. The rewards are measured based on the policy, and the prediction Q estimates the value from the training data. The loss value is measured and backpropagated to reduce the error.
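The regression view of the DQN can be made concrete with a short sketch. It assumes the usual DQN formulation, in which the target is m = r + γ max Q(q', b') computed from a separate target network and the loss is the mean squared error between m and Q(q, b; θ); the paper's own loss equation is not available here, so the function names and the `dones` mask are illustrative assumptions.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Regression targets m = r + gamma * max_b' Q_target(q', b'); no bootstrap on terminal steps."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def dqn_loss(q_values, actions, targets):
    """Mean squared error between Q(q, b; theta) for the actions taken and the targets m."""
    chosen = q_values[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))

if __name__ == "__main__":
    rewards = np.array([1.0, 0.0])
    next_q = np.array([[0.5, 2.0], [1.0, 3.0]])   # target-network outputs for the next states
    dones = np.array([0.0, 1.0])                   # second transition ends the episode
    m = dqn_targets(rewards, next_q, dones, gamma=0.9)
    q = np.array([[2.0, 0.0], [0.0, 1.0]])         # online-network outputs for the current states
    print(m, dqn_loss(q, np.array([0, 1]), m))
```

In a full implementation the loss would be backpropagated through the network that produced `q_values`, as the paragraph above describes.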

DDQN
The limitation of the deep Q-learning network is that the estimate of Q* is inflated owing to the minimum value in Equation (10). The double deep Q-learning network (DDQN) overcomes this overestimation of Q and produces better performance than the deep Q-learning network. Figure 5 illustrates the framework of the DDQN model. The genetic samples are drawn from the dataset and are represented as $s_t$ up to $s_{t+1}$. They are connected to the prediction, reward, and policy components. The rewards are measured based on the policy, and the prediction Q estimates the value from the training data. The loss value is measured and backpropagated to reduce the error. In this deep process, the neural network is designed in a detailed manner to predict the value.
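How the DDQN reduces overestimation can be sketched under the standard double Q-learning formulation, which is assumed here rather than taken from the paper's equations: the online network selects the next action, and the target network evaluates it. The function name and argument layout are illustrative.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Targets r + gamma * Q_target(q', argmax_b' Q_online(q', b'))."""
    best_actions = next_q_online.argmax(axis=1)                 # action selection: online network
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]  # evaluation: target network
    return rewards + gamma * (1.0 - dones) * evaluated

if __name__ == "__main__":
    rewards = np.array([1.0])
    next_q_online = np.array([[0.2, 0.9]])   # online network prefers action 1
    next_q_target = np.array([[5.0, 1.0]])   # target network has an inflated value for action 0
    dones = np.array([0.0])
    print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9))
```

In this toy case a plain maximum over the target network would pick the inflated value 5.0, whereas decoupling selection from evaluation uses 1.0, which is the effect the text attributes to the DDQN.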

Policy Gradient
The Policy Gradient DRL method optimizes an objective function of the policy, and the policy is updated along the gradient of that objective.
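As an illustration of such a gradient, the following sketch computes the REINFORCE estimate for a linear softmax policy. REINFORCE is a swapped-in, well-known instance of the policy gradient family; the paper does not state which estimator it uses, and the linear parameterization `theta` of shape (actions, features) is an assumption made to keep the example self-contained.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Estimate grad J = sum_t G_t * grad log pi(b_t | q_t) for one episode."""
    # Discounted return G_t from each step to the end of the episode.
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running

    grad = np.zeros_like(theta)
    for state, action, g in zip(states, actions, returns):
        probs = softmax(theta @ state)
        one_hot = np.zeros(theta.shape[0])
        one_hot[action] = 1.0
        # For a linear softmax policy, grad log pi(b|q) = (one_hot - probs) outer state.
        grad += g * np.outer(one_hot - probs, state)
    return grad
```

A gradient ascent step would then be `theta += learning_rate * reinforce_gradient(...)`, with the learning rate chosen by the user.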

Actor-Critic
The Actor-Critic method carries out the policy gradient with the help of a value-based function. The concept of Actor-Critic is to divide the model into two parts: (i) one that executes an action depending on the state; and (ii) one that generates the q value. The Actor-Critic network therefore consists of two networks, namely, the actor network and the critic network.
Figure 7 illustrates the framework of the Actor-Critic model. Initially, the policy is predicted and the probability distribution function is obtained; the policy is then trained, and the loss value is measured. Training is repeated until the loss value becomes sufficiently low. The model then moves to the next phase, in which the models are predicted and trained accordingly.
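The division of labor between the two networks can be sketched with a one-step update. This is a simplified variant under stated assumptions: both actor and critic are linear, and the critic estimates a state value whose temporal-difference error serves as the learning signal, whereas the paper describes a critic that generates the q value. The function name and learning rates are illustrative.

```python
import numpy as np

def actor_critic_step(theta, w, state, action, reward, next_state, done,
                      gamma=0.99, lr_actor=0.01, lr_critic=0.1):
    """One-step update of a linear softmax actor (theta) and a linear critic (w)."""
    # Critic: temporal-difference error of the state-value estimate val(q) = w . q
    value = w @ state
    next_value = 0.0 if done else w @ next_state
    td_error = reward + gamma * next_value - value
    w_new = w + lr_critic * td_error * state

    # Actor: policy-gradient step weighted by the critic's TD error
    logits = theta @ state
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    one_hot = np.zeros(theta.shape[0])
    one_hot[action] = 1.0
    theta_new = theta + lr_actor * td_error * np.outer(one_hot - probs, state)
    return theta_new, w_new, td_error
```

Calling this function repeatedly over sampled transitions corresponds to the loop described above, in which training continues until the loss is sufficiently low.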

Figure 8 illustrates the proposed framework. Initially, the "Monkeypox Image Lesion Dataset" was produced with the primary goal of separating monkeypox patients from related non-monkeypox instances. To build the classifier, lesion pictures of "Chickenpox" and "Measles" were included alongside the "Monkeypox" category because of their similarity to the monkeypox rash and initial-state pustules. The dataset contains a total of 228 photos, of which 102 are under the "Monkeypox" label and the remaining 126 are under the "Others" label, which includes the non-monkeypox cases. The dataset is then enlarged by using image augmentation techniques, including rotation, translation, reflection, shear, hue, saturation, contrast and brightness jitter, noise, scaling, etc., implemented in MATLAB R2020a. After augmentation, there were 1428 and 1764 images in the "Monkeypox" and "Others" classes, respectively. Next, three-fold cross-validation was carried out in order to remove any bias from the training process, with the dataset divided about 70:10:20 into training, validation, and test sets. The features are then extracted using the fine-tuned EfficientNet-B3, and the extracted features are passed to the classification phase.
The images are classified by using two different approaches, namely, the reinforcement learning approach and the hybrid approach. In the first approach, the individual methods, namely DQN, DDQN, Policy Gradient, and the Actor-Critic model, are applied over the extracted features. In the second approach (Algorithm 1), the hybrid model called the Malneural network is developed; in this approach, the deep neural Q-learning and Policy Gradient models are tuned.
Algorithm 1
3. image ← Rotate(image, (−5, +5))
18. Else the size of the window is increased
26. End if
27. Load replay memory M to the capacity C
28. Load the action function Q along with arbitrary weight W
29. Load the target value function Q along with weight W⁻ = W
30. For iteration = 1, N do
33. The random action $b_q$ is chosen
35. Execute $b_q$ in the emulator and observe the reward $r_q$ and $y_{q+1}$ of the input
39. If it stops at i + 1
45. End for
46. End for

Experimental Setup
In this section, the experimental analysis is discussed. Initially, the fine-tuned EfficientNet-B3 model was built and executed. A dropout layer with a rate of 0.5 is added to reduce overfitting; in total, one flatten layer, two dense layers, and two dropout layers are added. The fine-tuned model reduces the model generalization problem. The training parameters are as follows: the Adam optimizer with a learning rate of 0.001, binary cross-entropy as the loss function, 200 epochs, and a batch size of 32, as listed in Table 3.
The precision value lies between 0 and 1.

Recall
Recall states the proportion of actual positive cases that are predicted to be positive.

Recall and F1 Score Are given Equal Weighted values
There is a weighted F1 score that allows us to assign different weights to recall and precision. Recall and precision are assigned different weights in different issues, as described in the previous section.
Beta is the number of times recall is more important than precision. If recall is twice as important as precision, the value of Beta is 2. Table 4 Figure 9 represents the learning curve of training and validation accuracy; in the learning curve, the generalization gap does not increase. The training and validation learning curve decreases at a point of stability. Figure 9 represents Training and validation learning curve and Figure 10 represents the learning curve of training loss; in the learning curve, the generalization gap does not increase; the training and validation learning curve decreases at a point of stability. Figure 11 represents the validation loss learning curve; the learning curve keeps on decreasing and attains the stable value. Figure 12 represents the analysis of precision value for deep learning algorithms; here, four different algorithms are taken, namely, VGG-16, ResNet 50, inception v3, and DenseNet 169. These algorithms are kept as a benchmark and compared with the proposed method called fine-tuned EfficientNet B3. VGG 16 obtained a precision value around 92.1%, ResNet 50 obtained a precision value around 89.12%, Inception v3 obtained a precision value around 90.1, and DenseNet 169 obtained a precision value around 92.8%. Here, the proposed method obtained a higher accuracy (around 95.01), which is higher compared with the remaining approach. Table 5 represents the performance evaluation of monkeypox detection. Figure 13 represents the analysis of accuracy value for deep learning algorithms; here, four different algorithms are taken, namely VGG-16, ResNet 50, inception v3, and DenseNet 169. These algorithms are kept as a benchmark and compared with the proposed method called fine-tuned EfficientNet B3. VGG 16 obtained an accuracy value around 90.1%, ResNet 50 obtained an accuracy value around 85.12%, Inception v3 obtained an accuracy value around 91.1, and DenseNet 169 obtained an accuracy value around 92.8%. 
Here, the proposed method obtained a higher accuracy (around 96.01%) than the remaining approaches. Table 6 represents the performance evaluation of monkeypox detection.
Figure 13. Analysis of accuracy value for the deep learning algorithms.
Figure 14 represents the analysis of recall value for deep learning algorithms. Four algorithms, namely VGG-16, ResNet 50, Inception v3, and DenseNet 169, are kept as benchmarks and compared with the proposed method, the fine-tuned EfficientNet B3. VGG-16 obtained a recall value around 85.1%, ResNet 50 around 85.12%, Inception v3 around 84.1%, and DenseNet 169 around 90.8%. The proposed method obtained a higher value (around 96.01%) than the remaining approaches. Table 7 represents the performance evaluation of monkeypox detection.
Figure 15 represents the analysis of F1 score value for deep learning algorithms. Four algorithms, namely VGG-16, ResNet 50, Inception v3, and DenseNet 169, are kept as benchmarks and compared with the proposed method, the fine-tuned EfficientNet B3. VGG-16 obtained an F1 score value around 90.1%, ResNet 50 around 90.7%, Inception v3 around 84.1%, and DenseNet 169 around 92.8%. The proposed method obtained a higher value (around 95.01%) than the remaining approaches. Table 8 represents the performance evaluation of monkeypox detection.
Figure 16 shows that, for the monkeypox class, the precision, recall, and F1 score values are each around 0.95. For the other classes, the precision, recall, and F1 score values are each around 0.96. The overall macro average is around 0.95 and the overall weighted average is 0.96. Figure 17 represents the confusion matrix for the monkeypox disease detection; the values are generated based on the true positives, true negatives, false positives, and false negatives.
Here, most of the classes are recognized correctly.
Figure 18 represents the analysis of accuracy value for reinforcement learning algorithms. Four algorithms, namely DQN, DDQN, Policy Gradient, and Actor-Critic, are kept as benchmarks and compared with the proposed method, called Malneural. DQN obtained an accuracy value around 96.5%, DDQN around 89.7%, Policy Gradient around 78.7%, and Actor-Critic around 80.7%. The proposed method obtained a higher accuracy (around 97.7%) than the remaining approaches. Table 9 represents the accuracy calculation for the monkeypox disease detection results using the reinforcement learning approach.
Figure 19 represents the analysis of F1 score value for reinforcement learning algorithms, using the same four benchmarks (DQN, DDQN, Policy Gradient, and Actor-Critic) and the proposed Malneural method. DQN obtained an F1 score value around 97.4%, DDQN around 91.2%, Policy Gradient around 79.0%, and Actor-Critic around 81.1%. The proposed method obtained a higher value (around 98.1%) than the remaining approaches. Table 10 represents the corresponding monkeypox disease detection results using the reinforcement learning approach.
Figure 20 represents the analysis of precision value for reinforcement learning algorithms. Four algorithms, namely DQN, DDQN, Policy Gradient, and Actor-Critic, are kept as benchmarks and compared with the proposed method, called Malneural. DQN obtained a precision value around 94.3%, DDQN around 89.4%, Policy Gradient around 89.4%, and Actor-Critic around 92.0%. The proposed method obtained a higher value (around 96.1%) than the remaining approaches. Table 11 represents the corresponding monkeypox disease detection results using the reinforcement learning approach.
Figure 21 represents a further analysis for the reinforcement learning algorithms, using the same four benchmarks (DQN, DDQN, Policy Gradient, and Actor-Critic) and the proposed Malneural method. DQN obtained a value around 97.4%, DDQN around 93.0%, Policy Gradient around 70.6%, and Actor-Critic around 72.5%.
Here, the proposed method obtained a higher value (around 98.1%) than the remaining approaches. Table 12 represents the corresponding monkeypox disease detection results using the reinforcement learning approach.
Figure 20. Analysis of reinforcement learning algorithm for the classification of monkeypox.
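Per-class precision, recall, and F1 score, together with the macro and weighted averages reported above, can all be derived from a confusion matrix. The sketch below shows one conventional way to do this; the function names and the 2x2 counts are hypothetical and are not the values behind Figure 17.

```python
def per_class_metrics(cm):
    """Compute precision, recall, F1, and support for every class.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": sum(cm[k])})
    return metrics


def macro_average(metrics, key):
    """Unweighted mean over classes: every class counts equally."""
    return sum(m[key] for m in metrics) / len(metrics)


def weighted_average(metrics, key):
    """Mean over classes weighted by the number of true samples per class."""
    total = sum(m["support"] for m in metrics)
    return sum(m[key] * m["support"] for m in metrics) / total


if __name__ == "__main__":
    # Hypothetical counts: row 0 = monkeypox, row 1 = other classes.
    cm = [[40, 10],
          [5, 145]]
    metrics = per_class_metrics(cm)
    for key in ("precision", "recall", "f1"):
        print(key, macro_average(metrics, key), weighted_average(metrics, key))
```

When the classes are imbalanced, the weighted average leans toward the larger class, which is one reason a macro average and a weighted average can differ.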

Conclusions and Future Scope
In this work, monkeypox disease was classified from images. Initially, a fine-tuned EfficientNet B3 model was built and executed. The fine-tuned layers include a dropout layer with a rate of 0.5, which is added to reduce overfitting. In total, one flatten layer, two dense layers, and two dropout layers are added. The fine-tuned model reduces the model generalization problem. The training parameters are as follows: the optimizer is Adam, the learning rate is set to 0.001, the loss function is binary cross-entropy, the number of epochs is set to 200, and the batch size is 32. The model was compared with the reinforcement learning approaches, namely DQN, DDQN, Policy Gradient, and Actor-Critic. The resultant analysis demonstrates that DQN obtained the highest accuracy (around 0.975). For the monkeypox class, the precision, recall, and F1 score values were each around 0.95. For the other classes, the precision, recall, and F1 score values were each around 0.96. The overall macro average was around 0.95 and the overall weighted average was 0.96. It is envisaged that transfer learning models will be developed on this dataset in the future and will perform better than the present CNN models. We also plan to train the models described in this research on bigger datasets. It is also anticipated that generative adversarial network (GAN)-based CNN models will be developed and evaluated against the current models. Future work will incorporate this model in clinics and hospitals.
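To illustrate why a dropout layer with a rate of 0.5 can reduce overfitting, the following is a minimal sketch of inverted dropout, the scaling scheme used by common deep learning frameworks. It is a plain-Python illustration under stated assumptions, not the authors' implementation; the function name `inverted_dropout` and its parameters are hypothetical.

```python
import random


def inverted_dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero a fraction `rate` of the activations during training.

    Surviving activations are scaled by 1 / (1 - rate) so that the expected
    value of each unit is unchanged; at inference time the input is returned
    as-is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    # Hypothetical activations; a fixed seed only makes the run repeatable.
    hidden = [0.2, 1.5, 0.7, 0.9]
    print(inverted_dropout(hidden, rate=0.5, rng=random.Random(0)))
    print(inverted_dropout(hidden, rate=0.5, training=False))
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is the usual explanation for the regularizing effect.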

Data Availability Statement: Data will be available on request from the first author.