An Optimization-Based Diabetes Prediction Model Using CNN and Bi-Directional LSTM in Real-Time Environment

Featured Application: Diabetes is a common chronic disorder defined by excessive glucose levels in the blood. A good diagnosis of diabetes may make a person's life better; otherwise, it can cause kidney failure, major heart damage, and damage to the blood vessels and nerves. As a result, diabetes classification and diagnosis are vital tasks. By using our proposed methodology, clinicians may obtain complete information about their patients using real-time monitoring. To gain new insights, they can combine historical information with current data, making it easier for them to perform more thorough and comprehensive treatments than before, and they will be able to provide proactive care, which will help to improve health outcomes and reduce hospital re-admissions.
Abstract: Diabetes is a long-term illness caused by the inefficient use of insulin generated by the pancreas. If diabetes is detected at an early stage, patients can live their lives healthier. Unlike previously used analytical approaches, deep learning does not need feature extraction. In order to support this viewpoint, we developed a real-time monitoring hybrid deep learning-based model to detect and predict Type 2 diabetes mellitus using the publicly available PIMA Indian diabetes database. This study contributes in four ways. First, we perform a comparative study of different deep learning models. Based on experimental findings, we next suggested merging two models, CNN-Bi-LSTM, to detect (and predict) Type 2 diabetes. These findings demonstrate that CNN-Bi-LSTM surpasses the other deep learning methods in terms of accuracy (98%), sensitivity (97%), and specificity (98%), and it is 1.1% better compared to other existing state-of-the-art algorithms. Hence, our proposed model helps clinicians obtain complete information about their patients using real-time monitoring and check real-time statistics about their vitals.


Introduction
Diabetes is a prevalent chronic illness characterized by the presence of high glucose in the blood. A proper diagnosis of diabetes can make a person's life healthier; otherwise, it may cause kidney failure, serious damage to the heart, and may also affect the blood vessels and nerves [1]. There are three types of diabetes: Type 1, Type 2, and Gestational [2]. Type 2 diabetes arises when the body does not effectively use the insulin that the pancreas produces. People below 30 usually suffer from Type 1 diabetes, which cannot be treated with oral medicines and requires additional insulin therapy. People of middle age and older who have Type 2 diabetes, however, may recover by living a healthy lifestyle and receiving proper checkups. Gestational diabetes is a common kind of diabetes that affects women during pregnancy. High blood glucose levels may be caused by several hormones and increased insulin content during pregnancy [3].
Several diagnostic tests are used to diagnose diabetes, such as A1c, random blood sugar, fasting blood sugar, and oral glucose tolerance tests, and they need many parameters to predict diabetes properly. Diabetes cannot be diagnosed from a single parameter; for example, excessive consumption of vitamin E can affect A1c levels, while B9 and B12 can lower A1c levels. As a result, several criteria must be combined to diagnose diabetes accurately. The factors that help to diagnose diabetes include glucose level, BMI, diabetes pedigree, blood pressure, age, pregnancy, skin thickness, and insulin, as referenced in Table 1.
Table 1 (excerpt). Glucose: plasma glucose 2 h in an oral glucose tolerance test (mg/dl), range 0-199.

Diabetes Prediction in Real-Time Environment
Intensive care of blood glucose levels helps in preventing and treating diabetic problems [4]. Innovative biosensors that may enable real-time monitoring of a patient's health, as well as recent advancements in information and communication technology (ICT), provide a new viewpoint on diabetes treatment. Diabetic patients can monitor their blood glucose levels by using self-monitoring blood glucose (SMBG) portable devices [5] or continuous glucose monitoring (CGM) sensors [6] to track glucose variations, because of which they will be able to respond quickly and with the necessary actions. The findings suggest that monitoring patients' glucose levels can help them control their disease and enhance their diabetes management performance [7]. The greatest option for improving diabetes care is glucose monitoring in a real-time system that includes sensors, a gateway (smartphone), and a cloud system [8]. It uses a smartphone as a gateway to acquire sensor data from a sensor node connected to the body [9]. Wireless technology is required for communication between the sensor node and the smartphone, as well as low-power operation for the sensor node, and the best choice for this is Bluetooth low energy (BLE) [10].

Motivation
There are 415 million people worldwide suffering from type 2 diabetes, and the major cause is an unhealthy lifestyle. According to WHO, 82% of deaths are due to noncommunicable disorders, and diabetes is one of them [1]. According to Vhaduri et al. [11], continuous glucose monitoring using personal health devices in diabetes care may benefit from the early detection of the disease. Medical recommendations advocate for early detection to identify risk-prone individuals and encourage patients to proactively self-monitor their lifestyle to reduce risk factors. Remote patient monitoring (RPM) may help decrease the alarming number of diabetes-related deaths by providing early detection and timely alerts to patients and medical practitioners. RPM lowers the need for routine examinations, allows treatment efficacy to be measured continuously, and allows for intervention strategies [12]. Through real-time monitoring, clinicians may obtain complete information about their patients. To obtain new insights, they can combine historical information with current data, making it simpler for them to perform a more thorough and comprehensive treatment than before, and they are able to provide proactive care, which assists in improving health outcomes and minimizing hospital re-admissions. Additionally, patients themselves can monitor their vitals in real-time, such as blood pressure and heart rate. This not only motivates patients to regularize their habits, which contributes a lot to improving their health conditions, but also helps clinicians to receive real-time statistics about their vitals.

Major Contributions
In this research, we have made a fourfold contribution:

• We made a comparison of several deep learning algorithms such as CNN, bi-directional long short-term memory (Bi-LSTM), deep neural network (DNN) [13], and their combinations, CNN-LSTM and CNN-Bi-LSTM, for the detection and prediction of diabetes using the static PIMA Indian dataset (PIDD) [14];
• We used the best parameters to train our models. We ran a grid search algorithm that found the best values for parameters such as learning rate, epochs, optimizer, batch sizes, and hidden units;
• We split our dataset into test and training sets by using 10-fold cross-validation, and the precision of each model was improved. Among the models, CNN-Bi-LSTM performed best, with 98% accuracy, 97% sensitivity, and 98% specificity;
• We proposed a framework to test our optimized models using a real-time dataset.
The remainder of the paper is structured in the following manner: The second section discusses related work; Section 3 describes the methodology, which defines the PIMA Indian dataset and preprocessing steps to filter data, the models used to diagnose patients with diabetes, and the proposed framework to diagnose diabetes in the real-time environment; the results and discussions are addressed in Section 4; and, finally, Section 5 draws conclusions and discusses future scope.

Related Work
In India, diabetes is an inescapable problem as over 70% of the adult population suffers from diabetes. Different scientists attempted to detect and predict diabetes by applying various ML and data mining methodologies, and some applied deep learning and fuzzy logic [15]. Data mining approaches have supplanted old procedures because they are more accurate and precise in their predictions. Furthermore, machine learning is an artificial intelligence system that learns correlations between nodes without the need for previous training [16]. The significant capacity of machine learning approaches to drive the prediction model without extensive training is connected to the mechanism that underlies these techniques. Methods such as data mining and machine learning assist in detecting information that is otherwise difficult to identify when utilizing a cutting-edge technique [17]. Some of the past related research work focuses on the detection and prediction of diabetes using PIDD [14,18-22]. Numerous ML methods, such as decision trees, RF, SVM, and Naive Bayes, have recently shown promising outcomes in various types of medical research. Zolfagri et al. [23] suggested a way to identify diabetes in female Pima Indian populations by using a neural network and support vector machine (SVM) models. Sanakal et al. [24] diagnosed diabetes using the prognosis of fuzzy c-means clustering and SVM. Sneha et al. [1] used PIDD, chose the best attributes that are essential for classification, such as plasma, glucose, postprandial, pregnancy, serum creatinine, and HbA1c attributes, excluded the remainder from the dataset, and applied different ML algorithms such as SVM [25], Naive Bayes, and RF. Among these, SVM performed best, with an accuracy of 77.37%. Karatsiolis et al. [26] suggested a region-based SVM method for the diagnosis of diabetes on the Pima Indian medical dataset. Similar to Karatsiolis, Kumari et al. [15] suggested SVM for the detection of Type 2 diabetes. Al et al. [27] applied a decision tree approach to identify type 2 diabetes.
The researchers Dey et al. [18] and Zou et al. [28] developed a web-based strategy to forecast diabetes using machine learning techniques on the PIMA Indian dataset. On the other hand, no single study has examined all the well-known supervised learning methods in a comprehensive manner. Sivastava et al. [20] used an ANN approach to predict diabetes using the PIMA Indian dataset. Saji et al. [29] developed a multilayer perceptron that was used to predict diabetes. Using an autotuned multilayer perceptron, Jahangir et al. [30] suggested an expert system to predict diabetes. Kannadasan et al. [31] proposed a DNN for the classification of Type 2 diabetes using stacked encoders for feature engineering, a softmax function for classification, and a backpropagation method for fine-tuning the network. Training of the model was performed with PIDD along with 786 patient records and eight features and achieved an accuracy of 86.26%. Apporva et al. [32] used the decision tree technique to predict type-2 diabetes using the PID dataset [33]. A comparison of performance using an SVM classifier showed that the decision tree successfully predicted type-2 diabetes. In summary, the above-discussed techniques have some pros and cons, which are as follows: For example, ML algorithms such as RF, decision trees, and SVM are helpful if we use them for classification problems, except for regression, where they may not be suitable for predicting training data beyond the range. Similarly, within the decision tree, if there is a little change in data, it may affect the entire structure of the model [31]. Furthermore, SVM faces minor issues with noisy data [34]. Therefore, these ML algorithms are suitable for classification problems. However, ANN and CNN are good at making predictions because, in backpropagation, these methods obtain good results when they use gradients to update the weights. 
However, they have some problems, such as vanishing gradient problems or exploding gradient problems, where the value of gradients (a value used to update the weights) decreases with backpropagation, so the value becomes too small and does not help much with learning. However, it is possible to overcome these limitations by applying an LSTM and GRU by using ReLU, which allows capturing the impact of the earliest given data. Moreover, by tuning the weight values during the training process, the vanishing gradient issue is usually avoided [35,36]. Our study used CNN-Bi-LSTM, where CNN is employed for feature extraction, and Bi-LSTM has a cell state memory during its training phase, which captures the impact of earlier stages. In addition, it has a peephole connection, which helps remove the vanishing gradient problem. Furthermore, Bi-LSTM can collect information in two directions, one from the past and one from the future, which helps predict diabetes more efficiently.

Methodology
In this part, we proposed a framework by describing several components of the framework, such as the dataset that was uploaded to cloud servers; preprocessing procedures that are required for data cleaning; description of models used for detection and prediction of Type 2 diabetes where we explained the implementation of three models using a static PIMA dataset such as CNN, CNN-LSTM, and CNN-Bi-LSTM. Here, CNN-Bi-LSTM is further explained through several phases, such as model training using static PIDD. Then the model is optimized using a grid search algorithm. Furthermore, training results were utilized for real-time testing, and lastly, the prediction process of CNN-Bi-LSTM was discussed [36].

Availability of Real-Time Dataset for Training and Testing
This section provides a comprehensive description of the real-time PIMA Indian dataset [14] (UCI: https://archive.ics.uci.edu/ml/support/Diabetes (accessed on 15 January 2020)), which consists of 768 female patients [37] who were between the ages of 21 and 25 years. There are 268 diabetics among them, and the rest are healthy. The dataset consists of 8 vital parameters, and a complete overview of the dataset is given in Table 1, which lists the input parameters and their ranges, such as the number of times a woman was pregnant, as expressed in Figure 1. In Table 1, the range specifies a threshold value for each parameter, e.g., BMI, which is computed as (weight in kg)/(height in m)^2. We now discuss the reasons for selecting these parameters for our model. There is a high possibility that glucose levels may increase during pregnancy, which can lead to diabetes-related complications [38]. One of the main reasons for diabetes mellitus is the presence of high glucose in the blood. Obesity is another cause of Type 2 diabetes. Diabetes is a genetic disease, so diabetes pedigree functions are critical in providing data. Additionally, imbalances in insulin may also cause diabetes mellitus in people who may consume insulin and suffer from skin-thickening problems. Lastly, as we age, the probability of diabetes increases, specifically after the age of 45. Therefore, it is evident that all these biological parameters play a significant role in measuring and correctly classifying diabetes mellitus.

Preprocessing of Real-Time Data
To a considerable degree, the accuracy of the data determines the results of the prediction. This indicates that data preprocessing plays a vital role in the model [39]. In this analysis, we picked some of the necessary methods to refine the initial dataset. First, the dataset contains some incomplete and inaccurate values due to mistakes or deregulation. These values contributed to several deceptive experimental results. For example, the diastolic blood pressure, systolic blood pressure, and body mass index cannot be 0, so a value of 0 in the initial dataset implies that the true value was absent. We used the mean from the training data to replace all missing values and reduce irrelevant values, as shown in Table 2.
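The imputation step described above can be illustrated with a minimal sketch. It assumes that a 0 marks a missing value and that the column mean is computed over the non-zero training entries only; the text does not state whether zeros were excluded from the mean. The helper name `impute_zeros` and the toy rows are hypothetical.

```python
from statistics import mean

def impute_zeros(train_rows, rows, columns):
    """Replace 0 placeholders in `rows` with the training mean of that column.

    Assumption: the mean is taken over non-zero training values only.
    """
    fill = {}
    for c in columns:
        observed = [r[c] for r in train_rows if r[c] != 0]
        fill[c] = mean(observed)
    return [
        [fill[c] if (c in fill and v == 0) else v for c, v in enumerate(r)]
        for r in rows
    ]

# Hypothetical toy rows: [glucose, blood_pressure, bmi]
train = [[148, 72, 33.6], [85, 0, 26.6], [183, 64, 0.0], [89, 66, 28.1]]
test = [[137, 0, 43.1]]

# The fill values always come from the training rows, also for the test rows.
clean_train = impute_zeros(train, train, columns=[1, 2])
clean_test = impute_zeros(train, test, columns=[1, 2])
```

Computing the fill values from the training rows only, and reusing them for the test rows, keeps information from the test set out of the preprocessing.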
Second, elimination of outliers: any attribute value that does not conform to the usual boundary is referred to as an outlier, as seen in Figure 2, and can be removed by using Equation (3):
$$\mathrm{IQR} = Q_3 - Q_1 \qquad (3)$$
where $Q_1$ and $Q_3$, given by Equations (1) and (2), are the first and third quartiles [38], respectively. IQR stands for interquartile range, and its values are shown in Table 3. All values that lie beyond this threshold (IQR) are termed outliers.
Filtered values are seen in Figure 3. The next step is to normalize the data, i.e., to bring the variables onto a common scale by applying normalization filters and calculating z-score values with Equation (4) [40]:
$$b_i = \frac{a_i - \bar{a}}{s} \qquad (4)$$
where $\bar{a}$ is the mean of the variable, $a_i$ is the input value, $s$ is the standard deviation of the variable, and $b_i$ is the new normalized value. Table 4 shows the mean, standard deviation, minimum, and maximum values of the PIMA dataset. This reduces the uncertainty of estimation and accelerates the process.
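As a hedged illustration of the two steps above, the sketch below filters outliers with an IQR rule and then applies the z-score of Equation (4). The fence multiplier `k=1.5`, the "inclusive" quartile method, and the use of the population standard deviation are assumptions, since the text does not specify them; the insulin values are hypothetical.

```python
from statistics import mean, pstdev, quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences built from IQR = Q3 - Q1.

    Assumption: the conventional k = 1.5 multiplier.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo <= v <= hi]

def z_scores(values):
    """Equation (4): b_i = (a_i - mean) / s, with s the population std."""
    a_bar = mean(values)
    s = pstdev(values)
    return [(a - a_bar) / s for a in values]

# Hypothetical insulin readings with one extreme value.
insulin = [80, 85, 90, 95, 100, 105, 110, 846]
filtered = remove_outliers(insulin)
b = z_scores(filtered)
```

After standardization the values have zero mean and unit standard deviation; they are not confined to the interval from 0 to 1, which would require min-max scaling instead.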

Feature Selection
Feature selection is the process of removing non-informative or redundant input characteristics from the dataset. Feature selection decreases the computational complexity of prediction algorithms. This minimizes prediction uncertainty and improves the model's overall efficacy.
The Chi-squared test is a non-parametric statistical technique used to examine the relationship between two variables [41]. The approach generates a number that quantifies the relationship between the input characteristics and the projected result. The greater the value, the stronger the connection between the input and output characteristics, and features with values less than the critical value are removed. As the Chi-squared approach operates on categorical data, the numerical values of the features in this dataset were discretized depending on their frequency of occurrence.
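A minimal sketch of the chi-squared scoring described above follows. It assumes equal-frequency binning as the discretization step (the text only says the features were discretized by frequency of occurrence) and uses two bins for brevity. The helper names and the toy values are hypothetical.

```python
from collections import Counter

def equal_frequency_bins(values, n_bins=2):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def chi_squared(categories, labels):
    """Pearson chi-squared statistic of the category-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(categories, labels))
    cat_totals = Counter(categories)
    label_totals = Counter(labels)
    score = 0.0
    for c, c_total in cat_totals.items():
        for y, y_total in label_totals.items():
            expected = c_total * y_total / n
            score += (observed[(c, y)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: glucose separates the classes, age does not.
glucose = [85, 89, 92, 99, 137, 148, 168, 183]
age = [50, 21, 45, 23, 48, 22, 47, 24]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

scores = {
    "glucose": chi_squared(equal_frequency_bins(glucose), outcome),
    "age": chi_squared(equal_frequency_bins(age), outcome),
}
# With 2 bins and 2 classes there is 1 degree of freedom, so the critical
# value at the 0.05 level is 3.841; features scoring below it are dropped.
selected = [name for name, s in scores.items() if s >= 3.841]
```

In this toy example only the glucose feature exceeds the critical value, so it is the only one retained.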
Extra trees apply several randomized decision trees to different subsets of the total dataset [42]. In the tree building process, the input variables and cut-off values are chosen at random to divide a node so that they are fully independent of the output variable. Each tree leads to a different model, which was trained with subsets of data, and the algorithm evaluates the relevance of the contributing features using a criterion known as the Gini index.
LASSO is an L1 regularization approach for feature selection that is used to facilitate dataset interpretation [43]. In this technique, regression analysis is used to estimate parameters and select models at the same time, minimizing feature variability by shrinking the coefficients of uncorrelated features to zero. Table 5 shows the essential features chosen by each technique, along with their ranking measures. The Chi-squared test employs the chi-score, extra trees use the Gini index, and LASSO employs regression coefficients. The features glucose, insulin, BMI, and age were consistently scored as highly relevant and were chosen by each of the techniques used. After experimenting with the features, it was discovered that removing the skin fold thickness and diabetes pedigree features enhanced the model's overall performance.

Data Augmentation
The synthetic minority oversampling approach (SMOTE) was utilized to eliminate biases in the produced models [44]. SMOTE is an oversampling approach that generates new samples from existing class samples to increase the number of minority class samples in the dataset. The method creates new minority class samples that are convex mixtures of two or more randomly selected neighboring data samples in the feature space rather than duplicates. A recent study showed that using SMOTE in clinical datasets improves model performance by decreasing the detrimental impact of unbalanced data.
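The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration in which each synthetic sample lies on the segment between a minority sample and one of its `k` nearest minority neighbours; it is not the implementation used in the study. The function name `smote`, the parameters `k` and `seed`, and the toy samples are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples,
        # ranked by squared Euclidean distance in feature space.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples: [glucose, bmi]
minority = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1], [168.0, 38.0]]
new_samples = smote(minority, n_new=4)
```

Because each new point is a convex combination of two existing minority samples, it stays inside the region already covered by the minority class instead of duplicating a record.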

Diabetes Prediction Models
This study aimed to develop a model for forecasting diabetes using CNN-Bi-LSTM, which has not been used for diabetes classification and prediction. Recently, different deep learning approaches, such as LSTM, CNN, and their derivatives, have been used for the classification of diabetes. Although these methods achieve good accuracy in their predictions, they still face certain challenges, such as vanishing gradient problems and exploding gradient problems, that adversely affect the model's training. These drawbacks can be resolved by applying a combination of CNN and Bi-LSTM, which adjusts the weight value during the training phase to gather data results. This part clarifies the detailed architecture of CNN, CNN-LSTM, and CNN-Bi-LSTM over the PIDD [15] and then attempts to assess how well these models perform in terms of precision, sensitivity, and specificity.

Convolutional Neural Network
Here, in this section, we explained the role of the CNN by explaining the functionality of the different layers for the prediction of Type 2 diabetes mellitus. Initially, CNN was used mainly for image classification, but today, CNN can be applied in various domains.

Definition 1.
A CNN is a special kind of multi-layer perceptron, similar to a traditional neural network, in which specific inputs are supplied to each neuron. Its neurons learn from data with the help of weights and biases through operations such as the dot product [23]. A CNN is made of the following layers: a convolutional layer, a max-pooling layer, a flattening layer, and a fully connected layer. The goal of the convolutional layer is to learn the feature representation of the input data. It is the heart of the network and has local connections and weights shared across features. In the first stage, input parameters are passed through the kernel, and the outputs are then sent through the nonlinear activation function ReLU, which does not activate all the neurons at the same time: it outputs zero for negative inputs and passes positive values unchanged. The output neurons are then passed through the pooling layer, which may be thought of as a fuzzy filter since it decreases the dimensionality of the features while increasing their robustness. Finally, the fully connected layer receives signals from the preceding layers and delivers them to each neuron in the system. The output layer, which is generally a softmax classifier, then performs the classification. As shown in Figure 4, in our case the PIDD consists of six features as input and one target output, which takes the values 0 and 1. Each input parameter belongs to the feature set, and the outcome variable is the class label, where 1 specifies a diabetic patient and 0 a healthy one. In our proposed model, these input features were passed through a 1D convolutional layer, where we applied batch normalization (BN) along with ReLU. Here, BN normalizes the input features in batches, which minimizes gradient saturation during covariate shift [45], and the ReLU activation function decreases redundancy by setting negative values to zero, which speeds up computation. The complete process is expressed mathematically in Equation (5) [46]:
$$y_p = f(W_p \ast x + b_p) \qquad (5)$$
where $y_p$ is the output feature map of filter $p$, $W_p$ is the weight, $x$ is the batch-normalized input features, $f(\cdot)$ is the ReLU activation function, $p$ runs over the filters, and $b_p$ is the bias term, whose value ranges from 0 to 1.
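To make the computation in Equation (5) concrete, the following sketch applies batch normalization to a small batch and then evaluates a kernel-size-1 convolution followed by ReLU for each filter. It is an illustration under stated assumptions, not the trained model: the learnable scale and shift of batch normalization are omitted, the kernel size of 1 follows the framework description later in the paper, and the weights, biases, and input rows are hypothetical.

```python
from statistics import mean, pstdev

def batch_norm(batch, eps=1e-5):
    """Normalize each feature position across the batch.

    Assumption: no learnable scale/shift parameters.
    """
    columns = list(zip(*batch))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [
        [(v - m) / (s * s + eps) ** 0.5 for v, m, s in zip(row, mus, sigmas)]
        for row in batch
    ]

def relu(z):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return max(0.0, z)

def conv1d_k1(x, weights, biases):
    """Kernel-size-1 convolution: filter p maps position t to f(W_p * x[t] + b_p)."""
    return [[relu(w * v + b) for v in x] for w, b in zip(weights, biases)]

# Hypothetical batch of three samples with three features each.
batch = [[148.0, 72.0, 33.6], [85.0, 66.0, 26.6], [183.0, 64.0, 23.3]]
normalized = batch_norm(batch)
x = normalized[0]
# Two hypothetical filters (p = 0, 1), one weight and one bias each.
feature_maps = conv1d_k1(x, weights=[1.0, -1.0], biases=[0.0, 0.5])
```

Each filter produces one feature map of the same length as the input, and every entry is non-negative because of the ReLU.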

Architectural CNN-LSTM Model for Diabetes Prediction
In this segment, we explain the working of the CNN-LSTM hybrid model by describing the functionality of CNN and LSTM for the prediction of Type 2 diabetes mellitus. CNN and LSTM are deep learning models used for predictions. Here we use the CNN-LSTM [47] combination to classify Type 2 diabetes mellitus over the PIDD. CNN is used for feature engineering, as it automatically selects the unseen features for model training, and LSTM is used for diabetes classification. The complete structure of CNN-LSTM is shown in Figure 5. First, input features are passed through a convolutional layer responsible for generating a feature map by striding filters one step at a time. Next, we introduced non-linearity to the features by using the ReLU function, which does not activate all neurons simultaneously: it deactivates neurons whose values are less than zero and passes positive values unchanged. These feature maps are then passed to batch normalization, which regularizes them and helps prevent overfitting. Next, the feature maps were passed to the max-pooling layer, which downsamples them. The down-sampled features were then passed through the flattening layer, which is responsible for translating these feature matrices into 1D vectors, and were passed through the LSTM layers as inputs. LSTM is a particular type of RNN that uses cell state memory instead of primary neurons to manage the sequence classification. Eventually, these values were transferred into a classification layer that functions similarly to how the ANN works. Finally, the output went into the sigmoid activation, which is responsible for the binary classification and predicts diabetes.

CNN-Bi-LSTM: A Real-Time Framework
In this study, we used the CNN-LSTM architecture in a bidirectional configuration designed for diabetes prediction in a real-time framework. First, we trained the model on the static dataset with parameters optimized by the grid search hyperparameter optimization technique. Then, we cross-validated the model with a real-time scenario, and lastly, we performed a prediction process over the training dataset to predict diabetes. The complete working of the CNN-Bi-LSTM [48] architecture is discussed in three parts: training of the model using the PIDD dataset, optimizing the model with hyperparameter optimization, and prediction of diabetes through our proposed optimized CNN-Bi-LSTM model. The architectural representation is shown in Figure 6.
• Input data: The necessary data for CNN-Bi-LSTM training must be entered;
• Preprocessing of input data: The z-score standardization approach was used to normalize the input data, since there was a substantial gap between the values of the input data, as indicated in Equation (4);
• CNN Layer Calculation: Eight input features with an input shape of 6 × 1 were passed through a convolutional layer, which generates a feature map by sliding filters of kernel size 1 with a stride of one. We introduced non-linearity with the ReLU function, which outputs max(0, x), i.e., it passes positive values unchanged and deactivates neurons whose values are less than zero. The feature maps are then passed to a batch normalization layer, which normalizes them and helps to prevent over-fitting. They are then passed to the max-pooling layer, with pool size 1, which downsamples the feature map. The down-sampled features are then passed through the flattening layer, which converts the feature matrices into 1D vectors.

Remark 2. (Stride:)
The stride is the number of positions by which a filter moves across the input matrix at each step. If the stride is set to 1, the filters shift one pixel at a time, whereas if the stride is set to 2, the filters move two pixels at a time. In this section, we use filters with a kernel size of 1 and a stride of 1.
• Bi-LSTM Layer Calculation: The output data of the CNN layer are processed by the Bi-LSTM [49,50]. Bi-LSTM is a bidirectional RNN with 32 hidden LSTM cells in each direction, for a total of 64 LSTM cells. The cells have an additional peephole connection that prevents vanishing gradient problems and an additional cell state memory that uses past and future knowledge to forecast the output by using two separate hidden layers: the forward state sequence $\overrightarrow{h}_t$, shown in Equation (6) [43]; the backward state sequence $\overleftarrow{h}_t$, shown in Equation (7) [43]; and the output vector, given by Equation (8),
where
$h_t$: hidden state at timestamp $t$
$W_{ph}$: weight matrix between the input and hidden vectors
$p_t$: input vector at timestamp $t$
$W_{hh}$: weight matrix between two hidden states
$h_{t+1}$: hidden state vector at timestamp $t + 1$
$b_h$: bias vector for the hidden state vectors
• Dropout: A dropout of 15% was applied, which reduces over-fitting in neural networks by dropping randomly chosen nodes during the network training process;
• Output: The output values were passed to a sigmoid function used for binary classification, which determines whether or not the input instance is diabetic [51];
• Calculation Error: The cost function assesses how well the neural network is trained by measuring the difference between the network output for a given sample and the expected output. The optimizer function was used to decrease the cost function. The cross-entropy function, which has several variants, is commonly used in deep learning. Mathematically, the cost function $\phi$ is expressed as Equation (9) [35],
where
$m$: batch size
$a$: output (predicted) value
$b$: expected value
Experiments have shown that the optimal point is reached quickly when Adam is used. As a result, we employed the Adam optimizer with a learning rate of $1 \times 10^{-4}$.

• Check whether the termination criterion of the prediction mechanism has been met: The completion of training cycles depends on two conditions: the weights should not exceed a certain threshold, and the estimated error rate should be below a specified threshold. If at least one of the termination criteria is fulfilled, the training is finished. Otherwise, training continues;
• Back Propagation Error: The calculated error is propagated in the backward direction, the weight and bias of each layer are updated, and the process then returns to stage (4) to continue network training.
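The cost computed in the Calculation Error step can be sketched as follows. This assumes the standard mean binary cross-entropy over a batch, using the symbols m (batch size), a (network output), and b (expected value) from the list above; the exact form of Equation (9) is not reproduced here, so the function is an illustration rather than the authors' definition. The clipping constant `eps` is a hypothetical safeguard.

```python
import math

def binary_cross_entropy(outputs, targets, eps=1e-12):
    """Mean binary cross-entropy over a batch of m samples.

    outputs: sigmoid outputs a in (0, 1)
    targets: expected values b in {0, 1}
    """
    m = len(outputs)
    total = 0.0
    for a, b in zip(outputs, targets):
        a = min(max(a, eps), 1.0 - eps)  # clip to avoid log(0)
        total += b * math.log(a) + (1 - b) * math.log(1.0 - a)
    return -total / m

# Confident, correct predictions give a low cost ...
low_cost = binary_cross_entropy([0.9, 0.1, 0.8], [1, 0, 1])
# ... while confident, wrong predictions give a high cost.
high_cost = binary_cross_entropy([0.1, 0.9, 0.2], [1, 0, 1])
```

An optimizer such as Adam then adjusts the weights in the direction that lowers this value.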
Hyperparameter Optimization: In this part, we present the technique for tuning the hyperparameters used for model training so as to predict diabetes more accurately. Hyperparameter tuning reduces the cost of the model by adjusting the hyperparameters that shape the model to achieve the highest accuracy. This study applied a grid search algorithm to five deep learning models to tune their hyperparameters.

Definition 2.
(Grid Search algorithm) Grid search is an exhaustive search over all combinations of the specified hyperparameter values. It returns the settings with maximum precision and accuracy during the validation process. We found that each of the five models performs well when evaluated using a 10-fold cross-validation approach rather than dividing the dataset for training and testing. We also found that the CNN-Bi-LSTM model outperformed the other four deep learning models in terms of accuracy, sensitivity, and specificity.

II. CNN-Bi-LSTM Prediction Process: The prediction of the CNN-Bi-LSTM model is explained by the activity diagram shown in Figure 7:
• Input Data: The essential data for CNN-Bi-LSTM predictions must be entered;
• Preprocessing of input data: The input data are standardized through Equation (4);
• Process of prediction: Standardized data are fed into the CNN-Bi-LSTM, which is then used to calculate the output value;
• Output Result: The resulting values are returned to complete the prediction process.
The model summary of CNN-Bi-LSTM is presented in Table 6.
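The preprocessing step of the prediction process can be sketched as below. The z-score is z = (x − μ) / σ; the sketch assumes that the mean and standard deviation are estimated on the training data and reused for incoming data, and that the population standard deviation is used. Both are assumptions, since Equation (4) is not reproduced here, and the sample values are hypothetical.

```python
import statistics

def fit_standardizer(column):
    """Estimate the mean and population standard deviation of one feature."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return mean, std

def standardize(value, mean, std):
    """Apply the z-score; a constant feature maps to 0.0."""
    return (value - mean) / std if std > 0 else 0.0

# Hypothetical training values for a single feature.
train_values = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std = fit_standardizer(train_values)

# A new reading arriving at prediction time is scaled with the stored statistics.
z = standardize(9, mean, std)
```

Reusing the training statistics keeps new readings on the same scale as the data the model was trained on.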

Prototype Implementation and Testing of Proposed Model Using Real-Time Database
This section explains the prototype implementation of a real-time system that uses deep learning models to predict diabetes and provide assistance to users and medical experts. Because the focus of this work is on integrating a diabetes prediction model and leveraging user-acquired data to make predictions that enable lifestyle management, the proposed system is based on a cloud-based brokering framework that integrates multiple health cloud platforms and devices. The devices used in this work are a Samsung Note 8 smartphone; an IoT health device gateway that integrates SpO2 sensors such as the MAX30100, an integrated sensor that measures SpO2 as well as BPM; an IoT sensor such as Node 32; a BP sensor; and a pulse sensor. Additionally, the system includes an AWS serverless API gateway, AWS storage, and an AWS server. The working of the proposed framework is shown in Figure 8. Data are collected from medical devices and smartphones through AWS serverless gateways and transmitted to AWS cloud servers.

Sensing of Real-Time Data through a Proposed Framework
This node contains devices that capture user data both actively and passively. The user first enters their static profile information. The IoT devices offer active collection, in which the user monitors their levels of glucose, blood pressure (BP), and SpO2. The smartwatch and smartphone allow passive collection; they automatically track the user's steps, calories burnt, activity type, and duration. Additionally, users can upload their physical reports through smartphones. We used an ESP32-based gateway that has built-in Bluetooth and Wi-Fi. Our gateway takes data from all three sensors (SpO2, heartbeat, and blood pressure), sends data to the AWS cloud using Wi-Fi, and receives data from our handheld device via Bluetooth. Our gateway provides independence to elderly users who do not use mobile phones. Signing into their respective mobile apps allows users to access this information. Figure 9 depicts the data gathered from each of the devices for a single volunteer in this study. The various device sensors on Android smartphones and smartwatches are used to automatically collect and aggregate activity data and present the fitness history to users via the mobile application. The Google vendor API monitors movements, activities, and heart rates, and calculates the calories burnt and steps taken by using the gyroscope and accelerometer sensors on smartphones and smartwatches, as well as an added heart rate sensor on smartwatches.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 16 of 27
Figure 9. Visualization of real-time data through mobile application.
In the proposed framework, the AWS server retrieves patient information from the mobile app as well as other mobile applications, e.g., Google Fit, every hour by connecting to the AWS cloud through the AWS serverless API gateway, and it requires only one-time credential authentication. For authentication, we used the OAuth 2.0 protocol to connect with the mobile application API [12]. OAuth 2.0 is a standard protocol for authorization across online, desktop, and mobile applications, and it is the security protocol used for both Google services and API data interaction. A one-time authentication strategy was used to reduce the intrusiveness of reminding medical professionals or caregivers to authenticate themselves with the online application every time they wish to monitor their data. When registering for the first time, the user connects to their mobile app account over the API to approve access to vendor cloud API privileges for the framework. The user is provided this one-time, two-step method through the mobile app and does not need to repeat it. If required, the user may enter optional parameters such as skin fold thickness, number of pregnancies, and pre-existing medical concerns into the online application. The server obtains and stores the unique user authentication token for each vendor API.
In order to keep the token continually updated, the server connects to the cloud API of the respective vendor and automatically renews the token before its expiry date. This is accomplished by configuring a server-side automated method to update tokens depending on the expiry settings of each vendor API token. The AWS server then creates additional metrics from the other data that have been previously gathered, such as age, gender, height, and weight. User data are aggregated on a daily basis by the server and ordered according to the time stamp of data collection, even when several readings occur on the same day. This allows the server to sort user data chronologically. Additionally, metrics such as total daily energy expenditure (TDEE), basal metabolic rate (BMR), and body mass index (BMI) are generated to offer medical practitioners a more comprehensive monitoring capability. Maximum, average, and lowest heart rate values are also computed. In order to make it easier to adopt different prediction and monitoring strategies for accessible diabetic user lifestyle management, it is necessary to increase the number of parameters that can be extracted from the collected data.
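The server-side aggregation described above can be illustrated with a short sketch that sorts readings chronologically, groups them by day, and derives the maximum, average, and lowest heart rate together with BMI. The record layout, function names, and sample values are hypothetical; the actual AWS implementation is not described in enough detail to reproduce. BMR and TDEE are left out because the text does not state which formulas were used.

```python
from collections import defaultdict
from datetime import datetime

def body_mass_index(weight_kg, height_m):
    """BMI is weight in kilograms divided by height in metres squared."""
    return weight_kg / (height_m ** 2)

def daily_heart_rate_summary(readings):
    """Group (ISO timestamp, bpm) readings by day in chronological order."""
    ordered = sorted(readings, key=lambda r: datetime.fromisoformat(r[0]))
    by_day = defaultdict(list)
    for stamp, bpm in ordered:
        day = datetime.fromisoformat(stamp).date().isoformat()
        by_day[day].append(bpm)
    return {
        day: {"max": max(values), "avg": sum(values) / len(values), "min": min(values)}
        for day, values in by_day.items()
    }

# Hypothetical readings arriving out of order from different devices.
readings = [
    ("2022-01-02T09:00:00", 80),
    ("2022-01-01T18:30:00", 90),
    ("2022-01-01T08:00:00", 70),
]
summary = daily_heart_rate_summary(readings)
bmi = body_mass_index(weight_kg=70, height_m=1.75)
```

Sorting by the full time stamp before grouping keeps readings from the same day in collection order.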

Prediction of Diabetes Using Proposed Model Using Real-Time Dataset
We tested our optimized proposed model using a real-time dataset: data were imported from the AWS server in chunks, each chunk was tested with our optimized CNN-Bi-LSTM model, and the results were updated after each instance. The entire method is shown in Figure 10 and described in Algorithm 1.

Algorithm 1: Algorithm to fetch real-time data
Require: $C_{size}$, $D_{size}$, $n$, $i$, $R_i$, $R_c$
Initialization: $i = 0$, $R_i = 0$
Ensure: $n = D_{size} / C_{size}$, $n \geq 0$
if $D_{size} \geq C_{size}$ then
    while $n \geq 1$ do
        Step 1: Import chunk $i$ of size $C_{size}$.
        Step 2: Perform testing using the proposed model.
        Step 3: Obtain result $R_c$ of chunk $i$.
        Step 4: Update previous results $R_i$.

The steps are explained as follows:
• Step 2: We check whether the size of the dataset is greater than or equal to the chunk size. Then, each chunk $i$ is tested with our optimized proposed model;
• Step 3: For every instance, the previous results $R_i$ are updated with the new result $R_c$;
• Step 4: Lastly, we update the size of the dataset, $D_{size} \leftarrow D_{size} - C_{size}$. If it is still greater than the chunk size, the algorithm starts again from Step 1.
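The chunk-wise testing loop of Algorithm 1 can be sketched as follows. This is an illustration under assumptions: `predict` is a hypothetical stand-in for the trained CNN-Bi-LSTM model, the dataset is an in-memory list rather than data fetched from the AWS server, and a trailing partial chunk is left unprocessed because the algorithm only continues while the remaining data are at least one chunk in size.

```python
def process_in_chunks(dataset, chunk_size, predict):
    """Test a dataset chunk by chunk and accumulate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    results = []                # R_i: results accumulated so far
    remaining = len(dataset)    # D_size
    start = 0
    while remaining >= chunk_size:
        chunk = dataset[start:start + chunk_size]       # Step 1: import chunk i
        chunk_result = [predict(row) for row in chunk]  # Steps 2-3: test, obtain R_c
        results.extend(chunk_result)                    # Step 4: update R_i with R_c
        start += chunk_size
        remaining -= chunk_size                         # D_size <- D_size - C_size
    return results, remaining

# Hypothetical rule standing in for the trained model.
def toy_predict(glucose):
    return 1 if glucose >= 126 else 0

glucose_readings = [98, 130, 110, 145, 90, 160, 101]
predictions, leftover = process_in_chunks(glucose_readings, 3, toy_predict)
```

Here two full chunks are processed and one reading remains until enough new data arrive to fill the next chunk.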

Experimental Results and Its Analysis in Real-Time Environment
In this section, the experiments were evaluated and analyzed over the real-time dataset PIDD [14] in a Python environment using six essential parameters: glucose, insulin, pregnancy, blood pressure, age, and BMI. In order to evaluate the effectiveness of the proposed framework, the results were compared with similar and recent existing methods, namely CNN [50], Bi-LSTM [52], DNN [53], and a combination of CNN-LSTM [54], and with CNN-Bi-LSTM for the classification of Type 2 diabetes over PIDD [14]. Five metrics were used to measure the overall success of our proposed model: accuracy (A), as in Equation (10) [25]; recall (R), as in Equation (11) [55]; sensitivity (SN), as in Equation (12) [41]; specificity (SP), as in Equation (13) [25]; and precision. The sensitivity of a model is its capacity to correctly classify patients who have the disease, whereas the specificity of a model is its capacity to correctly classify disease-free patients. The precision of a model is the proportion of patients predicted to have the disease who actually have it. Accuracy is the ratio of the number of patients correctly classified by the model to the total number of patients. The formulas are stated as follows:
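As an illustration of these metrics, the sketch below computes them from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), using the standard definitions. Recall and sensitivity share the same formula, TP / (TP + FN). The labels in the example are hypothetical and are not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def safe_ratio(numerator, denominator):
    """Return 0.0 instead of dividing by zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": safe_ratio(tp, tp + fp),
        "sensitivity": safe_ratio(tp, tp + fn),  # identical to recall
        "specificity": safe_ratio(tn, tn + fp),
    }

# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```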

Real-Time Qualitative Analysis
For real-time qualitative evaluation, we used the grid search algorithm to find the top three mean test scores, 90.38, 85.58, and 79.4, that helped to achieve the highest accuracy, as seen in Table 7. Using the highest test score of 90.38, we obtained the best hyperparameters for training the models: a learning rate of 0.01, 250 epochs, a batch size of 32, a kernel size of 1, 32 hidden units, a regularization dropout of 0.05, and the Adam optimizer (see Table 8). These parameters are defined as follows:

• Learning Rate: Controls how much the weights are adjusted with respect to the loss gradient at each update;
• Batch Size: Specifies the number of samples that are passed through the model at one time;
• Epochs: Defines the number of times the learning algorithm passes over the same training dataset;
• Dropout: A regularization method that reduces over-fitting by dropping randomly chosen nodes during the training phase, which improves the generalization error of the neural network.
Remark 3. Adam is a stochastic gradient optimizer for training deep learning models. It combines the best features of AdaGrad and RMSProp, so it can handle problems with sparse gradients or a lot of noise.
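The grid search with k-fold cross-validation described in this section can be sketched in plain Python. The `evaluate` callback is a hypothetical stand-in for training and scoring the network on one fold, and the toy scoring function and candidate values below are illustrative only; they do not reproduce the mean test scores in Table 7.

```python
import itertools

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def grid_search(param_grid, evaluate, n_samples, k=10):
    """Try every combination and keep the one with the best mean fold score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, te) for tr, te in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical scorer: a real one would train the model on train_idx
# and return its accuracy on test_idx.
def toy_evaluate(params, train_idx, test_idx):
    return 1.0 - abs(params["learning_rate"] - 0.01) - abs(params["dropout"] - 0.05)

grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "dropout": [0.05, 0.15],
    "batch_size": [32, 64],
}
best_params, best_score = grid_search(grid, toy_evaluate, n_samples=100, k=10)
```

The cost grows with the product of the number of candidate values, since every combination is trained k times.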

Real-Time Quantitative Analysis
For quantitative evaluation, we conducted a comparative analysis of deep learning models, namely CNN [56], DNN [54], Bi-LSTM [55], CNN-LSTM [50], and CNN-Bi-LSTM, trained over the static PIDD dataset split into two portions, 70% for training and 30% for testing. The hyperparameters were a kernel size of 1, 64 filters, a batch size of 32, a regularization dropout of 0.05, the Adam optimizer, a maximum pool size of 1, binary cross-entropy as the loss, an epsilon of $1 \times 10^{-8}$, a decay of 0.0, and 250 epochs with 32 hidden units. We found that CNN-LSTM [52] achieved an accuracy of 90%, CNN [51,55,57,58] 82%, Bi-LSTM [1] 85%, DNN [1] 87%, and CNN-Bi-LSTM 88.37%, as shown in Table 9. Although the accuracy of all models is reasonable, they suffer from under-fitting and over-fitting problems, which emerge when models are trained for fewer or more than 250 epochs. An over-fitted model tends to memorize the training data and cannot generalize to new data, while an under-fitted model fails to capture the underlying pattern and performs poorly in testing. To remove the over-fitting and under-fitting problems, we trained our models for 250 epochs over optimized parameters using 10-fold cross-validation, and the accuracy of each model increased, as shown in Table 10. The accuracy of CNN-LSTM increased to 93%, CNN to 96%, Bi-LSTM to 95%, and DNN to 90%. However, our CNN-Bi-LSTM model outperformed the other models and achieved the highest accuracy of 98.85%, as shown in Figure 11, with a sensitivity of 97% and specificity of 98%. Thus, we can infer that CNN-Bi-LSTM is better than the other deep learning models in terms of accuracy, sensitivity, specificity, precision, and recall.
Therefore, we used the CNN-Bi-LSTM model in a real-time setting to classify diabetic patients more accurately as well as to monitor their vitals on a real-time basis. Additionally, we validated our proposed model in different scenarios. First, the proposed model was validated without imputation, giving precision ≈ 0.83, recall ≈ 0.88, and F1-score ≈ 0.85 for outcome 0, and precision ≈ 0.83, recall ≈ 0.88, F1-score ≈ 0.85, and accuracy ≈ 80% for outcome 1, as shown in Table 11. Secondly, the proposed model was validated without removing outliers, giving precision ≈ 0.81, recall ≈ 0.87, and F1-score ≈ 0.83 for outcome 0, and precision ≈ 0.72, recall ≈ 0.61, F1-score ≈ 0.70, and accuracy ≈ 79% for outcome 1, as shown in Table 12. Hence, missing values and outliers affect the performance of the model, and it is essential to preprocess the data before training the proposed model.
The performance of the different models can be visualized through a graph, as shown in Figure 12. Compared with the other current approaches, the testing data are adequately fitted to the trained model in CNN-Bi-LSTM, with very few distortions. Hence, our suggested framework is more accurate and capable of properly classifying diabetic patients. Additionally, we tested CNN-Bi-LSTM over the optimized hyperparameters with different mean test scores (see Table 7); the results of our proposed methodology for the different mean test scores are presented in Table 13.
It is clearly seen that our model performed best with the highest mean test score of 90 (see Table 10). Lastly, comparisons were made between various state-of-the-art algorithms and the proposed model in terms of accuracy, as shown in Table 14, and it was found that CNN-Bi-LSTM outperformed them in terms of accuracy (see Table 10) [61,62]. From this discussion, we infer that the CNN-Bi-LSTM model is more accurate in identifying diabetes patients than the other four models as well as the state-of-the-art algorithms. This is because CNN-Bi-LSTM combines the power of both CNN and Bi-LSTM: CNN is used for feature extraction, and Bi-LSTM has additional peephole connections that prevent vanishing gradient problems as well as additional cell state memory that uses past and future knowledge to forecast the output by using two separate hidden layers.
In order to check the performance of the deep learning models, we additionally performed a statistical Student t-test between CNN-LSTM and CNN-Bi-LSTM. We used the variance estimate by checking the dependency between the dataset and computing the p-value. In the null hypothesis, we assumed that there was no statistical difference between the performances of the models. The alternative hypothesis is that there is a difference between the performances of the models. If the p-value is less than the significance value, we reject the null hypothesis and conclude that there is a significant difference between the performances of the deep learning models. If the p-value is higher, we fail to reject the null hypothesis. In order to calculate the p-value, we calculated the mean of the differences between the results of the two classifiers at every iteration step of the K-fold cross-validation using the formula $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$. Then we calculated the variance of the differences, $\sigma^2$, where $n$ is the total number of data points. Then we computed the number of data points used for training, $n_1$, and for testing, $n_2$. Then we computed the modified variance, $\sigma^2_{Mod} = \left(\frac{1}{n_1} + \frac{n_1}{n_2}\right)\sigma^2$, and finally we calculated the t-statistic as $t = \bar{d} / \sigma_{Mod}$. The calculation of the p-value is shown in Figure 13, where it can be seen that the p-value is approximately 1.84% and the significance value is 5%. As the p-value is less than the significance value, we rejected the null hypothesis and conclude that the performance of the proposed model is different from, and better in terms of accuracy than, the other models (see Table 10).
Figure 13. Paired t-test to evaluate the performance of the proposed model.
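The t-statistic computation can be sketched as follows. Several points are assumptions: the variance of the differences is taken as the sample variance (divisor n − 1), the factor that scales the variance is passed in as the parameter `correction` rather than fixed, and the example uses the factor 1/n + n_test/n_train proposed by Nadeau and Bengio for resampled t-tests, which may differ from the factor used in this study. The per-fold accuracy differences are hypothetical.

```python
import math
import statistics

def paired_t_statistic(diffs, correction):
    """t = mean(d) / sqrt(correction * var(d)) for per-fold differences d."""
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two differences are required")
    d_bar = sum(diffs) / n
    variance = statistics.variance(diffs)  # sample variance, divisor n - 1 (assumed)
    if variance == 0:
        raise ValueError("differences have zero variance")
    sigma_mod = math.sqrt(correction * variance)
    return d_bar / sigma_mod

# Hypothetical accuracy differences between two models over 10 folds.
diffs = [0.05, 0.04, 0.06, 0.03, 0.05, 0.07, 0.04, 0.05, 0.06, 0.05]
n = len(diffs)
n_train, n_test = 9, 1  # relative sizes of the training and testing parts per fold
t_value = paired_t_statistic(diffs, correction=1 / n + n_test / n_train)
```

The p-value then follows from the Student t distribution with n − 1 degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.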

Conclusions
Diabetes is a chronic disease triggered by the unbalanced release of insulin, which becomes apparent when blood glucose levels are above normal levels. In this study, five different models, namely CNN, DNN, CNN-LSTM, Bi-LSTM, and CNN-Bi-LSTM, are used to identify diabetic patients over the static PIDD. CNN-Bi-LSTM is used for the first time in this study, and it performs well in the multiple classification and prediction problems, which makes it unique. These models are applied to the dataset in two ways: first, with training data kept separate from testing data, and second, with ten-fold cross-validation to measure the accuracy of the models. Furthermore, hyperparameter optimization is performed using a grid search algorithm to find the best values of the parameters. After an experimental analysis, we found that each of the five models works well when using a 10-fold cross-validation approach instead of splitting the dataset for training and testing. Our analysis showed that the CNN-Bi-LSTM model outperformed all the other deep learning models, with an accuracy of 98.85%, a sensitivity of 97%, and a specificity of 98%. Lastly, we proposed a framework that demonstrates the evaluation of our CNN-Bi-LSTM model in a real-time scenario, which helps clinicians to keep complete information about their patients and check real-time statistics about their vitals. In the future, we can build a dashboard to present a summary of the vitals to practitioners as well as to patients.
Data Availability Statement: Data used in this article will be made available on request to the corresponding author.