Diagnosis and Prognosis of COVID-19 Disease Using Routine Blood Values and LogNNet Neural Network

Since February 2020, the world has been engaged in an intense struggle with the COVID-19 disease, and health systems have come under tragic pressure as the disease turned into a pandemic. The aim of this study is to obtain the most effective routine blood values (RBV) in the diagnosis and prognosis of COVID-19 using a backward feature elimination algorithm for the LogNNet reservoir neural network. The first dataset in the study consists of a total of 5296 patients with the same number of negative and positive COVID-19 tests. The LogNNet model achieved an accuracy of 99.5% in the diagnosis of the disease with 46 features and an accuracy of 99.17% with only mean corpuscular hemoglobin concentration, mean corpuscular hemoglobin, and activated partial thromboplastin time. The second dataset consists of a total of 3899 patients with a diagnosis of COVID-19 who were treated in hospital, of which 203 were severe patients and 3696 were mild patients. The model reached an accuracy of 94.4% in determining the prognosis of the disease with 48 features and an accuracy of 82.7% with only erythrocyte sedimentation rate, neutrophil count, and C-reactive protein features. Our method will reduce the negative pressures on the health sector and help doctors to understand the pathogenesis of COVID-19 using the key features. The method is promising for creating mobile health monitoring systems in the Internet of Things.


Introduction
The new severe acute respiratory syndrome coronavirus (SARS-CoV-2), first identified in 2019, has rapidly affected the world and caused a pandemic [1,2]. The disease, identified as coronavirus disease 2019 (COVID-19), can cause severe pneumonia and fatal acute respiratory distress syndrome (ARDS) [3][4][5][6]. While the disease may be asymptomatic, severe ARDS is thought to be caused by an inflammatory cytokine storm that may be encountered during the disease period [6,7]. The pathogen can cause a serious respiratory disorder that requires special intervention in intensive care units (ICUs) and, in some cases, may cause death [6,7]. Moreover, the symptoms of COVID-19 induced by the new SARS-CoV-2 are difficult to distinguish from known infections in the majority of patients [6,8,9].
Previous studies have demonstrated the clinical importance of changes in routine blood parameters (RBV) in the diagnosis and prediction of prognosis of infectious diseases [1][2][3][4][10][11][12]. Similarly, many abnormalities have been reported in the peripheral blood of patients infected with COVID-19 [6,7,11]. However, Jiang et al. [13] and Zheng et al. [14] emphasized that information on early predictive factors for particularly severe and fatal COVID-19 cases is relatively limited and further research is needed. Huyut et al. [6] and Lippi et al. [15] described that the rapid spread of disease in pandemics overwhelms health systems and raises concerns about the need for intensive care treatment [6,15]. In addition, the detection of severe and mild patients in COVID-19 is an important and clinically
The RBV of the patients consisted of biochemical, hematological, and immunological tests. Patients admitted to the ICU were defined as severely infected, while patients who could not be admitted to the ICU (non-ICU, subjects in all wards) were defined as mildly infected. The dataset SARS-CoV-2-RBV1 included information on n = 2648 COVID-19 positive outpatients and n = 2648 COVID-19 negative (control group), for a total of 5296 patients. The dataset SARS-CoV-2-RBV2 contained information of n = 203 ICU and n = 3696 non-ICU COVID-19 patients. Raw data records included patients' diagnoses (COVID-19, heart disease, asthma, etc.), treatment units (ICU or non-ICU), age, and RBV data. The entire recording process took 20 h. In the raw data, RBV data were on a quantitative scale, diagnostic data were on a multinomial scale, and treatment units were on a binomial scale. In the data preprocessing stage, the string data were converted into numerical data. Categorical data were coded, repeated measurements were averaged, duplicates were removed, and quantitative data were normalized. The missing RBV data were complemented by the mean of the respective parameter distribution.

Characteristics of Participants, Workflow, and Dataset Definition
In the EBYU-MG hospital, only the cases that were detected as SARS-CoV-2 by real-time reverse transcriptase polymerase chain reaction (RT-PCR) in nasopharyngeal or oropharyngeal swabs during the dates covered by this study were diagnosed with COVID-19. The research only included individuals over the age of 18. In order to prevent various complications, RBV results at the first admission were recorded.
The first SARS-CoV-2-RBV dataset (SARS-CoV-2-RBV1) includes the information of 2648 patients diagnosed with COVID-19 and receiving outpatient treatment in hospital on the specified dates, and the same number of patients (control group) whose COVID-19 tests were negative. The control group was randomly selected from individuals over the age of 18 who had applied to the emergency COVID-19 service but had a negative RT-PCR test. With the feature selection procedure, the most important RBV features that are effective in the diagnosis of the disease were selected from the SARS-CoV-2-RBV1 dataset. The selected features were fed into LogNNet neural network to examine the method's performance in diagnosing COVID-19 disease.
The second SARS-CoV-2-RBV dataset (SARS-CoV-2-RBV2) includes the information of 3899 patients who were treated for COVID-19 in hospital on the specified dates. The treatment units of these patients at the first admission were examined. The SARS-CoV-2-RBV2 dataset contains n = 203 ICU and n = 3696 non-ICU COVID-19 patients. Then, with the feature selection procedure, the most influential RBV traits in the prognosis of the disease were selected from the SARS-CoV-2-RBV2 dataset. Selected features were fed into the LogNNet neural network to examine the performance of this method in determining the prognosis and severity of COVID-19 disease.
The SARS-CoV-2-RBV1 and SARS-CoV-2-RBV2 datasets are presented in Tables 1 and 2. Both datasets include immunological, hematological, and biochemical RBV parameters, and each dataset consists of 51 features. In the SARS-CoV-2-RBV1 dataset, positive COVID-19 test results were coded as 1 and negative as 0 (COVID-19 = 1, non-COVID-19 = 0).

Feature numbers (z) and RBV parameters:
No.  Feature     No.  Feature       No.  Feature   No.  Feature   No.  Feature
1    ALT         12   Chlorine      23   eGFR      34   MONO      45   Fibrinogen
2    AST         13   Cholesterol   24   Urea      35   MPV       46   INR
3    Albumin     14   Creatinine    25   UA        36   NEU       47   PT
4    ALP         15   CK            26   BASO      37   PDW       48   PCT
5    Amylase     16   LDH           27   EOS       38   PLT       49   ESR
6    CK-MB       17   LDL           28   HCT       39   RBC       50   Troponin
7    D-Bil       18   Potassium     29   HGB       40   RDW       51   aPTT
8    GGT         19   Sodium        30   LYM       41   WBC
9    Glucose     20   T-Bil         31   MCH       42   CRP
10   HDL-C       21   TP            32   MCHC      43   D-Dimer
11   Calcium     22   Triglyceride  33             44

In the SARS-CoV-2-RBV2 dataset, severely infected (ICU) COVID-19 patients were coded as 1, while mildly infected (non-ICU) COVID-19 patients were coded as 0. Datasets are available for download in the Supplementary Materials.
Figure 2 demonstrates the principle of operation of the neural network LogNNet [43]. An object in the form of a feature vector, denoted as d, is inputted to LogNNet. The feature vector contains N coordinates ($d_1, d_2, \ldots, d_N$), where the number N is defined by the user. The classifier output determines the object class to which the input feature vector d belongs. The number of possible classes is denoted as M. LogNNet contains a reservoir with a special matrix, denoted as W. The matrix W was filled in a row-by-row pattern with numbers generated by the chaotic mapping $x_n$.
We use chaotic mapping based on the congruential generator Equation (1) (see Table 3) and the algorithm of matrix W filling shown in Algorithm 1. Vector d is converted into a vector Y of dimension N + 1 with an additional coordinate $Y_0 = 1$, and each component is normalized by dividing by the maximum value of this component in the training base. The next step is a multiplication of a special matrix W with the dimension (N + 1) × P and a vector Y. The result is a vector S' with P coordinates, which is normalized [42] and converted into a vector $S_h$ of dimension P + 1 with zero coordinate $S_h[0] = 1$, which plays the role of a bias element. In this way, the primary transformation of the feature vector d into the second (P + 1)-dimensional space is completed. Then, the vector $S_h$ is fed to a two-layer linear classifier, with the number of neurons H in the hidden layer $S_{h2}$, and the number of outputs M in the output layer $S_{out}$. To indicate the parameters of the neural network, the following designation LogNNet N:P:H:M is used.
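The transformation described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Equation (1), Table 3, and Algorithm 1 are not reproduced in this text, so the recurrence `x = (D - K * x) % L`, the parameter names `K`, `D`, `L`, `C`, their values, and the min-max normalization of S' are all hypothetical stand-ins.

```python
def fill_reservoir(n_rows, n_cols, K, D, L, C):
    """Fill W row by row from a congruential sequence (assumed recurrence)."""
    x = C                                  # assumed initial value of the sequence
    W = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            x = (D - K * x) % L            # hypothetical stand-in for Equation (1)
            row.append(x / L)              # scale the integer state to [0, 1)
        W.append(row)
    return W

def to_Y(d, d_max):
    """Prepend Y_0 = 1 and divide each component by its training-set maximum."""
    return [1.0] + [di / mi if mi else 0.0 for di, mi in zip(d, d_max)]

def project(Y, W):
    """Multiply Y (length N + 1) by W ((N + 1) x P) to obtain S' (length P)."""
    P = len(W[0])
    return [sum(Y[i] * W[i][j] for i in range(len(Y))) for j in range(P)]

def transform(train, W):
    """Map every training vector d to S_h of dimension P + 1 with S_h[0] = 1."""
    d_max = [max(col) for col in zip(*train)]
    S_raw = [project(to_Y(d, d_max), W) for d in train]
    lo = [min(col) for col in zip(*S_raw)]
    hi = [max(col) for col in zip(*S_raw)]
    S_h = []
    for s in S_raw:
        # Assumed min-max normalization of S' over the training set.
        scaled = [(v - a) / (b - a) if b > a else 0.0
                  for v, a, b in zip(s, lo, hi)]
        S_h.append([1.0] + scaled)         # bias element at index 0
    return S_h

# Toy usage: N = 2 features, P = 4 reservoir outputs, arbitrary generator values.
train = [[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]]
W = fill_reservoir(n_rows=3, n_cols=4, K=93, D=68, L=9276, C=73)
S_h = transform(train, W)
```

The vectors in `S_h` would then be passed to the two-layer linear classifier, which is omitted here.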

Optimization of Reservoir Parameters
The optimal chaotic mapping parameters were selected using a special algorithm. The ranges of the parameters are indicated in Table 3. Before optimization, it is necessary to set the following values of the constant parameters of the model: the value P + 1, which determines the dimension of the vectors $S_h$ and $S_{h2}$, the number of layers in the linear classifier, the number of epochs Ep for backpropagation training, and the number of neurons in the classifier's hidden layer, in the case of a two-layer classifier. The training of the LogNNet network is performed by two nested iterations [46]. The inner iteration trains the output LogNNet classifier by backpropagation of error on the training set, and the outer iteration optimizes the model parameters.
During the optimization process, the training and validation bases coincided and were equivalent to the initial datasets (SARS-CoV-2-RBV1 or SARS-CoV-2-RBV2). The outer iteration implements the particle swarm method with fitness function equal to classification accuracy. Outer iteration ends either when the desired values of the classification accuracy are reached, or when the specified number of iterations in the particle swarm method is completed. As a result, the optimized model parameters (chaotic mapping parameters) at the output allow us to obtain the highest classification accuracy on the validation set.

Classification Accuracy, K-Fold Cross-Validation and Balancing Techniques
The K-fold cross-validation technique was used to test LogNNet. This method is well suited for the medical databases, which are not split into test and training sets. The elements of the set (SARS-CoV-2-RBV1 or SARS-CoV-2-RBV2) are divided into K parts (K = 5). One of the parts is taken as the test sample, and the remaining K-1 parts are used for the training sample. Then, the average value of the metrics is calculated for all K cases when one of the K parts of the set becomes the test sample in turn. A distinctive feature of the method is that the separate test data are not needed for the training process. Applying the K-fold cross-validation technique, we calculate the classification metrics: classification accuracy, A, precision, recall, and F1-metric. Wherever we talk about the classification accuracy A in this article, we imply the value obtained by the K-fold cross-validation method.
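The K-fold procedure described above can be sketched as follows. This is an illustrative outline only: the source does not state whether the data were shuffled or stratified before splitting, so contiguous folds are an assumption, and `train_and_score` is a hypothetical callback that trains a model on the training part and returns a metric on the test part.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, train_and_score, k=5):
    """Average a metric over k runs, each fold serving once as the test sample."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        score = train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx],
        )
        scores.append(score)
    return sum(scores) / k

# Toy usage with a dummy scorer that just reports the test-fold size.
folds = k_fold_indices(12, 5)
mean_size = cross_validate(list(range(12)), [0] * 12,
                           lambda Xtr, ytr, Xte, yte: len(yte), k=5)
```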
To obtain a higher value of A, the training K-1 parts of the sets were balanced as in [43]. The balancing implies equalizing the number of objects for each class, supplementing the classes with copies of already existing objects, and sorting the training set in sequential order. The balancing process can be illustrated by the following example. The training set consists of 10 objects divided into 2 classes. Each object is assigned a feature vector $d_z^m$, where z is the object number, z = 1, ..., 10, and m is the class number, m = 1, 2. For example, we have seven objects of class 1 ($d_1^1, d_2^1, d_4^1, d_5^1, d_6^1, d_7^1, d_{10}^1$) and three objects of class 2 ($d_3^2, d_8^2, d_9^2$). We find the maximum number of objects (MAX) in the classes, and MAX equals 7 for class 1. We supplement the remaining groups with copies of the already existing objects (duplication) to equalize the number to MAX. Therefore, for class 2, we acquire the group ($d_3^2, d_8^2, d_9^2, d_3^2, d_8^2, d_9^2, d_3^2$). Then, we compose a balanced training data set, choosing one object from each group in turn. As a result, we achieve the following training set: ($d_1^1, d_3^2, d_2^1, d_8^2, d_4^1, d_9^2, d_5^1, d_3^2, d_6^1, d_8^2, d_7^1, d_9^2, d_{10}^1, d_3^2$), which consists of 14 vectors and has the same number of objects in every class.
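The balancing example above can be reproduced with a short sketch. The function name `balance` and the use of string placeholders for feature vectors are illustrative choices, not part of the original method description.

```python
from itertools import cycle, islice

def balance(objects, labels):
    """Duplicate minority-class objects up to MAX, then interleave the classes."""
    groups = {}
    for obj, lab in zip(objects, labels):
        groups.setdefault(lab, []).append(obj)
    max_n = max(len(g) for g in groups.values())
    # Repeat each group cyclically until it holds MAX objects.
    padded = {lab: list(islice(cycle(g), max_n)) for lab, g in groups.items()}
    balanced = []
    for i in range(max_n):
        for lab in sorted(padded):          # one object from each class in turn
            balanced.append((padded[lab][i], lab))
    return balanced

# The 10-object example from the text: objects 3, 8, 9 belong to class 2.
objects = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
labels = [1, 1, 2, 1, 1, 1, 1, 2, 2, 1]
balanced = balance(objects, labels)
order = [obj for obj, _ in balanced]
```

With this input, `order` follows the sequence given in the text, with 14 entries and seven objects per class.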

Threshold Approach
The simplest approach for classifying by one feature in the presence of only two classes is based on determining the threshold value Vth separating the classes. For the SARS-CoV-2-RBV1 dataset, we introduce an additional designation of the type of threshold value, Type 1 or Type 2, in accordance with the rule:
Type 1: if feature value > Vth then "COVID-19" else "non-COVID-19"
Type 2: if feature value > Vth then "non-COVID-19" else "COVID-19"
The threshold type indicates which side of the threshold the sick and healthy classes are on.
For the SARS-CoV-2-RBV2 dataset (after balancing, see Section 2.4), we introduce a similar relation for the type of threshold value:
Type 1: if feature value > Vth then "ICU" else "non-ICU"
Type 2: if feature value > Vth then "non-ICU" else "ICU"
Threshold accuracy Ath after balancing the datasets (see Section 2.4) is determined from the classification counts, where TP denotes true positive, TN true negative, FP false positive, and FN false negative. K-fold validation is not used when calculating Ath. The threshold value Vth was determined by stepwise enumeration and finding the maximum value of Ath.
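The threshold rules and the stepwise search for Vth can be sketched as below. Two assumptions are made here because the corresponding equations are not reproduced in this text: threshold accuracy is taken to be the standard ratio (TP + TN) / (TP + TN + FP + FN) expressed in percent, and the enumeration uses a uniform grid of `steps` points between the smallest and largest feature value. Function names are illustrative.

```python
def threshold_accuracy(values, labels, v_th, rule_type):
    """Accuracy (%) of a one-feature threshold rule; label 1 is the positive class."""
    tp = tn = fp = fn = 0
    for v, y in zip(values, labels):
        above = v > v_th
        # Type 1: above the threshold means positive; Type 2: the opposite.
        pred = int(above) if rule_type == 1 else int(not above)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def best_threshold(values, labels, steps=100):
    """Enumerate candidate thresholds and both rule types; keep the first maximum."""
    lo, hi = min(values), max(values)
    best = (0.0, lo, 1)                     # (Ath, Vth, type)
    for s in range(steps + 1):
        v_th = lo + (hi - lo) * s / steps
        for rule_type in (1, 2):
            a = threshold_accuracy(values, labels, v_th, rule_type)
            if a > best[0]:
                best = (a, v_th, rule_type)
    return best

# Toy usage: high values belong to the positive class, so Type 1 is selected.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
a_th, v_th, rule_type = best_threshold(values, labels)
```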
The threshold method reflects the dependence of one feature and COVID-19 and indicates the classification success (Equations (2)-(4)). In practical applications, the LogNNet is a more powerful classification tool than the simple threshold method, revealing more information between features and COVID-19.

Feature Selection Method
The feature selection method is based on a wrapper-type backward feature elimination algorithm and has two consecutive steps. First, redundant features and features that make training of the neural network difficult are removed. In backward elimination, the algorithm starts with all the features and removes the least significant feature at each iteration. The features are removed by zeroing the corresponding components of the input vectors d. The second stage includes sorting the remaining features according to their contribution to the classification metric.
Feature selection for the dataset SARS-CoV-2-RBV2 illustrates this method. Let us suppose a reservoir optimization was carried out and an accuracy of A51 = 93.665% was obtained (using K-fold cross-validation), where the designation $A_{NF}$ means the classification accuracy when using NF features (here NF = 51). Let us introduce additional pointers: denote the set of removed features by FR and the set of selected features by FS. For example, A49(FR[3,33]) denotes accuracy at NF = 49 features with features z = 3 and z = 33 removed, and A5(FS[1,22,33,41,55]) denotes accuracy at NF = 5 features with the main features from the set FS, z = 1, 22, 33, 41, 55. Next, we plot the dependence of the value of dA51 on the number of the removed feature z (see Figure 3a), where
$$dA_{51}(z) = A_{51} - A_{50}(FR[z]). \qquad (5)$$
Dependence dA(z) is a function of the feature strength. The value A50(FR[z]) characterizes the classification accuracy of the neural network using NF = 50 features, after deleting the feature with number z. Positive feature strength dA51 (Figure 3a and Equation (5)) means that the removal of the feature reduces the classification accuracy of the network and the feature is useful. Negative dA51 means that the feature interferes with learning (redundant) and its removal leads to an increase in the classification properties of the neural network. After the first selection iteration, the seven most useful features can be identified having numbers z = 49, 36, 42, 19, 12, 3, 21 (Figure 3a). The feature that makes learning the most difficult is number z = 44 (in Figure 3 it is indicated by the index 'Minimum'). Its removal makes A50(FR[44]) = 94.075%, which exceeds the previous value A51 = 93.665%.
The next iteration involves calculating the dependence of dA50(z) (Figure 3b), where
$$dA_{50}(z) = A_{50}(FR[44]) - A_{49}(FR[44, z]). \qquad (6)$$
Equation (6) implies the exclusion of the worst feature z = 44 and the exclusion of all other features in turn. As a result, the next feature to exclude will be the feature z = 45, and the best accuracy will be A49(FR[44,45]). Iterations continue until all dA values are greater than or equal to zero. Figure 3c,d shows graphs for Equations (7) and (8):
$$dA_{49}(z) = A_{49}(FR[44, 45]) - A_{48}(FR[44, 45, z]), \qquad (7)$$
$$dA_{48}(z) = A_{48}(FR[14, 44, 45]) - A_{47}(FR[14, 44, 45, z]). \qquad (8)$$
The graph in Figure 3d reflects the dependence dA48(z) that has positive values. Thus, the best classification accuracy corresponds to A48(FR[14,44,45]) = 94.434%, after removing the features z = 44, 45, 14. During the selection, the set of the seven best features with highest feature strength dA also changed from the set [3,12,19,21,36,42,49] (Figure 3a) to [3,12,36,39,40,42,49] (Figure 3d, red circle).
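The elimination loop described above can be summarized in a small sketch. It is an outline under assumptions rather than the authors' code: `accuracy(removed)` is a hypothetical callback that would train LogNNet with the listed features zeroed and return the K-fold accuracy, and the toy accuracy function below exists only to make the example runnable.

```python
def backward_elimination(n_features, accuracy):
    """Remove the most harmful feature per iteration until every dA(z) >= 0.

    accuracy(removed) -> accuracy (%) with the features in `removed` zeroed.
    Returns (removed features, final accuracy, feature strengths dA).
    """
    removed = []
    current = accuracy(removed)
    while True:
        candidates = [z for z in range(1, n_features + 1) if z not in removed]
        if not candidates:
            return sorted(removed), current, {}
        acc_without = {z: accuracy(removed + [z]) for z in candidates}
        # Feature strength: dA(z) = A(removed) - A(removed + [z]).
        strength = {z: current - acc_without[z] for z in candidates}
        worst = min(strength, key=strength.get)
        if strength[worst] >= 0:            # no redundant features remain
            return sorted(removed), current, strength
        removed.append(worst)
        current = acc_without[worst]

# Toy accuracy model: features 2 and 5 hurt training, features 1 and 4 help.
HARMFUL = {2: 0.5, 5: 0.2}
USEFUL = {1: 3.0, 4: 1.0}

def toy_accuracy(removed):
    gain = sum(HARMFUL.get(z, 0.0) for z in removed)
    loss = sum(USEFUL.get(z, 0.0) for z in removed)
    return 90.0 + gain - loss

removed, final_acc, strength = backward_elimination(6, toy_accuracy)
# Second stage: rank the remaining features by their strength dA.
ranking = sorted(strength, key=strength.get, reverse=True)
```

Each iteration calls `accuracy` once per remaining feature, so with 51 features the first pass alone requires 51 training runs; this cost is inherent to wrapper-type selection.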

Dataset SARS-CoV-2-RBV1
LogNNet 51:50:20:2 architecture was used for SARS-CoV-2-RBV1 dataset. Reservoir optimization following the method from Section 2.3 with the number of epochs Ep = 50 led to the parameters of the congruential generator listed in Table 4. Feature selection was performed with the number of epochs Ep = 100. Prior to selection, the dA51(z) shape is plotted in Figure 4a. After feature selection, the redundant features have the numbers z = 21, 37, 42, 49, 40, and the dA46(z) plot is shown in Figure 4b. The influence of features with numbers z = 20, 19, 10, 17 has increased.

The dependence of A46(FR[21,37,40,42,49]) on the number of epochs is shown in Figure 5, and the values of other metrics are shown in Table 5.
Ep = 100 is taken as the optimal value of the number of epochs. The RBV values found most important in the diagnosis of COVID-19 are the features listed in Table 6. The most important of these are MCHC, MCH, and aPTT. MCHC in a blood test indicates the average concentration of hemoglobin in erythrocytes. The efficiency of LogNNet in determining the diagnosis of COVID-19 using only seven features and their combinations is shown in Table 7.
The accuracy of the model in diagnosing the disease with seven features was almost equal to the accuracy obtained using all 46 features (A7 ≈ 99.4% vs. A46 ≈ 99.59%) (Table 7).

Threshold Accuracy on One Feature
An LDL level lower than 116.1 mg/dL, HDL-C level lower than 43.1 mg/dL, Cholesterol level lower than 206.3 mg/dL, Triglyceride level lower than 163.3 mg/dL, MCHC level higher than 31.3 g/dL, and Amylase level higher than 76.3 U/L are critical levels for the detection of sick individuals (Figure 6; see also Table A1). Considering any of these critical levels, the patients and healthy individuals could be detected with accuracy between Ath = 85% and Ath = 94%.
For features from Table 6 not included in Figure 6, case distribution histograms (MCH, aPTT, HCT, MONO, RBC) are demonstrated in Figure 7. The success of these features alone in detecting sick and healthy individuals was less than 60% (Figure 7). However, the performance of the combination of MCHC with MCH and of the combination of MCHC with HDL-C in detecting sick and healthy individuals is higher than the individual performance of these features (Table 7). The high mutual information revealed among these variables helps LogNNet to diagnose COVID-19. The combinations of MCH, aPTT, HCT, MONO, and RBC features are not effective in the diagnosis of the disease (A5(FS[10,17,19,22,25]), Table 7). We think that there is a low correlation between these features and COVID-19.

Dataset SARS-CoV-2-RBV2
LogNNet 51:50:20:2 architecture was used for the SARS-CoV-2-RBV2 dataset. The result of reservoir optimization obtained following the method from Section 2.3 with the number of epochs Ep = 50 led to the parameters of the congruential generator indicated in Table 4. Feature selection was carried out with the number of epochs Ep = 150. Prior to selection, feature strength corresponded to dA51(z) (Figure 3a). After feature selection, the redundant features are with numbers z = 44, 45 and 14, and the dA48(z) graph is shown in Figure 3d.
The dependence of A48(FR[14,44,45]) on the number of epochs is shown in Figure 8, and the values of other metrics are shown in Table 8.
Ep = 150 is taken as the optimal value of the number of epochs. The metrics for the "ICU" case are significantly worse than for the "non-ICU" case because of limited data for the "ICU" case. The most important RBVs in identifying severely and mildly infected COVID-19 patients are the features listed in Table 9. The most important of these are ESR and NEU.
The efficiency of LogNNet when using only the 12 features and their combinations to identify severely and mildly infected COVID-19 patients is shown in Table 10. The recall value indicates what percentage of individuals diagnosed as mild or severe patients by the specialist could be recognized as mild or severe patients by our model. In other words, the recall value indicates the success of our model in distinguishing mild or severe patients. The precision value indicates the percentage of the individuals diagnosed as mild or severe patients by our model who were also defined as mild or severe patients by the specialist. In other words, the precision value shows the success of our model in diagnosing mild or severe patients.
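The per-class precision and recall described above can be computed as in the following sketch, which uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The labels in the usage example are made up for illustration and are not taken from the study data.

```python
def class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels: 1 = ICU (rare class), 0 = non-ICU (frequent class).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
icu = class_metrics(y_true, y_pred, positive=1)
non_icu = class_metrics(y_true, y_pred, positive=0)
```

In this toy case one error of each kind costs the rare class half of its precision and recall, while the frequent class keeps 7/8 on both, which is the kind of asymmetry that an unbalanced sample produces.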
The accuracy of the model run with 12 features to identify mildly and severely infected patients was close to the accuracy of the model run with 48 features (A12 ≈ 90.9% vs. A48 ≈ 94.94%) (Table 10). The accuracy of the model run with seven features was 89.3%, where the model's success in diagnosing the mildly infected (precision value) was 99.1%, and its success in recognizing mildly infected patients (recall value) was 89.6%. The metrics for the "ICU" case are significantly worse than for the "non-ICU" case. Here, our model decided in favor of the diagnosis of mildly infected (high precision for non-ICU, low precision for ICU) due to the imbalance between the numbers of mildly infected and severely infected patients in our sample.

Threshold Accuracy on One Feature
Table A2 in Appendix A contains values of threshold accuracy Ath, threshold values Vth, as well as types and limits of change for all features. Rows in the table are sorted in descending order of threshold accuracy Ath. Case distribution histograms for features with the highest threshold accuracy (NEU, Albumin, WBC, CRP, Urea, Calcium) are shown in Figure 9.
Cases with an NEU level higher than 6.2 × 10³/µL, WBC level higher than 7.93 × 10³/µL, CRP level higher than 15 mg/dL, Urea level higher than 46.9 mg/dL, Albumin level lower than 32.2 g/L, and Calcium level lower than 8.5 mg/dL most likely require intensive care treatment (Figure 9). Considering any of these critical levels, patients requiring intensive care and patients not requiring intensive care could be correctly identified with accuracy between Ath = 72% and Ath = 78%.

Discussion
COVID-19 is a systemic disease with multi-organ damage that causes severe acute respiratory syndrome and death, and it continues to spread [3,47]. Despite the use of vaccines, the spread of the disease cannot be stopped, and important mutations have been detected in the structure of the virus [1]. It is likely that COVID-19 will continue to be present in our lives. Despite the large number of studies on COVID-19, some of these studies were contradictory and pathological aspects of the disease could not be fully determined [48]. Changes in many RBVs and hematological abnormalities were observed during the course of the disease [6,48]. The fact that most patients lost their lives in case of severe infection has led to a fight against the disease all over the world [10,49]. In addition, Brinati et al. [19] and Zhang et al. [49] pointed out that various complications may occur during the treatment process of COVID-19, and this makes it important to predict the prognosis of the disease in the early period. Similarly, Mertoglu et al. [1] and Huyut and İlkbahar [3] stated that early prediction of the diagnosis and prognosis of the disease is important in the first response to severely infected COVID-19 patients.
As with immunodiagnostic testing, RT-PCR testing may present difficulties in identifying true positive and negative individuals infected with COVID-19 [4,50]. Indeed, Teymouri et al. [50] and D'Cruz et al. [51] suggested that to increase the sensitivity of the RT-PCR test, the test should be repeated on multiple samples and the application methodology should be improved. However, these procedures represent a troublesome process for health personnel and patients. These difficulties in diagnosing COVID-19 have further increased the importance of RBVs methods [1,2]. In this context, it is possible to determine both the diagnosis and the prognosis of the disease with RBVs (biomarkers), which are easier to obtain, more economical, and faster to measure [1][2][3][4][5][6].
In an ML study for the diagnosis of COVID-19 based on RBVs, Brinati et al. [19] explained that AI models are based on clinical features and can be used for processes, such as disease diagnosis and prognosis. AI models that use the RBVs can be both an adjunct and an alternative method to rRT-PCR [20]. In addition, AI application results can provide information about the infection risk level and can be used in the rapid triage and quarantine of high-risk patients [20].
In this study, the most effective RBV biomarkers in the diagnosis and prognosis of COVID-19 were determined by a two-step feature selection procedure for use in peripheral IoT devices with low computing resources. Our LogNNet neural network model, fed with selected features, identified sick and healthy individuals, and especially mildly infected patients, with high accuracy.
In the first dataset used in this study, the RBVs of COVID-19 positive (n = 2648) patients and COVID-19 negative (n = 2648) individuals were recorded. In the second dataset, the RBVs of 3899 patients (n = 203 ICU and n = 3696 non-ICU) hospitalized with the diagnosis of COVID-19 were recorded. Hence, 51 features of all patients were identified (Tables 1 and 2). A two-stage feature selection procedure (see Section 2.5) was applied on the datasets and features were found for each dataset. The features selected for the first dataset were fed into the LogNNet neural network, and the accuracy of the method in the diagnosis of COVID-19 was calculated. Then, the selected features for the second dataset were fed into LogNNet neural network, and the performance of the method in identifying mildly and severely infected patients (determining the prognosis of the disease) was assessed.
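The idea of backward feature elimination mentioned above can be illustrated with a minimal sketch. It shows only one plausible shape of a greedy elimination pass and is not the two-stage procedure of Section 2.5, whose details may differ. The function `backward_elimination`, the feature labels, and the scoring function `toy_score` are hypothetical; in practice the score would be the validation accuracy of the classifier trained on the candidate subset.

```python
from typing import Callable, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
    keep: int,
) -> Tuple[str, ...]:
    """Greedy backward elimination: repeatedly drop the feature whose
    removal leaves the highest-scoring subset, until `keep` remain."""
    selected = tuple(features)
    while len(selected) > keep:
        # Build every subset obtained by removing exactly one feature.
        candidates = [
            tuple(f for f in selected if f != drop) for drop in selected
        ]
        # Keep the subset that scores best after the removal.
        selected = max(candidates, key=score)
    return selected


# Hypothetical stand-in for "train the model on this subset and return
# its validation accuracy". The weights are illustrative only and do
# not correspond to any blood value in the study.
TOY_WEIGHT = {"a": 0.05, "b": 0.01, "c": 0.5, "d": 0.3, "e": 0.1}


def toy_score(subset: Tuple[str, ...]) -> float:
    return sum(TOY_WEIGHT[f] for f in subset)


best = backward_elimination(["a", "b", "c", "d", "e"], toy_score, keep=3)
print(sorted(best))
```

With the toy weights, the two least informative labels are removed first and the three highest-weighted ones remain. Each pass costs one model evaluation per remaining feature, which is why a cheap-to-train classifier is convenient for this kind of search.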
Previous studies on the diagnosis and prognosis of COVID-19 have indicated the changes in most of the RBV parameters and biomarkers [1][2][3]5]. Mertoglu et al. [1] and Yang et al. [52] reported that the most effective RBV biomarkers in the diagnosis and prognosis of COVID-19 are CRP and LYM. However, other studies conducted for this purpose have reported blood values of CRP, procalcitonin, ferritin, ALT, aPTT, and ESR [3,4,6]. Banerjee et al. [8] used random forest, glmnet, generalized linear models, and artificial neural network (ANN) models to determine the diagnosis of COVID-19 with 14 RBV values of 81 COVID-19 positive and 517 healthy individuals. Glmnet was found to be the most successful model in the diagnosis of the disease with 92% sensitivity and 91% accuracy [8]. Brinati et al. [19] used various ML methods with 13 RBV values for diagnosis of the disease (102 COVID-19 negative, 177 positive) and noted that the models with the highest accuracy were random forest (82%) and logistic regression (78%). Similarly, Cabitza et al. [20] used various ML models to rapidly detect COVID-19 using many RBV parameters and found the models with the highest accuracy were random forest (88%), support vector machine (SVM) (88%), and k-nearest neighbor (86%). Joshi et al. [22] developed a trained logistic regression model using some RBVs on a dataset of 380 cases, reporting good sensitivity (93%) but low specificity (43%). Yang et al. [21] applied various ML models on 27 RBV parameters of a large patient population of 3356 individuals (42% COVID-19 positive), and found the gradient boost tree model to be the most successful model in the diagnosis of the disease with 76% sensitivity and 80% specificity. In a COVID-19 study using chest computed tomography (CT) data and RBV parameters, Mei et al. [23] presented a model combining a CNN and a multilayer perceptron that diagnosed the disease with 84% sensitivity and 83% specificity. 
Soares [24] proposed a model combining SVM, ensembling, and SMOTE Boost models to diagnose COVID-19 using 15 RBV parameters in a population of 599 individuals; the model diagnosed the disease with 86% specificity and 70% sensitivity. Running various ML models to diagnose COVID-19 with the RBV parameters, Soltan et al. [25] found the XGBoost method to be the most successful model with 85% sensitivity and 90% precision. Huyut [53] applied a variety of supervised ML models to 28 routine blood values and age to distinguish severely and mildly infected patients in a large COVID-19 population. The models with the highest AUC in identifying mildly infected patients were locally weighted learning (0.95), KStar (0.91), naïve Bayes (0.85), and k-nearest neighbor (0.75).
This study identified the seven most important biomarkers in the diagnosis of COVID-19 (Table 6). Among these features, the most important biomarkers were MCHC, MCH, and aPTT. The overall accuracy rate of the LogNNet model, which was run with seven features, was A_7 (FS[10,17,19,20,22,25,36]) ≈ 99.3%, and the precision rate of patient identification was 99.6%. In addition, the different combinations of features that are important in the diagnosis of patients were examined. The overall accuracy of the LogNNet model run only with MCHC and MCH features was A_2 (FS[19,20]) ≈ 99.1% and the precision rate of patient identification was 99.4%. The overall accuracy rate of our model using only the MCHC feature was 94.2%, while the overall accuracy rate of the model using only the HDL-C feature was 94.4%. According to the calculated critical levels of the main features, such as LDL, HDL-C, Cholesterol, Triglyceride, MCHC, and Amylase (Figure 6), the health and sickness status of individuals could be determined accurately. The performance of the combination of MCHC and MCH and of the combination of MCHC and HDL-C in the detection of sick and healthy individuals was higher than the individual performances of these features, which suggests that these feature combinations carry a high level of hidden information about COVID-19. This information was revealed by the LogNNet neural network method. These combinations of features can be used by LogNNet in the diagnosis of COVID-19 with high accuracy.
Studies indicate that the ALT, AST, LDH, direct bilirubin, and aPTT RBVs are increased in severe COVID-19 patients, while the hemoglobin values are decreased significantly compared to mildly infected patients [6,23,54]. However, in other studies, the LYM, NEU, WBC, MCH, MPV, and RDW hematological RBVs were higher in severe COVID-19 patients, when compared to mildly infected patients [1][2][3]6]. Mousavi et al. [16], Zhang et al. [54], and Zheng et al. [55] determined that patients with severe COVID-19 had lower EOS, MONO, RBC, hematocrit, hemoglobin, and MCHC hematological values, when compared to mild patients. Huyut et al. [6], in a study of patients who died from COVID-19, showed that the ESR, INR, PT, CRP, D-dimer, and ferritin biomarkers are the most important biomarkers to detect the mortality of the disease. Luo et al. [56] proposed a multi-criteria decision making (MCDM) algorithm combining the technique for order of preference by similarity to ideal solution (TOPSIS) and naive Bayes (NB) as a feature selection procedure to predict the severity of COVID-19 from initial RBV values. With the MCDM model, the WBC, LYM, NEU values, and age were the most effective features in determining the severity of the disease with 82% accuracy obtained by ROC analysis [56]. Similarly, Ma et al. [57] and Lai et al. [58] noted that the high WBC and NEU values are important manifestations of bacterial infection and indicate a serious disease state that complicates the clinical situation. Numerous studies have shown that other proinflammatory marker levels, including CRP, ferritin, and IL-6, are associated with worse outcomes [59][60][61]. Cheng et al. [62] reported that high levels of inflammatory markers, such as ESR, CRP, and procalcitonin, may indicate hyperinflammatory reactions in COVID-19 patients. Cavalcante-Silva et al. [63] stated that the neutrophil count was increased in severe COVID-19 patients and the neutrophils are the main effector cells in the development of COVID-19. 
The different neutrophil mechanisms, e.g., neutrophil enzymes and cytokines, are potential targets for treating particularly severe cases of COVID-19 [63].
This study identifies the twelve most important biomarkers to determine the prognosis of COVID-19 (detecting severely and mildly infected patients) (Table 9). The most important of them are ESR, NEU, CRP, albumin, and RBC biomarkers. The overall accuracy of the LogNNet model, which was run with twelve features, was 90.9%, the success rate in diagnosing mildly infected patients (precision rate) was 99.0%, and the success rate in diagnosing severely infected patients (precision rate) was 36.6% (Table 10). However, the success of the LogNNet model, which was run with twelve features, in distinguishing mild and severe patients according to their real conditions (recall value), was 91.4% and 83.1%, respectively (Table 10).
The calculated critical levels of NEU, WBC, CRP, Urea, Albumin, and Calcium features are important levels in determining the severity of infection of the patients (Figure 9). Moreover, the combination of the ESR, NEU, CRP, Albumin, RBC, Chlorine, and RDW features detected infected patients better than any of these features individually, which indicates that these blood features jointly carry a high level of hidden information about COVID-19. This information was revealed by the LogNNet neural network. The combinations of features can be used as important biomarkers in the prognosis of the COVID-19 disease and in identifying patients in need of intensive care.
Our model decided in favor of the diagnosis of mildly infected patients (high precision for non-ICU, low precision for ICU) because of the unbalanced sample size of mildly infected and severely infected patients. However, our model showed a high recall value in identifying mildly and severely infected patients. The model run with only three features showed an average of 82.6% agreement with the expert opinion in distinguishing mildly or severely infected patients (Table 10). However, severe patient diagnosis of our model showed low agreement with expert opinion (low precision "ICU") (Table 10), and the success of our model in diagnosing severe patients is low. As a result, the LogNNet model, which is run with the features in Table 10, can be used safely with high sensitivity (recall) to confirm the expert opinion in recognizing mild and severely infected patients. In addition, our model can be an alternative tool for diagnosing mildly infected patients using the features in Table 10. Furthermore, the success of the LogNNet model using few features in distinguishing mild and severe patients and diagnosing mildly infected patients is high.
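The effect of class imbalance on precision can be checked with a short calculation. The sketch below takes the class sizes of the second dataset (203 ICU and 3696 non-ICU patients) and the recall values reported for the twelve-feature model (83.1% and 91.4%), rebuilds an approximate confusion matrix, and derives the precision values from it. The function name `precision_from_recall` is hypothetical, and the reconstruction assumes that the recall values apply to the whole dataset at once, which is a simplification of how the reported metrics were obtained.

```python
def precision_from_recall(n_pos: int, n_neg: int,
                          recall_pos: float, recall_neg: float) -> dict:
    """Rebuild an approximate confusion matrix from class sizes and
    per-class recall, then derive precision and overall accuracy."""
    tp = recall_pos * n_pos   # positives (ICU) correctly flagged
    fn = n_pos - tp           # positives missed
    tn = recall_neg * n_neg   # negatives (non-ICU) correctly flagged
    fp = n_neg - tn           # negatives wrongly flagged as positive
    return {
        "precision_pos": tp / (tp + fp),
        "precision_neg": tn / (tn + fn),
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


metrics = precision_from_recall(
    n_pos=203, n_neg=3696, recall_pos=0.831, recall_neg=0.914
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the ICU precision comes out at roughly 35%, the non-ICU precision at roughly 99%, and the accuracy at roughly 91%, which is close to the reported 36.6%, 99.0%, and 90.9%. Misclassifying fewer than 9% of the 3696 non-ICU patients already produces more false ICU predictions than there are true ICU patients, so low ICU precision follows from the class sizes even when recall is high.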
Other studies [19,64,65] confirming the association of RBV features with COVID-19 highlight the importance of the clinical research direction that our model takes. The poor performance of our model in diagnosing severe patients (low precision for the ICU) is an expected situation. Several studies have stated that severe COVID-19 patients experienced more changes in the RBV values than mildly infected patients, and that various complications could occur during the severe disease process [1][2][3]6]. There are many factors affecting the intensive care need of an individual with COVID-19 and difficulties in determining this process with only RBV values [1][2][3][4][5][6]. However, there are few studies on determining the severity of infection in patients with COVID-19 based on the RBV values alone.
Cabitza et al. [20], Soltan et al. [25], and Rabanser et al. [66] stated that the reported performance values are good enough, especially in terms of screening, considering the economic benefits and rapid results of the developed artificial intelligence models. Moreover, Brinati et al. [19] suggested the necessity of conducting studies on the predictability of arterial blood gas tests in addition to routine blood values for the diagnosis of COVID-19. In this context, we plan our next studies as follows. The first phase is to identify the diagnosis and prognosis of COVID-19 with LogNNet model using the arterial blood gases. The next phase is to determine the mortality of COVID-19 with the LogNNet model using the RBV values.
Velichko [43] reported a method for the estimation of the occupied RAM in the implementation of the LogNNet on Arduino microcontrollers. The LogNNet 51:50:20:2 model, discussed above, takes about 13.7 kB of RAM. As the matrix W occupies ~10.4 kB, this memory can be freed by a RAM-saving algorithm, after which the model will use ~3.3 kB. Therefore, the model can be placed on microcontrollers with a RAM size of 16 kB, e.g., Arduino Nano.
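The memory figures above can be reproduced with simple arithmetic. The sketch below is not the estimation method of [43]; it assumes that the reservoir matrix W of the 51:50:20:2 model has 50 rows and 51 + 1 columns (one extra column for a bias input) and that each element is stored as a 4-byte float. Both are assumptions chosen because they match the quoted ~10.4 kB, and the function name is hypothetical.

```python
BYTES_PER_FLOAT = 4      # assumption: 32-bit floating-point storage
RAM_BUDGET_KB = 16.0     # RAM size quoted in the text
TOTAL_MODEL_KB = 13.7    # total model size quoted in the text


def reservoir_matrix_bytes(n_inputs: int, n_reservoir: int) -> int:
    """Size of W, assuming n_reservoir rows and n_inputs + 1 columns."""
    return n_reservoir * (n_inputs + 1) * BYTES_PER_FLOAT


w_bytes = reservoir_matrix_bytes(n_inputs=51, n_reservoir=50)
w_kb = w_bytes / 1000                     # 1 kB taken as 1000 bytes
remaining_kb = TOTAL_MODEL_KB - w_kb      # footprint once W is not stored

print(f"W: {w_bytes} bytes (~{w_kb:.1f} kB)")
print(f"without W: ~{remaining_kb:.1f} kB")
print(f"fits in {RAM_BUDGET_KB:.0f} kB with W stored: {TOTAL_MODEL_KB < RAM_BUDGET_KB}")
```

Under these assumptions W alone accounts for about three quarters of the footprint, so not storing it reduces the requirement from ~13.7 kB to ~3.3 kB.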
With recent advancements in information and communication technologies due to the adoption of IoT technology, smart health monitoring and support systems have a higher development and acceptability margin to improve wellness [67,68]. The integration of medical technologies into IoT is called the Internet of Medical Things (IoMT) [69].
In this context, the availability of low-cost, single-chip microcontrollers and advances in wireless communication technology have encouraged researchers to design low-cost embedded systems for healthcare monitoring applications [67]. Doctors can use patients' data to remotely monitor their physiological health status and diagnose their disorders [68]. In a study designed for mobile health applications, Hu et al. [70] used various graphical biosensors to monitor conditions, such as heart attack, brain problems (seizures, mental disorders, etc.), and high blood pressure. In a study for a similar purpose, Vizbaras et al. [71] reported that the stretching and bending vibrations of various chemical bonds are molecule-specific. Therefore, certain infrared spectral ranges are of particular interest in biomedical sensing. In addition, this approach can be used to selectively detect important biomolecules, such as glucose, lactate, urea, ammonia, serum albumin, and so on. Clifton et al. [72] demonstrated the use of wearable sensors for routine healthcare in their study of the large-scale clinical adoption of "intelligent" predictive monitoring systems.
Mobile sensors for the measurement of routine blood parameters to be used in the real-time detection of various diseases are being developed rapidly with the advancements of technology [73][74][75][76]. The RBV values can be measured using a low-cost, mobile microscope, an ocular camera, and a smartphone [73]. Chan et al. [74] determined PT and INR blood values by monitoring the micro-mechanical movements of a copper particle in a proof-of-concept using the vibration motor and camera of a smartphone. Farooqi et al. [75] followed diabetic patients with telemonitoring and Bluetooth-enabled self-monitoring devices and produced new solutions for the glycemic control of the patients. Zhang et al. [76] determined various biochemical parameters by electrochemical controls.
In the future, the data can be obtained in real time and used to provide immediate medical advice before the health problems of the patients occur and progress. The technique presented in this study can be used to create mobile health monitoring systems.
The output of the LogNNet model can be used in different scenarios. The presented feature selection method can be used in conjunction with molecular testing to obtain high sensitivity and certainty regarding suspected cases. In this way, more positive patients can be identified, isolated, and treated in a timely manner. Likewise, the outputs of our model can be used while the results of other tests are awaited. The results of this study demonstrated that the LogNNet neural network model can be used with high productivity for clinical decision support systems and mobile diagnostics.
Various independent biomarkers used in the study need to be tested in the diagnosis and prognosis of other infectious diseases. The low number of ICU patient groups compared to the non-ICU group was one of the limitations of this study.

Conclusions
Determining the mild or severe infection status of COVID-19 patients using various diagnostic tests and imaging results can be costly and time-consuming, and is subject to different complications during the process. In this case, the patient's health may be at higher risk and health services may face tragic situations under intense pressure. This study provides a fast, reliable, and economic alternative mobile tool for the diagnosis and prognosis of COVID-19 based on the RBV values measured only at the time of admission to the hospital.
In this study, the most effective RBVs in the diagnosis and prognosis of COVID-19 were determined using a feature selection method for the LogNNet reservoir neural network. The most important RBVs in the diagnosis of the disease were MCHC, MCH, and aPTT. The most important RBVs in the prognosis of the disease were ESR, NEU, CRP, albumin, and RBC. The LogNNet deep neural network model accurately and precisely detected almost all COVID-19 patients using only a few RBV features.
The health and sickness status of individuals could be determined largely accurately using threshold levels of the LDL, HDL-C, Cholesterol, Triglyceride, MCHC, and Amylase features. In addition, the LogNNet neural network revealed that the performance of the combination of MCHC and MCH and the combination of MCHC and HDL-C in the detection of sick and healthy individuals was higher than the individual performances of these features.
Threshold levels of the NEU, WBC, CRP, Urea, Albumin, and Calcium main features were found to be significant in the detection of severely and mildly infected patients. As revealed by the LogNNet network, the combination of ESR, NEU, CRP, Albumin, RBC, Chlorine, and RDW features is an important source of variation in the prognosis of COVID-19. We propose to use this combination of the features with LogNNet as important biomarkers in the prognosis of the disease and in identifying patients in need of intensive care.
The results of this study can be effectively used in medical peripheral devices of the IoT (IoMT) with low RAM resources, including clinical decision support systems, remote internet medicine, and telemedicine.

Institutional Review Board Statement:
The dataset used in this study was collected in order to be used in various studies in the estimation of the diagnosis, prognosis and mortality of COVID-19. The necessary permissions for the collected dataset were given by the Ministry of Health of the Republic of Turkey and the Ethics Committee of Erzincan Binali Yıldırım University. This study was conducted in accordance with the 1989 Declaration of Helsinki. Erzincan Binali Yıldırım University Human Research Health and Sports Sciences Ethics Committee Decision Number: 2021/02-07.

Informed Consent Statement:
In this study, a dataset including only routine blood values, RT-PCR results (positive or negative) and treatment units of the patients was downloaded retrospectively, in digital form, from the information system of our hospital. No new samples were taken from the patients. There is no information in the dataset that includes identifying characteristics of individuals. It was stated that routine blood values would only be used in academic studies, and written consent was obtained from the institutions for this. Therefore, written informed consent was not obtained from each individual patient.

Data Availability Statement:
The data used in this study can be shared with the parties, provided that the article is cited.

Acknowledgments:
We thank the management of Erzincan Mengücek Gazi Training and Research Hospital for their support in reaching the material used in this study. Special thanks to the editors of the journal and to the anonymous reviewers for their constructive criticism and improvement suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.