The Role of Artificial Intelligence, MLR and Statistical Analysis in Investigations about the Correlation of Swab Tests and Stress on Health Care Systems by COVID-19

Abstract: The outbreak of the new Coronavirus (COVID-19) pandemic has prompted investigations on various aspects. This research aims to study the possible correlation between the number of swab tests and the trend of confirmed cases of infection, while paying particular attention to the sickness level. The study is carried out in relation to the Italian case, but the result is of more general importance, particularly for countries with limited intensive care unit (ICU) availability. The statistical analysis showed that, as the number of tests increased, the trend of home isolation cases was positive, whereas the trends of mild cases admitted to hospitals, intensive care cases, and daily deaths were all negative. The result of the statistical analysis provided the basis for an AI study using an artificial neural network (ANN). In addition, the results were validated using a multivariate linear regression (MLR) approach. Our main result was to identify a statistically significant reduction of pressure on the health care system due to an increase in tests. The relevance of this result is not confined to the COVID-19 outbreak, because the high demand for hospitalizations and ICU treatments due to this pandemic has an indirect effect on the possibility of guaranteeing adequate treatment for other high-fatality diseases, such as cardiological and oncological ones. Our results show that swab testing may play a significant role in decreasing stress on the health system. Therefore, this case study is relevant, in particular, for plans to control the pandemic in countries with a limited capacity for admissions to ICUs.


Introduction
The COVID-19 pandemic has been studied extensively from several different perspectives. Virologic studies [1] have investigated the virus genome, proving its natural origin and studying its mutations. Sociological and economic studies have considered changes in lifestyle and economic and sustainable development effects [2][3][4][5] in correlation with nonpharmacological actions aimed at controlling the pandemic spread, such as lockdown [6,7], border closure, and social distancing. Other studies focused on the mechanisms and times of incubation [8], resistance, dynamics of transmission [9][10][11], environmental factors, and conditions like the role of UV and the climate [12][13][14][15]. A large number of studies have been dedicated to statistical assessment methods and modeling of the outbreak [...] two periods, we first calculated the total average number of daily swabs and then looked for the date when the number of daily swabs exceeded that average value for three consecutive days. That date was selected for the splitting.
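The splitting rule described above can be illustrated with a minimal sketch. The function name `find_split_date`, the sample counts, and the choice of the first day of the three-day run as the split date are assumptions made for illustration; the text does not state which of the three days was taken, and the counts below are not the Italian data.

```python
from datetime import date, timedelta

def find_split_date(daily_swabs, run_length=3):
    """Return the first day of the first run of `run_length` consecutive days
    whose swab count exceeds the average over the whole period.

    daily_swabs: list of (date, count) pairs in chronological order.
    Returns None if no such run exists.
    """
    counts = [count for _, count in daily_swabs]
    average = sum(counts) / len(counts)
    run = 0
    for i, (_, count) in enumerate(daily_swabs):
        run = run + 1 if count > average else 0
        if run == run_length:
            # Assumption: the split date is the first day of the run.
            return daily_swabs[i - run_length + 1][0]
    return None

# Made-up illustrative counts (hypothetical, not taken from the paper).
start = date(2020, 3, 1)
counts = [10, 12, 30, 11, 13, 28, 29, 31, 35, 40]
series = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]

split = find_split_date(series)
first_period = [(d, c) for d, c in series if d < split]
second_period = [(d, c) for d, c in series if d >= split]
```

In this toy series the single above-average day on 3 March does not trigger the split, because the rule requires three consecutive days above the average.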
An appropriate tool to analyze situations with several variables and to monitor performance is provided by AI [34,35]. Therefore, as the next step, machine learning with an Artificial Neural Network (ANN) was used to investigate the correlations between the output variables of the statistical analysis. For the machine learning, the variables were selected based on the results of the statistical analysis, considering those that had the most significant relationship with the number of daily tests (Swab Tests). To validate the results of the ANN model, an MLR (Multivariate Linear Regression) with the same variables as the ANN model was used, and a prediction equation was provided. The flowchart of our analysis is presented in Figure 1.

Case Study
In this study, we analyzed the relevant datasets of six regions in Italy. Moreover, the analysis was also carried out for the whole of Italy. The location of the six regions is shown in Figure 2, and the details of the selected case studies are presented in Table 1.
Our analysis covered the period from the beginning of March to the end of April. The outbreak in four out of the six regions, namely Lombardy, Piedmont, Emilia-Romagna, and Veneto, was more substantial. The regions of Campania and Sicily were selected because of their high population.
Our purpose is to study the effect of an independent variable on one or more dependent variables and to assess the possible existence of a relationship between them. An efficient statistical technique to achieve this is to evaluate a coefficient, the beta coefficient, which measures the degree of change in a dependent variable (outcome variable) in correspondence with a 1-unit change in the independent variable (predictor variable). The beta coefficient can be negative or positive. This quantitative evaluation only makes sense if the existence of a significant relationship between the two variables has been previously assessed. This is established by evaluating another parameter, the p-value: a p-value smaller than 0.05 (corresponding to a confidence level of 95%) indicates the existence of a significant correlation between the considered variables [62].

The Set-Up of the SPSS Model
To carry out our statistical analyses, SPSS was used to study the following variables: the number of daily swabs (Swabs) as the independent variable, and Home isolation, Mild hosp., Int. care hosp., Daily deaths, and Daily new cases as the dependent variables. The analysis method proceeded as discussed in the previous section.
• Use of the Independent-Samples t-test: The Independent-Samples t-test is used to test the existence of significant differences (with a confidence level of 95%) between the averages of two datasets. We recall that our dataset was divided into two subsets based on comparing the daily number of tests with its average over the full period. The t-test was applied both to the independent variable (Swabs) and to the dependent variables (Home isolation, Mild hosp., Int. care hosp., Daily deaths, and Daily new cases) to determine whether the corresponding data differed significantly between the two periods.
• Evaluation of the p-value: If the p-value resulting from the comparison of the independent variable (Swabs) and each dependent variable (Home isolation, Mild hosp., Int. care hosp.) was less than 0.05, the relationship between the two variables was considered to be significant, and we proceeded to the calculation of the beta coefficient. If, instead, the p-value was larger than 0.05, the significance of the relationship was excluded.
• Evaluation of the beta coefficient: A positive (negative) value of this coefficient meant that, for every 1-unit increase in the predictor variable (the swab number in our case), the outcome variable increased (decreased) by an amount equal to the absolute value of the beta coefficient.
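The two quantities used in these steps can be sketched in a few lines. This is only an illustration under stated assumptions: the t statistic below uses the pooled (equal-variance) form, the beta coefficient is taken to be the unstandardized least-squares slope (which matches the "1-unit change" description), and all function names and numbers are hypothetical. The p-value itself is not computed here; it would be read from the t distribution by a statistical package such as the one used in the study.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t_statistic(a, b):
    """Independent-samples t statistic, assuming equal variances in both groups.

    The matching p-value comes from a t distribution with
    len(a) + len(b) - 2 degrees of freedom (not computed in this sketch).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * sample_variance(a)
              + (nb - 1) * sample_variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def beta_coefficient(x, y):
    """Unstandardized least-squares slope: change in y per 1-unit change in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical values, for illustration only.
first_period = [1.0, 2.0, 3.0]
second_period = [4.0, 5.0, 6.0]
t = pooled_t_statistic(first_period, second_period)

swabs = [1, 2, 3, 4, 5]
intensive_care = [10, 8, 6, 4, 2]
beta = beta_coefficient(swabs, intensive_care)  # negative: outcome falls as swabs rise
```

A negative `beta`, as in this toy example, corresponds to the case in which the outcome variable decreases as the number of swabs increases.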

Correlations between Swabs and the Other Variables in the Whole of Italy
Our first analysis referred to the global country dataset. The results of the correlations between the swab number and the output variables in the two periods are presented in Tables 2 and 3.

As shown in Table 2, in Italy, the average number of the independent variable (swabs) increased significantly, by a factor of 3.12. This was compared with the variation of the outcome variables in the same period. This increase was accompanied by a significant increase in the average number of home-isolated patients (by a factor of 4.57), Mild Hosp cases (by a factor of 2.00), and Int Care Hosp cases (by a factor of 1.49). Instead, as indicated by the corresponding p-values, the increase (by a factor of 1.29) in the average number of Daily Deaths and the decrease (to 0.98 times the first-period value) in the average number of Daily New Cases were not significant.

Information 2020, 11, 454

As for the relationship between the daily swabs and the outcome variables, Table 3 shows that, during the first period (1 March to 31 March), it was significantly positive for each of the five variables. During the second period (1 April to 30 April), the correlation with Home isolation remained significantly positive, those with Mild hosp., Int. Care hosp., and Daily deaths became significantly negative, and that with Daily new cases, although it turned negative, was not significant.

Correlations between Swabs and Other Variables at the Regional Level
The results at the country level prompted us to check whether the same trends were also present at the local level in the six regions that we considered. We note that, together, these six regions account for 57.3% of the country's population.
Below, we present the results of the regional analysis of the correlations between the number of swabs and the outcome variables in two different periods, determined through the procedure described in Section 2. Tables 4 and 5 refer to Lombardy, Veneto, and Piedmont, for which data are available from 1 March, and Tables 6 and 7 refer to Emilia-Romagna, Campania, and Sicily, for which the first day with available data was different.

To summarize the t-test results of Table 4, we can say that:
• In the three regions, there was a remarkable increase in the average number of daily tests: by a factor of 2.48 in Lombardy, 2.64 in Veneto, and 3.69 in Piedmont.
Against this increase:
• The average number of Home isolation and Mild hospital cases increased significantly in all three regions.
• The average number of Int Care Hosp cases also increased everywhere, although not significantly in Piedmont.
• Daily New Cases exhibited a general decrease that was not significant.
• Daily Deaths decreased (not significantly) in Lombardy and increased significantly in Piedmont and Veneto.

The correlation analysis of Table 5 can be summarized as follows:
• In the first period, in each of the three regions, the global Italian result was confirmed, with a significantly positive correlation between the number of swabs and each outcome variable.
• In the second period, whereas for Home isolation there was a positive correlation that was only significant in Lombardy, all the other correlations generally turned negative in a significant way, the only exceptions being Daily new cases in all three regions and Daily deaths in Veneto and Piedmont.

To summarize the t-test results of Table 6, we can say that:
• In the three regions, there was a remarkable increase in the average number of daily tests: by a factor of 2.18 in Emilia-Romagna, and by a factor on the order of 3.5 in Campania and Sicily, although, in these last two regions, the average in the first period was relatively low.
Against this increase:
• The average number of Home isolation and Mild hosp. cases increased significantly in all three regions.
• The average number of Int Care Hosp cases also increased everywhere, but the increase was only significant in Emilia-Romagna.
• As for Daily New Cases, no significant variation was observed in Campania, in contrast to Emilia-Romagna and Sicily, where there was a significant decrease.
• The variations in Daily Deaths were not significant.

The correlation analysis of Table 7 can be summarized as follows:
• In the first period, in each region, the global Italian result was confirmed, with a significantly positive correlation between the number of swabs and each outcome variable. Sicily is the only partial exception, since the relationship for Int. Care Hosp, although positive, did not reach a significant level.
• In the second period, there was a general turn from a positive to a negative relationship in Emilia-Romagna, but without reaching a significant level. In Campania, Home isolation was positively, but not significantly, correlated, whereas all the other variables exhibited a significantly negative correlation. In Sicily, we found two significant correlations (Home isolation, positive; and Int. Care Hosp, negative) and no significant relationship for the remaining three variables.

The Model Using ANN
As the results of Table 2 show, the number of Total home isolation, Total mild hospital, and Intensive care in the hospital changed significantly with the number of swabs, whereas for Daily new cases, this only happened for the first period. Therefore, the variables we selected for the AI analysis were Daily new cases, Total home isolation, and Total mild hospital. The dependent variable was the Total intensive cases in the hospital.
In this section, we investigate this possible correlation using the ANN method in order to analyze the Intensive care number. The selected ANN technique is based on Generalized Feedforward Networks (GFNs). GFNs are a generalization of Multi-Layer Perceptrons (MLPs) in which connections can jump over one or more layers [63,64]. In theory, an MLP can solve any problem that a GFN can solve, but, in practice, GFNs are much more efficient because they require a smaller number of training epochs [65]. There are no specific rules for defining the control parameters, which are mostly chosen on the basis of previous studies and experts' opinions [66,67]. In this study, the selected control parameters were:

• The hidden layers of the ANN considered in the analysis: 1, 3, 5, 10, 15, and 20;
• The maximum iteration values considered: 20, 40, 70, 100, 120, and 140;
• The mean squared error (MSE) for the evaluation of the performance;
• The split of the dataset: 70% for training, with the rest for validation (15%) and testing (15%).
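The last two control parameters can be illustrated with a short sketch of a 70/15/15 split and an MSE computation. The function names are hypothetical, and the random shuffling with a fixed seed is an assumption: the text does not say whether the observations were shuffled or split in chronological order.

```python
import random

def split_dataset(rows, train=0.70, validation=0.15, seed=0):
    """Split rows into training, validation, and testing subsets.

    Assumption: rows are shuffled before splitting; whatever remains after
    the training and validation shares goes to the testing subset.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical dataset of 100 observations, identified by index.
train_set, validation_set, test_set = split_dataset(list(range(100)))

# Hypothetical observed and predicted values.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Fixing the seed keeps the split reproducible, so that models trained with different numbers of hidden layers or iterations are compared on the same subsets.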
After an initial analysis based on the control parameters and trial and error, the best ANN model was constructed. The structure and details of the model are shown in Table 8. The best cost obtained in each iteration is shown in Figure 5. The best cost reflects the performance function of the algorithm, and it depends on the error values in each iteration. According to Figure 5, the best cost (about 0.002) was reached after about 35 iterations, at which point the model achieved convergence. The ANN model first needs to be trained and validated. The results of the training, validation, and testing of the ANN algorithm are presented in Table 9. The comparison between the predicted values and the real data in the testing stage is displayed in Figure 6. Extra details about the model are presented in Appendix A.

The impact of each variable on the Intensive cases in the hospital, based on the training dataset, is shown in Figure 7. According to Figure 7, Mild hospital clearly dominates Home isolation in terms of its impact on the number of Intensive care hospital cases (similar results were achieved for the testing and validation datasets, as presented in Appendix A). The results and the values predicted by the developed ANN model show that AI is a powerful tool for the study of our problem, as confirmed by the value of the final $R^2$, and they confirm the results of the statistical analysis of the correlations between swabs and the output variables.
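For reference, the coefficient of determination used to judge the agreement between predicted and observed values can be computed as in the following sketch. The function name and the numbers are hypothetical and are not taken from Table 9.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted ICU counts, for illustration only.
observed_icu = [100, 150, 200, 250]
predicted_icu = [110, 140, 205, 245]
r2 = r_squared(observed_icu, predicted_icu)
```

A value close to 1 means that the residual error is small compared with the variability of the observed series.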

Discussion
In this section, we summarize the results of the previous section. The first part of our statistical analysis was based on the t-test. In Italy, during the period we analyzed (March-April), there was a significant change in the number of swab tests. At the country level, one can say that there was a first period (March) with a country average of about 15,000 tests per day and that this average grew more than three times in the following month. Analogous increases in the number of tests were observed at the regional level, although they were probably also sensitive to the actual level of development of the pandemic in the specific region.
This preliminary analysis confirms that there was a significant change in the application of these tests. This supports the research hypothesis that a significant change in the number of swab tests might be reflected in the indicators that describe the level of development of the pandemic. To test this hypothesis, we carried out a statistical analysis of the possible correlation between the number of tests and contagious people, recovered ones, and deaths. At the national level, the hypothesis was confirmed, with the exception of daily new cases. These correlations are generally also observed at the regional level, although minor deviations may occur. In particular, these deviations seem to be sizeable in Emilia-Romagna and Veneto, and this may find an explanation in the improvement of the situation in these two regions during April [19]. This also happened in relation to the number of deaths. As might have been expected, we found that, with the increase of the number of tests in the second period, the correlation between the death rate and the number of swab tests diminished considerably as compared to the first period. Instead, significant correlations were generally found for the three variables that were of the utmost importance to health system stresses, relating to the total number of cases and a less advanced level of development of the sickness. The evidence was stronger in the regions where the pandemic spread more.
Of course, this discussion must take into account the multiple reasons that lead one to apply the test. This means that, in any case, only a fraction of those tests that are applied can be related to the variables we studied, but this can be estimated to only be a second-order effect.
The last part of our analysis had a different purpose. Artificial Intelligence is a powerful tool for analyzing large amounts of data in order to detect possible correlations between them. We applied the ANN technique to study one of the most critical aspects of the pandemic, the admission to ICUs. We found an excellent agreement with the observed number of ICU admissions. This aspect must not be underestimated when one considers that the stress on ICUs is probably one of the most critical factors to consider when planning a response to the pandemic [68,69].
In the specific case of Italy, when the peak shown in Figure 5 was reached, the value of ICU hospitalizations was close to the health system's response capacity, although a 40% increase of available ICUs was planned for a couple of weeks later [70]. We recall that the Italian ICU capacity was then on the order of 120 ICUs per million inhabitants, and it must be underscored that this is a high value when compared with other countries, as presented in Table 10.

A validation of our results on the role of swabs was provided by determining a prediction equation, which was obtained using MLR. We already pointed out the usefulness of the swabs in identifying infected people without symptoms, who would then be home-isolated. The variables for our MLR analysis were the total number of Home isolation cases ($x_1$), that of mild condition admissions to a hospital ($x_2$), and that of Daily new cases ($x_3$). These variables were used as independent variables to determine a dependent variable, $y$, the total number of patients admitted to the ICUs of a hospital.
The prediction equation we obtained was:

$$y = 77.751 - 0.020\,x_1 + 0.164\,x_2 + 0.38\,x_3, \qquad R^2 = 0.995 \tag{1}$$

Table 11 provides the details of the analysis, and the values of the beta coefficients showed that the highest-impact variable was $x_2$, followed by $x_1$. We note that the increase in the number of home isolation cases, which we found to be correlated to swab tests, was associated with a decrease in the total number of patients admitted to hospital intensive care units.
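As a usage illustration, Equation (1) can be evaluated directly. The function name and the input values below are hypothetical; only the coefficients come from the equation reported above.

```python
def predict_icu(home_isolation, mild_hospital, daily_new_cases):
    """Evaluate Equation (1): y = 77.751 - 0.020*x1 + 0.164*x2 + 0.38*x3."""
    return (77.751
            - 0.020 * home_isolation
            + 0.164 * mild_hospital
            + 0.38 * daily_new_cases)

# Hypothetical inputs, chosen only to show the sign of each contribution.
baseline = predict_icu(10_000, 5_000, 1_000)
more_home_isolation = predict_icu(11_000, 5_000, 1_000)

# With the other variables held fixed, 1000 additional home isolation cases
# lower the predicted ICU occupancy by about 0.020 * 1000 = 20 patients.
difference = more_home_isolation - baseline
```

The negative coefficient of the home isolation term is what links an increase in home-isolated cases to a lower predicted number of ICU patients in this linear model.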

Conclusions
We studied the statistical correlation between the number of applied swab tests and the most important indicators of the level of development of the COVID-19 pandemic in Italy and six Italian regions. Four regions were chosen based on the diffusion of the pandemic, and the remaining two were chosen based on the size of their populations.
The nation-wide statistical analysis confirmed the research hypothesis that a significant change in the number of swab tests might be reflected in the indicators that describe the level of development of the pandemic. The analysis showed the advantages of increasing the number of swab tests: as the number of tests increased, the trend of home isolation cases was positive, whereas the trends of mild cases admitted to hospitals, intensive care cases, and daily deaths were all negative.
The statistical analysis was accompanied by AI techniques and validated through MLR. We applied the ANN technique to study one of the most critical aspects of the pandemic, the admission to ICUs, and the results showed an excellent agreement with the observed number of ICU admissions. The results and the values predicted by the developed ANN model show that AI is a powerful tool for studying our problem. Additionally, they confirm the results of the previous statistical analysis of the correlations between swabs and the output variables.
A validation of our results was provided by determining a prediction equation using the multivariate linear regression (MLR) approach. The results of MLR showed that the increase in the number of home isolation cases was associated with a decrease in the total number of patients admitted to hospital intensive care units. This could be correlated to an increase of swab tests, since identifying infected people (even those without symptoms, who would then be home-isolated) would lead to a decrease in new contagions.
In conclusion, swab testing may play a significant role in decreasing stress on the health system. Therefore, this case study is particularly relevant for plans to control the pandemic in countries with a limited capacity for admissions to ICUs. The relevance of these results is not confined to the COVID-19 outbreak, because the high demand for hospitalizations and ICU treatments resulting from this pandemic has an indirect effect on the possibility of guaranteeing adequate treatment for other high-fatality diseases.
For future studies, we recommend examining the impact of delays in swab test results in different countries, since in some countries the results are provided after several days, which affects proper home isolation. In addition, the use of other ANN algorithms or other machine-learning techniques may improve our results and is recommended for future studies.