Investigating a Serious Challenge in the Sustainable Development Process: Analysis of Confirmed Cases of COVID-19 (New Type of Coronavirus) Through a Binary Classification Using Artificial Intelligence and Regression Analysis

Abstract: Nowadays, sustainable development is considered a key concept and solution in creating a promising and prosperous future for human societies. Nevertheless, the process faces both predicted and unpredicted problems, among which epidemic diseases are real and complex. Hence, in this research work, a serious challenge in the sustainable development process was investigated through the classification of confirmed cases of COVID-19 (a new version of Coronavirus), one of these epidemic diseases. Binary classification modeling was carried out with the group method of data handling (GMDH) type of neural network, one of the artificial intelligence methods. For this purpose, the Hubei province in China was selected as a case study to construct the proposed model


Introduction
The concept of sustainable development has emerged in the policies of major governments as a new concept, process, and undeniable fact, and it plays a key role in the development of human societies [1][2][3]. Sustainable development is, generally, a combination of three goals (social, economic, and environmental) in which political goals are also involved. In fact, sustainable development is the advancement of the quality of all aspects of life of today's generation without creating negative impacts on the lives of future generations. While it may seem easy to create sustainability in theory, the process of sustainable development faces many unforeseen problems and obstacles that slow it down. One of the anticipated problems is the emergence of epidemic diseases, which not only have a negative impact on the economy but also cause social problems; both areas are fundamental to sustainable development. Although such an epidemic is a temporary and transient problem, it has the potential to disrupt the process, with adverse effects that can last for years. COVID-19 (a new version of Coronavirus) is one of the newest and most serious challenges facing governments, and there has not been much research into this problem [4][5][6]. Although the understanding of COVID-19 is limited, interim guidance on laboratory biosafety was introduced by the World Health Organization (WHO) [7]. Kampf et al. investigated the persistence of coronaviruses on inanimate surfaces and ways to deal with it. They found that coronaviruses can persist for up to nine days, and that some disinfectants, such as 62%-71% ethanol, 0.5% hydrogen peroxide, or 0.1% sodium hypochlorite, can be very efficient in dealing with this virus [8]. Lai et al. evaluated the outbreak of COVID-19 and its challenges. Based on their results, they made some recommendations for the prevention of further outbreaks of the virus [9].
In another study, Kampf investigated the role of inanimate surfaces in the outbreak of coronaviruses. Based on the obtained results, he provided some recommendations about the use of surface disinfection to prevent further viral spread [10]. Telles conducted an overview of the behaviors of viruses. Some datasets were investigated, and the obtained results show that dynamic mathematical modeling is essential for predicting the behaviors of viruses [11]. The role of media coverage on the public was evaluated by Wen et al. Their results indicated that misleading and biased media coverage could have a negative impact on individuals' mental health [12]. Chen et al. carried out an investigation for predicting the number of confirmed cases of COVID-19. They evaluated the trend of transmission and recovery rates over time, and a mathematical model was proposed. The obtained results show that the proposed model performed suitably in predicting the confirmed cases [13]. In another study, the trend of the COVID-19 outbreak in China was estimated by Li and Feng. Their results show that rapid and dynamic strategies can be useful to diminish and constrain the current crisis [14]. Chinazzi et al. evaluated the effect of travel constraints on reducing the COVID-19 outbreak. The obtained results indicated that travel restrictions are highly effective in reducing the spread of this new coronavirus [15].
It is worth mentioning that studies of the impact of temperature on virus spread and survival show different results. There have been some investigations on the effects of environmental parameters on epidemic diseases, too. The flu virus spreads quickly in cold and dry conditions, while it is completely inactive at temperatures above 30 °C [16,17]. However, the epidemic of one type of coronavirus, MERS-CoV, occurred between April and August, which meant that the virus spread quickly under warm temperatures, low wind speed, low relative humidity, and a high ultraviolet index [18].
A review of previous studies shows that very few researchers have addressed the challenges of COVID-19 (new version of Coronavirus) in the sustainable development process. Moreover, the unpredictability and complexity of these issues require highly capable approaches, such as artificial intelligence methods, to understand them. Hence, due to the importance of the subject, the present study investigated the feasibility of artificial intelligence in the classification of confirmed cases of COVID-19, which is a severe challenge in the sustainable development process and an imperative task. In addition, statistical analysis was carried out, and the obtained results were discussed. It is worth mentioning that this type of analysis has not been used in previous research.

Materials and Methods
Two approaches were used in the current study, as follows:
• The possible correlations among the trends of confirmed cases in different case studies were investigated, and then a binary classification model was constructed to predict and classify using the group method of data handling (GMDH) algorithm based upon some critical factors. The maximum, minimum, and average temperature, the density of the city, relative humidity, and wind speed were considered as the input dataset, and the number of confirmed cases over 30 days was selected as the output dataset.
• Regression analysis was used to analyze the trend of the confirmed cases of COVID-19 in the five provinces with the highest numbers of confirmed cases (Hubei, Guangdong, Henan, Zhejiang, and Hunan), and the daily fluctuations of confirmed cases were compared with the fluctuations of weather parameters.

Conditions of analysis
• The environmental and urban parameters in the analysis included density, sex ratio, average age, elevation, maximum, minimum, and average temperature, relative humidity, and wind.
• For daily analysis of the possible trend between confirmed cases of COVID-19 and environmental factors, the data of Hubei province was used.
• The climate data is based on the stations situated in the capital of the provinces or regions because the population is generally higher in these areas.
• The analysis period was from 28 January 2020 to 26 February 2020 (30 days).
• The analysis of the possible correlations among trends of confirmed cases in different case studies was based on the average values in one month.

Case Study
To carry out the analysis of correlation among environmental factors and confirmed cases of COVID-19, a dataset covering 42 case studies (provinces and regions) in China, Japan, South Korea, and Italy was used. The selected case studies can be seen in Table 1; they were chosen based on the highest numbers of confirmed cases of COVID-19 and on data availability, as shown in Figure 1. It is worth mentioning that the quarantine on travel in and out of Wuhan and the suspension of flights, trains, public buses, and the metro system began on 23 January 2020, and the same measures began on 24 January 2020 in 15 cities in Hubei [19,20]. The estimated incubation period of COVID-19 is about 2-14 days [20].

Group Method of Data Handling (GMDH)
Artificial intelligence includes a wide range of methods and algorithms that work based on machine intelligence and has many applications in various fields of science [40], including fuzzy logic theory and application [41][42][43][44][45][46], artificial intelligence techniques and sociology [47][48][49], risk assessment and hazard identification [50,51], machine learning [52][53][54][55][56][57], and meta-heuristic algorithms and clustering techniques [58][59][60][61][62]. The group method of data handling (GMDH) type of neural network, proposed by Ivakhnenko [63], is one of these algorithms. GMDH is a self-organizing algorithm that has been used successfully for pattern recognition, optimization of complex systems modeling, and prediction problems, and it is also called the polynomial of the Ivakhnenko equation [63,64]. This algorithm can predict the value of $\hat{y}_i$ from an approximate function $\hat{f}$ for each input vector $X$, as shown in Equation (1). The basic form of the relation between input and output data can be expressed as a discrete type of the Volterra functional series, referred to as the Kolmogorov-Gabor polynomial. Equation (2) shows the underlying neural network map, which is also called the polynomial of Ivakhnenko [65,66].

$$\hat{y}_i = \hat{f}(X_{i1}, X_{i2}, X_{i3}, \ldots, X_{im}) \tag{1}$$

$$Y = a_0 + \sum_{i=1}^{m} a_i X_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} X_i X_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk} X_i X_j X_k + \cdots \tag{2}$$

where $Y$ is the output, $m$ is the number of data, and $X_1, X_2, X_3, \ldots, X_m$ is the input variables vector. In many cases, the quadratic and bivariate form of this polynomial is used, as in Equation (3).

$$Y = a_0 + a_1 X_i + a_2 X_j + a_3 X_i^2 + a_4 X_j^2 + a_5 X_i X_j \tag{3}$$

Figure 2 shows a schematic of the input and output variables of the GMDH algorithm, where $X = (X_1, X_2, X_3, \ldots, X_m)$ is the input dataset and $Y = (y_1, y_2, y_3, \ldots, y_n)$ is the output dataset. In this algorithm, the input dataset is imported to the initial layer and then, after evaluation and optimization, the output is considered as a new input for the next layer. This process continues as long as layer (n+1) gives a better answer than layer (n) and stops otherwise. As mentioned before, the GMDH algorithm can be applied as a powerful tool to deal with unpredicted and uncertain problems. Hence, a binary classification analysis was done by the GMDH algorithm in the present study [67][68][69].
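The quadratic, bivariate polynomial described above can be illustrated with a short sketch of a single GMDH-style neuron. This is not the implementation used in the study: it assumes that the six coefficients are estimated by ordinary least squares, the function names (`quadratic_features`, `fit_neuron`, `predict_neuron`) are hypothetical, and the data are synthetic.

```python
import numpy as np

def quadratic_features(xi, xj):
    """Design matrix for Y = a0 + a1*xi + a2*xj + a3*xi^2 + a4*xj^2 + a5*xi*xj."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])

def fit_neuron(xi, xj, y):
    """Estimate a0..a5 by ordinary least squares (an assumed fitting rule)."""
    coeffs, *_ = np.linalg.lstsq(quadratic_features(xi, xj), y, rcond=None)
    return coeffs

def predict_neuron(coeffs, xi, xj):
    """Evaluate the fitted quadratic polynomial on a pair of inputs."""
    return quadratic_features(xi, xj) @ coeffs

# Synthetic, noise-free data generated from known coefficients (illustration only).
rng = np.random.default_rng(0)
xi = rng.normal(size=50)
xj = rng.normal(size=50)
true_coeffs = np.array([1.0, 2.0, -1.0, 0.5, 0.25, -0.75])
y = quadratic_features(xi, xj) @ true_coeffs

coeffs = fit_neuron(xi, xj, y)
print(np.round(coeffs, 3))
```

In a full GMDH network, one such neuron would typically be fitted for each pair of inputs in a layer, and the outputs of the best-performing neurons would be passed on as inputs to the next layer.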

Binary Classification Modelling Using GMDH
Before binary classification modeling using the GMDH algorithm, a regression analysis was conducted on the total dataset (see Table A1 in Appendix A) for 42 case studies in four countries: China, Japan, South Korea, and Italy. This analysis shows a low correlation coefficient (R²); hence, it can be concluded that binary classification modeling should be evaluated case by case, and Hubei province in China was selected as the case study for this part of the analysis.
Initially, evaluating the parametric correlation of each independent input dataset is necessary for carrying out reliable modeling [70][71][72][73]. Hence, before modeling, a correlation analysis was conducted using Pearson's correlation coefficient among the dataset for five independent input data: maximum, minimum, and average temperature, relative humidity, and wind speed. It should be noted that the density of the city, although it is an independent input, was not considered when evaluating the parametric correlation because the value of this parameter is constant. The mathematical relations of Pearson's method can be expressed in Equations (4)-(7) [74].
Here, ρ is Pearson's correlation coefficient for two independent parameters, denoted X and Y, and $SP_{Dxy}$ is the covariance between them. The standard deviations of X and Y are denoted $SS_X$ and $SS_Y$. Table 2 shows the results of the correlation coefficient analysis for the five independent input data. According to previous studies, the correlation is considered strong when ρ ≥ 0.8; hence, the obtained results show that the independent input data were correctly selected. Although the correlation coefficient between the maximum and average temperatures is 0.83, this was accepted because it exceeds the threshold only marginally.
Sustainability 2020, 12, 2427 7 of 21
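The pairwise screening of input variables described above can be sketched as follows. The sketch assumes the common sum-of-products form of Pearson's coefficient, ρ = SP / sqrt(SS_X · SS_Y), with SP the sum of products of deviations and SS_X, SS_Y the sums of squared deviations; the function names and the weather values are hypothetical and are not taken from Table 2.

```python
import numpy as np
from itertools import combinations

def pearson_rho(x, y):
    """Pearson's coefficient from deviation sums (assumed SP / sqrt(SS_X * SS_Y) form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sp_xy = np.sum(dx * dy)   # sum of products of deviations
    ss_x = np.sum(dx ** 2)    # sum of squared deviations of x
    ss_y = np.sum(dy ** 2)    # sum of squared deviations of y
    return sp_xy / np.sqrt(ss_x * ss_y)

def correlation_pairs(inputs):
    """Return {(name_a, name_b): rho} for every pair of named input series."""
    return {
        (a, b): pearson_rho(inputs[a], inputs[b])
        for a, b in combinations(inputs, 2)
    }

# Hypothetical daily values, used only to exercise the functions.
inputs = {
    "t_max": [10.0, 12.0, 9.0, 15.0, 14.0, 11.0],
    "t_avg": [6.0, 8.0, 5.5, 10.0, 9.0, 7.5],
    "humidity": [80.0, 75.0, 90.0, 70.0, 72.0, 85.0],
}
pairs = correlation_pairs(inputs)
strongly_correlated = [pair for pair, rho in pairs.items() if abs(rho) >= 0.8]
```

Pairs whose absolute coefficient reaches the 0.8 level mentioned in the text would then be reviewed before both variables are kept as inputs.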
Secondly, determining the control parameters of the algorithm is an important task because they play a key role in the fast convergence of the algorithm. There are no specific relations for most of these parameters, and they are determined based upon previous studies, expert opinion, and trial and error. The selection pressure is dimensionless and has an impact on the sensitivity of the modeling error [75]; it was set to 0.6 based upon the most recent studies. The maximum number of layers and the maximum number of neurons in a layer, however, were chosen based upon expert opinions and trial and error [76]. For this purpose, values of 5, 10, and 15 were considered for the maximum number of layers, and values of 5, 10, 15, 20, and 30 were considered for the maximum number of neurons in a layer. The confusion matrix is used as the measure of accuracy to evaluate the performance of the binary classification model built by the GMDH algorithm. The basic form of a confusion matrix is shown in Figure 3, and Equations (8) and (9) give the mathematical relations of accuracy and error.
According to the maximum number of layers and the maximum number of neurons in a layer, 20 models were constructed in total. Six factors, namely the maximum, minimum, and average temperature, the density of the city, relative humidity, and wind speed, were considered as the input dataset, and the number of confirmed cases over 30 days was chosen as the output dataset. Two classes (labels) were assigned according to the number of confirmed cases: days with fewer than 850 confirmed cases were given label "0", and days with more than 850 confirmed cases were given label "1". The 30 days of data for Wuhan city form the set of cases; for modeling, 75% of the cases were used as training cases and the rest were used as testing cases [77]. The accuracy results of the training and testing models are shown in Table 3.
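The labelling rule and the split sizes described above can be written as a minimal sketch. Two points are assumptions: a day with exactly 850 confirmed cases is given label "0" here, because the text does not say how that value is treated, and the 75% share is rounded up so that 30 cases give 23 training and 7 testing cases, which agrees with the case counts reported for the confusion matrices. The daily counts in the example are hypothetical.

```python
import math

THRESHOLD = 850  # confirmed cases per day, as stated in the text

def label_day(confirmed_cases):
    """Return 1 for more than 850 confirmed cases, otherwise 0.

    Assigning exactly 850 to label 0 is an assumption.
    """
    return 1 if confirmed_cases > THRESHOLD else 0

def split_sizes(n_cases, train_fraction=0.75):
    """Number of training and testing cases, rounding the training share up (assumed)."""
    n_train = math.ceil(train_fraction * n_cases)
    return n_train, n_cases - n_train

# Hypothetical daily counts, used only to show the labelling.
example_counts = [315, 840, 1032, 2447, 849, 851]
example_labels = [label_day(c) for c in example_counts]

n_train, n_test = split_sizes(30)
print(example_labels, n_train, n_test)
```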
After modeling, a simple ranking was conducted to determine the best model, based on the study of Zorlu et al. [78]. The results of this ranking are shown in Table 4. They indicate that the 3rd model has highly acceptable degrees of accuracy and robustness. Figures 4-6 show the confusion matrices for the training, test, and total datasets, respectively.
According to Figure 4, the proposed model correctly predicted all four cases with label "0" (100% accuracy) and correctly predicted 18 of the 19 cases with label "1"; only one case with label "1" was wrongly assigned to label "0". Overall, this model had 95.7% accuracy for the training data. For the test data (Figure 5), the proposed model correctly estimated the two cases with label "0". It also correctly predicted four of the cases with label "1" (80% accuracy), and the overall accuracy for the test data was 85.7%. Consequently, according to Figure 6, it can be concluded that the third model (the proposed model) performed suitably in predicting and classifying the confirmed cases of COVID-19 for all data.
Figure 6. The obtained results of the total confusion matrix.
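The reported percentages can be checked from the counts given in the text, assuming the usual definitions accuracy = (TP + TN) / (TP + TN + FP + FN) and error = 1 - accuracy. In the sketch below, label "1" is treated as the positive class, the single misclassified case in each set is read as a label "1" case predicted as "0" (a false negative), and the 80% figure for the test set is read as four of five label "1" cases; these readings are assumptions, as is the pooled "total" value, which is derived here rather than quoted from Figure 6.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correctly classified cases."""
    return (tp + tn) / (tp + tn + fp + fn)

def error(tp, tn, fp, fn):
    """Share of misclassified cases (1 - accuracy)."""
    return 1.0 - accuracy(tp, tn, fp, fn)

# Counts read from the description of Figures 4 and 5 (label "1" = positive).
train = dict(tp=18, tn=4, fp=0, fn=1)   # 23 training cases
test = dict(tp=4, tn=2, fp=0, fn=1)     # 7 testing cases
total = {k: train[k] + test[k] for k in train}  # pooled, derived here

train_acc = round(100 * accuracy(**train), 1)
test_acc = round(100 * accuracy(**test), 1)
total_acc = round(100 * accuracy(**total), 1)
print(train_acc, test_acc, total_acc)
```

With these counts, the training and testing accuracies reproduce the 95.7% and 85.7% values stated in the text.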

Regression Analysis
The impact of weather parameters on confirmed cases was analyzed with the multiple linear regression (MLR) technique. The analysis is based on the weather data of Wuhan, as presented in Table B1 (see Appendix B). The regression calculations between weather parameters and confirmed cases were carried out for two periods: from 28-Jan to 26-Feb and from 5-Feb to 26-Feb. For this purpose, the maximum, minimum, and average daily temperature, relative humidity, and wind speed were denoted "A", "B", "C", "D", and "E", respectively, and the number of confirmed cases was denoted "Y". Equations (10) and (11) give the regression relations for the first and second periods, respectively. The results show that R² is equal to 0.44 in the first case and 0.65 in the second case, which is an increase. In addition, according to the results of the collinearity diagnostics of the regression analysis from Equation (11), relative humidity, maximum daily temperature, average daily temperature, wind speed, and minimum daily temperature had, in that order, the highest to the lowest share in explaining the changes of the output (confirmed cases). Since the data in the second case start from 5-Feb, which is about 14 days after the Wuhan lockdown on 23-Jan, it seems that the rate of confirmed cases from 5-Feb onward might be affected more by the environmental factors.
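The comparison of the two regression periods can be sketched as follows. This is an illustration under stated assumptions: the model is taken to be an ordinary least-squares fit with an intercept, the function names are hypothetical, and the data are synthetic, so the printed R² values are not those reported in the text. With daily data starting on 28-Jan, 5-Feb corresponds to row index 8, which leaves 22 days in the second period.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b_A, b_B, ...] for y = b0 + X @ b."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta

def r_squared(X, y, beta):
    """Coefficient of determination of the fitted model on (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for 30 days of A..E (weather) and Y (confirmed cases).
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
y = 1000 + X @ np.array([-40.0, 10.0, -25.0, 60.0, 15.0]) + rng.normal(scale=50, size=30)

FIRST_DAY_SECOND_PERIOD = 8  # 5-Feb when row 0 is 28-Jan
r2_full = r_squared(X, y, fit_mlr(X, y))
X_late, y_late = X[FIRST_DAY_SECOND_PERIOD:], y[FIRST_DAY_SECOND_PERIOD:]
r2_late = r_squared(X_late, y_late, fit_mlr(X_late, y_late))
print(round(r2_full, 2), round(r2_late, 2))
```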


The Correlations among the Trends of Confirmed Cases and Weather Parameters
The correlations among the trends of confirmed cases in Hubei and weather parameters are presented in Figures 7-9. It is clear from Figure 7 that the confirmed rate increased from 28-Jan to 5-Feb, which is about 14 days after the start of quarantine, and then decreased with some fluctuations. According to the comparisons, it seems there are some correlations between the fluctuation of weather data (wind, humidity, and average temperature) and the confirmed cases.
The daily fluctuations of confirmed cases and of weather parameters in four other case studies in China are presented in Figures 10-13. The trend observed in Hubei province also appears in these four provinces: the rate increased until 4-Feb and then decreased with some fluctuations.

Discussions
According to the regression analysis, no correlation was found among the different case studies in the four countries, which might be because of the different policies and types of restrictions adopted by each country in confronting the issue. Therefore, the trend should be predicted case by case and based on each country's policy.
The GMDH algorithm performed appropriately in predicting and classifying, with accuracies of 95.7% and 85.7% for the training and testing datasets, respectively. It should be noted that the proposed model and its results are specific to the selected case study and should not be applied directly to other cities. Moreover, problems of data collection and incomplete data could affect the analysis in case studies with insufficient data, where other artificial intelligence algorithms, such as the naive Bayes classifier, can be useful. In addition, other qualitative and quantitative factors may be significant, such as government policies, accessibility of health and hygiene facilities, education level, and food culture.
The regression analysis for the 42 case studies in four countries shows a low correlation coefficient (R²), whereas the binary classification modeling using GMDH shows high accuracy in Wuhan; taken together, these results suggest some unknown pattern between climate factors and confirmed cases of COVID-19. The analysis confirmed that GMDH could be an appropriate technique for predicting and classifying the confirmed cases of COVID-19 and for checking the possible pattern of the dataset. However, the impact of daily temperature, wind, and humidity might appear in the following days rather than on the same day, which could be the reason for the low correlation coefficient in the regression analysis.
The daily analysis in the main case study, Hubei, and in the other four case studies in China shows the positive impact of quarantine in decreasing the number of confirmed cases. In fact, after about 14 days, that is, on 5-Feb, the increasing trend of positive samples stopped. Moreover, R² increases between the regression for the period from 28-Jan to 26-Feb and the regression for the period from 5-Feb to 26-Feb; therefore, it seems that the rate of confirmed cases from 5-Feb onward might be affected more by the environmental factors.
The comparisons among the trends of confirmed cases and daily weather parameters (wind, humidity, and average temperature) show similar fluctuations, which could support the role of weather parameters in the epidemic rate of COVID-19.
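The possibility raised above, that weather affects confirmed cases some days later rather than on the same day, could be examined with a lagged correlation. The following sketch is only an illustration of that idea and was not part of the reported analysis: the function names are hypothetical, the range of lags is arbitrary, and the series are synthetic, constructed so that the cases follow the weather series with a three-day delay.

```python
import numpy as np

def lagged_correlation(weather, cases, lag):
    """Pearson correlation between weather on day t and cases on day t + lag."""
    weather = np.asarray(weather, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if lag < 0 or lag >= len(weather) - 1:
        raise ValueError("lag must be between 0 and len(series) - 2")
    if lag == 0:
        w, c = weather, cases
    else:
        w, c = weather[:-lag], cases[lag:]
    return float(np.corrcoef(w, c)[0, 1])

def best_lag(weather, cases, max_lag):
    """Lag in 0..max_lag with the largest absolute correlation."""
    scores = [abs(lagged_correlation(weather, cases, k)) for k in range(max_lag + 1)]
    return int(np.argmax(scores))

# Synthetic example: cases on day t copy the weather value from day t - 3.
rng = np.random.default_rng(7)
weather = rng.normal(size=30)
cases = np.roll(weather, 3)

print(best_lag(weather, cases, max_lag=7))
```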
The results of the research are in good agreement with similar studies about the impact of environmental parameters on epidemic diseases, such as the quicker spread of flu virus in cold and


Conclusions
In this research study, a serious challenge of sustainable development was investigated using the GMDH algorithm and regression analysis. According to the results, the GMDH algorithm performs appropriately in predicting and classifying the parameters of a case study affected by COVID-19, and the accuracies based on the Wuhan datasets were equal to 95.7% and 85.7% for training and testing, respectively. No correlation was found among the different case study datasets in four countries, which might be due to the different policies and types of restrictions in each country and means that the trend should be predicted case by case. The results of the collinearity diagnostics of the regression analysis demonstrated that relative humidity, maximum daily temperature, and average temperature had, in that order, the highest share in explaining the changes of the output (confirmed cases). The relative humidity (77.9% on average in the case study) had a positive effect on the confirmed cases, and the maximum daily temperature (15.4 °C on average in the case study) had a negative effect. The study shows the positive impact of quarantine in decreasing the number of confirmed cases, which became effective after about 14 days, alongside the impact of environmental factors on confirmed cases of COVID-19 and the role of regression analysis and binary classification using artificial intelligence in such investigations.
Finally, since the analysis shows the impact of the weather parameters on confirmed cases of COVID-19, the development of a prediction model with more datasets is suggested for future studies.

Appendix B