Data Analytics and Mathematical Modeling for Simulating the Dynamics of COVID-19 Epidemic—A Case Study of India

: The global explosion of the COVID-19 pandemic has created worldwide unprecedented health and economic challenges which stimulated one of the biggest annual migrations globally. In the Indian context, even after proactive decisions taken by the Government, the continual growth of COVID-19 raises questions regarding its extent and severity. The present work utilizes the susceptible-infected-recovered-death (SIRD) compartment model for parameter estimation and fruitful prediction of COVID-19. Further, various optimization techniques such as particle swarm optimization (PSO), gradient (G), pattern search (PS) and their hybrid are employed to solve the considered model. The simulation study endorse the efﬁciency of PSO (with or without G) and G+PS+G over other techniques for ongoing pandemic assessment. The key parametric values including characteristic time of infection and death and reproduction number have been estimated as 60 days, 67 days and 4.78 respectively by utilizing the optimum results. The model assessed that India has passed its peak duration of COVID-19 with more than 81% recovery and only a 1.59% death rate. The short duration analysis (15 days) of obtained results against reported data validates the effectiveness of the developed models for ongoing pandemic assessment.


Introduction
The novel coronavirus disease 2019 (COVID- 19) has created one of the biggest challenges in the history of mankind. With over 63.89 million positive cases and 1.48 million deaths as of 1 December 2020 [1], the COVID-19 continues to its deadly killing spree in all corners of the world. The disease caused by SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) is primarily spread between people during close contact mainly via small droplets produced by coughing, sneezing and talking [2,3]. It is the third virus of corona family and other two are SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus) and MERS-CoV (Middle East Respiratory Syndrome Coronavirus) [4]. Compared to the other two viruses, SARS-Cov-2 not only displays unusual epidemiological traits but the number of deaths associated with it also exceeds greatly. This poses a colossal threat to the global public health and economy [5].
The first case was reported in late December 2019 in Wuhan, Hubei, China and after that despite the radical measures taken by various countries to contain it; the number of COVID-19 patients grew exponentially across the globe. As a result of this, World Health Organization (WHO) declared this pneumonia a pandemic on 11 March 2020, a first of its Organization (WHO) declared this pneumonia a pandemic on 11 March 2020, a first of its kind since World War II [6,7]. The symptoms of COVID-19 are around 80% mild (which include fever, cough and shortness of breath with the mean incubation period of 5-6 days), 13.8% severe and 6.2% critical (which include respiratory failure, septic shock, and/or multiple organ failure) [8]. The median time for mild and severe/critical patients from the onset to clinical recovery is approximately 2 weeks and 3-6 weeks respectively [8]. Further, it has been found that the deadly effect of COVID-19 is also associated with demographic conditions such as population, population density, age structure and so forth and pre-existing medical condition of an individual [9,10].
The COVID-19 has triggered one of the largest annual migrations in the world which resulted in a rapid global spread of the virus. Because of this, the epidemic center was shifted to Europe and now to USA which has already more than 117,834 deaths by 14 June 2020 [1]. India has also felt the impact of this global pandemic with the first reported case on 30 January 2020 which gradually increased to 519 positive cases and 40 deaths by 24 March 2020 [11]. In the absence of any vaccine, Indian policymakers, like other countries, have implemented a nation-wide lockdown which after some relaxations still continues as of the submission date of this study on 3 December 2020. The situation in overpopulated countries like India is very threatening as even after 60 days of lockdown exponential growth in the number of infected people is observed. Currently, the ever-infected people in India have already crossed 9.48 million. Moreover, with significantly increased testing rate per million population (which is still much less as compared to others), more number of cases are expected to come. The rate of coronavirus tests performed per million in most impacted countries as of 26 November 2020 is shown in Figure 1. As impact of health on severe economic measures are well known, policymakers in India, like others, are facing a difficult situation when trying to balance between draconian public health actions and keeping the economy alive [12].  26 November 2020 [13].
Due to the lack of previous exposure and still unknown behavior of the novel coronavirus, future prediction of its spread and acquired herd-immunity cannot be properly Due to the lack of previous exposure and still unknown behavior of the novel coronavirus, future prediction of its spread and acquired herd-immunity cannot be properly anticipated. Nevertheless, the epidemiological models could help in understanding the dynamics of COVID-19. Moreover, this will help the authorities in optimum resource sharing including medical infrastructure and administrative service [14]. A number of heuristic models have been proposed to predict COVID-19 outbreak which may be broadly categorized into time series and compartment models. Long term prediction, especially for the current pandemic using time series models (which are mainly based on exponential [14] or logistic [15] curve fitting and are used to describe infection data using scarce (or none) physical meaning parameters), should be avoided. On the other hand, compartment models provide right trade-off between the ease of solution, need of having physical-based parameters and the limited available data of the ongoing epidemic [16,17]. The study of spread of infectious disease using compartment model started in 1927 when Kermack-McKendrick gave their theory [18], leading to the introduction of SIR (Susceptible-Infected-Recovered) model and their successors. In this, the total population of interest, based on the infection status, is subdivided into small compartments (susceptible, exposed, infected, recovered etc.) and the flow between these compartments is governed by ordinary differential equations.
The present investigation utilizes the SIRD model, one of the basic compartment models, to evaluate and forecast the COVID-19 outbreak in India. In order to incorporate some of the proactive actions taken by India, various parameters including dynamic behavior of infection, recovery and death rate have been considered. Although the key parametric values required for modeling are published by WHO from time to time but they represent the global average. Therefore, in this study, these parameters have been estimated using various optimization techniques. Based on the optimum fitting to Indian outbreak, best parametric values have been chosen which are further utilized by the model for predicting the COVID-19 pandemic in India.
The key contributions of this paper are highlighted below: • An efficient forecasting model to forecast the infected, recovered and death cases of the COVID-19 in India based on available epidemiological data. • A maiden attempt to evaluate the effectiveness of PSO, G and PS with their hybrid combinations for prediction in Indian context, to the best knowledge and belief of authors. • Estimation of key parametric values for SIRD modeling in context of India rather than utilizing the median values published by WHO.
The rest of this paper is organized as follows. Section 2 describes the basics of model used in the present study and Section 3 explains the evolutionary kinematic of COVID-19. The stability analysis has been discussed in Section 4. Thereafter, Section 5 enlightens about epidemiological data and its source. In Section 6, the simulation setup and various optimization techniques used have been presented. Section 7 presents a widespread discussion of the findings and at last, the concluding remarks are presented in Section 8.

The SIRD Model
The compartment models are based on the assumption that every individual in the compartment must exhibit same characteristics. The main reason of choosing SIRD model in the present analysis is its simplicity and easy implementation among other compartment models along with high robustness in elucidating the evolution of the pandemic. In SIRD model, the total population of interest is subdivided into Susceptible (S) (number of people that might be infected), Infected (I) (number of people already infected), Recovered (R) (number of people that have been recovered) and Death (D) (number of deaths occurred). The schematic of SIRD model has been illustrated in Figure 2.
The model has been developed with constant population (N = 1.35 billion). The birth rate and death rate are assumed to be same for the period of interest; hence vital dynamics have no impact on total population. Although circumstantial cases are found in the literature for re-infected people but the reinfection rate appears negligible [19,20]. Hence, the probability rate of re-susceptible once recovered from the infection has not been consid-Electronics 2021, 10, 127 4 of 21 ered. The ordinary differential equations describing the evolution of population in each compartment over time for the present SIRD model [6] are reported in Equations (1)- (4).
where, β represents the rate of transmission (infection), α represents the rate of recovery and µ represents the death rate. S is the number of people susceptible to disease. I represents the number of people ever infected. R is the number of people who recovered and D represents the number of people who died. The total population under consideration is represented by N. The model has been developed with constant population (N = 1.35 billion). The birth rate and death rate are assumed to be same for the period of interest; hence vital dynamics have no impact on total population. Although circumstantial cases are found in the literature for re-infected people but the reinfection rate appears negligible [19,20]. Hence, the probability rate of re-susceptible once recovered from the infection has not been considered. The ordinary differential equations describing the evolution of population in each compartment over time for the present SIRD model [6] are reported in Equations (1)-(4).

= −
(1) where, β represents the rate of transmission (infection), α represents the rate of recovery and µ represents the death rate. S is the number of people susceptible to disease. I represents the number of people ever infected. R is the number of people who recovered and D represents the number of people who died. The total population under consideration is represented by N.
A susceptible person becomes infected by coming into ample contact with an infectious person and the probability of being infected is directly proportional to the product of fraction of infected (I/N) and susceptible population. A fraction of infected people can recover whose recovery is directly proportional to the recovery rate and inversely proportional to the average duration of infection as illustrated in Equation (3). Moreover, a fraction of critically infected population may die which is directly proportional to the case fatality rate or lethality of disease and inversely proportional to the average duration between the infection and death as shown in Equation (4). Equations (1)-(4) are not independent because of the consideration of a closed population for diseases with nation-wide outbreak and low lethality rate (like COVID-19) and therefore, they all must follow Equa- A susceptible person becomes infected by coming into ample contact with an infectious person and the probability of being infected is directly proportional to the product of fraction of infected (I/N) and susceptible population. A fraction of infected people can recover whose recovery is directly proportional to the recovery rate and inversely proportional to the average duration of infection as illustrated in Equation (3). Moreover, a fraction of critically infected population may die which is directly proportional to the case fatality rate or lethality of disease and inversely proportional to the average duration between the infection and death as shown in Equation (4). Equations (1)-(4) are not independent because of the consideration of a closed population for diseases with nation-wide outbreak and low lethality rate (like COVID-19) and therefore, they all must follow Equation (5) at any stage.
Hence, values from any of these five equations can be determined using the other four and their initial values are taken from the available data [11].

The Evolutionary Kinematic of COVID-19
Most of the available simple compartment models (SIR [21], SIRD [6] and SEIR [22], etc.) are based on the constant kinematics of β, α and µ which are explained in Equations (1)-(4). Based on these factors, they describe the epidemic growth until a very large population is being infected and achieve herd immunity [23]. However, due to the momentous influence of various suppression strategies (such as social distancing and lockdown) taken by policymakers and new findings on the epidemic, an accurate forecasting based on the growth of its kinematics becomes cumbersome.

The Infection Rate (β)
The infection or transmission rate (β) is generally used as a fitting parameter to describe the epidemic. However, on account of various aggressive mitigation and suppression Electronics 2021, 10, 127 5 of 21 strategies taken by government for the current epidemic, the number of adequate contacts per person per unit time decreases drastically. Hence, in this work, to capture the behavioral changes by mitigation and suppression steps, a time varying β as described in Equation (6) is considered.
As per the previous findings, β depends upon three characteristic values β 0 , β 1 and τ β [24]. There is an initial high value β 0 and a final value β 1 , which for all practical purposes is assumed to be zero considering a long enough isolation period. After lockdown, β decays exponentially from β 0 to β 1 . Finally, τ β represents the settling time and it has been assumed that at t = 3τ β , transmission rate is decreased to 90% of its initial value.

The Recovery Rate (α)
For COVID-19 being a new disease with inadequate known characteristics, the recovery rate may not be constant. It mainly depends upon the policies adopted by various countries, existing health care system, new clinical findings about the disease, ability of doctors and medical staff to quickly learn new therapeutic and unorthodox procedure for detection and treatment and so forth. Hence, the recovery time gradually decreases from its initial high value to a constant value which is inversely proportional to the mean infection period. In this work, the recovery rate (α) has been modeled with a logistic function [25] which is expressed by Equation (7).
where, α 0 represents the initial recovery rate and α 0 + α 1 represents the final recovery rate after τ α duration.

The Death Rate (µ)
Like any other infectious disease, the death rate of any new disease cannot be constant. Usually, it is initially high due to lack of awareness about the disease and hence, only severe cases are detected. However, with increasing awareness and introduction of dedicated treatment methods, the death rate decreases gradually and finally, settles down to a long-term mortality rate. Moreover, with the introduction of various non-pharmaceutical interventions (like social distancing and lockdown), the number of new infected reduces drastically which in turn reduces death rate. Based on this background, a time varying death rate as described by Equation (8) is considered in the present analysis.
where, µ 0 and µ 1 represent the initial and final death rate respectively. Here, it is assumed that after τ µ duration, µ reduces by 90% of its initial value.

The Reproduction Number (R)
The reproduction number, R 0 is a very crucial parameter in any epidemic. Most of the statistical models of COVID-19 are based on R 0 . It basically explains the number of cases each infected case can directly generate considering all individuals susceptible to infection [26]. It is simply a threshold which is used to determine the nature of any disease being an epidemic (R 0 > 1) or not (R 0 < 1). Generally, the higher the value of R 0 , the tougher it is to control the epidemic. For basic compartment model (SIR), R 0 calculated by Equation (9).
The normalized infected population [22] may be calculated by Equation (10).
where, i represent the normalized infected population. R 0 for SIRD model can be calculated from Equation (11) which is obtained by putting Equation (10) in Equation (2).
On putting Equation (11) equal to zero Equation (12) is formed.
Since, normalized infected population cannot be zero and therefore, Equation (12) may be modified as Equation (13).
Further, R 0 may be defined by Equation (14).
Moreover, at the initial phase of outbreak, everyone is susceptible (S ≈ N). Because of very large population with reasonably low total infected, S ≈ N holds true at any time, therefore, Equation (13) is equivalent to Equation (15).
Conclusively, if β is greater than α + µ, then the disease is epidemic; else it will die out. Since, it is usually a mathematical parameter with no physical meaning, it alone cannot describe the true nature of any disease [27].

Stability Analysis
To analyze the stability of the disease-free equilibrium (DFE) in terms of the reproduction number, let it is defined by Equation (16).
Moreover, Equations (1) and (2) yields that 0 ≤ S, 0 ≤ I and S + I ≤ N and the set Ω = {(S, I): S ≥ 0; I ≥ 0; S + I ≤ N} is a positively invariant compact set for the model. On considering the Lyapunov-LaSalle function as V(S, I) = I, the globally asymptotically stability has been analyzed using Equation (17).
Electronics 2021, 10, 127 7 of 21 Furthermore . V = 0 if I = 0 or R 0 = 1. Therefore, the largest invariant set contained in the set represented by Equation (18) is reduced to DFE and is globally asymptotically stable in Ω.

The Epidemiological Data
The data utilized in the present investigation has been taken from the Ministry of Health and Family Welfare (MOHFW), Government of India [11] and publicly available dashboard of India Today [28]. Although, as stated earlier, the very first case was found on 30 January 2020 but until 2 March 2020, all the infected cases were recovered and no new cases were reported. So, for this analysis, data from 3 March 2020 to 28 November 2020 has been considered. In the early phase of outbreak, all the positive cases are because of the overseas travelers and in response to that, India implemented travel ban on 12 March 2020. Although, some people have been migrated after this date but they are properly tested and quarantined. Moreover, considering high internal mobility due to festive season and very limited international migrations, the whole nation has been considered as a closed population in this analysis.
The data available in the aforementioned source provide details about the total active, cured/discharged, deaths and migrated. The total cumulative count has been estimated by summing all the available data and the number of incident cases (new daily cases) have been estimated by the difference of current and previous day cumulative count. For better visualization of data, the variations in cumulative and daily cases with respective dates have been illustrated in Figure 3a-c for infected, recovered and death respectively. The number of cumulative cases and new daily cases are represented on left side and right side respectively in each figure.

Furthermore
= 0 if I = 0 or = 1. Therefore, the largest invariant set contained in the set represented by Equation (18) is reduced to DFE and is globally asymptotically stable in Ω.

The Epidemiological Data
The data utilized in the present investigation has been taken from the Ministry of Health and Family Welfare (MOHFW), Government of India [11] and publicly available dashboard of India Today [28]. Although, as stated earlier, the very first case was found on 30 January 2020 but until 2 March 2020, all the infected cases were recovered and no new cases were reported. So, for this analysis, data from 3 March 2020 to 28 November 2020 has been considered. In the early phase of outbreak, all the positive cases are because of the overseas travelers and in response to that, India implemented travel ban on 12 March 2020. Although, some people have been migrated after this date but they are properly tested and quarantined. Moreover, considering high internal mobility due to festive season and very limited international migrations, the whole nation has been considered as a closed population in this analysis.
The data available in the aforementioned source provide details about the total active, cured/discharged, deaths and migrated. The total cumulative count has been estimated by summing all the available data and the number of incident cases (new daily cases) have been estimated by the difference of current and previous day cumulative count. For better visualization of data, the variations in cumulative and daily cases with respective dates have been illustrated in Figure 3a,b,c for infected, recovered and death respectively. The number of cumulative cases and new daily cases are represented on left side and right side respectively in each figure.
(a)  Moreover, the population density of India is 464.1 per square kilometer which is amongst top 30 in the world and the total population of India is more than 1.38 billion [29] (second largest on the globe) out of which more than 31% is urban population [30]. The urban population resides in very dense clusters which creates a significant challenge for authorities to apply strict social distancing norms. To have a better visualization of COVID-19 spread, a surface plot for the top 10 most affected states in India up until 30 November 2020 has been sketched in Figure 4. It has been observed that the spread is Moreover, the population density of India is 464.1 per square kilometer which is amongst top 30 in the world and the total population of India is more than 1.38 billion [29] (second largest on the globe) out of which more than 31% is urban population [30]. The urban population resides in very dense clusters which creates a significant challenge for authorities to apply strict social distancing norms. To have a better visualization of COVID-19 spread, a surface plot for the top 10 most affected states in India up until 30 November 2020 has been sketched in Figure 4. It has been observed that the spread is asymmetrically distributed and these states carry about 75% cases out of total cases found in India, Maharashtra being the most affected. These states carry maximum proportion of Indian population and have a significant number of economic hubs in India which points towards their high population density than the average value. It validates that people in these states are compelled to live in highly dense clusters and containment of spread by social distancing norms is very difficult. However, the effect of population density and other such parameters towards the suppression of COVID-19 is beyond the scope of this study. in India, Maharashtra being the most affected. These states carry maximum proportion of Indian population and have a significant number of economic hubs in India which points towards their high population density than the average value. It validates that people in these states are compelled to live in highly dense clusters and containment of spread by social distancing norms is very difficult. However, the effect of population density and other such parameters towards the suppression of COVID-19 is beyond the scope of this study.

Model Simulation and Optimization
In the present investigation, the entire simulation work has been done on MATLAB 2016a programming platform. The optimization of model parameters has been implemented by the minimization of residuals (objective function) estimated between available epidemiologic data and modeling data with default settings. Therefore, if and represents the modeling and experimental data at a certain time i respectively, then the residuals will be calculated as = − . Further, because cumulative data is highly correlated which may produce highly misleading results. Therefore, in this work the daily incident data of infected, recovered and deaths have been utilized. However, the incident data is not only independent but also greatly scattered as a result of which the ordinary least squares may generate inappropriate result. To overcome this problem, the bi-square M-estimator based regression method has been implemented which recalculates the residuals at a time according to Equation (18) [24].

Model Simulation and Optimization
In the present investigation, the entire simulation work has been done on MATLAB 2016a programming platform. The optimization of model parameters has been implemented by the minimization of residuals (objective function) estimated between available epidemiologic data and modeling data with default settings. Therefore, if x i andx i represents the modeling and experimental data at a certain time i respectively, then the residuals will be calculated as J i = x i −x i . Further, because cumulative data is highly correlated which may produce highly misleading results. Therefore, in this work the daily incident data of infected, recovered and deaths have been utilized. However, the incident data is not only independent but also greatly scattered as a result of which the ordinary least squares may generate inappropriate result. To overcome this problem, the bi-square M-estimator based regression method has been implemented which recalculates the residuals at a time according to Equation (18) [24].
where, k = 5.38 × Mean absolute deviation of the values in the residuals. Therefore, the impact of outliers (|J i | > k) has been nullified. Further, in the present investigation three independent data sets (daily new infected, daily new recovered and daily new deaths) have been incorporated which converts the problem in a multi-objective function minimization. The weighted sum of single objective function ( J i ) has been employed as the final objective function to estimate the goodness of fit between the model predicted and epidemiological data as described in Equation (19).
where, k 1 = 20, k 2 = 50 and k 3 = 100 have been empirically choosen to compensate the different order of magnitude between single objective functions. Further, this global objective function has been used to compute the parameters of developed model at global minima using proposed optimization techniques including pattern search (PS), particle swarm (PSO), gradient descent (G) and their hybrid. Nevertheless, the generalized stepby-step pseudo code for the proposed optimization techniques (PS, PSO, and G) has been comprehensively demonstrated in Algorithms 1-3 respectively.

Algorithm 1 (Pseudo code for PS)
for each step i = 1, . . . , S do, Initialize the default search step α 0 Initialize the current solution β 0 α = α 0 while i < S or error ≥ error bound do: Update the current solution to the best neighbor in θ α = α 0 Else α = α 0 2 Algorithm 2 (Pseudo code for PSO) for each particle i = 1, . . . , S do, Initialize the particle s position with a uniformly distributed random vector : Initialize the particle s best known position to its initial position : p i ← x i If f (p i ) < f (g) then update the swarm s best known position : g ← p i Initialize the particle s velocity : v i ∼ U(±|b u − b l |) while i ≤ S or error ≥ error bound do: for each particle i = 1, . . . , S do for each dimension d = 1, . . . , n do Pick random numbers : r p , r g ∼ U(0, 1) Update particle's velocity: v i,d ← ρv i,d + ω p r p p i,d − x i,d + ω g r g g d − x i,d Update the particle's position: If f (x i ) < f (p i ) then update the particle s best known position : p i ← x i If f (p i ) < f (g) then update the swarm s best known position : g ← p i

Algorithm 3 (Pseudo code for G)
for each step i = 1, . . . , S do, Initialize with f being a differentiable function ( R n → R) Initialize with any random solution x 0 while i < S or error ≥ error bound do:

Result and Discussion
The SIRD model presented in this study is based upon ten parameters as discussed in Section 3. Most of the reported articles on mathematical modeling for COVID-19 forecasting use some parameters which are based upon the median value given by WHO. However, due to the fact that the origin of the virus is still unknown, the use of these values may produce highly misleading parameters, resulting in wrong prediction. To avoid this pitfall, all the parameters except t lockdown have been estimated using the evolutionary data. In order to validate the robustness of the estimated parameters, PSO, PS and G along with their combinations have been used. It has been found that the values of these estimated parameters greatly vary depending upon the optimization technique and level of hybridization used. The parametric values estimated by G and PS are highly misleading however, PSO estimates the optimum values. Simulated results also reveal that the parameters estimated by PSO with or without G are approximately same. However, hybridization of G with PSO only increases the complexity and simulation time without any fruitful advantage and it may be avoided. On the contrary side, a drastic impact of hybridization has been observed in the estimated parameters of PS and depending upon the level of hybridization, the estimated parameter approaches towards the values estimated by PSO. With level-1 hybridization, G followed by PS (G + PS), a variation of 4.8 × 10 −2 (23.76%), 12.48 (20.68%), 0.31 × 10 −2 (13.54%), 0.951 × 10 −2 (14.68%), 4.17 (9.56%), 5.93 × 10 −2 (18.74%), 9.80 × 10 −3 (136.11%) and 52.78 (78.32%) in β 0 , τ β , α 0 , α 1 , τ α , µ 0 , µ 1 and τ µ respectively with respect to PSO has been observed. However, with level-2 hybridization, G followed by PS followed by G (G + PS + G), almost same values of parameters as of PSO have been observed with a variation of 1 × 10 −4 (0.05%), 0.03 (0.05%), 0, 1 × 10 −3 (1.54%), 0.65 (1.49%), 1.9 × 10 −4 (5.99%), 1.5 × 10 −3 (20.83%) and 3.78 (5.61%) in the respective values of β 0 , τ β , α 0 , α 1 , τ α , µ 0 , µ 1 and τ µ . Therefore, with level-2 hybridization, the parametric values of PS have been greatly improved. The various estimated parameters and their bounds have been presented in Table 1. Also, the outbreak of COVID-19 in India can be viewed as from regional to national level and in response to this, various states applied lockdown and curfew. However, they are limited to a particular region only whereas, the nationwide lockdown was implemented on 25 March 2020. Hence, in this study, t lockdown has been considered as 22 days.
According to the parameters estimated in Table 1, the model values of infected, recovered and deaths have been generated using different optimization techniques and to find the best fit, it has been plotted in Figure 5a-c respectively. It has been found that G and PS based optimization alone produces worst fit to the available epidemiological data whereas, G + PS yield slightly better fitting; however, it is also unable to deliver optimum fit. This may be because of the reason that instead of finding global minima, they got stuck to local minima. However, when the objective function was minimized by PSO (with or without G) or G + PS + G, then global minima has been achieved and optimum fitting to the available data has been obtained. Moreover, although with level-2 hybridization, PS achieves global minima but it also enhances system complexity and simulation time. However, PSO alone can lead towards global minima as evident from Figure 5. Therefore, it has been assumed that the parameters estimated by PSO are the parameters of the Indian epidemiological data. According to the parameters estimated in Table 1, the model values of infected, recovered and deaths have been generated using different optimization techniques and to find the best fit, it has been plotted in Figure 5a-c respectively. It has been found that G and PS based optimization alone produces worst fit to the available epidemiological data whereas, G + PS yield slightly better fitting; however, it is also unable to deliver optimum fit. This may be because of the reason that instead of finding global minima, they got stuck to local minima. However, when the objective function was minimized by PSO (with or without G) or G + PS + G, then global minima has been achieved and optimum fitting to the available data has been obtained. Moreover, although with level-2 hybridization, PS achieves global minima but it also enhances system complexity and simulation time. However, PSO alone can lead towards global minima as evident from Figure 5. Therefore, it has been assumed that the parameters estimated by PSO are the parameters of the Indian epidemiological data. Based on the above findings, the modeling results for kinetic of infection, recovery and death rate obtained for Indian COVID-19 epidemiologic using all the above mentioned optimization techniques (G, PS, PSO, G + PS, G + PSO, G + PS + G and G + PSO + G) have been reported in Figure 6. Here, it is important to analyze the value of acquired modeling parameters, reported in Table 1 and plotted in Figure 6 to have some physical considerations. Assuming constant contact rate before lockdown, β remains constant in that duration and after that, due to isolation, it decreases exponentially with the characteristic time of τβ equals to 60 days. Therefore, after 180 days (3τβ) from the onset of epidemiology, β reduces to 90%. The recovery parameters (α0, α1 and τα) are useful to detect the average number of days required for recovery (Tr). The kinetic of recovery has been found as constant which represents only severe cases have been observed and reported. Therefore, Tr has been calculated as 44 which is within the bounds of very severe infections [8]. The parameters associated with kinetic of death (µ0, µ1 and τµ) have been used to acquire the information regarding the average time between symptom and death. Considering the 4% death rate [31], average death time (Td) gradually increases from 13 days until it reaches seven months from the onset of pandemic and achieves a long-term mortality rate. This can be described as at the start of pandemic; mostly severe cases have been reported. Nevertheless, in spite of weak medical infrastructure, the factors like a very large young population, warmer weather conditions and humidity [32] along with the awareness programs initiated by the government favor India in achieving long term lethality rate.
Another important parameter to analyze the severity of infectious disease is R0 and it has been regularly calculated using the parameters estimated in Table 1 with the help of Equation (15). At the start of the outbreak, moderate initial value (3.22) has been observed that goes up to 9.78 by first week of May 2020. This is because at the beginning, it has been driven by high initial value of β and very low values of α and µ. However, on the application of social distancing norms, lockdown and other suppression strategies, it started to fall gradually and settle down to less than 1 after five months of outbreak. The average value has been calculated as 4.78 whereas in literature, the R0 of 2.56 has been reported for India [33]. However, based on the long enough duration of disease and different methods used for calculation, the estimated data can also be considered. Based on the above findings, the modeling results for kinetic of infection, recovery and death rate obtained for Indian COVID-19 epidemiologic using all the above mentioned optimization techniques (G, PS, PSO, G + PS, G + PSO, G + PS + G and G + PSO + G) have been reported in Figure 6. Here, it is important to analyze the value of acquired modeling parameters, reported in Table 1 and plotted in Figure 6 to have some physical considerations. Assuming constant contact rate before lockdown, β remains constant in that duration and after that, due to isolation, it decreases exponentially with the characteristic time of τ β equals to 60 days. Therefore, after 180 days (3τ β ) from the onset of epidemiology, β reduces to 90%. The recovery parameters (α 0 , α 1 and τ α ) are useful to detect the average number of days required for recovery (T r ). The kinetic of recovery has been found as constant which represents only severe cases have been observed and reported. Therefore, T r has been calculated as 44 which is within the bounds of very severe infections [8]. The parameters associated with kinetic of death (µ 0 , µ 1 and τ µ ) have been used to acquire the information regarding the average time between symptom and death. Considering the 4% death rate [31], average death time (T d ) gradually increases from 13 days until it reaches seven months from the onset of pandemic and achieves a long-term mortality rate. This can be described as at the start of pandemic; mostly severe cases have been reported. Nevertheless, in spite of weak medical infrastructure, the factors like a very large young population, warmer weather conditions and humidity [32] along with the awareness programs initiated by the government favor India in achieving long term lethality rate.
Another important parameter to analyze the severity of infectious disease is R 0 and it has been regularly calculated using the parameters estimated in Table 1 with the help of Equation (15). At the start of the outbreak, moderate initial value (3.22) has been observed that goes up to 9.78 by first week of May 2020. This is because at the beginning, it has been driven by high initial value of β and very low values of α and µ. However, on the application of social distancing norms, lockdown and other suppression strategies, it started to fall gradually and settle down to less than 1 after five months of outbreak. The average value has been calculated as 4.78 whereas in literature, the R 0 of 2.56 has been reported for India [33]. However, based on the long enough duration of disease and different methods used for calculation, the estimated data can also be considered. Moreover, when compared with the results obtained by PSO, G + PS + G also predicts similar value of Tr. However, the respective values of τβ, τµ, Td and initial and average R0 are predicted by G + PS + G as 60.31, 63.61, 12, 3.07 and 4.6 and accordingly a very small variation with respect to PSO of 0.05%, 5.6%, 7.69%, 4.65% and 3.76% has been observed and hence, are also acceptable. The values of τβ, Tr, τµ, Td and initial and average R0 obtained by all these techniques have been tabulated in Table 2. The variation in R0 using G, PS, PSO, G + PS, G + PSO, G + PS + G and G + PSO + G have been demonstrated in Figure  7 respectively for better visualization.  Moreover, when compared with the results obtained by PSO, G + PS + G also predicts similar value of T r . However, the respective values of τ β , τ µ , T d and initial and average R 0 are predicted by G + PS + G as 60.31, 63.61, 12, 3.07 and 4.6 and accordingly a very small variation with respect to PSO of 0.05%, 5.6%, 7.69%, 4.65% and 3.76% has been observed and hence, are also acceptable. The values of τ β , T r , τ µ , T d and initial and average R 0 obtained by all these techniques have been tabulated in Table 2. The variation in R 0 using G, PS, PSO, G + PS, G + PSO, G + PS + G and G + PSO + G have been demonstrated in Figure 7 respectively for better visualization. The Indian outbreak prediction curves for COVID-19 using all these techniques for active infected, total ever infected, recovered and death have been demonstrated in Figure 8. It has been found that using PSO, the model predicts last week of September as the peak of Indian outbreak using optimum parameters and thereafter, the number of infected people will gradually decrease. However, at its peak, the number of active infected people predicted by the model is 1,025,407 and towards the end of simulation time (350 days), 273,586 and 10,738,028 active infected and ever infected people respectively, have been estimated. The total number of recovered and death towards the end of simulation of the ongoing pandemic have been predicted as 10,313,876 (96.05%) and 150,566 (1.40%) respectively. Interestingly, the model predicts same time as the peak of Indian outbreak by using G + PS + G also and the total active infected people have been estimated as 1,060,303 at that time. Furthermore, as compared to PSO, a respective difference in ever infected population of 0.21% and 0.08% has been observed at peak and end of simulation time.
The values regarding the forecasting of COVID-19 outbreak in India by all these techniques have been presented in Table 3.
Moreover, when compared with the results obtained by PSO, G + PS + G also predicts similar value of Tr. However, the respective values of τβ, τµ, Td and initial and average R0 are predicted by G + PS + G as 60.31, 63.61, 12, 3.07 and 4.6 and accordingly a very small variation with respect to PSO of 0.05%, 5.6%, 7.69%, 4.65% and 3.76% has been observed and hence, are also acceptable. The values of τβ, Tr, τµ, Td and initial and average R0 obtained by all these techniques have been tabulated in Table 2. The variation in R0 using G, PS, PSO, G + PS, G + PSO, G + PS + G and G + PSO + G have been demonstrated in Figure  7 respectively for better visualization.   Further, the short term effectiveness of the developed models has been compared against actual pandemic values after 15 days (13 December 2020). The obtained results reveal that G, PS and G + PS are incapable to predict the epidemic; however, the models developed by employing any variant PSO and G + PS + G predict the values of I, R and D with significant accuracy (99%) which validates the efficiency of these developed models for accurate prediction of the ongoing pandemic. The obtained results along with reported data have been presented in Table 4.  Further, the short term effectiveness of the developed models has been compared against actual pandemic values after 15 days (13 December 2020). The obtained results reveal that G, PS and G + PS are incapable to predict the epidemic; however, the models developed by employing any variant PSO and G + PS + G predict the values of I, R and D with significant accuracy (99%) which validates the efficiency of these developed models for accurate prediction of the ongoing pandemic. The obtained results along with reported

Conclusions
In the present investigation, the SIRD compartment model has been used for investigating the evolution and prediction of COVID-19 in India. To incorporate behavioral change in key parameters because of lockdown, social distancing and other non-pharmaceutical interventions, dynamic behavior of β, α, µ and hence, R 0 has been considered. The modeling parameters have been optimized using gradient (fmincon), particle swarm, pattern search and their hybrid. Through simulation-based investigation, it has been found that PSO along with any combination of G and G + PS + G produce almost identical results but considering the model complexity and time required for simulation, PSO has assumed to be the best optimization technique and therefore, τ β of 60 days, τ µ of 67 days and R 0 of 4.78 have been estimated. However, G, PS and G + PS did not yield optimum fitting, thereby validating their inappropriateness in the current pandemic assessment.
Based on the above parameters, model predicted last week of September as the peak duration of COVID-19 pandemic in India with more than 1,025,407 active infected people at its peak with an accuracy of 97% and even after 350 days form the onset of the pandemic, more than 273,586 people will remain infected with a total of ever infected people crosses 10.7 million. However, by that time, more than 96% people will be recovered and only around 1.4% death has been projected. It also anticipated that even at its peak, around 81% people have been recovered with 1.59% death which is far better than other countries. Nonetheless, it also creates an alarming situation considering the fact that India is among the lowest health workforce density [34] with the ratio of nurse to physicians only 0.6:1 as compared to 3:1 in developed countries [23]. Despite the weak medical infrastructure, comparatively less severe pandemic has been observed which may be because of very high young population accompanied by proactive response by policymakers. However, constant recovery rate indicated that most of the reported cases in India are associated with severe infections. Therefore, the pandemic data could be extremely underestimated in total number of infected as well as recovered.
Notably, the effectiveness of the present investigation extremely lies on the quality of data. The proposed methodology should be used only for quality understanding of the Indian pandemic and crude predictions and not for any change in policy or decision. Moreover, the accuracy of prediction depends on a number of factors such as policy changes (leading to drastic variations in pandemic data), false reported data and modifications in guidelines to report the data, novel findings and introduction of vaccines and so forth. These factors may have a significant influence on the prediction accuracy and are envisioned as the scope of further investigation.  Data Availability Statement: Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. This data can be found here: reference number [11,28].