Epidemiological Modeling of COVID-19 in Saudi Arabia: Spread Projection, Awareness, and Impact of Treatment

Featured Application: In this work, the prediction of the spread of COVID-19 in Saudi Arabia was analyzed and evaluated using di ﬀ erent epidemiological models in order to choose the most appropriate model. Additionally, the impacts of social distancing and treatment were modeled using two modiﬁed versions of the susceptible–infectious–recovered (SIR) model Abstract: The ﬁrst case of COVID-19 originated in Wuhan, China, after which it spread across more than 200 countries. By 21 July 2020, the rapid global spread of this disease had led to more than 15 million cases of infection, with a mortality rate of more than 4.0% of the total number of conﬁrmed cases. This study aimed to predict the prevalence of COVID-19 and to investigate the e ﬀ ect of awareness and the impact of treatment in Saudi Arabia. In this paper, COVID-19 data were sourced from the Saudi Ministry of Health, covering the period from 31 March 2020 to 21 July 2020. The spread of COVID-19 was predicted using four di ﬀ erent epidemiological models, namely the susceptible–infectious–recovered (SIR), generalized logistic, Richards, and Gompertz models. The assessment of models’ ﬁt was performed and compared using four statistical indices (root-mean-square error (RMSE), R squared (R 2 ), adjusted R 2 ( R 2adj ), and Akaike’s information criterion (AIC)) in order to select the most appropriate model. Modiﬁed versions of the SIR model were utilized to assess the inﬂuence of awareness and treatment on the prevalence of COVID-19. Based on the statistical indices, the SIR model showed a good ﬁt to reported data compared with the other models (RMSE = 2790.69, R 2 = 99.88%, R 2 adj = 99.98%, and AIC = 1796.05). The SIR model predicted that the cumulative number of infected cases would reach 359,794 and that the pandemic would end by early September 2020. Additionally, the modiﬁed version of the SIR model with social distancing revealed that there would be a reduction in the ﬁnal cumulative epidemic size by 9.1% and 168.2% if social distancing were applied over the short and long term, respectively. Furthermore, di ﬀ erent treatment scenarios were simulated, starting on 8 July 2020, using another modiﬁed version of the SIR model. Epidemiological modeling can help to predict the cumulative number of cases of infection and to understand the impact of social distancing and pharmaceutical intervention on the prevalence of COVID-19. The ﬁndings from this study can provide valuable information for governmental policymakers trying to control the spread of this pandemic.

potential when the primary reproductive number ≥ 1; for COVID-19, this variable was determined in different studies [9,17,18] to be in the range of 2.2-6.47, implying that this disease is very infectious. Moreover, the influence of different levels of social distancing on the number of hospitalizations has been examined in other mathematical studies using the susceptible-exposed-infectious-recovered (SEIR) compartmental model [1]. These epidemiological models can be useful for policymakers when choosing optimal strategies to minimize the number of cases of infection and to reduce the burden on health facilities.
The main objective of this study was to generate COVID-19 projections in Saudi Arabia using epidemiological models. Specifically, we implemented and compared different models, namely, the susceptible-infectious-recovered (SIR), generalized logistic, Richards, and Gompertz models. The accuracy of these models was assessed in order to choose the most appropriate model. Additionally, the impacts of social distancing and treatment were modeled using two modified versions of the SIR model.

Methods
The dynamic spread of COVID-19 was estimated using different epidemiological models. In this study, the SIR, generalized logistic, Richards, and Gompertz models were used to predict the final number of cases of infection and the end date of the COVID-19 pandemic. These models were implemented using MATLAB (version R2020a, MathWorks Inc., Natick, MA, USA) to estimate the spread of COVID-19 in Saudi Arabia. The parameters in these models were optimized to approximate the total number of infections reported by the Saudi Ministry of Health from 31 March 2020 to 21 July 2020.

Data
The models mentioned above were implemented and applied to the daily-updated public data by the Saudi Ministry of Health via the online COVID-19 dashboard [19]. Additionally, this dashboard was used to gather the daily reported cumulative number of cases of infection, mortality, and recovery.

Parameter Estimation
For each model, the parameters were estimated via the built-in MATLAB function fminsearch, which uses a simplex search method [20]. This function can help in searching for the set of parameters (P) that minimizes the sum of the squared differences between the observed values (y i ) and the predicted values calculated by the model (ŷ i ). The final objective function can be given by whereP is the set of optimized parameters of each model, y i denotes the observed values,ŷ i indicates the predicted values, and n is the number of observations.

Compartmental Model
The SIR model, as shown in Figure 1, is considered the simplest compartmental model. It was first introduced by Kermack and McKendrick [21], and has been used to investigate the spread dynamics of a variety of diseases, such as MERS coronavirus [22] and novel coronavirus [23], and has also been used to forecast seasonal influenza [24]. To model an epidemic, the population is divided into three groups. The first group is defined as the number of individuals who are susceptible to the disease and is represented by S(t). This group includes those individuals who have not yet been infected by time t. The second group is represented by those individuals who have been infected by time t and is represented by I(t). The last group is denoted by R(t) and contains those individuals who have recovered from the disease by time t. This model consists of three differential equations: where S, I, and R are the number of susceptible, infected, and recovered cases, respectively. The parameters β and γ represent the contact rate of the disease and the average recovery rate, respectively. From Equation (2), the total size of the population (N) can be obtained as the sum of the susceptible, infected, and recovered cases (N = S + I + R).
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 14 infected by time t. The second group is represented by those individuals who have been infected by time t and is represented by I(t). The last group is denoted by R(t) and contains those individuals who have recovered from the disease by time t. This model consists of three differential equations: where , , and are the number of susceptible, infected, and recovered cases, respectively. The parameters and represent the contact rate of the disease and the average recovery rate, respectively. From Equation (2), the total size of the population (N) can be obtained as the sum of the susceptible, infected, and recovered cases ( = + + ).

Growth Models
In addition to the SIR model, three nonlinear growth models were used to estimate the spread of COVID-19.
The growth models used in this work were the Richards, generalized logistic, and Gompertz models. The Richards and generalized logistic models are defined using differential equations as expressed in Equations (3) and (4) [25]; additionally, the Gompertz growth model is shown in Equation (5) [26].
where is the cumulative number of cases of infection, and , , and are the models' parameters. More precisely, is the early growth rate, is the carrying capacity, and is the scaling parameter.

Model Selection Criteria
To determine the accuracy of the models and to select the appropriate model, the four statistical metrics of the root-mean-squared error ( ), R squared ( ), adjusted ( ), and Akaike's information criterion ( ) were calculated [27,28]. Firstly, the is one of the standard measures used to assess the goodness of fit of a model. It is the average squared difference between the observed values and the predicted values, and it is defined by

Growth Models
In addition to the SIR model, three nonlinear growth models were used to estimate the spread of COVID-19.
The growth models used in this work were the Richards, generalized logistic, and Gompertz models. The Richards and generalized logistic models are defined using differential equations as expressed in Equations (3) and (4) [25]; additionally, the Gompertz growth model is shown in Equation (5) [26].
where C is the cumulative number of cases of infection, and r, K, and d are the models' parameters. More precisely, r is the early growth rate, K is the carrying capacity, and d is the scaling parameter.

Model Selection Criteria
To determine the accuracy of the models and to select the appropriate model, the four statistical metrics of the root-mean-squared error (RMSE), R squared (R 2 ), adjusted R 2 (R 2 adj ), and Akaike's information criterion (AIC) were calculated [27,28].
Firstly, the RMSE is one of the standard measures used to assess the goodness of fit of a model. It is the average squared difference between the observed values and the predicted values, and it is defined by where y i denotes the observed values,ŷ i indicates the predicted values, n is the number of observations, and p is the number of parameters in the model. Additionally, R 2 statistics measure the predictive accuracy of a fitted model. It represents the amount of the variation that is accounted for by the fitted model and is defined by where y i ,ŷ i , and y denote the observed values, the predicted values, and the mean of the observed values, respectively. Furthermore, the adjusted R 2 adj adds a small penalty for adding more variables in the model compared to the R 2 with variability in the data [29] and is defined by Lastly, the AIC is one of the most common tools used to measure the goodness of fit of a model and is calculated by

Current Status of COVID-19
The geographical distribution of the cases of infection, mortality, and recovery of COVID-19 in Saudi Arabia are illustrated in Figure 2A-C. Figure 2A-C shows that a high number of infected and recovered individuals, respectively, are located in big cities, such as Riyadh and Jeddah, compared to other small cities and rural areas. Figure 2B illustrates the number of mortalities in Saudi Arabia, with only three cities showing a high rate of mortality (greater than 200 deaths in each city).
where denotes the observed values, indicates the predicted values, is the number of observations, and is the number of parameters in the model.
Additionally, statistics measure the predictive accuracy of a fitted model. It represents the amount of the variation that is accounted for by the fitted model and is defined by where , , and denote the observed values, the predicted values, and the mean of the observed values, respectively. Furthermore, the adjusted adds a small penalty for adding more variables in the model compared to the with variability in the data [29] and is defined by Lastly, the AIC is one of the most common tools used to measure the goodness of fit of a model and is calculated by

Current Status of COVID-19
The geographical distribution of the cases of infection, mortality, and recovery of COVID-19 in Saudi Arabia are illustrated in Figure 2A-C. Figure 2A-C shows that a high number of infected and recovered individuals, respectively, are located in big cities, such as Riyadh and Jeddah, compared to other small cities and rural areas. Figure 2B illustrates the number of mortalities in Saudi Arabia, with only three cities showing a high rate of mortality (greater than 200 deaths in each city).   Figure 3A-C illustrates the rates of infection, mortality, and recovery at different weekly intervals from 13 March 2020 to 21 July 2020. First, Figure 3A shows a significant drop in the rate of infection at the end of July compared to earlier stages of the spread of COVID-19 in Saudi Arabia. Second, Figure 3B illustrates a slight increase in the mortality rate, although it remained under 1% of total cases. Lastly, Figure 3C shows a recent rise in the recovery rate compared to the beginning of this pandemic. Figure 3A-C illustrates the rates of infection, mortality, and recovery at different weekly intervals from 13 March 2020 to 21 July 2020. First, Figure 3A shows a significant drop in the rate of infection at the end of July compared to earlier stages of the spread of COVID-19 in Saudi Arabia. Second, Figure 3B illustrates a slight increase in the mortality rate, although it remained under 1% of total cases. Lastly, Figure 3C shows a recent rise in the recovery rate compared to the beginning of this pandemic.

Epidemiological Model Predictions
The Richards, generalized logistic, Gompertz, and SIR models were used to estimate the cumulative number and the daily number of cases of infection from 31 March 2020 to 21 July 2020 with a 95% confidence interval (CI) (Figure 4). The results of the four models showed good agreement between the fitted and the observed values ( Figure 4A,C,E,G).

Epidemiological Model Predictions
The Richards, generalized logistic, Gompertz, and SIR models were used to estimate the cumulative number and the daily number of cases of infection from 31 March 2020 to 21 July 2020 with a 95% confidence interval (CI) (Figure 4). The results of the four models showed good agreement between the fitted and the observed values ( Figure 4A,C,E,G).
Furthermore, the final number of cases of infection and the end dates of the pandemic were predicted using each model, as summarized in Table 1. The SIR model revealed the lowest final cumulative number of cases of COVID-19 infection, with an end date on 7 September 2020, compared to the growth models. Conversely, a higher cumulative number of cases of infection was estimated by the Gompertz model, with an end date on 20 December 2020. Correspondingly, the Richards and generalized logistic models predicted the end date of this pandemic to be in October 2020, with approximately 404,853 and 420,593 as final number of cases, respectively. Figure 5 shows the cumulative and daily number of cases of infection for all models used in this study overlaid with the observed cases. It can be clearly seen from Figure 5A that the maximum final cumulative number of cases of infection was predicted by the Gompertz model. Conversely, the SIR model estimated the minimum number of final infection cases, which was 26% lower that that predicted by the Gompertz model. Subsequently, the SIR model showed an earlier end date in Saudi Arabia compared to the other models (as shown in Figure 5B and Table 1). The Gompertz growth model revealed that new cases of infection would last until 20 December 2020, as shown in Table 1. In Figure 5, the area shaded in yellow illustrates the period of lockdown, while the unshaded area refers to the period after 30 May 2020, where most restrictions were lifted in Saudi Arabia, except in Makkah Al-Mukarramah. Furthermore, the final number of cases of infection and the end dates of the pandemic were predicted using each model, as summarized in Table 1. The SIR model revealed the lowest final cumulative number of cases of COVID-19 infection, with an end date on 7 September 2020, compared to the growth models. Conversely, a higher cumulative number of cases of infection was estimated by the Gompertz model, with an end date on 20 December 2020. Correspondingly, the Richards and generalized logistic models predicted the end date of this pandemic to be in October 2020, with approximately 404,853 and 420,593 as final number of cases, respectively.  Figure 5 shows the cumulative and daily number of cases of infection for all models used in this study overlaid with the observed cases. It can be clearly seen from Figure 5A that the maximum final cumulative number of cases of infection was predicted by the Gompertz model. Conversely, the SIR model estimated the minimum number of final infection cases, which was 26% lower that that predicted by the Gompertz model. Subsequently, the SIR model showed an earlier end date in Saudi Arabia compared to the other models (as shown in Figure 5B and Table 1). The Gompertz growth model revealed that new cases of infection would last until 20 December 2020, as shown in Table 1.  In Figure 5, the area shaded in yellow illustrates the period of lockdown, while the unshaded area refers to the period after 30 May 2020, where most restrictions were lifted in Saudi Arabia, except in Makkah Al-Mukarramah.

Beyond the SIR Model
The SIR model showed the best fit to the actual dataset from 31 March 2020 to 21 July 2020 compared to the Richard, generalized logistic, and Gompertz models, as shown in Table 2. Hence, two modified versions of the SIR model were developed in order to investigate the impacts of awareness (i.e., social distancing) and treatment on the spread of COVID-19 in Saudi Arabia.

Effect of Social Distancing on the Spread of the Virus
The SIR model (Equation (2)) was modified in order to include the awareness term, i.e., social distancing, as a function of affected individuals (i.e., infected and recovered cases), as shown in Equation (10). In this study, two types of social distancing, namely, long-and short-term, were considered to investigate their leverage on the spread of COVID-19, according to previously published work [30].
Long-term social distancing can be simulated by reducing the interaction between individuals proportional to the rate of affected individuals (i.e., infected and recovered) according to Equation (11).
In short-term awareness models, the reduction of the interaction between individuals is proportional to the rate of infected individuals only (see Equation (12)).
The term ξ in both awareness models represents the reduction parameter. When ξ = 0, the original version of the SIR model was simulated with no social-distancing impact. On the other hand, the linear response of social distancing on the spread of the virus was simulated with ξ = 1, whilst nonlinear implications were seen when ξ > 1. In addition, the modified version of the SIR model (see Equation (10)) was fitted with the cumulative cases during the effective lockdown period (from 1 May 2020 to 14 June 2020) using short-and long-term types. Thereafter, we were able to define the optimized values for β, γ, S 0 , and ξ. The optimal values of ξ when using short-and long-term types were 0.1 and 0.17, respectively.
The simulation results shown in Figure 6 illustrated that the cumulative number of cases of infection reduced when the optimal exponent values (ξ), the linear (ξ = 1) and non-linear (ξ = 2) social-distancing models were applied compared to the nonmodified SIR model. The linear and nonlinear short-term social-distancing models showed a decrease of 4.63% and 9.1%, respectively, in comparison to the final size of the pandemic predicted by the nonmodified SIR model. Additionally, we applied the optimal exponent values of the short-and long-term lockdown types (0.1 and 0.17, respectively) acquired by fitting the modified SIR model (Equation (10)) with the data during the lockdown period. As consequence, short-(ξ = 0.1) and long-term (ξ = 0.17) lockdowns produced drops of 0.5% and 4.86%, respectively. The long-term awareness model showed a higher impact compared to the short-term awareness model (see Figure 6B).
original version of the SIR model was simulated with no social-distancing impact. On the other hand, the linear response of social distancing on the spread of the virus was simulated with = 1, whilst nonlinear implications were seen when > 1. In addition, the modified version of the SIR model (see Equation (10)) was fitted with the cumulative cases during the effective lockdown period (from 1 May 2020 to 14 June 2020) using short-and long-term types. Thereafter, we were able to define the optimized values for , , , and . The optimal values of when using short-and long-term types were 0.1 and 0.17, respectively.
The simulation results shown in Figure 6 illustrated that the cumulative number of cases of infection reduced when the optimal exponent values ( ), the linear ( = 1) and non-linear ( = 2) social-distancing models were applied compared to the nonmodified SIR model. The linear and nonlinear short-term social-distancing models showed a decrease of 4.63% and 9.1%, respectively, in comparison to the final size of the pandemic predicted by the nonmodified SIR model. Additionally, we applied the optimal exponent values of the short-and long-term lockdown types (0.1 and 0.17, respectively) acquired by fitting the modified SIR model (Equation (10)) with the data during the lockdown period. As consequence, short-( = 0.1) and long-term ( = 0.17) lockdowns produced drops of 0.5% and 4.86%, respectively. The long-term awareness model showed a higher impact compared to the short-term awareness model (see Figure 6B).

Impact of Antiviral Drug Treatment on the Prevalence of COVID-19
Another modification was made to the original SIR model, mentioned earlier in Equation (2), in order to explore how treatment would impact the spread of COVID-19 [31,32], as well as to estimate the final size of the pandemic after the onset of antiviral drug treatment. The SIR model was adjusted by dividing the infected individuals into treated (I tr ) and untreated (I u ) cases, as per Equation (13). f 0 represents the fraction of infected individuals to receive the antiviral drug treatment [31], with the assumption that the level of infection can be reduced by treatment factor σ. Subsequently, this would lead to an increase in the number of recovered individuals after the onset of the treatment and, as a consequence, a decrease in the number of cases of infection.
The parameters γ u and γ tr are the average recovery rates for the untreated and treated individuals, respectively. The start date of the antiviral drug treatment was set to 8 July 2020 in order to examine the impact of treatment on the current number of cases of infection. Multiple treatment impacts are illustrated in Figure 7, showing different fractions of infected individuals assumed to receive the treatment. As there is still no antiviral drug that has been proven to be fully effective against COVID-19, the impact of treatment was investigated by assuming that such a drug would reduce the level of infection by 50% (σ = 0.5) [31]. Meanwhile, f 0 was varied between 0 and 1 with a step size of 0.1 in order to simulate different projections of the virus after the onset of antiviral drug treatment. The simulations showed that there were various significant reductions in the number of new daily cases of infection due to the treatment in July 2020, as shown in the 11 curves of different fractions of treatment in Figure 7.
against COVID-19, the impact of treatment was investigated by assuming that such a drug would reduce the level of infection by 50% ( = 0.5) [31]. Meanwhile, was varied between 0 and 1 with a step size of 0.1 in order to simulate different projections of the virus after the onset of antiviral drug treatment. The simulations showed that there were various significant reductions in the number of new daily cases of infection due to the treatment in July 2020, as shown in the 11 curves of different fractions of treatment in Figure 7. . The start date of treatment was set to 8 July 2020.

Discussion
The spread of COVID-19 resulted in more than 15 million confirmed cases worldwide by 21 July 2020 [2]. Although a significantly higher number of cases of infection have been caused by COVID-19 than by SARS and MERS, the case fatality rate of COVID-19 (approximately 4.0%) is lower than those of SARS (9.6%) and MERS (34.4%) [2,17]. In Saudi Arabia, the case fatality rate of COVID-19 (1.0%) is lower than the global percentage rate of COVID-19-related deaths [2]. It is evident from this information that COVID-19 is very infectious, and, therefore, preemptive actions should be taken by

Discussion
The spread of COVID-19 resulted in more than 15 million confirmed cases worldwide by 21 July 2020 [2]. Although a significantly higher number of cases of infection have been caused by COVID-19 than by SARS and MERS, the case fatality rate of COVID-19 (approximately 4.0%) is lower than those of SARS (9.6%) and MERS (34.4%) [2,17]. In Saudi Arabia, the case fatality rate of COVID-19 (1.0%) is lower than the global percentage rate of COVID-19-related deaths [2]. It is evident from this information that COVID-19 is very infectious, and, therefore, preemptive actions should be taken by all, especially governments and policymakers, to alleviate the consequences of this pandemic for people's lives, health facilities, and the economy.
Epidemiological modeling is a powerful tool that can provide valuable insights into pandemics and can help in understanding the different aspects of such diseases. This modeling approach will help policymakers to determine the appropriate steps to take to cope with the current COVID-19 pandemic [33,34]. In this study, several models were statistically compared, and it was found that the SIR model was the best prediction model for the almost four months starting on 31 March 2020 using the statistical measures RMSE, R 2 , R 2 adj , and AIC. The SIR model predicted that the cumulative number of cases of infection would reach 359,794 ± 26,626 cases, and that the end date of this pandemic will occur on 7 September 2020 ± 25 days. On the other hand, the Gompertz model predicted the highest number of cases of infection, i.e., 488,318 ± 27,659, and the latest end date, i.e., 20 December 2020 ± 44 days.
Most governments have implemented precautionary measures to prevent the spread of COVID-19, and, among these measures, have introduced social-distancing measures to address the ongoing spread of this pandemic. In this regard, the measures implemented by the government of the Kingdom of Saudi Arabia included strict instructions that individuals must maintain physical distance when they are in public places. They also included the closure of schools, mosques, and universities, limited attendance at workplaces, and placed strict restrictions on gatherings in commercial complexes. A home quarantine policy was also applied for certain hours every day, with a limited presence of family members of a second-degree relationship inside the same house. In order to accompany these precautionary measures, this study worked to develop the SIR model in such a way that it could predict the spread of the virus. Mainly, a modified version of the SIR model involving a social-distancing factor was employed to investigate the impact of these policies on the cumulative and the daily number of cases of infection in Saudi Arabia [30]. The results revealed that there would be a reduction in the final size of the pandemic of 9.1% and 168.2% when applying social distancing over short-and long-term periods, respectively. This finding is quantitatively consistent with a recent study that found a significant influence of behavior change on reducing the number of total confirmed cases [35]. Furthermore, another modified version of the SIR model was used to predict the impact of antiviral drug treatment on the spread of COVID-19 in Saudi Arabia. The purpose of this model was to show the possible effects of pharmaceutical interventions on the prevalence of this virus [31,32]. The model simulations presented a significant reduction in the number of cases of infection as a result of these interventions. Consequently, such a treatment could help to decrease the load on healthcare facilities, especially critical care wards.
Saudi Arabia has launched different actions to control and slow the spread of COVID-19. For example, specialized clinics for the early examination of suspected COVID-19 patients have been established in order to isolate infected people before they transmit the infection to others. Moreover, fines have been applied to individuals and business owners who do not follow control measures, such as wearing masks and keeping an appropriate distance from one another. Furthermore, Saudi Arabia has implemented flexible work management in all government sectors, and its borders are still closed to control the spread of COVID-19. Consequently, recently reported data provide a good indication of the success of the steps taken by Saudi Arabia to control the prevalence of COVID-19. It can be clearly seen in Figure 3 that the rate of infection (measured in mid-July) has reduced by approximately 80% compared to the rates calculated in March. Conversely, the recovery rate showed a dramatic increase in the last two months and reached its maximum peak in late July (76.46%). On the other hand, the mortality rate showed a slight fluctuation during this pandemic, but has still not exceeded 1.5%. However, more effort should be undertaken by all to control and manage the spread of this virus to avoid overwhelming the healthcare system.

Conclusions
In this study, several epidemiological models were implemented to predict the projection of the spread of COVID-19 in Saudi Arabia. The model predictions showed variations in the final cumulative number of cases of infection, ranging from 360,000 to 490,000, with different end dates. Particular attention should be paid to addressing the increase in the number of COVID-19 cases, especially in terms of the rise in severe and critical cases. Awareness of individuals regarding following preemptive actions, such as social distancing and wearing masks, will play a key role in controlling the spread of this virus, as shown by the findings of the simulations. Epidemiological models can help in understanding the impact of behavioral factors, as well as providing useful insights into the possible influence of new drugs on the final number of cases of COVID-19 infection.