Time-Series Analysis and Healthcare Implications of COVID-19 Pandemic in Saudi Arabia

The first case of coronavirus disease 2019 (COVID-19) in Saudi Arabia was reported on 2 March 2020. Since then, it has progressed rapidly and the number of cases has grown exponentially, reaching 788,294 cases on 22 June 2022. Accurately analyzing and predicting the spread of new COVID-19 cases is critical to develop a framework for universal pandemic preparedness as well as mitigating the disease’s spread. To this end, the main aim of this paper is first to analyze the historical data of the disease gathered from 2 March 2020 to 20 June 2022 and second to use the collected data for forecasting the trajectory of COVID-19 in order to construct robust and accurate models. To the best of our knowledge, this study is the first that analyzes the outbreak of COVID-19 in Saudi Arabia for a long period (more than two years). To achieve this study aim, two techniques from the data analytics field, namely the auto-regressive integrated moving average (ARIMA) statistical technique and Prophet Facebook machine learning technique were investigated for predicting daily new infections, recoveries and deaths. Based on forecasting performance metrics, both models were found to be accurate and robust in forecasting the time series of COVID-19 in Saudi Arabia for the considered period (the coefficient of determination for example was in all cases more than 0.96) with a small superiority of the ARIMA model in terms of the forecasting ability and of Prophet in terms of simplicity and a few hyper-parameters. The findings of this study have yielded a realistic picture of the disease direction and provide useful insights for decision makers so as to be prepared for the future evolution of the pandemic. In addition, the results of this study have shown positive healthcare implications of the Saudi experience in fighting the disease and the relative efficiency of the taken measures.


Introduction
The COVID-19 pandemic is the most significant global crisis since the second world war. The disease is currently spreading across the globe at a surprisingly faster rate, affecting more than 213 countries, infecting more than 545,900,772 people and leading to The disease symptoms encountered in Saudi Arabia were found to be common and similar to those encountered worldwide. As illustrated in Figure 1, such symptoms subside in a few days to weeks. However, a small percentage of infected persons develop severe illnesses and lose the ability to breathe on their own. In extreme circumstances, their organs fail, which can be fatal. The disease effect depends on the age and health of the infected individual. In fact, people who have cardiac illness, chronic obstructive pulmonary disease, asthma, high blood pressure, or weakened immune systems may experience severe problems. In some cases, the COVID-19 virus can cause death. In addition to the previous health consequences, the spread of the disease has also led to severe effects on social and economic systems [2][3][4]. In fact, due to stopping many economic activities, several persons have lost their jobs and many companies have closed. The disease's spread dynamics can be explained by several factors including the demographic distribution of the population, the efficiency of the public healthcare system, the mitigation countermeasures taken by local health authorities and the availability of vaccines, among other factors. The COVID-19 pandemic's evolution, like that of earlier pandemics, is not entirely random. The path of the disease looks like a life cycle, with the outbreak followed by an acceleration phase, an inflection point, a deceleration phase, and finally a stop or termination. The occurrence of novel variants of the virus such as Omicron or Delta, as well as other factors such as the organization of vaccination campaigns, may be all linked to the disease spread. Due to preventative measures, such as lockdowns and social distancing, the disease life cycle may differ from one country to another, and various countries may be in different phases at a given moment. In the same country, the disease might sometimes present several features depending on the region, most likely as a result of sociological and climatic factors. Research is currently being undertaken using various mathematical models to forecast the progression of the pandemic and to characterize its dynamics in order to aid in understanding its trajectory through time. The Susceptible-Infectious-Removed (SIR) class of compartmental modelling techniques, developed by Kermack and McKendrick [5] almost a century ago, is one of those popular models. SIR has played a key role in treating infectious diseases and continues to do so. The "Susceptible", "Infectious", and "Removed" percentages of a given population are divided into compartments. These compartments are related by dynamic interactions that are represented by non-linear ordinary differentials (ODEs). Authors in [6][7][8] used SIR models to forecast the COVID-19 outbreak, respectively, in India, Algeria and Saudi Arabia. The simulation results showed the necessity of interventions in flattening the disease propagation curve, delaying the peak, and lowering the fatality rate.
In order to predict the spread of COVID-19, it was discovered that the "econometric models" family of models was effective. The time series Auto-Regressive-Integrated-Moving-Average (ARIMA) model is the most well-known member of this family. The prevalence and incidence of COVID-19 were predicted by the authors in [9] using the ARIMA model and the Johns Hopkins epidemiological data. Authors in [10][11][12][13][14] performed forecasting of the spread of the COVID-19 pandemic using the ARIMA prediction model under current public health interventions, respectively, in Saudi Arabia, Kuwait, Egypt, Korea and Morocco. The prediction accuracy of the ARIMA model was found to be acceptable and adequate. Authors in [15] came to the conclusion that the ARIMA model, when compared to the AR (Auto Regression) model, provides the best match for predicting new cases in India. The authors of [16] sought to create the best models to anticipate new daily instances. The daily new cases in India and the US were fitted using ARIMA and a hybrid ARIMA model. In India, the ARIMA model's predictive values were the most accurate; however, in the US, the hybrid ARIMA model performed better.
As an alternative to epidemiological and time series models, machine learning models showed potential in predicting COVID-19, as they did for modeling other outbreaks. References [17][18][19][20] included an overview of research using mathematical, machine learning and Deep Learning models to detect, diagnose, or forecast COVID-19. Authors in [21][22][23][24] used Support Vector Machine, variants of Recurrent neural network (RNN) and variants of long-short term memory (LSTM) for predicting COVID-19. It was found that these algorithms were effective due to the non-linear nature of how they handle the datasets.
References [25][26][27] trained the Facebook Prophet model in order to examine and predict the number of COVID-19 cases and fatalities based on the previously available data.
From the above literature review, a plethora of techniques from the field of statistics, data science, machine learning and artificial intelligence [28] have been used for COVID-19 prediction. To the best of the authors' knowledge, the accuracy of the time-series and machine learning models are sensitive to the case study as well as to the time window covered by the utilized datasets. Saudi Arabia is a country of rapid economic growth, visited annually by millions of Muslim people for performing Hajj and Umrah and hosting immigrants from different countries, cultures and religions. Saudi Arabia also has a large surface territory exhibiting various climatic conditions. Therefore, Saudi Arabia includes several patterns making it a suitable case study that can represent Arabic, Muslim, developing and Oil-Producer countries. In addition, Saudi Arabia has had a relatively successful experience in mitigating the disease.
For the reasons cited above, this study's main aim is to investigate statistical and machine learning-inspired time series approaches for modeling/analyzing the spread of COVID-19 in Saudi Arabia in terms of numbers of confirmed, recovered and death cases. More specifically, ARIMA and Facebook's Prophet approaches were developed and then compared. In addition, healthcare implications regarding the Saudi experience in facing the disease will be analyzed according to the time-line evolution and more particularly regarding the events related to the countermeasures taken. Saudi Arabia announced on Monday 13 June 2022, that it is relaxing the restrictions on the use of face masks in enclosed spaces with the exception of the Grand Mosque in Mecca and the Prophet's Mosque in Medina, as well as medical facilities, public gatherings, sporting events, flights, and public transportation. This study aims to evaluate the future short-term impacts of such a decision.
The rest of this paper is structured as follows. In Section 2, a data description and the proposed methods are introduced. Detailed experiments are outlined in Section 3.

Data Description
The datasets used in this study were collected from the official website of Saudi Ministry of Health [29] and from the dashboard (/www.https://covid19.moh.gov.sa/, accessed on 1 August 2022). It contains the daily number of confirmed, recovered and death cases from 2 March 2020 to 22 June 2022.       Table 2, the major events that marked the pandemic in Saudi Arabia are shown. The majority of these events were highly related to the mitigating actions taken by the Saudi authorities. In addition, the evolution of the numbers of new infections and deaths occurring after those events may show that the measures taken exhibited positive effects on the disease spread. Moreover, other actions' effects appeared after a few days or even weeks.

Models
COVID-19 time series (TS) data, like other TS, is simply a collection of data recorded over a period of time usually regularly spaced (daily, weekly, monthly, . . .). TS are often analyzed to understand the past, in order to predict the future (forecast). TS are mainly employed for helping managers and policy makers to make well-informed and sound decisions. TS can be univariate when its values are taken by a single variable at a periodic time instance over a period, and multivariate when its values represent multiple variables at the same periodic time instances over a period. TS data are different to cross-sectional data which record individuals, companies or others at a single point in time.
A natural temporal ordering exists in TS data. The observations made in the past, often known as lag times or lags, are frequently of interest to data scientists. Observations that are close in time tend to be correlated, which is a characteristic of most time series that sets them apart from cross-sectional data. The aim is often to estimate how TS will evolve in the future, with time serving as the independent variable. In general, TS data can be found in any area of applied science and engineering that uses temporal observations, including social sciences, finance, economics, epidemiology, and more. There are a few factors such as trend, stationarity, seasonality and correlation that are relevant when dealing with time series. When there is a long-term rise or fall in the data, the situation is referred to as a trend. Stationarity is another crucial property of time series. If a time series' statistical characteristics remain constant across time, it is said to be stationary. Its mean and variance are constant, and its covariance is time-independent [30]. Seasonality is the existence of recurring fluctuations that occur at predetermined regular periods of less than a year. The term "autocorrelation" describes the similarity of data as a function of their distance in time.
A time series analysis is the process of examining time series data to extract useful statistics and other aspects of the data, and time series forecasting is the process of using a model to project future values based on observed values. The "No-Free-Lunch Theorem" [31] states that no forecasting technique is optimal for every time series. Instead, the data analysis expert must choose a forecasting methodology from one of the three families of forecasting techniques listed below: (1) machine learning, (2) statistical models, and (3) hybrid methods [32].
In this paper, for forecasting the COVID-19 time series, we developed and compared three approaches based on ARIMA, Facebook's Prophet and baseline models known to be relatively simple and possessing good performance. Details of these models are provided in the following sections.

ARIMA Model
The ARIMA model is the acronym of the Autoregressive Integrated Moving Average model. It is also known as the Box-Jenkins model. The ARIMA model is the most widely used approach to univariate time series forecasting. It is composed of three key components [33].
• AR (Autoregression): This component of ARIMA expresses the dependent relationship between the current observation and a number of lagged observations.
where C is a constant; y t−1 , y t−2 , y t−p are the lags (past values); and α 1 , α 2 , α p are lag coefficients which are estimated by the model.
where t , t−1 , t−q are white noise terms for the respective lags, i.e,. y t−1 , y t−2 , y t−q ; β 1 , β 2 , β q are the parameters of the model. The ARIMA model is characterized by the order of each of these components. Following the notation ARIMA (p, d, q), the model parameters are described as follows [34]: • p is the number of autoregressive lags included in the model; • d is the order of differencing used to make the data stationary; • q is the number of moving average lags included in the model.
There are many heuristics for choosing the parameters of an ARIMA model. One popular method is the Box-Jenkins method, which is an iterative multistep process ( Figure 5). In order to determine p and q, the autocorrelation function (ACF) and partial autocorrelation function (PACF) provide guidance for the autoregressive and moving average orders that are appropriate for the considered model. In this paper, a grid search of hyperparameters is used to tune the ARIMA model. For further details about ARIMA models and time series, the interested reader can refer to the following books: [35][36][37].

Facebook' Prophet
On February 23, 2017, Prophet, a method for predicting time series data, was published by Facebook and made available for use. Prophet is a robust forecasting technique. It can be easily used by users without a strong background in time series forecasting. This tool helps produce accurate forecasts for a wide range of problems. Based on an additive model, Facebook's Prophet fits non-linear patterns with weekly and yearly seasonality as well as considering holidays patterns. The Prophet model includes in general three key elements [38]: where: • n(t): is the trend function which models non-periodic changes in the value of the time series; • p(t): represents periodic changes (seasonality); • h(t): represents the effects of holidays • ε t : an error term . Two potential trend models are implemented by the Prophet library for n(t). The first type is referred to as nonlinear, saturating growth. It takes the shape of a logistic growth model. where, • C: carry capacity; • k: growth rate; • m: offset parameter.
The latter, however, is a straightforward Piece-wise Linear Model with a stable rate of growth. where, • c: trend change point; • β: trend parameter (can be tuned as per requirement). For situations without excessive growth, the latter is the ideal option. Due to weekly and yearly seasonality, the seasonal component p(t) offers a flexible model of periodic variations. Fourier series are used in Prophet's yearly seasonality model. where, • P: regular period expected for considered time series; • It was discovered that N = 10 and N = 3, respectively, for yearly and weekly seasonality, work effectively for the majority of cases. A model selection method such as AIC could be used to automate the selection of these parameters.
Black Fridays and other predictable exceptional days with irregular schedules are represented by the component h(t). The data analyst must supply a customized set of events in order to make use of this feature. The information that was not taken into account by the model is represented by the error term which reflects the model robustness ε t . A uniformly distributed noise is typically used to model it [38].
Prophet, a novel time series forecasting model from the machine learning family, adheres to the streamlined framework shown in Figure 6.

Metrics for Evaluation
The following three criteria (Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and the coefficient of determination (R 2 )) were applied to each case (confirmed, recovered and deaths) to compare the goodness-of-fit yielded by the investigated models: where y t is the actual value,ŷ t is the predicted value andȳ = 1 n n ∑ t=1 y t is the mean of y t .
These three measures can be used to assess a model's performance and depict its accuracy very well. These three metrics can be computed for each model and compared to each other to identify the most accurate one. The fit is better when MAPE and MSE have smaller values and the coefficient of determination has a value close to 1 which represents the ideal fit. Table 3 lists the hardware and software specifications involved in the experiments conducted in this work. Table 3 shows the packages used during the implementation of ARIMA, Prophet and baseline models.

Results of ARIMA Models
Checking the stationarity data is the initial step in time series forecasting because the majority of TS models rely on this assumption. Furthermore, compared to non-stationary data, the stationary TS theory is better established and simpler to put into practice. In time series analysis and forecasting, visualization is crucial. Line plots of datasets, for instance, can help to detect patterns, cycles, and seasonality. As a result, this can affect the model choice. The stationarity can be more easily seen on a line plot. If a time series' statistical characteristics do not alter over time, it is classified as stationary. Its mean and variance are therefore constant, and its covariance is not affected by time [30]. To examine stationarity in TS data, a variety of statistical methods are available. The Augmented Dickey-Fuller (ADF) test, also referred to as a unit root test, is one of the most popular types [33]. In this test, we assume that TS is not stationary, which is the null hypothesis. A Test Statistic and a few crucial values for various confidence levels are included in the test results. We can reject the null hypothesis and declare that the series is stationary if the test statistic is less than the critical value. Equivalently, the null hypothesis can be rejected if the p-value is less than 0.05. For statistical modeling techniques to succeed, the TS must be stationary. If a TS is non-stationary, differencing (the d parameter of ARIMA (p,d,q)) provides a straightforward technique to make it stationary. Finding p and q for ARIMA is the next task after TS becomes stationary. The ARIMA (p,d,q) model can be expressed mathematically as follow [39]: where y t is the time series; p, d, and q are, respectively, the order of AR, order of integration (number of differences) and MA components of the ARIMA model. ∆ d is an operator to make y t stationary; L is defined as the lag operator; Φ(L) p is the lag polynomials of order p, q is the number of time lags of the error term to regress on, ϕ is defined analogously to Φ and t is a white noise.
Through Auto Correlation (ACF) and Partial Auto Correlation (PACF) graphs, we can learn some important properties of the TS data since the ACF measures the linear relationships between observations at different lags and PACF measures the partial correlation between two points at a specific lag of time. Alternatively, a widely used method for estimating the three parameters required in ARIMA(p,d,q) is the grid search procedure. In order to discover how to tune the ARIMA model using grid search of hyperparameters, the reader can refer to [33]. Once p, d and q are carefully chosen and fixed, one has to fit the data to the ARIMA which has to be finalized so as to make predictions on new data. The skill and the capability of the forecast model can be evaluated through performance measures (See Section 2.3).

Prediction of Confirmed Cases
When referencing Figure 2 of the confirmed cases in Saudi Arabia from 2 March 2020 to 22 June 2022, it can be observed that the TS is stationary. This observation can be proved by the results of the Augmented Dickey-Fuller test shown in Figure 7. Here, in order to optionally specify the number of lags considered in the ADF test, the Akaike's Information Criterion 'AIC' is used through the optional parameter autolag = 'AIC' ( in the python function adfuller()) . Usually, to identify the best balance between variance and bias, the complexity of the models is penalized using the AIC information criteria.
In Figure 7, the rolling mean of a time series for time point t for a window size w is simply the mean of the previous w time steps and the rolling standard deviation for the same example is defined as the standard deviation of the previous w time. If the rolling statistics do not fluctuate appreciably over time, a time series is said to be "visually stationary". From Figure 8, it can also be concluded that the studied time series comprising confirmed cases in Saudi Arabia from 2 March 2020 to 22 June 2022 is stationary (interpret the test using the p-value or the critical values returned by the test: test statistic is lower than the critical value). Figure 9 shows the predictions of COVID-19 case trends with the ARIMA model. As is shown in Figure 9, the model fits the confirmed cases in Saudi Arabia very well, with the values and the curve itself being very close to the actual ones. Table 4 shows the three metrics of the ARIMA model. MAE and R 2 of ARIMA used for confirmed cases are greatly improved compared to the baseline model with relatively low values. However, the RMSE of the ARIMA model has slightly worse performance than the baseline model.

Prediction of Recovered Cases
When observing Figure 3 of recovered cases in Saudi Arabia from 2 March 2020 to 22 June 2022, it can be noted that the studied TS is stationary. This observation can be proved by the results of the Augmented Dickey-Fuller test shown in Figure 10.
From Figure 10, it can be concluded that the studied time series comprising recovered cases in Saudi Arabia from 2 March 2020 to 22 June 2022 is stationary (test statistic is lower than critical value; therefore, the considered data require some transformations). By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots (Figure 11), the numbers of AR and/or MA terms that are needed can be tentatively identified.   Figure 12 shows the predictions of COVID-19-recovered cases with the ARIMA model. As is shown in Figure 12, the model fits the recovered cases in Saudi Arabia very well, with the values and the curve itself being very close to the actual ones. Table 5 shows the three metrics of the ARIMA model. R 2 of ARIMA used for recovered cases is greatly improved compared to the baseline model with relatively low values. However, the RMSE and MAE of the ARIMA model has slightly worse performance than the baseline model.

Prediction of Deaths
When observing Figure 4, representing the deaths cases in Saudi Arabia from 2 March 2020 to 22 June 2022, it can be noted that the studied TS is non-stationary. This observation can be proved by the results of the Augmented Dickey-Fuller test shown in Figure 13. From Figure 13, it can be concluded that the studied time series comprising deaths cases in Saudi Arabia from 2 March 2020 to 22 June 2022 is non-stationary (test statistic is greater than critical value; therefore, the TS data need to be stationary). By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots of the differenced series (Figure 14), the numbers of AR and/or MA terms that are needed can be tentatively identified. By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots of the differenced series (Figure 15), we can tentatively identify the numbers of AR and/or MA terms that are needed. In this case, an initial order for the model will be (1, 0, 1); however, after performing a grid search for hyperparameters of the ARIMA model, we found that the best fitted order is (0, 0, 9). Using the latter model provided the forecast illustrated in Figure 16.   Figure 16 shows the predictions of COVID-19 death cases with the ARIMA model. As is shown in Figure 16, the model fits the death cases in Saudi Arabia very well, with the values and the curve itself being very close to the actual ones. Table 6 shows the three metrics of the ARIMA model. R 2 of ARIMA used for death cases is greatly improved compared to the baseline model with relatively low values. However, the RMSE and MAE of the ARIMA model have slightly worse performance than the baseline model.

Results of Prophet
When using Prophet under a Python environment, an instance of the Prophet class should be created and then its fit and predict methods are recalled. The input to Prophet is always a data frame with the following two columns: ds and y. The ds (datestamp) column should be of a format of a date or a timestamp expected by Pandas while y column must be numeric, and represents the variable expected to be forecasted. Figure 17 shows the relationship between the original values (black dots) of confirmed cases in Saudi Arabia from 2 March 2020 to 22 June 2022 and the predicted values (blue solid line). The predicted and the actual values are close to one another as seen in Figure 17. We can also see the forecast components ( Figure A1) by using the Prophet.plot-components method. This method provided the analysis of the trend of COVID-19 confirmed cases in Saudi Arabia until 22 June 2022, on a weekly and monthly basis. Table 7 shows the three performance metrics of the Prophet model. R 2 of the Prophet used for confirmed cases is clearly better than the baseline model (0.977 against 0.734 in a scale ranging from 0 to 1 which corresponds to the ideal fit). In terms of RMSE and MAE, the Prophet model provided a slightly worse performance than the baseline model.    Figure 18. The trend of recovered cases, in a weekly, and monthly analysis of COVID-19 in Saudi Arabia until to 22 June 2022 is shown in Figure A2. Table 8 shows the three metrics of the ARIMA model. R 2 of ARIMA used for the recovered cases is greatly improved compared to the baseline model with relatively lower values. However, the RMSE and MAE of the ARIMA model have worse performance than the baseline model.  Figure 19 shows the relationship between the actual (black dots) deaths in Saudi Arabia from 2 March 2020 to 22 June 2022 and the predicted values (blue solid line). The predicted and the original values are very similar, as seen in Figure 19. The trend of death cases, on a weekly and monthly basis, is shown in Figure A3. Table 9 shows the three metrics of the ARIMA model. MAE and R 2 of the Prophet model used for death cases are obviously improved compared to those of the baseline model. The RMSE of the Prophet model has slightly worse performance than the baseline model.

Discussion
To date, over 615 million confirmed cases have been reported, and over 6 million people died worldwide. The physical and mental health of people is seriously impacted by COVID-19, which also has an impact on everyone's lifestyle and the world economy. Given that COVID-19 is currently a serious global crisis, it is essential to fully comprehend the pandemic curve and forecast its future trajectory. An accurate COVID-19 forecast may have various advantages. It can assist in preserving lives, minimizing losses of financial resources, managing medical and human resources in healthcare, and reviving the world economy. The total number of cases of COVID-19 (or any other disease) is a common example of time series data, and there are currently a variety of analysis techniques to help identify patterns or forecast trends in such data. Time series data analysis has been proven to be successful with statistical methods (AR, MA, ARIMA, etc.), machine and deep learning methods (ANN, LSTM, etc.), and many other methods. Some of these techniques have already provided significant insights into the dynamics of infectious disease transmission and its surveillance. Records of new infections, recoveries and deaths are often released daily, which makes the task of predicting the future trend of the disease having lower performance when using data-demanding techniques such as deep learning (DL). However, statistical techniques such as ARIMA and straightforward machine learning (ML) techniques such as Facebook's Prophet continue to show good performance. To the best of the authors' knowledge, using ARIMA and Facebook's Prophet with more than two years of COVID-19 pandemic data in Saudi Arabia represents the first study of its category. This relatively long period was marked by several events such as the vaccination campaigns organized over many countries, the emergence of new variants of the virus as well as the relaxation of lockdown measures and mask wearing. For every time series forecasting task, establishing a baseline is crucial. Usually, before starting any forecasting exercise, a baseline model should be first investigated. In that sense, the baseline model offers a basis for comparison for any additional forecasting approach. The newly developed approach should be modified or replaced if its performance is at the same level or below the baseline model. The persistence or the "Zero Rule" algorithm, also called the naive model, is the most often used. It uses the value at the current time step (t) to predict the expected outcome at the next time step (t + 1) [33]. In this study, Facebook's Prophet and the ARIMA models are compared to the naive model. Table 10 includes the RMSE, MAE, and R 2 metrics for each model used in this study. The table clearly shows that the ARIMA model outperforms both Facebook's Prophet and the baseline models at predicting confirmed and recovered cases of COVID-19 since it has the highest R 2 and the lowest MAE. However, the Prophet model performs better at predicting death cases. The ARIMA and Prophet model performances are comparable with almost similar results. Both models predict values that are practically equal to the actual values, which means they may be very useful in anticipating the number of confirmed, recovered and death cases in the future and provide individuals helpful recommendations on how to improve the COVID-19 mitigating measures. The optimal performance that ARIMA is capable of relies on a particular tuning parameter range. With a wide parameter selection, the model may identify better parameters for each of the cases under consideration; however, the amount of time and computational cost required to implement it may significantly increase. Under these conditions, data scientist may sacrifice an amount of the accuracy for quicker and simpler implementation of the model by using a restricted range of parameter selection. A reduced range of parameters for faster grid search (or similar approaches) can thus be used to speed up computing in some circumstances where the model accuracy does not need to be exceptionally high. Regarding the Prophet model, it has been found to present clear benefits and drawbacks. In fact, it requires the least amount of design and computing effort among ARIMA (for confirmed, recovered, and death cases). Furthermore, in this research, it has been found to be robust to missing data in a time series.
Although a fair comparison should be conducted on the same data set, covering the same period and the same location, results obtained in this study have been compared to those of three previous works in Saudi Arabia. The work in reference [10] used the ARIMA model but covered the beginning of the pandemic (From 2 March 2020 to 20 April 2020). The disease spread has after that shown many fluctuations. Our work is more beneficial to the work in [10] since it covered a period of more than two years where many aspects and virus variants appeared and highly impacted the disease dynamics. Reference [40] developed several growth models for the case study of Saudi Arabia. It obtained a coefficient of determination comparable to the performances of our present study. However, those models failed in predicting the probable end date of the disease. The work in [41] provided performances worst than those obtained in this study although it used sophisticated deep learning techniques shown to be data demanding.
In this study, the Prophet and ARIMA models produced reliable findings. However, given the complexity of the COVID-19 situation which exhibited concerns with virus mutation, population density, international travel, human behaviors, etc., the Prophet and ARIMA models are not well suited to handle such data trends. They occasionally perform worse than the baseline model in terms of RMSE and MAE (see Table 10). To overcome this issue, either multivariate time series forecasting or hybrid techniques may be implemented. Multivariate forecasting can be used to improve the thoroughness of the experiment and attain the best results since more data sources can improve accuracy. Recently, hybrid models ( [42][43][44]) were used in time series analysis. For improved outcomes, these models typically incorporate machine learning techniques such as ANN (Artificial Neural Network) and statistical models including ARIMA. Future research can examine these models to improve their ability to forecast COVID-19.

Conclusions
When compared to other countries, the patterns from the most recent statistics demonstrated that the Saudi Arabian authorities' quick and effective actions to restrict the pandemic imparted a beneficial effect, although numerous parameters affected the pandemic spreading in the country. This relatively successful experience in mitigating the disease provided this research with insights to deeply analyze and predict the time series data collected from official authorities. The evolution of COVID-19 using the Facebook's Prophet and ARIMA models was investigated. The forecast was based on the data from 2 March 2020 until 22 June 2022. The results from these two models are not quantitatively different, since both models predicted a significant decrease in recovered cases and deaths in Saudi Arabia for the next month. For both models, the confirmed cases expected next month are in flux. These findings would aid the Saudi authorities in better containing the COVID-19 outbreak in the future. The results indicate that, although a one-size-fits-all approach does not exist, the Prophet model prevails since it is the easiest model to use and requires almost no manual effort. However, in the case of COVID-19, there are multiple issues involved such as virus mutation and human behaviors and such data cannot fit well in the Prophet model. These influential factors that may affect the disease dynamics need to be further investigated in a multi-variate time series context. Another improvement that should be investigated is the use of hybrid models which aggregate the benefits of different models, more specifically statistical and machine learning approaches.

Data Availability Statement:
No new data were created in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
All authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: To see the forecast components, one can use the Prophet−plotcomponents method. By default, the trend and seasonality can be viewed. Here, you can see the components (trend, seasonality) of recovered cases in KSA from 2 March 2020 till 22 June 2022.