Improvement of Time Forecasting Models Using Machine Learning for Future Pandemic Applications Based on COVID-19 Data 2020–2022

Improving the accuracy, efficiency, and precision of time-series forecasts is becoming critical for authorities seeking to predict, monitor, and prevent the spread of the coronavirus disease (COVID-19). However, the results obtained from single predictive models are often imprecise and inefficient because the data contain both linear and nonlinear patterns. Linear models such as the autoregressive integrated moving average (ARIMA) cannot effectively predict complex time series, so nonlinear approaches are better suited to that purpose. To obtain COVID-19 predictions that are closer to the true values, a hybrid approach was implemented. The objectives of this study are twofold. The first is to propose an intelligence-based prediction method, the autoregressive integrated moving average–least-squares support vector machine (ARIMA-LSSVM), to achieve better prediction results. The second is to investigate the performance of the proposed models by comparing them with the ARIMA, support vector machine (SVM), least-squares support vector machine (LSSVM), and ARIMA-SVM models. Our investigation is based on three real COVID-19 datasets: daily new cases, daily new deaths, and daily new recovered cases. Statistical measures, namely mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), were then used to verify that the proposed models perform better than the ARIMA, SVM, LSSVM, and ARIMA-SVM models.
Empirical results using three recent datasets of known COVID-19 cases in Malaysia show that the proposed model generates the smallest MSE, RMSE, MAE, and MAPE values for both the training and testing datasets compared to the ARIMA, SVM, LSSVM, and ARIMA-SVM models. This means that the predicted values of the proposed model are closer to the true values, and that the proposed model can generate estimates more accurately and efficiently. The proposed models also perform much better than the ARIMA, SVM, LSSVM, and ARIMA-SVM models in terms of percentage error reduction on the training and testing samples of all datasets. Therefore, the proposed model is possibly the most efficient and effective way to improve forecasting for future pandemics.

ARIMA (p, d, q) model [5][6][7]. Predicting new daily cases of COVID-19 was a difficult task, as cases increased daily. In the first wave, COVID-19 cases increased continuously for a period and then decreased. In the second wave, however, cases picked up again, and some of the patterns were difficult to predict. In this scenario, some researchers predicted the pattern of COVID-19 using ARIMA [8][9][10][11][12][13][14]. However, the ARIMA model has a limitation in that it typically can only handle a linear time-series structure [15]. The approximations produced by ARIMA models are therefore insufficient for nonlinear patterns, which poses an obstacle for researchers in time-series prediction [16]. Support Vector Machines (SVMs), first introduced in 1995 by Vladimir Vapnik [18] in the field of statistical learning theory and structural risk minimization, have proven useful in a variety of prediction and classification problems. Despite their superior performance, the classification performance of SVMs and the generalizability of the classifier are often affected by the dimension or number of feature variables used, as mentioned by Lee [17]. The development of vector machine models is nevertheless expected to provide more accurate and efficient results in prediction tasks. SVMs can address difficulties such as nonlinearity, local minima, and high dimensionality that the ARIMA model cannot [15,19,20,21]. In many practical applications, SVMs can even achieve higher accuracy for long-term predictions than other computational approaches. However, a single SVM model, like a single ARIMA model, has limitations: it handles nonlinear patterns but not linear ones.
Given the limitations of single ARIMA and SVM models, and following in-depth analysis of time-series prediction, hybrid approaches have emerged as a way to overcome both limitations. They have had a significant impact in many areas because of their dynamic nature and their higher forecasting accuracy, efficiency, and precision. Such approaches matter because almost all real time series contain both linear and nonlinear correlation patterns. Recently, the hybridization of prediction methods has been used with great success to achieve higher prediction accuracy [15,16,19,20,22,23,24,25,26].
Regarding the spread of COVID-19, the hybrid time-series approach is crucial for predicting the impact of the outbreak and has proven successful in predicting COVID-19 [27][28][29][30][31][32][33]. This study aims to (a) propose the ARIMA-LSSVM hybrid model to achieve better forecast results, that is, to produce an estimator with small error terms; and (b) examine the performance of the proposed models by comparing them to ARIMA and SVM models using three daily COVID-19 datasets from Malaysia: daily new positive cases, daily new deaths, and daily new recovered cases. Despite recent advances in time-series modelling of COVID-19, existing work has not specifically modelled COVID-19 cases in Malaysia in a way that helps authorities manage the spread of the outbreak with more efficient and more accurate forecasts.
This study makes a significant contribution to the field of pandemic prediction and prevention by introducing novel approaches to dealing with COVID-19 data. Rather than relying on traditional methods, this research utilizes evidence-based prediction techniques, which have been shown to be more accurate and efficient. The use of these intelligent forecasting models enables local health authorities to create more precise and effective preventive measures, especially in the face of future outbreaks.
This study is particularly innovative in its use of machine-learning-based hybrid forecasting models for Malaysia's future pandemics, such as avian flu or novel coronavirus strains. According to Moore [34], the next pandemic could plausibly be caused by the avian influenza virus strain H7N9 or by a novel coronavirus. The predictive models developed are more precise, accurate, and efficient in anticipating the dynamic spread of the virus. This approach has been tested on real-world data, including daily new cases, daily new death cases, and daily new recovered cases of COVID-19, making it a valuable tool for public health officials and researchers. This research also has significant implications for future outbreaks, particularly in countries with tropical rainforests such as Malaysia. By predicting the spread of COVID-19 early on, this model can help policymakers build better healthcare facilities, take legislative action, and avoid economic losses. While a vaccine is now available, this model remains useful in accurately forecasting and preventing the impact of future pandemics, including those caused by new virus strains.
Diagnostics 2023, 13, 1121
This study's innovative and evidence-based methods make a valuable contribution to pandemic prediction and prevention, providing significant insights that can be used to mitigate the impact of future outbreaks. The implications of this research extend to public health authorities, policymakers, and researchers worldwide, offering powerful tools for mitigating the devastating effects of pandemics. The remainder of this paper is structured as follows. Materials and Methods goes into detail about the method we used to develop our proposed model. The hybrid ARIMA-SVM model used in this study is then briefly described. The results and discussion present the performance of our proposed model based on three known COVID-19 case datasets. Finally, we wrap up the article and make suggestions for future research.

ARIMA Modelling
The autoregressive integrated moving average model, ARIMA (p, d, q), is one of the families of time-series forecasting models and is widely used because of its flexibility across different types of time series [16]. It also explicitly considers several standard patterns in time-series analysis, allowing for a powerful and easy-to-use way to produce accurate time-series forecasts. Its limitation is the assumption of a linear form: the future value of the time series is modelled as a linear function of the current value, the past values, and random noise [15,16,17,21,26]. In the ARIMA model, p and q are the numbers of autoregressive and moving average terms, respectively, and d is the integer order of differencing; together they define the order of the model. The ARIMA model with mean µ is expressed in terms of $y_t$ and $e_t$, the actual value and the random error at time t, respectively. The random errors are assumed to be independently and identically distributed (iid) with mean 0 and constant variance $\sigma^2$; $\theta_i$ (i = 1, 2, . . . , q) and $\phi_j$ (j = 1, 2, . . . , p) are the model parameters to be estimated.
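The display equation for this model did not survive in the text above. For orientation only, a commonly used form that matches the symbols defined in this section is sketched below; the sign convention on the moving average terms and the treatment of the mean are assumptions and may differ from the authors' Equation (1). Here $y_t$ is understood as the series after differencing d times.

```latex
y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p}
      + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q},
\qquad e_t \overset{\text{iid}}{\sim} (0, \sigma^2)
```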

Support Vector Machines Modelling
The Support Vector Machine (SVM), introduced by Vladimir Vapnik [18] and grounded in statistical learning theory, handles high-dimensional data well and generalizes even from a small number of instances. Because the models select boundary support vectors from the input data, they process the data quickly. The SVM regression function is described as follows.
For a dataset $\{x_i, y_i\}$ that can be regressed linearly, the regression function is linear in $x$, and its coefficients $w$ and $b$ are estimated by minimizing an objective built on the ε-insensitive loss function. The minimization in Equation (3) can be transformed into a constrained formulation by introducing positive slack variables $\xi_i$ and $\xi_i^*$. Duality theory is then used to convert this formulation into a convex quadratic programming problem, which is obtained by adding Lagrange multipliers to Equation (5). When a dataset cannot be regressed linearly, it is mapped to a high-dimensional feature space and regressed linearly there; in the resulting formulation, the inner product in the feature space is called the kernel function. Any symmetric function that satisfies Mercer's condition can be used as a kernel function [19]. The Gaussian kernel function is used in this study.
SVMs were used to estimate the nonlinear behaviour of the forecasting dataset because Gaussian kernels perform well under general smoothness assumptions [22].
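The equations of this section are missing from the text. As a reading aid, the standard ε-insensitive support vector regression formulation that the description follows can be sketched as below. This is the textbook form, not necessarily the authors' exact Equations (2)-(8); the penalty constant $C$, the multipliers $\alpha_i, \alpha_i^*$, and the $2\sigma^2$ scaling of the Gaussian kernel are assumptions.

```latex
% Linear regression function and regularized objective
f(x) = w^{\top} x + b, \qquad
\min_{w,b}\; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr), \qquad
L_{\varepsilon}(y, f(x)) = \max\bigl(0,\; \lvert y - f(x) \rvert - \varepsilon\bigr)

% Constrained form with slack variables
\min_{w,b,\xi,\xi^{*}}\; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\quad \text{s.t.} \quad
\begin{cases}
y_i - w^{\top} x_i - b \le \varepsilon + \xi_i \\
w^{\top} x_i + b - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\; \xi_i^{*} \ge 0
\end{cases}

% Dual with a kernel K replacing the inner product
\max_{\alpha,\alpha^{*}}\;
-\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j)
- \varepsilon \sum_{i} (\alpha_i + \alpha_i^{*})
+ \sum_{i} y_i (\alpha_i - \alpha_i^{*})
\quad \text{s.t.} \quad
\sum_{i} (\alpha_i - \alpha_i^{*}) = 0, \;\; 0 \le \alpha_i, \alpha_i^{*} \le C

% Gaussian kernel
K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)
```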

Least-Square Support Vector Machines Modelling
The Least-Squares Support Vector Machine (LSSVM), proposed by Suykens and Vandewalle [35], is a modification of the standard SVM. LSSVM trains by solving a set of linear equations, which is quicker than the quadratic programming required by the standard SVM. This also makes the model more time-efficient when analysing large datasets. Consider a given training set $(x_i, y_i)$, $i = 1, 2, \ldots, n$, with $x_i \in R^n$ as input data and $y_i \in R$ as output data. LSSVM defines the regression function through a constrained optimization problem in which $\omega$ is the weight vector; $\gamma$ is the regularization parameter, which determines the trade-off between training error minimization and smoothness of the estimated function; $e$ is the approximation error; $\phi(\cdot)$ is the nonlinear mapping function; and $b$ is the bias term. The constrained optimization in Equation (9) can be translated into an unconstrained one by constructing the Lagrange function. The solution is obtained from the Karush-Kuhn-Tucker (KKT) conditions, by taking partial derivatives with respect to $\omega$, $b$, $e$, and the Lagrange multipliers $\alpha$. With the Radial Basis Function (RBF) as the kernel function, $\alpha$ and $b$ are obtained by solving linear equations.
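To make the "linear equations instead of quadratic programming" point concrete, the following is a minimal NumPy sketch of LSSVM regression in its usual textbook form: the KKT conditions reduce to one bordered linear system in the bias $b$ and the multipliers $\alpha$. It is not the authors' R implementation; the function names, the toy data, and the kernel scaling $\exp(-\lVert x - z\rVert^2 / (2\sigma^2))$ are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    The 2*sigma**2 scaling is an assumed convention."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy example (synthetic data, not the COVID-19 series).
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(60)
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=1.0)
y_hat = lssvm_predict(X, b, alpha, X, sigma=1.0)
train_mse = float(np.mean((y - y_hat) ** 2))
```

Training costs one dense linear solve, which is the efficiency argument made above; the trade-off is that every training point receives a nonzero multiplier, so the sparsity of the standard SVM solution is lost.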

Proposed Hybrid Model
Despite the variety of time-series models available, the accuracy, efficiency, and precision of time-series forecasts are becoming crucial for many decision-making processes today. However, single ARIMA and SVM models do not deliver all of these properties together. This is also the main reason why time-series forecasting remains a demanding, dynamic, and actively researched topic in many fields of study. ARIMA and SVM models each perform well in their own linear or nonlinear domains [15,25,26], but neither is a generic approach that can be applied to all situations. Therefore, a hybrid approach using both linear and nonlinear modelling capabilities is recommended, mainly to improve overall prediction effectiveness. To date, however, there has been no research on improving the effectiveness of predictive models in Malaysia, particularly in the case of COVID-19.
There are two reasons for using hybrid models in this study. First, a single ARIMA or SVM model may not be sufficient to identify all the characteristics of a time series. Second, one or both models may fail to recognise the actual data-generating process. The hybrid models in this study were built in two stages: Part I deals with the linear autocorrelation component, and Part II deals with the nonlinear component. Thus,

$y_t = L_t + N_t$

where $L_t$ and $N_t$ denote the linear component and the nonlinear component, respectively. These two parts must be estimated from the data. Part I focuses on linear modelling and employs the ARIMA model to model the linear component. The residuals from this first model contain the nonlinear relationships that a linear model cannot capture, and possibly some remaining linear relationship. Let $\varepsilon_t$ denote the residual from the linear model at time t. Then,

$\varepsilon_t = y_t - \hat{L}_t$

where $\hat{L}_t$ is the predicted value for time t from the estimated relationship in (1). The residual dataset after ARIMA fitting will therefore contain only the nonlinear relationships that cannot be represented by a linear model [15]. The first-stage results, which include forecast values and residuals from linear modelling, are then used in Part II.
Part II focuses on nonlinear modelling, and LSSVM is used to model the nonlinear (possibly linear) relationship that occurs in the residuals of linear modelling as well as the original data. The residuals are modelled with LSSVM under different input configurations, in each of which $f$ is a nonlinear function determined by the LSSVM model and $e_t$ is the random error. The outputs of Equations (12) and (13) can be identified as $\hat{N}_t$, so the hybrid forecast is obtained by summing the linear and nonlinear components:

$\hat{y}_t = \hat{L}_t + \hat{N}_t$

Figure 1 shows the functional flowchart of the hybrid models. In short, the proposed hybrid methodology is divided into two parts. The ARIMA model is used to analyse the linear component in Part I. Part II develops an LSSVM model to model the residuals from Part I. Because the ARIMA model in Part I cannot handle the nonlinear component of the data, the residuals of the linear model will include information about the nonlinearity. The LSSVM results can be used as forecasts of the ARIMA model's error terms. The hybrid model captures different patterns by combining the distinct features and strengths of the ARIMA and LSSVM models. As a result, it is more effective to model linear and nonlinear patterns separately with two different models and then recombine the forecasts to improve overall modelling and forecasting performance.
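The two-part procedure can be illustrated with a small self-contained sketch. To avoid depending on any ARIMA library, Part I is replaced here by a plain AR(p) regression fitted by least squares, which is an assumption and only a stand-in for the ARIMA model used in the study. Part II fits an LSSVM with an RBF kernel to lagged residuals. The function names, the lag counts `p` and `m`, the hyperparameter values, and the synthetic series are all hypothetical.

```python
import numpy as np

def lagged(series, n_lags):
    """Return rows [s[t-1], ..., s[t-n_lags]] and the matching targets s[t]."""
    n = len(series)
    X = np.column_stack([series[n_lags - k - 1 : n - k - 1] for k in range(n_lags)])
    return X, series[n_lags:]

def fit_linear_stage(series, p):
    """Part I stand-in: AR(p) with intercept, fitted by ordinary least squares."""
    X, y = lagged(series, p)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef  # fitted linear component for t = p, ..., n-1

def fit_nonlinear_stage(X, y, gamma, sigma):
    """Part II: LSSVM with an RBF kernel; returns in-sample fitted values."""
    n = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma**2))
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return K @ sol[1:] + sol[0]

# Synthetic series with a linear and a nonlinear part (not the COVID-19 data).
rng = np.random.default_rng(1)
n = 220
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.7 * y[t - 1] + np.sin(2.0 * y[t - 2]) + 0.1 * rng.standard_normal()

p, m = 2, 3                       # AR order and number of residual lags (assumed)
L_hat = fit_linear_stage(y, p)    # linear component
resid = y[p:] - L_hat             # residuals of the linear model
R, target = lagged(resid, m)      # residuals regressed on their own lags
N_hat = fit_nonlinear_stage(R, target, gamma=10.0, sigma=1.0)
y_hat = L_hat[m:] + N_hat         # hybrid fit = linear part + nonlinear part

actual = y[p + m:]
mse_linear = float(np.mean((actual - L_hat[m:]) ** 2))
mse_hybrid = float(np.mean((actual - y_hat) ** 2))
```

Both error figures here are in-sample. A lower in-sample error for the hybrid fit is expected by construction and says nothing by itself about out-of-sample accuracy, which is why a held-out test sample is needed.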

Proposed Algorithm
Step 1: Three selected time series of COVID-19 case datasets (1 October 2020-4 November 2022), namely daily new positive cases, daily new death cases, and daily new recovered cases, are generated in the R programming language.
Step 2: Each of the generated datasets is defined as $\{x_{11}, \ldots, x_{1n}\}$, $\{x_{21}, \ldots, x_{2n}\}$, and $\{x_{31}, \ldots, x_{3n}\}$ for daily new positive cases, daily new death cases, and daily new recovered cases, respectively. Then, the best ARIMA (p, d, q) is selected after checking the autocorrelation function (ACF) plot of the ARIMA (p, d, q) residuals. The best-fitting model for daily new positive cases is ARIMA (2, 1, 2), while those for daily new death cases and daily new recovered cases of COVID-19 are ARIMA (1, 1, 2) and ARIMA (0, 1, 1), respectively.
Step 3: The fitted value,
Step 4: Combine the values in Step 3 as a set of input variables to obtain the output $y_t$.
Step 5: The ARIMA (p, d, q) is defined by the order of q. According to the information in Step 4, the Vector Machines model is applied to the residuals to obtain the output $L_t$ using the R programming language.
Step 6: A fitted value of ARIMA hybridized with the Vector Machines model is obtained for all sample data. Then, the residuals $\varepsilon_t$ are generated to obtain the forecasting result $N_t$.
Step 7: The data are split randomly into training data and testing data for further Vector Machines modelling. The Vector Machines procedure is run using the "e1071" and "liquidSVM" packages in the R programming language.
Step 8: The two tunable parameters of the LSSVM technique (γ and σ) are derived by minimizing an objective function such as the mean square error (MSE). The grid-search method updates the parameters exponentially within the specified range using predetermined equidistant steps.
Step 9: Take the split data as the processing data and the order q as in Step 5. The combined forecast is then given by Equation (16): $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
Step 10: Estimate the model performance using the statistical measures MSE, RMSE, MAE, and MAPE.
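The grid search in Step 8 can be sketched as follows. One reading of "updates the parameters exponentially ... using predetermined equidistant steps" is a grid that is equally spaced in the exponent (powers of two here); that reading, the grid ranges, the hold-out scheme, and all function names are assumptions, and the data are synthetic. The study itself uses R packages, so this Python sketch only illustrates the idea.

```python
import numpy as np
from itertools import product

def rbf(A, B, sigma):
    """Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system; returns the bias b and multipliers alpha."""
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), rbf(X, X, sigma) + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def grid_search(X_tr, y_tr, X_va, y_va, gammas, sigmas):
    """Return (mse, gamma, sigma) with the lowest hold-out MSE over the grid."""
    best = None
    for gamma, sigma in product(gammas, sigmas):
        b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma)
        pred = rbf(X_va, X_tr, sigma) @ alpha + b
        mse = float(np.mean((y_va - pred) ** 2))
        if best is None or mse < best[0]:
            best = (mse, gamma, sigma)
    return best

# Synthetic regression data, split into a fitting part and a hold-out part.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)
X_tr, y_tr, X_va, y_va = X[:60], y[:60], X[60:], y[60:]

gammas = 2.0 ** np.arange(-2, 11, 2)   # equidistant steps in the exponent
sigmas = 2.0 ** np.arange(-3, 4)
best_mse, best_gamma, best_sigma = grid_search(X_tr, y_tr, X_va, y_va, gammas, sigmas)
```

Scoring each parameter pair on data not used for fitting matters: selecting (γ, σ) by training MSE alone would always favour the largest γ and the smallest σ, which overfits.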

Forecasting Evaluation Criteria
To assess the overall performance of the proposed hybrid models, standard statistical measures are used, following [15,16,36]: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error). In time-series analysis, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are commonly used to determine the appropriate lag length for the ARIMA model. The best ARIMA model is therefore the one with the lowest AIC and BIC values. Meanwhile, three parameters, such as C, are used to determine the best-fitted SVM model, and for the LSSVM models, the two parameters γ and σ are used.
Incorrect LSSVM parameter selection can lead to overfitting or underfitting of the training data. The LSSVM parameter set with the lowest MSE value is selected for the best-fitting model, analogous to the selection of the best ARIMA model. In the hybrid models, ARIMA first functions as a pre-processor, filtering the linear pattern of the datasets. The ARIMA model's error term is then fed into the vector machine, and LSSVM is used to reduce the ARIMA error.
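The four error measures can be computed as in the short sketch below, using their usual definitions. The function name and the toy numbers are illustrative. One choice here is an assumption rather than something stated in the paper: MAPE is undefined when an actual value is 0 (which can occur in a daily deaths series), so observations with a zero actual value are skipped in the MAPE average.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = float(np.mean(err ** 2))
    nonzero = actual != 0  # assumed handling: skip zero actuals in MAPE
    if nonzero.any():
        mape = float(np.mean(np.abs(err[nonzero] / actual[nonzero])) * 100.0)
    else:
        mape = float("nan")
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": mape,
    }

# Toy values, not taken from the study's tables.
scores = forecast_errors([100, 200, 400], [110, 190, 380])
```

MSE and RMSE penalize large errors more heavily than MAE, while MAPE is scale-free, which makes it the natural measure for comparing series of very different magnitudes such as daily cases and daily deaths.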

Application of the Hybrid Model of COVID-19 in Malaysia
This section examines the proposed model's performance in two ways: first, the performance of the proposed ARIMA-LSSVM models compared to the ARIMA, SVM, LSSVM, and ARIMA-SVM models; second, the percentage improvement of the proposed models compared to the ARIMA and SVM models. Since the World Health Organization (WHO) declared COVID-19 to be a worldwide pandemic, COVID-19 time-series datasets have been extensively studied. The predictive capability of the developed models was compared using three well-known datasets of daily COVID-19 cases in Malaysia (daily new positive cases, daily new death cases, and daily new recovered cases) to demonstrate the performance of the proposed model in terms of accuracy and efficiency. All these data cover 1 October 2020 to 4 November 2022 and were retrieved from the COVIDNOW website at https://covidnow.moh.gov.my/, accessed on 10 January 2023.
The minimum values of new deaths, new cases, and new recovered cases in Table 1 are 0, 2600, and 1.8, respectively, while the maximum values of new cases, deaths, and recovered cases are 33,872.0, 592, and 33,406, respectively. The means for new cases, deaths, and recovered cases are 6322.7, 47.51, and 6415.5, respectively, and the corresponding medians are 3471, 11, and 3447.0. The first quartile values for daily new cases, deaths, and recovered cases are 1922, 4, and 1843, respectively, and the third quartile values are 6824, 58, and 6775, respectively. The standard deviations for new cases, deaths, and recoveries are 7097.8, 81.12, and 7058.3, respectively. This section also discusses both parts of the proposed modelling process, i.e., Part I (Linear Modelling) and Part II (Nonlinear Modelling), using the three COVID-19 datasets to demonstrate the effectiveness of the proposed models. Both linear and nonlinear modelling, as well as the data preparation in this study, are carried out in R.
Part I (Linear Modelling): ARIMA (2, 1, 2) is the best-fitting ARIMA model for the daily new positive cases dataset, ARIMA (1, 1, 2) is the best-fitting model for the daily new death cases dataset, and ARIMA (0, 1, 1) is the best-fitting model for the daily new recovered cases dataset. Table 2 summarizes the results of these ARIMA (p, d, q) models. Table 3 displays the estimates for all parameters. As shown in this table, the p-values for all parameters are small. As a result, for confirmed, recovered, and death cases, the models were statistically significant and could be used for forecasting [37,38].
Part II (Nonlinear Modelling): Based on the concepts of support vector machine design and the use of pruning algorithms in R, an optimal machine learning algorithm was created. For the daily new positive COVID-19 cases dataset, the parameters γ = 264, σ = 0.008 give the smallest MSE, i.e., 6,661,412 (see Table 4), so these parameter values were selected for the best-fitting model. For the daily new death and daily new recovered cases of COVID-19, the smallest MSE values are 250.887 and 21,114,252 (Table 4), obtained with γ = 877, σ = 0.006 and γ = 334, σ = 0.008, respectively, and these parameters were selected for the best-fitting models.
The daily new positive cases series contains 765 data points recorded from 1 October 2020 to 4 November 2022 (see Figure 2). The number of daily new positive COVID-19 cases in Malaysia has increased significantly twice since July 2021. After the first increase, it dropped below 5000 new cases, but it then rose again to a maximum of 33,406.00 around March-April 2022. The number is expected to fall sharply up to 5 November 2022.
The COVID-19 datasets have been extensively used with a wide range of linear and nonlinear time-series models, including ARIMA and machine learning methods [7,8,9,11,13,16,19,20,21,22,23,24,25,26]. The analysis of daily new positive cases of COVID-19 is critical as an indicator of the effectiveness of preventive measures that have been taken, are being taken, and will be taken by authorities to control the spread of this epidemic more effectively.
Therefore, a similar approach to that used by Aisyah et al. [15] is used to investigate the performance of the proposed models on the daily new positive COVID-19 cases dataset, where the dataset is divided into two samples, known as the training sample and the testing sample. According to Aisyah et al. [15] and Nurul Hila et al. [16], datasets should be divided into two parts to achieve the best results: 70-80% for training and the remaining 20-30% for testing [39,40]. The training data are used to build the models, while the testing data are used to evaluate the forecasting performance of the models based on statistical measurements. The training dataset contains 612 observations from day 1 to day 612 (1 October 2020 to 4 June 2022), accounting for 80% of the data, and is used exclusively to formulate the models. To evaluate the forecasting performance of the proposed models, the test sample uses approximately 153 observations from days 613-765 (20%), from 5 June 2022 to 4 November 2022. Based on the results (Tables 5 and 6 and Figures 3-5), it is possible to conclude that the proposed model produced greater accuracy and efficiency than ARIMA and SVM.
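The 80/20 division described above is by position in time (days 1-612 for training, days 613-765 for testing) rather than by shuffling. A minimal sketch of such a chronological split follows; the function name and the use of day indices in place of the actual case counts are illustrative assumptions.

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into a leading training part and a
    trailing test part, without shuffling."""
    series = np.asarray(series)
    n_train = int(round(train_fraction * len(series)))
    return series[:n_train], series[n_train:]

# Day indices 1..765 stand in for the 765 daily observations.
days = np.arange(1, 766)
train, test = chronological_split(days)
```

Keeping the test sample strictly after the training sample means the models are always evaluated on dates they have not seen, which matches how a forecast would be used in practice.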

New Deaths Cases Data Forecasts
In addition to the Malaysian daily new positive COVID-19 cases dataset, the Malaysian daily new death cases dataset is used to evaluate the performance of the proposed models. Like the daily new positive cases dataset, this dataset covers 1 October 2020 to 4 November 2022 (see Figure 6) and contains 765 data points. As the number of daily positive COVID-19 cases reported rises, so does the number of deaths, which peaks at around 600. Following the same approach as for the daily new positive cases dataset, the dataset was divided into a training sample and a testing sample. The training sample contains 612 observations (approximately 80%) from 1 October 2020 to 4 June 2022, and the test sample contains approximately 153 observations (20%) from 5 June 2022 to 4 November 2022 and is used to evaluate the prediction performance of the proposed model.
As shown in Table 7, the performance of the proposed models on the daily new death cases dataset is first characterized by statistical measurements, namely MSE, MAPE, RMSE, and MAE. The results for the training data in this table show that the proposed model produces the smallest MSE and MAE values, 19.6422 and 1.03218, respectively, when compared to ARIMA, SVM, LSSVM, and ARIMA-SVM. The same pattern can be seen in the test data, where all the statistical measures have the lowest values when compared to the ARIMA, SVM, LSSVM, and ARIMA-SVM models. The study then examines the estimated values of the proposed model for the daily death cases dataset, as shown in Figure 7a-e. This graph makes it clear that the proposed model line and the observed data are nearly identical. Additionally, Figure 8a-e shows the estimated values for the test sample for the ARIMA, SVM, LSSVM, and ARIMA-SVM models and the proposed ARIMA-LSSVM model. Once more, when compared to the ARIMA, SVM, LSSVM, and ARIMA-SVM models, the test-sample line for the proposed model (Figure 8e) lies closer to the actual data. This demonstrates that the outcomes of the proposed model are in line with prior findings and are more effective, accurate, and precise than those of the ARIMA, SVM, LSSVM, and ARIMA-SVM models. The number of daily COVID-19 death cases is also plotted in Figure 9. Based on this figure, the daily new death cases of COVID-19 in Malaysia are anticipated to decline over the course of the following three weeks, indicating a downward trend.
As shown in Table 8, a method similar to that used for the daily new positive COVID-19 cases dataset is used to investigate the performance of the proposed model on the daily new death cases dataset. The results (Tables 7 and 8 and Figures 7-9) clearly show that the proposed model outperforms the ARIMA, SVM, LSSVM, and ARIMA-SVM models in terms of efficiency and accuracy.

New Recovered Cases Data Forecasts
The investigation of the performance of the proposed model continues with the dataset of daily new recovered cases of COVID-19 in Malaysia. Predicting Malaysia's daily new recovered COVID-19 cases is just as important as predicting the previous two series. The data used in this paper include daily observations from 1 October 2020 to 4 November 2022, for a total of 765 data points in the time series (Figure 10). The number of patients recovered from COVID-19 exhibits the same trend, with two significant increases. Beginning in July 2021, the number of recovered patients increases exponentially until it reaches over 22,500 in August 2021 and then drops. However, around March-April 2022, the number of recovered COVID-19 cases increased again to a maximum of 33,872, then decreased and remained relatively stable after that. This dataset is also divided into two samples, namely the training dataset and the test dataset. The training dataset, which included 612 observations (80%) from 1 October 2020 to 4 June 2022, was used in the same way as for the previous datasets to formulate the model.
In contrast, the test sample uses 153 observations (20%) for the period 5 June 2022 to 4 November 2022. Table 9 displays the performance of the proposed model on the daily new recovered COVID-19 cases dataset based on the training and testing samples. Figure 11a-e shows the estimated values of the dataset for daily new recovered COVID-19 cases for the test sample. Once more, this figure demonstrates how closely the predicted values from the proposed model match the actual values. Figure 12a-e presents an additional analysis of the outcomes of the proposed model. These plots show the predicted values for the test sample derived from the ARIMA, SVM, LSSVM, ARIMA-SVM, and ARIMA-LSSVM models. Among these models, the proposed model lies closest to the true values and, as Figure 11e shows, dominates the others. The number of daily new recovered COVID-19 cases is plotted in Figure 13. This figure makes clear that the proposed model preserves the sharp movements of the original data. The daily new recovered COVID-19 cases for Malaysia are predicted from this figure for the upcoming three weeks, and the prediction suggests that these cases will rise in the coming days.
As shown in Table 10, further analysis was carried out to determine how well the proposed model performed on the daily new recovered COVID-19 cases dataset. The results reported in parentheses are those of the ARIMA, SVM, LSSVM, and ARIMA-SVM models. Based on these findings, the proposed model produces results that are more accurate and effective than those produced by the ARIMA, SVM, LSSVM, and ARIMA-SVM models.
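One common way to express how much a proposed model improves on a baseline is the percentage reduction in an error measure. The sketch below assumes the usual definition, 100 × (baseline error − proposed error) / baseline error. The exact formula used in the study is not given in this section, so this definition, the function name `percent_error_reduction`, and the example numbers are all assumptions made for illustration.

```python
def percent_error_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed model lowers an error measure
    (MSE, RMSE, MAE or MAPE) relative to a baseline model.

    Assumed definition: 100 * (baseline - proposed) / baseline.
    A positive result means the proposed model has the smaller error.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical error values, not taken from the study's tables.
reduction = percent_error_reduction(baseline_error=50.0, proposed_error=40.0)
print(f"{reduction:.1f}% lower error than the baseline")
```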

Conclusions
In conclusion, predicting the spread of COVID-19 accurately and efficiently is essential but frequently challenging for decision-makers, especially front-line workers and health care authorities. Despite what might seem to be an endless spread of COVID-19, there have been numerous efforts to develop time-series models, and research to enhance the efficacy of forecasting models is ongoing. Hybrid approaches that divide a time series into linear and non-linear components are among the most popular of these models. In this study, a hybrid model that combines linear and non-linear predictions is proposed. Experiments on three real COVID-19 datasets (daily new positive cases, daily new death cases, and daily new recovered cases) showed that the proposed models had the highest efficiency, accuracy, and precision. Compared with the ARIMA, SVM, LSSVM, and ARIMA-SVM models, the proposed model, with a cross-validation check based on MSE, RMSE, MAE, and MAPE, makes the most accurate predictions. For both the training and testing datasets, the proposed models yield the smallest values of MSE, RMSE, MAE, and MAPE. This indicates that the predicted values of the proposed model are more closely aligned with the observed values. Therefore, the proposed models have a higher level of precision and can be suggested for COVID-19 forecasting. It can be concluded that the proposed model may be the most efficient and effective way to improve prediction accuracy, which matters because anticipating and stopping the spread of COVID-19 cases is important.
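The idea of combining a linear and a non-linear component can be illustrated with the dependency-free sketch below. It is not the authors' implementation. As stated assumptions, a least-squares AR(1) fit stands in for ARIMA, a least-squares SVM with an RBF kernel is fitted to the lag-1 residuals of the linear stage, and the one-step forecast is the sum of the two parts. All function names, the kernel width `sigma`, and the regularisation constant `gamma` are hypothetical choices.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1] (stand-in for ARIMA)."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    phi = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    return mz - phi * mx, phi


def rbf(u, v, sigma):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))


def fit_lssvm(x, y, gamma=10.0, sigma=1.0):
    """LSSVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(x)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(x[i], x[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def predict_lssvm(q, x, b, alpha, sigma=1.0):
    """Evaluate the fitted LSSVM at a new input q."""
    return b + sum(ai * rbf(q, xi, sigma) for ai, xi in zip(alpha, x))


def hybrid_forecast(y, gamma=10.0, sigma=1.0):
    """One-step-ahead forecast: linear AR(1) part plus LSSVM residual part."""
    c, phi = fit_ar1(y)
    resid = [obs - (c + phi * prev) for prev, obs in zip(y[:-1], y[1:])]
    b, alpha = fit_lssvm(resid[:-1], resid[1:], gamma, sigma)  # resid[t] from resid[t-1]
    linear_next = c + phi * y[-1]
    resid_next = predict_lssvm(resid[-1], resid[:-1], b, alpha, sigma)
    return linear_next + resid_next


# Hypothetical toy series (not COVID-19 data): trend plus a smooth oscillation.
toy = [10 + 0.5 * t + 3 * math.sin(t / 2) for t in range(30)]
print(hybrid_forecast(toy))
```

Training an LSSVM reduces to solving a single linear system, unlike the quadratic programme behind a standard SVM, so a small Gaussian elimination routine is enough here. In practice a full ARIMA fit and tuned `gamma` and `sigma` would replace these stand-ins.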

Limitations and Future Recommendation
In this study, an attempt was made to predict the numbers of confirmed cases, fatalities, and recoveries of COVID-19 in Malaysia. In future work, investigating SVM performance with various kernel functions and tuning the hyperparameters of the SVM forecasting model could increase forecast accuracy. Since only one-step-ahead forecasting is considered in this paper, multi-step forecasts can be the focus of subsequent work. It has been demonstrated that multi-step forecasts can greatly increase the realism of a trading system [41,42]. Additionally, to improve the efficiency and accuracy of the model's predictions, bootstrap and double bootstrap methods [16,43,44] can be considered in the hybridization of ARIMA and SVM. Few researchers have used the bootstrap in forecasting daily COVID-19 cases, although it is a reliable method: numerous studies have demonstrated that bootstrap resampling yields a more precise estimate [45]. Future studies should also consider (i) clinical and behavioural aspects such as actions, cognition, and emotions and (ii) the possibility of underreporting of cases and deaths, as well as delays in notifications, in order to avoid biased predictions, forecasts, and results.