Time-Series Analysis for the Number of Foot and Mouth Disease Outbreak Episodes in Cattle Farms in Thailand Using Data from 2010–2020

Thailand is one of the countries where foot and mouth disease outbreaks have resulted in considerable economic losses. Forecasting is an important warning technique that can allow authorities to establish an FMD surveillance and control program. This study aimed to model and forecast the monthly number of FMD outbreak episodes (n-FMD episodes) in Thailand using the time-series methods, including seasonal autoregressive integrated moving average (SARIMA), error trend seasonality (ETS), neural network autoregression (NNAR), and Trigonometric Exponential smoothing state–space model with Box–Cox transformation, ARMA errors, Trend and Seasonal components (TBATS), and hybrid methods. These methods were applied to monthly n-FMD episodes (n = 1209) from January 2010 to December 2020. Results showed that the n-FMD episodes had a stable trend from 2010 to 2020, but they appeared to increase from 2014 to 2020. The outbreak episodes followed a seasonal pattern, with a predominant peak occurring from September to November annually. The single-technique methods yielded the best-fitting time-series models, including SARIMA(1,0,1)(0,1,1)12, NNAR(3,1,2)12, ETS(A,N,A), and TBATS(1,{0,0},0.8,{<12,5>}. Moreover, SARIMA-NNAR and NNAR-TBATS were the hybrid models that performed the best on the validation datasets. The models that incorporate seasonality and a non-linear trend performed better than others. The forecasts highlighted the rising trend of n-FMD episodes in Thailand, which shares borders with several FMD endemic countries in which cross-border trading of cattle is found common. Thus, control strategies and effective measures to prevent FMD outbreaks should be strengthened not only in Thailand but also in neighboring countries.


Introduction
The World Organization for Animal Health (OIE) lists foot and mouth disease (FMD) as an important transboundary disease [1]. The disease is caused by the FMD virus (FMDV FMD outbreaks in one country can support other countries in developing an effective control program to prevent FMD from spreading across borders. Thus, given Thailand's proximity to other FMD-endemic countries, the findings of this study, which are based on 10 years of retrospective data, will be useful to livestock authorities and stakeholders in those boundary countries. The objectives of this study were to examine FMD epidemic episodes (n-FMD episodes) in Thailand over a 10-year period for the presence of trend and seasonality using time-series methods and to compare the prediction performances of time-series forecasting models for prospective estimation of FMD outbreak occurrences.

Outbreak Episode Definition, Data, and Data Analysis Steps
An outbreak episode of FMD was defined as an official report of an FMD outbreak in an outbreak area where disease investigation and FMD disease confirmation were undertaken by an authority from the Department of Livestock and Development (DLD) [19]. Notably, the episode of FMD outbreak defined as the unit of analysis in this study is similar to the unit utilized in several previous studies [7,9,41].
Data of n-FMD episodes in Thailand from 2010 to 2020 (n = 1209 episodes) were obtained from the DLD. Time-series data were generated from the original data. Further, the data was divided into two parts: data from 2010 to 2019 (Data: 2010-2019; training data) and data from 2020 (Data: 2020; validation data), with the former being used for model development and the latter for model validation, respectively. Data analysis and time-series modeling were performed using R statistical software [42] and "xts", "tsbox", "plotly", "TSstudio", "forecast", and "forecast Hybrid" packages. The step of time-series data analysis and modeling includes (i) decomposition of time-series data into several components, (ii) determination of the final model through the model development process, (iii) evaluation of the performances of the final model developed, and (iv) forecasting the n-FMD episodes from the final model. These steps are depicted in Figure 1.

Decomposition of Time-Series Data
An additive decomposition of the FMD time series was carried out to describe the trends and seasonality components using R statistical software version 4.0.4 [42]. At this step, time-series data were decomposed into trend, seasonal, and residual components. The model is given as follows: where Y t is the n-FMD episode at time t, T t is the trend-cycle component at time t, S t is the seasonal component at time t, and I t is the irregular component at time t.

Model Development
The SARIMA is composed of seasonal and non-seasonal components. The general SARIMA model [29] has the following form: where ϕ(X) = non-seasonal autoregressive operator of order p, θ(X) = non-seasonal moving average operator of order q, Φ X 12 = the autoregressive seasonal operator of order P, Θ(X) = the moving averages operator of order Q, ∆ d = the operator difference, ∆ D 12 = operator seasonal difference, and a t = white noise.

SARIMA Model
The SARIMA is composed of seasonal and non-seasonal components. The general SARIMA model [29] has the following form: where ( ) = non-seasonal autoregressive operator of order , ( ) = non-seasonal moving average operator of order , ( ) = the autoregressive seasonal operator of order , ( ) = the moving averages operator of order , Δ = the operator difference, Δ = operator seasonal difference, and = white noise. The expression of SARIMA is given as The expression of SARIMA is given as where p = autoregressive order in non-seasonality, d = difference in non-seasonality, q = the non-seasonal moving average order, P = autoregressive order in seasonality, D = differences in seasonality, Q = moving average order in seasonality, and s = length of the seasonal pattern (s = 12 in this study). The development of SARIMA models was carried out using auto.arima( ) function from a "forecast" package. This function determined the p, d, q, P, D, Q for several candidate models. Akaike information criterion (AIC) was used as criteria to verify the model that best fit the data.
The auto.arima function utilized the Hyndman-Khandarkar algorithm for automatic autoregressive integrated moving average (ARMA) modeling. As previously described, this algorithm performed various steps in the model selection procedure [29,43]. Based on such procedures, the final model was defined as the model with the lowest AIC. Additionally, the model assumptions were checked by examining the standardized residual plot, autocorrelation function plot of residuals, and the p-values plot from the Ljung-Box test.

NNAR Model
The NNAR used lagged values of the time series as input to a neural network. A network of three layers of functioning is linked by acyclic linkages. The equation of the NNAR is as follows [44]: where y t and y t−i,..., y t−p are the output and the input, ω i,j (i = 0, 1, 2, ..., P, j = 1, 2, ..., Q) and ω j (j = 0, 1, 2, ..., Q) are model parameters, which are known as connection weights; the number of input nodes is represented by P while the number of hidden nodes is indicated by Q.
For seasonal data, the NNAR model [29] can be written as where p = the last observed values from the same season using as inputs, P = lagged inputs, K = number of neurons (nodes) in the hidden layer, and m = the number of months. A function nnetar from "forecast" package was used to fit NN AR(p, P, K) m model. For seasonal time series, P = 1 was set as default value and p was chosen from the optimal linear model fitted to the seasonally adjusted data. We used the default setting, and thus K = (p + P + 1)/2 was set using the nnetar( ) function [29].

ETS Model
Regarding a state-space framework, the ETS combines error (E), trend (T), and seasonal (S) components in a smoothing calculation. The ETS offers a total of 30 possible ETS combinations; thus, it has the ability to analyze a variety type of time-series data, even with both heterogeneity and non-linearity. Within a state-space framework, the E component is either additive (A) or multiplicative (M), T and S components may be A, M, or inexistent (N), and T is referred to as dampened additively (A d ) or multiplicatively (M d ) [45]. The state-space equations can be written as [46] where w, f , and g are coefficient while ε t represents the Gaussian white noise series. Equation (6) is known as the observation that describes the relationship between the observation x t−1 and y t . Equation (7) is the transitional equation describing the evolution of states over time. The ETS model was performed using the ets( ) function from the "forecast" package.

TBATS Model
Multiple seasonal incorrect cycles can be accommodated by TBATS. A combination of Fourier with an exponential smoothing state-space model and a Box-Cox transformation is represented by the TBATS models in this study. The basic equation of the TBATS took the following form [30,47]: indicates the Box-Cox transformation parameter (ω) and y t is the observation at time t, l t is the local level, φ denotes the damped trend, b denotes the long-run trend, T indicates the seasonal pattern, s (i) t i denotes the ith seasonal component, m i indicates the seasonal periods, and d t is an ARMA(p, q) process for residuals. The TBATS model was identified by tbats( ) function included in the "forecast" package.

Hybrid Model
Time-series hybrid models including SARIMA-NNAR, SARIMA-ETS, SARIMA-TBATS, NNAR-ETS, NNAR-TBATS and ETS-TBATS were developed using functions from the "forecast Hybrid" package and R. The functions offer the automated procedure to obtain the best fitting model. The detail of model development from the functions was previously described [8,48].

Forecast and Model Performances
Time-series models developed from Data: 2010-2019 were used to predict n-FMD episodes in 2020. By comparing forecast and actual n-FMD episode values, the forecasting ability was evaluated.
We used several evaluation error metrics, including mean absolute error (MAE), root mean squared error (RMSE), and mean absolute scaled error (MASE) [29,49], to determine prediction performances among models developed from the full and training datasets [29]. It was important to note that a mean absolute percentage error (MAPE), which was generally reported, was unable to be determined as our data contained zero counts for some months of time-series data. Therefore, the scaled errors such as MASE were proposed as an alternative to using MAPE. Similar to MAPE, the MASE is independent of the scale of the data [29]. The MASE is considered the most versatile and reliable measure of forecast accuracy and can be used to compare forecast accuracy both on single and multiple time series [30].
The MAE and RMSE are expressed as follows: where y t denotes the actual values,ŷ t represent the predicted values, and n indicates the number of observations. It is generally accepted that the lower the error metrics, the better the method [30,49]. All error metric values were calculated for models developed from Data: 2010-2019 to assess the accuracy of such models for in-sample data (training data), whereas these values were also determined to assess how the models developed from the training data performed on out-of-sample data (validation data; Data: 2020).
In addition, we also forecasted the n-FMD episodes for the period of 2021-2023 based on 2010-2020 FMD data.

Trends and Seasonality
From 2013 to 2019, there was a seasonal increase in the number of n-FMD episodes.
The year with the most episodes was 2016 (Figure 2). There was a downward trend in n-FMD episodes from mid-2016 to mid-2017. From late 2018 to mid-2019, the n-FMD episodes appeared to be stable. Following that, there was an upward trend in the number of n-FMD episodes.

Trends and Seasonality
From 2013 to 2019, there was a seasonal increase in the number of n-FMD episodes.
The year with the most episodes was 2016 (Figure 2). There was a downward trend in n-FMD episodes from mid-2016 to mid-2017. From late 2018 to mid-2019, the n-FMD episodes appeared to be stable. Following that, there was an upward trend in the number of n-FMD episodes. The seasonal pattern demonstrated that the peak of n-FMD episodes was mainly observed from September to December annually.

Fitted Time-Series Models, Model Performances, and Forecasts
For the entire dataset, the program generated and tested 192 potential models, (1,0,1)(0,1,1) was the best-fitting model with the lowest AIC. Assumptions for model residuals were tested before this model could be used to forecast (Figure 3). The ACF of residuals showed no significant deviation from a zero-mean white noise process, and all p values for the Ljung-Box statistic were large (p > 0.05), indicating no violations. The best forecasting model for NNAR was found to be (3,1,2) , whereas the fitted model for ETS was ( , , ). The fitted model for TBATS was (1, {0,0},0.8, {< 12,5 >}). The seasonal pattern demonstrated that the peak of n-FMD episodes was mainly observed from September to December annually.

Fitted Time-Series Models, Model Performances, and Forecasts
For the entire dataset, the program generated and tested 192 potential models, SARI MA (1, 0, 1)(0, 1, 1) 12 was the best-fitting model with the lowest AIC. Assumptions for model residuals were tested before this model could be used to forecast (Figure 3). The ACF of residuals showed no significant deviation from a zero-mean white noise process, and all p values for the Ljung-Box statistic were large (p > 0.05), indicating no violations. The best forecasting model for NNAR was found to be NN AR(3, 1, 2) 12 , whereas the fitted model for ETS was ETS(A, N, A). The fitted model for TBATS was TBATS(1, {0, 0}, 0.8, {< 12, 5 >}). Figure 3 shows the actual and fitted values for each final model derived from the training data (Data: 2010-2019). Furthermore, actual and forecast values from the final models tested with validation data (Data: 2020) were illustrated (Figure 4). The projection appearance of actual and fitted values among models appears to be similar graphically (Figure 3), whereas actual and forecast values of the n-FMD episodes appear to differ among models (Figure 4). Additionally, the forecast values of n-FMD episode from each model are presented in Supplementary Materials Table S2.
In terms of RMSE, MAE, and MASE (Table 1), it was found that using NNAR for training data produced results that outperformed those obtained from other single-technique methods. On the other hand, the lowest values of the RMSE, MAE, and MASE from testing data demonstrate that SARIMA is more accurate than other competing models in terms of accuracy. In hybrid models, the SARIMA-NNAR and NNAR-TBATS models outperformed the others.

Discussion
In the present study, we applied the time-series method to FMD outbreak data to determine epidemiological features and explore the predictivity in forecasting the n-FMD episodes through various time-series models. Based on the most recent outbreak data used in this study, forecasts of n-FMD episodes for the following three years were also projected.

Discussion
In the present study, we applied the time-series method to FMD outbreak data to determine epidemiological features and explore the predictivity in forecasting the n-FMD episodes through various time-series models. Based on the most recent outbreak data used in this study, forecasts of n-FMD episodes for the following three years were also projected.
From 2014 to 2019, there was an increasing trend of n-FMD episodes in Thailand, which was in agreement with a previous report [16]. The increasing trend could be attributed to the presence of more than one serotype of FMD virus causing outbreaks. In general, FMD virus serotype O was responsible for the majority of the FMD outbreaks in Thailand each year [16,17]. However, FMD virus serotype A has become more prevalent since 2013. During the years 2013-2015, both serotypes A and O were responsible for a number of FMD outbreaks in different parts of the country [16]. It has been discussed that the presence of two serotypes causing outbreaks nationwide might pose a difficulty in preventing and controlling the disease [17,50]. The variation in the genetics of FMD virus serotype A is a major concern because the ratio of antibody titer against the heterologous field strain and antibody titer against the homologous vaccine strain or r-value [51] is not high, indicating that a moderate-level matching is observed between FMD vaccine strains and FMD outbreak strains, especially in 2016, 2017, and 2019 [17]. Given that, it is likely that vaccines used in some areas could not adequately provide disease protection [17]. Another explanation for the increase in n-FMD episodes is the enhancement of surveillance programs by DLD, which encourages farmers to report FMD outbreaks to DLD officers [50].
Our findings revealed a seasonal variation, indicating that the FMD outbreak in Thailand occurred on a regular basis. A predominant peak of reported FMD outbreaks was observed between October to December annually. This finding could be due to the possibility that climatic influences cause stress in cattle as they transition from the rainy season to the winter season, and thus, they are more susceptible to the disease [50]. Moreover, the high degree of trade activity throughout the last 3 months of the year might potentially have a role in the spreading of the disease across locations [50]. The finding of the seasonality of n-FMD episodes in this study was in agreement with several studies conducted in Thailand [52,53], Vietnam [9], and Colombia [41] but different from a report in Ethiopia that showed no seasonal trend for FMD outbreaks [7].
There is a lot of ambiguity and nonlinearity in FMD epidemic data. Unsatisfactory results may be obtained by predictive models that fail to account for seasonal fluctuation and nonlinearity [54]. Thus, SARIMA, ETS, and TBATS methods that account for seasonal variation, the nonlinear NNAR approach that incorporates seasonality and nonlinearity in its algorithm, and hybrid models that combine some of the advantages of these methods were utilized. According to our findings, NNAR outperformed other single-technique methods. Better performance of NNAR may be due to its ability to take account of the nonlinear characteristics of n-FMD episodes exhibited during some parts of the study period. However, with validation data, the SARIMA model outperformed other singletechnique methods in terms of validity. This could be because SARIMA appears to be an effective model for short-term forecasting [26,27]. However, a model's performance changes depending on the data it utilizes. Thus, we propose examining both SARIMA and NNAR predictions together for predicting future n-FMD episodes because they each have their own set of advantages. Furthermore, our findings demonstrated that the prediction ability from the single-technique methods and hybrid methods was likely to be similar. Nevertheless, it should not be extrapolated that hybrid models will not improve prediction ability for other FMD outbreak data or other data types because several studies from various fields have shown that hybrid models outperform non-hybrid models [28,33,44,55]. As addressed above, the performance of time-series methods is varied depending on characteristics and nature of the data. Thus, we suggest building multiple models from the same dataset to get better predictive insights.
Furthermore, it is generally recommended that the time-series method should be used on an updated dataset [40]. Hence, we recommend developing a time-series model to forecast n-FMD episodes on a regular basis, such as every six months or every year, for continuous projections and better prediction.
This study has some limitations to be considered. First, the FMD dataset is likely to be subjected to underreporting bias, as also indicated by other studies conducted in Thailand [56] and elsewhere [57]. Second, we did not determine temporary patterns for n-FMD episodes at provincial or regional levels due to the fact that the majority of the monthly FMD episodes data would contain zero values, which may not be appropriate for SARIMA and NNAR. For follow-up studies, other time-series methods should be investigated. In addition, FMD outbreak data from several countries in the region should be analyzed.
There is evidence that the same FMDV strains (e.g., O/EA/Mya-98, O/Middle East-South Asia, and O/ME-SA/Ind-2001) circulate in different Southeast Asian countries [8], suggesting that regional cooperation is required to control disease spread across borders. The lack of effective outbreak alerts and FMD outbreak data sharing among Southeast Asian endemic countries is one of the most difficult challenges for FMD prevention [58]. The analysis of retrospective time-series data and the prediction of future FMD outbreak occurrences in Thailand in this study would provide important information and warning messages to FMD endemic neighboring countries. The prediction of future n-FMD episodes would support the existing livestock surveillance system [16]. Livestock authorities can use the prediction from time-series models to develop an additional strategic plan for relevant control strategies for future FMD outbreaks, such as an implementation of rigorous animal movement control during the last three months of the year. Furthermore, in the Southeast Asia region, strict quarantine procedures should be implemented when cattle are transported across international borders. We also recommended greater regional cooperation in enhancing FMD prevention and controls.

Conclusions
To our knowledge, this is the first study to use the SARIMA, ETS, NNAR, TBATS, and hybrid models for the n-FMD episodes in Thailand to describe trends and seasonality as well as forecast the future n-FMD episodes. This study provides a better understanding of FMD outbreaks in Thailand in terms of trend, seasonality, and forecast based on 10-year outbreak data. The approaches demonstrated in this study could be used by livestock authorities to forecast FMD outbreak episodes in the short and long term and thus design or improve the FMD control strategies to prevent future outbreaks. The results of the study were not limited to Thailand; neighboring countries could use our findings and predictions as basic information to develop a regional strategy to mitigate FMD outbreaks in the region.

Supplementary Materials:
The following supporting information can be downloaded at: https://