The Deep Learning LSTM and MTD Models Best Predict Acute Respiratory Infection among Under-Five-Year Old Children in Somaliland

Among the most effective techniques for predicting time series patterns are machine learning and classical time series methods. The aim of this study is to find the artificial intelligence and classical forecasting techniques that best predict the spread of acute respiratory infection (ARI) and pneumonia among under-five-year-old children in Somaliland. The techniques used in the study are the seasonal autoregressive integrated moving average (SARIMA), mixture transition distribution (MTD), and long short-term memory (LSTM) deep learning models. The data were monthly observations collected from five regions of Somaliland from 2011 to 2014. Predictions from the three best competing models are compared using the root mean square error (RMSE) and mean absolute deviation (MAD) accuracy measures. The results show that the deep learning LSTM and MTD models slightly outperformed the classical SARIMA model in predicting ARI values.


Introduction
Acute respiratory infections (ARI) are the most common causes of both illness and mortality in children under five, regardless of where they live or what their economic situation is [1,2]. ARIs, particularly lower respiratory tract infections (LRTI), are a major cause of death among children under five years of age. It is estimated that between 1.9 and 2.2 million children die annually because of these infections. Other estimates indicate that some 10.8 million children die each year [3]. Studies conducted by [4] suggested that more than 95% of all cases of clinical pneumonia in young children worldwide occur in developing countries. Ref. [5] estimated that 4 million child deaths occur due to pneumonia each year, including 2.6 million infants and 1.4 million children aged 5-14 years. Although forty-two percent of these ARI-related deaths occur in Africa, the epidemiology and pathogenesis of LRTI remain understudied [6], and accurate information on specific diseases in African children dying from respiratory illnesses is scarce [7]. Refs. [8][9][10] indicated that the incidence of ARI is closely associated with the nutritional status of the child, the socio-economic level of the family, and family size. Studies conducted by [11,12] have shown that children under one year old are more likely to have ARI than older children. Some studies have also associated environmental and sociodemographic factors with ARI, including the age of the mother, family size, and unhealthy environments [13][14][15]. Similarly, studies conducted in India showed a higher prevalence in urban areas [16]. Other studies demonstrated that children born to mothers younger than 20 years old had a higher incidence of ARI than those born to mothers older than 20 years [17]. Results from other studies have also shown an increased incidence of ARI among children from homes that use unclean cooking fuel such as charcoal and firewood [18]. Ref. [19] suggests that improving nutrition and parental literacy may contribute to lowering the incidence of acute lower respiratory infections. Ref. [20] noted that other factors contributing to the incidence and severity of lower respiratory infections in developing countries include household crowding, high birth rate, vitamin A deficiency, and population.
The World Health Organization (WHO) estimates under-five mortality in Somalia at 200 deaths per 1000 births, one of the highest rates in the world. Approximately one third of these are neonatal deaths, occurring during the first month of life. Pneumonia and diarrhea are the main killers, each contributing 20-25 percent of all under-five deaths [21]. Because different regions of the world may not experience outbreaks of communicable diseases at the same time, it is difficult to make joint worldwide predictions of their seasonal outbreaks. Nevertheless, many studies have predicted the spread of these diseases using classical statistical and deep learning techniques. Deep learning models not only play a prominent role in disease prediction but are also used to analyze and segment images [22], detect tumors and lesions in medical images [23,24], and support computer-aided diagnosis [25,26].
Previous studies that have used classical time series and deep learning techniques to predict ARI include [27][28][29][30][31][32][33].

Data
The data used in this study were collected from five regions of Somaliland from 2011 to 2014. During this period, 246,349 cases of acute respiratory infection and 61,599 cases of pneumonia were reported. Ethical approval was provided by the institutional ethics review board of Somaliland's Ministry of Health (degree # 2020-7-027). The requirement for informed consent was waived because the data contained only historical time series of the number of patients admitted to the hospitals. Statistical and deep learning techniques are used to understand the trends of these diseases; specifically, SARIMA, MTD, and LSTM models are employed to assess and predict their spread.
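The reported totals can be cross-checked with a few lines of arithmetic. The sketch below uses only the two case counts given in this section and assumes 48 monthly observations (four years of 12 months); the variable names are illustrative.

```python
# Reported case totals for 2011-2014
ARI_CASES = 246_349
PNEUMONIA_CASES = 61_599
N_MONTHS = 48  # assumption: 4 years x 12 monthly observations

total_cases = ARI_CASES + PNEUMONIA_CASES
ari_share = ARI_CASES / total_cases

# Average number of cases per month over the study period
ari_monthly_mean = ARI_CASES / N_MONTHS
pneumonia_monthly_mean = PNEUMONIA_CASES / N_MONTHS

print(f"combined cases: {total_cases:,}")
print(f"ARI share: {ari_share:.1%}")
print(f"mean monthly ARI: {ari_monthly_mean:.1f}, pneumonia: {pneumonia_monthly_mean:.1f}")
```

This gives 307,948 combined cases, an ARI share of 80.0%, and average monthly counts of about 5132.3 ARI and 1283.3 pneumonia cases, which lie inside the 95% intervals for the monthly means reported in Table 2.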

Long Short Term Memory (LSTM) Model
Artificial neural networks are designed to process information in a way loosely modeled on human neurons, and many variants of these algorithms are currently used in machine learning. Recurrent neural networks (RNNs) are a special class of neural networks designed to model patterns in sequential data, capturing the dependencies in data collected over time, such as sensor readings or stock market prices. One limitation of these models is the vanishing or exploding gradient problem, which hampers their convergence during training [34,35].
Ref. [35] introduced long short-term memory (LSTM), a type of recurrent neural network that mitigates the vanishing/exploding gradient problem. LSTM models can learn both short- and long-term dependencies. For some recent studies on LSTM and ARIMA, see [36][37][38][39][40]. An LSTM unit is made up of a memory cell, an input gate, an output gate, and a forget gate. The memory cell stores information about the past filtration, while the gates control how much of this memory is passed to the next stage given the new information. Unlike a feedforward neural network, an LSTM has a recurrent loop that feeds its state back at every time step, which allows it to use both the current input and what it has learned previously. Figure 1 displays the LSTM architecture.
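For reference, the widely used LSTM formulation with a forget gate can be written as follows. The symbols $x_t$ (input at time $t$), $h_t$ (hidden state), $\tilde{C}_t$ (candidate cell state), and $\odot$ (element-wise product) are notation assumed for this sketch, and the labels in Figure 1 may differ.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```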
where $C_{t-1}$ is the memory cell carrying the past filtration; $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate, respectively; $\sigma$ is the logistic sigmoid function and Tanh is the hyperbolic tangent activation function; and $W$ and $b$ denote the weight matrices and bias vectors, respectively.
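The gate mechanics can also be traced in code. The sketch below is a minimal, dependency-free forward step of a single LSTM cell in plain Python that follows the standard gate equations; it is not the authors' implementation. The parameter layout (one weight matrix and one bias vector per gate, acting on the concatenation $[h_{t-1}, x_t]$) and the function names are assumptions for illustration. A hidden size of 3 is used because that is the hidden dimension reported for the best LSTM model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Vector]  # (W, b) for one gate


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Return W @ v + b using plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One forward step of an LSTM cell.

    params has keys "f", "i", "c", "o" (forget, input, candidate, output);
    each W has shape (hidden, hidden + input) and acts on [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]          # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]          # input gate
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate cell
    o = [sigmoid(a) for a in affine(*params["o"], z)]          # output gate
    # New cell state: keep part of the old memory and add part of the candidate
    c_t = [fk * ck + ik * ct for fk, ck, ik, ct in zip(f, c_prev, i, c_tilde)]
    # New hidden state: output gate applied to the squashed cell state
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Usage with all-zero (untrained) weights: hidden size 3, one input feature
hidden, n_in = 3, 1
zero_params = {g: ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
               for g in "fico"}
h, c = lstm_step([1.0], [0.0] * hidden, [0.2] * hidden, zero_params)
```

With zero weights every gate outputs 0.5 and the candidate is 0, so each cell value becomes 0.5 × 0.2 = 0.1 and each hidden value 0.5 · tanh(0.1). In a trained model the weights are learned, for example with the Adam optimizer used in this study.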

Mixture Transition Distribution (MTD) Model
Another promising technique for analyzing and predicting time series data is the MTD model, introduced by [41] in 1985 for modeling high-order Markov chains with a finite state space. It has since been applied successfully to a wide range of research problems. Ref. [42] proposed a univariate MTD with Gaussian components. For the bivariate case, Ref. [43] investigated a class of bivariate mixture transition distribution (BMTD) models whose mixtures have components from different probability distributions. For non-Gaussian data, Ref. [44] considered MTD models for high-order Markov chains and non-Gaussian time series. For more recent references on MTD, see, for example, [45][46][47].
In this study, a univariate version of the MTD model with Gaussian components is used. The model has the following form:

$$ f(y_t \mid y_{t-1}; \phi) = \sum_{j=1}^{p} \alpha_j \, \varphi_j(y_t \mid y_{t-1}), $$

where $\varphi_j$ is the probability density function of the $j$th component of the mixture, $j = 1, 2, \ldots, p$; $y_{t-1}$ denotes the past filtration; $\alpha_j$ is the weight of the $j$th component; and $\phi$ is the set of model parameters.
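To make the mixture concrete, the sketch below evaluates a Gaussian MTD conditional density and its one-step-ahead conditional mean in plain Python. For illustration only, it assumes that component $j$ is a normal density whose mean is a linear function of the lag $y_{t-j}$ (intercept $\mu_j$, hypothetical slope $\theta_j$) with standard deviation $\sigma_j$, and that the weights $\alpha_j$ sum to one. The function names and parameter values are hypothetical, and the EM estimation used in the study is not shown.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def component_means(history: Sequence[float], mus: Sequence[float],
                    thetas: Sequence[float]) -> List[float]:
    """Mean of component j (1-based): mu_j + theta_j * y_{t-j} (assumed form)."""
    if len(history) < len(mus):
        raise ValueError("history must contain at least p past values")
    return [mu + th * history[-j] for j, (mu, th) in enumerate(zip(mus, thetas), start=1)]


def mtd_density(y_t: float, history: Sequence[float], alphas: Sequence[float],
                mus: Sequence[float], thetas: Sequence[float],
                sigmas: Sequence[float]) -> float:
    """Mixture density: sum_j alpha_j * N(y_t; mean_j, sigma_j^2)."""
    means = component_means(history, mus, thetas)
    return sum(a * normal_pdf(y_t, m, s) for a, m, s in zip(alphas, means, sigmas))


def mtd_forecast(history: Sequence[float], alphas: Sequence[float],
                 mus: Sequence[float], thetas: Sequence[float]) -> float:
    """One-step-ahead conditional mean: sum_j alpha_j * mean_j."""
    means = component_means(history, mus, thetas)
    return sum(a * m for a, m in zip(alphas, means))


# Hypothetical second-order (p = 2) example; history is ordered oldest -> newest
history = [2.0, 4.0]              # y_{t-2} = 2, y_{t-1} = 4
alphas, mus = [0.6, 0.4], [0.0, 1.0]
thetas, sigmas = [0.5, 1.0], [1.0, 2.0]
print(mtd_forecast(history, alphas, mus, thetas))
```

Here the component means are 0 + 0.5 · 4 = 2 and 1 + 1.0 · 2 = 3, so the forecast is 0.6 · 2 + 0.4 · 3 = 2.4. Because the forecast is a weighted average of component means, MTD can capture shifts between regimes that a single linear autoregression would average away.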

Results
After the data were collected, they were randomly split into training and test sets, with 70% used for training and 30% for testing. The MTD, SARIMA, and LSTM models were fitted to the training data and then validated on the test data to compare their performance. The RMSE and MAD accuracy measures are used to choose the best model for predicting the spread of the disease.
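Over the $n$ test observations, the two accuracy measures are $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_t (y_t - \hat{y}_t)^2}$ and $\mathrm{MAD} = \tfrac{1}{n}\sum_t |y_t - \hat{y}_t|$. The plain-Python sketch below computes both. It assumes that MAD means the mean absolute deviation of the forecast errors, the convention in many forecasting packages, since the text does not define it explicitly.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of the forecasts."""
    n = _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mad(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute deviation of the forecast errors."""
    n = _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n
```

For actual values [1, 2, 3, 4] and forecasts [2, 2, 2, 6], the errors are −1, 0, 1, −2, giving RMSE = √1.5 ≈ 1.22 and MAD = 1.0. RMSE weights the single large error more heavily, which is why reporting both measures is informative.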
ARI accounted for 80% of the 307,948 combined cases of the two diseases. To gain insight into the data, a qualitative assessment is presented first: Figures 2 and 3 display the time series trends of ARI and pneumonia over the four years. Both ARI and pneumonia increase over time, but the increase in ARI is relatively larger. Figure 4 shows that the distribution of ARI cases across the calendar months appears to be bimodal, with one peak from November through January and another from March to May. This suggests that the disease is most common from winter through spring in Somaliland. The bar chart in Figure 5 also clearly shows the increase.
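Seasonal peaks like those in Figure 4 can be located by averaging the counts for each calendar month across years. The sketch below does this in plain Python for a consecutive monthly series. The function names and the synthetic example series (with peaks placed in January and April) are hypothetical; the study's monthly ARI counts would take the place of the synthetic series.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def monthly_profile(counts: Sequence[float], start_month: int = 1) -> Dict[int, float]:
    """Average value for each calendar month (1 = Jan, ..., 12 = Dec).

    counts is a consecutive monthly series whose first value falls in start_month.
    """
    by_month: Dict[int, List[float]] = defaultdict(list)
    for k, value in enumerate(counts):
        month = (start_month - 1 + k) % 12 + 1
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


def peak_months(profile: Dict[int, float], top: int = 2) -> List[int]:
    """Calendar months with the highest average values, highest first."""
    return sorted(profile, key=profile.get, reverse=True)[:top]


# Synthetic two-year series with peaks in January and April (not the study data)
series = [100.0] * 24
for idx in (0, 12):
    series[idx] = 200.0  # January
for idx in (3, 15):
    series[idx] = 180.0  # April
profile = monthly_profile(series, start_month=1)
print(peak_months(profile))
```

Two separated maxima in such a profile are what a bimodal seasonal pattern looks like in the monthly averages.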
Quantitative summary statistics of the data are presented in Tables 1 and 2. According to Table 1, over the last three years of the study period, ARI incidence increased by 18%, 19%, and 8%, respectively, whereas pneumonia incidence increased by 31%, 3%, and 10% over the same period. Table 2 displays 95% confidence intervals for the true monthly means of ARI and pneumonia incidences. Pneumonia is also highly correlated with ARI: the Pearson correlation coefficient between the two is 0.866, with a p-value below 0.001. Since ARI and pneumonia are highly correlated and the majority of the cases are acute respiratory infections, we restrict the rest of the study to the ARI trends.

Symmetry 2021, 13, 1156

The original ARI series was not stationary; its sample autocorrelation and partial autocorrelation functions are displayed in Figure 6a,b. The autocorrelation function dies down very slowly and the partial autocorrelation function cuts off after the first lag, a clear indication that the series is non-stationary. A transformation is therefore needed to make the series stationary. The autocorrelation and partial autocorrelation functions of the series after first differencing are presented in Figure 6c,d, respectively; both die down very quickly.
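The slowly decaying sample ACF of a trending series, and the effect of first differencing on it, can be reproduced with a short sketch. The plain-Python code below computes the sample autocorrelations $r_k = \sum_{t=k+1}^{n}(y_t-\bar{y})(y_{t-k}-\bar{y}) / \sum_{t=1}^{n}(y_t-\bar{y})^2$ and the lag-1 difference. It is applied to a synthetic 48-month random walk with drift (a hypothetical stand-in, not the study data), for which the level series typically shows a large, slowly decaying ACF and the differenced series an ACF close to zero.

```python
import random
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return the lag-differenced series y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 = 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


# Synthetic 48-month random walk with drift (not the study data)
rng = random.Random(42)
y = [0.0]
for _ in range(47):
    y.append(y[-1] + 2.0 + rng.gauss(0.0, 1.0))

acf_level = sample_acf(y, 12)
acf_diff = sample_acf(difference(y), 12)
print(round(acf_level[1], 2), round(acf_diff[1], 2))
```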

Table 2. 95% confidence intervals for the true monthly means of ARI and pneumonia incidences.
Disease      95% CI for the monthly mean
ARI          4757–5507
Pneumonia    1205–1362

The Dickey-Fuller test statistic for stationarity was −3.9, with a p-value of 0.022. Together with the ACF and PACF results, this indicates that the time series is stationary after first differencing.

Machine learning algorithms and Box-Jenkins forecasting methods are employed to predict the spread of the disease. The three competing models were chosen from the deep learning long short-term memory (LSTM), EM-algorithm-based mixture transition distribution (MTD), and seasonal autoregressive integrated moving average (SARIMA) techniques. The data were fitted to each of these models, using 70% of the data for training and the remaining 30% for testing, and their results were compared to understand the patterns of the disease. The root mean square error (RMSE) and mean absolute deviation (MAD) accuracy measures are used to choose the best model for predicting the future spread of the disease. The best LSTM model identified by both accuracy measures has a learning rate of 0.05 (5%), hidden dimension = 3, activation function = Tanh, loss function = MSE, optimizer = Adam, and number of epochs = 300.
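To illustrate what the optimizer setting means, the sketch below applies the standard Adam update with a learning rate of 0.05 for 300 epochs to a single parameter of a toy squared-error loss $(\theta - 3)^2$, in plain Python. The moment decay rates $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$ are the commonly used defaults and are assumed here because the text does not report them; the toy loss is a stand-in, not the LSTM's training loss.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], theta0: float, lr: float = 0.05,
                  epochs: int = 300, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Minimize a one-dimensional function with Adam, given its gradient."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, epochs + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)         # bias-corrected second moment
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy MSE-style loss L(theta) = (theta - 3)^2 with gradient 2 * (theta - 3)
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
print(round(theta_star, 3))
```

Because each step is scaled by the running root-mean-square of past gradients, the learning rate bounds the early step size to roughly 0.05 regardless of the gradient's magnitude, which is one reason Adam is a common default for small networks like the one described here.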
The training error fell sharply, indicating that the model fits the training data well, as shown in Figure 7. The best MTD model is the second-order model with parameter estimates $(\alpha_1, \alpha_2) = (0.1, 0.43)$, $(\mu_1, \mu_2) = (0.93, 2.4)$, and $(\sigma_1, \sigma_2) = (14.8, 1.39)$. Finally, the best SARIMA model is SARIMA$(1, 1, 0)(1, 0, 0)_{12}$. Table 3 shows the estimated parameters of the SARIMA model with their test statistics and p-values; both p-values are less than 0.02. Ljung-Box chi-square statistics for the SARIMA model were also calculated at lags 12 and 24 and are presented in Table 4 to check the adequacy of the fitted model. The RMSE and MAD values in Table 5 show that the three fitted models perform very similarly, with only slight differences. Nevertheless, MTD and LSTM performed better than the well-known and more commonly used SARIMA model.
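For the selected SARIMA$(1,1,0)(1,0,0)_{12}$ model, write $w_t = y_t - y_{t-1}$ for the first difference and $B$ for the backshift operator ($B y_t = y_{t-1}$). The model $(1 - \phi B)(1 - \Phi B^{12})\,w_t = \varepsilon_t$ then expands to $w_t = \phi w_{t-1} + \Phi w_{t-12} - \phi\Phi w_{t-13} + \varepsilon_t$, so the one-step forecast is $\hat{y}_{T+1} = y_T + \phi w_T + \Phi w_{T-11} - \phi\Phi w_{T-12}$. The plain-Python sketch below codes this recursion. It assumes no constant term, and the coefficient values in the example are hypothetical rather than the Table 3 estimates.

```python
from typing import Sequence


def sarima_110_100_12_forecast(y: Sequence[float], phi: float, seasonal_phi: float) -> float:
    """One-step forecast for SARIMA(1,1,0)(1,0,0)_12 without a constant term."""
    if len(y) < 14:
        raise ValueError("need at least 14 observations (13 first differences)")
    # w[k] = y[k+1] - y[k]; with T the last index of y, w[-1] is w_T
    w = [y[t] - y[t - 1] for t in range(1, len(y))]
    # w_{T+1} = phi*w_T + Phi*w_{T-11} - phi*Phi*w_{T-12}
    w_next = phi * w[-1] + seasonal_phi * w[-12] - phi * seasonal_phi * w[-13]
    # Undo the differencing to return to the original scale
    return y[-1] + w_next


# Synthetic series rising by 2 each month; hypothetical coefficients
y = [2.0 * t for t in range(24)]
print(sarima_110_100_12_forecast(y, phi=0.5, seasonal_phi=0.5))
```

With every difference equal to 2, the forecast is 46 + 0.5·2 + 0.5·2 − 0.25·2 = 47.5; setting both coefficients to zero reduces the model to a random walk whose forecast is the last value, 46.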

Discussion
In this study, time series trends of acute respiratory infection and pneumonia among children in Somaliland are investigated. Both qualitative and quantitative assessments of the data sets are performed, and LSTM, MTD, and SARIMA models are employed to predict the spread of ARI. Two accuracy measures, RMSE and MAD, were used to identify the best model for predicting acute respiratory infection. The results demonstrate that applying several different machine learning and classical time series techniques can aid in the search for the model that predicts the spread of disease with the highest accuracy. Stationarity and data normalization improved the predictive accuracy. Furthermore, the performance of these techniques varies with the given data and with the stability of their convergence. Thus, it is unlikely that a single method will outperform all other models for every time series; instead, implementing multiple techniques is standard practice in model development.
There is a large body of literature on predicting the spread of acute respiratory infection in different regions of the world. These studies generally use a single predictive modeling technique, and we are not aware of any that compare the performance of classical techniques, including but not limited to Box-Jenkins, exponential smoothing, and decomposition methods, with MTD and artificial intelligence methods. As far as we know, the comparisons implemented here are the first attempt to compare deep learning sequential techniques with MTD and classical time series methods.
This study has a number of limitations. First, the time series sample size was relatively small: although 246,349 cases were recorded, the data collection period covered only 48 months. Second, each of the three techniques has its own drawbacks. For example, the MTD parameter estimates are based on the EM algorithm, whose results may not be reproducible if the initial values are not carefully chosen. Major disadvantages of the LSTM algorithm include longer training times and higher memory requirements. For the Box-Jenkins models, the main weaknesses are the need to satisfy the residual assumptions and to ensure model stability. Finally, to overcome the limitations of these techniques, domain knowledge could be useful in selecting the initial values of the parameters.
In conclusion, the results of this study showed that LSTM and MTD both performed slightly better than the SARIMA model. The LSTM and MTD models demonstrated their flexibility and competitiveness in this study, which suggests that they may be viable alternatives to existing time series models.

Conclusions
In this study, we have compared the prediction performance of the MTD, seasonal autoregressive integrated moving average (SARIMA), and long short-term memory (LSTM) deep learning models. The data used in the study were monthly ARI and pneumonia cases among under-five-year-old children collected from five regions of Somaliland from 2011 to 2014. To obtain reliable results from these models, several preprocessing steps were applied to the original observations before the competing models were compared. First, the data were transformed by differencing to make them stationary; the autocorrelation function (ACF), partial autocorrelation function (PACF), and Dickey-Fuller test were used to check stationarity, and their results indicated that the differenced series is stationary. Second, the data were randomly split into training and testing sets in a 70:30 ratio (70% training and 30% testing). Third, the independent variables in the training data were normalized by subtracting their means and dividing by their standard deviations. Normalization at the preprocessing stage is needed when implementing machine learning algorithms, to reduce the variability among the different variables. The three models were fitted to the training data and then validated on the test data, and the RMSE and MAD accuracy measures were used to choose the best model for predicting the spread of the disease.
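The normalization step can be sketched as a z-score transform whose mean and standard deviation are estimated on the training portion and then reused for the test portion and for mapping forecasts back to the original scale. The plain-Python sketch below assumes the sample (n − 1) standard deviation and reuse of the training statistics on the test data, neither of which the text specifies; the function names and example values are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_standardizer(train: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the training values."""
    return statistics.mean(train), statistics.stdev(train)


def standardize(values: Sequence[float], mean: float, sd: float) -> List[float]:
    """z-scores (x - mean) / sd computed with the training statistics."""
    return [(x - mean) / sd for x in values]


def unstandardize(z_values: Sequence[float], mean: float, sd: float) -> List[float]:
    """Map z-scores (e.g. model forecasts) back to the original scale."""
    return [z * sd + mean for z in z_values]


train, test = [2.0, 4.0, 6.0], [8.0]
mean, sd = fit_standardizer(train)
z_train = standardize(train, mean, sd)  # -1, 0, 1
z_test = standardize(test, mean, sd)    # 2: expressed in training units
```

Fitting the statistics on the training data only keeps information from the test period out of model training.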
The results of this study show that no single model is clearly superior to the other two; however, the deep learning LSTM and MTD models slightly outperformed the classical SARIMA model in predicting ARI values.
Although there is a large body of literature comparing Box-Jenkins and machine learning techniques, we are not aware of any studies that compare the MTD, SARIMA, and LSTM models. Perhaps one of the most important outcomes of this study is the performance of the MTD model. The study illustrated the utility and efficacy of the MTD model, which is unfamiliar to many researchers. In addition, to our knowledge, this study is the first attempt to show that MTD can be a highly competitive and flexible predictive model that challenges other machine learning algorithms.

Informed Consent Statement: Ethical approval was provided by the institutional ethics review board of Somaliland's Ministry of Health (degree # 2020-7-027).

Data Availability Statement:
The study did not report any data.