# Towards a Predictive Analytics-Based Intelligent Malaria Outbreak Warning System


## Abstract


## 1. Introduction

## 2. Assessment of Hidden Ecological Factors

#### 2.1. Study Site and Population

#### 2.2. Data Collection and Source

#### 2.3. Factor Analysis

#### 2.4. Structural Equation Modelling

#### 2.5. Estimation of PLS-PM

#### 2.5.1. Measurement Model

- The matrix of MVs $\mathbf{Y}$ is scaled to have zero mean and unit variance.
- Each block of MVs ${\mathbf{Y}}_{g}$ has already been transformed to be positively correlated for all LVs ${\mathbf{x}}_{g},g=1,\cdots ,G$.

#### 2.5.2. Mode A

#### 2.5.3. Mode B

#### 2.6. Presentation of Results

## 3. Intelligent Malaria Outbreak Warning System

#### 3.1. Data Preprocessing

#### 3.2. Machine Learning

- The Linear Regression (LiR) method gives good overall prediction results, but it failed to produce any medium predictions.
- The Logistic Regression (LoR) method predicts the probability of occurrence of an event by fitting the dataset, as a set of independent variables, to a logistic function. In other words, for a correlated dataset, LoR may not be able to find the intrinsic relationships between events.
- Decision Tree (DT) works very well for both categorical and continuous dependent variables; however, this dataset cannot be separated into distinct groups because the boundaries between the samples are fuzzy. Consequently, DT gave a poor prediction.
- Support Vector Machine (SVM) is one of the most efficient supervised machine learning algorithms and is mainly used for solving classification and regression problems. The key idea of this algorithm is that each training and testing sample can be plotted as a point in an n-dimensional space, with the value of each feature being the value of a particular coordinate. Without parameter optimisation, SVM gave a prediction accuracy of 80.56%. After parameter optimisation, in particular of the penalty parameter and the gamma coefficient, SVM (o) gave a prediction accuracy of 99.0%.
- Naive Bayes (NB) is a well-known classification method based on Bayes’ Theorem with a simplifying assumption of independence between predictors. Moreover, NB is a conditional probability model, which means that the method needs to be assigned a series of known events. For this dataset, NB did not produce a good prediction overall.
- The K-Nearest Neighbours (KNN) method can deal with both classification and regression problems. Compared with KNN5 (k = 5) and KNN10 (k = 10), KNN1 (k = 1) failed to make a good prediction. In theory, this suggests that the data may need more pre-processing and/or noise removal; however, most real-world data are incomplete, which is why KNN5 and KNN10 make a better prediction.
- K-Means (K-M) is an unsupervised clustering method. In this case, three clusters were set at the beginning; however, the convergence was imperfect, and therefore the method could not give a good overall prediction.
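The KNN observation above (k = 1 being more sensitive to noisy samples than k = 5 or k = 10) can be illustrated with a minimal sketch. The function `knn_predict`, the one-dimensional feature values, and the "low"/"high" labels are hypothetical and are not taken from the study's dataset; the sketch only shows how a majority vote over more neighbours can absorb a single mislabelled sample.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, incidence level).
train = [
    ((0.8,), "low"), ((1.0,), "low"), ((1.2,), "low"), ((1.4,), "low"),
    ((1.5,), "high"),  # deliberately mislabelled, i.e. a noisy sample
    ((1.6,), "low"), ((1.8,), "low"), ((2.0,), "low"),
    ((5.0,), "high"), ((5.2,), "high"), ((5.4,), "high"),
]

query = (1.52,)  # lies in the "low" region, right next to the noisy sample
for k in (1, 5, 10):
    print(k, knn_predict(train, query, k))
```

With k = 1 the prediction follows the single noisy neighbour, whereas k = 5 and k = 10 return the label of the surrounding majority.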

#### 3.3. Mobile Application

#### 3.4. Discussion

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
SEM | Structural Equation Modelling |
EFA | Exploratory Factor Analysis |
PLS-PM | Partial Least Squares Path Modelling |
LVs | Latent Variables |
MVs | Measurement Variables |
FA | Factor Analysis |
API | Application Programming Interface |
SVM | Support Vector Machine |
LiR | Linear Regression |
LoR | Logistic Regression |
DT | Decision Tree |
NB | Naive Bayes |
KNN | K-Nearest Neighbours |
K-M | K-Means |
CFSR | Climate Forecast System Reanalysis |
NCEP | National Centers for Environmental Prediction |
JSON | JavaScript Object Notation |
XML | Extensible Markup Language |
HTML | HyperText Markup Language |

## Appendix A

#### Appendix A.1. Estimation of Parameters

- Step 1
- Initialization: Suppose ${\mathbf{Y}}_{1},\cdots ,{\mathbf{Y}}_{K}$ are the respective MVs, scaled such that $\mathbf{E}\left({\mathbf{Y}}_{i}\right)=0$ and $\mathbf{V}\left({\mathbf{Y}}_{i}\right)=1$. We are interested in expressing each LV as a linear combination of MVs, represented in compact form: $$\begin{array}{cc}\hfill \widehat{\mathbf{X}}& =\mathbf{YM}\hfill \\ \hfill {\widehat{\mathbf{x}}}_{g}& =\frac{{\widehat{\mathbf{x}}}_{g}}{\sqrt{VAR\left({\widehat{\mathbf{x}}}_{g}\right)}},\quad g=1,\cdots ,G\hfill \end{array}$$ Hence, the LVs are initialized as $\widehat{\mathbf{X}}=({\widehat{\mathbf{x}}}_{1},\cdots ,{\widehat{\mathbf{x}}}_{G})$.
- Step 2
- Inner approximation: Within the inner model, the estimate of each LV can be represented as the weighted sum of its neighbouring LVs: $$\begin{array}{cc}\hfill \tilde{\mathbf{X}}& =\widehat{\mathbf{X}}\mathbf{E}\hfill \\ \hfill {\tilde{\mathbf{x}}}_{g}& =\frac{{\tilde{\mathbf{x}}}_{g}}{\sqrt{VAR\left({\tilde{\mathbf{x}}}_{g}\right)}},\quad g=1,\cdots ,G\hfill \end{array}$$ where the inner weight matrix $\mathbf{E}$ depends on the chosen weighting scheme (Appendix A.2). The inner approximation of the LVs is then $\tilde{\mathbf{X}}=({\tilde{\mathbf{x}}}_{1},\cdots ,{\tilde{\mathbf{x}}}_{G})$.
- Step 3
- Outer approximation: The outer weights are computed from the LV estimates obtained in the inner approximation. This comes in two forms, Mode A and Mode B. For Mode A, the weights are the coefficients of a multivariate regression with the block of MVs as the response and the LV as the regressor: $$\begin{array}{cc}\hfill {\widehat{\mathbf{w}}}_{g}^{\top}& ={({\tilde{\mathbf{x}}}_{g}^{\top}{\tilde{\mathbf{x}}}_{g})}^{-1}{\tilde{\mathbf{x}}}_{g}^{\top}{\mathbf{Y}}_{g}\hfill \end{array}$$ For Mode B, the weights are the coefficients of a multiple regression with the LV as the response and its block of MVs as the regressors: $$\begin{array}{cc}\hfill {\widehat{\mathbf{w}}}_{g}& ={\left({\mathbf{Y}}_{g}^{\top}{\mathbf{Y}}_{g}\right)}^{-1}{\mathbf{Y}}_{g}^{\top}{\tilde{\mathbf{x}}}_{g}\hfill \end{array}$$
- Step 4
- Outer weight vector: Let ${k}_{g}=\left\{k\in \left\{1,\cdots ,K\right\}|{y}_{k}\text{ is related to }{x}_{g}\right\}$ be the set of indices of the MVs related to LV ${x}_{g}$; then, ${w}_{g}$, $g=1,\cdots ,G$, is a column vector of length $|{k}_{g}|$. The matrix of outer weights, W, can be written as: $$W=\left(\begin{array}{cccc}{w}_{1}& 0& \cdots & 0\\ 0& {w}_{2}& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & {w}_{G}\end{array}\right)$$ The outer weight vectors, ${w}_{1},\cdots ,{w}_{G}$, arranged in the outer weights matrix W, are now used to estimate the factor scores from the MVs: $$\begin{array}{cc}\hfill \widehat{\mathbf{X}}& =\mathbf{YW}\hfill \\ \hfill {\widehat{\mathbf{x}}}_{g}& =\frac{{\widehat{\mathbf{x}}}_{g}}{\sqrt{VAR\left({\widehat{\mathbf{x}}}_{g}\right)}},\quad g=1,\cdots ,G.\hfill \end{array}$$
- Step 5
- Iteration: If the relative change of all the outer weights from one iteration to the next is smaller than a predefined tolerance $\epsilon$, $$\left|\frac{{\widehat{w}}_{kg}^{old}-{\widehat{w}}_{kg}^{new}}{{\widehat{w}}_{kg}^{new}}\right|<\epsilon ,\quad \forall \,k=1,\cdots ,K\wedge g=1,\cdots ,G,$$ the algorithm is considered to have converged and the estimation stops; otherwise, it returns to Step 2.
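Steps 1 to 5 above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it covers Mode A only, it assumes the centroid scheme takes the inner weights to be the signs of the correlations between connected LVs, and it starts from unit outer weights. The function names, the `adjacency` argument and the two synthetic blocks of MVs are hypothetical.

```python
import math

def standardise(col):
    """Scale a column to zero mean and unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lv_scores(Y, weights):
    """Steps 1 and 4: weighted sum of each block's MVs, rescaled to unit variance."""
    n = len(Y[0][0])
    return [
        standardise([sum(w * col[i] for w, col in zip(w_g, Y_g)) for i in range(n)])
        for w_g, Y_g in zip(weights, Y)
    ]

def pls_pm_mode_a(blocks, adjacency, tol=1e-6, max_iter=100):
    """blocks[g] is a list of MV columns; adjacency[g][h] = 1 if LVs g and h are linked."""
    Y = [[standardise(col) for col in block] for block in blocks]
    G, n = len(Y), len(Y[0][0])
    weights = [[1.0] * len(Y_g) for Y_g in Y]  # Step 1 (assumed starting weights)
    x_hat = lv_scores(Y, weights)
    for iteration in range(1, max_iter + 1):
        # Step 2: inner approximation with sign-of-correlation weights (assumption)
        x_tilde = []
        for g in range(G):
            combo = [0.0] * n
            for h in range(G):
                if adjacency[g][h]:
                    sign = 1.0 if dot(x_hat[g], x_hat[h]) >= 0 else -1.0
                    combo = [c + sign * v for c, v in zip(combo, x_hat[h])]
            x_tilde.append(standardise(combo))
        # Step 3: Mode A, regress each MV on the inner estimate of its LV
        new_weights = [
            [dot(x_tilde[g], col) / dot(x_tilde[g], x_tilde[g]) for col in Y[g]]
            for g in range(G)
        ]
        # Step 4: update the factor scores from the new outer weights
        x_hat = lv_scores(Y, new_weights)
        # Step 5: largest relative change of any outer weight
        change = max(
            abs((old - new) / new)
            for w_old, w_new in zip(weights, new_weights)
            for old, new in zip(w_old, w_new)
        )
        weights = new_weights
        if change < tol:
            break
    return weights, x_hat, iteration

# Synthetic example: two blocks of two MVs each, driven by one seasonal signal.
season = [math.sin(2 * math.pi * i / 12) for i in range(24)]
blocks = [
    [[s + 0.1 * math.cos(2 * i) for i, s in enumerate(season)],
     [2 * s + 0.1 * math.sin(3 * i) for i, s in enumerate(season)]],
    [[s + 0.1 * math.cos(4 * i) for i, s in enumerate(season)],
     [0.5 * s + 0.1 * math.sin(5 * i) for i, s in enumerate(season)]],
]
adjacency = [[0, 1], [1, 0]]
weights, scores, n_iter = pls_pm_mode_a(blocks, adjacency)
print(n_iter, weights)
```

The sketch assumes every LV has at least one neighbour and that no outer weight is exactly zero; a fuller implementation would guard both cases and would also offer Mode B and the factorial and path weighting schemes.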

#### Appendix A.2. Weighting Scheme

#### Appendix A.2.1. Centroid (A)

#### Appendix A.2.2. Factorial (B)

#### Appendix A.2.3. Path Weighting (C)

#### Appendix A.3. Discriminant Validity Check

#### Appendix A.3.1. Path Coefficients

#### Appendix A.3.2. Total Effects

#### Appendix A.3.3. Outer Loadings


**Figure 1.** Conceptual framework of the malaria ecosystem describing the dynamic stages of malaria transmission between humans and mosquitoes under the influence of environmental factors. The boxes colored blue indicate the dynamic development of the malaria parasite and its interaction with the human host, the mosquito vector, and the ecology, while the box colored red is the main scheme for malaria prevention and control, indicating the intervention measures taken to mitigate the burden imposed on the human population.

**Figure 2.** The picture on the left shows the map of Ghana and the portion of Kumasi city containing the study area, within which Ejisu-Juaben lies. The picture on the right illustrates the climate vegetational belt, characterized by a typical semi-deciduous forest.

**Figure 3.** Structural equation model showing the relationship between malaria incidence and climate factors; black rectangles indicate measurement variables and red ellipses indicate latent variables. (**a**) The hypothetical causal relationship between malaria incidence and the climate factors. (**b**) The reduced causal relationship between malaria incidence and the climate factors after applying factor analysis to identify hidden factors and their dependent measurement variables.

**Figure 4.** The Cattell scree plot presents the eigenvalues of the components and the threshold for identifying the number of hidden ecological factors to be considered, using the information in Table 1.

| | Mal. Incid. | Max. Temp. | Min. Temp. | Precip. | Rel. Humid. | Solar Rad. | Wind Speed |
---|---|---|---|---|---|---|---|
Mal. Incid. | 1.00 | - | - | - | - | - | - |
Max. Temp. | 0.28 | 1.00 | - | - | - | - | - |
Min. Temp. | 0.68 | 0.04 | 1.00 | - | - | - | - |
Precip. | −0.21 | −0.36 | 0.22 | 1.00 | - | - | - |
Rel. Humid. | 0.51 | −0.24 | 0.90 | 0.38 | 1.00 | - | - |
Solar Rad. | 0.19 | 0.54 | −0.33 | −0.10 | −0.44 | 1.00 | - |
Wind Speed | −0.16 | 0.07 | 0.45 | 0.17 | 0.39 | 0.01 | 1.00 |

**Table 2.** Cross-correlation between meteorological variables and malaria incidence; VIF: variance inflation factor.

Variables | Lag 0 | Lag 1 | Lag 2 | VIF | Kurtosis | Standard Error |
---|---|---|---|---|---|---|
Maximum temperature | 0.284 | 0.321 ${}^{b}$ | 0.092 | 2.4096 | 5.48 | 0.38 |
Minimum temperature | −0.122 | 0.215 ${}^{b}$ | −0.237 | 8.7919 | 2.07 | 0.33 |
Precipitation | −0.214 | −0.292 ${}^{a}$ | −0.155 | 1.4194 | 20.73 | 0.27 |
Relative humidity | −0.134 | 0.254 ${}^{b}$ | −0.198 | 9.0065 | 1.42 | 0.02 |
Solar radiation | - | - | - | 1.9000 | 6.73 | 0.50 |
Wind speed | - | - | - | 1.3452 | −0.58 | 0.04 |

^{a} Negative association at lag 1.

^{b} Positive association at lag 1.
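For reading the VIF column, the usual definition of the variance inflation factor may help; it is assumed here, since this section does not state how the values were computed. For the $j$-th meteorological variable,

$$\mathrm{VIF}_{j}=\frac{1}{1-{R}_{j}^{2}},$$

where ${R}_{j}^{2}$ is the coefficient of determination obtained by regressing variable $j$ on the remaining variables. Under this definition, the value of 9.0065 reported for relative humidity corresponds to ${R}_{j}^{2}=1-1/9.0065\approx 0.889$, meaning that about 89% of its variance would be explained by the other variables.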

Measurement/Structural Model | Parameter | Estimate | Centroid (A) | Factorial (B) | Path Weighting (C) |
---|---|---|---|---|---|
Minimum temperature ⟵ FactorI | ${\lambda}_{1,1}$ | 0.9479 | 0.9479 | 0.9495 | 0.9495 |
Relative humidity ⟵ FactorI | ${\lambda}_{1,2}$ | 0.9910 | 0.9910 | 0.9903 | 0.9903 |
Maximum temperature ⟵ FactorII | ${\lambda}_{2,1}$ | 0.8816 | 0.8816 | 0.8675 | 0.8675 |
Solar radiation ⟵ FactorII | ${\lambda}_{2,2}$ | 0.8735 | 0.8735 | 0.8873 | 0.8873 |
Precipitation ⟵ FactorIII | ${\lambda}_{3,1}$ | 0.9849 | 0.9849 | 0.9852 | 0.9852 |
Wind speed ⟵ FactorIII | ${\lambda}_{3,2}$ | 0.0017 | 0.0017 | 0.0031 | 0.0031 |
FactorI ⟶ FactorII | ${\beta}_{1,2}$ | −0.3248 | −0.3248 | −0.3302 | −0.3302 |
FactorII ⟶ FactorIII | ${\beta}_{2,3}$ | −0.2774 | −0.2774 | −0.2690 | −0.2690 |
FactorI ⟶ Malaria incidence | ${\gamma}_{1}$ | 0.9700 | - | - | - |
FactorII ⟶ Malaria incidence | ${\gamma}_{2}$ | 0.7700 | - | - | - |
FactorIII ⟶ Malaria incidence | ${\gamma}_{3}$ | 0.4900 | - | - | - |
Maximum number of iterations | - | - | 12 | 15 | 15 |

**Table 4.** Bootstrapping test of the outer loadings and path coefficients in the PLS-PM with a 95% confidence interval.

Measurement/Structural Model | Parameter | Estimate | Bias | Standard Error | Lower | Upper |
---|---|---|---|---|---|---|
Minimum temperature ⟵ FactorI | ${\lambda}_{1,1}$ | 0.9479 | −0.0057 | 0.0467 | 0.8240 | 0.9890 |
Relative humidity ⟵ FactorI | ${\lambda}_{1,2}$ | 0.9910 | −0.0055 | 0.0347 | 0.9823 | 1.0000 |
Maximum temperature ⟵ FactorII | ${\lambda}_{2,1}$ | 0.8816 | −0.0329 | 0.1289 | 0.4769 | 0.9810 |
Solar radiation ⟵ FactorII | ${\lambda}_{2,2}$ | 0.8735 | −0.0343 | 0.1748 | −0.0705 | 0.9550 |
Precipitation ⟵ FactorIII | ${\lambda}_{3,1}$ | 0.9849 | −0.1748 | 0.4044 | 0.7666 | 1.0000 |
Wind speed ⟵ FactorIII | ${\lambda}_{3,2}$ | 0.0017 | 0.1356 | 0.4059 | −0.6593 | 0.7300 |
FactorI ⟶ FactorII | ${\beta}_{1,2}$ | −0.3248 | −0.0333 | 0.1692 | −0.4974 | 0.4260 |
FactorII ⟶ FactorIII | ${\beta}_{2,3}$ | −0.2774 | −0.0264 | 0.2191 | −0.4963 | 0.3810 |

**Table 5.** Indices for selecting the ecological hidden factor of high malaria incidence in the study area.

Factor | Reflective Variables | Communality | Dillon–Goldstein’s $\mathit{\rho}$ |
---|---|---|---|
I | 2 | 0.94 ${}^{c}$ (94%) | 0.97 ${}^{c}$ (97%) |
II | 2 | 0.77 (77%) | 0.87 (87%) |
III | 2 | 0.49 (49%) | 0.49 (49%) |

^{c} The most significant hidden factor.
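The indices in Table 5 can be reproduced from the outer loadings in the Estimate column of Table 4. The sketch below assumes the standard definitions for standardised reflective blocks, which this section does not spell out: the block communality is taken as the mean squared loading, and Dillon–Goldstein's $\rho$ as $(\sum \lambda)^{2}/\left((\sum \lambda)^{2}+\sum (1-{\lambda}^{2})\right)$. The function names are illustrative only.

```python
# Outer loadings copied from the "Estimate" column of Table 4.
loadings = {
    "I": [0.9479, 0.9910],    # minimum temperature, relative humidity
    "II": [0.8816, 0.8735],   # maximum temperature, solar radiation
    "III": [0.9849, 0.0017],  # precipitation, wind speed
}

def communality(lams):
    """Mean squared loading of a block (assumed definition)."""
    return sum(lam ** 2 for lam in lams) / len(lams)

def dillon_goldstein_rho(lams):
    """Composite reliability for standardised loadings (assumed definition)."""
    squared_sum = sum(lams) ** 2
    return squared_sum / (squared_sum + sum(1 - lam ** 2 for lam in lams))

for factor, lams in loadings.items():
    print(factor, round(communality(lams), 2), round(dillon_goldstein_rho(lams), 2))
```

Under these assumptions the rounded values agree with Table 5 for all three factors; the near-zero wind speed loading is what pulls both indices for Factor III down to 0.49.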

**Table 6.** Summary of data discretization using the k-means algorithm. SSB: the between-cluster sum of squares; SST: the total sum of squares over all clusters.

Number of Clusters (k) | 2 | 3 | 4 | 5 |
---|---|---|---|---|
Iterations | 3 | 4 | 9 | 6 |
Convergence | yes | yes | yes | no |
$\frac{SSB}{SST}$ | 66.4% | 82% | 89.9% | 93% |
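The SSB/SST ratio in Table 6 can be computed as in the following minimal sketch of one-dimensional k-means discretization. The data values, the function names and the deterministic initialisation from evenly spaced order statistics are hypothetical choices made for illustration; they are not the study's data or its exact procedure.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar data; returns sorted data, labels and centres."""
    data = sorted(values)
    n = len(data)
    # Assumed initialisation: evenly spaced order statistics of the sorted data.
    centres = [data[int((i + 0.5) * n / k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(x - centres[c])) for x in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return data, labels, centres

def ssb_over_sst(data, labels, centres):
    """Share of the total sum of squares explained by the clustering."""
    mean = sum(data) / len(data)
    sst = sum((x - mean) ** 2 for x in data)
    ssw = sum((x - centres[lab]) ** 2 for x, lab in zip(data, labels))
    return (sst - ssw) / sst

# Hypothetical monthly case counts with three visible levels.
cases = [1, 2, 3, 11, 12, 13, 21, 22, 23]
ratio_k2 = ssb_over_sst(*kmeans_1d(cases, 2))
ratio_k3 = ssb_over_sst(*kmeans_1d(cases, 3))
print(round(ratio_k2, 3), round(ratio_k3, 3))
```

As in Table 6, the ratio grows with the number of clusters, so it is usually read together with the convergence behaviour rather than maximised on its own.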

**Table 7.** Comparison of the accuracy of model checking algorithms. LiR: Linear Regression; LoR: Logistic Regression; DT: Decision Tree; SVM: Support Vector Machine; SVM (o): Optimized Support Vector Machine; NB: Naive Bayes; KNN: K-Nearest Neighbours; K-M: K-Means.

Algorithm | LiR | LoR | DT | SVM | SVM (o) | NB | KNN1 | KNN5 | KNN10 | K-M (3) |
---|---|---|---|---|---|---|---|---|---|---|
Accuracy | 83.8% | 75.0% | 63.8% | 80.6% | 99.0% | 63.9% | 58.3% | 80.6% | 80.6% | 47.2% |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Modu, B.; Polovina, N.; Lan, Y.; Konur, S.; Asyhari, A.T.; Peng, Y.
Towards a Predictive Analytics-Based Intelligent Malaria Outbreak Warning System. *Appl. Sci.* **2017**, *7*, 836.
https://doi.org/10.3390/app7080836
