Prediction of Human Brucellosis in China Based on Temperature and NDVI

Brucellosis occurs periodically and causes great economic and health burdens. Brucellosis prediction plays an important role in its prevention and treatment. This paper establishes relationships between human brucellosis (HB) and land surface temperature (LST), and the normalized difference vegetation index (NDVI). A seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) model is constructed to predict trends in brucellosis rates. The fitted results (Akaike Information Criterion (AIC) = 807.58, Schwarz Bayes Criterion (SBC) = 819.28) showed obvious periodicity and a rate of increase of 138.68% from January 2011 to May 2016. We found a significant effect between HB and NDVI. At the same time, the prediction part showed that the highest monthly incidence per year has a decreasing trend after 2015. This may be because of the brucellosis prevention and control measures taken by the Chinese Government. The proposed model allows the early detection of brucellosis outbreaks, allowing more effective prevention and control.


Introduction
Brucellosis is a common zoonosis afflicting humans and animals and causes heavy economic and health burdens around the world [1][2][3]. There are four types of brucellosis [4,5] affecting sheep, cows, pigs, and dogs, with transmission to humans mainly occurring through close contact with infected animals and their wastes, or by eating unsterilized animal products such as fresh milk [4,[6][7][8]. Aborted sheep fetuses are highly infectious. Farmers, butchers, and veterinarians who come into contact with animals are at high risk of infection. Brucellosis often causes a series of symptoms in infected people [9,10], and it leads to animal production and abortion [11][12][13][14]. Brucellosis is mainly distributed in Asia, the Middle East, sub-Saharan Africa and the Balkan Peninsula [1,15,16]. More than 500,000 cases occur globally every year [17]. Sheep brucella vaccine rev.1 is not suitable for human use [18,19]. The United States, New Zealand, and other countries have adopted quarantine or culling measures [20,21] but these measures cannot eradicate brucellosis completely.
As a country with a vast agricultural industry, China's brucellosis prevention and control situation is very serious [13]. The number of brucellosis cases is on the rise and the affected region is gradually spreading from the north to the south. The north is a high-incidence area [22], especially in Inner Mongolia. Meanwhile, the south has sporadic outbreaks [23], especially in recent years in Hainan Province, which was previously free of the disease. The first case of brucellosis occurred in China in

Human Brucellosis Data
Data on human brucellosis (HB) cases were collected on a monthly basis by the Chinese Centre for Disease Control and Prevention (CDC) [47] from January 2011 to December 2016. Brucellosis is a class b infectious disease, as stipulated by the law of prevention and treatment of infectious diseases in China. Cases of brucellosis must be reported within 24 h and suspected cases are subject to clinical and serological testing or isolation as required by the World Health Organization. In this study, monthly means of HB cases across China from January 2011 to May 2016 were used in the model to fit the ARIMAX model, while the data from June 2016 to May 2017 were used for validation.

Land Surface Temperature Data
Land surface temperature (LST) has an important impact on surface-dwelling organisms and is an important variable for studying biological change. In the present study, LST data were obtained for January 2011 to May 2016, and monthly means were used as input for the ARIMAX model of brucellosis. The data was provided by the International Scientific Data Mirror website [48] of the Computer Network Information Centre of the Chinese Academy of Sciences. We used the MODLT1T China 1 km surface temperature monthly composite product. Then, ArcGIS10.4 software (Environment Systems Research Institute, Redlands, CA, USA) was used to obtain monthly mean daytime and night-time surface temperatures across China (the monthly product includes daily and evening monthly average images), and a monthly mean national temperature was calculated.

Normalized Difference Vegetation Index Data
The normalized difference vegetation index (NDVI) is an important indicator of land cover. It is provided by the International Scientific Data Mirror website [48] of the Computer Network Information Centre of the Chinese Academy of Sciences. The MODND1M China 500 m NDVI monthly synthesis product from January 2011 to May 2016 was adopted, and ArcGIS software was used to calculate national mean monthly NDVI values.

General Concepts
The steps used to build the overall model are as follows. The input HB, LST and NDVI data are differentiated to stabilize them. Independent variables of LST and NDVI are fitted to the ARIMA model. Each independent variable model is used to filter the HB data and determine a fitted model, as shown in Figure 1. The detailed data processing procedure is shown in the following section.
brucellosis. The data was provided by the International Scientific Data Mirror website [48] of the Computer Network Information Centre of the Chinese Academy of Sciences. We used the MODLT1T China 1 km surface temperature monthly composite product. Then, ArcGIS10.4 software (Environment Systems Research Institute, Redlands, CA, USA) was used to obtain monthly mean daytime and night-time surface temperatures across China (the monthly product includes daily and evening monthly average images), and a monthly mean national temperature was calculated.

Normalized Difference Vegetation Index Data
The normalized difference vegetation index (NDVI) is an important indicator of land cover. It is provided by the International Scientific Data Mirror website [48] of the Computer Network Information Centre of the Chinese Academy of Sciences. The MODND1M China 500 m NDVI monthly synthesis product from January 2011 to May 2016 was adopted, and ArcGIS software was used to calculate national mean monthly NDVI values.

General Concepts
The steps used to build the overall model are as follows. The input HB, LST and NDVI data are differentiated to stabilize them. Independent variables of LST and NDVI are fitted to the ARIMA model. Each independent variable model is used to filter the HB data and determine a fitted model, as shown in Figure 1. The detailed data processing procedure is shown in the following section.  As a classical time series model [29], ARIMA (p, d, q) is a further development of the AR (p) and MA (q) models, where p is the number of AR autoregression terms, d is the difference order, and q is the number of MA sliding average terms. If the information of the sequence cannot be fully extracted when the time series is stable, an SARIMA(p, d, q)(P, D, Q) S model [49] needs to be considered, where P refers to the maximum lag order of the seasonal autoregression term, Q is the maximum lag order of the moving average operator, D is the seasonal difference order, and S is the seasonal difference cycle step. When we consider input variables such as LST, NDVI, we need to use an ARIMA model with input variables, namely, an ARIMAX model. A SARIMAX model can also be considered to be an ARIMA model with an interference sequence or dynamic regression model, which contains a regression section with input variables and an ARIMA section. Its general model is shown as formula (1): Here, µ represents a constant, β i is regression coefficients of external variables, Φ P (B) refers to the seasonal autoregressive coefficient polynomial with P-order, Θ Q (B) denotes the polynomial of the seasonal moving average coefficient with Q-order, φ p (B) stands for the autoregressive coefficient polynomial with p-order, θ q (B) is the polynomial of the seasonal moving average coefficient with When setting up an ARIMAX model, the steps are as follows: (1) A stationarity test is carried out for the response sequence and each input variable. If the input data is not stationary, it should be stabilized by a differential method. No matter which difference [50] method is adopted, it cannot meet the requirement of stationarity and the seasonal variation is obvious. Therefore, a seasonal difference can be considered. Using a seasonal differential method, the sequence was determined to be in a steady state. Since the surface temperature data and NDVI ( Figure 2) are themselves in the stationary condition, the next step can be taken directly. Similarly, HB data requires both first-order difference and seasonal difference processing to be a stationary series. The method used to check whether the data is stationary is the Augmented Dickey-Fuller (ADF) test [51], with a result of p < 0.05 being indicative of stability. (2) White noise testing. A stationary time series is of no value if it is simply white noise. On the condition that it is not white noise, further study is necessary, then further pattern recognition. All the above three input variables are non-white-noise sequences. (3) ARIMA and prewhitening. The ARIMA model is fitted to each input variable-LST, NDVI-to make the residual of the model a white noise sequence. Then, the output variable brucellosis sequence is filtered by this model to calculate the correlation coefficient between the input sequence and the output sequence. (4) The structure of the SARIMAX model is determined by using the co-correlativity coefficient, and then the model is fitted.

Model Test
The root mean square error (RMSE), mean absolute percentage error (MAPE), and mean error rate (MER) are used to evaluate the fit and predictive effect of the established model. These are calculated according to Equations (2)-(4).
In the formulas, X i denotes actual values, X i is the mean of the actual values, X i is the fitted or predicted values, and N refers to the number of simulated or predicted values. The software used in this study was SAS 9.4 (SAS Institute Inc., Cary, NC, USA), ARCGIS 10.4 and EXCEL 2016 (Software, Redmond, CA, USA). The LST and NDVI data were read by ARCGIS 10.4, and the BDI data were imported into EXCEL 2016. Data fitting, model establishment, inspection, and mapping were all carried out in SAS 9.4.

General Characteristics
A total of 71 observed values of brucellosis data were obtained from January 2011 to November 2016, among which the first 65 were used as training data for the establishment of an ARIMAX model, and the last six values were used for comparison to test the predictive effect of the model. Looking at the data from previous years, the monthly data ranged from a minimum of 1123 cases per month (January 2012) to a maximum of 8102 cases per month (July 2014). The total number of cases in more than five years was 274,800, the monthly average was 4227 cases per month (HB/mo), and the standard error was 1875 HB/mo. Between 2011 and 2015, HB cases increased from 43,827 to 60,782, an increase of 138.68%. According to the data, the incidence of brucellosis has obvious seasonality and periodicity, with a peak period of May to August, as shown in Figure 3.

HB Data Stationarity
As a response sequence, HB data is required to be stationary for modelling purposes. It can be seen from the scatter diagram that it is non-stationary and has obvious seasonality. After the difference method is adopted, it is found that the data is first order twelve-step differential stability. The test results are shown in Table 1.

LST and NDVI Models
A scatter diagram of surface temperature and NDVI data from January 2011 to May 2016 was made and it showed that the data were approximately stable. The ADF test result of p < 0.05 showed that the data were stable, so the difference method was not needed. Meanwhile, the white noise test showed a non-white noise sequence. According to the ACF diagram, PACF diagram, and AIC

HB Data Stationarity
As a response sequence, HB data is required to be stationary for modelling purposes. It can be seen from the scatter diagram that it is non-stationary and has obvious seasonality. After the difference method is adopted, it is found that the data is first order twelve-step differential stability. The test results are shown in Table 1.

LST and NDVI Models
A scatter diagram of surface temperature and NDVI data from January 2011 to May 2016 was made and it showed that the data were approximately stable. The ADF test result of p < 0.05 showed that the data were stable, so the difference method was not needed. Meanwhile, the white noise test showed a non-white noise sequence. According to the ACF diagram, PACF diagram, and AIC criterion, the optimal temperature model is ARMA (2, 1), whose corresponding AIC and SBC values are 340.43 and 349.13, respectively. All parameters of this model are significant, and the residual error is white noise. The NDVI ADF test shows that the NDVI model is stable and the residuals are white noise. Also, with the aid of the ACF and PACF diagrams, the orders of models were determined. Finally, the selected models are shown in Table 2; the optimization model is ARMA (2, 1). The results for LST and NDVI are shown in Figure 2.

SARIMAX Model
The optimal brucellosis model was filtered by the optimal models of LST and NDVI, then a regression equation was determined, with the model fitted according to the correlation number. If the parameters of the fitted model are not significant or the residuals do not pass the white noise test, the parameters need to be adjusted constantly. Finally, the model with significant parameters ( Table 3) that passed the white noise test (Table 4) after multiple adjustments is shown as Formula (5), and its corresponding AIC and SBC values are 807.58 and 819.28, respectively. This model is used for prediction, with the predicted results shown in Table 5 and in Figure 4. After establishing the final model, RMSE, MAPE and MER are adopted to evaluate the ARIMAX model established above.

Assessing Model Performance
For the final evaluation result, RMSE, MAPE, MER [29], and other common evaluation methods are adopted to evaluate the difference between fitted and predicted values.  Table 5.

Trends and Seasonal Periodicity
As can be seen from Figure 1, the annual incidence of brucellosis has been increasing since 2011, with the annual peaks increasing. One important cause is the increasing scale of livestock husbandry In the formula, y represents the incidence of HB, x 1 , x 2 represent the input variables LST, NDVI, respectively, B is a lag factor, and ε 0 is the white noise residual sequence.

Assessing Model Performance
For the final evaluation result, RMSE, MAPE, MER [29], and other common evaluation methods are adopted to evaluate the difference between fitted and predicted values.  Table 5.

Trends and Seasonal Periodicity
As can be seen from Figure 1, the annual incidence of brucellosis has been increasing since 2011, with the annual peaks increasing. One important cause is the increasing scale of livestock husbandry caused by improvements in China's economy and demands for livestock products in recent years [52]. At the same time, it may be related to the free-range breeding methods used by farmers, which have much higher animal disease risks than conventional methods. In this case, susceptible animals can come into contact with infected animals or brucellosis bacteria. Sheep are the main source of infection in China, followed by cattle and pigs. The peak timing of brucellosis rates is also closely related to the lamb production season. The LST, NDVI, and HB data have similar trends, which indicates that there is a close relationship between them. See Figure 3 for more details.
Brucellosis shows obvious seasonal periodicity. The incidence of brucellosis is relatively high from April to August every year, with the peak occurring from May to July. This may be related to many factors [36], such as precipitation, lamb production, animal movement, wind speed, and sunshine levels. A more plausible explanation is that as temperatures rise in the spring, snow and ice melt, amounts of organic matter increase, and precipitation increases due to weather changes, all of which are conducive to the activation and incubation of bacteria. Spring is also the breeding season for many animals. With more animals moving around, the area affected by infected animals and their excretions becomes larger and the risk of infection of sensitive animals is higher. The net result may be a growing number of brucellosis cases that are distributed more widely across the country.
It can be seen from Figure 3 that from January 2011 to May 2016, parts of the fitted results are almost identical to the actual values. Only in July 2014 do the fitted and actual values have certain differences; the month where the abnormal values reached 8102, and which attained a maximum from 2011 to 2016. This is because the results of the ARIMAX simulation are continuous and smooth, so extreme values or outliers will make it relatively flat. From January 2011 to May 2016, the annual maximum values showed an increasing trend, however, the ARIMAX model could accurately predict from this increasing trend that the maximum monthly incidence decreased in 2016. It was less than in 2014 and 2015. This is consistent with actual observations, indicating that this model predicts trends well. In addition, since 2015, the rate of brucellosis cases has shown a downward trend. One important reason is that the measures taken by the Chinese Government to control brucellosis more strictly have been effective.

The Interrelations of Variables and Their Significance
Since LST and NDVI are two environmental factors that have important influence on HB, we chose these two factors mainly to explore their influence on HB. Although collinearity exists between LST and NDVI, collinearity does not affect the prediction accuracy of ARIMAX model. So, we do not deal with it. As shown in Figure 4, the fitted curve is basically consistent with that of NDVI. The coefficient of NDVI is the largest, indicating that its change does indeed indicate the basic trend of brucellosis onset and has the greatest influence. This is because vegetation coverage increases from January to July each year. This provides live Brucella bacteria with favorable hydrothermal equilibrium conditions. With increasing areas of favorable conditions, the number of Brucella hosts, such as sheep that feed on infected vegetation, increases. The highest incidence of brucellosis occurs in Inner Mongolia [23] in northern China, which is closely related to its widespread grasslands. This is the best example of the influence of vegetation on Brucella. Because the time at which spring ewes lamb is appropriate for Brucella, contact with amniotic fluid that carries Brucella bacteria can cause infection in humans. However, with increases in temperature, water evaporation increases and large numbers of bacteria multiply rapidly. The moisture content of the soil decreases because of water evaporation. At the same time, brucella has very high nutritional demands, while a large number of bacteria are competing for nutrients from brucellosis bacteria, so Brucella bacteria receive less nutrition. These conditions do not facilitate the growth of bacteria, causing brucellosis cases to decrease. Hence, there is a negative correlation between LST and brucellosis cases. In summary, HB incidence is primarily determined by NDVI, while LST influence the amplitude of its change. Previous studies have considered temperature, moisture, and other environments in which bacteria infect hosts; however, it is better to also consider the effect of vegetation on temperature and moisture.
High vegetation coverage is good for Brucella survival and spread, which is a kind of average result. Some high vegetation coverage and wide distribution is very good for Brucella bacteria survival, while some other coverages are not very good. The level of NDVI that causes the highest HB incidence requires further study. The Inner Mongolia prairie [23] is very favorable for Brucella survival and the highest HB incidence in China is found there. In some southern areas, the vegetation coverage is higher but is not necessarily suitable for Brucella survival. From the local area to the national level, no vegetation types were significantly adverse to brucellosis. Of course, some vegetation may play a greater role in Brucella survival, while others may play a smaller role.
The results of this study improve on previous research results [29]. NDVI, LST were found to be of certain significance in the prediction, prevention and control of brucellosis. The consistency of the NDVI and brucellosis incidence trends show that NDVI is of important significance in formulating accurate prevention and control strategies, as it can be used to predict the incidence of brucellosis in a country or a region according to its specific vegetation. The lagged effects of LST show that we need to allocate medical resources, personnel, and animal control and isolation measures according to changes in LST. According to these conditions, brucellosis prevention and control measures can be planned in advance, medical resources can be allocated appropriately, and public awareness can be carried out. The two variables above may have greater significance for certain regional infections due to their relatively easy availability, extensive coverage and close correlations with various surface-dwelling bacteria.
This model is established throughout the whole country and has certain applicability to high-risk areas. For high-risk areas, the idea and method of building the model are the same, but the coefficients of the model may be different. As long as we get LST and NDVI for these regions, we can build a similar model. In areas with high vegetation coverage (such as grasslands) and dense flocks of sheep, the incidence is high, such as in the grasslands of Inner Mongolia (as can be seen from Equation (5)), which is a typical high-risk area. A higher NDVI coefficient may be obtained from the model if these regions are selected, because the temperature in the north is lower and the influence of LST is greater. That is, the absolute value of the coefficient is smaller (because it is negative) and lower temperatures are more conducive to the survival of brucellosis. However, in some areas in the south, the number of sheep is relatively small and the area of vegetation cover (such as grassland) is smaller, so the incidence in these areas is relatively low. The NDVI coefficient of the model presented here is relatively small. Meanwhile, due to the high temperature in the south, LST will increase while its influence on the equation will decrease. In other words, its coefficient will decrease (the absolute value will increase).

Other Factors
In the analysis above, only significant factors were selected for research. Choosing the main influencing factors can help to grasp the essence of the problem and highlight its regularity so as to solve it. In addition to the above factors, there may be other factors that affect brucellosis incidence, such as relevant policies and animal numbers. In general, strict control policies may decrease the incidence of brucellosis while lax ones may increase it. From the 1970s, the incidence of brucellosis declined steadily; however, after the 1990s, the number of brucellosis cases increased continuously. After 2015, brucellosis began to decline again. These changes were related to the policies of the Chinese Government. Before the 1990s, China's brucellosis control was strict and brucellosis incidence steadily declined. After the 1990s, due to the relaxation of awareness, vigilance and policies related to brucellosis, the number of brucellosis cases increased, and the Government attached greater importance to it. After 2013, the Government gradually formulated and adopted stronger policies that had an effect, as suggested by the decline in brucellosis since 2015.
From the perspective of animal numbers, after the 1990s, the number of livestock increased significantly. This was coupled with a restructuring of the livestock industry; the proportion of pork production decreased, while that of beef and mutton increased. According to official annual data released by the National Bureau of Statistics, the numbers of sheep raised from 2011 to 2015 were 28,664.15 (×10,000 head), 28,512.74, 28,935.22, 30,391.28, 31,174.27, and 29,930.54, respectively; an overall increasing trend. In the same period, the numbers of cattle were 9384.00 (×10,000 head), 9137.25, 8985.76, 9007.28, 9055.79, and 8834.49, respectively; an overall decrease, and contrary to the trend in brucellosis incidence. The incidence of brucellosis in sheep greatly increased along with the large increase in sheep numbers, while the incidence of brucellosis in humans remained unchanged. At the same time, the incidence of brucellosis in sheep greatly increased due to its wide distribution range and strong pathogenicity, so the chance of contracting brucellosis in humans would also have been greatly increased. This indicates that sheep and brucellosis are closely related. In 2016, the number of sheep decreased and brucellosis incidence declined. At the same time, the timing of human brucellosis incidence coincided with the timing of sheep reproduction. From 2011 to 2016, cattle numbers showed a downward trend, which was opposite to the increase in human brucellosis, indicating that brucellosis in cattle did not affect trends in human brucellosis on the whole, indicating that the influence of cattle brucellosis on human brucellosis is relatively weak. The reason may be that cattle Brucella are less pathogenic and only occur locally. This suggests that the effect of sheep brucellosis on human brucellosis is much greater than that of cattle brucellosis.
On the one hand, Brucella is dependent on the spread of environmental factors, etc. High vegetation cover provides a suitable living environment for Brucella. People and animals are easily infected with Brucella after coming into contact with these infected substances, thus greatly increasing the risk of human infection. On the other hand, exposure is also an important factor. Farmers who keep sheep on their farms may also be at increased risk of contracting brucellosis due to frequent contact with sheep. Similarly, butchers who often come into contact with live animals such as sheep during slaughter are also susceptible to infection. Therefore, the incidence of the disease is higher in the northern areas where sheep are raised intensively. Because farm households are more exposed to animals such as sheep during lambing season, this leads to a higher incidence of brucellosis, which shows a distinct seasonality.

Limitations
There are some shortcomings to the present study. First, the effect of precipitation on brucellosis was not considered. We know that precipitation and temperature are two important factors affecting brucellosis disease. However, since we mainly studied the problem based on remote sensing data, we did not consider precipitation, which should be done in future. In addition, since the LST and NDVI data were only available from January 2011 to May 2016, further data accumulation may be able to further improve the models.

Conclusions
In recent years, brucellosis rates have stayed at high levels, and the disease is involved in almost all provinces of China. It is seriously dangerous to human health and causes huge economic losses to the country; therefore, its control is of great significance. This study established relationships between HB incidence and several influencing factors. Using remote sensing data, we established the specific relationships between HB and LST and NDVI using an ARIMAX model. NDVI has a positive correlation with HB and follows its basic trends, while LST are negatively correlated with HB and only affect its magnitude. Although brucellosis has been on the decline since 2015 because of the prevention and control efforts of the Chinese Government, the disease is still considered serious due to its continued high incidence. Accordingly, we propose the following suggestions: (1) In the process of risk control and prevention, vegetation coverage should be regarded as an important reference index, especially in grasslands with high vegetation coverage. (2) Due to the higher human incidence in the lambing season, management of livestock practices should be strengthened at this time to prohibit the discharge of amniotic fluid and associated pollutants. (3) Since the low temperature of LST is conducive to the survival of brucellosis, during low spring temperatures, pathogens are spread in water due to snow and ice melts, resulting in greatly increased pathogen spreading and risk. Therefore, attention should be paid to strengthening the prevention and control of brucellosis at this time. (4) Since brucellosis has appeared repeatedly in China, prevention and treatment measures should be constantly reviewed to prevent decreases in the awareness of the problem. Our study has established a specific relationship between HB and several of its influences, which is of certain significance for the prevention and control of brucellosis in China. It also provides a reference for similar research in other countries.