Analysis of HIV/AIDS Epidemic and Socioeconomic Factors in Sub-Saharan Africa

Sub-Saharan Africa has been the epicenter of the HIV/AIDS epidemic since acquired immunodeficiency syndrome (AIDS) first became prevalent. This article proposes several regression models to investigate the relationships between the HIV/AIDS epidemic and two socioeconomic factors (the gross domestic product per capita and the population density) in ten countries of Sub-Saharan Africa for 2011–2016. The unknown parameters of these models were estimated by the maximum likelihood method, using the Newton–Raphson procedure and the Fisher scoring algorithm. A comparison of these regression models shows significant spatiotemporal non-stationarity and auto-correlation in the relationships between the HIV/AIDS epidemic and the two socioeconomic factors. Based on the empirical results, we suggest that the geographically and temporally weighted Poisson autoregressive (GTWPAR) model is more suitable than the other models and gives the best fit.


Introduction
Acquired immunodeficiency syndrome (AIDS) is a malignant infectious disease with a high fatality rate caused by the human immunodeficiency virus (HIV). The HIV/AIDS epidemic has been one of the greatest global public health and social development problems since 1981, particularly in Sub-Saharan Africa. As of 31 December 2016, over 30 million people had died from the disease [1]. More than 70% of the 35 million people infected with HIV/AIDS live in Sub-Saharan Africa. Thus, the HIV/AIDS epidemic of Sub-Saharan Africa has attracted extensive attention from researchers around the world [2][3][4].
In earlier studies, Janet et al. [5] and Hallman et al. [6] demonstrated the relationship between the disease and socioeconomic status. Chris et al. [7] indicated that socioeconomic factors explained this disease better than cultural ones in South Africa. Mathematical models play an important role in evaluating the trends of the HIV/AIDS epidemic [8]. For example, regression models have been widely used to study the relationship between this disease and its influencing factors. Shiboski et al. [9] used a generalized linear model for the statistical analysis of the HIV/AIDS disease. A mixed-effects linear regression model was used to analyze the correlation between national population and antenatal care [10]. Laurence et al. [11] applied a spatial regression model to show that the epidemic had substantial geographic variance across Sub-Saharan Africa.
This paper proposes several regression models to investigate the relationships between the HIV/AIDS epidemic, the gross domestic product (GDP) per capita and the population density in ten countries of Sub-Saharan Africa. The Poisson regression model is introduced in Section 2.1. Sections 2.2 and 2.3 describe two spatial models. A spatiotemporal autoregressive model is proposed in Section 2.4. The maximum likelihood method is used to obtain the iterative formulas of the coefficient estimations in Section 3. The main results are shown in Section 4, followed by a discussion in Section 5.

Poisson Regression Model
Regression models are a set of statistical processes for estimating the relationships between response and explanatory variables. The classical model is linear regression. Nelder and Wedderburn [12] extended the linear model to the generalized linear model to handle discrete data. Models of this kind are very important in ecology, medicine and economics [13][14][15]. Suppose that $Y = (Y_1, Y_2, \ldots, Y_n)$ is the response variable, where the $Y_i$ ($i = 1, \ldots, n$) are independent. The density function is
$$f(y_i; \theta_i, \phi_i) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi_i)} + c(y_i, \phi_i) \right\},$$
where $a(\cdot)$, $b(\cdot)$, $c(\cdot, \cdot)$ are known functions, and $\theta_i$, $\phi_i$ are unknown parameters for $i = 1, 2, \ldots, n$.
Denote $\mu_i = E(Y_i)$, and let $g(\mu_i) = \ln(\mu_i)$ be the link function. Let $X_{ij}$ be the value of the $j$th explanatory variable for the $i$th observation. Then, the Poisson regression (PR) model is given by
$$g(\mu_i) = \ln(\mu_i) = \sum_{j=1}^{p} \beta_j X_{ij},$$
where $i = 1, 2, \ldots, n$, and $\beta_j$ ($j = 1, 2, \ldots, p$) are unknown parameters.
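To make the PR model concrete, the following is a minimal sketch of fitting $\ln(\mu_i) = \sum_j \beta_j X_{ij}$ by Fisher scoring, written as iteratively reweighted least squares. The function name `fit_poisson_irls`, the simulated data and the constant first column of `X` (used as an intercept) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit ln(mu_i) = sum_j beta_j X_ij by Fisher scoring (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor
        mu = np.exp(eta)                # inverse of the log link
        z = eta + (y - mu) / mu         # working response for the log link
        XtW = X.T * mu                  # working weights equal mu for Poisson/log
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated (hypothetical) data: intercept plus two standardized covariates.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = fit_poisson_irls(X, y)
print(beta_hat)
```

At convergence the score $X^{T}(y - \mu)$ is numerically zero, which is a convenient check that the iteration has reached the maximum likelihood estimate.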

Geographically Weighted Poisson Regression Model
Regression models have been frequently applied in epidemiology and health geography to investigate persistent geographical variations in disease [16]. Based on the generalized linear regression, Brunsdon et al. [17] proposed the geographically weighted regression model to analyze the spatial non-stationary processes of discrete data. Disease maps arising from such processes can be analyzed with the geographically weighted Poisson regression (GWPR) model [18][19][20]:
$$\ln(\mu_i) = \sum_{j=1}^{p} \beta_j(u_i, v_i) X_{ij},$$
where $(u_i, v_i)$ ($i = 1, 2, \ldots, n$) are the geographical locations, and $\beta_j(u_i, v_i)$ ($j = 1, 2, \ldots, p$) are unknown parameters at the position $(u_i, v_i)$.

Geographically Weighted Poisson Autoregressive Model
Another issue deserving special attention is whether different regions interact with each other in spatial data. Previous studies [21][22][23][24] showed that spatial data exhibit not only spatial non-stationarity but also correlation. Zhang [25] proposed the geographically weighted Poisson autoregressive (GWPAR) model as follows:
$$\ln(\mu_i) = \rho \sum_{k=1}^{n} c_{ik} \ln(\mu_k) + \sum_{j=1}^{p} \beta_j(u_i, v_i) X_{ij},$$
where $\rho$ is a scalar autoregressive parameter, and $c_{ik}$ ($i, k = 1, 2, \ldots, n$) is the adjacency relation between the $i$th and $k$th locations. Let $c_i$ be the number of regions adjacent to the $i$th position. If the $k$th position is adjacent to the $i$th position, then $c_{ik} = 1/c_i$; otherwise, $c_{ik} = 0$.
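The adjacency rule $c_{ik} = 1/c_i$ for neighbours and $c_{ik} = 0$ otherwise yields a row-normalized matrix. A minimal sketch follows; the four-region chain and the helper name `row_normalized_adjacency` are hypothetical and serve only as an illustration.

```python
import numpy as np

def row_normalized_adjacency(neighbors, n):
    """Build C with C[i, k] = 1 / c_i if k is adjacent to i, else 0."""
    C = np.zeros((n, n))
    for i, adjacent in neighbors.items():
        for k in adjacent:
            C[i, k] = 1.0 / len(adjacent)   # c_i = number of neighbours of i
    return C

# Hypothetical chain of four regions: 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
C = row_normalized_adjacency(neighbors, 4)
print(C)
```

Each row of `C` sums to one, so the autoregressive term $\sum_k c_{ik} \ln(\mu_k)$ is the average of $\ln(\mu_k)$ over the neighbours of region $i$.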

Geographically and Temporally Weighted Poisson Autoregressive Model
Recently, many spatiotemporal models have been proposed to describe the spatiotemporal variations in the relationships between response and explanatory variables [26,27]. In the modeling of spatiotemporal data, there are two important properties: non-stationarity and auto-correlation. Non-stationarity indicates that there exists more than one linear relation between the response and explanatory variables. It can be used to identify where interesting relationships are likely to occur or where detailed investigation is necessary in the study areas [28]. Spatiotemporal auto-correlation is an important factor in determining the temporal correlations of observations [29]. These two problems always appear together [30]. A geographically and temporally weighted Poisson autoregressive (GTWPAR) model can be applied to account for non-stationary and auto-correlated effects simultaneously.
Let $Y$ be the response variable, and let $Y_{ik}$ ($i = 1, 2, \ldots, n_k$; $k = 1, 2, \ldots, T$) be the independent observations of $Y$ at the $i$th position and the $k$th time. The density function can be defined as follows:
$$f(y_{ik}; \theta_{ik}, \phi_{ik}) = \exp\left\{ \frac{y_{ik} \theta_{ik} - b(\theta_{ik})}{a(\phi_{ik})} + c(y_{ik}, \phi_{ik}) \right\},$$
where the functions and parameters are defined as in Section 2.1. Denote $\mu_{ik} = E(Y_{ik})$ and $g(\mu_{ik}) = \eta_{ik} = \ln(\mu_{ik})$. Let $X_{ijk}$ ($j = 1, 2, \ldots, p$) be the $j$th explanatory variable. The GTWPAR model is expressed by
$$\eta_{ik} = \rho \sum_{m=1}^{T} \sum_{l=1}^{n_m} c^{(ik)}_{lm} \eta_{lm} + \sum_{j=1}^{p} \beta_{jk}(u_{ik}, v_{ik}, t_k) X_{ijk}, \qquad (4)$$
where $\{\beta_{jk}(u_{ik}, v_{ik}, t_k)\}$ is a set of unknown parameters at the $i$th position in the $k$th time, and $c^{(ik)}_{lm}$ is the adjacency relation between the locations $(u_{ik}, v_{ik}, t_k)$ and $(u_{lm}, v_{lm}, t_m)$. Following the work of [31], the spatiotemporal distance between the locations $(u_{ik}, v_{ik}, t_k)$ and $(u_{lm}, v_{lm}, t_m)$ can be defined as [...] where $\mu$ and $\lambda$ are used to balance the spatial and temporal distances. Suppose that [...] where $d$ is a constant and satisfies $\min\{d_{lm}\}$ [...]. Next, we rewrite the model (4) in matrix form
$$\eta = \rho C \eta + BX, \qquad (5)$$
where $\eta = (\eta_{11}, \cdots, \eta_{n_1 1}, \eta_{12}, \cdots, \eta_{n_2 2}, \cdots, \eta_{1T}, \cdots, \eta_{n_T T})^{T}$, $C = (c^{(ik)}_{lm})$, $X = (X_{ijk})$ and $B = (\beta_{jk}(u_{ik}, v_{ik}, t_k))$, with the product $BX$ understood row by row as in (4). For convenience, define $\eta_K$ as the $K$th element of $\eta$; $C_{I\cdot}$ and $X_{\cdot K}$ are the $I$th row and the $K$th column of the matrices $C$ and $X$, respectively. The detailed expressions of $C$, $X$ and $B$ are given in Appendix A.1.
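Because $\eta$ appears on both sides of the matrix form $\eta = \rho C \eta + BX$, evaluating the model for given coefficients requires solving a linear system. The sketch below assumes that the term $BX$ stands for the row-wise sums $\sum_j \beta_j X_j$ at each space-time location and that $I - \rho C$ is invertible; the function name `solve_eta` and the toy inputs are hypothetical.

```python
import numpy as np

def solve_eta(rho, C, B, X):
    """Solve eta = rho * C @ eta + b, where b[I] = sum_j B[I, j] * X[I, j]."""
    b = np.sum(B * X, axis=1)                    # local linear predictor per location
    N = C.shape[0]
    return np.linalg.solve(np.eye(N) - rho * C, b)

# Toy example: four space-time locations, two explanatory variables.
C = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))
B = rng.normal(size=(4, 2))                      # one coefficient row per location
rho = 0.3
eta = solve_eta(rho, C, B, X)
print(eta)
```

With a row-normalized $C$ and $|\rho| < 1$, the matrix $I - \rho C$ is invertible, so the reduced form is well defined.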

Remark 1.
For the GTWPAR model (4), if $\rho = 0$ and $\beta_{jk}(u_{ik}, v_{ik}, t_k)$ is independent of the spatiotemporal effect, the model is a PR model. If $\rho = 0$ and $\beta_{jk}(u_{ik}, v_{ik}, t_k)$ depends on the spatial effect but is independent of the temporal effect, the model becomes the GWPR model. If $\rho \neq 0$ and $\beta_{jk}(u_{ik}, v_{ik}, t_k)$ is independent of the temporal effect, it is the GWPAR model. Thus, the PR, GWPR and GWPAR models are special cases of the GTWPAR model.

Remark 2.
The estimations $\hat{\beta}(u_{ik}, v_{ik}, t_k)$ ($i, l = 1, 2, \ldots, n_k$; $k, m = 1, 2, \ldots, T$) are related to the temporal and spatial effects in the GTWPAR model. If $w_{ik}(u_{lm}, v_{lm}, t_m) = 0$ and $c^{(ik)}_{lm} = 0$ for $m \neq k$, they reduce to the parameter estimations of the GWPAR model. If $w_{ik}(u_{lm}, v_{lm}, t_m) = 0$ ($m \neq k$) and $C = 0$, they are the estimations of the GWPR model. If $W = 0$ and $C = 0$, then $\hat{\beta}(u_{ik}, v_{ik}, t_k) = \hat{\beta}$ are the global estimation values of the PR model.

Estimation of Parameter ρ
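Based on the density function, the log-likelihood function of $\rho$ is $L_2(\rho)$ [...]. Differentiating $L_2(\rho)$ with respect to $\rho$ requires the derivative $\mathrm{d}\eta/\mathrm{d}\rho$. Taking the derivative of the model (5) with respect to $\rho$ gives
$$\frac{\mathrm{d}\eta}{\mathrm{d}\rho} = C\eta + \rho C \frac{\mathrm{d}\eta}{\mathrm{d}\rho}, \quad \text{that is,} \quad \frac{\mathrm{d}\eta}{\mathrm{d}\rho} = AC\eta,$$
where $A = (I_N - \rho C)^{-1}$ and $I_N$ is the identity matrix. The detailed calculation is given in Appendix A.3. Then, Equation (8) can be rewritten in the following nonlinear form [...]. According to the Newton-Raphson procedure and Fisher scoring algorithm, the iterative formula is
$$\rho^{(m+1)} = \rho^{(m)} + I(\rho^{(m)})^{-1} S(\rho^{(m)}),$$
where the score is $S(\rho^{(m)}) = \frac{1}{\phi}(AC\eta)^{T}(Z - \eta)$ and the Fisher information is $I(\rho) = \frac{1}{\phi}(AC\eta)^{T}(AC\eta)$. The calculation of the score $S(\rho^{(m)})$ and the information $I$ is given in Appendix A.3.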
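One Fisher scoring update for $\rho$ can be sketched from the quoted expressions $S(\rho) = \frac{1}{\phi}(AC\eta)^{T}(Z - \eta)$ and $I(\rho) = \frac{1}{\phi}(AC\eta)^{T}(AC\eta)$. The sketch assumes $A = (I - \rho C)^{-1}$, which is what differentiating $\eta = \rho C\eta + b$ with respect to $\rho$ gives when $b$ (the coefficient term) is held fixed; the working variable `Z` is passed in as given because its definition is not part of this section. All function names and inputs are hypothetical.

```python
import numpy as np

def eta_of_rho(rho, C, b):
    """Reduced form eta(rho) = (I - rho C)^{-1} b."""
    return np.linalg.solve(np.eye(len(b)) - rho * C, b)

def rho_scoring_step(rho, C, b, Z, phi=1.0):
    """Return the updated rho and d eta / d rho = A C eta (A assumed = (I - rho C)^{-1})."""
    A = np.linalg.inv(np.eye(len(b)) - rho * C)
    eta = A @ b
    d_eta = A @ C @ eta                      # derivative of eta with respect to rho
    score = d_eta @ (Z - eta) / phi          # S(rho)
    info = d_eta @ d_eta / phi               # I(rho)
    return rho + score / info, d_eta

C = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([1.0, 2.0, 0.5, 1.5])
rho0 = 0.3

# Finite-difference check of the analytic derivative A C eta.
h = 1e-6
fd = (eta_of_rho(rho0 + h, C, b) - eta_of_rho(rho0 - h, C, b)) / (2 * h)
rho1, d_eta = rho_scoring_step(rho0, C, b, Z=eta_of_rho(rho0, C, b))
print(np.max(np.abs(fd - d_eta)), rho1)
```

When `Z` equals the current `eta`, the score vanishes and the update leaves `rho` unchanged, which is the fixed-point property one expects of the iteration.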

Main Results
In this section, we apply the PR, GWPR, GWPAR and GTWPAR models to analyze the relationships between the HIV/AIDS epidemic, the GDP per capita and the population density in ten countries of Sub-Saharan Africa from 2011 to 2016. The ten countries are Angola, Botswana, Lesotho, Malawi, Mozambique, Namibia, South Africa, Swaziland, Zimbabwe and Zambia. The parameters of these four models are estimated by the Newton-Raphson procedure and Fisher scoring algorithm. The coefficient of determination $R^2$, the corrected Akaike information criterion (AICc), the deviance (D) and the mean-square error (MSE) are used to compare the performances of the four models [18].

The HIV/AIDS Epidemic Models
The data on HIV/AIDS incidence, GDP per capita and population density were derived from http://data.cnki.net/InternationalData/Report. Readers should note that authorization is required to access the database on this website. Figure 1 shows the HIV/AIDS incidence in the ten countries from 2011 to 2016. The incidence varies significantly across regions: Angola has the lowest incidence, below 5%, while Botswana and Swaziland have incidences above 20% every year. Therefore, it may be necessary to consider temporal and spatial factors in analyzing the HIV/AIDS epidemic. The distributions of HIV/AIDS cases, GDP per capita and population density are displayed in Figure 2. The Pearson correlation coefficients of the cases with GDP per capita and with population density are 0.2739 and −0.1179, respectively. Meanwhile, the two socioeconomic factors have different effects on the HIV/AIDS cases at different spatiotemporal locations. These observations reflect a spatiotemporal non-stationarity between the cases and the two factors in the ten countries from 2011 to 2016. Table 1 lists the p-values of the first-order autocorrelation of HIV/AIDS cases across the years of the same region and across the regions of the same year. Each region has a significant spatial autocorrelation (p-value < 0.01) in each year. Lesotho and South Africa showed temporal autocorrelation from 2011 to 2016. Thus, the spatial and temporal autocorrelation should not be ignored.
Next, we standardized the two socioeconomic factors. The multicollinearity test [32] was performed using the condition number $k = \sqrt{\lambda_{\max}/\lambda_{\min}} = 1.804$ ($\le 15$), where $\lambda$ denotes an eigenvalue of the explanatory variable matrix. If $k > 15$, the data are collinear; otherwise, they are not. Thus, there is no collinearity between the two factors.
Let $\mu_{ik}$, $r_{ik}$ and $P_{ik}$ be the annual HIV/AIDS cases (unit: 1000 people), incidence (unit: 1/100) and total population (unit: 100,000 people) in the $k$th year of the $i$th region, respectively. Denote $g(\mu_{ik}) = \eta_{ik} = \ln \mu_{ik} = \ln r_{ik} + \ln P_{ik}$ ($\mu_{ik} = r_{ik} P_{ik}$, $i = 1, 2, \ldots, 10$ and $k = 1, 2, \ldots, 6$). Let $X_{i1k}$ and $X_{i2k}$ be the GDP per capita and population density in the $i$th region in the $k$th year, respectively. The PR model is written as
$$\ln \mu_{ik} = \beta_0 + \beta_1 X_{i1k} + \beta_2 X_{i2k},$$
where $\beta_j$ ($j = 0, 1, 2$) are unknown constants. The GWPR model is introduced as
$$\ln \mu_{ik} = \beta_0(u_{ik}, v_{ik}) + \beta_1(u_{ik}, v_{ik}) X_{i1k} + \beta_2(u_{ik}, v_{ik}) X_{i2k},$$
where $k$ is a fixed constant taken from $\{1, 2, \ldots, 6\}$, and $\beta_j(u_{ik}, v_{ik})$ are unknown spatial parameters for the $i$th country $(u_{ik}, v_{ik})$ in the $k$th year. Let $\rho$ be a scalar autoregressive parameter, and $c_{il}$ be a constant that represents an adjacency relation. The GWPAR model is
$$\ln \mu_{ik} = \rho \sum_{l=1}^{n} c_{il} \ln \mu_{lk} + \beta_0(u_{ik}, v_{ik}) + \beta_1(u_{ik}, v_{ik}) X_{i1k} + \beta_2(u_{ik}, v_{ik}) X_{i2k},$$
where $n = 10$, $k$ is a fixed constant, and $\beta_j(u_{ik}, v_{ik})$ are defined as above. Let $c^{(ik)}_{lm}$ be a spatiotemporal adjacency relation, and $\beta_{jk}(u_{ik}, v_{ik}, t_k)$ ($k = 1, 2, \ldots, 6$) be unknown spatiotemporal parameters for the $i$th country $(u_{ik}, v_{ik})$ in the $k$th year. The GTWPAR model is established as follows:
$$\ln \mu_{ik} = \rho \sum_{m=1}^{T} \sum_{l=1}^{n_m} c^{(ik)}_{lm} \ln \mu_{lm} + \beta_{0k}(u_{ik}, v_{ik}, t_k) + \beta_{1k}(u_{ik}, v_{ik}, t_k) X_{i1k} + \beta_{2k}(u_{ik}, v_{ik}, t_k) X_{i2k},$$
where $T = 6$; $n_k = 10$ for every year $k$; and $\rho$ is defined as above. Algorithms I, II, III and IV for the PR, GWPR, GWPAR and GTWPAR models, respectively, are provided in Appendix A.4.
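The collinearity check based on $k = \sqrt{\lambda_{\max}/\lambda_{\min}}$ can be sketched as follows. The sketch assumes that $\lambda$ refers to the eigenvalues of $X^{T}X$ for the standardized explanatory variables, which is one common reading of the condition number; the data below are simulated and are not the paper's data.

```python
import numpy as np

def condition_number(X):
    """sqrt(lambda_max / lambda_min) of Xs^T Xs, with Xs the standardized columns of X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals = np.linalg.eigvalsh(Xs.T @ Xs)      # symmetric matrix, real eigenvalues
    return np.sqrt(eigvals.max() / eigvals.min())

# Hypothetical example: 60 observations of two moderately correlated factors.
rng = np.random.default_rng(2)
x1 = rng.normal(size=60)
x2 = 0.5 * x1 + rng.normal(size=60)
X_demo = np.column_stack([x1, x2])
k_demo = condition_number(X_demo)
print(k_demo, "collinear" if k_demo > 15 else "no collinearity")
```

Under this reading, the value equals the ratio of the largest to the smallest singular value of the standardized matrix, so perfectly uncorrelated factors give $k = 1$.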

Statistical Analysis
For the PR model, we obtain the estimated values of the unknown parameters by Algorithm I. Then, the best spatial bandwidth is chosen by the cross-validation method. The estimation results are given in Table 2. We note that the GWPR, GWPAR and GTWPAR models can reflect the non-stationarity of the influencing factors, whereas the PR model cannot. Moreover, a comparison of the true and fitted values shows that the GTWPAR model performs better than the other models. The average estimated coefficients are visualized in Figure 3. For the PR model, the GDP per capita and population density had the same effect on the HIV/AIDS epidemic for all ten countries over the six years. However, there exist significant spatial non-stationarity and auto-correlation for different countries under the GWPR, GWPAR and GTWPAR models. Figure 4 shows the spatial distribution of the average MSE of the response variables. The lighter the color, the smaller the average error. Thus, the GWPAR and GTWPAR models have the better fitting results. The four indicators ($R^2$, AICc, D and MSE) can effectively compare the performances of the proposed models (Table 3). The calculation formulas of $R^2$, AICc, D and MSE are given in Appendix A.5. The coefficient of determination $R^2$ gradually increases from 12.91% for the PR model to 99.57% for the GTWPAR model. The MSE, AICc and D values of the GTWPAR model are smaller than those of the other models. Therefore, the GTWPAR model is more suitable for investigating the spatiotemporal HIV/AIDS epidemic. Based on the GTWPAR model, the mean values and 95% confidence intervals of the coefficient estimations are shown in Figure 5.

Conclusions
In this paper, we propose four regression models, namely the PR, GWPR, GWPAR and GTWPAR models, to investigate non-stationarity and auto-correlation. The relationships between the HIV/AIDS epidemic, GDP per capita and population density were analyzed in ten countries of Sub-Saharan Africa from 2011 to 2016. The unknown parameters of these models can be estimated by the Newton-Raphson procedure and Fisher scoring algorithm.
The PR model is a classical generalized linear model, which considers the global relationships between the response and explanatory variables. The GWPR and GWPAR models have been introduced to capture spatial non-stationarity or auto-correlation. The GTWPAR model proposed in this article can be used to investigate not only spatiotemporal non-stationarity but also auto-correlation. Thus, the PR, GWPR and GWPAR models are special cases of the GTWPAR model (see Remark 1 and Remark 2). The performances of these models were evaluated by analyzing the correlations between the HIV/AIDS epidemic and the two socioeconomic factors. The parameter estimations of the models can be obtained by Algorithms I, II, III and IV in Appendix A.4.
The results show that the impacts of GDP per capita and population density on HIV/AIDS cases had significant spatiotemporal non-stationarity and auto-correlation. The GWPR, GWPAR and GTWPAR models can reflect the strong spatial or spatiotemporal non-stationarity. The auto-correlation can be reflected in the GWPAR and GTWPAR models. Compared with the other models, the GTWPAR model is more effective in terms of the four comparison indicators. Thus, we suggest that the GTWPAR model can be used to analyze the spatiotemporal characteristics of the HIV/AIDS epidemic and the influences of the GDP per capita and population density.
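Our study leaves room for further work. For example, we observed that the effects of the GDP per capita for Lesotho, Malawi and Zimbabwe and of the population density for Angola on HIV/AIDS had strong spatiotemporal non-stationarity. These may be the result of local environmental or political factors. Whether the fit for these regions improves when explanatory variables such as local environmental or political factors are added needs further investigation.

Appendix A.1. In the model $\eta = \rho C\eta + BX$, the expressions of $C$, $X$ and $B$ are
$$C = \begin{pmatrix} c^{(11)}_{11} & \cdots & c^{(11)}_{1T} & \cdots & c^{(11)}_{n_T T} \\ \vdots & & \vdots & & \vdots \\ c^{(n_1 1)}_{11} & \cdots & c^{(n_1 1)}_{1T} & \cdots & c^{(n_1 1)}_{n_T T} \\ \vdots & & \vdots & & \vdots \\ c^{(n_T T)}_{11} & \cdots & c^{(n_T T)}_{1T} & \cdots & c^{(n_T T)}_{n_T T} \end{pmatrix}, \quad X = \begin{pmatrix} X_{111} & X_{121} & \cdots & X_{1p1} \\ \vdots & \vdots & & \vdots \\ X_{n_1 11} & X_{n_1 21} & \cdots & X_{n_1 p1} \\ \vdots & \vdots & & \vdots \\ X_{n_T 1T} & X_{n_T 2T} & \cdots & X_{n_T pT} \end{pmatrix},$$
and $B$ has the same layout as $X$, with the entry $\beta_{jk}(u_{ik}, v_{ik}, t_k)$ in place of $X_{ijk}$.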

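The Fisher information matrix is
$$I_{rb} = \sum_{l=1}^{N} \frac{(A_{l\cdot} X_{\cdot r})(A_{l\cdot} X_{\cdot b})}{a_l \phi V(\mu_l)(g'(\mu_l))^2} W_l(u_{00}, v_{00}, t_0), \quad r, b = 1, 2, \cdots, p.$$

Appendix A.3. Formula and Information Matrix of ρ
(1) [...] That is to say,
$$\frac{\partial \eta_l}{\partial \rho} = \sum_{h=1}^{N} A_{l\cdot} C_{\cdot h} \eta_h, \quad l = 1, 2, \cdots, N.$$
(2) The score of $\rho$ is $S(\rho) = \frac{1}{\phi}(AC\eta)^{T}(Z - \eta)$. The information matrix is [...]. Thus, the Fisher information matrix is $I(\rho) = \frac{1}{\phi}(AC\eta)^{T}(AC\eta)$.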

Appendix A.4. Algorithms of Coefficient Estimation
We provide four algorithms to estimate the unknown coefficients of PR, GWPR, GWPAR and GTWPAR models in Section 4.
Algorithm II: Estimate the unknown coefficients in the GWPR model. (Note that $k$ is a fixed constant taken from $\{1, 2, \ldots, 6\}$, and the following steps should be repeated six times independently.) Take the initial values $\eta$ [...] where $W(u_{0k}, v_{0k}) = \mathrm{diag}(w_1(u_{0k}, v_{0k}), w_2(u_{0k}, v_{0k}), \cdots, w_{10}(u_{0k}, v_{0k}))$ and [...]. Repeat the above step until convergence. Once $(u_{0k}, v_{0k})$ has taken all the locations $(u_{ik}, v_{ik})$, we obtain the estimated value $\hat{\beta} = \hat{\beta}^{(m)}$ for the fixed $k$th year.
Algorithm III: Estimate the unknown coefficients in the GWPAR model. (Note that $k$ is a fixed constant, and the following steps should be repeated six times as in Algorithm II.) Take the initial value [...]. Once $(u_{0k}, v_{0k})$ has taken all the locations $(u_{ik}, v_{ik})$, the estimate $\hat{\beta}^{(m+1)}$ can be given. When all estimated values have converged, we obtain $\hat{\beta} = \hat{\beta}^{(m)}(u_{ik}, v_{ik})$ and $\hat{\rho} = \hat{\rho}^{(m)}$ for the fixed $k$th year.
Algorithm IV: Estimate the unknown coefficients in the GTWPAR model. Take the initial values $\beta$ [...]. The $(m+1)$th iterative estimations $\hat{\beta}(u_{00}, v_{00}, t_0)$ and $\hat{\rho}$ are [...] where $W = \{w_{ik}(u_{00}, v_{00}, t_0)\}$ and $w_{ik}(u_{00}, v_{00}, t_0)$ [...]. A detailed definition is given in Section 3.1. Once $(u_{00}, v_{00}, t_0)$ has taken all the locations $(u_{ik}, v_{ik}, t_k)$ and all estimations have converged, we obtain $\hat{\beta} = \hat{\beta}^{(m)}(u_{ik}, v_{ik}, t_k)$ and $\hat{\rho} = \hat{\rho}^{(m)}$. It is worth noting that we use the parameter estimates of the previous model as the initial values of the next model to reduce the number of iterations and improve the operational efficiency. For example, the estimations of the GWPR model are selected as the initial values of the GWPAR model.

Appendix A.5. Calculation Formulas of $R^2$, AICc, D and MSE (Table 3)
(1) The coefficient of determination is defined by
$$R^2 = 1 - \frac{\sum_{k=1}^{T} \sum_{i=1}^{n_k} (\eta_{ik} - \hat{\eta}_{ik})^2}{\sum_{k=1}^{T} \sum_{i=1}^{n_k} (\eta_{ik} - \bar{\eta})^2},$$
where $\eta$ is the set of values $\{\eta_{ik}\}$, and $\hat{\eta}$ and $\bar{\eta}$ are the estimate and the mean value of $\eta$, respectively.
(3) The corrected Akaike information criterion is
$$\mathrm{AICc} = D + 2P + \frac{2P(P + 1)}{N - P - 1},$$
where $D$, $P$ and $N$ are the deviance, the number of parameters and the number of samples, respectively.
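(4) The mean-square error is given by
$$\mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{T} \sum_{i=1}^{n_k} (\eta_{ik} - \hat{\eta}_{ik})^2,$$
where the parameter settings are the same as above.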
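The comparison indicators can be computed along the following lines. This is a sketch under stated assumptions: $R^2$ and MSE are taken in their usual forms on the linear predictor $\eta$, AICc uses the common small-sample correction $2P(P+1)/(N-P-1)$, and the deviance $D$ is passed in as a number because its formula is not reproduced in this section. The function names and the toy values are hypothetical.

```python
import numpy as np

def r_squared(eta, eta_hat):
    """1 - residual sum of squares / total sum of squares."""
    eta, eta_hat = np.asarray(eta, float), np.asarray(eta_hat, float)
    ss_res = np.sum((eta - eta_hat) ** 2)
    ss_tot = np.sum((eta - eta.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mse(eta, eta_hat):
    """Mean of squared differences between observed and fitted values."""
    eta, eta_hat = np.asarray(eta, float), np.asarray(eta_hat, float)
    return np.mean((eta - eta_hat) ** 2)

def aicc(D, P, N):
    """Deviance plus parameter penalty with small-sample correction."""
    return D + 2 * P + 2 * P * (P + 1) / (N - P - 1)

# Toy values, not taken from the paper.
eta = np.array([1.0, 2.0, 3.0, 4.0])
eta_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(eta, eta_hat), mse(eta, eta_hat), aicc(10.0, 3, 60))
```

A smaller AICc, D or MSE and a larger $R^2$ indicate a better fit, which is how the four models are ranked in the main results.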