Space-Time Clustering Characteristics of Tuberculosis in Khyber Pakhtunkhwa Province, Pakistan, 2015–2019

The number of tuberculosis (TB) cases in Pakistan ranks fifth in the world. The National TB Control Program (NTP) has recently reported more than 462,920 TB patients in Khyber Pakhtunkhwa province, Pakistan, from 2002 to 2017. This study aims to identify spatial and space-time clusters of TB cases in Khyber Pakhtunkhwa province, Pakistan, during 2015–2019 to support the design of effective interventions. The spatial and space-time cluster analyses were conducted at the district level, based on the reported TB cases from January 2015 to April 2019, using space-time scan statistics (SaTScan). The most likely spatial and space-time clusters were detected in the northern rural part of the province. Additionally, two districts in the west were detected as secondary space-time clusters. The most likely space-time cluster shows a tendency to spread toward the neighboring districts in the central part, and the most likely spatial cluster shows a tendency to spread toward the neighboring districts in the south. Most of the space-time clusters were detected at the start of the study period (2015–2016). The potential TB clusters in the remote rural part might be associated with the dry, cool climate and the lack of access to healthcare centers in remote areas.


Introduction
Tuberculosis (TB) is an aerosol-transmissible disease caused by the bacillus Mycobacterium tuberculosis. It is mainly spread from a TB-infected person to nearby people through coughing, sneezing, spitting, or talking. TB is still one of the major public health challenges at the global level, particularly in developing countries, and is listed among the top ten causes of mortality worldwide. It is the second most fatal disease caused by a single infectious agent, after HIV/AIDS [1,2]. Developing countries such as Nigeria, the Philippines, Indonesia, Pakistan, India, South Africa, and China together contribute more than 60% of the total TB burden in the world, and in these countries the diagnosis and treatment of TB are difficult to access [3,4].
Pakistan is one of those developing countries where TB is a major public health challenge. Approximately half a million new TB cases are reported each year, and approximately 70,000 people die each year due to TB disease [5]. In terms of TB case burden, Pakistan ranks 5th in the world due to the high occurrence of multidrug-resistant TB (MDR-TB) cases [3,6,7]. Khyber Pakhtunkhwa is the northwestern province of Pakistan, where approximately 55,000 new cases of TB (all types) are reported each year [5]. During 2002-2017, more than 462,920 TB cases were reported in Khyber Pakhtunkhwa province [8].
Several epidemiological studies on TB have been previously conducted in different districts of Khyber Pakhtunkhwa [9][10][11][12][13][14][15][16][17]. These studies focused on the epidemiological characteristics and risk factors analysis of TB within a specific district. However, the space-time cluster analysis of TB incidence at the district-level in the province seems to be lacking. Space-time cluster detection can identify important space-time patterns in TB incidence in Khyber Pakhtunkhwa province, which could be useful for the evidence-based public health interventions to control the TB outbreak [18]. This study aims to identify the space-time clusters of TB cases at the district-level in Khyber Pakhtunkhwa province, Pakistan from January 2015 to April 2019.
A number of methods have been developed for the automatic detection of disease clusters in public health data. Generally, cluster detection methods are classified as either global or local tests. Global clustering methods assess whether a tendency for the disease to group together is apparent throughout the study region, but they do not identify the location of clusters [19][20][21][22]. These methods are appropriate, for example, for finding evidence of whether a disease is infectious or not. Local methods infer the geographical location, extent, and time period of clusters [23][24][25]. Local methods can further be divided into two categories [26]: clustering-based methods [27][28][29] and scan statistics methods [23][24][25]. Scan statistics methods are the most widely used methods in public health and epidemiology [30][31][32][33][34][35][36][37].
We used the scan statistics method [23,24,38] to detect the potential space-time clusters of TB cases in Khyber Pakhtunkhwa province, Pakistan. Detecting such clusters helps to identify important patterns in TB incidence and can thereby help public health managers choose their targets for interventions. Moreover, detecting space-time clusters of TB cases assists epidemiologists in finding the possible environmental or social determinants of the TB outbreak in the study area.

Study Area
The study was carried out in Khyber Pakhtunkhwa, the northwestern province of Pakistan. The total area of Khyber Pakhtunkhwa province is 74,521 km² with a population of 30.52 million according to the 2017 census. The province comprises 25 districts of different sizes, as shown in Figure 1. However, in 2019, seven Federally Administered Tribal Areas (FATA) were merged politically and administratively into the province of Khyber Pakhtunkhwa. The climate in this province is of the tropical monsoon type, but most of the districts are situated beyond the tropical zone, with relatively high temperatures and a cool, dry winter that runs from December to February. March to June is the hot and dry season. Summer extends from July to September and is generally rainy. October and November represent the receding monsoon period. The province has two rainy seasons: March to April and the summer monsoon from July to September.

Data Collection
The quarterly data on TB cases and population-at-risk in each district were collected from the quarterly reports (2015-2019) of the District Health Information System (DHIS) Khyber Pakhtunkhwa, Pakistan [39]. The District Health Information System (DHIS) collects data on the reported disease cases in each district on a monthly basis. All hospitals in a district report the registered disease cases to the respective DHIS office on a monthly basis. The district offices then send the monthly data to the provincial office, DHIS Cell, Khyber Pakhtunkhwa [40]. The collected dataset comprises the number of quarterly reported TB cases in each district and the corresponding population-at-risk. The collected dataset is available in supplementary information File S1.

Clusters Detection
The scan statistics software SaTScan™ ver. 9.6 [41] with a discrete Poisson probability model was applied under retrospective analysis to detect the spatial and space-time clusters of TB cases at the district level in Khyber Pakhtunkhwa province, Pakistan, from January 2015 to April 2019. SaTScan with the discrete Poisson model assumes that the number of disease cases in each region is Poisson-distributed with the parameter λ as in Equation (1) [42].

$$\lambda = E[n_z] = n_G \, \frac{\mu(z)}{\mu(G)} \qquad (1)$$

where µ(z) denotes the population count for area z, µ(G) denotes the total population of the whole study area, and n_G is the total observed disease count in the whole study area. Under the null hypothesis that no cluster exists, the expected number of disease cases in each region is proportional to the population size in that region. SaTScan is an open source software program that was originally developed in [43].

The space-time scan statistic is defined by a cylindrical window with a circular base. The base of the cylinder corresponds to the geographical size of the cluster, and the height corresponds to the temporal length of the cluster. The circular base is centered on one of the several location centroids in the study area, with the radius varying continuously from zero to the maximum spatial window size. The height of the cylinder varies from zero to the maximum temporal window size. The cylindrical window is then moved in space as well as time to check clusters of all possible spatial and temporal sizes. As a result, a number of overlapping cylinders of various sizes are obtained that jointly cover the entire study region. Each cylindrical window reflects a possible space-time cluster.

The likelihood function is maximized over all cylindrical windows, and the window with the maximum log-likelihood ratio (LLR) is taken to be the most likely cluster, that is, the cluster least likely to have occurred by chance. Other windows with a statistically significant LLR are considered secondary clusters. The statistical significance of each cluster is assessed by comparing its LLR against a null distribution obtained from Monte Carlo simulation. The details of how SaTScan works are given in the SaTScan user guide [42].
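The expected count in Equation (1) and the log-likelihood ratio of a candidate window can be illustrated with a small Python sketch. It is a simplified illustration, not the SaTScan implementation: the district names, populations, and case counts are hypothetical, each candidate window is a single district, and the space-time variant assumes a constant population over the study period. The LLR expression is the standard form for the Poisson scan statistic, applied only to windows with more cases than expected.

```python
import math


def expected_cases(pop_zone, pop_total, cases_total):
    """Equation (1): expected cases are proportional to the population share."""
    return cases_total * (pop_zone / pop_total)


def expected_cases_cylinder(pop_zone, pop_total, cases_total,
                            window_quarters, total_quarters):
    """Space-time variant, ASSUMING the population is constant over time."""
    share_of_time = window_quarters / total_quarters
    return expected_cases(pop_zone, pop_total, cases_total) * share_of_time


def poisson_llr(observed, expected, cases_total):
    """Log-likelihood ratio of one window under the Poisson model.

    Returns 0.0 when the window has no excess of cases, because only
    high-rate clusters are of interest here.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = cases_total - observed
    outside = rest * math.log(rest / (cases_total - expected)) if rest > 0 else 0.0
    return inside + outside


# Hypothetical data for three districts (not taken from the study dataset).
populations = {"A": 2_000_000, "B": 1_000_000, "C": 1_000_000}
cases = {"A": 500, "B": 400, "C": 100}

pop_total = sum(populations.values())
cases_total = sum(cases.values())

llr_by_district = {
    name: poisson_llr(
        cases[name],
        expected_cases(populations[name], pop_total, cases_total),
        cases_total,
    )
    for name in populations
}

# The window with the largest LLR is the most likely cluster.
most_likely = max(llr_by_district, key=llr_by_district.get)
print(most_likely, round(llr_by_district[most_likely], 3))
```

In a full scan, the same ratio would be evaluated for every cylinder (every set of neighboring districts and every admissible time interval), with the observed and expected counts summed over the districts and quarters inside the cylinder.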
In our study, the maximum spatial window size was set to the default value, i.e., ≤ 50% of the total population of the study area, and the maximum temporal window size was set to the default value, i.e., ≤ 50% of the study period [42]. A circular window shape was chosen. A Monte Carlo approach with 9999 replications was used to test the null hypothesis that the relative risk (RR) of TB was the same inside and outside the scanning window. Clusters with p-value < 0.001 were considered statistically significant.
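The Monte Carlo significance test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SaTScan procedure itself: the data are hypothetical, candidate windows are restricted to single districts, and 999 replications are used instead of 9999 to keep the example fast. Cases are redistributed among districts in proportion to population (the null hypothesis), and the p-value is the rank of the observed maximum LLR among the simulated maxima.

```python
import math
import random


def poisson_llr(observed, expected, cases_total):
    """Poisson log-likelihood ratio for one window (0.0 if no excess)."""
    if observed <= expected:
        return 0.0
    rest = cases_total - observed
    outside = rest * math.log(rest / (cases_total - expected)) if rest > 0 else 0.0
    return observed * math.log(observed / expected) + outside


def max_llr(cases, expected, cases_total):
    """Scan statistic: the largest LLR over all candidate windows."""
    return max(poisson_llr(c, e, cases_total) for c, e in zip(cases, expected))


def monte_carlo_p_value(cases, populations, n_sim=999, seed=0):
    """Rank-based Monte Carlo p-value of the most likely cluster."""
    rng = random.Random(seed)
    cases_total = sum(cases)
    pop_total = sum(populations)
    expected = [cases_total * p / pop_total for p in populations]
    observed_stat = max_llr(cases, expected, cases_total)

    districts = list(range(len(populations)))
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Null hypothesis: each case falls in a district with probability
        # proportional to that district's population.
        simulated = [0] * len(populations)
        for d in rng.choices(districts, weights=populations, k=cases_total):
            simulated[d] += 1
        if max_llr(simulated, expected, cases_total) >= observed_stat:
            at_least_as_extreme += 1

    return (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical populations and case counts for three districts.
populations = [2_000_000, 1_000_000, 1_000_000]
p_clustered = monte_carlo_p_value([500, 400, 100], populations)
p_uniform = monte_carlo_p_value([500, 250, 250], populations)
print(p_clustered, p_uniform)
```

With this ranking scheme the smallest attainable p-value is 1/(n_sim + 1), so 9999 replications allow p-values as low as 0.0001, which is what makes a significance threshold of 0.001 usable.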

Results and Discussion
The space-time cluster analysis by SaTScan identified a total of six space-time clusters of TB cases in Khyber Pakhtunkhwa, Pakistan, from the 1st quarter of 2015 to the 1st quarter of 2019 (Table 1, Figure 2). Clusters with p-value < 0.001 were considered statistically significant. The interval 2015-2016 was found to have a large number of clusters in the space-time analysis (Table 1). The geographical locations of the detected space-time clusters are shown in Figure 2, which indicates some important patterns of TB incidence in the study area. Figure 2 shows that two potential TB clusters occurred in the 1st quarter of 2015. One, covering two districts in the north (Swat and Buner), persisted for one year (2015); it shows a tendency to spread toward the neighboring districts in the central part of the province from 2015 to 2016 and then moved north again (Upper Dir) in 2017. The other, which appeared in one district in the west, persisted for two years (2015-2016) and showed no tendency to spread toward the neighboring districts. Such space-time patterns provide clues to disease etiology and risk behaviors, suggesting local environmental or social characteristics that promote increased risk. Moreover, the retrospective clusters can inform public health officials about the effectiveness of the control strategies previously implemented in the cluster regions to control prevalence.
To examine the general pattern of spatial clusters of TB cases over time, we performed a spatial cluster analysis for each year (2015-2018) using SaTScan. Five significant clusters were detected in 2015, seven in 2016, five in 2017, and four in 2018 (Table 2). Clusters with p-value < 0.001 were considered statistically significant. The largest number of spatial clusters was seen in 2016. The interval 2015-2016 was also found to have a large number of clusters in the space-time analysis (Table 1). The geographical locations of the significant spatial clusters in each year are displayed in Figure 3. In 2015, the most likely cluster was seen in district Swat. In 2016, the same district together with one neighboring district (Buner) appeared as the most likely cluster. In 2017, the most likely spatial cluster moved from Swat and Buner to the neighboring district Upper Dir in the west. In 2018, the most likely cluster moved back to district Swat, covering the additional neighboring districts Shangla and Battagram in the south. The two districts Swat and Buner were found repeatedly in a potential cluster each year. District Swat was in the most likely cluster in 2015, 2016, and 2018, which marks it as the main region of the TB outbreak. Moreover, the two western districts Bannu and Hangu were found to be spatial clusters throughout 2016-2018. These results suggest that policymakers should treat the three districts Swat, Shangla, and Battagram as the primary targets for possible interventions, because this part of the province was seen in the most likely cluster in each year (2016-2018). The two districts Bannu and Hangu are suggested as the secondary targets for possible interventions. These two secondary clusters show a tendency to spread toward the central districts in 2018.
The TB case burden in the two western districts (Bannu and Hangu) might be due to the presence of a large proportion of Afghan refugees in these districts, as evidenced by a previous study [45]. Some previous studies on TB prevalence in individual districts have also identified evidence of a TB outbreak in Swat and Buner, the districts that form our most likely cluster [11,12]. It is evident from Figures 2 and 3 that the most likely TB clusters occurred each year in rural hilly areas [46], which might be due to a lack of access to healthcare. The population in these areas is scattered across small villages that are far from each other, and hence most of the villages are far from healthcare centers. In addition, the lack of awareness and misconceptions about TB might have contributed to TB clusters in these rural areas [47]. The snowfall and dry, cold climate in winter may also contribute to the high number of TB cases in these districts.

Conclusions
This study characterizes the space-time clustering of TB in Khyber Pakhtunkhwa, Pakistan, from 2015 to 2019. The most likely and the first secondary space-time clusters were seen in the northern part of the province (Dir Upper, Swat, and Buner), showing a tendency to spread toward the central part from 2015 to 2016 and then a move toward the north in 2017 (Figure 2). Most of these space-time clusters were seen in 2015-2016, which might be associated with the lack of healthcare centers in this period. In addition, the most likely spatial clusters were also seen each year in the northern districts (Dir Upper, Swat, and Shangla), showing a tendency to spread over the years from the north toward the neighboring districts in the south (Figure 3). This study suggests that policymakers prioritize these northern districts for possible interventions. Such targeted intervention may help control the TB epidemic in the province more effectively. Future research on the social and environmental determinants of the TB outbreak in Khyber Pakhtunkhwa province is strongly recommended.
Author Contributions: H.D. and S.U. conceived the general concept of this research; S.U. designed the models and wrote the paper; S.C.D. and H.F.-T. contributed in the statistical assessment; H.K. and A.K. reviewed the paper. All authors have read and agreed to the published version of the manuscript.