# Tuberculosis in Prisons: Importance of Considering the Clustering in the Analysis of Cross-Sectional Studies

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study 1: Active TB

#### 2.1.1. Variables

The outcome (Y) was TB disease (positive or negative).

#### 2.1.2. Sampling

#### 2.2. Study 2: TBI

#### 2.2.1. Variables

#### 2.2.2. Sampling

#### 2.3. Multivariable Regression Models

to adjust by cluster structure.

#### 2.3.1. Regression Models for Non-Clustered Dichotomous Outcomes

In these models, Y_{i} = 1 indicates a PDL with TB disease and Y_{i} = 0 indicates one without, and π_{i} is the probability that subject i has TB disease [P(Y_{i} = 1)]. Logistic regression estimates the OR because of the logit transformation, which acts as the link function between the independent variables and the outcome Y_{i}. In log-binomial and Poisson regressions, the link function changes to the log, which is why these models estimate the PR directly instead of the OR (Equations (2) (log-binomial) and (3) (Poisson)).
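In standard generalized linear model notation, the three specifications can be sketched as follows. The covariates $X_{1i},\dots,X_{ki}$, the coefficients $\beta_0,\dots,\beta_k$, and the symbol $\mu_i$ for the expected value of $Y_i$ are generic placeholders assumed for illustration, not the study's own variables.

```latex
\begin{align*}
\eta_i &= \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} \\[4pt]
\text{Logistic (logit link):} \quad
  \ln\!\left(\frac{\pi_i}{1-\pi_i}\right) &= \eta_i,
  \qquad e^{\beta_j} = \mathrm{OR}_j \\[4pt]
\text{Log-binomial (log link):} \quad
  \ln(\pi_i) &= \eta_i,
  \qquad e^{\beta_j} = \mathrm{PR}_j \\[4pt]
\text{Poisson (log link):} \quad
  \ln(\mu_i) &= \eta_i,
  \qquad e^{\beta_j} = \mathrm{PR}_j, \quad \mu_i = E[Y_i]
\end{align*}
```

With the logit link, $e^{\beta_j}$ is a ratio of odds; with the log link, it is a ratio of probabilities (or means), which is the PR.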

In the Poisson model, the modeled quantity is the mean of having TB; the mean of a dichotomous variable (TB disease yes/no) equals the probability of having TB, π_{i}. The Poisson model is indicated for outcomes that one can “count” (for example, the number of transplants rejected or the number of revascularization procedures in patients with heart disease). It has been suggested as an alternative to binomial regression because the latter has associated convergence problems. Poisson regression yields consistent estimates of the coefficients in Equation (3), but its model-based variances are not consistent for a dichotomous outcome: the Poisson variance is greater than the binomial variance unless the outcome is rare (prevalence <10%). To avoid overestimating the standard errors of the estimated parameters, the robust variance estimator (sandwich variance estimator) is used in Poisson regression [36,53].
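For a single binary exposure, both variance estimators have closed forms, which makes the overestimation easy to see. The sketch below is a minimal illustration under that simplifying assumption (a two-group Poisson model with a log link and no other covariates). The counts are hypothetical and are not study data, and the function names are invented for this example.

```python
import math


def log_pr_standard_errors(cases_exp, n_exp, cases_unexp, n_unexp):
    """Return (model_based_se, robust_se) of log(PR) for a two-group model.

    Model-based Poisson variance of log(PR): 1/a + 1/c.
    Sandwich (robust) variance:              (1 - p1)/a + (1 - p0)/c,
    where a and c are the case counts and p1, p0 the group prevalences.
    """
    p1 = cases_exp / n_exp
    p0 = cases_unexp / n_unexp
    model_based = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    robust = math.sqrt((1 - p1) / cases_exp + (1 - p0) / cases_unexp)
    return model_based, robust


def variance_ratio(prevalence):
    """Poisson variance (pi) divided by binomial variance (pi * (1 - pi))."""
    return 1 / (1 - prevalence)


# Hypothetical counts, chosen only for illustration.
model_se, robust_se = log_pr_standard_errors(18, 100, 50, 1000)
print(f"model-based SE: {model_se:.3f}, robust SE: {robust_se:.3f}")

for pi in (0.05, 0.10, 0.50):
    ratio = variance_ratio(pi)
    print(f"prevalence {pi:.2f}: Poisson/binomial variance ratio = {ratio:.2f}")
```

The variance ratio stays close to 1 when the outcome is rare and grows as prevalence rises, which is why the correction matters more for common outcomes.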

#### 2.3.2. Multivariable Regression Models for Dichotomous Data with Clustered Structure

If Y takes the value of 0, it indicates that person 2 of courtyard 1, prison 3, and city 2 does not have TB. When clusters are limited to the courtyard level, the outcome is expressed as Y, and it becomes a count variable. The cluster structure is shown in Figure 1.
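The move from a person-level dichotomous outcome to a courtyard-level count can be sketched as a simple aggregation. The records below are hypothetical (they are not study data), and the tuple layout `(city, prison, courtyard, person, y)` is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical records: (city, prison, courtyard, person, y), y = 1 for TB disease.
records = [
    (1, 1, 1, 1, 0),
    (1, 1, 1, 2, 1),
    (1, 1, 2, 1, 0),
    (2, 3, 1, 1, 1),
    (2, 3, 1, 2, 0),  # person 2 of courtyard 1, prison 3, city 2: no TB
    (2, 3, 1, 3, 1),
]


def courtyard_counts(rows):
    """Collapse person-level 0/1 outcomes into (cases, persons) per courtyard."""
    totals = defaultdict(lambda: [0, 0])
    for city, prison, courtyard, _person, y in rows:
        key = (city, prison, courtyard)
        totals[key][0] += y   # number of people with TB in the courtyard
        totals[key][1] += 1   # number of people in the courtyard
    return {key: tuple(value) for key, value in totals.items()}


counts = courtyard_counts(records)
for key, (cases, persons) in sorted(counts.items()):
    print(f"city/prison/courtyard {key}: {cases} cases among {persons} people")
```

Keeping the full `(city, prison, courtyard)` key preserves the nesting, so courtyard 1 of prison 1 is not confused with courtyard 1 of prison 3.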

#### 2.4. Analysis

version 15.0 and R version 4.2.

## 3. Results

#### 3.1. Effect of Prevalence in General and Sub-Groups Ignoring Clustering

#### 3.2. Effect of Cluster Level on Cluster-Robust Variance Estimates

#### 3.3. Effect of Adjustment by Cluster-Robust Standard Errors and Bias-Corrected Standard Errors

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

**Figure 1.** Multilevel structure used to determine factors associated with TB disease in an analytical cross-sectional study. Legend: In Level 1, a black circle represents a person with TB disease and a white circle represents an individual without TB. In Level 2, the population density within each courtyard is represented by the number of dots; a greater number of dots indicates a higher population density inside the courtyard. People with a black rhombus illustrate the coding system for the outcome at different levels. In courtyard 1 of prison 2, the fact that two or more people have TB disease inside that courtyard can be attributed to shared conditions, such as population density and overcrowding, which facilitate the emergence of the disease (intra-cluster variation); these conditions may differ from the conditions of people who have TB disease in courtyard 3 (inter-cluster variation).

**Figure 2.** Diagram for selecting regression models that directly estimate the prevalence ratio (PR) or odds ratio (OR) from independent or correlated data. Legend: For GEE, Li [31] recommends the correction of Kauermann and Carroll (KC) when the number of clusters is less than ten and the coefficient of variation (CV) is less than 60%. When the CV is >60%, the correction outlined by Fay and Graubard (FG) has been suggested. Leyrat and Li [30,31] recommend the between–within (B-W) degrees of freedom approximation in mixed models for binary or continuous outcomes with a low number of clusters (<30). The relevant Stata commands are in bold. Abbreviations: GEE, generalized estimating equations; KC, correction of Kauermann and Carroll; FG, correction of Fay and Graubard; B-W, between–within approximation of degrees of freedom, according to the number of clusters and coefficient of variation.
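The CV part of the rule in the Figure 2 legend can be sketched in a few lines. This is only an illustration under stated assumptions: the CV is taken to be the coefficient of variation of cluster sizes computed with the sample standard deviation, a CV of exactly 60% is assigned to FG because the legend does not cover that case, and the condition on the number of clusters is not encoded. The cluster sizes and function names are hypothetical.

```python
import statistics


def cluster_size_cv(sizes):
    """Coefficient of variation (%) of cluster sizes, using the sample SD."""
    return 100 * statistics.stdev(sizes) / statistics.mean(sizes)


def suggested_gee_correction(sizes, cv_threshold=60.0):
    """Return 'KC' when the CV is below the threshold, otherwise 'FG'."""
    return "KC" if cluster_size_cv(sizes) < cv_threshold else "FG"


# Hypothetical cluster sizes (people per courtyard), for illustration only.
balanced = [40, 45, 50, 55, 60]
unbalanced = [5, 10, 15, 20, 200]

for label, sizes in (("balanced", balanced), ("unbalanced", unbalanced)):
    cv = cluster_size_cv(sizes)
    print(f"{label}: CV = {cv:.1f}% -> {suggested_gee_correction(sizes)}")
```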

**Table 1.** Comparison of the odds ratio (OR) and prevalence ratio (PR) estimated with three types of regressions when the active TB prevalence is less than 10% and under the assumption of independent data.

Exposure Factor | Active TB Prevalence (%) | Logistic: OR_{adjusted} [95%CI] | Log-Binomial ^{1}: PR_{adjusted} [95%CI] | Robust Poisson ^{2}: PR_{adjusted} [95%CI] | Comparison ^{3} (%) |
---|---|---|---|---|---|
Age ≤ 24 (ref) | 4.61 | | | | |
Age > 24 | 5.84 | 1.43 [0.797, 2.547] | 1.40 [0.809, 2.403] | 1.39 [0.798, 2.405] | 7.5 |
No prior TB (ref) | 5.02 | | | | |
Prior TB | 18.0 | 3.31 [1.461, 7.504] | 2.84 [1.469, 5.494] | 2.84 [1.447, 5.569] | 25.5 |
Normal BMI (ref) | 5.36 | | | | |
Low weight | 16.0 | 3.32 [1.677, 6.576] | 2.93 [1.660, 5.163] | 2.89 [1.638, 5.096] | 20.2 |
Overweight | 2.89 | 0.42 [0.147, 1.169] | 0.43 [0.158, 1.187] | 0.43 [0.156, 1.185] | 1.8 |

and overweight > 25 kg/m^{2}. OR_{adjusted} and PR_{adjusted}: multivariable models adjusted for age, prior TB, and BMI.

^{1}. Generalized linear model, binomial family, log link function. Estimation method: maximum likelihood (ML).

^{2}. Robust standard errors: sandwich variance estimator.

^{3}. Comparison = ((OR − PR_{Log-binomial})/PR_{Log-binomial}) × 100. The measures of association obtained by Poisson regression and log-binomial regression were practically identical. Consequently, we only compared the OR from logistic regression with the PR from log-binomial regression.
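How far the OR drifts from the PR as the outcome becomes more common can be checked directly from the prevalence column of Table 1. The sketch below computes crude (unadjusted) ratios from those prevalences, so the results are not expected to match the adjusted estimates or the comparison column in the table. The function names are invented for this example.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def prevalence_ratio(p_exposed, p_reference):
    return p_exposed / p_reference


def odds_ratio(p_exposed, p_reference):
    return odds(p_exposed) / odds(p_reference)


def relative_excess(or_value, pr_value):
    """Percent by which the OR exceeds the PR: ((OR - PR) / PR) x 100."""
    return (or_value - pr_value) / pr_value * 100


# Prevalences (as proportions) taken from the prevalence column of Table 1.
groups = {
    "Age > 24 vs. Age <= 24": (0.0584, 0.0461),
    "Prior TB vs. no prior TB": (0.180, 0.0502),
}

results = {}
for label, (p_exp, p_ref) in groups.items():
    pr = prevalence_ratio(p_exp, p_ref)
    or_ = odds_ratio(p_exp, p_ref)
    results[label] = (pr, or_, relative_excess(or_, pr))
    print(f"{label}: crude PR = {pr:.2f}, crude OR = {or_:.2f}, "
          f"excess = {results[label][2]:.1f}%")
```

The gap is small when both groups have low prevalence and widens for prior TB, where the prevalence in the exposed group is well above 10%.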

**Table 2.** Comparison of the prevalence ratio of TB disease with four regressions for correlated data when the overall prevalence of the disease is less than 10% and different cluster levels are considered when adjusting the robust variance estimator.

Cells show the adjusted prevalence ratio and [95%CI].

Exposure Factor | GEE ^{1}: Courtyard (n = 39) | GEE ^{1}: Prison (n = 4) | GEE ^{1}: City (n = 2) | Multilevel Mixed-Effects ^{2}: Courtyard (n = 39) | Multilevel Mixed-Effects ^{2}: Prison (n = 4) | Multilevel Mixed-Effects ^{2}: City (n = 2) | Log-Binomial ^{3}: Courtyard (n = 39) | Log-Binomial ^{3}: Prison (n = 4) | Log-Binomial ^{3}: City (n = 2) | Modified Poisson ^{4}: Courtyard (n = 39) | Modified Poisson ^{4}: Prison (n = 4) | Modified Poisson ^{4}: City (n = 2) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Age ≤ 24 (ref) | | | | | | | | | | | | |
Age > 24 | 1.38 [1.020, 1.874] | 1.37 [1.089, 1.729] | 1.37 [1.310, 1.433] | 1.38 [1.008, 1.878] | 1.38 [1.020, 1.856] | 1.38 [1.202, 1.575] | 1.40 [1.015, 1.918] | 1.40 [1.056, 1.843] | 1.40 [1.256, 1.549] | 1.39 [1.011, 1.898] | 1.39 [1.043, 1.841] | 1.39 [1.245, 1.542] |
No prior TB (ref) | | | | | | | | | | | | |
Prior TB | 2.73 [1.531, 4.852] | 2.82 [1.944, 4.075] | 2.81 [1.988, 3.962] | 2.67 [1.542, 4.637] | 2.67 [1.644, 4.349] | 2.67 [1.529, 4.676] | 2.84 [1.571, 5.140] | 2.84 [1.902, 4.246] | 2.84 [1.886, 4.283] | 2.84 [1.645, 4.901] | 2.84 [1.757, 4.587] | 2.84 [1.669, 4.828] |
Normal BMI (ref) | | | | | | | | | | | | |
Low weight | 2.85 [1.842, 4.408] | 2.85 [1.695, 4.781] | 2.84 [2.211, 3.639] | 2.84 [1.786, 4.514] | 2.84 [1.615, 4.994] | 2.84 [2.116, 3.811] | 2.93 [1.885, 4.545] | 2.93 [1.628, 5.264] | 2.93 [2.048, 4.185] | 2.89 [1.814, 4.601] | 2.89 [1.624, 5.138] | 2.89 [2.123, 3.932] |
Overweight | 0.42 [0.119, 1.474] | 0.43 [0.059, 3.116] | 0.44 [0.049, 3.863] | 0.42 [0.119, 1.491] | 0.42 [0.035, 5.027] | 0.42 [0.158, 11.281] | 0.44 [0.125, 1.503] | 0.44 [0.038, 4.896] | 0.44 [0.017, 10.869] | 0.43 [0.123, 1.501] | 0.43 [0.037, 4.877] | 0.43 [0.017, 10.846] |
CV (%) | 181% | 103% | - | | | | | | | | | |
ICC | | | | 0.043 | <0.001 | <0.001 | | | | | | |

and overweight > 25 kg/m^{2}.

^{1}. Poisson family, log link function. Robust standard errors. Exchangeable correlation.

^{2}. Multilevel mixed-effects Poisson regression had the exposure factors as fixed effects and the city, the prison, and the courtyard as random effects. The city model is a four-level model (people deprived of liberty (PDL, level 1) clustered within courtyards (level 2), clustered within prisons (level 3), and clustered within cities). The prison model is a three-level model, and the courtyard is a two-level model (PDL clustered within courtyards). Cluster-robust standard errors are adjusted by the highest level in each multilevel model. The ICC is shown for the highest level in each multilevel model.

^{3}. Generalized linear model, binomial family, log link function. Cluster-robust standard errors.

^{4}. Cluster-robust standard errors.

**Table 3.** Comparison of multilevel models for estimating TB prevalence utilizing the city, prison, and courtyard levels as fixed or random effects.

Exposure Factor | Model 1: City ^{1}, PR [95%CI] | Model 2: Prison ^{1}, PR [95%CI] | Model 3: Courtyard ^{1}, PR [95%CI] | Model 4 ^{2}, PR [95%CI] | Model 5 ^{2}, PR [95%CI] |
---|---|---|---|---|---|
Age ≤ 24 (ref) | | | | | |
Age > 24 | 1.39 [1.25–1.54] | 1.39 [1.04–1.84] | 1.38 [1.01–1.88] | 1.39 [1.02–1.86] | 1.39 [1.02–1.88] |
No prior TB (ref) | | | | | |
Prior TB | 2.84 [1.67–4.83] | 2.84 [1.76–4.59] | 2.67 [1.54–4.64] | 2.70 [1.58–4.60] | 2.68 [1.55–4.62] |
Normal BMI (ref) | | | | | |
Low weight | 2.89 [2.12–3.93] | 2.89 [1.62–5.14] | 2.84 [1.79–4.51] | 2.79 [1.75–4.45] | 2.84 [1.78–4.53] |
Overweight | 0.43 [0.02–10.84] | 0.43 [0.04–4.88] | 0.42 [0.12–1.49] | 0.42 [0.12–1.51] | 0.42 [0.12–1.49] |
Prison D (ref) ^{3} | | | | | |
Prison A | - | - | - | 1.64 [0.90–2.99] | - |
Prison B | - | - | - | 1.63 [0.76–3.47] | - |
Prison C | - | - | - | 2.15 [1.02–4.54] | - |
City A (ref) ^{3} | | | | | |
City B | - | - | - | - | 1.42 [0.79–2.54] |
ICC | <0.001 | <0.001 | 0.043 | 0.031 | 0.034 |

and overweight > 25 kg/m^{2}.

^{1}. Multilevel mixed-effects Poisson regression with a single random effect: model 1 = city, model 2 = prison, and model 3 = courtyard.

^{2}. Multilevel mixed-effects Poisson regressions with the courtyard as a random effect and prison or city as fixed effects. Cluster-robust standard errors adjusted by the courtyard factor.

^{3}. The reference category was the one with the lowest TB incidence.

**Table 4.** Comparison of the prevalence ratio and 95% confidence intervals associated with latent tuberculosis infection without and with adjustment by cluster-robust standard errors and number of clusters.

Exposure Factor | GEE ^{1}: Without Any Adjustment ^{3} [95%CI] | GEE ^{1}: With CVRE [95%CI] | GEE ^{1}: With BCSE [95%CI] |
---|---|---|---|
Age 18–24 (ref) (Years) | | | |
25–64 | 1.14 [0.892, 1.407] | 1.14 [1.087, 1.193] | 1.14 [0.658, 1.970] |
≥65 | 1.10 [0.687, 1.473] | 1.10 [0.996, 1.221] | 1.10 [0.495, 2.457] |
History of previous incarceration | 1.09 [0.904, 1.311] | 1.09 [1.009, 1.172] | 1.09 [0.650, 1.819] |
Time of current incarceration ≤12 (ref) (Months) | | | |
13–24 | 1.11 [0.878, 1.408] | 1.12 [1.007, 1.228] | 1.12 [0.562, 2.201] |
≥25 | 1.08 [0.874, 1.332] | 1.08 [0.951, 1.234] | 1.08 [0.392, 2.995] |
Presence of BCG scar | 1.00 [0.931, 1.070] | 1.00 [0.985, 1.013] | 1.00 [0.832, 1.198] |
Last contact with a TB case: no contact (ref) (Months) | | | |
1–12 | 0.98 [0.776, 1.235] | 0.99 [0.862, 1.134] | 0.99 [0.357, 2.740] |
≥13 | 1.04 [0.797, 1.383] | 1.04 [0.983, 1.107] | 1.04 [0.689, 1.578] |
Coefficient of variation | 191% | | |

Exposure Factor | Multilevel Mixed-Effects ^{2}: Without Any Adjustment [95%CI] | Multilevel Mixed-Effects ^{2}: With CVRE [95%CI] | Multilevel Mixed-Effects ^{2}: With B-W ^{4} |
---|---|---|---|
Age 18–24 (ref) (Years) | | | |
25–64 | 1.09 [0.877, 1.364] | 1.09 [1.023, 1.169] | - |
≥65 | 1.05 [0.724, 1.509] | 1.05 [0.963, 1.134] | - |
History of previous incarceration | 1.09 [0.904, 1.304] | 1.09 [1.010, 1.168] | - |
Time of current incarceration ≤12 (ref) (Months) | | | |
13–24 | 1.09 [0.865, 1.372] | 1.09 [0.994, 1.193] | - |
≥25 | 1.02 [0.836, 1.252] | 1.02 [0.871, 1.203] | - |
Presence of BCG scar | 1.00 [0.936, 1.072] | 1.00 [0.993, 1.011] | - |
Last contact with a TB case: no contact (ref) (Months) | | | |
1–12 | 1.00 [0.804, 1.262] | 1.01 [0.862, 1.176] | - |
≥13 | 1.02 [0.777, 1.347] | 1.02 [0.955, 1.095] | - |
Intraclass Correlation | 0.071 | | |

^{1}. Poisson family, log link function. Exchangeable correlation. Cluster-robust standard errors adjusted by courtyard (n = 10).

^{2}. Mixed-effects Poisson regression model. Exchangeable covariance. Cluster-robust standard errors (adjusted for the 10 courtyard clusters).

^{3}. There is no adjustment of standard errors by cluster structure and number of clusters. The CVRE is applied first, followed by the BCSE or B-W.

^{4}. The between–within degrees of freedom approximation for dichotomous outcomes is not available in Stata or R.
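The effect of each adjustment on precision can be read off the interval widths in Table 4. As a rough sketch, the code below back-calculates the standard error of log(PR) implied by each interval for the GEE row "25–64". It assumes the intervals are symmetric on the log scale and were built with the normal quantile 1.96; if the BCSE intervals were built with a t quantile, the implied value for that column overstates the actual standard error. The function name is invented for this example.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(PR) implied by a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# 95% CIs for age 25–64 in the GEE block of Table 4.
intervals = {
    "unadjusted": (0.892, 1.407),
    "CVRE": (1.087, 1.193),
    "BCSE": (0.658, 1.970),
}

implied_se = {name: se_from_ci(lo, hi) for name, (lo, hi) in intervals.items()}
for name, se in implied_se.items():
    print(f"{name}: implied SE of log(PR) = {se:.3f}")
```

With only 10 clusters, the cluster-robust interval is much narrower than the unadjusted one, while the bias-corrected interval is much wider.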


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

