Point and Interval Estimation of Population Prevalence Using a Fallible Test and a Non-Probabilistic Sample: Post-Stratification Correction
Abstract
1. Introduction
2. Materials and Methods
2.1. Estimation of a Proportion in Non-Probabilistic Samples with Post-Stratification Adjustment
2.2. Scenario 1: Prevalence Estimation with Full Verification Using a Gold Standard
2.2.1. Frequentist Approach
- Post-Stratification Adjustment

Let $\hat{p}_{w}$ denote the adjusted prevalence, where $W_h$ represents the weight of stratum $h$ in the population, and $\hat{p}_h$ is the estimated prevalence within stratum $h$. The adjustment for the sample size per stratum, $\tilde{n}_h$, is given as follows: The estimator must also be adjusted using the weights and prevalences of each stratum and would be expressed as follows: For the weighted prevalence, the formula is as follows: The weighted variance is given as follows: The confidence interval is as follows:
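The steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes the standard Agresti–Coull adjustment ($\tilde{n}_h = n_h + z^2$ and $\tilde{p}_h = (x_h + z^2/2)/\tilde{n}_h$, with $x_h$ positives among $n_h$ sampled in stratum $h$), a weighted variance of the form $\sum_h W_h^2\,\tilde{p}_h(1-\tilde{p}_h)/\tilde{n}_h$, and a Wald-type interval on the weighted estimate. The function name, argument names, and example counts are hypothetical.

```python
import math


def poststratified_agresti_coull(x, n, weights, z=1.96):
    """Post-stratified prevalence with a Wald-type interval.

    x, n    : positives and sample size in each stratum (hypothetical inputs)
    weights : population share W_h of each stratum, summing to 1
    """
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("stratum weights must sum to 1")
    p_w = 0.0    # weighted prevalence
    var_w = 0.0  # weighted variance
    for x_h, n_h, w_h in zip(x, n, weights):
        n_adj = n_h + z ** 2                # assumed Agresti-Coull adjusted sample size
        p_adj = (x_h + z ** 2 / 2) / n_adj  # assumed Agresti-Coull adjusted proportion
        p_w += w_h * p_adj
        var_w += w_h ** 2 * p_adj * (1 - p_adj) / n_adj
    half_width = z * math.sqrt(var_w)
    # Truncate to [0, 1] so the interval stays a valid range for a proportion.
    return p_w, max(0.0, p_w - half_width), min(1.0, p_w + half_width)


# Illustrative counts only; they are not data from the study.
estimate, lower, upper = poststratified_agresti_coull(
    x=[2, 1], n=[100, 200], weights=[0.3, 0.7]
)
print(f"{estimate:.4f} ({lower:.4f}, {upper:.4f})")
```

Setting `z=0` in this sketch removes the adjustment and reduces the estimate to the plain weighted sum of the stratum proportions, which is a convenient sanity check.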
2.2.2. Bayesian Approach
2.3. Scenario 2: Estimation with a Single Diagnostic Test with Known Sensitivity and Specificity
2.3.1. Frequentist Approach
- Post-Stratification Adjustment

Applying the adjustment proposed by Agresti and Coull for each stratum $h$: The estimator for the prevalence adjusted by post-stratification and the known sensitivity and specificity of the test, $\hat{P}_{w}$, is as follows: The variance for each stratum is as follows: The combined total variance is as follows: Finally, the confidence interval adjusted for post-stratification is as follows:
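As a rough sketch of this calculation, the code below applies a Rogan–Gladen-type correction to the Agresti–Coull adjusted test-positive proportion in each stratum and then combines the strata with their population weights. Several things here are assumptions rather than the source's expressions: the per-stratum form $\hat{P}_h = (\tilde{p}_h + Sp - 1)/(Se + Sp - 1)$, the variance $\tilde{p}_h(1-\tilde{p}_h)/[\tilde{n}_h (Se + Sp - 1)^2]$ for known $Se$ and $Sp$, truncation of the final result to $[0, 1]$, and all function and argument names.

```python
import math


def poststratified_rogan_gladen(x, n, weights, se, sp, z=1.96):
    """Post-stratified prevalence corrected for known sensitivity/specificity.

    x, n    : test positives and sample size per stratum (hypothetical inputs)
    weights : population share W_h of each stratum, summing to 1
    se, sp  : known sensitivity and specificity of the test
    """
    youden = se + sp - 1.0
    if youden <= 0:
        raise ValueError("se + sp must exceed 1 for the correction to be usable")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("stratum weights must sum to 1")
    p_w = 0.0
    var_w = 0.0
    for x_h, n_h, w_h in zip(x, n, weights):
        n_adj = n_h + z ** 2                 # assumed Agresti-Coull sample size
        q_adj = (x_h + z ** 2 / 2) / n_adj   # adjusted apparent (test-positive) proportion
        p_h = (q_adj + sp - 1.0) / youden    # assumed Rogan-Gladen-type correction
        var_h = q_adj * (1 - q_adj) / (n_adj * youden ** 2)
        p_w += w_h * p_h
        var_w += w_h ** 2 * var_h
    half_width = z * math.sqrt(var_w)

    def clip(value):
        # Only the combined result is truncated to [0, 1] in this sketch.
        return min(1.0, max(0.0, value))

    return clip(p_w), clip(p_w - half_width), clip(p_w + half_width)


# Illustrative inputs only; they are not data from the study.
print(poststratified_rogan_gladen([10, 4], [100, 200], [0.3, 0.7], se=0.90, sp=0.95))
```

When the apparent proportion falls below $1 - Sp$, the corrected estimate is negative before truncation, so a point estimate and lower limit of exactly zero can occur.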
2.3.2. Bayesian Approach
- Post-Stratification Adjustment

The previous Bayesian estimation can be complemented with a post-stratification adjustment. For $H$ strata, each with a prevalence $P_h$, and each stratum having a proportion $W_h$ of the total population, the total prevalence adjusted for post-stratification can be calculated as follows:

$$P_{ps} = \sum_{h=1}^{H} W_h P_h$$

However, when estimating the prevalence for each stratum $h$ under a Bayesian approach, we do not obtain a single point estimate but rather a posterior distribution for $P_h$. Thus, this post-stratification must be based on posterior distributions.

Suppose we have obtained $S$ samples from the posterior distribution of $P_h$ for $H$ subgroups. We denote these samples for each subgroup $h$ as follows:

$$P_h^{(1)}, P_h^{(2)}, \ldots, P_h^{(S)}$$

For each iteration $s$ in the posterior, we weight the prevalence $P_h^{(s)}$ by the proportion $W_h$ of stratum $h$ in the total population. In this way, the adjusted prevalence in iteration $s$ is as follows:

$$P_{ps}^{(s)} = \sum_{h=1}^{H} W_h P_h^{(s)}$$

This process is repeated for each sample $s = 1, \ldots, S$, resulting in a combined posterior distribution for the total prevalence adjusted by post-stratification:

$$\left\{ P_{ps}^{(1)}, P_{ps}^{(2)}, \ldots, P_{ps}^{(S)} \right\}$$

From the samples $P_{ps}^{(s)}$, it is possible to calculate any credible interval. For example, the 2.5th and 97.5th percentiles of the posterior distribution of $P_{ps}$ can be used to define a 95% credible interval. Similarly, the point estimate is obtained as the median of the posterior distribution of $P_{ps}$.
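The draw-by-draw weighting described above can be sketched as follows. The stratum-level draws are stand-ins: they are generated here from Beta posteriors under a uniform prior with hypothetical counts, purely so the example runs, whereas in practice they would come from whatever sampler produced the posterior of each stratum prevalence. The function names are illustrative.

```python
import random
import statistics


def poststratify_draws(draws_by_stratum, weights):
    """Combine posterior draws: draws_by_stratum[h][s] is draw s for stratum h."""
    n_draws = len(draws_by_stratum[0])
    if any(len(d) != n_draws for d in draws_by_stratum):
        raise ValueError("every stratum needs the same number of draws")
    # For each iteration s, take the weighted sum over strata.
    return [
        sum(w * d[s] for w, d in zip(weights, draws_by_stratum))
        for s in range(n_draws)
    ]


def summarize(draws):
    """Posterior median with the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # steps of 2.5%
    return statistics.median(draws), cuts[0], cuts[-1]


# Stand-in posterior draws (hypothetical counts, uniform Beta(1, 1) prior).
rng = random.Random(0)
counts = [(2, 100), (1, 200)]  # (positives, sample size) per stratum
weights = [0.3, 0.7]           # population shares W_h
draws_by_stratum = [
    [rng.betavariate(1 + x, 1 + n - x) for _ in range(4000)] for x, n in counts
]
combined = poststratify_draws(draws_by_stratum, weights)
print(summarize(combined))
```

Because the weighting is applied within each iteration, the uncertainty in every stratum carries through to the combined draws without any separate variance formula.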
2.4. Scenario 3: Estimation When the Status of the Disease Is Verified Only Among Test Positives
2.4.1. Frequentist Approach
- Post-Stratification Adjustment

The prevalence for each stratum is calculated using the method proposed by Thomas et al. [8]. A value $\hat{P}_h$ is defined for each of the $H$ strata in the sample: The overall prevalence adjusted by post-stratification, $\hat{P}_{ps}$, is as follows: The weighted variance is given as follows: A confidence interval for $\hat{P}_{ps}$ is as follows:
2.4.2. Bayesian Approach
- Post-Stratification Adjustment

The previous Bayesian estimation can also include a post-stratification adjustment. In this case, as previously defined, for the $H$ strata formed, $P_{ps}$ is estimated by stratifying the posterior of $P$:

$$P_{ps} = \sum_{h=1}^{H} W_h P_h$$

As noted above, when estimating the prevalence for each stratum $h$ under a Bayesian approach, a single point estimate is not obtained; instead, a posterior distribution for $P_h$ is generated. In this context, post-stratification must be based on these posterior distributions, allowing for the incorporation of uncertainty into the stratum-adjusted estimates. We have obtained $S$ samples from the posterior distribution of $P_h$ for $H$ strata. These samples for each stratum $h$ are denoted as follows:

$$P_h^{(1)}, P_h^{(2)}, \ldots, P_h^{(S)}$$

For each iteration $s$ of the posterior distribution, we weight the prevalence $P_h^{(s)}$ by the proportion $W_h$ of stratum $h$ in the total population. In this way, the adjusted prevalence in iteration $s$ is as follows:

$$P_{ps}^{(s)} = \sum_{h=1}^{H} W_h P_h^{(s)}$$

This process is repeated for each sample $s = 1, \ldots, S$, resulting in a combined posterior distribution for the total prevalence adjusted by post-stratification:

$$\left\{ P_{ps}^{(1)}, P_{ps}^{(2)}, \ldots, P_{ps}^{(S)} \right\}$$

From the samples $P_{ps}^{(s)}$, we can calculate any credible interval.
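To illustrate the final step, an equal-tailed credible interval at an arbitrary level can be read directly off the combined draws. The helper below is a small sketch with a hypothetical name; it uses linear interpolation between order statistics, which is one of several common quantile conventions and is not specified by the source.

```python
def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws of the adjusted prevalence."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if not draws:
        raise ValueError("at least one draw is required")
    ordered = sorted(draws)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        position = q * (len(ordered) - 1)
        below = int(position)
        above = min(below + 1, len(ordered) - 1)
        fraction = position - below
        return ordered[below] + fraction * (ordered[above] - ordered[below])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Evenly spaced stand-in values, used only to show the call.
example_draws = [i / 100 for i in range(101)]
print(credible_interval(example_draws, level=0.90))
```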
2.5. Application of Prevalence Estimation for Mutation in Familial Chylomicronemia with Post-Stratification Adjustment
3. Results
3.1. Scenario 1: Estimation with Full Verification Using a Gold Standard
3.2. Scenario 2: Estimation with a Diagnostic Test of Known Sensitivity and Specificity
3.3. Scenario 3: Estimation When the Status of the Disease Is Verified Only Among Positives
4. Discussion
4.1. First Scenario
4.2. Second Scenario
4.3. Third Scenario
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MCMC | Markov Chain Monte Carlo |
FCS | Familial Chylomicronemia Syndrome |
References
1. Arya, R.; Antonisamy, B.; Kumar, S. Sample Size Estimation in Prevalence Studies. Indian J. Pediatr. 2012, 79, 1482–1488.
2. Lewis, F.I.; Torgerson, P.R. A tutorial in estimating the prevalence of disease in humans and animals in the absence of a gold standard diagnostic. Emerg. Themes Epidemiol. 2012, 9, 9.
3. Agresti, A.; Coull, B.A. Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions. Am. Stat. 1998, 52, 119–126.
4. Rogan, W.J.; Gladen, B. Estimating prevalence from the results of a screening test. Am. J. Epidemiol. 1978, 107, 71–76.
5. Izbicki, R.; Diniz, M.A.; Bastos, L.S. Sensitivity and specificity in prevalence studies: The importance of considering uncertainty. Clinics 2020, 75, e2449.
6. Reiczigel, J.; Foldi, J.; Ózsvári, L. Exact confidence limits for prevalence of a disease with an imperfect diagnostic test. Epidemiol. Infect. 2010, 138, 1674–1678.
7. Lang, Z.; Reiczigel, J. Confidence limits for prevalence of disease adjusted for estimated sensitivity and specificity. Prev. Vet. Med. 2014, 113, 13–22.
8. Thomas, E.G.; Peskoe, S.B.; Spiegelman, D. Prevalence estimation when disease status is verified only among test positives: Applications in HIV screening programs. Stat. Med. 2018, 37, 1101–1114.
9. Elliott, M.R.; Valliant, R. Inference for nonprobability samples. Stat. Sci. 2017, 32, 249–264.
10. Lohr, S.L. Sampling: Design and Analysis, 2nd ed.; Cengage Learning: Boston, MA, USA, 2010.
11. Smith, T.M.F. Post-Stratification. J. R. Stat. Soc. Ser. D (Stat.) 1991, 40, 315–323.
12. Holt, D.; Smith, T.M.F. Post Stratification. J. R. Stat. Soc. Ser. A (Gen.) 1979, 142, 33–46.
13. Cai, T.T. One-sided confidence intervals in discrete distributions. J. Stat. Plan. Inference 2005, 131, 63–88.
14. Flor, M.; Weiß, M.; Selhorst, T.; Müller-Graf, C.; Greiner, M. Comparison of Bayesian and frequentist methods for prevalence estimation under misclassification. BMC Public Health 2020, 20, 1135.
15. Rodriguez, F.H.; Estrada, J.M.; Quintero, H.M.A.; Nogueira, J.P.; Porras-Hurtado, G.L. Analyses of familial chylomicronemia syndrome in Pereira, Colombia 2010–2020: A cross-sectional study. Lipids Health Dis. 2023, 22, 43.
16. Moulin, P.; Dufour, R.; Averna, M.; Arca, M.; Cefalù, A.B.; Noto, D.; D’Erasmo, L.; Costanzo, A.D.; Marçais, C.; Walther, L.A.A.S.; et al. Identification and diagnosis of patients with familial chylomicronaemia syndrome (FCS): Expert panel recommendations and proposal of an “FCS score”. Atherosclerosis 2018, 275, 265–272.
17. Gelman, A.; Rubin, D.B. Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci. 1992, 7, 457–472.
18. McNamee, R. Two-Phase Sampling for Simultaneous Prevalence Estimation and Case Detection. Biometrics 2004, 60, 783–792.
19. Shrout, P.E.; Newman, S.C. Design of Two-Phase Prevalence Surveys of Rare Disorders. Biometrics 1989, 45, 549–555.
20. Viana, M.A.G.; Ramakrishnan, V.; Levy, P.S. Bayesian analysis of prevalence from the results of small screening samples. Commun. Stat.-Theory Methods 1993, 22, 575–585.
21. Van Hasselt, M.; Bollinger, C.R.; Bray, J.W. A Bayesian approach to account for misclassification in prevalence and trend estimation. J. Appl. Econ. 2022, 37, 351–367.
22. Bayer, D.M.; Fay, M.P.; Graubard, B.I. Confidence intervals for prevalence estimates from complex surveys with imperfect assays. Stat. Med. 2023, 42, 1822–1867.
Scenario | Method | Unadjusted P (%) | Unadjusted Lower (%) | Unadjusted Upper (%) | Adjusted P (%) | Adjusted Lower (%) | Adjusted Upper (%) |
---|---|---|---|---|---|---|---|
Scenario 1 | Agresti–Coull | 0.0081 | 0.0032 | 0.0181 | 0.4621 | 0.4015 | 0.5226 |
Scenario 1 | Bayesian (beta-binomial) | 0.0088 | 0.0028 | 0.0156 | 0.2417 | 0.2014 | 0.2886 |
Scenario 2 | Lang | 0.0000 | 0.0000 | 0.0162 | 0.0000 | 0.0000 | 0.0836 |
Scenario 2 | Bayesian–Flor | 0.0016 | 0.0000 | 0.0058 | 0.2740 | 0.2280 | 0.3286 |
Scenario 3 | Frequentist–Thomas | 0.0094 | 0.0020 | 0.0169 | 0.0075 | 0.0014 | 0.0135 |
Scenario 3 | Bayesian–Thomas | 0.0104 | 0.0045 | 0.0201 | 0.2797 | 0.2328 | 0.3352 |
Method | Interval Width Without Adjustment | Interval Width With Adjustment |
---|---|---|
Agresti–Coull | 0.0149 | 0.1212 |
Bayesian (beta-binomial) | 0.0128 | 0.0872 |
Lang | 0.0162 | 0.0836 |
Bayesian–Flor | 0.0058 | 0.1005 |
Frequentist–Thomas | 0.0149 | 0.0121 |
Bayesian–Thomas | 0.0156 | 0.1025 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Estrada Alvarez, J.M.; Luna del Castillo, J.d.D.; Montero-Alonso, M.Á. Point and Interval Estimation of Population Prevalence Using a Fallible Test and a Non-Probabilistic Sample: Post-Stratification Correction. Mathematics 2025, 13, 805. https://doi.org/10.3390/math13050805