Chronic Disease Monitoring: Methodology for Classification Error and Self-Selection Bias Correction in Clinical Laboratory Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Methods
2.2.1. Demographic Reweighting
2.2.2. Classification Errors
2.2.3. Self-Selection Bias
3. Results
3.1. Bias and Error Correction
3.2. Prevalence Estimation
4. Discussion
4.1. Demographic Trends
4.2. Geographic and Socioeconomic Trends
4.3. Methodological Considerations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| PR | Puerto Rico |
| CDC | Centers for Disease Control and Prevention |
| ADA | American Diabetes Association |
| HITECH | Health Information Technology for Economic and Clinical Health Act |
| NHANES | National Health and Nutrition Examination Survey |
| BRFSS | Behavioral Risk Factor Surveillance System |
| EHR | Electronic health record |
| LIS | Laboratory Information System |
| PII | Personally identifiable information |
| ICD-10 | International Classification of Diseases, 10th Revision |
| LOINC | Logical Observation Identifiers Names and Codes |
| POI | Period of interest |
| DM | Diabetes mellitus |
| FPG | Fasting plasma glucose |
| GTT | Glucose tolerance test |
| A1C | Hemoglobin A1c |






| Sex | Persons | Results | Fasting Plasma Glucose (FPG) | Hemoglobin A1c (A1C) |
|---|---|---|---|---|
| Female | 433,641 | 1,065,532 | 744,054 | 321,478 |
| Male | 300,206 | 716,516 | 500,030 | 216,486 |
| Both | 733,847 | 1,782,048 | 1,244,084 | 537,964 |

| Sex | (%) | (%) | (%) | (%) |
|---|---|---|---|---|
| Female | 16.3 | 12.5 | 19.9 | 25.7 |
| Male | 20.0 | 16.0 | 26.0 | 32.6 |
| Both | 18.0 | 14.1 | 22.8 | 28.9 |
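The counts in the first table can be cross-checked mechanically. The sketch below transcribes them by hand (the `COUNTS` dictionary and the helper names `check_table` and `results_per_person` are illustrative, not part of the paper) and verifies two identities the table appears to satisfy. First, the Female and Male rows sum to the Both row. Second, the FPG and A1C columns sum to Results, which suggests that every counted result is one of these two test types, although the table does not state this explicitly. The sketch also reports the average number of results per person.

```python
# Hypothetical transcription of the counts table: (persons, results, FPG, A1C).
COUNTS = {
    "Female": (433_641, 1_065_532, 744_054, 321_478),
    "Male": (300_206, 716_516, 500_030, 216_486),
    "Both": (733_847, 1_782_048, 1_244_084, 537_964),
}
COLUMNS = ("Persons", "Results", "FPG", "A1C")


def check_table(counts):
    """Return a list of inconsistencies in the counts table (empty if none)."""
    problems = []
    # Assumption: Results counts only FPG and A1C tests, so those columns should add up to it.
    for sex, (persons, results, fpg, a1c) in counts.items():
        if fpg + a1c != results:
            problems.append(f"{sex}: FPG + A1C = {fpg + a1c}, Results = {results}")
    # The "Both" row should equal the column-wise sum of the Female and Male rows.
    for i, name in enumerate(COLUMNS):
        total = counts["Female"][i] + counts["Male"][i]
        if total != counts["Both"][i]:
            problems.append(f"{name}: Female + Male = {total}, Both = {counts['Both'][i]}")
    return problems


def results_per_person(counts):
    """Average number of laboratory results per person, by sex."""
    return {sex: row[1] / row[0] for sex, row in counts.items()}


if __name__ == "__main__":
    print(check_table(COUNTS))  # an empty list means no inconsistencies were found
    for sex, ratio in results_per_person(COUNTS).items():
        print(f"{sex}: {ratio:.2f} results per person")
```

Run as a script, it prints an empty list followed by about 2.46, 2.39, and 2.43 results per person for females, males, and both sexes. The percentage rows are not checked, because the table does not state what they are percentages of, so no additive identity can be assumed for them.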
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Betancourt, J.; Betancourt, E.; Roche-Lima, A.; Velev, J. Chronic Disease Monitoring: Methodology for Classification Error and Self-Selection Bias Correction in Clinical Laboratory Data. Healthcare 2025, 13, 3056. https://doi.org/10.3390/healthcare13233056

