Fundamentals of Analysis of Health Data for Non-Physicians
Abstract
1. Introduction
2. Methods
2.1. Methodology
2.2. Data Sources
2.3. Outlier Analysis
2.4. Age-Adjusted Rates
3. Results
Pillars for a Health Data Scientist
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ICD-10-CM | International Classification of Diseases, 10th Revision, Clinical Modification |
AHRDMs | Avoidable Hospitalizations Related to Diabetes Mellitus |
DGIS | Dirección General de Información en Salud (due to terms of use, acronym must be in Spanish) |
AHRQ | Agency for Healthcare Research and Quality |
ACSC | Ambulatory Care-Sensitive Condition |
PQI | Prevention Quality Indicator |
MA | Metropolitan Area of Mexico City |
SS | Secretaría de Salud (due to terms of use, acronym must be in Spanish) |
Appendix A. Population of the Metropolitan Area of Mexico City
Data Cleansing and Pre-Processing
Year / Age Group (Years) | Metropolitan Area Cases, Women | Metropolitan Area Cases, Men | Population (Habs.), Women | Population (Habs.), Men |
---|---|---|---|---|
2010 | 3611 (100%) | 3460 (100%) | 8,010,757 | 7,149,639 |
20–44 | 768 (21.3%) | 903 (26.1%) | 4,958,925 | 4,569,823 |
45–64 | 1800 (49.8%) | 1841 (53.2%) | 2,236,828 | 1,961,667 |
+65 | 1043 (28.9%) | 716 (20.7%) | 815,004 | 618,149 |
2011 | 4001 (100%) | 4088 (100%) | 8,173,368 | 7,299,250 |
20–44 | 869 (21.7%) | 1034 (25.3%) | 4,983,308 | 4,602,965 |
45–64 | 2069 (51.7%) | 2227 (54.5%) | 2,328,376 | 2,040,086 |
+65 | 1063 (26.6%) | 827 (20.2%) | 861,684 | 656,199 |
2012 | 3660 (100%) | 3795 (100%) | 8,335,979 | 7,448,860 |
20–44 | 834 (22.8%) | 913 (24.1%) | 5,007,691 | 4,636,107 |
45–64 | 1877 (51.3%) | 2087 (55.0%) | 2,419,923 | 2,118,504 |
+65 | 949 (25.9%) | 795 (20.9%) | 908,365 | 694,249 |
2013 | 3544 (100%) | 3565 (100%) | 8,498,589 | 7,598,472 |
20–44 | 785 (22.2%) | 888 (24.9%) | 5,032,073 | 4,669,249 |
45–64 | 1775 (50.1%) | 1986 (55.7%) | 2,511,471 | 2,196,923 |
+65 | 984 (27.8%) | 691 (19.4%) | 955,045 | 732,300 |
2014 | 3354 (100%) | 3720 (100%) | 8,661,201 | 7,748,083 |
20–44 | 688 (20.5%) | 854 (23.0%) | 5,056,456 | 4,702,391 |
45–64 | 1702 (50.7%) | 1977 (53.1%) | 2,603,019 | 2,275,342 |
+65 | 964 (28.7%) | 889 (23.9%) | 1,001,726 | 770,350 |
2015 | 3276 (100%) | 3399 (100%) | 8,823,812 | 7,897,695 |
20–44 | 717 (21.9%) | 780 (22.9%) | 5,080,839 | 4,735,534 |
45–64 | 1586 (48.4%) | 1867 (54.9%) | 2,694,567 | 2,353,761 |
+65 | 973 (29.7%) | 752 (22.1%) | 1,048,406 | 808,400 |
2016 | 3565 (100%) | 3913 (100%) | 8,986,422 | 8,047,305 |
20–44 | 843 (23.6%) | 920 (23.5%) | 5,105,222 | 4,768,676 |
45–64 | 1763 (49.5%) | 2154 (55.0%) | 2,786,114 | 2,432,179 |
+65 | 959 (26.9%) | 839 (21.4%) | 1,095,086 | 846,450 |
2017 | 3544 (100%) | 3994 (100%) | 9,149,034 | 8,196,916 |
20–44 | 750 (21.2%) | 891 (22.3%) | 5,129,605 | 4,801,818 |
45–64 | 1810 (51.1%) | 2256 (56.5%) | 2,877,662 | 2,510,598 |
+65 | 984 (27.8%) | 847 (21.2%) | 1,141,767 | 884,500 |
2018 | 3678 (100%) | 3852 (100%) | 9,311,644 | 8,346,528 |
20–44 | 839 (22.8%) | 860 (22.3%) | 5,153,987 | 4,834,960 |
45–64 | 1865 (50.7%) | 2180 (56.6%) | 2,969,210 | 2,589,017 |
+65 | 974 (26.5%) | 812 (21.1%) | 1,188,447 | 922,551 |
2019 | 3943 (100%) | 4386 (100%) | 9,474,255 | 8,496,138 |
20–44 | 907 (23.0%) | 1052 (24.0%) | 5,178,370 | 4,868,102 |
45–64 | 1982 (50.3%) | 2436 (55.5%) | 3,060,757 | 2,667,435 |
+65 | 1054 (26.7%) | 898 (20.5%) | 1,235,128 | 960,601 |
2020 | 2173 (100%) | 2594 (100%) | 9,636,866 | 8,645,749 |
20–44 | 595 (27.4%) | 728 (28.1%) | 5,202,753 | 4,901,244 |
45–64 | 1050 (48.3%) | 1349 (52.0%) | 3,152,305 | 2,745,854 |
+65 | 528 (24.3%) | 517 (19.9%) | 1,281,808 | 998,651 |
2021 | 2023 (100%) | 2348 (100%) | 9,799,477 | 8,795,360 |
20–44 | 524 (25.9%) | 537 (22.9%) | 5,227,136 | 4,934,386 |
45–64 | 971 (48.0%) | 1356 (57.8%) | 3,243,853 | 2,824,273 |
+65 | 528 (26.1%) | 455 (19.4%) | 1,328,488 | 1,036,701 |
2022 | 2607 (100%) | 3036 (100%) | 9,962,088 | 8,944,970 |
20–44 | 614 (23.6%) | 744 (24.5%) | 5,251,519 | 4,967,528 |
45–64 | 1285 (49.3%) | 1650 (54.3%) | 3,335,400 | 2,902,691 |
+65 | 708 (27.2%) | 642 (21.1%) | 1,375,169 | 1,074,751 |
2023 | 3314 (100%) | 2697 (100%) | 10,124,698 | 9,094,582 |
20–44 | 791 (23.9%) | 655 (24.3%) | 5,275,901 | 5,000,670 |
45–64 | 1800 (54.3%) | 1325 (49.1%) | 3,426,948 | 2,981,110 |
+65 | 723 (21.8%) | 717 (26.6%) | 1,421,849 | 1,112,802 |
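The percentages in the table above are each age group's share of the yearly case count for that sex. A minimal sketch of that arithmetic, using the 2010 figures for women, is shown below; the variable names are illustrative and not taken from the article. Summing the age groups also serves as a quick consistency check against the yearly total.

```python
# Cases among women in 2010, copied from the table above.
cases_women_2010 = {"20-44": 768, "45-64": 1800, "65+": 1043}

# The age groups should add up to the yearly total reported for women (3611).
total = sum(cases_women_2010.values())

# Share of each age group, as a percentage rounded to one decimal.
shares = {age: round(100 * n / total, 1) for age, n in cases_women_2010.items()}

print(total)   # 3611
print(shares)  # {'20-44': 21.3, '45-64': 49.8, '65+': 28.9}
```

Running the same check on every year and sex flags any row whose total does not match the sum of its age groups.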
Year | National Records | After Cleaning | MA Records | MA Diabetes Cases | Population (Habs.) * |
---|---|---|---|---|---|
2010 | 2,634,339 | 2,632,251 (99.92%) | 510,888 (19.41%) | 7071 (1.38%) | 15,160,396 |
2011 | 2,775,189 | 2,774,330 (99.97%) | 536,111 (19.32%) | 8089 (1.51%) | 15,472,618 |
2012 | 2,880,706 | 2,880,075 (99.98%) | 566,765 (19.68%) | 7455 (1.32%) | 15,784,839 |
2013 | 2,879,313 | 2,879,052 (99.99%) | 576,107 (20.01%) | 7109 (1.23%) | 16,097,061 |
2014 | 2,959,197 | 2,958,924 (99.99%) | 603,358 (20.39%) | 7074 (1.17%) | 16,409,284 |
2015 | 2,970,812 | 2,970,483 (99.99%) | 581,673 (19.58%) | 6675 (1.15%) | 16,721,507 |
2016 | 2,955,144 | 2,952,697 (99.92%) | 599,528 (20.30%) | 7478 (1.25%) | 17,033,727 |
2017 | 2,729,341 | 2,715,873 (99.51%) | 535,983 (19.74%) | 7538 (1.41%) | 17,345,950 |
2018 | 2,623,379 | 2,622,560 (99.97%) | 530,078 (20.21%) | 7530 (1.42%) | 17,658,172 |
2019 | 2,629,434 | 2,628,771 (99.97%) | 515,396 (19.61%) | 8329 (1.62%) | 17,970,393 |
2020 | 1,937,344 | 1,934,458 (99.85%) | 387,942 (20.05%) | 4767 (1.23%) | 18,282,615 |
2021 | 2,088,780 | 2,088,352 (99.98%) | 391,540 (18.75%) | 4371 (1.12%) | 18,594,837 |
2022 | 2,203,636 | 2,197,685 (99.73%) | 372,418 (16.95%) | 5643 (1.52%) | 18,907,058 |
2023 | 2,399,179 | 2,390,869 (99.65%) | 381,771 (15.97%) | 6011 (1.57%) | 19,219,280 |
Total | 36,665,793 | 36,626,380 (99.89%) | 7,089,558 (19.36%) | 95,140 (1.34%) | 240,657,737 |
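The percentages in the table above appear to be chained: records kept after cleaning as a share of national records, MA records as a share of cleaned records, and diabetes cases as a share of MA records. The sketch below reproduces the 2010 row under that reading; the helper `pct` and the variable names are hypothetical.

```python
# Figures for 2010, copied from the table above.
national_records = 2_634_339
after_cleaning = 2_632_251
ma_records = 510_888
ma_diabetes_cases = 7_071


def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


retained = pct(after_cleaning, national_records)    # share kept after cleaning
ma_share = pct(ma_records, after_cleaning)          # MA share of cleaned records
diabetes_share = pct(ma_diabetes_cases, ma_records)  # diabetes share of MA records

print(retained, ma_share, diabetes_share)  # 99.92 19.41 1.38
```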
Year | Cases, Women (A) | Cases, Men (B) | Population (Habs.), Women (C) | Population (Habs.), Men (D) | Crude Rate (×100,000 Habs.), Women (A/C) | Crude Rate (×100,000 Habs.), Men (B/D) |
---|---|---|---|---|---|---|
2010 | 3611 | 3460 | 8,010,757 | 7,149,639 | 45.08 | 48.39 |
2011 | 4001 | 4088 | 8,173,368 | 7,299,250 | 48.95 | 56.01 |
2012 | 3660 | 3795 | 8,335,979 | 7,448,860 | 43.91 | 50.95 |
2013 | 3544 | 3565 | 8,498,589 | 7,598,472 | 41.70 | 46.92 |
2014 | 3354 | 3720 | 8,661,201 | 7,748,083 | 38.72 | 48.01 |
2015 | 3276 | 3399 | 8,823,812 | 7,897,695 | 37.13 | 43.04 |
2016 | 3565 | 3913 | 8,986,422 | 8,047,305 | 39.67 | 48.63 |
2017 | 3544 | 3994 | 9,149,034 | 8,196,916 | 38.74 | 48.73 |
2018 | 3678 | 3852 | 9,311,644 | 8,346,528 | 39.50 | 46.15 |
2019 | 3943 | 4386 | 9,474,255 | 8,496,138 | 41.62 | 51.62 |
2020 | 2173 | 2594 | 9,636,866 | 8,645,749 | 22.55 | 30.00 |
2021 | 2023 | 2348 | 9,799,477 | 8,795,360 | 20.64 | 26.70 |
2022 | 2607 | 3036 | 9,962,088 | 8,944,970 | 26.17 | 33.94 |
2023 | 3314 | 2697 | 10,124,698 | 9,094,582 | 32.73 | 29.65 |
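The crude rates in the table above are cases divided by population, scaled to 100,000 inhabitants. A minimal sketch with the 2010 row follows; the function name `crude_rate` is illustrative.

```python
def crude_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Crude rate: cases per `per` inhabitants."""
    return cases / population * per


# 2010 row of the table above: (A/C) for women, (B/D) for men.
women_2010 = crude_rate(3611, 8_010_757)
men_2010 = crude_rate(3460, 7_149_639)

print(f"{women_2010:.2f} {men_2010:.2f}")  # 45.08 48.39
```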
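In each year block, the last two columns give the adjusted rate in the first row, its 95% CI (lower, upper) in the second row, and the [error] in the third row.
Year / Age Group (Years) | Crude Rate, Women | Crude Rate, Men | Specific Rate *, Women | Specific Rate *, Men | Age- and Sex-Adjusted Rate, Women | Age- and Sex-Adjusted Rate, Men |
---|---|---|---|---|---|---|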
2010 | ||||||
20–44 | 15.49 | 19.76 | 8.98 | 11.46 | 48.48 | 53.52 |
45–64 | 80.47 | 93.85 | 24.01 | 31.03 | (46.90,50.07) | (51.71,55.32) |
+65 | 127.97 | 115.83 | 14.03 | 15.45 | [0.81] | [0.92] |
2011 | ||||||
20–44 | 17.44 | 22.46 | 10.12 | 13.03 | 51.58 | 60.86 |
45–64 | 88.86 | 109.03 | 26.66 | 32.71 | (49.98,53.18) | (59.02,62.78) |
+65 | 123.36 | 126.03 | 14.80 | 15.12 | [0.82] | [0.96] |
2012 | ||||||
20–44 | 16.65 | 19.69 | 9.66 | 11.42 | 45.47 | 54.71 |
45–64 | 77.56 | 98.51 | 23.27 | 29.55 | (43.99,46.94) | (52.97,56.47) |
+65 | 104.47 | 114.51 | 12.54 | 13.74 | [0.75] | [0.89] |
2013 | ||||||
20–44 | 15.60 | 19.02 | 9.05 | 11.03 | 42.61 | 49.47 |
45–64 | 70.68 | 90.40 | 21.20 | 27.12 | (41.21,44.02) | (47.84,51.10) |
+65 | 103.03 | 94.36 | 12.36 | 11.32 | [0.72] | [0.83] |
2014 | ||||||
20–44 | 13.61 | 18.16 | 7.89 | 10.53 | 39.06 | 50.45 |
45–64 | 65.39 | 86.89 | 19.62 | 26.07 | (37.73,40.38) | (48.82,52.07) |
+65 | 96.23 | 115.40 | 11.55 | 13.85 | [0.67] | [0.83] |
2015 | ||||||
20–44 | 14.11 | 16.47 | 8.18 | 9.55 | 36.98 | 44.51 |
45–64 | 58.86 | 79.32 | 17.66 | 23.80 | (35.71,38.25) | (43.01,46.01) |
+65 | 92.81 | 93.02 | 11.14 | 11.16 | [0.65] | [0.77] |
2016 | ||||||
20–44 | 16.51 | 19.29 | 9.58 | 11.19 | 39.07 | 49.65 |
45–64 | 63.28 | 88.56 | 18.98 | 26.57 | (37.79,40.35) | (48.09,51.21) |
+65 | 87.57 | 99.12 | 10.51 | 11.89 | [0.65] | [0.79] |
2017 | ||||||
20–44 | 14.62 | 18.56 | 8.48 | 10.76 | 37.69 | 49.21 |
45–64 | 62.90 | 89.86 | 18.87 | 26.96 | (36.45,38.93) | (47.68,50.74) |
+65 | 86.18 | 95.76 | 10.34 | 11.49 | [0.63] | [0.78] |
2018 | ||||||
20–44 | 16.28 | 17.79 | 9.44 | 10.32 | 38.12 | 46.14 |
45–64 | 62.81 | 84.20 | 18.84 | 25.26 | (36.89,39.35) | (44.68,47.60) |
+65 | 81.96 | 88.02 | 9.84 | 10.56 | [0.63] | [0.74] |
2019 | ||||||
20–44 | 17.52 | 21.61 | 10.16 | 12.53 | 39.83 | 51.15 |
45–64 | 64.76 | 91.32 | 19.43 | 27.40 | (38.58,41.07) | (49.63,52.66) |
+65 | 85.34 | 93.48 | 10.24 | 11.22 | [0.64] | [0.77] |
2020 | ||||||
20–44 | 11.44 | 14.85 | 6.64 | 8.61 | 21.57 | 29.56 |
45–64 | 33.31 | 49.13 | 9.99 | 14.74 | (20.66,22.48) | (28.43,30.70) |
+65 | 41.19 | 51.77 | 4.94 | 6.21 | [0.46] | [0.58] |
2021 | ||||||
20–44 | 10.02 | 10.88 | 5.81 | 6.31 | 19.56 | 25.98 |
45–64 | 29.93 | 48.01 | 8.98 | 14.40 | (18.71,20.42) | (24.93,27.03) |
+65 | 39.74 | 43.89 | 4.77 | 5.27 | [0.44] | [0.54] |
2022 | ||||||
20–44 | 11.69 | 14.98 | 6.78 | 8.69 | 24.53 | 32.91 |
45–64 | 38.53 | 56.84 | 11.56 | 17.05 | (23.57,25.46) | (31.74,34.08) |
+65 | 51.48 | 59.73 | 6.19 | 7.17 | [0.48] | [0.60] |
2023 | ||||||
20–44 | 14.99 | 13.10 | 8.70 | 7.60 | 30.55 | 28.67 |
45–64 | 52.52 | 44.45 | 15.76 | 13.33 | (29.51,31.60) | (27.58,29.75) |
+65 | 50.85 | 64.43 | 6.10 | 7.73 | [0.53] | [0.55] |
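The adjusted rates in the table above can be approximated by direct standardization: each age-specific rate is multiplied by a standard-population weight and the products are summed. The sketch below rests on two assumptions that the table does not state. First, the weights 0.58, 0.30 and 0.12 are inferred from the ratio of the "Specific Rate" to the "Crude Rate" columns in most rows. Second, the bracketed error is treated as a standard error computed with a Poisson variance for each age-group count. Under these assumptions the 2011 figures for women are reproduced to within rounding.

```python
from math import sqrt

# ASSUMPTION: standard-population weights for ages 20-44, 45-64 and 65+,
# inferred from the table (specific rate / crude rate), not stated in it.
WEIGHTS = (0.58, 0.30, 0.12)


def direct_standardize(cases, populations, weights=WEIGHTS, per=100_000, z=1.96):
    """Return (adjusted_rate, standard_error, ci_lower, ci_upper) per `per` inhabitants."""
    # Weighted sum of the age-specific rates.
    rate = sum(w * c / p for w, c, p in zip(weights, cases, populations)) * per
    # ASSUMPTION: each age-group count is Poisson, so Var(count) = count.
    variance = sum(w**2 * c / p**2 for w, c, p in zip(weights, cases, populations))
    se = sqrt(variance) * per
    return rate, se, rate - z * se, rate + z * se


# Women, 2011: cases and population by age group (20-44, 45-64, 65+).
rate, se, lower, upper = direct_standardize(
    cases=(869, 2069, 1063),
    populations=(4_983_308, 2_328_376, 861_684),
)
print(f"rate={rate:.2f} error={se:.2f} CI=({lower:.2f}, {upper:.2f})")
```

The table lists 51.58, [0.82] and (49.98, 53.18) for this row. The computed interval can differ from the tabulated one in the last decimal, depending on where rounding is applied.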
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hernández-Nava, C.; Mata-Rivera, M.-F.; Flores-Hernández, S. Fundamentals of Analysis of Health Data for Non-Physicians. Data 2024, 9, 112. https://doi.org/10.3390/data9100112