Next Article in Journal
Proactive Agent Behaviour in Dynamic Distributed Constraint Optimisation Problems
Previous Article in Journal
Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

MortalityMinder: Visualization and AI Interpretations of Social Determinants of Premature Mortality in the United States

1
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
2
Future of Computing Institute, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
3
Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
*
Author to whom correspondence should be addressed.
Information 2024, 15(5), 254; https://doi.org/10.3390/info15050254
Submission received: 29 March 2024 / Revised: 12 April 2024 / Accepted: 19 April 2024 / Published: 30 April 2024
(This article belongs to the Special Issue Interactive Machine Learning and Visual Data Mining)

Abstract

:
MortalityMinder enables healthcare researchers, providers, payers, and policy makers to gain actionable insights into where and why premature mortality rates due to all causes, cancer, cardiovascular disease, and deaths of despair rose between 2000 and 2017 for adults aged 25–64. MortalityMinder is designed as an open-source web-based visualization tool that enables interactive analysis and exploration of social, economic, and geographic factors associated with mortality at the county level. We provide case studies to illustrate how MortalityMinder finds interesting relationships between health determinants and deaths of despair. We also demonstrate how GPT-4 can help translate statistical results from MortalityMinder into actionable insights to improve population health. When combined with MortalityMinder results, GPT-4 provides hypotheses on why socio-economic risk factors are associated with mortality, how they might be causal, and what actions could be taken related to the risk factors to improve outcomes with supporting citations. We find that GPT-4 provided plausible and insightful answers about the relationship between social determinants and mortality. Our work is a first step towards enabling public health stakeholders to automatically discover and visualize relationships between social determinants of health and mortality based on available data and explain and transform these into meaningful results using artificial intelligence.

1. Introduction

Midlife mortality rates have been rising across the United States (US) since before the COVID-19 pandemic [1]. To understand why, we consider the environmental conditions and social determinants that contribute to health outcomes such as cancer, cardiovascular disease, and deaths of despair. Community health is affected by environmental conditions such as access to clinical care, education, employment, and social connectivity, which varies across different geographical regions [2]. The health of individuals is also affected by their place of birth, age, gender, race, socioeconomic status, etc., which are referred to as the social determinants of health. The Health People 2030 framework provides lists of the many studies covering 17 different types of social determinants of health and how they play a significant role in health inequities across the United States. An example is the pivot study of Stein et al. [3], which found that “deaths of despair”, which are deaths due to suicide and substance abuse, have increased dramatically among white males between the ages of 25–64, particularly those who live in rural America. In 2020, we saw that the rate of deaths due to COVID-19 varied significantly between different states and counties, and was also affected by social determinants. For example, African Americans are dying from COVID-19 at a higher rate across the country [4]. With rare exceptions [5], few interactive tools are available for stakeholders such as healthcare researchers, providers, payers, and policy makers to gain data-driven insights into which social determinants are associated with the trends of increasing mortality for states and counties of interest. While prior studies have used machine learning to predict and examine social determinants of health [6], this work identifies and visualize potential health disparities by combining a risk group analysis approach with interactive infographics. Ong et al. [7] discuss how AI methods such as large language models could be transformative tools in social determinant research. Here, we show how AI methods can be used to identify potential mechanisms for translating observed health disparities into actions to improve public health outcomes.
Regional disparities exist for all causes of premature death. A person living in Alabama is more likely than someone living in California to die from heart disease or stroke before the age of 65 [8]. MortalityMinder illuminates where these health disparities exist, and what we can do about them. MortalityMinder uses county-level mortality rate data from CDC WONDER [9] to analyze trends among U.S. adults aged 25–64 from 2000 to 2017. Using county-level surveillance data from County Health Rankings [2], MortalityMinder identifies social and economic factors associated with differences in mortality trends at the county-level for US and individual states. For example, MortalityMinder found that people in counties with higher rates of diabetes, poverty, food insecurity, and mental distress rates were more likely to die under the age 65 due to heart disease or stroke. MortalityMinder was awarded third place in the AHRQ’s Visualization of Community-Level Social Determinants of Health Challenge [10] for its innovative social determinants analysis methods and compelling dynamic visualizations. With MortalityMinder, a user can select the region (specific state or US) and the cause of death (all causes, cancer, cardiovascular, or deaths of despair) and the application will dynamically create three analysis and visualization infographics.
To address the need to make data-driven insights in social determinants of health actionable, we demonstrate how to use large language models (LLMs), such as GPT-4 [11], to further analyze social determinants in MortalityMinder in order to understand their association with increased mortality and potential improvements. MortalityMinder is an example of a public health observational study (PHOS), i.e., observational studies that analyze the associations between population health and risk factors. Here, we tackle one of the major weaknesses of PHOS. While thousands of PHOS are performed, few are translated into actionable insights that lead to better health. LLMs can help address the challenge of enabling population health stakeholders to understand the meaning of PHOS. Do the PHOS findings reveal potential causes of population health problems? To address this, we need to go beyond the statistics of the PHOS. We must understand relationships between significant risk factors and health outcomes in the context of the healthcare socio-economic ecosystem in which the subjects in the data live. If further investigation of the findings supports that risk factors can impact the outcome, what kind of program and activities can be created to mitigate harmful factors and improve beneficial ones? Currently, it takes a team of individuals with diverse expertise to interpret PHOS findings and translate them into strategies to improve population health. We present our findings on how LLMs can help us do a better job.
Thus, we propose MortalityMinder as an interactive tool for policymakers to gain actionable insights into regional disparities in mortality trends across the United States and address the following objectives in this work:
  • Create novel analysis-to-visualization methods to find associations between county-level social determinants and multiyear mortality trends with interactive infographic visualizations in the MortalityMinder app.
  • Provide use cases to show how stakeholders using MortalityMinder can achieve data-driven insights into social determinants of health associated with mortality trends at the national, state, and county levels based on US data from 2000 to 2017.
  • Leverage LLMs to answer WHY, CAUSALITY, and ACTION queries to help users develop actionable insights about the identified social determinants to support public health.

2. Materials and Methods

2.1. Data Sources

MortalityMinder uses county-level mortality rates and social and economic factors measurement data for its analysis. Mortality rates per 100 K from 2000–2017 are obtained through the CDC WONDER portal [9], the definitive source of mortality information in the United States. Social determinants data for 2015–2017 are obtained through County Health Rankings (CHR) [2], an aggregate of county-level data curated by the Robert Wood Johnson Foundation. MortalityMinder considers 70+ factors from twenty sources, including data sets from BRFSS [12], the Bureau of Labor Statistics [13], the FBI [14], and many others. MortalityMinder focuses on premature midlife deaths attributed to leading causes of death, including “Deaths of Despair”, “Cardiovascular”, “Cancer”, and “All Cause”. The ICD-10 definitions for the causes are taken from Stein et al. [3].

2.2. Data Processing

Age-specific mortality rates for the year 2000 to 2017 were aggregated into three-year chunks (2000–2002, 2003–2005, 2006–2008, 2009–2011, 2012–2014, and 2015–2017) for each cause of death at the county, state, and national levels. For privacy reasons, CDC WONDER suppresses rates for counties with too few deaths, so calculating rates over three-year chunks ensured more accurate rates.
In conjunction with three-year data aggregation, county mortality rates that were missing or suppressed were imputed using mortality rates for the state using the Amelia package [15] for R to enable effective visualizations. Further, as MortalityMinder aims to capture the actual mortality of Americans at the community level, our analysis is not age-adjusted and captures the real mortality trends by considering all deaths equal. The imputed data sets used for analysis are available for download from GitHub [16].

2.3. Risk Group Clustering

To create effective visualizations that depict rigorous analysis and identify counties with similar mortality trends through time, we clustered the counties into risk groups. For each mortality cause, each county is represented by a five-dimensional vector consisting of mortality rates in 2000–2002, 2003–2005, 2006–2008, 2009–2011, 2012–2014, and 2015–2017. The risk groups are identified by clustering the counties using the K-Means algorithm on the average mortality rates for each county and then ordering the resulting clusters from low to high risk by the average mortality rate of the cluster in 2015–2017. In general, mortality rates for these causes of death were more similar in counties in 2000-2002 and then increased at different rates to their highest rates in 2015–2017. Thus, we used 2015–2017 to rank order the clusters because for every state and cause of mortality combination, we found that 2015–2017 had the highest average mortality rate. Each risk group represents varying susceptibility of people in a county towards premature death by grouping counties with similar trajectories in mean mortality rates between 2000 and 2017. Our proposed clustering approach also smooths out the inherent noise in estimated mortality rates to better reveal mortality trends. For each state, we categorize the counties into low, medium, and high risk groups based on their mortality rates between 2000 and 2017. A similar risk group clustering is performed across US where we group counties into six risk groups ranging from 1: low risk to 6: high risk. Figure 1b illustrates the six risk groups found for deaths of despair in the United States, by plotting the average rates of deaths of despair over time for each risk group.

2.4. Social Determinants Identification

We gathered social determinants (factors) addressing health behaviors, clinical care, education, employment, social supports, community safety, and physical environment domains from County Health Rankings [2]. From a total of 168 different social determinants, we first selected the determinants which were either rates or measurements that represented rates but did not directly reflect county population size. We calculated the Kendall correlation between each factor and the ordered mortality risk groups for each cause of mortality. We selected Kendall correlation because it is rank based. Using multiple hypothesis testing using the Benjamin–Hochberg Method [17], we narrowed our selected determinants to 70 which were relevant to at least one cause of death at the national level. This final set of social determinants were included as part of the socioeconomic factor analysis in MortalityMinder. The complete data flow from data collection to visualization into the MortalityMinder application is described in Figure 2.

2.5. MortalityMinder Application

MortalityMinder is an award-winning application aimed at healthcare researchers, providers, payers, and policy makers to gain actionable insights on how, where, and why mortality is increasing in the community. This shall enable them to develop policies that target causes of mortality relevant to the specific county and demographic. MortalityMinder was designed based on a formal usability study of 20+ users and recommendations from our advisory board of healthcare and design professionals.
MortalityMinder is available as an open-source R project on GitHub [16] with full application code, data, and documentation. R was chosen due to its powerful environment for statistical computing and graphics using standard packages. MortalityMinder utilizes the R Shiny [18] and FullPage JavaScript frameworks [19,20] for web interactivity. The code can be easily customized and maintained while ensuring that it can be extended to incorporate user feedback using an agile framework.
The user can choose to display the analysis for a specific state or the whole nation for all-cause, cardiovascular, cancer or deaths of despair causes of death. The application is split into four pages: the first three pages include specific analysis for the given state and cause of death. The last page includes documentation for the application. The three analysis pages of the application are referred to as “three views”, as described below:
  • National View: The view explores the mortality rates across all counties in United States for a given three-year period for a specific cause of death as a choropleth plot. The change in mortality rates for the selected state is compared with the national level across the years 2000 to 2017, split into three-year chunks.
  • State View: The view depicts the mortality rates across all counties of the selected state. It clusters the counties into risk groups and plots them geographically and as a line plot through time while comparing the results with the national average. Further, the top protective and destructive social determinants for the state are depicted as determined by their Kendall correlation values.
  • Factor View: For the selected county, this view shows a detailed description of the selected social determinant and plots the counties based on the risk groups.
The application is deployed on the R Shiny server. A snapshot of the application for mortality trends in Massachusetts for deaths of despair is shown in Figure 3, Figure 4, Figure 5 and Figure 6. MortalityMinder can be accessed at: https://mortalityminder.idea.rpi.edu (accessed on 11 April 2024).

2.6. Social Determinants Reasoning

Using OpenAI’s GPT-4 [11] via ChatBS, a separate, R-based, customized web interface to the OpenAI API [21,22], we identify three questions that are frequently not well addressed in PHOS, whose answers can help translate the PHOS results into actionable insights. Without loss of generality, we assume the risk factor is Z, the outcome is Y, the population is L, and the PHOS finds that an increase in risk factor Z is associated with either an increase or a decrease in outcome Y. Thus, to improve the PHOS with reasoning and actionable insights, we use GPT-4 to help answer three questions:
  • WHY: Considering L, why is Z associated with higher (or lower) risks of Y?
  • CAUSALITY: Considering L, why would reducing (or increasing) Z cause a reduction in Y?
  • ACTION: Considering L, what actions could be taken related to Z that are likely to cause a reduction in Y?
These questions are used as prompt templates. A user takes a specific finding from the MortalityMinder analysis of interest involving a specific risk factor, outcome and population, and then populates the templates with these values to make prompts for GPT-4. The definitions in MortalityMinder are used to precisely provide the language to describe the desired risk factor, outcome, and population. Examples of the prompts can be seen in Supplementary Material S1.
We selected GPT-4 for this proof-of-concept work in part because the available API provided us with easy programmatic access and in part due to our experience in prompt engineering against the GPT-4 model. Most important was GPT-4’s demonstrated ability to generate “chain-of-thought” rationalizations of its responses, corroborated by existing published evidence. With appropriate prompt engineering and templates, this work demonstrates the potential for LLMs to be used to systematically investigate these questions for risk analysis of PHOS to help provide insights. We utilize the definitions and insights from MortalityMinder to construct zero-shot prompts using GPT-4 to answer the three questions as user prompts along with the following system prompt:
System: You are a helpful assistant. If you provide a published evidence, provide the full reference. Don’t make stuff up.
These prompts are fed into the existing ChatBS application, which allows the retrieval and visualization of the corresponding answers from the OpenAI API. The questions are designed to address the WHY, CAUSALITY, and ACTION questions based on the insights drawn from MortalityMinder. We do not fine-tune the pre-trained GPT-4 with any data, but rather use its existing extensive knowledge base to answer and reason about the observations made in the MortalityMinder. For future work, we plan to integrate the OpenAI API directly into the MortalityMinder application.

3. Results

MortalityMinder shows evidence that health inequities exist between different regions of the United States, at the state and county level. The data show that there is a larger, underlying, community-based picture in all aspects of health and wellness. MortalityMinder dramatically illustrates recently reported mortality rate increases, while providing greater insights into state-level variations and their associated factors to help determine remedies. In the MortalityMinder application, we analyze mortality rate trends across deaths of despair, cardiovascular diseases, cancer, and all causes of death for the years 2000–2017. MortalityMinder provides an in-depth analysis and visualization of mortality trends, while highlighting key social determinants across states and counties. The depicted information enables us to draw case-by-case insights for specific counties and states and are explored as case studies in this section. First, we highlight the mortality trends across United States. Then, using the results, we compare varying social determinants across states and finally, discuss the community-level differences through the case study of Sierra County in New Mexico. Finally, we explore how LLMs can reason about the risk factors identified for premature morality in Massachusetts using a case study of driving alone as the risk factor.

3.1. Mortality Trends across United States

In the United States, midlife deaths due to deaths of despair have increased by a whopping 90.4% from 2000 to 2017. Although the Southwest and southern Appalachian region experience the highest concentrations of mortality in this category, the deaths of despair mortality rates for the United States increased across the board. Figure 1a highlights this variation in the risk groups across various states and counties with a darker color (6, red) indicating high risk and a lighter color (1, yellow) indicating low risk. Figure 1b shows the mortality trends for various county-level risk groups identified based on the mortality rates. The plot highlights that some counties are performing worse than the national average, urging the need to diagnose the causes and focus on specific regions of high mortality. With the prevalence of individuals expiring prematurely due to suicide and substance abuse affecting communities nationwide, it is important to consider the factors associated with deaths of despair which could underscore the underlying causes behind the loss of life.
At the national level, there are several social determinants that are associated with deaths of despair mortality. The top destructive and protective factors are listed in Table 1 along with their Kendall correlation, where a higher absolute value indicates a stronger correlation. These factors can be grouped into the following: mental health (mentally unhealthy days, frequent mental distress, mental health provider rate), physical health (physically unhealthy days, adult smoking, frequent physical distress, other primary care provider rate, diabetes prevalence, insufficient sleep), and socioeconomic status (percent unemployed, segregation, socioeconomic, non-Hispanic white). The correlations reveal that deaths of despair mortality is particularly impacted by mental health, physical health, and socioeconomic status of a community at the national level as shown by high correlation values of mentally unhealthy days, frequent mental distress, physically unhealthy days, percentage of people who are unemployed, and adult smoking.
During the 2007–2009 economic recession, many communities experienced an economic downturn. This impacted the health and wellbeing of many individuals, who were now unable to provide for their families, and in turn experienced poor mental and physical health. This is reflected in the rise of deaths of despair mortality seen at that time. Today, we have seen many of the same issues being exaggerated due to the global COVID-19 pandemic. The unemployment numbers hit record highs within a number of days [23] and collective anxiety about the virus took over. With the nation locked down, mental and physical health plummeted. Since the pandemic, this has led to decreased mental and physical health, and has dramatically affected the socioeconomic status of millions, contributing to a further surge in deaths of despair in the United States amongst all age groups [24].

3.2. State-Level Comparison for Deaths of Despair Mortality

National trends of midlife mortality due to deaths of despair are on the rise primarily due to the increased mortality rates across counties and states. However, not all the states are experiencing the same rate of mortality increase over time. Thus, to understand the underlying trends in mortality across states, we compared Washington, Arizona, New Jersey, and Massachusetts due to their similar population sizes. We wanted to gauge the effect of population size on social determinants and understand the mortality rates for deaths of despair in these states. The original expectation was that states with similar population sizes would attract the same community types and thus, will all be affected by same determinants. However, in contrast to our expectation of having multiple shared determinants, the results were quite different. Each state proved to have a unique community that in turn produced distinct determinants with only slight overlap. We ranked the social determinants in each state by their Kendall correlation as discussed in Section 2.4. The top four destructive and top four protective determinants for each of these states are shown in Table 2.
The top destructive determinants did have some similar determinants across pairs of states, but no single factor was common among all states. For example, food insecurity was the top destructive factor for both Washington and Arizona, and the prevalence of diabetics was common between Washington and Massachusetts. However, all other factors were unique to the individual state and its community. The same results are also evident in the top protective factors. Being Hispanic or Asian is a protective factor across all four states, with the food environmental index being common only between Arizona and New Jersey. Apart from these, there were no other common determinants, again underscoring the effect of uniqueness of each community and state on mortality rates. While the rates across these four states vary from 42.5% to 152.7%, the overall trend clearly shows an increase in mortality across all states.
Using MortalityMinder, we can see that Ohio had the largest change, with a 224.5% increase in deaths of despair mortality rate since 2000. Comparatively, Texas had the smallest change, with a 39.5% increase in deaths of despair mortality within the same time frame. Figure 7 shows the prominent determinants of Ohio and Texas. Despite having similar population sizes, the social determinants found for Ohio and Texas are very different. Food insecurity is shared as a destructive factor for both states. Surprisingly, home ownership is protective for Ohio and destructive for Texas. Race/ethnicity also has very different associations. In Texas, the rate non-Hispanic whites in a county is the top destructive factor and the rate of Hispanics in a county is protective. In Ohio, increased rates of African Americans are associated with greater deaths of despair. Clearly, the characteristics of communities at risk for deaths of despair are quite different between Ohio and Texas. These findings are consistent with a nationwide study of death rates for opioid overdose deaths between 1999 to 2017 found that that largest average annual increases in rates occurred among non-Hispanic whites in non-metropolitan areas (13.6% increase per year) and medium-small metropolitan areas (12.3% increase per year), followed by non-Hispanic blacks in medium-small metropolitan areas (11.3% increase per year) [25]. This underscores the importance of MortalityMinder providing state-specific analyses in order to understand social–economic determinants of health in the community context of each state.
We examined if we could use GPT-4 to reason why such differences in mortality rates are observed across states to gain a more comprehensive understanding. We repeatedly presented GPT-4 with the following comparative challenge between the top factors for two states, based on Table 2: “In Washington the top four factors contributing to deaths of despair between 2000 and 2017 were #1 ‘Older than 65’, #2 ‘Diabetes Prevalence’, #3 ‘Non-Hispanic White’, #4 ‘Food Insecure’, whereas in Arizona the top four were #1 ‘Segregation (Black/White)’, #2 ‘Single Parent Household’, #3 ‘Food Insecure’, #4 ‘American Indian/Alaskan Native’. Explain the differences between Washington’s top four factors and Arizona’s”. The exact queries are in Supplementary Material S1.
It avoided giving explanations for specific state-to-state differences, but reported that generally, such differences depend on socioeconomic conditions, policies, healthcare access, and more. GPT-4 leaned into explaining why specific factors were important for the given states; for example, it explained that Washington has high rates of opioid deaths and limited mental health services in rural areas. Similarly, Arizona has a significant population of Native American people who have higher rates of substance abuse and suicide; GPT-4’s exact response is in Supplementary Material S1.
We extended our query by asking GPT-4 to provide specific references for its explanations, including DOIs. Generally—typically 80% of the time—it cited relevant articles, but the authors and DOIs provided were unreliable. Repeated queries led to similar but not identical explanations. We found GPT-4’s justifications of its explanations imperfect but promising.

3.3. Sierra County in New Mexico

New Mexico experiences midlife mortality rates far higher than the national average. In New Mexico, the midlife mortality rate increased by 25.6% from 2000 to 2017, whereas the United States as a whole increased 8.2%. For this reason, New Mexico stands out as a state in desperate need of policy intervention to address midlife deaths. The leading factor positively associated with all cause midlife deaths in New Mexico is children in poverty.
Sierra County in New Mexico is at high risk for all causes of midlife deaths. The highest rates of midlife mortality in New Mexico are in Sierra County with approximately 1100 deaths per 100,000. The lowest-risk group cluster of counties in New Mexico has an average rate of 250 deaths per 100,000 while the medium-risk groups are around 625 per 100,000. The leading factors of midlife deaths in Sierra County are children in poverty, free or reduced lunch, socioeconomic status, and mentally unhealthy days.
Sierra County has seen growing rates of all-cause midlife mortality, and has consistently been in the high-risk group from 2000 to 2017. Midlife deaths in Sierra County due to cancer has also risen at alarming rates. They experienced a peak in 2008 with rates as high as 250 per 100,000. The leading factors associated with midlife deaths due to cancer are teen birth rate, primary care physician’s ratio, children in poverty, and single-parent households. Deaths due to cardiovascular disease are also high in Sierra County, and chart higher than the high-risk group cluster average. The leading factors associated with deaths due to Cardiovascular disease are the primary care physician’s ratio, teen birth rate, diabetes prevalence, and mental health provider ratio. The primary care physician’s ratio stands out as a very important determinant across various causes of deaths as most of New Mexico has a low number of primary care physicians [26]. Further, Sierra County has also experienced a shocking spike in deaths of despair mortality in 2009, reaching a peak of nearly 200 deaths per 100,000, compared to the national level of about 25 deaths per 100,000 and the high-risk cluster average in New Mexico of 130 per 100,000.
The visualizations and analysis of MortalityMinder lists the top destructive and protective factors for various states and counties of United States. However, it is essential to reason why these risk factors are particularly prevalent in specific states and identify actionable insights to mitigate them. Thus, we explore this with the help of LLMs in the next case study.

3.4. Driving Alone in Massachusetts

In this case study, we define user prompts for zero-shot analysis using OpenAI’s GPT-4 model. The user prompt is based on a template instantiated with the factor, direction of result, population, and cause of death as captured in MortalityMinder. We use the LLM explanation for the analysis of the risk factor “Driving Alone” for deaths of despair in counties of Massachusetts from 2000 to 2017. MortalityMinder finds a positive association between driving alone and premature mortality due to deaths of despair in Massachusetts. We use zero-shot learning with GPT-4 to explore the reasoning behind this observation. The user prompt starts with the same context definition for all three questions: User: ‘Deaths of Despair’ are deaths due to suicide, overdose, substance abuse and poisonings. ‘Driving Alone’ is the percentage of the workforce that drives alone to work. The prompt is then followed by one of the three questions: WHY, CAUSALITY, or ACTION as defined above. For example, for Action, the prompt is: Consider counties in Massachusetts between 2000 and 2017, what actions could be taken related to ‘Driving Alone’ that are likely to cause a reduction in ‘Deaths of Despair’? The prompt ends with instructions on how the response should be presented. Explain step by step. Provide published evidence. Similar prompts are created for the other two questions, with the complete responses presented in Supplementary Material S1.
Based on the prompt templates above, three types of sample prompts were created to explain the correlation MortalityMinder found between driving alone and deaths of despair. Figure 8 shows the variable parts of the prompts and summaries of the GPT-4 responses. While this connection is not intuitive, GPT-4 offers several sensible explanations for this correlation. For example, driving alone could contribute to feelings of isolation/loneliness, leading to mental health issues, resulting in higher rates of suicide and substance abuse. GPT-4 offers a plausible hypothesis for how reducing driving alone could reduce deaths of despair along with viable suggestions for mitigating the potentially negative effect of driving alone. In its response, GPT-4 found no direct studies, but it did hypothesize plausible mechanisms and provide supporting evidence, including a total of 34 verified citations across nine responses to our sample prompts.
Prompts 4–6 in the Supplement examine the interesting but unexplained fact that increased rates of non-Hispanic whites in a county is associated with increased deaths of despair, while increased Hispanics is associated with decreased deaths of despair. GPT-4 provided very different explanations illustrating that it captured the different directionalities and cultural differences. It correctly indicated that directly changing the rate of demographic groups was not a valid approach for improving deaths of despair. Additionally, the actions suggested were plausible but were largely generic.

4. Discussion

Overall, mortality rates due to deaths of despair have shown the most dramatic increase over time compared to other causes of death. Stein et al. found that deaths of despair have increased dramatically amongst white males between the ages of 25–64, particularly in rural America [3], and MortalityMinder results agree. For instance, in the State of California, MortalityMinder found that the factors associated with mortality due to deaths of despair are living in rural areas, being non-Hispanic white, food insecurity, and many others. An article in the New York Times by Kristof and WuDunn also highlights how Americans in rural areas are dying of despair and the wrong people are receiving the blame for it [27]. The article cites unemployment as the one of the causes for the problem, and again, MortalityMinder agrees. MortalityMinder picked up the percentage of people who are unemployed as one of the top factors for Deaths of Despair in the nation. As we can see from these examples, MortalityMinder helps to identify the factors associated with mortality at both the community and national level, so that policy makers and other responsible stakeholders can take action and address these problems.
During the pandemic, COVID-19 became a top-five cause of death for all age groups in the United States [28]. To capture this, we developed a variant of MortalityMinder called COVIDMINDER to reveal the regional disparities in outcomes, determinants, and mediations of the COVID-19 pandemic [29,30]. Outcomes are the direct effects of COVID-19, determinants (social and economic) are pre-existing risk factors that impact COVID-19 outcomes, and mediations are resources and programs used to combat the pandemic. We have utilized some of the analysis methods that we developed for MortalityMinder to investigate the social determinants of COVID-19 and leverage LLMs to reason these determinants at the county and state-level. Social determinants play a huge role in the geographic disparities in COVID-19 mortality and cases in the United States [31] and how these disparities change over time [32]. MortalityMinder finds community health inequities like race and access to healthcare as significant determinants which also appear to play a role in COVID-19 deaths. Therefore, it is important for stakeholders at the national, state, county, and community levels to investigate these social determinants so that they can address them.
From a public health perspective, a system that employs automated analysis-to-visualization of socioeconomic health determinants could prove incredibly effective, especially when AI is utilized to interpret context and meaning of the results. Our MortalityMinder app produces 636 unique infographics including analysis of the association of 70 varying social and economic determinants. We discovered that GPT-4 was highly effective in explaining and contextualizing the MortalityMinder results. When queried with the structured WHY, CAUSALITY, and ACTION prompts, GPT-4 conveyed meaningful responses with accurate references. However, when posed with more ad hoc questions, such as comparisons between states’ results, responses from GPT-4 were insightful but in some cases more generic. In such instances, GPT-4 occasionally included unverifiable references, suggesting some level of computational “hallucination”. The zero-shot prompting approach used here was not robust enough to handle less-structured queries. We propose that an AI system specifically fine-tuned to the public health domain, combined with techniques such as retrieval-augmented generation methodologies, could yield more reliable and beneficial results. As such, we recommend future research in this promising direction.

5. Conclusions

In this work, we introduced MortalityMinder, a web-based analysis-to-visualization tool that enables the interactive exploration of social, economic, and geographic factors associated with premature mortality among mid-life adults ages 25–64 across the United States. Using authoritative data from the CDC and other sources, MortalityMinder is developed as a freely available, publicly-accessible, and open-source application. The goal of MortalityMinder is to enable healthcare researchers, providers, payers, and policy makers to gain actionable insights into how, where, and why midlife mortality rates are rising in the United States. It is designed to help healthcare payers, providers and policymakers at the national, state, county and community levels identify and address unmet healthcare needs.
We demonstrated how AI methods can help the process of understanding MortalityMinder results on social determinants of mortality and translate them to actionable insights that can help reduce mortality. LLMs empower population-health stakeholders to automatically ask Why, Causality, and Action questions for public health observational studies like MortalityMinder. In our findings, we discovered that the responses generated by GPT-4 do a surprisingly good job of producing insights into MortalityMinder results. The advantage of a study of the socio-economic risk factors is that evaluating the plausibility of answers to WHY, CAUSALITY, and ACTION questions is quite intuitive for investigation. However, at the same time it is very difficult to come up with the answer to these questions substantiated by prior research unless one is an expert in that domain. LLMs are potentially better at this than humans. The explanations generated by LLMs in our tests are more relevant and extensive than the simpler explanations in MortalityMinder. Arguably, authors of PHOS could improve their results and the impact of their papers by simply asking the LLMs, using our approach, to investigate the meaning of their results, and verifying their accuracy. Incorporating LLM results makes PHOS much more powerful and insightful to stakeholders who may be neither experts in the analysis nor the domain.
Eventually, we imagine that PHOS can and will be coupled with LLMs to create powerful, intelligent population health systems. Imagine if population health stakeholders could ask questions in natural language, and then have the LLM translate these into PHOS; conduct the studies, including generating and running the code; analyze the meaning of the results; propose mitigations; and write a report summarizing the results. Integrating visualizations in infographics would also assist in comprehending the studies more effectively. Users could effortlessly engage with chatbots to dynamically ask public health questions based on the data, regions, determinants, and time periods of interest. However, it is crucial to retain a human within the loop to validate the results, maintain scientific rigor, and interpret results appropriately. There is a need for continued research to ensure the reliability and precision of these AI-powered public health systems. Maintaining trust and accuracy should be primary considerations while developing such advanced systems.
The social determinants of health, and more generally, population health, are excellent domains for LLM. Intelligent population health analysis could utilize the variety of readily accessible population health data sets, along with the wealth of papers (for instance, those found on PubMed) that use standard analysis methods. Given the recent research in code generation [33,34] and the results on interpreting the outcomes of PHOS, the capability for end-to-end PHOS, including comprehending the significance of results, seems entirely plausible. The availability of such systems could significantly speed up the breadth, pace, rigor, and quality of population health findings based on PHOS. But improvements in the reliability, reproducibility, and accuracy of LLMs are needed before this vision is fully realized.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/info15050254/s1, File S1: GPT-4 Prompts and Results.

Author Contributions

Conceptualization, K.B., J.S.E. and K.P.B.; methodology, K.B., J.S.E. and K.P.B.; software, K.B., J.S.E. and K.P.B.; validation, K.B., J.S.E. and K.P.B.; formal analysis, K.B., J.S.E. and K.P.B.; investigation, K.B., J.S.E. and K.P.B.; resources, J.S.E. and K.P.B.; data curation, K.B., J.S.E. and K.P.B.; writing—original draft preparation, K.B., J.S.E. and K.P.B.; writing—review and editing, K.B., J.S.E. and K.P.B.; visualization, K.B., J.S.E. and K.P.B.; supervision, J.S.E. and K.P.B.; project administration, J.S.E. and K.P.B.; funding acquisition, J.S.E. and K.P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a grant from the United Health Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and code analyzed in MortalityMinder are available on its GitHub repository: https://github.com/TheRensselaerIDEA/MortalityMinder (accessed on 11 April 2024).

Acknowledgments

MortalityMinder and COVIDMINDER was created by students in the Rensselaer Data INCITE Lab with support from the United Health Foundation and the Rensselaer Institute for Data Exploration and Applications (IDEA). We thank Resnsselaer students Lilian Ngweta, Jocelyn McConnon and Skye Jacobson for helping to draft an early version of this paper. The MortalityMinder Team would like to thank our advisory board, including Anne Yau, United Health Foundation; Dan Fabius, Continuum Health; Melissa Kamal, New York State Department of Health; and Tom White, Capital District Physicians’ Health Plan (CDPHP). We would also like to thank the communication and design professionals at Rensselaer Polytechnic Institute, who helped with design.

Conflicts of Interest

The authors declare no conflicts of interest. Anne Yau of United Health Foundation did serve on our advisory board, which provided feedback on versions of MortalityMinder. The funders had no other role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
GPTGenerative Pretrained Transformer
LLMLarge Language Model
PHPopulation Health
PHOSPublic Health Observational Studies

References

  1. Case, A.; Deaton, A. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc. Natl. Acad. Sci. USA 2015, 112, 15078–15083. [Google Scholar] [CrossRef] [PubMed]
  2. Hood, C.M.; Gennuso, K.P.; Swain, G.R.; Catlin, B.B. County health rankings: Relationships between determinant factors and health outcomes. Am. J. Prev. Med. 2016, 50, 129–135. [Google Scholar] [CrossRef] [PubMed]
  3. Stein, E.M.; Gennuso, K.P.; Ugboaja, D.C.; Remington, P.L. The epidemic of despair among White Americans: Trends in the leading causes of premature death, 1999–2015. Am. J. Public Health 2017, 107, 1541–1547. [Google Scholar] [CrossRef] [PubMed]
  4. Yancy, C.W. COVID-19 and African Americans. JAMA 2020, 323, 1891–1892. [Google Scholar] [CrossRef] [PubMed]
  5. Ratnayake, I.; Pepper, S.; Anderson, A.; Alsup, A.; Mudaranthakam, D.P. An R Shiny Application (SDOH) for Predictive Modeling Using Regional Social Determinants of Health Survey Responses. Int. J. Soc. Determ. Health Health Serv. 2024, 54, 21–27. [Google Scholar] [CrossRef] [PubMed]
  6. Kino, S.; Hsu, Y.T.; Shiba, K.; Chien, Y.S.; Mita, C.; Kawachi, I.; Daoud, A. A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects. SSM-Popul. Health 2021, 15, 100836. [Google Scholar] [CrossRef] [PubMed]
  7. Ong, J.C.L.; Seng, B.J.J.; Law, J.Z.F.; Low, L.L.; Kwa, A.L.H.; Giacomini, K.M.; Ting, D.S.W. Artificial intelligence, ChatGPT, and other large language models for social determinants of health: Current state and future directions. Cell Rep. Med. 2024, 5, 101356. [Google Scholar] [CrossRef] [PubMed]
  8. McNeill, E.; Lindenfeld, Z.; Mostafa, L.; Zein, D.; Silver, D.; Pagán, J.; Weeks, W.B.; Aerts, A.; Des Rosiers, S.; Boch, J.; et al. Uses of Social Determinants of Health Data to Address Cardiovascular Disease and Health Equity: A Scoping Review. J. Am. Heart Assoc. 2023, 12, e030571. [Google Scholar] [CrossRef] [PubMed]
  9. Friede, A.; Reid, J.A.; Ory, H.W. CDC WONDER: A comprehensive on-line public health information system of the Centers for Disease Control and Prevention. Am. J. Public Health 1993, 83, 1289–1294. [Google Scholar] [CrossRef]
  10. AHRQ. Announcing the Winners of AHRQ’s Visualization of Community-Level Social Determinants of Health Challenge; AHRQ: Rockville, MD, USA, 2020.
  11. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  12. Centers for Disease Control and Prevention. CDC Behavioral Risk Factor Surveillance System; CDC: Atlanta, GA, USA, 2020. Available online: https://www.cdc.gov/brfss/index.html (accessed on 28 November 2020).
  13. U.S. Bureau of Labor Statistics. U.S. Bureau of Labor Statistics Web Site; US-BLS: Washington, DC, USA, 2020. Available online: https://www.bls.gov/ (accessed on 28 November 2020).
  14. U.S. Federal Bureau of Investigation. National Archive of Criminal Justice Data Uniform Crime Reporting Program Data Series; University of Michigan: Ann Arbor, MI, USA, 2020; Available online: https://bit.ly/2VGevMs (accessed on 16 December 2020).
  15. Honaker, J.; King, G.; Blackwell, M. Amelia II: A Program for Missing Data. J. Stat. Softw. 2011, 45, 1–47. [Google Scholar] [CrossRef]
  16. IDEA MortalityMinder; IDEA: Troy, NY, USA, 2020.
  17. Haynes, W. Benjamini–Hochberg Method. In Encyclopedia of Systems Biology; Dubitzky, W., Wolkenhauer, O., Cho, K.H., Yokota, H., Eds.; Springer: New York, NY, USA, 2013; p. 78. [Google Scholar] [CrossRef]
  18. Chang, W.; Cheng, J.; Allaire, J.; Xie, Y.; McPherson, J. Shiny: Web Application Framework for R, R package version; Packt Publishing Ltd.: Birmingham, UK, 2017; Volume 1. [Google Scholar]
  19. RinteRface. fullPage. Available online: https://github.com/RinteRface/fullPage (accessed on 11 April 2024).
  20. Álvaro. fullPage.js. Available online: https://github.com/alvarotrigo/fullPage.js (accessed on 11 April 2024).
  21. ChatBS: A Context-Aware LLM Exploratory Sandbox (Site). Available online: https://tw.rpi.edu/project/chatbs-context-aware-llm-exploratory-sandbox (accessed on 6 December 2023).
  22. ChatBS: A Context-Aware LLM Exploratory Sandbox (App). Available online: https://inciteprojects.idea.rpi.edu/chatbs/app/chatbs/ (accessed on 6 December 2023).
  23. Cohn, M. Coronavirus Pandemic Drives Unemployment to Record High. Accounting Today. 2020. Available online: https://www.accountingtoday.com/news/unemployment-reaches-record-high-from-coronavirus-pandemic (accessed on 11 April 2024).
  24. Remington, T.F. The COVID-19 Pandemic and “Rising Deaths of Despair” in the United States. In COVID-19 Pandemic: Problems Arising in Health and Social Policy; Springer: Berlin/Heidelberg, Germany, 2023; pp. 57–71. [Google Scholar]
  25. Lippold, K.; Ali, B. Racial/ethnic differences in opioid-involved overdose deaths across metropolitan and non-metropolitan areas in the United States, 1999–2017. Drug Alcohol Depend. 2020, 212, 108059. [Google Scholar] [CrossRef] [PubMed]
  26. Winnegar, A. Difficulties of Finding a Primary Care Doctor. Santa Fe New Mexican. 2020. Available online: https://www.santafenewmexican.com/life/family/difficulties-of-finding-a-primary-care-doctor/article_9692e624-dd93-11ea-ae9e-bfa19b7231a8.html (accessed on 11 April 2024).
  27. Kristof, N.; Wudunn, S. Who Killed the Knapp Family? The New York Times. 9 January 2020. Available online: https://www.nytimes.com/2020/01/09/opinion/sunday/deaths-despair-poverty.html (accessed on 11 April 2024).
  28. Shiels, M.S.; Haque, A.T.; Berrington de González, A.; Freedman, N.D. Leading Causes of Death in the US During the COVID-19 Pandemic, March 2020 to October 2021. JAMA Intern. Med. 2022, 182, 883–886. [Google Scholar] [CrossRef] [PubMed]
  29. COVIDMINDER (WEBSITE): Revealing the Regional Disparities in Outcomes, Determinants, and Mediations of the COVID-19 Pandemic. Available online: https://idea.rpi.edu/research/projects/covidminder (accessed on 11 December 2023).
  30. COVIDMINDER (APPLICATION): Revealing the Regional Disparities in Outcomes, Determinants, and Mediations of the COVID-19 Pandemic. Available online: https://inciteprojects.idea.rpi.edu/apps/covidminder/ (accessed on 11 December 2023).
  31. Debopadhaya, S.; Sprague, A.D.; Mou, H.; Benavides, T.L.; Ahn, S.M.; Reschke, C.A.; Erickson, J.S.; Bennett, K.P. Social Determinants Associated with COVID-19 Mortality in the United States. medRxiv 2020. medRxiv:28.20183848. [Google Scholar] [CrossRef]
  32. Debopadhaya, S.; Erickson, J.S.; Bennett, K.P. Temporal Analysis of Social Determinants Associated with COVID-19 Mortality. medRxiv 2021. medRxiv:22.21258971. [Google Scholar] [CrossRef]
  33. Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; de Oliveira Pinto, H.P.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374. [Google Scholar]
  34. Vaithilingam, P.; Zhang, T.; Glassman, E.L. Expectation vs. & Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In Proceedings of the Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA’22), New York, NY, USA, 30 April–5 May 2022. [Google Scholar] [CrossRef]
Figure 1. Deaths of despair nationwide analysis: (a) Counties across the United States are clustered into 6 risk groups (1: low to 6: high) based on the deaths of despair mortality trends; (b) the risk group trends are plotted against the national average, demonstrating the rise of mortality rates across years.
Figure 1. Deaths of despair nationwide analysis: (a) Counties across the United States are clustered into 6 risk groups (1: low to 6: high) based on the deaths of despair mortality trends; (b) the risk group trends are plotted against the national average, demonstrating the rise of mortality rates across years.
Information 15 00254 g001
Figure 2. Flow of data through the MortalityMinder application. (a) Data are collected from Country Health Rankings (social determinants) and CDC WONDER (mortality rates). Missing data are imputed, data are clustered into 3-year chunks and risk groups are identified using K-Means. (b) The imputed mortality rates for a given three-year period (e.g., 2015–2017) are plotted as a choropleth plot, with a darker color representing higher mortality. (c) Each state (e.g., Ohio) can then be further explored based on the risk group clusters: high (red), medium (orange), and low (yellow). (d) Finally, top social determinants are listed as destructive or protective factors based on their Kendall correlation between the factors and the ordered mortality risk groups.
Figure 2. Flow of data through the MortalityMinder application. (a) Data are collected from Country Health Rankings (social determinants) and CDC WONDER (mortality rates). Missing data are imputed, data are clustered into 3-year chunks and risk groups are identified using K-Means. (b) The imputed mortality rates for a given three-year period (e.g., 2015–2017) are plotted as a choropleth plot, with a darker color representing higher mortality. (c) Each state (e.g., Ohio) can then be further explored based on the risk group clusters: high (red), medium (orange), and low (yellow). (d) Finally, top social determinants are listed as destructive or protective factors based on their Kendall correlation between the factors and the ordered mortality risk groups.
Information 15 00254 g002
Figure 3. MortalityMinder Application National View: distribution of deaths of despair mortality rates across United States and its comparison with Massachusetts.
Figure 3. MortalityMinder Application National View: distribution of deaths of despair mortality rates across United States and its comparison with Massachusetts.
Information 15 00254 g003
Figure 4. MortalityMinder Application State View: mortality rates and risk groups for counties of Massachusetts for deaths of despair.
Figure 4. MortalityMinder Application State View: mortality rates and risk groups for counties of Massachusetts for deaths of despair.
Information 15 00254 g004
Figure 5. MortalityMinder Application Factor View: exploring diabetes prevalence as a prominent social determinant for deaths of despair mortality in Massachusetts.
Figure 5. MortalityMinder Application Factor View: exploring diabetes prevalence as a prominent social determinant for deaths of despair mortality in Massachusetts.
Information 15 00254 g005
Figure 6. MortalityMinder application documentation about the application with links to data and application code.
Figure 6. MortalityMinder application documentation about the application with links to data and application code.
Information 15 00254 g006
Figure 7. Comparing the top destructive and protective social determinants across the states with (a) the highest increase in deaths of despair mortality (Ohio) and (b) the lowest increase in deaths of despair mortality (Texas).
Figure 7. Comparing the top destructive and protective social determinants across the states with (a) the highest increase in deaths of despair mortality (Ohio) and (b) the lowest increase in deaths of despair mortality (Texas).
Information 15 00254 g007
Figure 8. Variable part of prompts for WHY, CASUALTY, and ACTION questions with summaries of responses from GPT-4. See Supplement Prompts 1–3 for full text.
Figure 8. Variable part of prompts for WHY, CASUALTY, and ACTION questions with summaries of responses from GPT-4. See Supplement Prompts 1–3 for full text.
Information 15 00254 g008
Table 1. Social determinants affecting deaths of despair mortality across the United States as indicated by Kendall correlation with risk groups. A destructive determinant is positively correlated, while a protective determinant is negatively correlated.
Table 1. Social determinants affecting deaths of despair mortality across the United States as indicated by Kendall correlation with risk groups. A destructive determinant is positively correlated, while a protective determinant is negatively correlated.
RelationshipNameCorrelation
DestructiveMentally unhealthy days0.22
DestructiveFrequent mental distress0.19
DestructivePhysically unhealthy days0.17
DestructivePct unemployed0.15
DestructiveAdult smoking0.15
DestructiveSegregation (black/white)0.15
DestructiveFreq. physical distress0.15
DestructiveMental health prov. rate0.14
DestructiveSocioeconomic0.14
DestructiveOther prim. care prov. rate0.12
DestructiveNon-Hispanic white0.11
DestructiveDiabetes prevalence0.11
DestructiveInsufficient sleep0.11
ProtectiveYounger than 18−0.11
ProtectiveMental health prov. ratio−0.13
Table 2. Top four destructive and top four protective social determinants for Washington, Arizona, New Jersey, and Massachusetts. Ranking determined by Kendall correlation of determinant with risk groups.
Table 2. Top four destructive and top four protective social determinants for Washington, Arizona, New Jersey, and Massachusetts. Ranking determined by Kendall correlation of determinant with risk groups.
StateDestructiveProtective
Washington#1 Older than 65#1 Younger than 18
#2 Diabetes prevalence#2 Not proficient in English
#3 Non-Hispanic white#3 Hispanic
#4 Food-insecure#4 Sexual trans. infect.
Arizona#1 Segregation (black/white)#1 Food environment index
#2 Single-parent household#2 Hispanic
#3 Food-Insecure#3 Native Hawaiian Islander
#4 American Indian/Alaskan Native#4 Air quality
New Jersey#1 Mentally unhealthy Days#1 Food environment index
#2 Limited access to healthy food#2 80th percentile income
#3 Pct unemployed#3 Asian
#4 Adult smoking#4 Dentist rate
Massachusetts#1 Diabetes prevalence#1 Some college
#2 Socioeconomic#2 Prim. care physicians
#3 Driving alone#3 Flu vaccinated
#4 Disconnected youth#4 Asian
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bhanot, K.; Erickson, J.S.; Bennett, K.P. MortalityMinder: Visualization and AI Interpretations of Social Determinants of Premature Mortality in the United States. Information 2024, 15, 254. https://doi.org/10.3390/info15050254

AMA Style

Bhanot K, Erickson JS, Bennett KP. MortalityMinder: Visualization and AI Interpretations of Social Determinants of Premature Mortality in the United States. Information. 2024; 15(5):254. https://doi.org/10.3390/info15050254

Chicago/Turabian Style

Bhanot, Karan, John S. Erickson, and Kristin P. Bennett. 2024. "MortalityMinder: Visualization and AI Interpretations of Social Determinants of Premature Mortality in the United States" Information 15, no. 5: 254. https://doi.org/10.3390/info15050254

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop