A Multiyear Model of Influenza Vaccination in the United States

Vaccinating adults against influenza remains a challenge in the United States. Using data from the Centers for Disease Control and Prevention, we present a model for predicting who receives influenza vaccination in the United States between 2012 and 2014, inclusive. The logistic regression model contains nine predictors: age, pneumococcal vaccination, time since last checkup, highest education level attained, employment, health care coverage, number of personal doctors, smoker status, and annual household income. The model, which correctly classifies 67 percent of the 2013 data, is consistent with models tested on the 2012 and 2014 datasets. Thus, we have a multiyear model to explain and predict influenza vaccination in the United States. The results indicate room for improvement in vaccination rates. We discuss how cognitive biases may underlie reluctance to obtain vaccination. We argue that targeted communications addressing cognitive biases could be useful for effective framing of vaccination messages, thus increasing the vaccination rate. Finally, we discuss limitations of the current study and questions for future research.


Introduction
Vaccine avoidance is a serious problem, but its severity is not yet recognized by many adults. Their refusal to vaccinate contributes to increased incidence of preventable diseases, such as measles, putting population health in danger [1,2]. Influenza, which the majority of adults survive without serious side effects, can result in hospitalization or death, particularly among the elderly [3].
The United States' Centers for Disease Control and Prevention (CDC) describes the benefits of influenza vaccines for vulnerable populations, such as children, the elderly, the chronically ill, and pregnant women. Influenza vaccination reduces the risk of influenza-related hospitalization for children and the elderly, while preventing chronic disease-based hospitalization for people with health conditions such as diabetes or chronic lung disease. It can also help prevent cardiac events in people with cardiac disease. Influenza vaccination reduces the risk of respiratory infections for pregnant women, and may protect their infants through antibodies passed via breast milk. Finally, the CDC points out that influenza vaccination helps not only the recipient but also the community through herd immunity.
The CDC recommends influenza vaccination for everyone 6 months of age and older. The only contraindications are allergies to eggs, to the influenza vaccine, or to any of its ingredients, and a Guillain-Barré Syndrome (GBS) occurrence in the six weeks prior to influenza vaccination. Those who experience only hives as an allergic reaction to eggs are nevertheless recommended to receive the vaccine. Individuals with non-life-threatening allergic reactions other than hives should be vaccinated under medical supervision. GBS is a rare autoimmune disorder associated with rapid-onset muscle weakness, and sometimes paralysis [3]. Individuals who have a recent history of GBS, and who are also at high risk for influenza complications, should nevertheless receive the influenza vaccination in some cases. Considering the rarity of GBS and egg allergies in adults, as well as the sub-populations within these groups who should nevertheless receive influenza vaccination, the vast majority of adults should obtain it annually [3].
Int. J. Environ. Res. Public Health 2017, 14, 849
Many factors contribute to the failure to vaccinate. Common ones include perceived vaccine efficacy weighed against the unpleasantness of the vaccination process or its side effects, the perceived adequacy of one's natural immunity against viruses, the cost in time or money, and distrust of the medical establishment or pharmaceutical companies [1]. Many people may be complacent about influenza, believing their good current health is a strong enough defense against viruses [4]. Some may not be persuadable, even with the prospect of a severe influenza season and a steady campaign of appeals to personal health, public health, and the desire to protect one's family. Other people, conversely, will obtain a vaccination readily, with only one or two reminders from their doctor [5]. Finally, there are in-between people, those who need several reminders from various sources.
A review of the literature on influenza vaccination determined the most common reasons given for obtaining or declining it. The top three reasons for seeking influenza vaccination are self-protection, protection of family or community, and, among health care workers, protection of patients. The top three reasons for declining influenza vaccination are the perception that one is healthy and not at risk [6], skepticism regarding vaccine efficacy [7], and unpleasant side effects. In sum, many factors contribute to the public's beliefs, attitudes, and intentions, and ultimately to whether a vaccination is obtained.
Given so many plausible contributing factors, it would be useful to model rigorously the variety of factors contributing to the likelihood of receiving a seasonal influenza vaccination. To better communicate with the segment of the population who may be persuaded, we need to understand the variety of factors weighing on the decision to vaccinate, including demographics (education level, income level, age, and gender), behavior (nutrition, exercise, smoking, etc.), health care coverage, and current health status.
In this paper, we present a classification model of whether a person receives a flu vaccination. The data is from the CDC's Behavioral Risk Factor Surveillance System (BRFSS) annual survey, a nationwide phone-based survey of adults about health-related behaviors [8]. The response variable is the binary variable Influenza Vaccination, answering the question "During the past 12 months, have you had either a flu shot or a flu vaccine?" We review our methods in Section 2 and show the results in Section 3. In Section 4, we discuss the results, some implications, possible communication strategies, and some ideas for further research. Section 5 concludes with a summary of the paper.

Methods
The BRFSS data is publicly available online [8]. It is a health-related phone survey of adults (at least 18 years of age), conducted in fifty states, the District of Columbia, and three U.S. territories. State health departments use in-house interviewers or contract with universities or telephone call centers to administer the surveys throughout the year. It is the largest continuous health survey in the world, and is offered in both English and Spanish. Participants are administered a standardized core questionnaire, as well as optional modules and state-specific questions. The survey is conducted using Random Digit Dialing on cell phones and land lines.
We used three recent years of BRFSS data: 2012 to 2014. First, the 2013 dataset was used to build the model. Specifically, a logistic regression was developed on 60% of the data from 2013, validated on a holdout sample of 20% of the data, and tested on the final 20% of the data. This is consistent with standard data mining practices to avoid model overfitting [9]. The best 2013 model was tested further on the 2012 and 2014 BRFSS datasets to assess whether predictors of vaccination changed over time. All data (sub)sampling was random, and there was no evidence of significant collinearity among the nine predictors in any model. Ultimately, the model was stable during the 2012-2014 time period.
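As an illustration of the 60/20/20 partition described above, the following is a minimal sketch, not the authors' actual procedure. The function name `split_60_20_20`, the seed, and the integer ids standing in for respondent records are all assumptions made for the example.

```python
import random

def split_60_20_20(rows, seed=0):
    """Randomly partition rows into training (60%), validation (20%), and test (20%) sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids floating-point rounding
    n_valid = n * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, about 20%
    return train, valid, test

# Hypothetical stand-in for respondent records: integer ids only.
respondents = range(1000)
train, valid, test = split_60_20_20(respondents)
print(len(train), len(valid), len(test))  # 600 200 200
```

The model would be fitted on `train`, tuned against `valid`, and evaluated once on `test`, so that the reported accuracy is not inflated by overfitting.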
We initially gathered data from the 2013 BRFSS dataset (n = 491,773) and performed transformations to clean and standardize the data. Some non-Gaussian numerical variables were transformed to be more Gaussian with a mathematical function, e.g., logarithm. Almost all the variables had a small fraction of Don't Know/Not Sure, Refused to Answer, or Missing/No Answer/Blank. In our model, we grouped those responses into one category: Missing. For the variable Pneumococcal Vaccination, the proportion of Don't Know/Not Sure was almost 10%, so we left that category as is. Ultimately, out of 359 possible variables, using knowledge of the domain, 75 variables were deemed to be clean, complete, and usable for modeling, i.e., having significant variability.
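The grouping of nonresponse categories can be sketched as follows. This is only an illustration: the text labels are assumptions (the BRFSS files themselves use numeric codes), and `recode_response` is a hypothetical helper name.

```python
# Hypothetical response labels; the actual BRFSS files use numeric codes
# that differ from variable to variable.
NONRESPONSE_LABELS = {
    "Don't Know/Not Sure",
    "Refused to Answer",
    "Missing/No Answer/Blank",
}

def recode_response(variable, value):
    """Collapse nonresponse labels into 'Missing', except that
    Don't Know/Not Sure is kept as its own category for Pneumococcal Vaccination."""
    if value is None or value == "":
        return "Missing"
    if variable == "Pneumococcal Vaccination" and value == "Don't Know/Not Sure":
        return value
    if value in NONRESPONSE_LABELS:
        return "Missing"
    return value

examples = [
    ("Health Care Coverage", "Refused to Answer"),
    ("Health Care Coverage", "Yes"),
    ("Pneumococcal Vaccination", "Don't Know/Not Sure"),
    ("Pneumococcal Vaccination", "Refused to Answer"),
]
for variable, value in examples:
    print(variable, "|", value, "->", recode_response(variable, value))
```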
We conducted a standard logistic regression, starting with the initial set of 75 usable predictor variables and the binary response variable Influenza Vaccination [10]. The response variable is nearly balanced, e.g., 46% Yes vs. 54% No in the 2013 data. The fitted model predicted True if the fitted probability exceeded 0.5 and False otherwise. Our methodology for variable selection was stepwise: Forward Selection (Conditional), maximizing adjusted R-Squared, until the final model was obtained. The final model of nine predictors maximized predictive power (adjusted R-Squared), contained only statistically significant predictors, and contained little collinearity (VIF < 5 for each predictor). The response variable and the nine predictors are found in Table 1.

Results
In Figure 1, the confusion matrices and the receiver operating characteristic (ROC) curves, summarized by the area under the curve (AUC), show the predictive accuracy of the models. Figure 1b shows that sixty-seven percent of the 2013 cases were classified correctly as receiving influenza vaccination (27%) or not (40%), based on the nine predictors. The red line shows a smooth tradeoff between the false positive rate and the true positive rate. The gray diagonal line represents chance classification. The AUC for 2013 is 72% [12]. The same nine-predictor model applied to the 2012 and 2014 datasets yielded similar confusion matrices and ROC curves (Figure 1a,c) [5,6]. The overall trend in vaccination is shown by the three confusion matrices. From 2012 to 2014, people received an influenza vaccination at approximately the same rate, 44% (2012) to 46% (2013) to 47% (2014), while the model correctly classified 67-68% of cases in each of the three years.
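To make the evaluation quantities concrete, the sketch below shows how a 0.5 probability cutoff produces a confusion matrix and an accuracy figure, and how AUC can be computed as the probability that a randomly chosen vaccinated respondent receives a higher fitted probability than a randomly chosen unvaccinated one. The data are made up for illustration and the function names are hypothetical; this is not the authors' code.

```python
def confusion_matrix(y_true, p_hat, threshold=0.5):
    """Count (TP, FP, TN, FN), predicting True when p_hat exceeds the threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted = p > threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, p_hat):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [p for y, p in zip(y_true, p_hat) if y]
    neg = [p for y, p in zip(y_true, p_hat) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up toy data: 1 = vaccinated, 0 = not vaccinated.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.55, 0.45]

tp, fp, tn, fn = confusion_matrix(y_true, p_hat)
print(tp, fp, tn, fn)              # 3 1 3 1
print(accuracy(tp, fp, tn, fn))    # 0.75
print(auc(y_true, p_hat))          # 0.8125
```

In the same way, the 67% figure reported for 2013 is the sum of the true positive share (27%) and the true negative share (40%) of all cases.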
Overall, Table 3 shows the nine predictors, ranked by mean analysis of deviance, across the three years. This shows the relative importance of the nine predictors in a given year as well as their relative importance from year to year. We see stability among the predictors, with some variation in the rankings. Age, Pneumococcal Vaccination, and Time Since Last Checkup are the top three predictors, and they never change rank.
Table 4 shows the details of the logistic models for the three years. We consider each of the nine predictors and interpret the results. For each categorical variable, the referent category is the first one, which has a beta coefficient of zero in the table. We have omitted the Missing category because, even as an aggregate category, its proportion of the data for each variable was small, and the result would be less interesting to interpret.
Age: Age is the only numerical variable in the model. The odds ratio is 1.015 per additional year of age, meaning that each additional year increases the odds of having received an influenza vaccination by 1.5%.
Pneumococcal Vaccination: The referent category is Yes, received a pneumococcal vaccination. If the respondent has instead not received a pneumococcal vaccination, the odds of an influenza vaccination decrease strongly with an odds ratio of 0.365, which means a decrease of 1 − 0.365 = 63.5% in the odds. If the respondent does not know or is not sure whether a pneumococcal vaccination was received, the odds ratio is 0.456 compared to the referent category.
Time Since Last Checkup: The referent category is having a checkup less than one year ago. The other categories show odds ratios (each compared to the referent case) steadily decreasing as the time since last checkup grows longer. If the last checkup was between 1 and 2 years ago, the odds decrease strongly compared to the referent case with an odds ratio of 0.716, which means a decrease of 1 − 0.716 = 28.4% in the odds. If the last checkup was between 2 and 5 years ago, the odds decrease further with an odds ratio of 0.579 compared to the referent case. If the last checkup was greater than 5 years ago, the odds decrease further (odds ratio = 0.452 compared to the referent case). The exception to this trend is the case of never having received a checkup, which is rare (< 1% of respondents). In this case, the odds ratio (0.637 compared to the referent case) is between that of a checkup 1-2 years ago and a checkup 2-5 years ago.
Number of Personal Doctors: The referent category is having exactly one personal doctor. If the respondent has more than one personal doctor, the difference is not statistically significant. If the respondent has no personal doctor, the odds of an influenza vaccination decrease strongly with an odds ratio of 0.657, which means a decrease of 1 − 0.657 = 34.3% relative to having exactly one personal doctor.
Annual Household Income: The categories are ordinal, and income of $10,000 or less was used as the referent category. If the respondent has an annual household income of $10,000-$15,000, the difference is not statistically significant. Starting at the next bracket (category of income >$15,000), the odds of an influenza vaccination gradually increase as the income increases: from an odds ratio of 1.054 for the income level of $15,000-$20,000, up to an odds ratio of 1.447 for the income level of greater than $75,000 (all odds ratios are compared to the referent case; see Table 4 for all categories).
Smoking Status: The referent category is smoking Every Day. If the respondent smokes only Some Days, the odds of an influenza vaccination increase with an odds ratio of 1.134, which means an increase of 13.4% relative to smoking every day. If the respondent is a former smoker (ex-smoker), the odds increase more strongly, with an odds ratio of 1.416 compared to the referent case; this increase is 28.2 percentage points larger than the increase for smoking only some days. If the respondent has never smoked, the increase in odds is smaller than for former smokers, with an odds ratio of 1.372 compared to the referent case.
Health Care Coverage: The referent category is Yes, having some kind of health care coverage, private or public. If the respondent answers No, the odds of an influenza vaccination decrease strongly with an odds ratio of 0.635, which means a decrease of 1 − 0.635 = 36.5% relative to having some kind of coverage.
Employment: The referent category is Employed for Wages. If the respondent is Self-employed, the odds of an influenza vaccination decrease strongly with an odds ratio of 0.629 compared to the referent case, which means a decrease of 1 − 0.629 = 37.1% in the odds. If the respondent has been out of work for more than 1 year, the odds decrease with an odds ratio of 0.847 compared to the referent case, which means a decrease of 1 − 0.847 = 15.3% relative to being Employed for Wages. The cases of being out of work for less than 1 year and being a homemaker show decreases in odds quite similar to being out of work for more than 1 year, with odds ratios of 0.865 and 0.870, respectively, compared to the referent case. The only category more likely than the referent to have an influenza vaccination is Student: the odds increase slightly with an odds ratio of 1.077, which means an increase of 7.7% relative to being Employed for Wages. If the respondent is Retired, the odds decrease slightly with an odds ratio of 0.972, which means a decrease of 2.8% relative to being Employed for Wages. The category of Unable to Work is not statistically different from being Employed for Wages.
Highest Level of Education Attained: The referent category is Did Not Graduate High School. If the respondent is a high school graduate, the odds of an influenza vaccination decrease slightly with an odds ratio of 0.964 compared to the referent case, with a marginally significant p-value between 0.01 and 0.05. If the respondent has attended but not graduated from college or technical school, the difference is not statistically significant from the referent case. If the respondent is a College or Technical School Graduate, the odds increase strongly with an odds ratio of 1.275 compared to the referent case.
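The conversions used in the interpretations above can be sketched in a few lines. The odds ratios are taken from the text; the function names are hypothetical, and the ten-year age example assumes, as a single age coefficient implies, that the per-year odds ratio compounds multiplicatively.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in odds relative to the referent category."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_from_beta(beta):
    """A logistic regression coefficient maps to an odds ratio by exponentiation."""
    return math.exp(beta)

def compounded_odds_ratio(per_unit_odds_ratio, units):
    """Odds ratio for a change of several units in a numeric predictor."""
    return per_unit_odds_ratio ** units

def relative_odds_ratio(odds_ratio_a, odds_ratio_b):
    """Odds of category A relative to category B when both share one referent."""
    return odds_ratio_a / odds_ratio_b

print(round(percent_change_in_odds(0.365), 1))  # -63.5 (no pneumococcal vaccination)
print(round(percent_change_in_odds(1.134), 1))  # 13.4 (smokes some days)
print(odds_ratio_from_beta(0.0))                # 1.0 (referent category, beta of zero)
print(round(compounded_odds_ratio(1.015, 10), 3))   # about 1.161 for ten more years of age
print(round(relative_odds_ratio(1.416, 1.134), 3))  # about 1.249, former vs. some-days smokers
```

Note that comparing two non-referent categories calls for a ratio of their odds ratios: former smokers have roughly 25% higher odds than some-days smokers, which differs from the 28.2-point gap between their percent increases over the referent.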

Discussion
We revisit the nine predictors and interpret the results broadly. Age: Our results show an overall increase in the odds of influenza vaccination as one ages. Elderly individuals have more medical problems, e.g., stroke, myocardial infarction, pneumonia, chronic respiratory tract infections; thus they see a doctor more frequently, thereby receiving messages that they are at greater risk of complications from influenza [13][14][15][16]. Note: regarding influenza between 2010 and 2013, 54-70% of hospitalizations and 71-85% of deaths occurred among adults aged ≥65 years, adding to the significance of doctor-patient communications [17]. The elderly are thus reminded, perhaps frequently, that an influenza vaccination is effective at reducing and preventing morbidity and mortality [18]. The age effect is stable from 2012 to 2014, inclusive.
Pneumococcal Vaccination: The Pneumococcal Vaccination effect is the second most statistically significant of the nine predictors and it correlates positively with influenza vaccination. This could indicate convenience, obtaining both vaccinations in the same medical/clinical session. It could also indicate the lack of needle-phobia [19]. Note: the CDC's policy for adults is that it recommends pneumococcal vaccination for all adults 65 years or older, adults who smoke, and others with certain medical conditions. The Pneumococcal Vaccination effect is stable from 2012 to 2014, inclusive.
Time Since Last Checkup: Time since one's last checkup is a strong and consistent predictor of influenza vaccination. In general, the more recent a checkup, the greater the odds of an influenza vaccination. This could be a result of the reminder frequency or a recency effect. The one exception is adults who have never had a checkup. An adult in that category is somewhat more likely to receive an influenza vaccination than someone who has gone more than five years without a checkup. The Time Since Last Checkup effect is stable from 2012 to 2014, inclusive.
Number of Personal Doctors: Those with no personal doctor are less likely to receive influenza vaccination than those with one or more personal doctors. One does not need a personal doctor to obtain an influenza vaccination, however [20]. Many clinics, local government offices, and pharmacies provide the vaccination at little to no cost. This may indicate that someone without a personal doctor may be disinclined to seek medical attention or may be misinformed about the benefits of vaccination [21]. The effect is stable from 2012 to 2014, inclusive.
Annual Household Income: The models show the trend that greater income corresponds to greater odds of influenza vaccination, particularly at the higher income levels. This can be interpreted simply, that greater annual household income may indicate greater educational attainment and thereby obtaining information from reliable sources [22]. The Annual Household Income effect is stable from 2012 to 2014, inclusive, in most of the categories.
Smoking Status: The models show that in general, the less one smokes, the more likely it is that an influenza vaccination is obtained. It is interesting to note the exception: that an ex-smoker is even more likely to obtain vaccination than someone who has never smoked. A possible explanation may be that someone who quits smoking has consciously achieved a difficult health goal, whereas someone who has never smoked has not achieved that difficult goal. The ex-smokers are thus taking greater deliberate care of themselves, whereas those who have never smoked may be complacent [4]. The Smoking Status effect is stable from 2012 to 2014, inclusive.
Health Care Coverage: Those with any form of Health Care Coverage, private or public, are much more likely to receive influenza vaccination. A straightforward explanation is that the cost of the vaccination would not be a concern for those having coverage. The Health Care Coverage effect is stable from 2012 to 2014, inclusive.
Employment: The results for this variable can be interpreted from both an infection and a communication perspective. Employed for Wages, the most common category, is one in which adults are usually exposed to many people at work. There are therefore two possible factors that increase the odds of vaccination in those adults: verbal reminders (to vaccinate) and viral sources (if they fail to do so). All the other categories expose an adult to fewer of these two sources, except perhaps for the Student category. Students on a university campus are usually exposed to even more reminders and viruses than someone working for wages, but the Student category is not statistically significant in 2012 or 2014. Nevertheless, this pattern may suggest that availability of information and social interaction or contagion are important considerations. The Employment effect is stable from 2012 to 2014, inclusive, in most of the categories.
Highest Level of Education Attained: The categories of High School Graduate and College or Technical School Attendee both have odds ratios close to 1 and p-values larger than 0.01. It is reasonable to consider these two categories combined with the referent case, so that this variable is simplified to two levels: College or Technical School Graduate vs. lower level of education attained. It can be seen that college or technical school graduates are significantly more likely to receive influenza vaccination. The reasons may be that they are more receptive to medical advice, more information literate [23], or less swayed by misinformation found on the Internet [21]. The Highest Level of Education Attained effect is stable for College or Technical School Graduate from 2012 to 2014, inclusive.
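The simplification to two education levels could be expressed as a recoding like the one below. This is a sketch only: the label strings follow the category names used in the text, the combined label "Less Than College or Technical School Graduate" is invented for the example, and leaving Missing as its own category is an assumption.

```python
GRADUATE = "College or Technical School Graduate"
# Hypothetical label for the combined lower levels.
BELOW_GRADUATE = "Less Than College or Technical School Graduate"

def collapse_education(level):
    """Reduce the education variable to graduate vs. any lower level.
    'Missing' is passed through unchanged (an assumption of this sketch)."""
    if level == "Missing":
        return level
    return GRADUATE if level == GRADUATE else BELOW_GRADUATE

for level in [
    "Did Not Graduate High School",
    "High School Graduate",
    "College or Technical School Attendee",
    "College or Technical School Graduate",
]:
    print(level, "->", collapse_education(level))
```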

Implications and Future Research
We address some implications of our findings, as well as communication strategies based on cognitive bias, limitations of this research, and ideas for future vaccination research.
Our results contain several findings that are not surprising. One is more likely to receive an influenza vaccination if one is older, has health care coverage, or has more frequent checkups, etc. Some readily apparent implications are suggested. For example, health communications in physician offices and pharmacies, e.g., posters or pamphlets, need to be appropriate to the age of the people likely to encounter them. Also, they should be designed for a lower education level to reach those who are less likely to vaccinate [24]. Some implications are less readily apparent. For example, one could tailor communications differently to the various employment categories: those who are Employed for Wages, Students, Homemakers, Unemployed, Self-Employed, Retired, or Unable to Work. Adults in those categories are exposed to different physical and social environments. The overall idea is to immunize the overall population and thereby achieve herd immunity [25,26] by optimizing communications to different subpopulations of the herd. Future research could investigate the tailoring of messages to these categories, and more generally, to people claiming to be too busy to vaccinate [27] or who postpone it repeatedly [28].
This paper shows that graduates of college or technical school are more likely to receive influenza vaccination, but even they can fail to receive a vaccination. When uninformed or uncertain, people, whether highly educated or not, may rely on cognitive biases. Cognitive biases are mental rules of thumb, i.e., heuristics that help people to make adequate decisions with a low level of effort. They are not optimal, systematic, or complete in their information processing, but they are usually adequate. Two examples are the Availability Heuristic and the Bandwagon Effect [29].
Availability Heuristic. The CDC, under its 2016 message strategy, referred to influenza as a serious disease that may result in hospitalization and possibly death [3]. This is scientifically accurate, but many people are ignorant or skeptical about the true risk posed by influenza. This may be an example of the Availability Heuristic, the estimation of likelihood based on the retrievability of cases from an individual's memory, which is influenced by social communications. The United States population is currently greater than 324 million, and the upper limit of influenza deaths in 1976-2007 is 49,000 in one season; even in such a season, roughly 0.015 percent of the population dies from influenza. People are therefore much more likely not to know a person who died from influenza than to know one who did.
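The proportion behind this comparison can be checked directly. The sketch below uses only the two figures quoted in the paragraph and, as a simplification, treats the population as exactly 324 million.

```python
population = 324_000_000      # "greater than 324 million", taken as a round figure
max_seasonal_deaths = 49_000  # upper limit for one season, 1976-2007

share = max_seasonal_deaths / population
print(f"{share * 100:.4f} percent")                   # 0.0151 percent
print(f"about 1 in {round(1 / share):,} people")      # about 1 in 6,612 people
```

A death rate on the order of one in several thousand people per season, at most, is consistent with few individuals having a personally memorable case to retrieve.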
Bandwagon Effect. For another example of cognitive bias, if a friend, family member, or coworker says that an influenza vaccination is not efficacious, the Bandwagon Effect may result. The Bandwagon Effect compounds the problem of someone being misinformed/skeptical, because such a person may be surrounded by likeminded people. This would be herd immunity against messages about the importance of influenza vaccination.
Rather than being dismissed as irrational shortcuts around systematic, logical thinking, or ignored completely, cognitive biases could be used as a basis for decision guidance toward normative decision-making. The research question would be how to identify the cognitive bias in the given situation, or given subpopulation, and how to nudge it toward the normative vaccination behavior. Future research could investigate different cognitive biases that may need to be anticipated in order to help public health communication campaigns, whether aimed broadly at the overall population or narrowly at different subpopulations. Which populations can be persuaded to reconsider and rethink their cognitive biases if sent well-framed messages?
Although the BRFSS datasets have been shown to be reliable and valid [30], it is important to keep several limitations in mind. One, this study, although examining 2012-2014 datasets, is multiple-snapshot, cross-sectional research. Cross-sectional designs do not allow for causal inferences. Two, data self-reported in response to a telephone survey could be underreported or biased in some variables, especially in ones with social stigma, e.g., smoking or low household income [3,31]. Finally, institutionalized individuals, including those living in hospitals, nursing homes, or prisons are excluded.
Despite these limitations, our large sample size, rigorous data sampling, and model development with multiple validation and test samples give us a stable, multiple-year model of those who participated. There will always be some bias in survey response data. We assumed that any such bias is of low magnitude, given the rigorous reputation of BRFSS [30], and not specific to any particular demographic or behavioral category. There also may be timing issues, such as people not having a checkup, and therefore not receiving vaccination reminders, in the year after a mild influenza season. The stability of the model over three years, 2012-2014, helps to guard against such timing issues.
To make our model more predictive, new research questions could be investigated. What is the impact of a person's health information source preference: doctor, online consumer health websites, or social networks [32,33]? How influential are the beliefs about the health of a person's online or offline community? How influential is a person's trust in information from the health care system, personal doctors, pharmaceutical companies, government, or social media? How significant a factor is a person's anxiety about vaccination infection [34] or fear of hypodermic needles [19]?

Conclusions
An influenza-induced fever is common and usually lasts two to four days [3,34]. Millions of people fall ill to influenza every year in the United States and recover in a few days [15]. Almost everyone has experienced it and knows others who have suffered similarly [16]. This commonly experienced episode needs to be counterbalanced with scientifically correct and appropriately communicated messages about influenza vaccination risk and the importance of vaccination.
The Internet has enabled the distribution of medical information, empowering patients to treat themselves or at least ask their doctors better, more informed questions. The Internet has also been getting noisier. The empowered patient has increased access to opinion or myth (noise) in addition to scientific information from the medical establishment (signal) [35,36]. Access to a variety of "medical beliefs, scientific and nonscientific evidence, and emotionally arousing stories of other patients" is growing [21]. Patients are questioning the legitimacy and recommendations of scientific authorities, allowing unscientific views the same legitimacy and weight as scientific ones [7]. The movement to greater patient engagement and empowerment may be unintentionally creating greater distrust of traditional authorities [1].
This article offers a stable, multiple-year model for predicting which adults received influenza vaccination in the United States from 2012 to 2014, inclusive. We also present some practical implications of the results and how health communications could be targeted. Our model improves our understanding of influenza vaccination in the United States, and how we might intervene. Outside the United States, there may be other challenges, e.g., vaccine shortages causing the prioritization of at-risk individuals and health care workers over other groups. In such environments, the ecosystem of vaccine supply, as well as social and viral contagion, may differ.