Shortening a Patient Experiences Survey for Medical Homes

The Consumer Assessment of Healthcare Providers and Systems—Patient-Centered Medical Home (CAHPS PCMH) Survey assesses patient experiences reflecting domains of care related to general patient experience (access to care, communication with providers, office staff interaction, provider rating) and PCMH-specific aspects of patient care (comprehensiveness of care, self-management support, shared decision making). The current work compares psychometric properties of the current survey and a proposed shortened version of the survey (from 52 to 26 adult survey items, from 66 to 31 child survey items). The revisions were based on initial psychometric analysis and stakeholder input regarding survey length concerns. A total of 268 practices voluntarily submitted adult survey data and 58 submitted child survey data to the National Committee for Quality Assurance in 2013. Mean unadjusted scores, practice-level item and composite reliability, and item-to-scale correlations were calculated. Results show that the shorter adult survey has lower reliability but still meets general definitions of a sound survey, and that shortening resulted in few changes to mean scores. The impact was more problematic for the pediatric version. Further testing is needed to investigate approaches to improving survey response and the relevance of survey items in informing quality improvement.


Introduction
The Patient-Centered Medical Home (PCMH) care model is gaining prominence as a way of improving primary care. The PCMH is commonly defined by an emphasis on comprehensive, team-based care; patient-centered care; coordination across different aspects of the health care system; access to care; and a commitment to quality and safety [1]. The adoption of PCMH functions has been encouraged under the Affordable Care Act, and multiple payers have provided financial incentives for practices to become medical homes [2,3].
As PCMH adoption expands, the ability to evaluate patient experiences has become critical in evaluating the impact of the medical home [4]. With Commonwealth Fund support, the National Committee for Quality Assurance (NCQA) collaborated with the Consumer Assessment of Healthcare Providers and Systems (CAHPS, Agency for Healthcare Research and Quality, Rockville, MD, USA) Consortium, overseen by the Agency for Healthcare Research and Quality (AHRQ), to develop a survey instrument functionally aligned with key PCMH functions. This instrument, the CAHPS PCMH, finalized in late 2011, evaluates patient experiences on key domains of care associated with the medical home [5], and has since been used in a variety of settings, including NCQA's PCMH

Sample and Survey Protocol
Practices voluntarily submitting survey data to NCQA must follow procedures that NCQA requires for sample selection. For each survey administered, a random sample of patients is drawn based on the number of clinicians at a practice site (1 clinician in a practice = a required sample size of 128; 2-3 clinicians = 171 sample size; 4-9 clinicians = 343 sample size; 10-13 clinicians = 429 sample size; 14-19 clinicians = 500 sample size; 20-28 clinicians = 643 sample size; 29 or more clinicians = 686 sample size).
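The sample-size tiers listed above amount to a simple lookup by clinician count. The following is a minimal sketch of that lookup; the function and constant names are hypothetical and introduced only for illustration, and the tier boundaries are copied from the text rather than from any NCQA software.

```python
# Sample-size tiers as listed in the text: (largest clinician count in tier, required sample).
SAMPLE_SIZE_TIERS = [
    (1, 128),
    (3, 171),
    (9, 343),
    (13, 429),
    (19, 500),
    (28, 643),
]
LARGEST_TIER_SAMPLE = 686  # 29 or more clinicians


def required_sample_size(n_clinicians: int) -> int:
    """Return the required patient sample size for a practice site."""
    if n_clinicians < 1:
        raise ValueError("a practice site needs at least one clinician")
    for max_clinicians, sample_size in SAMPLE_SIZE_TIERS:
        if n_clinicians <= max_clinicians:
            return sample_size
    return LARGEST_TIER_SAMPLE
```

For example, under these tiers a site with 5 clinicians falls in the 4-9 tier and would draw a sample of 343 patients.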
Practices choose a random selection of adult (aged ≥18 years) and pediatric (aged <18 years) patients who had at least 1 visit to a provider in the 12 months prior to survey completion. A parent or guardian is asked to complete the survey for eligible children. In 2013, 268 practices submitted data for adult patients (n = 27,896 respondents); 58 practices submitted data for child survey patients (n = 4277 respondents). Data were submitted to NCQA in April and September 2013, and surveys had to be administered within the 15 months prior to submission. The last month of data collection allowed was August 2013. The survey administration protocol included mail only, telephone only, mail with telephone follow-up and Internet only administration options. The majority of practices used mail only administration (78% adult, 79% child), with smaller proportions using Internet only (12% adult, 15% child), telephone only (10% adult, 5% child) or mail with telephone follow-up (none in adult, 1% child) administration.

Analysis
We calculated internal consistency reliability (Cronbach's alpha) of multi-item composites; practice-level unadjusted mean scores for each composite; and site-level reliabilities for each item and composite. Following prior methods used to report CAHPS PCMH results in the literature, we calculated scores using proportional scoring and the summated rating method; that is, we calculated the mean responses to each item, after transforming each response to a 0-100 scale (100 representing the most positive response on any given item response scale; 0 representing the least positive) [3,7]. For example, on a Yes/No response scale, if "Yes" represents the most positive response, then Yes = 100 and No = 0; on an Always/Usually/Sometimes/Never response scale, if "Always" represents the most positive response, then Always = 100, Usually = 67, Sometimes = 33 and Never = 0. A higher score means that practices were rated more positively for care on that item. We use this 0-100 scale to facilitate comparison of our results to prior, peer-reviewed published CAHPS PCMH results that were reported based on a 0-100 possible range of scores [3,7]. We examined site-level reliabilities by differentiating between-site and within-site variance in one-way ANOVAs [3,7].
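The scoring and site-level reliability steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the reliability formula (MS_between - MS_within) / MS_between is one commonly used one-way ANOVA formulation and is assumed here because the text does not spell out the exact formula; all function names and the toy data are hypothetical.

```python
from statistics import mean

# 0-100 scoring of response options, as described in the text.
ALWAYS_TO_NEVER = {"Always": 100, "Usually": 67, "Sometimes": 33, "Never": 0}
YES_NO = {"Yes": 100, "No": 0}


def score_responses(responses, scale):
    """Map raw answers onto the 0-100 scale, skipping missing answers (None)."""
    return [scale[r] for r in responses if r is not None]


def site_level_reliability(scores_by_site):
    """Reliability of site means from a one-way ANOVA: (MSB - MSW) / MSB.

    scores_by_site maps a site id to the list of 0-100 scores of its respondents.
    Assumed formulation; the paper only says between- and within-site variance
    were differentiated in one-way ANOVAs.
    """
    groups = [g for g in scores_by_site.values() if g]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    site_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, site_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, site_means) for x in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return (ms_between - ms_within) / ms_between


# Hypothetical toy data: three sites, four respondents each.
toy_answers = {
    "site_a": ["Always", "Always", "Usually", "Always"],
    "site_b": ["Sometimes", "Never", "Sometimes", "Usually"],
    "site_c": ["Usually", "Usually", "Always", "Sometimes"],
}
toy_scores = {s: score_responses(a, ALWAYS_TO_NEVER) for s, a in toy_answers.items()}
print(round(site_level_reliability(toy_scores), 2))
```

With this formulation, reliability approaches 1 when respondents within a site agree closely and sites differ from one another, and it falls toward 0 (or below) when within-site noise is as large as the differences between sites.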
We also assessed the extent to which shortening the access and communication composites resulted in changes to the relative ranking of practices. Specifically, we examined the extent to which the ranking of practices shifted under the revised survey composites using two statistical tests. First, we conducted a Pearson's Chi-Squared test that examined the relationship between (categorical) quintile rankings of practices in the revised versus original composites. Second, we examined the rank order correlations among practices using the short and long versions of each composite. Both of these analyses were conducted on each composite (access and communication) for all samples (child and adult).
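The two ranking comparisons described above can be illustrated with a short standard-library sketch. It covers only the descriptive side (the share of practices moving between quintiles, and a rank-order correlation computed as the Pearson correlation of average ranks); it does not implement the chi-squared test. The rank-based quintile rule and all function names are assumptions made for this example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rank_order_correlation(x, y):
    """Rank-order correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def quintiles(values):
    """Assign each value to quintile 1..5 by rank position (assumed rule)."""
    n = len(values)
    return [min(5, int((r - 1) * 5 / n) + 1) for r in average_ranks(values)]


def quintile_shift_shares(long_scores, short_scores):
    """Share of practices whose quintile moves by 0, 1, 2, ... places."""
    shifts = [abs(a - b) for a, b in zip(quintiles(long_scores), quintiles(short_scores))]
    return {d: shifts.count(d) / len(shifts) for d in sorted(set(shifts))}
```

Here `long_scores` and `short_scores` would hold one composite score per practice, in the same practice order, for the original and shortened composites respectively.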

Proposed Revisions to Shorten the Survey
Based on initial psychometric analysis and stakeholder input, we propose a shorter survey, reducing the adult tool from 52 to 26 items and the child tool from 66 to 31 items. We consulted 22 stakeholders, representing a variety of perspectives: 11 were clinicians, researchers, survey implementers, those who work with practices to improve patient experiences, and those who use the survey for public reporting purposes; another 11 were patient advocates identified in collaboration with the National Partnership on Women and Families and the Institute for Patient and Family Centered Care. We asked all stakeholders to provide input and select items for a shortened survey based on several key principles: Which items are psychometrically sound (i.e., site-level reliability of 0.70 or higher)? Which items are conceptually central to the PCMH model? Which items are important to consumers? Which items are actionable?
We gathered qualitative input during discussions with stakeholders, including the rationale for prioritizing items based on the above principles. As part of this process, we also asked stakeholders to vote to either "keep" or "drop" items for a shortened survey. The final selection of items was based on this input, including items that were prioritized by stakeholders and garnered the largest number of "keep" votes.
Based on stakeholder input, key changes include reductions in access, communication and comprehensiveness of care composites for the adult and child tool. Because stakeholders did not prioritize the shared decision-making and office staff composites, or several individual (non-composite) items related to access, information, and coordination of care, the proposed shortened survey drops these composites and items (further detail on all items retained for the shortened survey is in the Results).
Item-level results often informed stakeholder input regarding which items could be dropped for a proposed shorter survey. Generally, stakeholders agreed that items achieving estimated reliabilities of less than 0.70 at the practice level could be dropped. For example, an item in the access composite (getting answers to medical questions as soon as needed when phoning one's provider after-hours) did not achieve 0.70 reliability (0.45 adult, 0.42 child) and was dropped. Self-management support items also did not achieve 0.70 reliability and were dropped.
There were some exceptions, however, including if the item met other guiding principles, such as being conceptually important to the PCMH model or to consumers. For example, a coordination of care item (provider seemed informed and up-to-date about care received from specialists) did not achieve 0.70 reliability (0.66 adult, 0.20 child). However, most stakeholders deemed this item too conceptually important to the PCMH model to be dropped; thus, the item was retained. Conversely, some items achieved 0.70 site-level reliability, but based on concerns over survey length and other guiding principles, stakeholders did not prioritize these items. For example, two items in the access composite (got appointment for routine care; saw provider within 15 min of appointment time) achieved site-level reliabilities above 0.70, but most stakeholders did not deem these two items conceptually important relative to others in the composite; one of the items also had a lower item-scale correlation with the total composite. Thus, the proposed shortened survey did not include these items.
We sought public comment on the proposed changes in October and November 2014, and received 635 comments; the majority (88%) voted in support of the proposed changes [15].

Results
A total of 268 practices submitted data on the adult survey and 58 practices submitted data on the child survey. The mean number of respondents per practice was 104 for the adult survey and 74 for the child survey. The overall response rate was 27% for adults and 23% for children. Respondent characteristics are presented in Table 1. For the adult survey, the majority of respondents were female (61%), and the largest age group was 55-64 years (25%). The most common self-ratings were good for general health (36%) and very good for mental health (35%). For the child survey (filled out by the child's parent or guardian), the majority of respondents were also female (89%). Parental ratings of child health on the child survey were better overall than self-rated health on the adult survey, with excellent general and mental health ratings of 57% and 56%, respectively, for the child sample. The majority of practices, for both adult and child samples, comprised multiple providers (four or more), with ownership under a hospital, health system, or health plan (rather than physician owned) and located in the Northeast census region. Below we describe key results for the current PCMH composites and items for both adults (Table 2) and children (Table 3), as well as the impact of shortening the survey (Table 4 and Figures 1-6). Tables 2 and 3 indicate, in italics, all items retained for the shortened survey.

Internal Consistency Reliabilities
The majority of multi-item composites formed an internally consistent scale in current versions of both adult (Table 2) and child surveys (Table 3), with four composites meeting the recommended standard of a 0.70 or higher Cronbach's α: communication with providers, six items (Cronbach's α = 0.92 adult and 0.91 child); office staff interaction, two items (0.84 adult and 0.85 child); access to care, five items (0.81 adult; 0.70 child); and comprehensiveness of behavioral care, three items (0.79 adult-only composite). Only two composites did not achieve the 0.70 level: self-management support (0.66 adult; 0.60 child) and shared decision making (0.65 adult-only composite).
Reducing the number of items in existing composites generally led to reductions in internal consistency reliability (Table 4). For the access composite for adults, reducing from five to two items led to a reduction in internal consistency reliability from 0.81 to 0.67. For the communication composite in adults, reducing from six to two items changed the internal consistency reliability from 0.92 to 0.72. These findings are to be expected, since Cronbach's alpha tends to increase as the number of items in a scale increases. Only the internal consistency of the access composite for adults (0.67) fell below the recommended level of 0.70.
These patterns were also generally found in the child results (Table 4). However, three child composites fell below the recommended internal consistency level of 0.70 when revised: access (0.56 for the two-item scale), communication (0.68 for the two-item scale), and comprehensiveness in child prevention (0.59 for the two-item scale).
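The link between scale length and Cronbach's alpha noted in this section can be made concrete with a small sketch. The first function computes alpha from item-level scores; the second applies the Spearman-Brown formula to project alpha for a shorter scale, under the assumption (made here for illustration, not stated in the paper) that the retained items are about as inter-correlated as the original set. Function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores is a list of items; each item is a list of respondent scores,
    with respondents in the same order across items.
    """
    k = len(item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


def spearman_brown(alpha, k_old, k_new):
    """Projected alpha when a k_old-item scale is changed to k_new items.

    Assumes the retained items are as inter-correlated as the originals.
    """
    ratio = k_new / k_old
    return ratio * alpha / (1 + (ratio - 1) * alpha)


# Projection for the adult communication composite (six items, alpha = 0.92 in the text).
print(round(spearman_brown(0.92, k_old=6, k_new=2), 2))
```

A projection of this kind need not match the reported two-item values, because the retained items were selected on stakeholder priorities and psychometric criteria rather than at random; it only shows that some drop in alpha is expected from shortening alone.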

Practice Level Reliabilities
Practice-level reliabilities achieved the recommended level of 0.70 or higher for most current versions of multi-item composites in the adult (Table 2) and child (Table 3) surveys, with the exception of the shared decision making composite (0.58).
Reducing the number of items in existing composites led to reductions in practice level reliability for only the access composite (Table 4). Reducing the access composite from five to two items led to reduction in practice level reliability from 0.94 to 0.85 for adults, and 0.88 to 0.77 for children. There was no reduction in practice level reliability for the communication composite in adults or children (e.g., 0.82 for both the six-item and two-item scale in adults), nor for the child comprehensiveness of preventive care composite (0.88 for both the five-item and two-item scale).

Item to Scale Correlations and Unadjusted Mean Scores
Item-scale correlations (Tables 2 and 3) provided support for the reduced composites (all correlations achieving levels of 0.50 or higher, and correlating about as highly as many items in the original), with some exceptions. For the adult communication and the child comprehensiveness of preventive care composite, items in the reduced two-item composite correlated more weakly than the lowest correlation in the original multi-item composites (e.g., for adult communication: 0.79 for the two-item scale and 0.83 for the weakest correlation in the six-item scale; for child comprehensiveness of preventive care: 0.62 for the two-item scale and 0.73 for the weakest correlation in the five-item scale). However, these item-scale correlations still achieved 0.50. We also note that the correlations for two-item scales should not be interpreted as correlations with a true "scale" as they relate one item to only one other item.
Unadjusted mean scores were also generally stable with the reductions; the largest difference being a five-point score improvement for the adult access composite (76.0 for the original five-item scale to 81.5 for the short two-item scale). The estimated numbers of responses per practice needed to achieve reliabilities of 0.70, 0.80, and 0.90 are presented in Table 4. For the adult survey, the number of responses for a reliability of 0.70 ranged from 14 to 84, with a higher minimum number of responses needed to achieve the same reliability in the revised composites (minimum of 26, maximum of 50). For the child survey, the number of responses for a reliability of 0.70 ranged from 17 to 71, with a higher minimum number of responses needed to achieve the same reliability in the revised composites (minimum of 21, maximum of 55).
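One common way to estimate how many responses per practice are needed for a target reliability is the Spearman-Brown formula applied to the observed practice-level reliability and the mean number of respondents. The sketch below assumes that approach; the text does not state which method produced the Table 4 estimates, so this is illustrative only, and the function name is hypothetical.

```python
from math import ceil


def responses_needed(observed_reliability, mean_respondents, target_reliability):
    """Respondents per practice needed to reach a target practice-level reliability.

    Spearman-Brown rearranged: n_target = n * [T / (1 - T)] * [(1 - R) / R],
    where R is the reliability observed at n respondents and T is the target.
    """
    r, n, t = observed_reliability, mean_respondents, target_reliability
    if not (0 < r < 1 and 0 < t < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Round away floating-point noise before taking the ceiling.
    return ceil(round(n * (t / (1 - t)) * ((1 - r) / r), 9))


# Inputs taken from the text: two-item adult access composite, reliability 0.85,
# mean of 104 respondents per practice.
for target in (0.70, 0.80, 0.90):
    print(target, responses_needed(0.85, 104, target))
```

Because the required number scales with (1 - R) / R, even a modest drop in observed reliability raises the number of completed surveys a practice needs, which is consistent with the direction of the results reported above.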

Relative Ranking of Practices under Revised Survey Composites
Shortening both the adult and child composites for access and communication resulted in more changes in the relative ranking of practices for the access composite compared to the communication composite. Specifically, for the communication composite, results from the quintile analysis showed that 74% of adult practices did not change rank while 25% changed one quintile rank (Figure 1); 66% of child practices did not change quintile rank while 35% changed one quintile rank (Figure 2). For the access composite, however, there were more changes based on quintile ranks: 51% of adult practices did not change rank while 40% changed one quintile rank (Figure 1); 52% of child practices did not change rank while 31% changed one quintile rank (Figure 2).
Results from the (full) rank order correlation analysis were consistent with results from the quintile ranking analysis. Specifically, for the communication composite, long and short versions of the composite resulted in similar practice rankings for both the adult (r = 0.97, p < 0.001) and child (r = 0.96, p < 0.001) versions of the composites (Figures 4 and 6). For the access composite, long and short versions of the composite resulted in more changes in rankings for the adult (r = 0.83, p < 0.001) and child (r = 0.76, p < 0.001) composites (Figures 3 and 5) compared to the communication composite, although these correlations still meet common recommended levels of 0.70 or higher for a strong, positive correlation.

Discussion
This study provides further support for the reliability and validity of the current CAHPS PCMH survey, based on updated data across a larger sample, and characterizes the psychometric impact of shortening the survey. Importantly, our findings suggest that a shorter adult survey is possible.
Unadjusted mean scores were also generally stable with the reduction; the largest difference being a five-point score improvement for the adult access composite. For both the adult and child surveys, the reductions did result in more changes in the relative ranking of practices for the access composite, compared to the communication composite.
In general, internal consistency reliability for the multi-item composites exceeded or equaled original published field test results [3]. Estimates of site-level reliability also indicate that a reliability of 0.70 or higher can generally be achieved for most multi-item composites. However, borderline site-level reliability among select composites and items suggests that, despite their salience to the PCMH care model, these items and composites may be considered for removal to streamline the survey and improve its effectiveness and uptake. Previous research by the CAHPS Consortium suggests that survey length generally does not affect survey response rates, with prior findings suggesting that the number of survey questions that respondents were required to answer, from as few as 23 to as many as 95, had little effect on response rates and respondents were as likely to answer a relatively longer survey as a shorter one [16]. However, recent input from a diverse group of stakeholders under NCQA's PCMH recognition program has suggested a need to consider shortening the survey in order to increase response rates. Both NCQA and the CAHPS Consortium have conducted research to re-evaluate the PCMH survey, and, as of the time of this present study, have each put forth their own proposals for changes to the survey [11,15], with the CAHPS Consortium finalizing their revised version of the CAHPS Clinician and Group survey (version 3.0) in July 2015, reducing the length from 34 to 31 items [12]. Additional possibilities for shortening the CAHPS Clinician and Group survey have also since been published [13].
The results here suggest that reductions in length are possible: despite some reduction in psychometric properties, the reduced adult survey would still generally meet standard definitions of a psychometrically sound survey. However, given that three child composites fell below the recommended internal consistency level of 0.70 when revised, further testing is recommended to establish appropriate criteria for shortening the child survey. For example, further work could investigate whether internal consistency reliability suffered because these composites may not have reflected "true" scales; whether the smaller child survey samples may have influenced practice-level reliability, which in turn influenced item-level results and decisions to drop items; or whether there may be something else altogether beyond these psychometric concerns, such as the possibility of more variability in the kind of care pediatric populations require.
Additionally, while a shorter survey addresses ongoing concerns about survey length, further work should also investigate related issues of survey response and uptake, including whether a shorter survey facilitates meaningful improvement in response rates, or facilitates opportunities for customization of the survey to fit practice needs. Input from consumers and families about the relevance of these measures for decision-making as well as practice input on the usefulness in quality improvement are also key considerations.
In recommending any further potential changes to shorten the survey, several overarching principles should be taken into account, including some of those used in the current study. First, any reduction needs to be weighed not only against its impact on psychometric attributes, but also against goals for survey use. During stakeholder discussions and public comment fielding, many indicated the importance of having a shorter survey that meets a mix of both accountability and quality improvement needs. One useful principle already used in the current study is to consider whether an item is actionable, which speaks to its usefulness from both an accountability and quality improvement perspective.
Second, reductions may need to be considered for only certain composites versus all composites. In the current study, some reductions achieved higher internal consistency reliability than others, raising the question of whether a broad approach of shortening all composites is too "blunt", and if reductions should instead be customized to only some portions of the survey.
Finally, relevant to this concept of customization, any survey change needs to consider the increasing attention towards flexibility. Although there was overwhelming support for shortening the survey, there were also diverse opinions regarding which items should be dropped. Given that the CAHPS Consortium, NCQA and other groups (including the Massachusetts Health Quality Partners) have each proposed slightly different approaches for shortening the survey, this raises the further question of whether the route to a shorter survey should emphasize not so much the selection of specific items, but rather the creation of a flexible route to assessment. The literature has already begun to acknowledge the need to strike this balance, calling for patient surveys, such as the CAHPS surveys, to allow for variation, while retaining common core elements as a "foundation" to facilitate alignment and standardization [17].
This study had some limitations. First, response rates were lower than seen in some other surveys, although they are similar to response rates in some implementations of CAHPS surveys [3]. While a low response rate may not have affected the psychometric results presented in this study, this is an important limitation. As we were unable to examine differences between non-responders and responders, the study results must be interpreted with caution and may not be generalizable. Second, the majority of practices were from the northeast area, which also affects the generalizability of our results. However, unlike prior published findings of the CAHPS PCMH survey, practices from most major census regions (west, midwest, northeast), except the south, submitted data. Despite these limitations, this study provides important information on the psychometric impact of shortening the survey, and opens up possibilities for assessing patient experiences in medical home settings where survey length or burden may be a concern.
As PCMH adoption expands, the ability to evaluate the PCMH promise of improving patient experiences and other aspects of care remains essential. The current literature acknowledges that more evidence is generally needed to determine the effects of the PCMH on select outcomes [2]. Given the concerns around survey length, opportunities to shorten the CAHPS PCMH survey would complement current measurement efforts to evaluate PCMH settings. Further research should address and further delineate the approaches needed to ensure that the CAHPS PCMH plays a useful role in optimizing patient experiences in PCMH and other efforts to reform the health system, whether it is investigating approaches to improving survey response or uptake, the relevance of survey items and composites to inform quality improvement, or the incorporation of new methods to efficiently assess priority domains, while retaining opportunities for shortening and customizing the survey.

Conclusions
In conclusion, the current study provided an opportunity to evaluate key aspects of the PCMH model of care across a large group of medical practices. The findings show that shortening the survey, in response to survey length concerns, reduces reliability, but still meets general definitions of a sound survey for the adult version; however, further testing is recommended to establish appropriate criteria for shortening the child survey. Future opportunities to evaluate PCMH patient experiences, and to improve current measures for doing so, remain key towards assessing whether the PCMH translates into improvements for patients.

Author Contributions:
Judy H. Ng and Sarah Hudson Scholle conceived and designed the study, oversaw data analysis, interpreted the results, and wrote the paper; Sarah Hudson Scholle applied for study funding; Erika Henry and Peichang Shi analyzed the data; Erika Henry and Tyler Oberlander provided further interpretation of results, produced graphs, and edited sections of the paper.

Conflicts of Interest:
The authors declare no conflict of interest, other than employment by the National Committee for Quality Assurance, which accepts data from the CAHPS PCMH Survey in its PCMH recognition program.