2.1. Item Writing
To establish the content to be presented in the original measure, we conducted seven professional informant interviews with bioinformaticians, bioethicists, health lawyers, and physicians. We asked these experts to provide their definition of AI, to explain how they would describe AI to non-experts, to highlight current and future clinical examples of AI, and to enumerate potential concerns and benefits of AI in healthcare. We interviewed these leading experts to identify key ethical and practical concerns and to ensure that our interpretation of the literature was accurate. We cross-referenced their responses with a literature review. For this review, we consulted with a medical librarian and used Boolean logic to search PubMed, Scopus, and Embase for articles related to “artificial intelligence”, “machine learning”, “big data”, and “health care.” We subsequently searched for clinical applications, ethical concerns, and patient perspectives within these broad searches. We did not restrict searches to pediatrics, in order to understand the broad landscape of the AI field. After screening, we identified more than 300 articles related to practical and ethical aspects of AI, including review articles and commentaries that highlighted major ethical issues in the use of AI in healthcare [3]. From this review, we developed a framework of nine factors that might influence openness to AI-driven healthcare interventions in pediatrics: privacy, transparency, human element of care, social justice, societal cost, personal cost, quality/accuracy, access to care, and access to knowledge. We hypothesized that these factors represented latent constructs that captured the breadth of parental concerns, and we wrote items targeting each of these constructs.
We next developed an original measure, Attitudes toward Artificial Intelligence in Pediatric Healthcare (AAIH-P), to assess parents’ openness to the use of AI-driven technologies in their child’s healthcare and to identify the concerns that parents considered important when deciding whether to engage with these technologies. This new measure consisted of a general openness scale and a scale that measured the importance of various concerns in the parent’s consideration of these technologies. We developed the openness scale with the intention of understanding parental openness to a variety of AI-driven applications. We created the concerns scale to determine whether our hypothesized factors represented underlying latent constructs and whether these concerns were associated with scores on the openness scale.
Of note, we opted to avoid the terms ‘artificial intelligence’ and ‘machine learning’ in these measures because of the misconceptions and complexity associated with these terms, based on feedback from the informant interviews with professionals. We also asked these professionals how they would describe AI or machine learning to lay audiences. Their descriptions focused on the need for large amounts of data, the ability to make comparisons to many other people similar to the patient, and the unique clinical functions these AI-driven technologies might offer. We therefore opted to describe characteristics of the technologies without labeling them as ‘artificial intelligence’ or ‘machine learning’. During cognitive interviews, parents preferred the terms ‘computer programs’ and ‘devices’, so we adopted this phrasing.
The general openness scale asked participants, “How open are you to allowing computer programs to do the following things?” Participants were asked to read 12 items and select their level of openness on a five-point Likert scale ranging from ‘not at all open’ (1) to ‘extremely open’ (5). These 12 items represented four different AI-driven healthcare functions: diagnosis (e.g., “Determine if your child broke a bone”), risk prediction (e.g., “Predict your child’s risk of developing depression in the future”), treatment selection (e.g., “Decide on the best treatment for your child’s diabetes”), and medical guidance (e.g., “Give you advice on how to prevent your child’s asthma attacks”) (Appendix B). We included interventions that we believed represented a wide range of emotional intensity for respondents. This determination of emotional intensity was based on the clinical expertise of one author (BAS) in treating children with minor complaints (e.g., ear infections) and diagnosing children with serious, life-threatening conditions (e.g., cancer). To support face and content validity, we asked parents during cognitive interviews about their emotional reactions to the range of diagnoses and whether they were familiar with these diagnoses.
The second component of this measure aimed to identify concerns that parents found important when considering the use of AI-driven healthcare interventions in their child’s medical care. Participants were asked, “When you think of using these new devices, how important are the following details to you?” Each item described a potential aspect of AI-driven healthcare interventions that might affect openness, for example, “Whether these devices are better than your child’s doctor at figuring out why your child is sick.” Participants rated how important these items were on a five-point Likert scale ranging from ‘not important’ (1) to ‘extremely important’ (5). We drafted 57 initial survey items that targeted eight hypothesized constructs (collapsing personal cost and societal cost into a single construct). We wrote 4 to 10 items per hypothesized factor, using both positive and negative valence to increase variance in responses.
After drafting this measure, we performed 11 cognitive interviews to ensure face validity, item clarity, and content validity, and we revised the measure before distributing it [20]. For these cognitive interviews, we recruited mothers and fathers of children younger than 18 years with a broad range of racial/ethnic, professional, socioeconomic, and educational backgrounds. To identify participants, members of the research team approached acquaintances who could provide diverse perspectives, prioritizing diversity in education level and racial background. In these interviews, we reviewed the survey item by item for readability and clarity. For select items, we asked interviewees to rephrase the question in their own words to confirm understandability. We also asked participants whether they recommended any changes to information, scenarios, wording, or content. For any problematic questions or terminology, we asked what made the question hard to answer and what might help them answer it. We also learned about parents’ emotional reactions to the range of diagnoses and whether they were familiar with these diagnoses. Lastly, we asked for general comments about the measure and the scope of the items. Most commonly, parents raised concerns about specific words or phrasings that were confusing; parents did not raise concerns about the content of the scenarios. We also assessed reading ease and grade level, achieving a Flesch–Kincaid Grade Level of 7.6.
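Readability statistics of this kind can be reproduced with standard tooling. The sketch below uses the open-source textstat package on hypothetical item text; the package choice is an assumption for illustration, not necessarily the tool we used.

```python
# A minimal readability check, assuming the survey items are available as
# plain strings; textstat is an illustrative choice, not the authors' tool.
import textstat

item_texts = [
    "Determine if your child broke a bone.",
    "Predict your child's risk of developing depression in the future.",
    "Give you advice on how to prevent your child's asthma attacks.",
]
full_text = " ".join(item_texts)

print("Flesch Reading Ease:", textstat.flesch_reading_ease(full_text))
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(full_text))
```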
2.2. Survey Participants
We employed criterion-based sampling to recruit participants using Amazon’s Mechanical Turk (MTurk), a web-based service that matches ‘requesters’ with ‘workers’ to complete various ‘human intelligence tasks’ (HITs). Participants were required to be parents whose children were all 18 years old or younger. Past studies have demonstrated that MTurk is a cost-effective and reliable means of recruiting nationally representative samples of parents for online surveys [21]. We applied specific MTurk qualifications, stipulating that participants have a 98% approval rating on at least 100 prior HITs to qualify for this survey. Additionally, we applied custom qualifications to prevent multiple entries from the same individual. Eligible participants completed the survey through a web link to Qualtrics survey software. Participants received $3.65 for completing this 30-min task. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Washington University School of Medicine (202004198). Participants reviewed an exempt information sheet at the beginning of the survey indicating the purpose of the study, the identity of the researchers, and the anticipated time commitment. Data were stored on encrypted servers.
We administered this survey in two rounds to perform exploratory factor analysis (EFA) followed by confirmatory factor analysis (CFA). Round 1 collected 449 responses, and round 2 collected 451 responses; the surveys opened on 23 April 2020 and 11 May 2020, respectively. For the AAIH-P measure, each screen of the questionnaire contained 6 to 8 items. In total, the full battery of surveys contained 14 screens of questions. We utilized forced entry, so no questionnaires were incomplete, and respondents were not able to review or change answers from prior screens. Completion rates were 449/508 (88%) and 451/498 (91%) for rounds 1 and 2, respectively.
For quality control, we excluded participants who completed the survey in less than 7 min, anticipating that each question should require 4–5 s to respond meaningfully. For this reason, we excluded 20 participants (4%) in the first round and 54 (11%) in the second round. Additionally, we excluded 11 participants from each round who provided ages of children that were greater than 18 years. After exclusions, round 1 included 418 participants and round 2 included 386 participants, for a total of 804 participants in the entire study.
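As a concrete illustration of these exclusion rules, the sketch below filters a hypothetical response table in pandas; the column names and example values are assumptions, not the study’s actual data.

```python
# A minimal sketch of the quality-control exclusions: drop respondents who
# finished in under 7 minutes or who reported any child older than 18.
# Column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "duration_sec": [1820, 350, 940],        # total completion time
    "child_ages": [[4, 9], [12], [17, 21]],  # ages of each respondent's children
})

too_fast = df["duration_sec"] < 7 * 60
over_age = df["child_ages"].apply(lambda ages: max(ages) > 18)

retained = df[~(too_fast | over_age)].reset_index(drop=True)
print(f"excluded {(too_fast | over_age).sum()} of {len(df)} respondents")
```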
2.3. Validity and Psychometric Properties of AAIH-P Measure
To assess the validity of the AAIH-P openness scale, we examined internal reliability and convergent validity with other validated scales. To assess the validity of the AAIH-P concerns scale, we performed exploratory and confirmatory factor analysis.
2.3.1. AAIH-P Openness Scale
For the 12-item AAIH-P openness scale, we computed the mean response to items to create a composite score that ranged from 1 to 5. To test for internal reliability, we calculated Cronbach’s alpha for the full 12-item scale (α = 0.92). We also calculated mean responses and Cronbach’s alpha for the items related to the four different AI-driven healthcare functions: diagnosis (α = 0.84), risk prediction (α = 0.87), treatment selection (α = 0.90), and medical guidance (α = 0.84).
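For readers who wish to reproduce this scoring, the sketch below computes the composite score and Cronbach’s alpha from a simulated response matrix; the column names and data are illustrative assumptions.

```python
# A minimal sketch of composite scoring and Cronbach's alpha for a 12-item
# Likert scale; data and column names are simulated for illustration.
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))"""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(418, 12)),
                     columns=[f"open_{i + 1}" for i in range(12)])

composite = items.mean(axis=1)  # composite openness score, range 1-5
print(f"mean composite = {composite.mean():.2f}, alpha = {cronbach_alpha(items):.2f}")
```

Applying the same function to the subset of items for each healthcare function yields the subscale alphas reported above.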
2.3.2. Exploratory Factor Analysis
We performed EFA with Promax rotation to assess the factor structure of the AAIH-P concerns scale. We examined the scree plot of eigenvalues and considered the amount of variance explained by each additional factor in order to determine the total number of factors to include. We also considered the number and magnitude of loadings on factors across potential factor solutions [22]. Factorability of the items was confirmed by a significant Bartlett’s test of sphericity, χ²(1596) = 12,393.25, p < 0.001, and a Kaiser–Meyer–Olkin value of greater than 0.6 (KMO = 0.92).
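These factorability checks and the rotated EFA can be expressed compactly with the open-source factor_analyzer package; the sketch below is illustrative, with simulated stand-in data, and the package choice is an assumption rather than our reported software.

```python
# A minimal EFA sketch: Bartlett's test, KMO, and a Promax-rotated solution.
# Data are simulated stand-ins for the 57 concern items.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

rng = np.random.default_rng(0)
latent = rng.normal(size=(418, 7))                 # 7 underlying factors
noise = rng.normal(scale=0.8, size=(418, 57))
concerns = pd.DataFrame((latent @ rng.normal(size=(7, 57))) * 0.5 + noise,
                        columns=[f"item_{i + 1}" for i in range(57)])

chi2, p = calculate_bartlett_sphericity(concerns)  # test of sphericity
_, kmo_total = calculate_kmo(concerns)             # overall sampling adequacy

efa = FactorAnalyzer(n_factors=7, rotation="promax")
efa.fit(concerns)
eigenvalues, _ = efa.get_eigenvalues()             # basis for the scree plot
loadings = efa.loadings_                           # 57 x 7 loading matrix
print(f"Bartlett chi2 = {chi2:.1f} (p = {p:.3g}), KMO = {kmo_total:.2f}")
```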
Results of the EFA indicated a seven-factor solution that accounted for 54% of variance. We dropped two items that loaded poorly on these seven factors, and we dropped 16 items because of significant cross-loading on other factors (>0.3). To decrease survey burden, we dropped eight items with content that was captured in other similar items. After dropping these items, we repeated EFA with the 34 remaining items, finding that a seven-factor solution explained 60% of variance (see Table 1 for definitions of factors).
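The cross-loading rule described above can be applied mechanically to the loading matrix. The sketch below flags items for retention; the 0.4 primary-loading cutoff is an assumed illustration, since only the 0.3 cross-loading threshold is reported here.

```python
# A minimal sketch of loading-based item retention: keep an item if its
# strongest loading is substantial (assumed cutoff 0.4) and no other factor
# exceeds the 0.3 cross-loading threshold. The loading matrix is simulated.
import numpy as np

rng = np.random.default_rng(1)
loadings = rng.uniform(-0.8, 0.8, size=(57, 7))  # stand-in 57 x 7 matrix

abs_load = np.abs(loadings)
primary = abs_load.max(axis=1)               # each item's strongest loading
cross = np.sort(abs_load, axis=1)[:, -2]     # next-strongest (cross) loading

keep = (primary >= 0.4) & (cross <= 0.3)
print(f"retained {keep.sum()} of {len(keep)} items")
```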
2.3.3. Confirmatory Factor Analysis
We then distributed the battery of surveys containing the revised 34-item AAIH-P concerns scale and performed CFA using maximum likelihood estimation. We allowed all factors to correlate with each other. We removed one item with poor fit. There was acceptable fit between the finalized model and the observed data: χ² p < 0.001, CFI = 0.91, RMR = 0.069, RMSEA = 0.053.
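A model of this form can be specified in lavaan-style syntax; the sketch below uses the open-source semopy package with a two-factor stand-in for the seven-factor, 34-item model. The factor names, item assignments, and simulated data are all illustrative assumptions; maximum likelihood is semopy’s default estimator.

```python
# A minimal CFA sketch with correlated latent factors; factor and item names
# are hypothetical stand-ins, and data are simulated to have factor structure.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(2)
f = rng.normal(size=(386, 2))  # two latent factors
data = pd.DataFrame(
    np.hstack([0.7 * f[:, [0]] + rng.normal(scale=0.5, size=(386, 3)),
               0.7 * f[:, [1]] + rng.normal(scale=0.5, size=(386, 3))]),
    columns=["priv1", "priv2", "priv3", "qual1", "qual2", "qual3"])

desc = """
privacy =~ priv1 + priv2 + priv3
quality =~ qual1 + qual2 + qual3
privacy ~~ quality
"""

model = semopy.Model(desc)
model.fit(data)                   # maximum likelihood by default
stats = semopy.calc_stats(model)  # chi2, CFI, RMSEA, and other fit indices
print(stats[["CFI", "RMSEA"]])
```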
2.3.4. Measures of Sociodemographic Attributes, Attitudes, and Personality Traits
To assess the convergent validity of the AAIH-P openness scale, we also administered measures of sociodemographic attributes, attitudes, and personality traits and examined their associations with openness on the AAIH-P. The Ten Item Personality Inventory (TIPI) is a 10-item measure that assesses five personality constructs: openness to new experiences, conscientiousness, extraversion, agreeableness, and emotional stability [47]. We anticipated a positive relationship between the AAIH-P openness scale and openness as a personality trait. We assessed participants’ trust in health information systems using Platt’s scale, a 20-item measure with four subscales: Fidelity, Competency, Trust, and Integrity [48]. We also included a measure of faith and trust in general technology [49]. We hypothesized a positive correlation between these measures of trust and openness scores on the AAIH-P.
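Although the correlation statistic is not specified here, convergent-validity checks of this kind are commonly run as Pearson correlations between scale scores; the sketch below illustrates this on simulated data with hypothetical column names.

```python
# A minimal sketch of convergent-validity correlations between the AAIH-P
# openness composite and other scale scores; data and names are simulated.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
base = rng.normal(size=804)  # shared signal so the scales correlate
scores = pd.DataFrame({
    "aaihp_openness": base + rng.normal(scale=0.8, size=804),
    "tipi_openness":  base + rng.normal(scale=1.0, size=804),
    "tech_trust":     base + rng.normal(scale=1.0, size=804),
})

for col in ["tipi_openness", "tech_trust"]:
    r, p = stats.pearsonr(scores["aaihp_openness"], scores[col])
    print(f"{col}: r = {r:.2f}, p = {p:.3g}")
```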
We asked participants which political party they most aligned with: Democrat, Republican, Independent, or other. Respondents who chose Independent or other were then asked whether they leaned Democrat or Republican. For analysis, we clustered Democrat with lean Democrat and Republican with lean Republican. We hypothesized that participants who aligned with the Republican Party would be more conservative and might show lower openness on the AAIH-P.
Lastly, we included the Positive and Negative Affect Schedule (PANAS) to assess the impact of participants’ current affect on their responses. This survey was administered during the early phases of the COVID-19 pandemic, and we hypothesized that negative affect generated by these societal challenges might correlate negatively with openness.