Article

Integrating Machine-Learning Methods with Importance–Performance Maps to Evaluate Drivers for the Acceptance of New Vaccines: Application to AstraZeneca COVID-19 Vaccine

by Jorge de Andrés-Sánchez 1,*, Mar Souto-Romero 2 and Mario Arias-Oliva 3
1 Business and Research Laboratory, Department of Business Administration, Universitat Rovira i Virgili, Campus de Bellissens, 43204 Reus, Spain
2 Department of Business Economics, Faculty of Business & Economy, Universidad Rey Juan Carlos, P.° de los Artilleros, 38, 28032 Madrid, Spain
3 Marketing Department, Faculty of Business & Economy, Universidad Complutense of Madrid, Campus de Somosaguas, 28223 Madrid, Spain
* Author to whom correspondence should be addressed.
Submission received: 13 November 2025 / Revised: 5 January 2026 / Accepted: 20 January 2026 / Published: 21 January 2026

Abstract

Background: The acceptance of new vaccines under uncertainty—such as during the COVID-19 pandemic—poses a major public health challenge because efficacy and safety information is still evolving. Methods: We propose an integrative analytical framework that combines a theory-based model of vaccine acceptance—the cognitive–affective–normative (CAN) model—with machine-learning techniques (decision tree regression, random forest, and Extreme Gradient Boosting) and SHapley Additive exPlanations (SHAP) integrated into an importance–performance map (IPM) to prioritize determinants of vaccination intention. Using survey data collected in Spain in September 2020 (N = 600), when the AstraZeneca vaccine had not yet been approved, we examine the roles of perceived efficacy (EF), fear of COVID-19 (FC), fear of the vaccine (FV), and social influence (SI). Results: EF and SI consistently emerged as the most influential determinants across modelling approaches. Ensemble learners (random forest and Extreme Gradient Boosting) achieved stronger out-of-sample predictive performance than the single decision tree, while decision tree regression provided an interpretable, rule-based representation of the main decision pathways. Exploiting the local nature of SHAP values, we also constructed SHAP-based IPMs for the full sample and for the low-acceptance segment, enhancing the policy relevance of the prioritization exercise. Conclusions: By combining theory-driven structural modelling with predictive and explainable machine learning, the proposed framework offers a transparent and replicable tool to support the design of vaccination communication strategies and can be transferred to other settings involving emerging health technologies.

1. Introduction

Vaccination is the cornerstone of public health policies because of its ability to prevent infectious diseases and reduce mortality [1]. Since 1974, vaccination has prevented 154 million deaths, including 146 million among children under five years old, of which 101 million were infants under one year of age [2]. Vaccines not only prevent the onset of diseases but also reduce the burden on healthcare systems by decreasing the need for hospitalization and costly treatments [1]. Public health policies that prioritize vaccination also promote health equity by ensuring that everyone, regardless of socioeconomic status, has access to this vital preventive measure [3].
Vaccination during the COVID-19 crisis was one of the most effective tools for reducing the health, social, and economic impact of the pandemic. Owing to the rapid implementation of immunization campaigns, hospitalization and death rates were significantly reduced [4]. Furthermore, mass vaccination helped alleviate pressure on healthcare systems, facilitated the gradual return to economic and educational activities, contributed to economic recovery [5], and improved psychological well-being [6].
The conditions surrounding vaccination campaigns for well-known diseases such as influenza or diphtheria contrast sharply with those implemented in response to the COVID-19 pandemic in 2020. Public health campaigns have focused on maintaining high vaccination coverage and preventing outbreaks through established schedules and actions targeted at vulnerable populations, such as children and the elderly [7]. These are well-established vaccines that usually trigger a predictable and robust immune response and have a proven safety track record [8].
In contrast, the COVID-19 pandemic represented an urgent public health challenge that required a rapid and adaptive response to a completely new virus. In March 2020, health authorities worldwide mobilized to develop effective vaccines against COVID-19, leading to an accelerated approval process that was significantly faster than the traditional timelines for vaccine development [9]. Research conducted before and during the pandemic indicated that public attitudes toward vaccines were highly influenced by the sociopolitical context, resulting in disparities in vaccine acceptance that are not typically observed in response to traditional diseases [10].
Vaccine hesitancy, often fueled by the rapid rollout and perceived risks associated with new vaccines, contrasts with the generally high levels of trust and acceptance of well-established vaccines with a history of safe use [11]. However, in the context of low population immunity and high viral transmission, such as during the COVID-19 pandemic in 2020, mass vaccination has become especially important as a key strategy to control the disease [12].
In the context of new vaccines, it is particularly important for public health managers to understand the key variables influencing vaccination intention and develop methods that predict vaccine acceptance. While there is no data on actual vaccine use in such cases, data on intended use can be obtained through social media [13] or surveys [14].
This study proposes an analytical framework that combines machine learning (ML) tools with Importance–Performance Maps (IPMs), with the goal of identifying the key variables that explain acceptance of new vaccines for which no historical evidence on efficacy or side effects is available. It is assumed that the available information comes from structured questionnaires. The ML methods employed include decision tree regression (DTR) and its extensions, Random Forest (RF) and Extreme Gradient Boosting (XGBoost). Unlike estimates based on linear regression techniques, these methods allow for the capture of non-linear relationships and interactions between variables that may not have been considered in the theoretical model being estimated [15].
Although ML and deep learning methods are widely applied across various domains of medical data analysis [16,17], their use in research on vaccine acceptance or hesitancy remains relatively limited. Among the few studies that do employ ML, sentiment analysis of social media data is particularly prominent [13,18]. Likewise, Bodapati et al. [19] analyzed the vulnerability of different countries to emerging COVID-19 variants despite vaccination progress, comparing the performance of logistic regression, K-nearest neighbors (KNN), and neural networks. With regard to structured surveys, notable prior studies include [20,21,22]; however, the first (focused on influenza) and the third (on SARS-CoV-2) examine actual vaccine uptake rather than intention to vaccinate.
Specifically, this study aims to achieve the following research objectives (ROs):
RO1: To propose an ML approach to visualize the importance of explanatory variables for the intention to use a vaccine under development, offering an alternative explanatory perspective to conventional linear regression methods while maintaining strong predictive performance.
RO2: To enhance the model’s explanatory capacity by integrating the results of ML algorithms into an IPM in order to provide health authorities with strategic information on the most relevant variables influencing the acceptance of new vaccines.
The novelty of this paper lies in the combination of diverse analytical instruments that employ decision-tree methods to understand how new vaccines are perceived, supporting decision-making in their implementation. This approach enhances two key dimensions that make machine learning more useful: transparency and explainability [15]. A DTR has lower predictive power than RF or XGBoost, but it can be understood as an "average tree" underlying the ensemble methods, thereby providing greater transparency. The use of SHAP (commonly applied to ensemble methods) enables the interpretability of the outcomes, and, as a novel contribution, the integration of SHAP with an IPM introduces an additional level of interpretability. This integration facilitates the use of machine-learning estimations for public health policy implementation related to the adoption of new vaccines.

2. Materials and Methods

2.1. Theoretical Groundwork

The variable that quantifies vaccine acceptance is behavioral intention, defined as a person’s subjective perception of the likelihood of performing a specific behavior [23], in this case, the intention to use (IU) a vaccine that is still in the trial phase. We consider this variable particularly appropriate compared to others, such as actual use, since the analysis was conducted in a context in which the vaccine had not yet been distributed or even approved.
The application of predictive tools requires an analytical framework that justifies the use of explanatory variables to model dependent variables. We adopted the cognitive–affective–normative (CAN) model [24] as a reference, which has been applied to vaccine acceptance [25,26]. The CAN model is also grounded in the theory of Planned Behaviour [23], which explains behavioral intentions based on attitude, control, and subjective norms. Figure 1 presents the explanatory variables and their relationships with the dependent variables.
The first component of the CAN model is related to cognitive factors, which, in the context of vaccination, refer to the perceived effectiveness (EF) of the vaccine [25]. This is a widely reported variable and is considered fundamental in explaining vaccine acceptance [27], as it directly influences attitudes toward the vaccine. It has also been shown to be a key factor in understanding the acceptance and hesitancy toward COVID-19 vaccines [25,28,29,30,31]. Since the vaccine is still in development, perceptions of effectiveness must be based on preliminary evidence from clinical trials.
The second group of variables in the CAN model corresponds to affective factors. In the vaccination context, Pelegrín-Borondo et al. [25] operationalized these as fear of COVID-19 (FC) and fear of taking the vaccine (FV). Both affect attitudes toward vaccination, the former fostering a positive attitude and the latter discouraging it. Several studies have shown that fear of the disease is positively linked with the intention to be vaccinated [14,25,26,30,31], whereas fear of the vaccine acts as a significant barrier to acceptance [25,26,31,32]; conversely, the perception of vaccine safety diminishes hesitancy [28].
The third component of the model relates to subjective norms and refers to social influence (SI). This can be conceptualized as the perception that close or significant others believe that one should perform a particular behavior [25]. In addition to the influence of family and friends, the key role of health professionals, such as doctors, must be highlighted, as their recommendations have proven decisive [33]. Expert knowledge and social norms act as prescriptive and informative cues, whereas dyadic interactions (such as medical consultations) provide emotional and instrumental signals [33]. Citizens’ trust in government authorities responsible for promoting vaccination policies should also not be overlooked [28].
Social influence has been shown to be a robust explanatory factor in the context of COVID-19 vaccine acceptance [25,30,34,35].
Finally, as shown in Figure 1, two control variables were included: sex (SEX) and age (AGE), which are relevant to understand vaccination patterns [36]. Several studies have identified that women tend to show greater hesitancy toward COVID-19 vaccination than do men [36,37]. Regarding age, it has been empirically demonstrated that from age 50 onward, the likelihood of COVID-19 infection leading to serious health consequences increases significantly [38]; consequently, vaccination intention is expected to be higher in this age group [14].

2.2. Sample and Sampling

This paper uses a survey based on an online questionnaire, which explored perceptions surrounding the first COVID-19 vaccine to be distributed: Vaxzevria, developed by AstraZeneca and used in [25,26]. Data collection took place between 9 and 16 September 2020, and targeted the residents of Spain. At that time, the vaccine had already received some media coverage, including reports of possible adverse side effects [39]. However, the survey was conducted prior to its authorization for commercial use in the European Union, which was granted on 29 January 2021 [40], and before its distribution in Spain, which began in February of the same year [41].
The target population was persons residing in Spain. Respondents were recruited via social media, with quotas applied to ensure balanced representation by gender (a minimum of 40% for each gender) and age. A uniform distribution was sought across three age groups: 17–30 years, 31–50 years, and over 50 years.
From a total of 827 responses received, the first 600 that met the completeness criteria and the established quota balance were selected. Participants' mean age was 41.97 years, with a standard deviation of 15.52 years. The complete sociodemographic distribution of participants is shown in Table 1.

2.3. Measurement of Variables

The survey began with an informational text that contextualized the situation at the time of its administration. It indicated that, at the moment of response, the clinical trials for the AstraZeneca drug had been put on hold following the report of a major adverse event. The introductory text stated the following:
“Imagine that the COVID-19 vaccine currently being developed by the University of Oxford and AstraZeneca is the first vaccine approved by the health authority of the European Union, after addressing its adverse effects. Please note that the trials for this vaccine were suspended on September 9, following a report of a ‘serious adverse event’ in a volunteer. Please respond to the following questions on a scale from 0 (strongly disagree) to 10 (strongly agree).”
The survey was distributed in Spanish, although this paper presents its translation in English (Table 2). All items were answered using an 11-point Likert scale ranging from 0 (strongly disagree) to 10 (strongly agree).
Regarding the dependent variable—vaccination intention (IU)—the items focused on the vaccine developed by AstraZeneca, which, in autumn 2020, was the most developed drug and very likely to be used in Spain. The design of the items related to IU, as well as those related to attitudinal and subjective norm factors, was based on [25].
The sociodemographic variables were coded using binary variables. For gender, a value of 1 was assigned to women, and 0 otherwise. In the case of age, individuals aged 50 years or older were coded as 1, given that the probability of death from COVID-19 increases exponentially from this age onward [38]—and those under 50 were coded as 0.

2.4. Analytical Methodology

2.4.1. Assessment of Research Objective 1

Step 1. Reliability was assessed using Cronbach’s alpha to evaluate internal consistency, together with the average variance extracted (AVE) and an exploratory factor analysis (EFA) to examine convergent validity. Discriminant validity was evaluated using the Fornell–Larcker criterion. In addition, cross-loadings and a confirmatory factor analysis (CFA) were conducted to further assess the adequacy of the measurement model. In this step, we used the following R 4.4.2 packages: psych, stats, plspm, and lavaan.
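Although this step was carried out with the R packages listed above, the internal-consistency computation can be sketched in a few lines. The following Python fragment (illustrative only; the data are simulated, not the survey responses) implements the standard Cronbach's alpha formula on a respondents-by-items matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) matrix:
    (k / (k - 1)) * (1 - sum of item variances / variance of sum score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated data: three positively correlated items on a 0-10 scale
rng = np.random.default_rng(0)
base = rng.normal(5, 2, size=200)
X = np.column_stack([base + rng.normal(0, 1, 200) for _ in range(3)])
alpha = cronbach_alpha(X)
```

Values above 0.70 would be read as adequate internal consistency, which is the threshold applied in Section 3.1.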
Step 2. Both the dependent variable (IU) and the explanatory variables—EF (perceived effectiveness), FC (fear of COVID-19), FV (fear of the vaccine), and SI (social influence)—were measured using multiple items. The value of each construct was quantified using the standardized score derived from the factor extraction obtained in Step 1, which was also used to assess convergent validity. The dataset contained no missing values in the variables used for estimation, as incomplete responses were removed prior to analysis (complete-case approach).
We computed construct scores using the first principal component rather than simple item averages. Unlike mean scores, which implicitly assign equal weight to all indicators, the first principal component yields a weighted composite that maximizes the shared variance captured by the items and provides a unique (deterministic) scoring solution suitable for subsequent predictive modeling [42]. Given the bounded nature of the survey scales, no specific outlier detection or removal procedure was applied. In this step, we used the following R 4.4.2 package: stats.
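As an illustration of this scoring choice (the study itself used R's stats package), the following Python sketch extracts standardized first-principal-component scores from a block of items; the data are simulated and the variable names are hypothetical:

```python
import numpy as np

def first_pc_scores(items: np.ndarray) -> np.ndarray:
    """Standardized scores on the first principal component of an
    (n_respondents, k_items) matrix, i.e., a weighted composite that
    maximizes the shared variance captured by the items."""
    Z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)          # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending eigenvalues
    w = eigvecs[:, -1]                           # loadings of the largest one
    w = w if w.sum() >= 0 else -w                # fix sign: positive loadings
    scores = Z @ w
    return (scores - scores.mean()) / scores.std(ddof=1)

# Simulated four-item construct driven by one latent factor
rng = np.random.default_rng(1)
base = rng.normal(0, 1, 300)
X = np.column_stack([base + rng.normal(0, 0.5, 300) for _ in range(4)])
scores = first_pc_scores(X)
```

Unlike a simple item mean, the weights here are data-driven, so items that share more variance with the construct contribute more to the composite.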
Sociodemographic variables such as age and sex were measured (Step 1) using binary coding. Sex was coded as 1 for women and 0 otherwise, and age was coded as 1 for individuals aged 50 years or older and 0 for the rest. While this binary coding is naturally appropriate for sex, the 50+ age threshold was motivated by evidence that the risk of severe COVID-19 outcomes—such as severe hospitalization or even death—increases markedly from this age onward [38].
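A minimal sketch of this binary coding (illustrative values; the raw variable names are hypothetical):

```python
import numpy as np

# Hypothetical raw survey columns
sex_raw = np.array(["F", "M", "F", "M"])
age_raw = np.array([23, 54, 49, 67])

SEX = (sex_raw == "F").astype(int)   # 1 = woman, 0 otherwise
AGE = (age_raw >= 50).astype(int)    # 1 = aged 50 or older, 0 otherwise
```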
Step 3. To evaluate predictive performance using decision-tree-based methods while preventing any potential data leakage, the full dataset (N = 600) was randomly split once into a training subset (n = 480) and an external (hold-out) test subset (n = 120). The test subset was kept completely aside and was not used at any stage of model tuning or selection. Three machine-learning models were estimated on the training data: DTR [43], RF [44], and XGBoost [45]. Hyperparameter optimization was conducted using a homogeneous fine-tuning strategy across the three algorithms, based on an exhaustive grid search combined with five-fold cross-validation on the training subset. Specifically, the DTR model was tuned over maxdepth ∈ {2, 3, 4, 5}, minsplit ∈ {10, 20, 30, 40}, and cp ∈ {0.0005, 0.001, 0.002, 0.005, 0.01, 0.02} (96 configurations). The RF model was tuned over mtry ∈ {2, 3, 4, 5, 6}, ntree ∈ {300, 500, 700, 1000}, and nodesize ∈ {1, 3, 5, 7} (80 configurations). The XGBoost model was tuned over nrounds ∈ {200, 400, 600}, max_depth ∈ {2, 4, 6}, eta ∈ {0.03, 0.10}, gamma ∈ {0, 1}, colsample_bytree ∈ {0.6, 1.0}, min_child_weight = 1, and subsample ∈ {0.7, 1.0} (144 configurations). In all cases, the optimal configuration was selected by minimizing the average root mean squared error (RMSE) across folds. In this step, we used the following R 4.4.2 packages: caret, rpart, randomForest, xgboost, and parallel.
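For readers outside the R ecosystem, the tuning logic of this step can be sketched with scikit-learn, whose DecisionTreeRegressor exposes rough analogues of rpart's controls (max_depth ≈ maxdepth, min_samples_split ≈ minsplit, ccp_alpha ≈ cp; the correspondence is approximate, not exact). The data below are simulated stand-ins, not the survey data:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
# Synthetic stand-in for the six predictors (EF, FC, FV, SI, SEX, AGE)
X = rng.normal(size=(600, 6))
y = 0.5 * X[:, 0] + 0.36 * X[:, 3] - 0.1 * X[:, 2] + rng.normal(0, 0.5, 600)

# Single random split: 480 training, 120 hold-out, as in Step 3
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=120, random_state=42)

# Approximate sklearn analogues of the rpart grid (96 configurations)
grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [10, 20, 30, 40],
    "ccp_alpha": [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    grid,
    cv=5,                                   # five-fold CV on the training subset
    scoring="neg_root_mean_squared_error",  # select by minimum average RMSE
)
search.fit(X_tr, y_tr)
best_rmse = -search.best_score_             # mean cross-validated RMSE
```

The RF and XGBoost grids would be handled identically, swapping in the corresponding estimator and parameter names.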
Step 4. To assess the robustness and stability of predictive performance, we applied a bootstrap evaluation on the external test set (N = 120) for the selected best-performing ML model. Specifically, we generated 5000 bootstrap resamples of the test set and recomputed predictive accuracy metrics in each resample. Performance was summarized using Q2 and RMSE, yielding empirical distributions and 95% confidence intervals rather than single-point estimates. This step was implemented in R 4.4.2 using base R (resampling and metric computation) and the corresponding model package (rpart, randomForest or xgboost).
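A minimal sketch of this bootstrap procedure follows; here Q2 is taken as 1 − SSres/SStot on the resampled hold-out observations (the usual out-of-sample R2), and the predictions are simulated for illustration:

```python
import numpy as np

def bootstrap_metrics(y_true, y_pred, n_boot=5000, seed=0):
    """Bootstrap Q2 and RMSE on a hold-out set; returns means and
    95% percentile confidence intervals over n_boot resamples."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    q2s, rmses = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample with replacement
        yt, yp = y_true[idx], y_pred[idx]
        ss_res = np.sum((yt - yp) ** 2)
        ss_tot = np.sum((yt - yt.mean()) ** 2)
        q2s[b] = 1.0 - ss_res / ss_tot
        rmses[b] = np.sqrt(ss_res / n)
    ci = lambda a: tuple(np.percentile(a, [2.5, 97.5]))
    return {"Q2": (q2s.mean(), ci(q2s)), "RMSE": (rmses.mean(), ci(rmses))}

# Simulated hold-out set (n = 120) with moderate predictive accuracy
rng = np.random.default_rng(3)
y_true = rng.normal(0, 1, 120)
y_pred = y_true + rng.normal(0, 0.55, 120)
res = bootstrap_metrics(y_true, y_pred, n_boot=5000)
```

Reporting the empirical 95% intervals, rather than point estimates, conveys how stable the hold-out performance is under resampling.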
Step 5. For explanatory purposes, the structural model was estimated using PLS-SEM on the full sample (N = 600). This analysis provides theory-driven evidence based on average net effects, enabling the assessment of the magnitude and direction of the relationships specified in the conceptual framework. In parallel, the best-performing decision tree regression model identified in Step 3 was also presented on the full sample as an interpretative complement. While PLS-SEM summarizes the average influence of each explanatory construct on IU, the decision tree offers a rule-based representation of how predictors interact through sequential splits and threshold values. In this way, the tree model provides an intuitive depiction of decision paths leading to higher versus lower acceptance levels, thereby enriching the explanatory interpretation beyond linear average effects. In this step, we used the following R 4.4.2 packages: plspm, rpart, and rpart.plot.

2.4.2. Assessment of Research Objective 2

Step 6. The performance of the constructs was obtained analogously to [46] in a PLS-SEM context. For factors with a positive link with intention to use, performance was stated as the weighted mean of the items by their factor loadings, normalized on a 0–100 scale. Thus, if we denote this variable as X, its average as Avg_X, and its performance as Per_X, we establish that Per_X = Avg_X. In our case, X corresponds to EF, FC, and SI.
Conversely, if variable X has a negative link with intention to use, Per_X is computed as the complementary value of Avg_X; that is, Per_X = 100 − Avg_X. In this study, this refers to the fear of the vaccine (FV). In this step, we used the following R 4.4.2 packages: plspm and base R.
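The performance computation of Step 6 can be sketched as follows (Python for illustration, whereas the study used plspm and base R; item values and loadings are hypothetical):

```python
import numpy as np

def performance(items: np.ndarray, loadings: np.ndarray,
                scale=(0, 10), reverse=False) -> float:
    """IPM performance: loading-weighted item mean rescaled to 0-100.
    reverse=True implements Per_X = 100 - Avg_X for negative drivers (FV)."""
    w = loadings / loadings.sum()                # normalized loading weights
    avg = items @ w                              # weighted score per respondent
    lo, hi = scale
    per = 100.0 * (avg.mean() - lo) / (hi - lo)  # normalize to the 0-100 scale
    return 100.0 - per if reverse else per

rng = np.random.default_rng(4)
ef_items = rng.uniform(4, 9, size=(600, 3))      # hypothetical EF items (0-10)
fv_items = rng.uniform(2, 7, size=(600, 2))      # hypothetical FV items (0-10)
per_ef = performance(ef_items, np.array([0.90, 0.85, 0.84]))
per_fv = performance(fv_items, np.array([0.96, 0.97]), reverse=True)
```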
Step 7. The importance of each predictor is determined with Shapley Additive Explanations (SHAP) [47], computed from the machine-learning model with the highest predictive performance. We operationalize global importance as the mean absolute SHAP value across observations (mean(|SHAP|)). Because SHAP is inherently an observation-level (local) attribution method, we aggregate local SHAP values to obtain an overall measure for the full sample. In addition, we conducted two supplementary analyses. First, we computed mean(|SHAP|) for policy-relevant subgroups, particularly individuals exhibiting higher vaccine rejection. We defined this subgroup as respondents whose IU falls at or below the 33rd percentile of the IU performance distribution obtained in Step 6. This group is of particular interest because lower vaccination uptake is associated with higher mortality risk, both due to increased susceptibility to infection among individuals aged 50 years and older and to higher transmission rates driven by unvaccinated individuals. Second, using the full sample, we evaluated how SI, which is the construct that health authorities can manage with relative ease to foster a favorable climate toward vaccination through relatively ‘low-cost’ policies such as communication and information campaigns [25], interacts with the remaining CAN constructs (EF, FC and FV). This enables us to assess the relevance of interactions between variables [48], which is especially informative in our context for the design of vaccination communication policies.
In this step, we used the following R 4.4.2 packages: fastshap, ggplot2, dplyr, tidyr, and ggbeeswarm.
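The aggregation logic of Step 7 can be sketched as follows; the SHAP matrix below is simulated for illustration, since the actual values come from the fitted RF model:

```python
import numpy as np

def mean_abs_shap(shap_values: np.ndarray, mask=None) -> np.ndarray:
    """Aggregate local (observation-level) SHAP values into a global
    importance measure, mean(|SHAP|) per feature, optionally
    restricted to a subgroup defined by a boolean mask."""
    s = shap_values if mask is None else shap_values[mask]
    return np.abs(s).mean(axis=0)

features = ["EF", "SI", "FV", "FC", "AGE", "SEX"]
rng = np.random.default_rng(5)
# Simulated SHAP matrix (600 observations x 6 features), with per-feature
# spreads loosely echoing the ordering reported in Section 3.4
shap_vals = rng.normal(0, [0.47, 0.27, 0.09, 0.07, 0.02, 0.01], size=(600, 6))
iu = rng.normal(0, 1, 600)                      # simulated IU scores

low_acceptance = iu <= np.quantile(iu, 1 / 3)   # IU at or below 33rd percentile
imp_full = mean_abs_shap(shap_vals)             # full-sample importance
imp_low = mean_abs_shap(shap_vals, low_acceptance)  # low-acceptance subgroup
```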
Step 8. We plotted the IPM for the CAN variables (EF, FC, FV, and SI) in a two-dimensional graph. To interpret the IPM, instead of using the traditional four-quadrant approach (e.g., [46]), we apply the diagonal approach [49], as shown in Figure 2, because it offers greater discriminative power. This analysis is performed for the overall sample and for the group with higher vaccine rejection. In this step, we used the following R 4.4.2 packages: ggplot2 and dplyr (and ggrepel, if label repulsion was needed).
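A minimal sketch of the diagonal-style IPM of Step 8 (Python/matplotlib for illustration, whereas the study used ggplot2; importances are rescaled to 0–100 so the diagonal is a meaningful reference line, and the performance values shown are hypothetical):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend for script use
import matplotlib.pyplot as plt

constructs = ["EF", "SI", "FV", "FC"]
importance = np.array([0.467, 0.271, 0.093, 0.065])   # mean(|SHAP|), Step 7
importance = 100 * importance / importance.max()      # rescale to 0-100
performance_vals = np.array([62.0, 48.0, 55.0, 51.0])  # hypothetical, Step 6

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(importance, performance_vals)
for name, x, y in zip(constructs, importance, performance_vals):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))
ax.plot([0, 100], [0, 100], linestyle="--")   # diagonal reference line
ax.set_xlabel("Importance (rescaled mean(|SHAP|))")
ax.set_ylabel("Performance (0-100)")
ax.set_title("Importance-Performance Map (diagonal approach)")
fig.savefig("ipm.png", dpi=150)
```

Under the diagonal reading, constructs lying below the diagonal (importance exceeding performance) are the natural priorities for intervention.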
Figure 3 graphically presents the analytical methodology.

3. Results

3.1. Measurement Model Assessment

Table 2 presents mean and standard deviations of the items used in this paper. Regarding intention to use the vaccine, the mean value was close to 5 out of 10, suggesting a hesitant or doubtful attitude among the respondents.
As for Step 1, the results indicate that all scales used are reliable. First, internal consistency is adequate, as Cronbach’s alpha coefficient exceeds the recommended threshold of 0.70 in all cases. Additionally, convergent reliability was confirmed, since AVE was above 0.50, and all item factor loadings on their respective constructs exceeded 0.702, in line with methodological recommendations [50]. Moreover, Table 3 shows that the scales exhibited discriminant validity, as in all cases, the correlations were under the square rooted AVE of the variables.
Likewise, in Table 4, each indicator shows its highest loading on its intended construct, providing evidence of discriminant validity based on cross-loadings. The primary loadings are high (IU: 0.974–0.976; EF: 0.840–0.952; FC: 0.876–0.901; FV: 0.957–0.967; SI: 0.963–0.979), while all cross-loadings on the remaining constructs are lower than the corresponding primary loading, indicating that the items are not captured by alternative constructs.
A confirmatory factor analysis (CFA) was performed, showing a very good overall fit of the measurement model. Specifically, the Comparative Fit Index (0.985), Tucker–Lewis Index (0.979), and Normed Fit Index (0.979) clearly exceed common thresholds (≈0.95). The Root Mean Square Error of Approximation is low (0.060; 90% confidence interval: 0.050–0.071), indicating acceptable-to-good fit, and the Standardized Root Mean Square Residual is very small (0.025), suggesting minimal discrepancies between the observed and model-implied covariance matrices.

3.2. Machine Learning Method Fine-Tuning and Predictive Validation Results

After implementing Step 2, in which all latent variables were quantified through their corresponding factor extractions, the three ML methods were fine-tuned to provide the best fit to the CAN model (Step 3). Table 5 shows the results of the five-fold cross-validation fine-tuning conducted on the training subset (n = 480) using homogeneous grid-search procedures for the three machine-learning methods. Overall, RF exhibits the strongest performance, with the highest mean R2 and the lowest mean RMSE. XGBoost ranks second in terms of RMSE and R2, while DTR yields the weakest average predictive accuracy, as expected from a single-tree model. The reported standard deviations indicate that performance differences are not driven by a single split but remain reasonably stable across folds, supporting the robustness of the selected hyperparameter configurations.
Subsequently, we validated the predictive capability of the ML methods on the hold-out (external) test subset (Step 4). Table 6 indicates that RF achieves the best predictive performance on the external test set, with the highest R2 (0.7164 ± 0.0519) and the lowest RMSE (0.5503 ± 0.0466), outperforming both DTR (R2 = 0.5278 ± 0.0931; RMSE = 0.7104 ± 0.0679) and XGBoost (R2 = 0.6980 ± 0.0637; RMSE = 0.5667 ± 0.0559). XGBoost also improves clearly over DTR in both metrics, suggesting that boosting offers a meaningful gain in generalization compared with a single-tree baseline.
Table 7 results indicate that although RF shows slightly better average performance than XGBoost, the pairwise differences are not statistically conclusive, since the 95% CIs for both ΔR2 (0.0188; 95% CI [−0.0223, 0.0637]) and ΔRMSE (−0.0173; 95% CI [−0.0558, 0.0233]) include zero. In contrast, the differences versus DTR are robust: both RF − DTR and XGBoost − DTR yield clearly positive ΔR2 and clearly negative ΔRMSE, with 95% CIs that exclude zero, confirming that the ensemble methods consistently outperform the single decision tree on out-of-sample prediction.
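The paired-difference computation behind these intervals can be sketched as follows: the same bootstrap indices are applied to both models' hold-out predictions, so each resample yields directly comparable ΔR2 and ΔRMSE values (predictions simulated for illustration):

```python
import numpy as np

def paired_bootstrap_delta(y, pred_a, pred_b, n_boot=5000, seed=0):
    """Paired bootstrap of R2 and RMSE differences (model A minus model B)
    on a shared hold-out set; the same resampled indices are used for
    both models so the differences are directly comparable."""
    rng = np.random.default_rng(seed)
    n = len(y)
    d_r2, d_rmse = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # shared resample indices
        yt = y[idx]
        ss_tot = np.sum((yt - yt.mean()) ** 2)
        r2, rmse = {}, {}
        for key, pred in (("a", pred_a), ("b", pred_b)):
            resid = yt - pred[idx]
            r2[key] = 1 - np.sum(resid ** 2) / ss_tot
            rmse[key] = np.sqrt(np.mean(resid ** 2))
        d_r2[b] = r2["a"] - r2["b"]
        d_rmse[b] = rmse["a"] - rmse["b"]
    ci = lambda a: np.percentile(a, [2.5, 97.5])
    return ci(d_r2), ci(d_rmse)

# Simulated hold-out predictions: model A more accurate than model B
rng = np.random.default_rng(6)
y = rng.normal(0, 1, 120)
pred_a = y + rng.normal(0, 0.55, 120)
pred_b = y + rng.normal(0, 0.71, 120)
ci_r2, ci_rmse = paired_bootstrap_delta(y, pred_a, pred_b)
```

A 95% interval for ΔR2 or ΔRMSE that includes zero, as for RF versus XGBoost in Table 7, is read as an inconclusive difference.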

3.3. Visualizing Explanatory Drivers of Vaccine Acceptance: PLS-SEM and Decision-Tree Regression Insights

With the aim of complementing the theory-driven, linear evidence, the PLS-SEM results are reported in Table 8, whereas the optimal decision tree regression (DTR) model is displayed in Figure 4 (Step 5). Although DTR shows lower predictive performance than the ensemble-based methods (RF and XGBoost), it offers a clear interpretative advantage: it provides a transparent, rule-based visualization of how the explanatory variables combine sequentially through thresholds to yield different levels of intention to use. Therefore, the DTR can be interpreted as a parsimonious and communicable summary of the underlying decision structure that RF and XGBoost capture in a more complex way—through multiple randomized trees (RF) or sequentially optimized trees (XGBoost)—but that is less directly interpretable.
According to Table 8, based on the path coefficients (β), the most influential variable on the intention to use (IU) is EF, with a coefficient of β = 0.502 (p < 0.001), followed by SI (β = 0.360, p < 0.001), both showing positive and significant effects. Significant effects of smaller magnitude were also identified in the case of FV, which had a negative influence on IU (β = −0.103, p < 0.001), and FC, with a positive influence (β = 0.078, p = 0.002). In contrast, the control variables (age and sex) did not show any statistically significant effects.
The results of the DTR, shown in Figure 4, reinforce the findings from PLS-SEM by indicating that the most relevant variables explaining IU are EF and SI. These are the only variables that define decision thresholds at the tree nodes, hierarchically splitting the observations. Moreover, the pattern of the terminal nodes suggests that IU increases with EF and SI, with the highest IU values associated with levels of EF and SI that exceed the thresholds established at the intermediate nodes. It should also be noted that the fact that FC and FV do not appear as primary splits does not mean they are irrelevant. Both variables can still contribute to reducing node impurity through surrogate splits. Surrogate splits are used when the primary splitting variable is missing; thus, FC and FV would become useful substitutes if there were missing observations in EF or SI (although this situation does not occur in our sample).

3.4. Using SHAP and Importance–Performance Maps for the Evaluation of Vaccination Policies

3.4.1. SHAP-Based Importance Analysis of Vaccine Acceptance Determinants

Next, we implemented Step 6 (performance evaluation following [46]) and Step 7 (SHAP-based importance). Specifically, we computed SHAP values and reported mean absolute SHAP (mean(|SHAP|)) for both the full sample and the subgroup with the highest vaccination reluctance (i.e., respondents with IU at or below the 33rd percentile of the performance score). SHAP values were derived from the random forest (RF) model, which achieved the strongest overall predictive performance among the evaluated machine-learning methods. Although the RF–XGBoost difference in predictive accuracy is not statistically conclusive, RF is also less computationally demanding than XGBoost, making it a pragmatic choice for SHAP-based interpretability analyses.
Figure 5 shows the SHAP summary plot (beeswarm plot style), illustrating the individual impacts of the variables FC, FV, EF, SI, SEX and AGE on the prediction of each observation of the dependent variable IU. On the vertical axis, the explanatory variables are listed, while the horizontal axis displays the SHAP values, which indicate how much each observation contributes to increasing or decreasing the prediction of IU. In the case of EF and SI, higher values (in red) tend to be associated with positive SHAP values, suggesting that they increase the prediction of IU. In contrast, for FC, lower values (in blue) are more dispersed toward negative SHAP values, indicating that a low score in FC tends to decrease the prediction of IU. The variables SEX and AGE show a reduced overall effect, with point concentrations near zero SHAP values, suggesting that their influence on the prediction is minimal in the model.
Figure 6 shows the importance of the explanatory variables in shaping IU, based on the mean absolute SHAP values. The most relevant variables in the full sample were EF (mean(|SHAP|) = 0.467), followed by SI (mean(|SHAP|) = 0.271), FV (mean(|SHAP|) = 0.093), and FC (mean(|SHAP|) = 0.065). In contrast, the sociodemographic variables (AGE and SEX) showed the lowest levels of importance, with mean(|SHAP|) values below 0.02. Overall, this ranking is fully consistent with the results obtained from the PLS-SEM and DTR analyses, confirming the dominant role of perceived efficacy and social influence in explaining vaccination intention.
However, when focusing on the subgroup with lower vaccination acceptance (IU ≤ 33rd percentile), the relative importance of the explanatory variables becomes more pronounced. In this group, EF exhibits a substantially higher contribution (mean(|SHAP|) = 0.606), followed by SI (mean(|SHAP|) = 0.364), indicating that both perceived efficacy and social influence are particularly decisive in explaining vaccine reluctance. The importance of FV (mean(|SHAP|) = 0.095) and FC (mean(|SHAP|) = 0.081) also increases slightly compared with the full sample, whereas sociodemographic factors remain marginal.
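The aggregation behind these figures is straightforward and, because SHAP is a local measure, it can be recomputed on any subgroup. The sketch below shows the column-wise mean absolute SHAP for a full sample and for a low-acceptance segment defined by the 33rd percentile of IU; the SHAP matrix and IU scores here are synthetic placeholders, not the study's values.

```python
# Global importance from local SHAP values: mean(|SHAP|) per predictor,
# computed for the full sample and for a low-IU subgroup.
# The SHAP matrix below is synthetic, scaled only to mimic plausible magnitudes.
import numpy as np

rng = np.random.default_rng(1)
names = ["EF", "SI", "FC", "FV", "AGE", "SEX"]
scale = np.array([0.47, 0.27, 0.07, 0.09, 0.02, 0.01])  # illustrative spread
shap_values = rng.normal(size=(600, 6)) * scale          # rows = respondents
iu = rng.normal(size=600)                                # synthetic IU scores

def mean_abs_shap(sv):
    """Column-wise mean absolute SHAP, as a name -> value mapping."""
    return dict(zip(names, np.abs(sv).mean(axis=0).round(3)))

full = mean_abs_shap(shap_values)
low = mean_abs_shap(shap_values[iu <= np.percentile(iu, 33)])
print("full sample:", full)
print("low-acceptance subgroup:", low)
```

The same function applied to different row subsets yields the full-sample and subgroup rankings reported above.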
SHAP can also be used to examine the impact of interactions among variables on IU. Figure 7 presents three plots in which, for the full sample, we analyze the interaction between SI, the variable over which health authorities exert the greatest control through communication policies, and the other three CAN constructs (EF, FC, and FV). The three plots show clear dependence patterns that are consistent with the expected sign of each predictor. In EF × SI, the effect of EF on IU is strongly increasing and nonlinear (with a “jump” around EF ≈ 0), and a reinforcing pattern is also observed: at high levels of EF, observations with high SI (warm colors) tend to display higher SHAP(EF) values, suggesting complementarity between perceived efficacy and social influence. In FC × SI, the contribution of FC is positive but smaller in magnitude: the curve increases smoothly and the SI color gradient is weaker, indicating that FC acts as a secondary driver with limited interaction with SI. In FV × SI, the relationship is clearly negative: as FV increases, SHAP(FV) becomes more negative, and the color pattern suggests that when SI is high the penalty associated with FV may be slightly less pronounced in some ranges, implying that social pressure could partially buffer the effect of fear of the vaccine. Overall, these plots support the idea that EF and SI form the core of the decision-making process, whereas FC provides an incremental effect and FV acts as a relevant deterrent.

3.4.2. Importance–Performance Map Analysis of CAN Constructs

In Step 6, the performance of the constructs included in the CAN model was evaluated. Based on these performance scores and the SHAP-based importance values obtained in Step 7, the IPM was constructed and interpreted in Step 8, as illustrated in Figure 8 and Figure 9. While Figure 8 presents the IPM for the full sample, offering a global overview of the relative importance and performance of the explanatory variables, Figure 9 focuses on the most policy-relevant subgroup, namely individuals exhibiting lower vaccination acceptance.
Figure 8 shows that, with the exception of FC, all constructs are of strategic interest in the design of interventions aimed at improving COVID-19 vaccine acceptance, although the reasons for their relevance differ in each case.
Regarding FV, the data in Table 2 show high scores for its corresponding items, while the IPM in Figure 8 indicates that its performance is low. This suggests that FV has a wide margin for improvement; that is, its performance could be increased substantially with relatively little effort, making it a top strategic priority.
In contrast, EF displays above-average performance. Although improving it may require greater effort, even small improvements in this dimension could have a substantial impact on intention to use, given its high importance within the model.
SI is positioned between FV and EF; it shows below-average performance, but greater importance. This makes it a construct with reasonable potential for improvement in terms of both impact and feasibility.
Fear of COVID-19 lies in the overkill zone. In other words, it is unlikely that the perceived severity of the disease among the population can be further increased, and it is also the construct with the least influence on vaccine acceptance. Therefore, it is not a strategic priority for intervention.
Importantly, the IPM for the low-acceptance subgroup (Figure 9) reinforces and sharpens these conclusions: in this segment, EF gains even greater strategic relevance, as its importance increases while its performance remains comparatively lower than in the full sample. This combination places EF at the core of targeted intervention strategies, suggesting that strengthening confidence in vaccine effectiveness is particularly critical among individuals most reluctant to vaccinate.
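The quadrant logic underlying these interpretations can be sketched in a few lines. The importance values below are the full-sample mean(|SHAP|) figures reported above, while the performance scores and the mean-based cut-offs are illustrative assumptions (the paper's actual performance values appear in Figures 8 and 9).

```python
# Sketch of importance-performance map quadrants: constructs are classified
# by comparing each construct's importance and performance with the sample
# means. Importance values are the reported mean(|SHAP|); performance scores
# on the 0-100 scale are hypothetical placeholders.
importance = {"EF": 0.467, "SI": 0.271, "FV": 0.093, "FC": 0.065}
performance = {"EF": 72.0, "SI": 48.0, "FV": 35.0, "FC": 85.0}  # illustrative

imp_cut = sum(importance.values()) / len(importance)
perf_cut = sum(performance.values()) / len(performance)

def quadrant(c):
    """Classify construct c into one of the four classic IPM quadrants."""
    hi_imp = importance[c] >= imp_cut
    hi_perf = performance[c] >= perf_cut
    if hi_imp and not hi_perf:
        return "priority (improve first)"
    if hi_imp and hi_perf:
        return "keep up the good work"
    if not hi_imp and hi_perf:
        return "possible overkill"
    return "low priority"

for c in importance:
    print(c, "->", quadrant(c))
```

With these placeholder performance scores, FC lands in the overkill quadrant and SI in the priority quadrant, mirroring the qualitative reading of Figure 8; substituting the study's actual performance values reproduces the reported map.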

4. Discussion

4.1. General Considerations

This paper addressed two research objectives (ROs). To address the first research objective (RO1), we propose an integrative approach to visualize and prioritize the determinants of intention to use a vaccine under development by combining the cognitive–affective–normative (CAN) model [25] with several modelling techniques. Specifically, we use decision tree regression (DTR), random forest (RF), and Extreme Gradient Boosting (XGBoost) as a predictive and explainable layer, where the ensemble learners (RF and XGBoost) capture complex predictive patterns and DTR provides a transparent, rule-based visualization that can be interpreted as a simplified surrogate of these more complex models. In parallel, the CAN model is estimated using partial least squares structural equation modelling (PLS-SEM) as a theory-grounded component to summarize average structural relationships among constructs. Together, these methods offer a coherent and practically interpretable view of vaccine acceptance during the vaccine development period.
The second research objective (RO2) was to demonstrate a tool that measures the importance of the explanatory variables of vaccine acceptance by combining Shapley Additive Explanations (SHAP) with strategic analysis tools, such as importance–performance maps (IPMs). This methodological combination enhances the explanatory capacity of ML-based instruments and offers a valuable tool for designing communication campaigns and public policies aimed at increasing acceptance of vaccines for emerging diseases.
We observed consistency across methods in identifying the most important factors shaping the intention to be vaccinated. Both the PLS-SEM model and the ML algorithms (DTR, RF, and XGBoost) identified perceived efficacy (EF) and social influence (SI) as the most influential factors for the dependent variable. This convergence suggests that these constructs are genuine explanatory cores of vaccine-related behavior in uncertain contexts, such as the trial phase of a vaccine of particular public health relevance.

4.2. Practical Implications

The importance of perceived efficacy in vaccine acceptance has been widely documented in the literature [25,27,30], and our findings reinforce this empirical evidence. In this study, perceived efficacy (EF) is not only the most influential predictor in the PLS-SEM estimation (β = 0.502), but it also exhibits the highest global SHAP importance in the Random Forest model when aggregating local explanations as mean absolute SHAP values (mean(|SHAP|) = 0.467 in the full sample). Importantly, EF becomes even more central among individuals with lower vaccination acceptance (IU ≤ 33rd percentile), where its importance further increases (mean(|SHAP|) = 0.606), underscoring its strategic relevance for public health interventions targeting hesitancy. In practice, this pattern suggests that vaccination campaigns should prioritize clear, transparent, and evidence-based communication about vaccine effectiveness—especially for reluctant groups, where strengthening efficacy beliefs is likely to yield the largest marginal gains in acceptance. Finally, because efficacy-related claims at the vaccine development stage are primarily grounded in clinical trial evidence rather than long-term post-market experience, maintaining transparency throughout the development and communication process remains essential for sustaining trust and supporting uptake.
SI also emerged as a major driver of IU. Its positive and significant effect in PLS-SEM (β = 0.360), together with its high global contribution in RF (mean(|SHAP|) = 0.271 in the full sample), indicates that subjective norms and perceived social pressure can meaningfully shape vaccine intentions, particularly in contexts where technical information is still evolving. This influence is even stronger among individuals with lower vaccination acceptance, for whom SI remains a leading determinant (mean(|SHAP|) = 0.364). Consistent with prior evidence [33,34], endorsements from healthcare professionals and trusted peers may act as powerful cues that accelerate vaccination decisions. Accordingly, empowering physicians and credible community leaders as visible ambassadors—supported by clear, consistent messaging—may be especially effective for reaching hesitant groups.
Although FV shows a moderate negative association with IU in PLS-SEM (β = −0.103), SHAP-based importance indicates that it plays a non-trivial role in prediction (mean(|SHAP|) = 0.093 in the full sample) and should not be treated as secondary. In addition, the IPM suggests comparatively low performance, implying substantial scope for improvement. From a strategic standpoint, this combination is critical: reducing exaggerated concerns about side effects—through transparent risk communication, trustworthy testimonials, and accessible empirical evidence—could translate into meaningful gains in vaccination intention, particularly when aimed at the population segments where hesitancy is concentrated.
These reflections regarding FV should be interpreted with caution. The survey was fielded shortly after public news about the temporary suspension of the AstraZeneca trial, which may have heightened FV at that specific time point. As a result, the absolute level of FV—and potentially its relationship with intention—may be partly context-dependent, which could limit the generalizability of fear-related effects to later stages of the vaccination campaign, when safety information and public narratives evolved. Although FV is not among the top predictors relative to EF and SI, its SHAP-based importance is non-negligible and, combined with its comparatively low performance in the IPM, it remains a relevant target for communication strategies aimed at reducing safety concerns. From an importance–performance perspective, a context-driven increase in FV would mainly affect its performance level (i.e., its mean score) rather than its predictive importance, thereby likely moving FV further away from the “priority” zone and into a clearer low-priority area, rather than turning it into a central strategic lever.
In contrast, fear of COVID-19 (FC) played a marginal role in the model. It shows a small positive effect (β = 0.078) and a low mean(|SHAP|) value (0.065), and it is positioned in the overkill zone of the IPM. That is, the perceived threat of the virus was already high in the analyzed sample; therefore, further increasing fear is unlikely to significantly impact vaccine acceptance. This finding calls for caution regarding alarmist messaging as a persuasion strategy and underscores the need for approaches based on vaccine effectiveness and safety.
The analysis of SHAP interaction shown in Figure 7 indicates that SI, which is closely linked with information and education campaigns about vaccination, does not operate in isolation; it acts as a mechanism that shapes core perceptions in the acceptance process. In particular, SI can simultaneously reinforce EF and attenuate FV. When efficacy messages come from socially legitimized sources—such as primary-care physicians, pharmacists, community leaders, or trusted peers—credibility increases and uncertainty decreases, making it easier for individuals to translate clinical-trial evidence into stronger EF perceptions (e.g., clear clinician-led explanations of what “efficacy” means, recognizable spokespersons, or personalized reminders from health centers). Likewise, SI can reduce FV by normalizing vaccination and providing social reassurance about side effects through endorsements and testimonials, accessible real-world narratives, and visible monitoring and reporting protocols communicated by health authorities and clinicians. Overall, SI-oriented policies can trigger two connected pathways—boosting EF while dampening FV—suggesting that the most effective strategies integrate normative messaging with concise information and reassurance.
We did not include several socio-economic and technological variables that may shape vaccine attitudes, such as education, income, or internet access/social media use. Although these factors may be partially reflected in the psychosocial constructs captured by the CAN framework (e.g., through information exposure and normative pressures), future research should explicitly incorporate them as controls to assess whether they alter model estimates and the relative importance patterns derived from SHAP. Nevertheless, prior evidence suggests that, at least in comparable Spanish samples, these socio-demographic and digital-access indicators tend to exhibit limited explanatory power relative to attitudinal and belief-based constructs [26].

4.3. Analytical Implications of the Paper

From a methodological standpoint, our results illustrate the value of combining ML techniques with theory-based frameworks to study complex and multidimensional phenomena such as vaccine acceptance. In our setting, ensemble learners such as RF and XGBoost yielded stronger out-of-sample predictive performance than DTR. At the same time, the joint use of decision tree regression (DTR) and partial least squares structural equation modelling (PLS-SEM) remains valuable: PLS-SEM summarizes theory-grounded structural relationships among constructs, whereas DTR provides an intuitive, rule-based representation that helps communicate the main decision pathways in a transparent manner. Likewise, SHAP measures offer a transparent, model-agnostic summary of the relative predictive contribution of each determinant, providing an interpretable complement to coefficient-based results that is particularly valuable in public decision-making contexts.
Regarding predictive power, Monte Carlo cross-validation supports the full-sample results. RF emerged as the best-performing technique across the main predictive metrics (Q², RMSE), whereas DTR showed the weakest performance. This pattern suggests that, although simple tree-based techniques can be useful for interpretation, their generalization capability may be more limited. In contrast, ensemble-based methodologies such as RF and XGBoost tend to provide greater robustness, albeit with less direct interpretability. Notably, the slightly better—though not statistically significant—within-sample and out-of-sample performance of RF relative to XGBoost indicates that the most “complex” or computationally intensive ensemble is not necessarily the best choice in every setting. With a sample size of 600, which is common in academic survey research, RF offers an attractive balance between explanatory fit, predictive accuracy, and computational efficiency compared with more resource-demanding boosting approaches. Therefore, when selecting ensemble methods for similar applications, it is advisable to evaluate multiple alternatives rather than assuming that the most sophisticated algorithm will consistently dominate.
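The validation scheme just described can be sketched generically. The code below implements Monte Carlo cross-validation (repeated random train/test splits) and reports out-of-sample RMSE together with a Q²-style predictive statistic; a plain least-squares model on synthetic data stands in for the tree ensembles compared in the text, and the split fraction and repeat count are illustrative choices.

```python
# Monte Carlo cross-validation sketch: average out-of-sample RMSE and a
# Q2-style statistic over repeated random train/test splits.
# Ordinary least squares on synthetic data replaces the tree ensembles.
import numpy as np

rng = np.random.default_rng(42)
n, p = 600, 4
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, 0.36, 0.08, -0.10]) + rng.normal(scale=0.5, size=n)

def mccv(X, y, n_repeats=100, test_frac=0.3):
    rmses, q2s = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        n_test = int(n * test_frac)
        test, train = idx[:n_test], idx[n_test:]
        # fit ordinary least squares on the training split only
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ coef
        rmses.append(np.sqrt(np.mean(resid ** 2)))
        # Q2: 1 - SSE / total sum of squares around the training mean
        ss_tot = np.sum((y[test] - y[train].mean()) ** 2)
        q2s.append(1.0 - np.sum(resid ** 2) / ss_tot)
    return float(np.mean(rmses)), float(np.mean(q2s))

rmse, q2 = mccv(X, y)
print(f"MCCV RMSE = {rmse:.3f}, Q2 = {q2:.3f}")
```

Swapping the least-squares fit for any other learner (DTR, RF, XGBoost) and comparing the resulting RMSE and Q² distributions is exactly the kind of head-to-head evaluation recommended above.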
Integrating construct importance measured with SHAP into an IPM enables strategic recommendations to encourage vaccine acceptance in pandemic contexts. In the case analyzed here, while improving perceived efficacy would require substantial effort—given its already above-average performance and the experimental nature of the vaccine—its high importance justifies intervention, since even small improvements in performance may translate into sizeable gains in intention to use. Moreover, because SHAP is inherently a local measure of importance, mean absolute SHAP values and the resulting IPM can be computed for specific subgroups of interest rather than only for the full sample. In our study, we leveraged this property to focus the SHAP-based IPM on individuals with the lowest vaccination acceptance, who constitute the most relevant target for public health communication and outreach.
This study makes a dual contribution to the literature. First, it provides empirical evidence of the determinants of vaccine acceptance in the context of uncertainty and rapid innovation, such as that generated by the COVID-19 pandemic. Second, it proposes a replicable methodological framework that combines theory, explanatory, and predictive ML tools, and strategic interpretation through IPMs. This type of analysis offers health authorities a visual and quantitative tool for prioritizing interventions and efficiently allocating resources. This approach can be applied not only to future vaccination campaigns but also to other contexts requiring an understanding of the public acceptance of new healthcare technologies.

5. Conclusions

This study proposes a structured and replicable analytical approach to examine the determinants of vaccine acceptance under conditions of uncertainty. Specifically, it combines the CAN model with tree-based methods (decision tree regression (DTR), random forest (RF), and XGBoost) and importance–performance maps (IPMs). This integration is intended to complement theory-driven modelling with a predictive and explainable layer that supports the prioritization of intervention targets.
From a methodological perspective, RF and XGBoost were used to estimate vaccination intention and to derive interpretable summaries of predictor relevance. These tree-based ensemble methods are well suited to capturing nonlinear patterns and interaction effects that are difficult to capture with purely linear specifications. By contrast, decision tree regression (DTR) provides transparent, rule-based representations that facilitate communication of the main decision pathways, offering a useful complement to results obtained from conventional regression modelling. In addition, integrating Shapley Additive Explanations (SHAP) with IPMs provides a practical way to jointly consider (i) the relative predictive contribution of each construct and (ii) its current performance level, thereby supporting the identification of potential priorities for communication and outreach strategies. Because SHAP is inherently a local measure of importance, this approach can be applied both to the full sample and to specific subgroups of interest; in the present study, we focus on individuals exhibiting the highest levels of vaccine reluctance.
Overall, the proposed framework can be transferred to the study of other vaccines or emerging health technologies where acceptance is uncertain and public communication is critical. However, the implications should be interpreted with caution. The results are based on a cross-sectional survey collected in Spain at a specific time point (September 2020), when information about the AstraZeneca vaccine was still evolving, which may limit generalizability to later phases of the vaccination campaign or to other contexts. In addition, the analysis focuses on a restricted set of CAN constructs and does not incorporate several socio-economic or technological covariates that may shape vaccine attitudes. Finally, model-based explainability results (including SHAP summaries and IPMs) are conditional on the chosen algorithms and validation design; future work using additional predictors and external datasets would be valuable to further assess robustness and generalizability.

Author Contributions

Conceptualization, M.S.-R. and M.A.-O.; methodology, M.S.-R. and J.d.A.-S.; software, J.d.A.-S.; validation, M.S.-R. and M.A.-O.; formal analysis, J.d.A.-S. and M.A.-O.; investigation, M.S.-R. and M.A.-O.; resources, M.A.-O.; data curation, M.S.-R. and M.A.-O.; writing—original draft preparation, M.S.-R., J.d.A.-S. and M.A.-O.; writing—review and editing, M.S.-R. and J.d.A.-S.; visualization, J.d.A.-S.; supervision, M.S.-R. and M.A.-O.; project administration, M.A.-O.; funding acquisition, M.A.-O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Telefonica and the Telefonica Chair on Smart Cities of the Universitat Rovira i Virgili and Universitat de Barcelona (project number 42.DB.00.18.00).

Institutional Review Board Statement

(1) All participants received detailed written information about the study and procedure. (2) No data directly or indirectly related to the health of the subjects were collected. Therefore, the Declaration of Helsinki was not mentioned when informing the participants. (3) Anonymity of the collected data was ensured at all times. (4) No ethical approval from a board or committee was obtained, as it was not required according to applicable institutional and national guidelines and regulations. (5) Participants were informed that their responses would be used in the study. If consent was not obtained, the questionnaire was not completed. (6) The study was approved by the Ethical Committee of Rovira i Virgili University (CEIPSA-2021-PR-0042).

Informed Consent Statement

All participants provided their informed consent.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine learning
PLS-SEM: Partial least squares–structural equation modelling
DTR: Decision tree regression
RF: Random forest
XGBoost: Extreme Gradient Boosting
SHAP: Shapley Additive Explanations
Mean(|SHAP|): Mean absolute SHAP
CAN: Cognitive–affective–normative model
IU: Intention to use vaccine
EF: Efficacy (of the vaccine)
FC: Fear of COVID-19
FV: Fear of vaccine
SI: Social influence
IPM: Importance–performance map

References

  1. Schuchat, A. Human vaccines and their importance to public health. Procedia Vaccinol. 2011, 5, 120–126. [Google Scholar] [CrossRef]
  2. Shattock, A.J.; Johnson, H.C.; Sim, S.Y.; Carter, A.; Lambach, P.; Hutubessy, R.C.; Thompson, K.M.; Badizadegan, K.; Lambert, B.; Ferrari, M.J.; et al. Contribution of vaccination to improved survival and health: Modelling 50 years of the Expanded Programme on Immunization. Lancet 2024, 403, 2307–2316. [Google Scholar] [CrossRef]
  3. Alsharif, A.M.; Mohammed Alotaibi, A.T.; Albidah, A.S.; Al-Jarah, A.S.H.; Alotaibi, G.N.M.; Almahbub, W.A.; Haij, A.E.M.; Yalmunimi, A.S.M.; Alqahtani, A.S.H.; Kurdy, S.S.; et al. The Impact of Vaccination Programs on Public Health: A Systematic Review. Migr. Lett. 2022, 19, 764–772. Available online: https://migrationletters.com/index.php/ml/article/view/10061 (accessed on 19 January 2026).
  4. Sah, P.; Vilches, T.N.; Pandey, A.; Schneider, E.C.; Moghadas, S.M.; Galvani, A.P. Estimating the impact of vaccination on reducing COVID-19 burden in the United States: December 2020 to March 2022. J. Glob. Health 2022, 12, 03062. [Google Scholar] [CrossRef]
  5. Deb, P.; Furceri, D.; Jimenez, D.; Kothari, S.; Ostry, J.D.; Tawk, N. The effects of COVID-19 vaccines on economic activity. Swiss J. Econ. Stat. 2022, 158, 3. [Google Scholar] [CrossRef]
  6. Bagues, M.; Dimitrova, V. The psychological gains from COVID-19 vaccination. J. Public Econ. 2025, 242, 105304. [Google Scholar] [CrossRef]
  7. Rizzo, C.; Rezza, G.; Ricciardi, W. Strategies in recommending influenza vaccination in europe and us. Hum. Vaccines Immunother. 2018, 14, 693–698. [Google Scholar] [CrossRef]
  8. Reemers, S.S.; Bommel, S.V.; Cao, Q.; Sutton, D.; Zande, S.V.D. Protection against the new equine influenza virus florida clade i outbreak strain provided by a whole inactivated virus vaccine. Vaccines 2020, 8, 784. [Google Scholar] [CrossRef] [PubMed]
  9. McCullough, J.; Robins, M. The Opportunity Cost of COVID for Public Health Practice: COVID-19 Pandemic Response Work and Lost Foundational Areas of Public Health Work. J. Public Health Manag. Pract. 2023, 29, S64–S72. [Google Scholar] [CrossRef] [PubMed]
  10. Phori, P.M.; Fawcett, S.; Nidjergou, N.N.; Silouakadila, C.; Hassaballa, R.; Siku, D.K. Participatory Monitoring and Evaluation of the COVID-19 Response in the Africa Region. Health Promot. Pract. 2023, 24, 432–443. [Google Scholar] [CrossRef] [PubMed]
  11. Chaudhuri, K.; Chakrabarti, A.; Chandan, J.; Bandyopadhyay, S. COVID-19 vaccine hesitancy in the UK: A longitudinal household cross-sectional study. BMC Public Health 2022, 22, 104. [Google Scholar] [CrossRef] [PubMed]
  12. Krammer, F. The role of vaccines in the COVID-19 pandemic: What have we learned? Semin. Immunopathol. 2024, 45, 451–468. [Google Scholar] [CrossRef] [PubMed]
  13. Bar-Lev, S.; Reichman, S.; Barnett-Itzhaki, Z. Prediction of vaccine hesitancy based on social media traffic among Israeli parents using machine learning strategies. Isr. J. Health Policy Res. 2021, 10, 59. [Google Scholar] [CrossRef]
  14. Sarasty, O.; Carpio, C.E.; Hudson, D.; Guerrero-Ochoa, P.A.; Borja, I. The demand for a COVID-19 vaccine in Ecuador. Vaccine 2020, 38, 8090–8098. [Google Scholar] [CrossRef]
  15. Liu, X.; Huang, D.; Yao, J.; Dong, J.; Song, L.; Wang, H.; Yao, C.; Chu, W. From Black Box to Glass Box: A Practical Review of Explainable Artificial Intelligence (XAI). AI 2025, 6, 285. [Google Scholar] [CrossRef]
  16. Krishanthi, G.; Jayetileke, H.; Wu, J.; Liu, C.; Wang, Y.-G. Enhancing feature selection optimization for COVID-19 microarray data. COVID 2023, 3, 1336–1355. [Google Scholar] [CrossRef]
  17. Awad, M.M. Evaluation of COVID-19 reported statistical data using cooperative convolutional neural network model (CCNN). COVID 2022, 2, 674–690. [Google Scholar] [CrossRef]
  18. Perez-Sanchez, A.V.; Valtierra-Rodriguez, M.; De-Santiago-Perez, J.J.; Perez-Ramirez, C.A.; Garcia-Perez, A.; Amezquita-Sanchez, J.P. Artificial Intelligence-Based Epileptic Seizure Prediction Strategies: A Review. AI 2025, 6, 274. [Google Scholar] [CrossRef]
  19. Bodapati, P.; Zhang, E.; Padmanabhan, S.; Das, A.; Bhattacharya, M.; Jahanikia, S.A. A global network analysis of COVID-19 vaccine distribution to predict breakthrough cases among the vaccinated population. COVID 2024, 4, 1546–1560. [Google Scholar] [CrossRef]
  20. Bughin, J.; Cincera, M. How Institutional Actions Before Vaccine Affect Time Vaccination Intention Later: Prediction via Machine Learning. J. Ind. Integr. Manag. 2023, 8, 277–292. [Google Scholar] [CrossRef]
  21. Kim, M.; Kim, Y.J.; Park, S.J.; Kim, K.G.; Oh, P.C.; Kim, Y.S.; Kim, E.Y. Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease. BMC Cardiovasc. Disord. 2021, 21, 129. [Google Scholar] [CrossRef]
  22. Bronstein, M.V.; Kummerfeld, E.; MacDonald, A., III; Vinogradov, S. Identifying psychological predictors of SARS-CoV-2 vaccination: A machine learning study. Vaccine 2024, 42, 126198. [Google Scholar] [CrossRef]
  23. Ajzen, I. The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 1991, 50, 179–211. [Google Scholar] [CrossRef]
  24. Reinares-Lara, E.; Olarte-Pascual, C.; Pelegrín-Borondo, J. Do you want to be a cyborg? The moderating effect of ethics on neural implant acceptance. Comput. Hum. Behav. 2018, 85, 43–53. [Google Scholar] [CrossRef]
  25. Pelegrín-Borondo, J.; Arias-Oliva, M.; Almahameed, A.A.; Román, M.P. COVID-19 Vaccines: A Model of Acceptance Behavior in the Healthcare Sector. Eur. Res. Manag. Bus. Econ. 2021, 27, 100171. [Google Scholar] [CrossRef]
  26. Andrés-Sánchez, J.; Arias-Oliva, M.; Pelegrín-Borondo, J. Assessing the Intention to Use a First-Generation Vaccine against COVID-19 Using Quantile Regression: A Cross-Sectional Study in Spain. COVID 2024, 4, 1211–1226. [Google Scholar] [CrossRef]
  27. Dubé, E.; Gagnon, D.; Ouakki, M.; Bettinger, J.A.; Witteman, H.O.; MacDonald, S.; Fisher, W.; Saini, V.; Greyson, D. Measuring vaccine acceptance among Canadian parents: A survey of the Canadian Immunization Research Network. Vaccine 2018, 36, 545–552. [Google Scholar] [CrossRef] [PubMed]
  28. Ali, Z.; Perera, S.M.; Garbern, S.C.; Abou Diwan, E.; Othman, A.; Germano, E.R.; Ali, J.; Awada, N. Vaccine Hesitancy Toward COVID-19 Vaccines Among Humanitarian Healthcare Workers in Lebanon, 2021. COVID 2024, 4, 2017–2029. [Google Scholar] [CrossRef]
  29. McPhedran, R.; Toombs, B. Efficacy or delivery? An online Discrete Choice Experiment to explore preferences for COVID-19 vaccines in the UK. Econ. Lett. 2021, 200, 109747. [Google Scholar] [CrossRef]
  30. Schwarzinger, M.; Watson, V.; Arwidson, P.; Alla, F.; Luchini, S. COVID-19 vaccine hesitancy in a representative working-age population in France: A survey experiment based on vaccine characteristics. Lancet Public Health 2021, 6, e210–e221. [Google Scholar] [CrossRef]
  31. Wong, M.C.; Wong, E.L.; Huang, J.; Cheung, A.W.; Law, K.; Chong, M.K.; Ng, R.W.; Lai, C.K.; Boon, S.S.; Lau, J.T.; et al. Acceptance of the COVID-19 vaccine based on the health belief model: A population-based survey in Hong Kong. Vaccine 2021, 39, 1148–1156. [Google Scholar] [CrossRef]
  32. Eguia, H.; Vinciarelli, F.; Bosque-Prous, M.; Kristensen, T.; Saigí-Rubió, F. Spain’s Hesitation at the Gates of a COVID-19 Vaccine. Vaccines 2021, 9, 170.
  33. Pilli, L.; Veldwijk, J.; Swait, J.D.; Donkers, B.; de Bekker-Grob, E.W. Sources and processes of social influence on health-related choices: A systematic review based on a social-interdependent choice paradigm. Soc. Sci. Med. 2024, 361, 117360.
  34. Almohaithef, M.A.; Padhi, B.K. Determinants of COVID-19 vaccine acceptance in Saudi Arabia: A web-based national survey. J. Multidiscip. Health 2020, 13, 1657–1663.
  35. Mir, H.H.; Parveen, S.; Mullick, N.H.; Nabi, S. Using structural equation modelling to predict Indian people’s attitudes and intentions towards COVID-19 vaccination. Diabetes Metab. Syndr. Clin. Res. Rev. 2021, 15, 1017–1022.
  36. Karimi, S.M.; Moghadami, M.; Parh, M.Y.A.; Shakib, S.H.; Zarei, H.; Aranha, V.; Poursafargholi, S.; Allen, T.; Little, B.B.; Antimisiaris, D.; et al. COVID-19 Vaccine Uptake Inequality Among Adults: A Multidimensional Demographic Analysis. COVID 2025, 5, 75.
  37. Kerr, J.R.; Schneider, C.R.; Recchia, G.; Dryhurst, S.; Sahlin, U.; Dufouil, C.; Arwidson, P.; Freeman, A.L.; Van Der Linden, S. Correlates of intended COVID-19 vaccine acceptance across time and countries: Results from a series of cross-sectional surveys. BMJ Open 2021, 11, e048025.
  38. Centers for Disease Control and Prevention. Risk for COVID-19 Infection, Hospitalization, and Death by Age Group; Centers for Disease Control and Prevention: Atlanta, GA, USA, 2021. Available online: https://archive.cdc.gov/#/details?url=https://www.cdc.gov/coronavirus/2019-ncov/covid-data/investigations-discovery/hospitalization-death-by-age.html (accessed on 19 January 2026).
  39. British Broadcasting Corporation. Coronavirus: Oxford University Vaccine Trial Paused After Participant Falls Ill; British Broadcasting Corporation: London, UK, 2021. Available online: https://www.bbc.com/news/world-54082192 (accessed on 19 January 2026).
  40. European Medicines Agency. Vaxzevria (Previously COVID-19 Vaccine AstraZeneca); European Medicines Agency: Amsterdam, The Netherlands, 2025. Available online: https://www.ema.europa.eu/en/medicines/human/EPAR/vaxzevria (accessed on 19 January 2026).
  41. Spanish Ministry of Health. España Recibe las Primeras 196.800 Dosis de la Vacuna de AstraZeneca y la Universidad de Oxford Contra la COVID-19 [Spain Receives the First 196,800 Doses of the AstraZeneca/University of Oxford COVID-19 Vaccine]; Spanish Ministry of Health: Madrid, Spain, 2021. Available online: https://www.sanidad.gob.es/gabinete/notasPrensa.do?id=5218 (accessed on 19 January 2026).
  42. DiStefano, C.; Zhu, M.; Mindrilă, D. Understanding and Using Factor Scores: Considerations for the Applied Researcher. Pract. Assess. Res. Eval. 2009, 14, 1–11.
  43. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth: Belmont, CA, USA, 1984.
  44. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  45. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  46. Ringle, C.M.; Sarstedt, M. Gain more insight from your PLS–SEM results: The importance–performance map analysis. Ind. Manag. Data Syst. 2016, 116, 1865–1886.
  47. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
  48. Lundberg, S.M.; Erion, G.G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
  49. Abalo, J.; Varela, J.; Manzano, V. Importance values for Importance–Performance Analysis: A formula for spreading out values derived from preference rankings. J. Bus. Res. 2007, 60, 115–121.
  50. Hair, J.F.; Risher, J.J.; Sarstedt, M.; Ringle, C.M. When to use and how to report the results of PLS-SEM. Eur. Bus. Rev. 2019, 31, 2–24.
Figure 1. Conceptual model used in this paper.
Figure 2. Interpretation of the importance–performance map used in this study.
Figure 3. Flowchart of the methodology used in this paper.
Figure 4. Results of decision tree regression adjustment.
Figure 5. Beeswarm plot of SHAP values for the overall sample.
Figure 6. Global and low-acceptance importance of explanatory variables based on mean(|SHAP|).
Figure 7. SHAP interaction dependence plots of SI with the other CAN constructs (EF, FC, and FV).
Figure 8. Importance–Performance Map (whole sample).
Figure 9. Importance–Performance Map (low intention to use vaccines).
Table 1. Sample profile.

| Variable | Category | Proportion |
|---|---|---|
| Population | Residents in Spain | — |
| Gender | Male | 45% |
| | Female | 55% |
| Age group | 17–30 years | 33% |
| | 31–50 years | 33% |
| | 51 years and older | 34% |
| Monthly income | Less than €1000 | 6.3% |
| | €1000–€1749 | 22.3% |
| | €1750–€2499 | 20.0% |
| | €2500–€3000 | 13.0% |
| | More than €3000 | 24.8% |
| | Not available (NA) | 13.5% |
Table 2. Items used in this paper and their descriptive statistics.

| Item | Mean | SD | Factor loading | CA | AVE |
|---|---|---|---|---|---|
| Intention to use (IU) | | | | 0.95 | 0.95 |
| IU1. I will try to get the AstraZeneca vaccine. | 5.07 | 3.48 | 0.98 | | |
| IU2. I predict that I will use the AstraZeneca vaccine. | 4.96 | 3.40 | 0.97 | | |
| Perceived efficacy (EF) | | | | 0.93 | 0.84 |
| EF1. I believe in the effectiveness of the AstraZeneca vaccine. | 4.93 | 2.81 | 0.93 | | |
| EF2. The AstraZeneca vaccine will help protect me from contracting COVID-19. | 5.31 | 2.80 | 0.95 | | |
| EF3. Getting the AstraZeneca vaccine will lower my risk of being infected with COVID-19. | 5.95 | 2.97 | 0.94 | | |
| EF4. The AstraZeneca vaccine will reduce or eliminate the need for additional COVID-19 treatments. | 4.89 | 2.94 | 0.84 | | |
| Fear of COVID (FC) | | | | 0.73 | 0.79 |
| FC1. I fear becoming infected with COVID-19. | 6.60 | 2.72 | 0.90 | | |
| FC2. I worry about passing COVID-19 on to others. | 7.86 | 2.74 | 0.88 | | |
| Fear of vaccine (FV) | | | | 0.92 | 0.93 |
| FV1. I’m concerned about the short-term side effects of the AstraZeneca vaccine. | 6.75 | 3.02 | 0.97 | | |
| FV2. I’m worried about the long-term consequences of the AstraZeneca vaccine. | 7.23 | 3.04 | 0.96 | | |
| Social influence (SI) | | | | 0.97 | 0.95 |
| SI1. People who matter to me believe I should get the AstraZeneca vaccine. | 4.85 | 2.95 | 0.96 | | |
| SI2. People who influence me think I should get vaccinated with AstraZeneca. | 4.64 | 2.95 | 0.98 | | |
| SI3. People whose opinions I respect think I should receive the AstraZeneca vaccine. | 4.68 | 3.03 | 0.98 | | |

Note: SD = standard deviation; CA = Cronbach’s alpha; AVE = average variance extracted.
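The CA and AVE columns of Table 2 follow the standard reliability formulas. A minimal numpy sketch (the respondent-level scores are not reproduced here, so the alpha check uses toy data; the AVE check uses the SI loadings reported in the table):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) matrix of raw scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the sum score
    return float((k / (k - 1)) * (1 - item_var_sum / total_var))

def ave(loadings) -> float:
    """Average variance extracted: mean squared standardized loading."""
    return float(np.mean(np.asarray(loadings, float) ** 2))

# Cross-check against Table 2: SI loadings (0.96, 0.98, 0.98) imply AVE ~ 0.95
ave_si = ave([0.96, 0.98, 0.98])
```

Running the same computations on the survey responses would reproduce the full CA and AVE columns.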
Table 3. Matrix for evaluating discriminant validity.

| | IU | EF | FC | FV | SI | Sex | Age |
|---|---|---|---|---|---|---|---|
| IU | 0.975 | | | | | | |
| EF | 0.818 | 0.914 | | | | | |
| FC | 0.380 | 0.405 | 0.889 | | | | |
| FV | −0.326 | −0.267 | 0.154 | 0.962 | | | |
| SI | 0.755 | 0.688 | 0.315 | −0.288 | 0.973 | | |
| Sex | −0.058 | −0.055 | 0.160 | 0.170 | −0.091 | 1.000 | |
| Age | 0.033 | 0.019 | −0.045 | −0.009 | 0.043 | −0.152 | 1.000 |

Note: The main diagonal is the square root of the average variance extracted (AVE). Below the diagonal are Pearson correlations.
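Table 3 applies the Fornell–Larcker criterion: discriminant validity holds when each construct's square root of AVE (the diagonal) exceeds its absolute correlations with every other construct. A sketch of that check on the table's CAN constructs:

```python
import numpy as np

# Diagonal (sqrt of AVE) and below-diagonal correlations from Table 3
labels = ["IU", "EF", "FC", "FV", "SI"]
M = np.array([
    [ 0.975,  0.0,    0.0,   0.0,    0.0  ],
    [ 0.818,  0.914,  0.0,   0.0,    0.0  ],
    [ 0.380,  0.405,  0.889, 0.0,    0.0  ],
    [-0.326, -0.267,  0.154, 0.962,  0.0  ],
    [ 0.755,  0.688,  0.315, -0.288, 0.973],
])

def fornell_larcker_ok(m: np.ndarray) -> bool:
    """True if each diagonal entry (sqrt AVE) exceeds every absolute
    off-diagonal correlation in its row/column."""
    full = m + np.tril(m, -1).T                 # symmetrize the lower triangle
    diag = np.diag(full)
    off = np.abs(full - np.diag(diag))          # zero the diagonal, take |r|
    return bool(np.all(diag > off.max(axis=1)))

ok = fornell_larcker_ok(M)
```

For the values in Table 3 the check passes for every construct, confirming discriminant validity.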
Table 4. Cross-loadings of the CAN measurement items (IU, EF, FC, FV, and SI).

| Item | IU | EF | FC | FV | SI |
|---|---|---|---|---|---|
| IU1 | 0.976 | 0.807 | 0.369 | −0.346 | 0.743 |
| IU2 | 0.974 | 0.788 | 0.372 | −0.289 | 0.729 |
| EF1 | 0.798 | 0.925 | 0.397 | −0.300 | 0.662 |
| EF2 | 0.792 | 0.952 | 0.385 | −0.238 | 0.663 |
| EF3 | 0.757 | 0.935 | 0.387 | −0.249 | 0.629 |
| EF4 | 0.627 | 0.840 | 0.302 | −0.179 | 0.552 |
| FC1 | 0.354 | 0.357 | 0.901 | 0.178 | 0.266 |
| FC2 | 0.320 | 0.363 | 0.876 | 0.091 | 0.296 |
| FV1 | −0.333 | −0.288 | 0.138 | 0.967 | −0.288 |
| FV2 | −0.292 | −0.222 | 0.159 | 0.957 | −0.263 |
| SI1 | 0.717 | 0.652 | 0.301 | −0.280 | 0.963 |
| SI2 | 0.731 | 0.666 | 0.301 | −0.296 | 0.979 |
| SI3 | 0.754 | 0.689 | 0.316 | −0.263 | 0.975 |
Table 5. Fine-tuning results (5-fold cross-validation) and optimal hyperparameter configurations for the machine-learning models (training sample, N = 480).

| Machine-learning method (best hyperparameter configuration) | R2 | RMSE |
|---|---|---|
| DTR (maxdepth = 4; minsplit = 10; cp = 0.0005) | 0.6662 ± 0.0626 | 0.5688 ± 0.0524 |
| RF (mtry = 3; ntree = 500; nodesize = 7) | 0.7450 ± 0.0715 | 0.4949 ± 0.0641 |
| XGBoost (nrounds = 600; max_depth = 6; eta = 0.1; gamma = 0; colsample_bytree = 1.0; min_child_weight = 1; subsample = 0.7) | 0.7024 ± 0.0775 | 0.5392 ± 0.0701 |

Note: R2 and RMSE are reported as mean ± standard deviation.
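The tuning procedure behind Table 5 is a grid search scored by 5-fold cross-validation. A generic numpy sketch of that loop; a closed-form ridge regression stands in for the tree learners, and the data are synthetic (four predictors mimicking the EF, FC, FV, and SI scores), so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(480, 4))                    # stand-in for EF, FC, FV, SI scores
y = X @ np.array([0.5, 0.08, -0.10, 0.36]) + rng.normal(scale=0.5, size=480)

def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Shuffle row indices and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def ridge_fit_predict(Xtr, ytr, Xte, lam):
    """Closed-form ridge: stand-in for fitting a tuned tree/ensemble model."""
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    return Xte @ w

def cv_rmse(lam: float, k: int = 5):
    """Mean and SD of out-of-fold RMSE, as reported in Table 5."""
    folds = kfold_indices(len(y), k)
    scores = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate(folds[:i] + folds[i + 1:])
        pred = ridge_fit_predict(X[tr], y[tr], X[te], lam)
        scores.append(np.sqrt(np.mean((y[te] - pred) ** 2)))
    return float(np.mean(scores)), float(np.std(scores, ddof=1))

grid = [0.01, 0.1, 1.0, 10.0]                    # hypothetical hyperparameter grid
best_lam = min(grid, key=lambda lam: cv_rmse(lam)[0])
```

In the study the same protocol is applied over the hyperparameter grids of DTR, RF, and XGBoost (e.g., via scikit-learn's `GridSearchCV` or an equivalent loop), selecting the configuration with the best mean cross-validated score.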
Table 6. Predictive validation results on the external test set (n = 120; B = 5000).

| Machine-learning method (best hyperparameter configuration) | Q2 | RMSE |
|---|---|---|
| DTR (maxdepth = 4; minsplit = 10; cp = 0.0005) | 0.5278 ± 0.0931 | 0.7104 ± 0.0679 |
| RF (mtry = 3; ntree = 500; nodesize = 7) | 0.7164 ± 0.0519 | 0.5503 ± 0.0466 |
| XGBoost (nrounds = 600; max_depth = 6; eta = 0.1; gamma = 0; colsample_bytree = 1.0; min_child_weight = 1; subsample = 0.7) | 0.6980 ± 0.0637 | 0.5667 ± 0.0559 |

Note: Q2 and RMSE are reported as mean ± standard deviation.
Table 7. Pairwise differences with 95% bootstrap confidence intervals on the external test set (n = 120; B = 5000).

| Comparison (A − B) | ΔQ2 (mean) | 95% CI (ΔQ2) | ΔRMSE (mean) | 95% CI (ΔRMSE) |
|---|---|---|---|---|
| RF − DTR | 0.1881 | [0.0772, 0.3283] | −0.1589 | [−0.2570, −0.0723] |
| XGBoost − DTR | 0.1692 | [0.0497, 0.3113] | −0.1416 | [−0.2457, −0.0441] |
| RF − XGBoost | 0.0188 | [−0.0223, 0.0637] | −0.0173 | [−0.0558, 0.0233] |
Table 8. Results of PLS-SEM fit.

| Variable | β | SD | t-ratio | p-value |
|---|---|---|---|---|
| EF | 0.502 | 0.030 | 16.540 | <0.001 |
| FC | 0.078 | 0.024 | 3.188 | 0.002 |
| FV | −0.103 | 0.023 | −4.456 | <0.001 |
| SI | 0.360 | 0.029 | 12.360 | <0.001 |
| SEX | 0.028 | 0.044 | 0.633 | 0.527 |
| AGE | 0.015 | 0.044 | 0.348 | 0.728 |

Note: β = path coefficient; SD = standard deviation.
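With standardized factor scores and a single endogenous construct, the structural part of a PLS-SEM model reduces approximately to a multiple regression on standardized variables. A sketch of that approximation to Table 8's path coefficients; the data are synthetic, with true coefficients chosen to echo the table's ordering (EF > SI > FC > 0 > FV), so this illustrates the mechanics rather than reproducing the study's estimates:

```python
import numpy as np

def standardized_paths(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS on z-scored predictors and outcome: an approximation to the
    standardized structural coefficients reported in Table 8."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

# Synthetic factor scores for EF, FC, FV, SI and a noisy intention outcome
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 4))
y = X @ np.array([0.50, 0.08, -0.10, 0.36]) + rng.normal(scale=0.5, size=600)
beta = standardized_paths(X, y)   # ordered as EF, FC, FV, SI
```

In the actual study the coefficients and their bootstrap t-ratios are produced by the PLS-SEM estimator; this sketch only conveys why EF and SI dominate the structural model while FV carries a small negative effect.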

Share and Cite

de Andrés-Sánchez, J.; Souto-Romero, M.; Arias-Oliva, M. Integrating Machine-Learning Methods with Importance–Performance Maps to Evaluate Drivers for the Acceptance of New Vaccines: Application to AstraZeneca COVID-19 Vaccine. AI 2026, 7, 34. https://doi.org/10.3390/ai7010034
