Diagnostics
  • Systematic Review
  • Open Access

24 December 2025

Radiomic-Based Machine Learning Classifiers for HPV Status Prediction in Oropharyngeal Cancer: A Systematic Review and Meta-Analysis †

1 Head and Neck Surgery Department and LIM 28, University of São Paulo Medical School, São Paulo 05508-020, Brazil
2 Hospital Israelita Albert Einstein, São Paulo 05652-900, Brazil
3 Department of Head and Neck Surgery and Otorhinolaryngology, A.C. Camargo Cancer Center, São Paulo 01509-010, Brazil
4 Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba 13414-903, Brazil
This article belongs to the Special Issue Clinical Diagnosis of Otorhinolaryngology

Abstract

Background: The aim of the present systematic review (SR) is to compile evidence regarding the use of radiomic-based machine learning (ML) models for predicting human papillomavirus (HPV) status in oropharyngeal squamous cell carcinoma (OPSCC) patients and to assess their reliability, methodological frameworks, and clinical applicability. The SR was conducted following PRISMA 2020 guidelines and registered in PROSPERO (CRD42025640065). Methods: Using the PICOS framework, the review question was defined as follows: “Can radiomic-based ML models accurately predict HPV status in OPSCC?” Electronic databases (Cochrane, Embase, IEEE Xplore, BVS, PubMed, Scopus, Web of Science) and gray literature (arXiv, Google Scholar and ProQuest) were searched. Retrospective cohort studies assessing radiomics for HPV prediction were included. Risk of bias (RoB) was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), and data were synthesized based on imaging modality, architecture type/learning modalities, and the presence of external validation. Meta-analysis was performed for externally validated models using MetaBayesDTA and RStudio. Results: Twenty-four studies including 8627 patients were analyzed. Imaging modalities included computed tomography (CT), magnetic resonance imaging (MRI), contrast-enhanced computed tomography (CE-CT), and 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET). Logistic regression, random forest, eXtreme Gradient Boosting (XGBoost), and convolutional neural networks (CNNs) were commonly used. Most datasets were imbalanced with a predominance of HPV+ cases. Only eight studies reported external validation results. AUROC values ranged between 0.59 and 0.87 in the internal validation and between 0.48 and 0.91 in the external validation results. RoB was high in most studies, mainly due to reliance on p16-only HPV testing, insufficient events, or inadequate handling of class imbalance. Deep Learning (DL) models achieved moderate performance with considerable heterogeneity (sensitivity: 0.61; specificity: 0.65). In contrast, traditional models provided higher, more consistent performance (sensitivity: 0.72; specificity: 0.77). Conclusions: Radiomic-based ML models show potential for HPV status prediction in OPSCC, but methodological heterogeneity and a high RoB limit current clinical applicability.

1. Introduction

The association between high-risk human papillomavirus (HPV) infection and oropharyngeal squamous cell carcinomas (OPSCC) has been consistently demonstrated. Recognition of HPV-positive OPSCC as a distinct pathological entity has led to its classification with unique staging rules in the UICC/AJCC 8th edition Tumor–Node–Metastasis (TNM) system, reflecting its significantly more favorable prognosis compared to HPV-negative carcinomas [1,2]. Affected patients are usually younger, healthier, non-smoking individuals, with mortality reported to be significantly lower [3]. Given the superior survival outcomes, significant clinical research efforts are now focused on de-intensifying therapy (e.g., reduced radiation dose, transoral surgery alone) for HPV-positive OPSCC patients. The primary goal of these de-escalation trials is to maintain high cure rates while minimizing treatment-related morbidity. Consequently, accurate and accessible determination of HPV status is not merely a diagnostic exercise but a critical step towards delivering personalized care [4,5,6].
It is well established that HPV status should be confirmed using at least two complementary methods (e.g., p16 immunohistochemistry combined with HPV-specific testing), as relying solely on HPV DNA genotyping or p16 expression may lead to misclassification and inflated prevalence estimates. However, these methods may not be readily available in all clinical settings, such as smaller community hospitals, resource-limited laboratories, or centers without specialized molecular pathology facilities [7]. In these cases, p16 immunohistochemistry may be sufficient as a standalone test for HPV infection status in tumors with appropriate morphology [8]. In contrast, medical imaging modalities such as sonography, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography CT (PET/CT) are routinely used for staging and treatment planning. This accessibility has driven the growth of radiomics, which involves the high-throughput extraction of quantitative features from medical images to characterize tumor shape, texture, and intensity patterns. Machine learning (ML) algorithms can then analyze these data to uncover complex relationships and build predictive models for diagnosis and prognosis. Radiomics aims to link the underlying tumor biology with its radiographic phenotype. Integrating radiomic features with modern ML techniques, including advanced artificial intelligence (AI) models such as convolutional neural networks (CNNs), has the potential to strengthen the role of imaging biomarkers in translational research, improving patient stratification and risk assessment [9,10].
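To make the radiomics workflow described above concrete, the snippet below gives a minimal feature-extraction sketch, assuming the open-source pyradiomics package; the NIfTI file names and default settings are illustrative placeholders, not the pipeline of any study included in this review.

```python
# A minimal radiomic feature-extraction sketch, assuming the open-source
# pyradiomics package; the file names are hypothetical placeholders.
from radiomics import featureextractor

# Default settings enable shape, first-order (intensity), and texture
# feature classes, mirroring the feature families discussed in this review.
extractor = featureextractor.RadiomicsFeatureExtractor()

# Execute extraction on an image volume and its segmented VOI mask.
features = extractor.execute("ct_volume.nii.gz", "gtv_mask.nii.gz")

# Drop pyradiomics' diagnostic metadata, keeping only the numeric features
# that would feed a downstream ML classifier.
radiomic_vector = {k: v for k, v in features.items()
                   if not k.startswith("diagnostics_")}
print(f"{len(radiomic_vector)} radiomic features extracted")
```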
However, the path to clinical application faces challenges. There is no consensus regarding the impact of equipment and image acquisition techniques on the stability of radiomic features [11]. Furthermore, while some prior studies have shown limitations in segmentation strategies and a lack of robust results [12,13,14,15], more recent studies have demonstrated great performance metrics, with area under the receiver operating characteristic curve (AUROCs) reaching over 90% in some cases [11,16,17]. A reliable radiomic biomarker could therefore have significant clinical utility, enabling retrospective HPV analyses when tissue samples are unavailable, assisting in cases of cancers of unknown primary (CUP), or serving as a screening tool for at-risk individuals in regions without routine HPV testing [18]. More importantly, by providing a potentially standardized and accessible means of stratification, radiomics could support the broader clinical adoption of personalized treatment protocols (i.e., de-intensification, standard or intensified therapy) [4,5,6]. Additionally, HPV testing on cytology specimens from cervical lymph node metastasis, when combined with radiomic analysis, may enhance the accuracy of HPV status determination and support more precise patient stratification [19].
The present systematic review (SR) aims to retrieve and summarize good-quality evidence regarding the use of radiomic signatures extracted from medical images for HPV status prediction in OPSCC patients. Specifically, this SR and meta-analysis evaluates the diagnostic accuracy and methodological robustness of radiomic-based ML models across different imaging modalities, with the goal of assessing their current feasibility and reliability for clinical deployment.

2. Methods

2.1. Protocol and Registration

The present SR was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [20,21] and the PRISMA-P checklist [22,23]. Given that this review focuses on diagnostic test accuracy, it also adhered to the PRISMA-DTA statement [24] to ensure transparent and standardized reporting of diagnostic accuracy meta-analyses. The protocol was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) under the registration number CRD42025640065. The review protocol defined inclusion and exclusion criteria, data extraction forms, and pre-specified subgroup analyses to ensure methodological transparency.

2.2. Eligibility Criteria

The PICOS framework yielded the following focused review question: “Can radiomic-based ML models accurately predict HPV status in OPSCC?” We defined the Population, Intervention, Comparator, Outcome, and Study design (PICOS) a priori, and no deviations from the original protocol occurred during the review process. Population was patients diagnosed with OPSCC; Interventions were radiomic-based ML classifiers; Comparators were not applicable in this systematic review; Outcomes were the performance of radiomic-based models for HPV status prediction; and the included Studies were retrospective cohort studies. Studies combining the HPV status with other relevant descriptors to predict tumor volume, cancer progression, risk stratification, locoregional recurrence, survival prediction, treatment individualization or response were not included.

2.3. Information Sources and Search Strategy

Individualized search strategies were conducted on 7 November 2024 for the following electronic databases: Cochrane Library, Embase, IEEE Xplore, Portal Regional da Biblioteca Virtual em Saúde (BVS), PubMed, Scopus, and Web of Science. Gray literature was included to minimize publication bias and to identify potentially unpublished protocols. The gray literature search encompassed the following sources: arXiv, Google Scholar, and ProQuest. The reference list of included articles and other SRs were manually screened to identify any records that were not retrieved through database search. The complete search strategy is shown in Table A1 and Table A2 in Appendix A.

2.4. Selection Process

Initially, duplicate records were automatically removed using EndNote [25] and Rayyan [26]. An additional manual deduplication step was necessary for the records retrieved from IEEE Xplore, arXiv, and the Cochrane Library, which could not be exported in RIS format. Once duplicates were removed, two reviewers (A.L.D.A. and C.S.S.) independently conducted the first phase of study selection by reading titles and abstracts. The eligibility criteria were applied only to those articles that remained after the first phase (sought for retrieval). Divergences were reviewed by a third reviewer (L.P.K.).

2.5. Data Collection Process and Data Items

Two reviewers (A.L.D.A. and L.P.K.) independently conducted the screening of articles by reading the titles and abstracts and excluding those that clearly did not fulfill the eligibility criteria. In the second phase, the two reviewers read the full texts of the papers to identify the eligible articles. Discrepancies were resolved by consensus.
Rayyan QCRI [26] was used as the reference manager to perform the screening of articles, exclude duplicates, and record the primary reasons for exclusion. Data extraction was conducted by the primary researcher (A.L.D.A.) and guided by a tailored data extraction form originally suggested by The Cochrane Collaboration. Qualitative and quantitative data were tabulated and processed in Microsoft Excel®. The main data extracted included the following: author, year, source of volume of interest (VOI), total images/patients per subset and per HPV status, HPV status test, strategies to mitigate data imbalance, imaging modality, radiomic features, feature selection method, classifier, AUROC with confidence interval, standard deviation, and standard error, true positive (TP), false negative (FN), true negative (TN), and false positive (FP) values, explainability methods, study limitations, and conclusions.

2.6. Risk of Bias (RoB) Assessment

Each study was assessed independently by two authors (A.L.D.A. and L.P.K.) through the Prediction model Risk of Bias Assessment Tool (PROBAST) for assessing the RoB and applicability of diagnostic and prognostic prediction model studies [27,28].

2.7. Effect Measures

The effect measures included the area under the receiver operating characteristic curve (AUROC) and the corresponding confidence intervals, which are not inherently affected by class imbalance and allow for the assessment of class separability and the generalization ability of the models. For meta-analyses, TP, FN, TN, and FP values are required.

2.8. Synthesis Methods

The synthesis of data was performed based on imaging modality, model architecture/learning type, and the presence of external validation. The meta-analysis was conducted by selecting the best-performing models reported in studies conducting external validation. Since the studies are diverse in methods, we grouped the models into three main groups: (1) Deep Learning (DL) models, (2) Traditional models with their respective feature selection methods, and (3) Ensemble and Gradient Boosting methods with associated feature selection methods. The meta-analysis was conducted using MetaBayesDTA (v1.5.2) [29], an extension of the MetaDTA web application [30,31,32], a Shiny application developed by the group at the Centre for Reviews and Dissemination (CRD), University of York, designed to perform diagnostic accuracy meta-analysis (particularly involving sensitivity and specificity).
Between-study heterogeneity was planned to be evaluated both visually and statistically. Initially, heterogeneity was to be explored through the inspection of summary receiver operating characteristic (sROC) plots and confidence regions generated by the MetaBayesDTA package, allowing for qualitative assessment of variability across studies. In addition, quantitative measures of heterogeneity, including Cochran’s Q statistic, τ2 (tau-squared), and I2, were calculated using RStudio (version 2025.09.2-418). These metrics were used to estimate the magnitude and proportion of total variability attributable to differences between studies rather than to sampling error.
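For illustration, the heterogeneity statistics named above can be computed as in the following minimal sketch, which uses inverse-variance weights on hypothetical log diagnostic odds ratios; this is not the actual RStudio code used in this review.

```python
# A minimal sketch of Cochran's Q, tau-squared (DerSimonian-Laird), and I^2,
# computed on hypothetical per-study effect estimates.
import numpy as np

y = np.array([1.2, 0.8, 2.1, 1.5, 0.4, 1.9])        # log diagnostic odds ratios (hypothetical)
v = np.array([0.30, 0.25, 0.40, 0.35, 0.20, 0.45])  # within-study variances (hypothetical)

w = 1.0 / v                              # inverse-variance (fixed-effect) weights
y_bar = np.sum(w * y) / np.sum(w)        # pooled fixed-effect estimate
Q = np.sum(w * (y - y_bar) ** 2)         # Cochran's Q statistic
df = len(y) - 1

# DerSimonian-Laird estimate of between-study variance (tau^2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# I^2: proportion of total variability due to between-study heterogeneity
i2 = max(0.0, (Q - df) / Q) * 100.0

print(f"Q = {Q:.2f} (df = {df}), tau^2 = {tau2:.2f}, I^2 = {i2:.1f}%")
```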

3. Results

3.1. Study Selection

Amongst a total of 777 records identified through the search strategy, twenty-four articles [11,16,17,18,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52] fulfilled the eligibility criteria and were included in this SR. The study selection process is summarized in the PRISMA Flowchart (Figure 1) and the reasons for exclusion of each article read in full text in the second phase are described in Table A3 in Appendix B.
Figure 1. PRISMA Flowchart. This diagram illustrates the identification, screening, eligibility assessment, and inclusion of studies evaluating radiomics and machine learning approaches for HPV status prediction. A total of 24 studies were included in the final qualitative synthesis. BVS: Biblioteca Virtual em Saúde; IEEE: Institute of Electrical and Electronics Engineers.

3.2. Study Characteristics

A total of 8627 patients with OPSCC were included for the development and validation of HPV status detection models, with some risk of patient duplication since studies from the same research groups were included in this SR. Concerning dataset size, four studies included fewer than 100 patients [18,48,51,52]; twelve included between 100 and 300 patients [11,16,33,34,36,37,38,42,43,46,47,50]; four included between 300 and 500 patients [17,39,40,41]; two included between 500 and 1000 patients [44,45]; and two included more than 1000 patients [34,49].
Confirmatory HPV-specific testing in addition to p16 immunohistochemistry, using at least two combined methods as recommended by American Society of Clinical Oncology (ASCO) guidelines [7], was conducted in six studies. The specific HPV tests included polymerase chain reaction (PCR) [36,52], DNA in situ hybridization (DNA-ISH) [44], and VirusSeq [42], with some studies applying more than one specific method [40,41]. Seventeen studies relied solely on p16 immunohistochemistry (IHC p16) as a surrogate marker, and two studies [33,48] did not report the HPV testing method used.
Methodologically, studies were vastly diverse, with the main variations residing in the source of the VOI and the imaging modalities. The imaging modalities were predominantly computed tomography (CT) [17,18,34,35,39,40,42,44,48,49], followed by MRI [16,36,37,38,43,47,50,51,52], contrast-enhanced computed tomography (CE-CT) [11,33,45], and 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) [41,43,46]. Two studies explored multiple imaging modalities [40,43]. Only three studies made clear that images were obtained prior to treatment [11,51,52]. The majority of studies were developed based on the gross primary tumor volume (GTVp) [33,34,35,36,37,38,39,41,42,43,44,45,47,49,50,51], five were based on GTVp and gross nodal tumor volume (GTVn) [11,16,17,46,52], one applied GTVp, GTVn, and the parotid [18], one used a cubic region of interest (ROI) of the oropharynx rather than a precise anatomical delineation [48], and one compared GTVp, GTVn, GTVp + GTVn, and a consensus of all lymph nodes [40]. When comparing approaches according to the VOI, the best results were achieved when associating primary tumor and nodal VOIs [48]. Three studies [36,37,42] also developed clinical and multimodal models (i.e., radiomic + clinical data) for comparison.
Most studies’ samples were highly imbalanced [11,17,33,36,39,40,41,43,44,45,46,47,49,50,51,52], with a predominance of HPV+ patients [11,16,17,33,37,39,40,43,44,45,46,47,48,49,50,51,52] and only four with more HPV- patients [18,34,36,38]. Four studies were slightly imbalanced [16,34,37,48], one was moderately imbalanced [18], and only one was balanced [38]. Three studies did not report the proportion of HPV+ patients in the cohort used [35,41,42]. Some studies applied strategies to mitigate data imbalance, such as the synthetic minority over-sampling technique (SMOTE) [43,47,51], t-distributed Stochastic Neighbor Embedding (t-SNE) [49], Random Over-Sampling Examples (ROSE) [33], oversampling [50], or a class-weighted approach [40]; a minimal SMOTE sketch is given after this paragraph.
Only eight studies [11,16,39,40,41,42,44,50] reported external validation results. Regarding performance, AUROC values ranged between 0.59 and 0.87 in internal validation and between 0.48 and 0.91 in external validation. Feature selection methods varied considerably, with recursive feature elimination (RFE) [16,36,37,38,47], linear regression with the least absolute shrinkage and selection operator (LASSO) [36,45,51,52], and hierarchical clustering [35,40] being the most common. With respect to HPV status prediction classifiers, logistic regression (LR) was the most commonly used model [34,35,36,37,38,42,45,46,47,48,51,52], followed by random forest (RF) [18,36,48,49,50,52] and eXtreme Gradient Boosting (XGBoost) [18,40,41,52]. Two studies explored the multilayer perceptron (a feed-forward neural network architecture that learns nonlinear associations between high-dimensional image-derived features and clinical outcomes such as HPV status) and fully connected networks [18,50]. Only four studies employed convolutional neural network approaches, including Inception V3 [39], ElNet [40], 3D and 2D CNN architectures [44], and ResNet-18 [48]. Explainability methods included saliency maps [42], Shapley Additive exPlanations (SHAP) plots [36,43], and Gradient-weighted Class Activation Mapping (grad-CAM) [39]. In addition to the main goal of HPV status prediction, one study assessed the impact of artifacts on model performance [33] and one evaluated the influence of the CT scanner manufacturer on the robustness of radiomic features [49].
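The following minimal sketch illustrates one of the imbalance-mitigation strategies listed above (SMOTE), assuming the imbalanced-learn package; the feature matrix and HPV labels are synthetic stand-ins for a radiomic dataset.

```python
# A minimal SMOTE sketch on a synthetic, HPV+-dominant dataset.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))         # 120 patients x 20 radiomic features (synthetic)
y = np.array([1] * 90 + [0] * 30)      # imbalanced toward HPV+ (label 1), synthetic

# Oversample the minority class; in a real pipeline this must be applied to
# the training folds only, to avoid leaking synthetic samples into the test set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("class counts before:", np.bincount(y), "after:", np.bincount(y_res))

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
```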
The studies’ characteristics are summarized in Table 1.
Table 1. General characteristics of studies evaluating radiomics-based ML models for HPV status prediction in OPSCC cancer.

3.3. RoB in Studies

Retrospective cohorts are the most common study design used for prediction model development. Prior to model training, exclusions may occur but are typically well justified (e.g., poor-quality MRI, missing HPV status). Therefore, in the participant domain, all studies were rated as having a low RoB and low concern regarding applicability.
For the predictor domain, an unclear RoB was noted when it was not explicitly stated whether the predictor assessors (particularly those extracting radiomic features) were blinded to HPV status. A high risk was assigned when predictors used in model development were not available prior to treatment, compromising the model’s intended applicability (i.e., pre-treatment use). However, all included studies developed their models using pre-treatment variables (clinical assessment and imaging).
For the outcome domain, an unclear or high RoB was assigned in the absence of information regarding whether pathologists were blinded to predictor data, and when HPV status was determined solely based on p16 immunohistochemistry. The exclusive use of p16 as a surrogate marker is not considered adequate according to current clinical guidelines (e.g., ASCO), which recommend dual testing for accurate HPV status determination [7].
For the analysis domain, an unclear or high RoB was assigned when one or more of the following conditions were present: an insufficient number of events (HPV-positive cases); predictor selection based solely on univariable analyses (e.g., univariable Cox or logistic regression, t-tests, Kolmogorov–Smirnov tests, or biserial correlation); inadequate handling of class imbalance; the absence of modeling for center effects despite the use of multi-institutional data; no mention of missing data or imputation strategies; failure to account for stratification, clustering, or other forms of data dependency; and a lack of appropriate methods to mitigate overfitting or assess model optimism, such as nested cross-validation with Bayesian optimization, bootstrapping, or penalization techniques. Additionally, if the dataset was imbalanced, authors should have applied strategies to mitigate this imbalance; if no such strategies were conducted, we considered this a source of bias. In this systematic review, we prioritized the AUROC, which was reported in all included studies. However, even in the absence of AUROC, studies that reported other relevant performance metrics such as accuracy, sensitivity, specificity, precision, and/or F1-score were considered to have a low RoB.
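As an illustration of one of the overfitting safeguards listed above, the following minimal sketch implements nested cross-validation with scikit-learn on synthetic data; grid search stands in here for the Bayesian optimization named in the text.

```python
# Nested cross-validation: hyperparameters are tuned in the inner loop and
# performance is estimated in the outer loop, so tuning cannot leak optimism
# into the reported AUROC. Data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=30,
                           weights=[0.3, 0.7], random_state=0)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tuning folds
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # evaluation folds

# Inner loop: tune the L2 penalization strength C of a logistic regression.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="roc_auc", cv=inner)

# Outer loop: unbiased AUROC estimate for the whole tuning procedure.
scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer)
print(f"nested-CV AUROC: {scores.mean():.3f} +/- {scores.std():.3f}")
```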
A high RoB was seen in twenty-one studies [11,16,17,18,33,34,35,36,37,38,39,42,43,44,45,46,47,48,49,50,51] and an unclear RoB in three studies [40,41,52] (Table 2). No applicability concerns were raised.
Table 2. Risk of bias assessment of the studies included in the SR.

3.4. Meta-Analysis

To streamline the subgroup analysis for our meta-analysis, we consolidated external validation results. From each paper, we selected only the best-performing model from each main methodological category (i.e., DL models, Traditional models, and Ensemble/Gradient Boosting methods) for inclusion. However, a significant limitation emerged, as most studies did not report the confusion matrix (TP, FN, FP, TN) required for the meta-analysis, nor did they report the performance metrics needed to calculate them. In the absence of reported confusion matrices, we derived them from the reported cohort sizes and available performance metrics (e.g., sensitivity and specificity). These derived matrices were cross-verified by recalculating the published performance results. The reconstructed values showed minor inconsistencies compared with the reported outcomes. We proceeded with the planned meta-analysis, assuming that these discrepancies were likely attributable to rounding errors, as the resulting distributions of TP, FN, FP, and TN appeared plausible given the sample sizes of the external validation cohorts. Nevertheless, since these derived data may introduce estimation bias, the pooled results should be interpreted with caution, and this limitation was considered when assessing the robustness of the meta-analytic estimates.
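The reconstruction step described above amounts to simple arithmetic; the sketch below illustrates it with hypothetical numbers (none taken from an included study).

```python
# Derive TP/FN/TN/FP from reported sensitivity, specificity, and the class
# counts of the external validation cohort; all values are illustrative.
def reconstruct_confusion(sens: float, spec: float, n_pos: int, n_neg: int):
    tp = round(sens * n_pos)   # sensitivity = TP / (TP + FN)
    fn = n_pos - tp
    tn = round(spec * n_neg)   # specificity = TN / (TN + FP)
    fp = n_neg - tn
    return tp, fn, tn, fp

tp, fn, tn, fp = reconstruct_confusion(sens=0.72, spec=0.77, n_pos=60, n_neg=40)

# Cross-check by recalculating the metrics; small discrepancies against
# published values are expected from rounding, as noted above.
print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"sensitivity={tp / (tp + fn):.3f}, specificity={tn / (tn + fp):.3f}")
```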
Three studies were selected for the DL meta-analysis [39,44,50] and three for the Traditional model’s meta-analysis [11,16,42]. For the Ensemble and Gradient Boosting Methods, only one study [50] provided external validation results; therefore, a meta-analysis could not be conducted for this group. The results of the meta-analysis are presented in Figure 2, which shows the sROC curves for both DL and Traditional models. Aiming for transparency, we also provide a summary table (Table 3) showing the TP, FN, FP, and TN values used in the meta-analysis.
Figure 2. MetaBayesDTA framework. Summary of Receiver Operating Characteristic (sROC) curves and forest plots of sensitivity and specificity for DL models [39,44,50] versus Traditional models [11,16,42]. Individual studies are represented as black dots, with their dispersion reflecting heterogeneity. The square indicates the summary point (pooled sensitivity and specificity), estimated using a bivariate model. The inner gray ellipse represents the 95% confidence region around the summary point, while the outer dashed ellipse shows the 95% prediction region, indicating where results of future studies would be expected to fall. A wider prediction region suggests greater heterogeneity. The sROC curve represents the overall trade-off between sensitivity and specificity across different diagnostic thresholds, with proximity to the top-left corner indicating better diagnostic performance.
Table 3. Summary of true positive (TP), false negative (FN), false positive (FP), and true negative (TN) counts extracted from the selected studies that reported external validation results and were used to calculate the pooled performance metrics in the meta-analysis.
In the sROC plots, each black dot represents an individual study, with their spread indicating the heterogeneity between studies. The square marks the pooled diagnostic performance, the gray shaded ellipse its 95% confidence region, and the larger dashed ellipse the 95% prediction region, which reflects the expected range for future studies.
A direct comparison reveals a notable performance difference. The DL models showed moderate pooled performance (sensitivity: 0.61; 95% CI: 0.35–0.87, specificity: 0.65; 95% CI: 0.31–0.99) with considerable heterogeneity, as indicated by the wide dispersion of individual studies and a large prediction region. In contrast, the Traditional models demonstrated higher, more consistent performance (sensitivity: 0.72; 95% CI: 0.51–0.93, specificity: 0.77; 95% CI: 0.57–0.98), with low variability among studies and a tighter prediction ellipse, suggesting good consistency and stable performance across the included studies.
To further quantify heterogeneity among the included studies, an additional analysis was performed using RStudio (version 2025.09.2-418). The pooled estimates indicated substantial heterogeneity, with a Q statistic of 29.47 (df = 5, p < 0.001), τ2 = 1.66, and I2 = 78.2%, suggesting considerable between-study variability. The pooled sensitivity was 0.855 (95% CI = 0.763–0.915) and the pooled specificity was 0.799 (95% CI = 0.654–0.893). These findings confirm the presence of notable heterogeneity, consistent with the variability observed in the sROC plots generated by MetaBayesDTA.

4. Discussion

The present SR aimed to evaluate and synthesize evidence on the diagnostic performance of ML models for HPV status prediction in OPSCC. To our knowledge, this is the first systematic review with meta-analysis focusing exclusively on external validation results comparing Traditional and DL approaches. Similar reviews, such as the one by Chen et al. [53], focused on parameters obtained from structural and diffusion-weighted MRI. Although their methodological approach differed substantially from the present SR, their findings align with ours, indicating that in ML studies, predictive performance significantly improves when clinical variables are incorporated into the models, compared with models relying solely on imaging features. Song et al. [54] reported a pooled sensitivity and specificity of 79% and 75%, respectively, and performed subgroup analyses stratified by imaging modality, validation type, sample size, and geographic region. Most recently, Ansari et al. [55] conducted a well-structured systematic review with meta-analysis that reported a pooled sensitivity of 77.2% and specificity of 76%. Their review encompassed studies largely overlapping with those included in our analysis, concluding that the diagnostic accuracy of radiomics-derived features from medical imaging remains inferior to that achieved by established para-clinical IHC techniques. In their meta-analysis, the authors stratified the sROC curves according to imaging modality, distinguishing between MRI- and CT-derived datasets. In contrast, our work offers a complementary perspective by concentrating on the performance of externally validated models, regardless of the imaging source. We believe this focus provides an additional and valuable dimension to the existing evidence. Both types of meta-analytical aggregation (i.e., by imaging modality and by validation strategy) are informative and should be jointly considered by researchers in this field.
To date, no clear recommendations exist regarding the reliability and clinical applicability of these models, owing to methodological heterogeneity and limited external validation. In the following subsections, we address specific aspects aimed at highlighting the most pertinent questions regarding the optimal methodologies and common findings of the included studies, particularly focusing on which features are more suitable for model training, whether it is preferable to use only images or to combine them with clinical data, which data augmentation strategies are most commonly used, and which explainability methods were explored. Considerations about performance and RoB will also be discussed.

4.1. Radiomic Features Related to Tumor Characteristics

Radiomic features are known to reflect the biological differences between HPV-positive and HPV-negative tumors. However, the exploration of such features is specific to each imaging modality and depends on the completeness and quality of the dataset. In this section, we summarize the most discriminative features reported across studies to support methodological choices in future research.
In the context of MRI, according to Boot et al. [36], the features compactness1, compactness2, and sphericity were more discriminative of HPV-positive tumors, which tend to be more spherical, while the major and minor axis lengths were more discriminative of HPV-negative tumors. Additionally, the coefficient of variation and the quartile coefficient were also reported to correlate with the higher intensity heterogeneity of HPV-negative tumors. Suh et al. [52] indicated that lower mean and median Apparent Diffusion Coefficient (ADC) histogram values were indicative of more homogeneous tumor tissue, along with higher kurtosis and skewness. In the model developed by Jo et al. [43], which integrated radiomic features from MRI and 18F-FDG PET/CT, SUVmax was identified as the most relevant variable.
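For reference, sphericity, one of the shape features highlighted above, is conventionally defined (this is the usual IBSI/pyradiomics formulation; individual studies may implement minor variations) as

$$\text{Sphericity} = \frac{\left(36\pi V^{2}\right)^{1/3}}{A},$$

where $V$ is the VOI volume and $A$ its surface area. The value equals 1 for a perfect sphere and decreases with increasing shape irregularity, consistent with the more spherical morphology reported for HPV-positive tumors.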
For studies applying CE-CT, Altinok et al. [33] identified Sphericity and Max2D-DiameterRow as the key radiomic features in their Bayesian Network model. The authors noted that higher values of Max2D-DiameterRow reflect greater complexity in tumor morphology. These features are considered particularly informative, as tumors with well-defined borders and a more rounded shape are characteristic of HPV-positive cases. Similarly, Leijenaar et al. [45] reported that the radiomic features highlighted by their predictive models suggest that HPV-positive tumors tend to exhibit greater uniformity and homogeneity, reduced contrast enhancement, lower minimum density values, and increased variability in intensity across neighboring voxels. These imaging characteristics are associated with gray-level size zone matrix (GLSZM) small zone emphasis, gray-level co-occurrence matrix (GLCM) inverse variance, and the Laplacian of Gaussian (4 mm) 10th percentile, which may correspond to underlying histopathological traits of HPV-positive tumors, such as lobular growth patterns and lymphocytic infiltration; however, confirming these associations requires further investigation using datasets enriched with histological information. In addition, GLSZM low gray-level large size emphasis and Laplacian of Gaussian (3 mm) kurtosis have been linked to HPV-positive tumors that demonstrate lower contrast uptake and more extreme density values [11].
In studies using CT imaging, Prasse et al. [18] reported that two features were repeatedly selected in the best-performing models: original_firstorder_Minimum in the GTV and original_firstorder_RootMeanSquared in the parotid. These findings support the hypothesis that HPV infection may leave detectable signs even in non-tumoral tissues, although the relevance of the parotid as a predictive VOI remains inconclusive regarding its causal role. Similarly, Yu et al. [17] observed that HPV-positive patients tended to have smaller and geometrically simpler tumors, highlighting tumor size and shape as important features. Following this approach, the features most indicative of HPV status included MeanBreadth, associated with tumor size, with HPV-positive patients typically exhibiting smaller tumors, and SphericalDisproportion, reflecting tumor morphology and suggesting a simpler, less irregular shape in HPV-positive cases. Consistent with these observations, Bogowicz et al. [34] found that tumor heterogeneity was the key feature influencing model performance, emphasizing the importance of intratumoral variability in characterizing HPV-related tumors.

4.2. Explainable AI (XAI)

The correlation of such features with clinically interpretable characteristics requires extensive investigation, as radiomic parameters often capture abstract image patterns that are not directly visible to the human eye. In this context, the integration of explainable artificial intelligence (XAI) approaches represents a significant advancement, as they can help uncover the biological or morphological meaning behind radiomic signatures. However, only a few studies to date have incorporated explicit explainability methods, and their application remains largely limited to post hoc visualizations or feature importance analyses. Future research should therefore prioritize the use of interpretable models to strengthen the clinical reliability and translational potential of radiomics-based AI systems.
Saliency maps, as reported by Lv et al. [46], highlight which regions of an image most strongly influence the model’s decision; in practice, they assign an “importance value” to each pixel. SHAP plots [36,43] are used mainly with tabular or numerical data (such as radiomic features). They show how much each feature contributes to the model’s final prediction, which helps identify the radiomic features most relevant for distinguishing between classes (e.g., HPV-positive vs. HPV-negative tumors). The grad-CAM visualization, as reported by Fanizzi et al. [39], is specific to CNNs. It uses the gradients of the output layer to create a heatmap over the original image, highlighting the regions that most influenced the model’s classification. This allows researchers to check whether the model is focusing on meaningful anatomical or pathological areas. The t-SNE approach reported in [49] is a dimensionality reduction technique that projects high-dimensional data into two or three dimensions, allowing visually similar points to cluster together and revealing patterns or subgroups within the data. These post hoc methods may introduce optimism if not adequately assessed.
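The sketch below illustrates the kind of SHAP feature-importance analysis described above, assuming the shap package; the classifier and radiomic feature table are synthetic stand-ins, not a model from any included study.

```python
# A minimal SHAP analysis on a synthetic tree-based classifier.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=150, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles; each
# entry is a feature's additive contribution to the predicted log-odds.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape: (n_samples, n_features)

# Summary plot ranking features by mean absolute contribution, i.e., which
# radiomic features drive the HPV-positive vs. HPV-negative prediction.
shap.summary_plot(shap_values, X, show=False)
```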
Given the growing complexity of AI models in radiomics, we recommend the routine incorporation of explainability methods in future studies. Using XAI approaches not only improves transparency and trust in model predictions but also facilitates clinical interpretation, enabling researchers and clinicians to better understand the biological relevance of the extracted features and to guide decision-making in patient care.

4.3. The Added Value of Clinical and Demographic Information in Image-Based Prediction Models

The majority of studies focused on imaging data alone, but three included studies [36,37,42] also developed models using clinical data or combining clinical and radiomic data for comparison. Interestingly, the best-performing models were those using combined data.
While Yu and colleagues [17] advocated the exclusive use of imaging data, suggesting that the inclusion of clinical information could introduce additional uncertainties into the predictive model, the evidence of these comparative studies [36,37,42] suggests that, in certain contexts, traditional clinical features may provide complementary or even superior diagnostic value compared to imaging-derived features alone.

4.4. The Impact of Scanner Manufacturer on the Robustness of Radiomic Features

A key finding of the Petrou et al. study [48] is that the exclusion of non-robust features consistently impaired model efficacy, suggesting that such features may carry significant predictive signal. Consequently, discarding them purely due to scanner dependency is not advisable. This underscores the need for harmonization methods such as ComBat, a statistical method designed to correct batch effects, such as differences introduced by scanners, acquisition protocols, or different centers, while preserving the relevant biological information.
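The sketch below illustrates the harmonization idea with a simplified, ComBat-inspired location-scale adjustment; real ComBat adds empirical-Bayes shrinkage and preserves biological covariates, so this is an illustration only, and the scanner labels and values are synthetic.

```python
# Simplified location-scale harmonization: each scanner's feature
# distribution is rescaled toward the pooled mean and variance.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "scanner": np.repeat(["A", "B"], 50),
    "feature": np.concatenate([rng.normal(0.0, 1.0, 50),    # scanner A
                               rng.normal(0.8, 1.5, 50)]),  # scanner B (batch-shifted)
})

grand_mean = df["feature"].mean()
grand_std = df["feature"].std()

def harmonize(group: pd.Series) -> pd.Series:
    # Standardize within scanner, then map onto the pooled distribution.
    return (group - group.mean()) / group.std() * grand_std + grand_mean

df["feature_harmonized"] = df.groupby("scanner")["feature"].transform(harmonize)
print(df.groupby("scanner")["feature_harmonized"].agg(["mean", "std"]))
```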
At the same time, it is important to recognize that data variability is not inherently detrimental. Heterogeneous, multicenter data are essential for training robust and generalizable models, as they expose algorithms to a broader range of real-world imaging conditions [56]. However, when technical variability overwhelms biological signal, model performance and reproducibility can be compromised. Therefore, a balance must be achieved between maintaining sufficient diversity to support model robustness and applying harmonization strategies to reduce non-biological noise.

4.5. Performance of Models and Aspects That Influence It

A key limitation identified in the literature is that many studies focus exclusively on imaging data. However, evidence demonstrates that integrating different data types significantly improves model performance. Three included studies [36,37,42] developed models that integrated clinical and radiomics data, and these combined models demonstrated the best overall performance. For instance, Boot et al. found that combined models were the most predictive for determining HPV status and for overall survival [36].
The choice of algorithm, the characteristics of the dataset, and the number of features also impact model performance. For small datasets, simpler models can outperform complex alternatives. Suh et al. [52] demonstrated that Logistic Regression (LR) and Random Forest (RF) performed better than XGBoost, attributing this to LR’s superiority with small samples and the high relevance of the selected features. The relationship between the number of features and performance is not linear. Altinok et al. [33] observed that SVM outperformed the Bayesian Network (BN), possibly due to the number of features used. However, an excessive number of features can increase model complexity and reduce intelligibility.
Combining different sources of information, both in terms of modality and anatomical location, is an effective strategy. The combination of distinct imaging modalities, such as 18F-FDG PET/CT and MRI [43], generally yields superior results compared to using a single modality. Similarly, combining features extracted from the primary tumor and lymph nodes [16] proved more advantageous than using only one source.
A major limitation that hinders robust analysis and model comparison is the inconsistent reporting of essential performance metrics across studies, an absence previously highlighted by our group [57]. Future studies should adhere to AI-specific reporting guidelines such as the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) [58], the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) [59,60,61], the Standards for Reporting Diagnostic Accuracy (STARD-AI) statement [62], and the Must AI Criteria-10 (MAIC-10) Checklist [63] to enhance reproducibility and transparency. In particular, future studies must comprehensively report all relevant performance metrics, including accuracy, sensitivity, specificity, precision, the confusion matrix (TP, FP, FN, TN values), F1-score, and AUROC, to enable robust and reliable meta-analyses.
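A minimal sketch of the full metric set recommended above, computed with scikit-learn on hypothetical predictions, is given below; reporting all of these values, together with the raw confusion matrix, is what makes a study reusable in a diagnostic accuracy meta-analysis.

```python
# Compute the complete recommended metric set from hypothetical predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]                        # HPV labels (1 = HPV+), hypothetical
y_pred  = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]                        # hard predictions, hypothetical
y_score = [0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.1, 0.85, 0.3, 0.75]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy    = {accuracy_score(y_true, y_pred):.2f}")
print(f"sensitivity = {recall_score(y_true, y_pred):.2f}")  # recall of the positive class
print(f"specificity = {tn / (tn + fp):.2f}")
print(f"precision   = {precision_score(y_true, y_pred):.2f}")
print(f"F1-score    = {f1_score(y_true, y_pred):.2f}")
print(f"AUROC       = {roc_auc_score(y_true, y_score):.2f}")
```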

4.6. Limitations of Included Studies That Affected the SR

The included studies presented several limitations:
(1) Small sample sizes were noted in several investigations, which limits generalizability.
(2) Most datasets were obtained from a single center, potentially affecting reproducibility due to scanner- and protocol-related variations.
(3) Manual delineation of VOIs is subject to interobserver variability.
(4) The retrospective design carries a potential RoB, even though assessors were blinded to HPV status. However, we understand that a prospective image collection would not necessarily alter the analytical workflow.
(5) HPV detection in most studies relied on IHC p16 alone instead of combining it with at least one specific molecular test, a major clinical limitation affecting the “ground truth” label.
(6) The absence of external validation further limits the assessment of model generalizability.
(7) Explainability methods were applied in only a few studies, despite their importance for understanding model decision-making and identifying the features most influential in the predictions.

5. Conclusions

From a clinical perspective, radiomics-based ML models could serve as complementary or alternative tools to molecular HPV testing, particularly in resource-limited settings where tissue is unavailable or specialized tests like p16 IHC or HPV DNA analysis are inaccessible. This potential is supported by imaging traits common in HPV-positive oropharyngeal cancers—such as increased signal intensity, smaller and more spherical lesion volumes, and greater spatial heterogeneity—which are captured by the radiomic features underlying these models’ strong predictive performance.
Despite promising diagnostic results, these models remain far from clinical application, constrained by high methodological variability, suboptimal reference standards, and a lack of external validation. Future research must therefore prioritize methodological rigor, reproducibility, and transparent AI reporting. Crucially, multicenter and prospective studies employing standardized protocols, dual HPV testing and the integration of clinical data with radiomics are warranted to validate clinical utility. Moreover, integrating explainable AI frameworks will be essential to transform radiomic models from promising research tools into clinically reliable biomarkers for HPV-related oropharyngeal cancer. This approach not only offers a tool to evaluate HPV status independently but also holds promise for enabling personalized therapies, such as treatment de-escalation, by tailoring strategies to individual patient risk. The goal is not to replace molecular testing but to advance reliable radiomic biomarkers from research into routine clinical decision-making.

Author Contributions

Conceptualization, A.L.D.A., L.P.K. and A.F.; Methodology, A.L.D.A., L.P.K. and A.R.S.-S.; Formal Analysis, A.L.D.A. and C.S.-S.; Investigation, A.L.D.A., O.A.A.M.d.M. and D.C.; Data Curation, A.L.D.A., C.S.-S. and L.P.K.; Writing—Original Draft Preparation, A.L.D.A.; Writing—Review & Editing, L.P.K., A.R.S.-S., B.V.R.L., A.C.-P., O.G.-L., R.d.B., P.G., K.N.R., R.P.T., N.F.S. and A.F.; Visualization, A.L.D.A. and C.S.-S.; Supervision, L.P.K. and A.F.; Project Administration, L.P.K.; Funding Acquisition, L.P.K. and A.F. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed, in part, by the São Paulo Research Foundation (FAPESP), Brasil. Process Numbers #2021/14585-7 and #2022/13069-8.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

We declare that the authors have no financial relationship with any commercial associations, current and within the past five years, that might pose a potential, perceived or real conflict of interest. These include grants, patent-licensing arrangements, consultancies, stock or other equity ownership, advisory board memberships, or payments for conducting or publicizing our study.

Abbreviations

ADC: Apparent Diffusion Coefficient
AI: Artificial Intelligence
AJCC: American Joint Committee on Cancer
ASCO: American Society of Clinical Oncology
AUROC: Area Under the Receiver Operating Characteristic Curve
BN: Bayesian Network
BVS: Biblioteca Virtual em Saúde
CE-CT: Contrast-Enhanced Computed Tomography
CI: Confidence Interval
CLAIM: Checklist for Artificial Intelligence in Medical Imaging
CNN: Convolutional Neural Network
CRD: Centre for Reviews and Dissemination
CT: Computed Tomography
CUP: Cancer of Unknown Primary
DL: Deep Learning
DNA-ISH: DNA In Situ Hybridization
DWI: Diffusion-Weighted Imaging
F1-score: Harmonic Mean of Precision and Recall
FDG: Fluorodeoxyglucose
FN: False Negative
FP: False Positive
GLCM: Gray-Level Co-occurrence Matrix
GLSZM: Gray-Level Size Zone Matrix
GTVn: Gross Nodal Tumor Volume
GTVp: Gross Primary Tumor Volume
HPV: Human Papillomavirus
IHC: Immunohistochemistry
LASSO: Least Absolute Shrinkage and Selection Operator
LoG: Laplacian of Gaussian
LR: Logistic Regression
MAIC-10: Must AI Criteria-10
ML: Machine Learning
MRI: Magnetic Resonance Imaging
OPSCC: Oropharyngeal Squamous Cell Carcinoma
p16: Cyclin-Dependent Kinase Inhibitor 2A
PCR: Polymerase Chain Reaction
PET: Positron Emission Tomography
PET/CT: Positron Emission Tomography–Computed Tomography
PICOS: Population, Intervention, Comparator, Outcome, Study Design
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy
PRISMA-P: Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols
PROBAST: Prediction Model Risk of Bias Assessment Tool
PROSPERO: International Prospective Register of Systematic Reviews
QCRI: Qatar Computing Research Institute
RF: Random Forest
RFE: Recursive Feature Elimination
RoB: Risk of Bias
ROI: Region of Interest
SHAP: Shapley Additive exPlanations
SMOTE: Synthetic Minority Over-sampling Technique
SR: Systematic Review
sROC: Summary Receiver Operating Characteristic
STARD-AI: Standards for Reporting Diagnostic Accuracy for Artificial Intelligence
SUVmax: Maximum Standardized Uptake Value
SVM: Support Vector Machine
t-SNE: t-distributed Stochastic Neighbor Embedding
TN: True Negative
TNM: Tumor–Node–Metastasis
TP: True Positive
TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis
UICC: Union for International Cancer Control
VOI: Volume of Interest
XAI: Explainable Artificial Intelligence
XGBoost: eXtreme Gradient Boosting

Appendix A. General Search Keywords

Table A1. PICO strategy.

PICO Acronym | Keywords
Patients | “oral cancer” OR “oropharyngeal cancer”
Intervention/Exposition | radiomics AND (“machine learning” OR “artificial intelligence” OR “deep learning” OR “convolutional neural network” OR “artificial neural network”)
Comparison | Not applicable
Outcome | HPV OR “HPV status”
After multiple attempts to reorganize the search keywords, we opted for the combination of the terms “radiomic” and “HPV,” which yielded the most comprehensive results across the databases, thereby ensuring broader coverage of the research topic.
Table A2. Search strategy (search date: 7 November 2024).

Database | Query | Retrieved Studies (n)
Cochrane | radiomics AND HPV | 7
Embase | (‘radiomics’/exp OR radiomics) AND hpv | 155
IEEE Xplore | (“Full Text & Metadata”:radiomics AND “Full Text & Metadata”:human papillomavirus OR HPV) | 205
Portal Regional da Biblioteca Virtual em Saúde | radiomics OR radiômica OR radiomica AND HPV OR “human papillomavirus” OR “papilomavírus humano” OR “virus del papiloma humano” | 15
PubMed | (radiomics) AND (HPV) | 87
Scopus | (radiomics) AND (HPV) | 92
Web of Science | radiomics (Topic) and HPV (Topic) | 103
arXiv | order: -announced_date_first; size: 50; classification: Computer Science (cs); include_cross_list: True; terms: AND title=RADIOMICS; AND title=human papillomavirus; OR title=HPV | 7
Google Scholar | radiomics AND HPV | 99
ProQuest | title(radiomics) AND title(HPV) | 7
TOTAL | | 777

Appendix B. Excluded Articles and Reasons for Exclusion

Table A3. Excluded articles and reasons for exclusion.

# | Author | Title | Year | Country | Journal/Conference Name | Exclusion Criteria
1 | Bologna et al. | Prognostic radiomic signature for head and neck cancer: Development and validation on a multi-centric MRI dataset | 2023 | Italy | Radiotherapy and Oncology | Other outcome prediction (not HPV status)
2 | Giannitto et al. | Association of quantitative MRI-based radiomic features with prognostic factors and recurrence rate in oropharyngeal squamous cell carcinoma | 2020 | Italy | Neoplasma | Only correlated radiomic features with prognostic factors
3 | Leijenaar et al. | Radiomics in OPSCC: a novel quantitative imaging biomarker for HPV status? | 2016 | Italy | ESTRO 35 2016 | Conference abstract
4 | Parmar et al. | Radiomic feature clusters and Prognostic Signatures specific for Lung and Head & Neck cancer | 2015 | USA | Scientific Reports | Only identification of radiomic signatures
5 | Buch et al. | Using Texture Analysis to Determine Human Papillomavirus Status of Oropharyngeal Squamous Cell Carcinomas on CT | 2014 | USA | Head and Neck | Conventional statistical analysis rather than machine learning
6 | Ou et al. | Predictive and prognostic value of CT based radiomics signature in locally advanced head and neck cancers patients treated with concurrent chemoradiotherapy or bioradiotherapy and its added value to Human Papillomavirus status | 2017 | France | Oral Oncology | Other outcome prediction (not HPV status)

References

  1. Edge, S.B.; Byrd, D.; Compton, C.; Fritz, A.; Greene, F.; Trotti, A. AJCC Cancer Staging Manual, 8th ed.; Springer: New York, NY, USA, 2017. [Google Scholar]
  2. Brierley, J.; Gospodarowicz, M.; Wittekind, C. (Eds.) TNM Classification of Malignant Tumours, 8th ed.; Wiley-Blackwell: Hoboken, NJ, USA, 2016; ISBN 978-1-119-26356-2. [Google Scholar]
  3. Abrahão, R.; Perdomo, S.; Pinto, L.F.R.; Nascimento de Carvalho, F.; Dias, F.L.; de Podestá, J.R.V.; Ventorin von Zeidler, S.; Marinho de Abreu, P.; Vilensky, M.; Giglio, R.E.; et al. Predictors of Survival After Head and Neck Squamous Cell Carcinoma in South America: The InterCHANGE Study. JCO Glob. Oncol. 2020, 6, 486–499. [Google Scholar] [CrossRef]
  4. de Almeida, J.R.; Martino, R.; Hosni, A.; Goldstein, D.P.; Bratman, S.V.; Chepeha, D.B.; Waldron, J.N.; Weinreb, I.; Perez-Ordonez, B.; Yu, E.; et al. Transoral Robotic Surgery and Radiation Volume Deintensification in Unknown Primary Squamous Cell Carcinoma of the Neck. JAMA Otolaryngol.–Head Neck Surg. 2024, 150, 463. [Google Scholar] [CrossRef]
  5. Oosthuizen, J.C.; Doody, J. De-Intensified Treatment in Human Papillomavirus-Positive Oropharyngeal Cancer. Lancet 2019, 393, 5–7. [Google Scholar] [CrossRef]
  6. Mehanna, H. Update on De-Intensification and Intensification Studies in HPV. HPV Infect. Head Neck Cancer 2017, 206, 251–256. [Google Scholar]
  7. Fakhry, C.; Lacchetti, C.; Rooper, L.M.; Jordan, R.C.; Rischin, D.; Sturgis, E.M.; Bell, D.; Lingen, M.W.; Harichand-Herdt, S.; Thibo, J.; et al. Human Papillomavirus Testing in Head and Neck Carcinomas: ASCO Clinical Practice Guideline Endorsement of the College of American Pathologists Guideline. J. Clin. Oncol. 2018, 36, 3152–3161. [Google Scholar] [CrossRef]
  8. Louredo, B.V.R.; Prado-Ribeiro, A.C.; Brandão, T.B.; Epstein, J.B.; Migliorati, C.A.; Piña, A.R.; Kowalski, L.P.; Vargas, P.A.; Lopes, M.A.; Santos-Silva, A.R. State-of-the-Science Concepts of HPV-Related Oropharyngeal Squamous Cell Carcinoma: A Comprehensive Review. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2022, 134, 190–205. [Google Scholar] [CrossRef] [PubMed]
  9. Rao, K.N.; Fernandez-Alvarez, V.; Guntinas-Lichius, O.; Sreeram, M.P.; de Bree, R.; Kowalski, L.P.; Forastiere, A.; Pace-Asciak, P.; Rodrigo, J.P.; Saba, N.F.; et al. The Limitations of Artificial Intelligence in Head and Neck Oncology. Adv. Ther. 2025, 42, 2559–2568. [Google Scholar] [CrossRef]
  10. Chiesa-Estomba, C.M.; Mayo-Yanez, M.; Guntinas-Lichius, O.; Vander-Poorten, V.; Takes, R.P.; de Bree, R.; Halmos, G.B.; Saba, N.F.; Nuyts, S.; Ferlito, A. Radiomics in Hypopharyngeal Cancer Management: A State-of-the-Art Review. Biomedicines 2023, 11, 805. [Google Scholar] [CrossRef]
  11. Bagher-Ebadian, H.; Lu, M.; Siddiqui, F.; Ghanem, A.I.; Wen, N.; Wu, Q.; Liu, C.; Movsas, B.; Chetty, I.J. Application of Radiomics for the Prediction of HPV Status for Patients with Head and Neck Cancers. Med. Phys. 2020, 47, 563–575. [Google Scholar] [CrossRef] [PubMed]
  12. Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D.; et al. Decoding Tumour Phenotype by Noninvasive Imaging Using a Quantitative Radiomics Approach. Nat. Commun. 2014, 5, 4006. [Google Scholar] [CrossRef]
  13. Traverso, A.; Wee, L.; Dekker, A.; Gillies, R. Repeatability and Reproducibility of Radiomic Features: A Systematic Review. Int. J. Radiat. Oncol. Biol. Phys. 2018, 102, 1143–1158. [Google Scholar] [CrossRef]
  14. Lu, L.; Lv, W.; Jiang, J.; Ma, J.; Feng, Q.; Rahmim, A.; Chen, W. Robustness of Radiomic Features in [11C]Choline and [18F]FDG PET/CT Imaging of Nasopharyngeal Carcinoma: Impact of Segmentation and Discretization. Mol. Imaging Biol. 2016, 18, 935–945. [Google Scholar] [CrossRef] [PubMed]
  15. Leijenaar, R.T.H.; Carvalho, S.; Velazquez, E.R.; van Elmpt, W.J.C.; Parmar, C.; Hoekstra, O.S.; Hoekstra, C.J.; Boellaard, R.; Dekker, A.L.A.J.; Gillies, R.J.; et al. Stability of FDG-PET Radiomics Features: An Integrated Analysis of Test-Retest and Inter-Observer Variability. Acta Oncol. 2013, 52, 1391–1397. [Google Scholar] [CrossRef] [PubMed]
  16. Li, Q.; Xu, T.; Gong, J.; Xiang, S.; Shen, C.; Zhou, X.; Hu, C.; Wu, B.; Lu, X. Applying Multisequence MRI Radiomics of the Primary Tumor and Lymph Node to Predict HPV-Related P16 Status in Patients with Oropharyngeal Squamous Cell Carcinoma. Quant. Imaging Med. Surg. 2023, 13, 2234–2247. [Google Scholar] [CrossRef]
  17. Yu, K.; Zhang, Y.; Yu, Y.; Huang, C.; Liu, R.; Li, T.; Yang, L.; Morris, J.S.; Baladandayuthapani, V.; Zhu, H. Radiomic Analysis in Prediction of Human Papilloma Virus Status. Clin. Transl. Radiat. Oncol. 2017, 7, 49–54. [Google Scholar] [CrossRef]
  18. Prasse, G.; Glaas, A.; Meyer, H.-J.; Zebralla, V.; Dietz, A.; Hering, K.; Kuhnt, T.; Denecke, T. A Radiomics-Based Machine Learning Perspective on the Parotid Gland as a Potential Surrogate Marker for HPV in Oropharyngeal Cancer. Cancers 2023, 15, 5425. [Google Scholar] [CrossRef]
  19. Takes, R.P.; Kaanders, J.H.A.M.; van Herpen, C.M.L.; Merkx, M.A.W.; Slootweg, P.J.; Melchers, W.J.G. Human Papillomavirus Detection in Fine Needle Aspiration Cytology of Lymph Node Metastasis of Head and Neck Squamous Cell Cancer. J. Clin. Virol. 2016, 85, 22–26. [Google Scholar] [CrossRef]
  20. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef] [PubMed]
  21. Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 Explanation and Elaboration: Updated Guidance and Exemplars for Reporting Systematic Reviews. BMJ 2021, 372, n160. [Google Scholar] [CrossRef]
22. Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A.; PRISMA-P Group. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015 Statement. Syst. Rev. 2015, 4, 1. [Google Scholar] [CrossRef]
23. Shamseer, L.; Moher, D.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A.; Altman, D.G.; Booth, A.; et al. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015: Elaboration and Explanation. BMJ 2015, 349, g7647. [Google Scholar] [CrossRef]
  24. McInnes, M.D.F.; Moher, D.; Thombs, B.D.; McGrath, T.A.; Bossuyt, P.M.; Clifford, T.; Cohen, J.F.; Deeks, J.J.; Gatsonis, C.; Hooft, L.; et al. Preferred Reporting Items for a Systematic Review and Meta-Analysis of Diagnostic Test Accuracy Studies. JAMA 2018, 319, 388. [Google Scholar] [CrossRef] [PubMed]
  25. The EndNote Team. EndNote X9 [Computer Program]; EndNote: London, UK, 2013. [Google Scholar]
  26. Ouzzani, M.; Hammady, H.; Fedorowicz, Z.; Elmagarmid, A. Rayyan—A Web and Mobile App for Systematic Reviews. Syst. Rev. 2016, 5, 210. [Google Scholar] [CrossRef]
  27. Wolff, R.F.; Moons, K.G.M.; Riley, R.D.; Whiting, P.F.; Westwood, M.; Collins, G.S.; Reitsma, J.B.; Kleijnen, J.; Mallett, S. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann. Intern. Med. 2019, 170, 51–58. [Google Scholar] [CrossRef]
  28. Moons, K.G.M.; Wolff, R.F.; Riley, R.D.; Whiting, P.F.; Westwood, M.; Collins, G.S.; Reitsma, J.B.; Kleijnen, J.; Mallett, S. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann. Intern. Med. 2019, 170, W1–W33. [Google Scholar] [CrossRef]
  29. Cerullo, E.; Sutton, A.J.; Jones, H.E.; Wu, O.; Quinn, T.J.; Cooper, N.J. MetaBayesDTA: Codeless Bayesian Meta-Analysis of Test Accuracy, with or without a Gold Standard. BMC Med. Res. Methodol. 2023, 23, 127. [Google Scholar] [CrossRef]
  30. Freeman, S.C.; Kerby, C.R.; Patel, A.; Cooper, N.J.; Quinn, T.; Sutton, A.J. Development of an Interactive Web-Based Tool to Conduct and Interrogate Meta-Analysis of Diagnostic Test Accuracy Studies: MetaDTA. BMC Med. Res. Methodol. 2019, 19, 81. [Google Scholar] [CrossRef]
  31. Mallett, S.; Halligan, S.; Thompson, M.; Collins, G.S.; Altman, D.G. MetaDTA: Diagnostic Test Accuracy Meta-Analysis. Available online: https://crsu.shinyapps.io/MetaDTA/ (accessed on 22 October 2025).
  32. Patel, A.; Cooper, N.; Freeman, S.; Sutton, A. Graphical Enhancements to Summary Receiver Operating Characteristic Plots to Facilitate the Analysis and Reporting of Meta-analysis of Diagnostic Test Accuracy Data. Res. Synth. Methods 2021, 12, 34–44. [Google Scholar] [CrossRef] [PubMed]
  33. Altinok, O.; Guvenis, A. Interpretable Radiomics Method for Predicting Human Papillomavirus Status in Oropharyngeal Cancer Using Bayesian Networks. Phys. Medica 2023, 114, 102671. [Google Scholar] [CrossRef] [PubMed]
  34. Bogowicz, M.; Riesterer, O.; Ikenberg, K.; Stieb, S.; Moch, H.; Studer, G.; Guckenberger, M.; Tanadini-Lang, S. Computed Tomography Radiomics Predicts HPV Status and Local Tumor Control After Definitive Radiochemotherapy in Head and Neck Squamous Cell Carcinoma. Int. J. Radiat. Oncol. Biol. Phys. 2017, 99, 921–928. [Google Scholar] [CrossRef]
  35. Bogowicz, M.; Jochems, A.; Deist, T.M.; Tanadini-Lang, S.; Huang, S.H.; Chan, B.; Waldron, J.N.; Bratman, S.; O’Sullivan, B.; Riesterer, O.; et al. Privacy-Preserving Distributed Learning of Radiomics to Predict Overall Survival and HPV Status in Head and Neck Cancer. Sci. Rep. 2020, 10, 4542. [Google Scholar] [CrossRef]
  36. Boot, P.A.; Mes, S.W.; de Bloeme, C.M.; Martens, R.M.; Leemans, C.R.; Boellaard, R.; van de Wiel, M.A.; de Graaf, P. Magnetic Resonance Imaging Based Radiomics Prediction of Human Papillomavirus Infection Status and Overall Survival in Oropharyngeal Squamous Cell Carcinoma. Oral Oncol. 2023, 137, 106307. [Google Scholar] [CrossRef]
  37. Bos, P.; van den Brekel, M.W.M.; Gouw, Z.A.R.; Al-Mamgani, A.; Waktola, S.; Aerts, H.J.W.L.; Beets-Tan, R.G.H.; Castelijns, J.A.; Jasperse, B. Clinical Variables and Magnetic Resonance Imaging-based Radiomics Predict Human Papillomavirus Status of Oropharyngeal Cancer. Head Neck 2021, 43, 485–495. [Google Scholar] [CrossRef]
  38. Bos, P.; van den Brekel, M.W.M.; Taghavi, M.; Gouw, Z.A.R.; Al-Mamgani, A.; Waktola, S.; Jwl Aerts, H.; Beets-Tan, R.G.H.; Castelijns, J.A.; Jasperse, B. Largest Diameter Delineations Can Substitute 3D Tumor Volume Delineations for Radiomics Prediction of Human Papillomavirus Status on MRI’s of Oropharyngeal Cancer. Phys. Medica 2022, 101, 36–43. [Google Scholar] [CrossRef] [PubMed]
  39. Fanizzi, A.; Comes, M.C.; Bove, S.; Cavalera, E.; de Franco, P.; Di Rito, A.; Errico, A.; Lioce, M.; Pati, F.; Portaluri, M.; et al. Explainable Prediction Model for the Human Papillomavirus Status in Patients with Oropharyngeal Squamous Cell Carcinoma Using CNN on CT Images. Sci. Rep. 2024, 14, 14276. [Google Scholar] [CrossRef]
  40. Haider, S.P.; Mahajan, A.; Zeevi, T.; Baumeister, P.; Reichel, C.; Sharaf, K.; Forghani, R.; Kucukkaya, A.S.; Kann, B.H.; Judson, B.L.; et al. PET/CT Radiomics Signature of Human Papilloma Virus Association in Oropharyngeal Squamous Cell Carcinoma. Eur. J. Nucl. Med. Mol. Imaging 2020, 47, 2978–2991. [Google Scholar] [CrossRef] [PubMed]
41. Haider, S.P.; Zeevi, T.; Sharaf, K.; Gross, M.; Mahajan, A.; Kann, B.H.; Judson, B.L.; Prasad, M.L.; Burtness, B.; Aboian, M.; et al. Impact of 18F-FDG PET Intensity Normalization on Radiomic Features of Oropharyngeal Squamous Cell Carcinomas and Machine Learning–Generated Biomarkers. J. Nucl. Med. 2024, 65, 803–809. [Google Scholar] [CrossRef]
  42. Huang, C.; Cintra, M.; Brennan, K.; Zhou, M.; Colevas, A.D.; Fischbein, N.; Zhu, S.; Gevaert, O. Development and Validation of Radiomic Signatures of Head and Neck Squamous Cell Carcinoma Molecular Features and Subtypes. EBioMedicine 2019, 45, 70–80. [Google Scholar] [CrossRef]
43. Jo, K.H.; Kim, J.; Cho, H.; Kang, W.J.; Lee, S.-K.; Sohn, B. 18F-FDG PET/CT Parameters Enhance MRI Radiomics for Predicting Human Papilloma Virus Status in Oropharyngeal Squamous Cell Carcinoma. Yonsei Med. J. 2023, 64, 738. [Google Scholar] [CrossRef]
  44. Lang, D.M.; Peeken, J.C.; Combs, S.E.; Wilkens, J.J.; Bartzsch, S. Deep Learning Based HPV Status Prediction for Oropharyngeal Cancer Patients. Cancers 2021, 13, 786. [Google Scholar] [CrossRef] [PubMed]
  45. Leijenaar, R.T.; Bogowicz, M.; Jochems, A.; Hoebers, F.J.; Wesseling, F.W.; Huang, S.H.; Chan, B.; Waldron, J.N.; O’Sullivan, B.; Rietveld, D.; et al. Development and Validation of a Radiomic Signature to Predict HPV (P16) Status from Standard CT Imaging: A Multicenter Study. Br. J. Radiol. 2018, 91, 20170498. [Google Scholar] [CrossRef]
  46. Lv, W.; Xu, H.; Han, X.; Zhang, H.; Ma, J.; Rahmim, A.; Lu, L. Context-Aware Saliency Guided Radiomics: Application to Prediction of Outcome and HPV-Status from Multi-Center PET/CT Images of Head and Neck Cancer. Cancers 2022, 14, 1674. [Google Scholar] [CrossRef]
  47. Park, Y.M.; Lim, J.; Koh, Y.W.; Kim, S.; Choi, E.C. Machine Learning and Magnetic Resonance Imaging Radiomics for Predicting Human Papilloma Virus Status and Prognostic Factors in Oropharyngeal Squamous Cell Carcinoma. Head Neck 2022, 44, 897–903. [Google Scholar] [CrossRef]
  48. Petrou, E.; Chatzipapas, K.; Papadimitroulas, P.; Andrade-Miranda, G.; Katsakiori, P.F.; Papathanasiou, N.D.; Visvikis, D.; Kagadis, G.C. Investigation of Machine and Deep Learning Techniques to Detect HPV Status. J. Pers. Med. 2024, 14, 737. [Google Scholar] [CrossRef]
  49. Reiazi, R.; Arrowsmith, C.; Welch, M.; Abbas-Aghababazadeh, F.; Eeles, C.; Tadic, T.; Hope, A.J.; Bratman, S.V.; Haibe-Kains, B. Prediction of Human Papillomavirus (HPV) Association of Oropharyngeal Cancer (OPC) Using Radiomics: The Impact of the Variation of CT Scanner. Cancers 2021, 13, 2269. [Google Scholar] [CrossRef] [PubMed]
  50. Sim, Y.; Kim, M.; Kim, J.; Lee, S.-K.; Han, K.; Sohn, B. Multiparametric MRI–Based Radiomics Model for Predicting Human Papillomavirus Status in Oropharyngeal Squamous Cell Carcinoma: Optimization Using Oversampling and Machine Learning Techniques. Eur. Radiol. 2023, 34, 3102–3112. [Google Scholar] [CrossRef] [PubMed]
  51. Sohn, B.; Choi, Y.S.; Ahn, S.S.; Kim, H.; Han, K.; Lee, S.; Kim, J. Machine Learning Based Radiomic HPV Phenotyping of Oropharyngeal SCC: A Feasibility Study Using MRI. Laryngoscope 2021, 131, E851–E856. [Google Scholar] [CrossRef]
  52. Suh, C.H.; Lee, K.H.; Choi, Y.J.; Chung, S.R.; Baek, J.H.; Lee, J.H.; Yun, J.; Ham, S.; Kim, N. Oropharyngeal Squamous Cell Carcinoma: Radiomic Machine-Learning Classifiers from Multiparametric MR Images for Determination of HPV Infection Status. Sci. Rep. 2020, 10, 17525. [Google Scholar] [CrossRef]
  53. Chen, L.L.; Lauwers, I.; Verduijn, G.; Philippens, M.; Gahrmann, R.; Capala, M.E.; Petit, S. MRI for Differentiation between HPV-Positive and HPV-Negative Oropharyngeal Squamous Cell Carcinoma: A Systematic Review. Cancers 2024, 16, 2105. [Google Scholar] [CrossRef]
  54. Song, C.; Chen, X.; Tang, C.; Xue, P.; Jiang, Y.; Qiao, Y. Artificial Intelligence for HPV Status Prediction Based on Disease-specific Images in Head and Neck Cancer: A Systematic Review and Meta-analysis. J. Med. Virol. 2023, 95, e29080. [Google Scholar] [CrossRef] [PubMed]
  55. Ansari, G.; Mirza-Aghazadeh-Attari, M.; Mosier, K.M.; Fakhry, C.; Yousem, D.M. Radiomics Features in Predicting Human Papillomavirus Status in Oropharyngeal Squamous Cell Carcinoma: A Systematic Review, Quality Appraisal, and Meta-Analysis. Diagnostics 2024, 14, 737. [Google Scholar] [CrossRef]
  56. Araújo, A.L.D.; Sperandio, M.; Calabrese, G.; Faria, S.S.; Cardenas, D.A.C.; Martins, M.D.; Saldivia-Siracusa, C.; Giraldo-Roldán, D.; Pedroso, C.M.; Vargas, P.A.; et al. Artificial Intelligence in Healthcare Applications Targeting Cancer Diagnosis—Part I: Data Structure, Preprocessing and Data Organization. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2025, 140, 79–88. [Google Scholar] [CrossRef] [PubMed]
  57. Araújo, A.L.D.; Sperandio, M.; Calabrese, G.; Faria, S.S.; Cardenas, D.A.C.; Martins, M.D.; Vargas, P.A.; Lopes, M.A.; Santos-Silva, A.R.; Kowalski, L.P.; et al. Artificial Intelligence in Healthcare Applications Targeting Cancer Diagnosis—Part II: Interpreting the Model Outputs and Spotlighting the Performance Metrics. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2025, 140, 89–99. [Google Scholar] [CrossRef]
  58. Tejani, A.S.; Klontzas, M.E.; Gatti, A.A.; Mongan, J.T.; Moy, L.; Park, S.H.; Kahn, C.E.; Abbara, S.; Afat, S.; Anazodo, U.C.; et al. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 Update. Radiol. Artif. Intell. 2024, 6, e240300. [Google Scholar] [CrossRef]
  59. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G.M. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann. Intern. Med. 2015, 162, 55–63. [Google Scholar] [CrossRef]
  60. Moons, K.G.M.; Altman, D.G.; Reitsma, J.B.; Ioannidis, J.P.A.; Macaskill, P.; Steyerberg, E.W.; Vickers, A.J.; Ransohoff, D.F.; Collins, G.S. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Ann. Intern. Med. 2015, 162, W1–W73. [Google Scholar] [CrossRef]
  61. Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; van Smeden, M.; et al. TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models That Use Regression or Machine Learning Methods. BMJ 2024, 385, e078378. [Google Scholar] [CrossRef] [PubMed]
  62. Sounderajah, V.; Guni, A.; Liu, X.; Collins, G.S.; Karthikesalingam, A.; Markar, S.R.; Golub, R.M.; Denniston, A.K.; Shetty, S.; Moher, D.; et al. The STARD-AI Reporting Guideline for Diagnostic Accuracy Studies Using Artificial Intelligence. Nat. Med. 2025, 31, 3283–3289. [Google Scholar] [CrossRef] [PubMed]
  63. Cerdá-Alberich, L.; Solana, J.; Mallol, P.; Ribas, G.; García-Junco, M.; Alberich-Bayarri, A.; Marti-Bonmati, L. MAIC–10 Brief Quality Checklist for Publications Using Artificial Intelligence and Medical Images. Insights Imaging 2023, 14, 11. [Google Scholar] [CrossRef]