Radiomics Features in Predicting Human Papillomavirus Status in Oropharyngeal Squamous Cell Carcinoma: A Systematic Review, Quality Appraisal, and Meta-Analysis

We sought to determine the diagnostic accuracy of radiomics features in predicting HPV status in oropharyngeal squamous cell carcinoma (SCC) compared to routine paraclinical measures used in clinical practice. Twenty-six articles were included in the systematic review, and thirteen were used for the meta-analysis. The overall sensitivity of the included studies was 0.78, the overall specificity was 0.76, and the overall area under the ROC curve was 0.84. The diagnostic odds ratio (DOR) equaled 12 (8, 17). Subgroup analysis showed no significant difference between radiomics features extracted from CT or MR images. Overall, the studies were of low quality in regard to radiomics quality score, although most had a low risk of bias based on the QUADAS-2 tool. Radiomics features showed good overall sensitivity and specificity in determining HPV status in OPSCC, though the low quality of the included studies poses problems for generalizability.


Introduction
Head and neck squamous cell carcinoma (HNSCC) is the seventh most common cancer globally, accounting for more than 300,000 annual deaths [1]. Alarmingly, it is projected that there will be a 30% increase in the overall incidence of HNSCC globally, with both developed and developing nations experiencing a growth in case numbers. This trend is driven by human papillomavirus (HPV), especially in North America and Europe [2,3], although there are other risk factors involved in the development of HNSCC such as alcohol consumption, tobacco products, blood group type, and more [4,5]. Even today, it is estimated that the frequency of HPV-positive oropharyngeal (OP) SCC is 2.5 times that of HPV-negative OPSCC [6]. Furthermore, although HPV-positive and -negative HNSCC have conventionally been classified as a single clinical diagnosis and represented as a single entity in the ICD classification [7], they show different clinical profiles and have recently been viewed as two distinct clinical entities by the National Comprehensive Cancer Network [8]. HPV-positive OPSCC usually has a higher 5-year overall, progression-free, and recurrence-free survival compared to HPV-negative OPSCC. It has also been shown that responses to conventional chemo-radiotherapy and immunotherapy are significantly different between the two groups [9,10]. Importantly, there are also critical differences between the two in tumor staging: the American Joint Committee on Cancer (AJCC) staging guidelines now offer two different T and N staging systems for OPSCC depending upon HPV status, essentially classifying them as two distinct clinical entities [11,12].
Conventionally, tissue samples have been used to document the presence of HPV in cancer cells, but recently, texture analysis features extracted from medical images have been proposed as a potential means to differentiate between HPV-positive and -negative OPSCC [13]. Radiomics features are high-throughput quantitative features, extracted mathematically from images, that capture valuable information not readily appreciated visually by radiologists [14]. These features have the potential to substitute for tissue sampling as a form of virtual biopsy, as subtle nuances in the texture of a given lesion may be indicative of differences in microanatomy, which itself could be a reflection of different genomic alterations and pathologic processes in the cancer [15,16].
Currently, there is a lack of meta-analytic evidence regarding the possible role of radiomics features in determining HPV status in HNSCC [17]. Differences in the methodology of these studies, such as feature extraction, modality of choice, and issues regarding classification and validation, limit the generalizability of the results of isolated studies.
This investigation aims to determine the value of radiomics features extracted from medical images including CT, MRI, and ultrasound in determining HPV status in OPSCC and to determine the quality of radiomics methodologies used in the published literature.
Exclusion criteria consisted of studies that were not in English, conference papers, editorials, reports from society meetings or Kaggle challenges (open dataset challenges recruiting teams to train and validate predictive models or similar schemes) [18], abstracts, and review articles.
The same researchers were also responsible for screening the articles based on title and abstract to exclude irrelevant studies. After the suitable studies were identified, information regarding their methodology and numeric findings (namely the number of true positive, true negative, false positive, and false negative cases) was extracted. Only studies reporting complete metadata were included in the meta-analysis.
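The four extracted counts are sufficient to recover the per-dataset accuracy measures that are later pooled. The following is a minimal sketch of that step, not the authors' extraction code; the class name `ConfusionCounts` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts for one dataset; HPV-positive is treated as the positive class."""
    tp: int  # HPV-positive lesions classified as positive
    fp: int  # HPV-negative lesions classified as positive
    fn: int  # HPV-positive lesions classified as negative
    tn: int  # HPV-negative lesions classified as negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts, for illustration only (not taken from any included study).
example = ConfusionCounts(tp=40, fp=10, fn=10, tn=40)
print(example.sensitivity, example.specificity, example.accuracy)
```

A dataset that does not report all four cells (or enough information to reconstruct them) cannot be represented this way, which is why only studies with complete data could enter the meta-analysis.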

Quality Assessment
The radiomics quality score (RQS) and Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) were used by the two researchers (G.A. and M.M.A.A.) to evaluate the methodological quality and risk of bias of the studies included in the systematic review. The two readers independently graded the studies, and any disagreement was resolved via discussion until consensus was achieved.

Statistical Analysis
Statistical analyses were performed with the MIDAS package in Stata software, version 16.0, and the Meta-DiSc 2.0 web application. Statistical heterogeneity was assessed using the I² value, providing an estimate of the percentage of variability among the included reports. I² values of 0-25%, 25-50%, 50-75%, and >75% represented very low, low, medium, and high heterogeneity, respectively. Pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive and negative likelihood ratios (PLR and NLR), and area under curve (AUC), with corresponding 95% confidence intervals (CIs), were calculated. A forest plot and summary receiver operating characteristic (SROC) plot were generated for each of the modalities and in total. The pooling of studies and effect size were evaluated using a random-effects model, and subgroup analysis was performed to assess the diagnostic accuracy of radiomics features extracted from each imaging modality used.
Deeks' funnel plot was used to assess publication bias, and Fagan's nomogram was used to determine clinical utility.
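For orientation, the heterogeneity bands described above can be sketched with the conventional univariate definition of I² derived from Cochran's Q. This is an illustrative assumption: the MIDAS and Meta-DiSc implementations may compute heterogeneity differently, and the function names below are hypothetical. Because the stated bands share their boundary values (for example, 25% appears in two bands), the sketch assigns a boundary value to the lower band, which is also an assumption.

```python
def cochran_q(effects, variances):
    """Cochran's Q for study effects using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_studies):
    """I² as a percentage: share of variability beyond what chance would explain."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_band(i2_percent):
    """Map an I² percentage to the bands used in this review.

    Boundary values go to the lower band (an assumption; the text does not say).
    """
    if i2_percent <= 25:
        return "very low"
    if i2_percent <= 50:
        return "low"
    if i2_percent <= 75:
        return "medium"
    return "high"


print(heterogeneity_band(i_squared(20.0, 11)))
```

Under these bands, any I² between 50% and 75% is labelled medium heterogeneity.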

Characteristics of the Included Studies
Twenty-six studies were included in the systematic review. Of these studies, only 14 studies containing 21 datasets reported complete diagnostic results and were included in the meta-analysis. One study was later excluded from the analysis because it reported results obtained from patients with different types of HNSCC. A PRISMA flow chart of the study is presented in Figure 1. Table 1 presents more information about the studies included in the review. OPSCC was by far the most common head and neck cancer studied, as 23 of the studies in the systematic review only included patients with OPSCC. Two studies included a variety of HNSCCs, and a single study included non-OPSCC HNSCCs. Only studies including OPSCC were included in the meta-analysis. Contrast-enhanced CT was the most prevalent modality used for texture feature extraction (sixteen studies), followed by MRI (eight studies using different combinations of sequences for feature extraction), and PET-CT (two studies). The earliest study we were able to find was published in 2015, with 16 studies published between 2020 and 2023.

Methodological Quality
The QUADAS-2 tool was used to determine the methodological quality of the manuscripts included in the systematic review (Supplementary Table S1). Generally, most of the studies had a low risk of bias in patient selection, index testing, and timing. However, there was an undetermined risk of bias regarding the reference test, as different studies utilized different methods to determine if a lesion was HPV positive, including the detection of HPV DNA, HPV messenger RNA (mRNA), or the p16 protein. Polymerase chain reaction (PCR) has inherent limitations in determining HPV infection, the most important being that PCR cannot differentiate low-risk strains of HPV from transcriptionally active ones, which are shown to be involved in the pathogenesis of OPSCC. The radiomics quality score of the included articles is presented in Supplementary Table S2 and Figure 1.

Publication Bias
Figure 2 depicts the funnel plot and Deeks' funnel plot of asymmetry for the included studies. Although some heterogeneity was apparent on the funnel plot, there was no significant publication bias among the publications included in the meta-analysis.
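As background for reading the asymmetry plot, Deeks' approach regresses the log diagnostic odds ratio of each dataset on the inverse square root of its effective sample size, weighting by the effective sample size; a slope near zero suggests little small-study asymmetry. The sketch below is a simplified illustration and not the MIDAS implementation: it reports only the fitted intercept and slope (no p-value), adds 0.5 to every cell as a continuity correction (an assumption), and uses hypothetical 2x2 tables.

```python
import math


def effective_sample_size(n_pos, n_neg):
    """ESS = 4 * n1 * n2 / (n1 + n2) for diseased and non-diseased group sizes."""
    return 4.0 * n_pos * n_neg / (n_pos + n_neg)


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio with a 0.5 continuity correction on every cell."""
    a, b, c, d = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((a * d) / (b * c))


def deeks_regression(tables):
    """Weighted least squares of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    Returns (intercept, slope). Requires at least two distinct ESS values.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn))
        ws.append(ess)
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical (tp, fp, fn, tn) tables, for illustration only.
tables = [(40, 10, 10, 40), (30, 12, 8, 25), (60, 20, 15, 55), (18, 6, 7, 14)]
intercept, slope = deeks_regression(tables)
print(intercept, slope)
```

A formal test would additionally require the standard error of the slope, which is omitted here to keep the sketch short.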

Diagnostic Accuracy of Radiomics MRI
Twenty-one datasets pertaining to 14 studies were initially considered for inclusion in the meta-analysis. Of these studies, 13 studies incorporating 19 datasets exclusively included OPSCC cases, while a single study [29] did not specify the exact origin of the lesions included in its investigation. There were 1104 HPV-positive cancers and 748 HPV-negative lesions included. The pre-test probability of having the condition equaled 59%.
Figures 3 and 4 depict the forest plots for sensitivity and specificity of radiomics features in determining HPV status in HNSCCs. The overall sensitivity of the included studies was 0.78 (0.74, 0.82), and the overall specificity was 0.76 (0.71, 0.81). The diagnostic odds ratio (DOR) equaled 12 (8, 17).
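The relationship between these summary measures can be checked with the standard definitions of the likelihood ratios and the DOR. The sketch below applies them to the rounded pooled point estimates; because the reported DOR comes from the fitted meta-analytic model rather than from these rounded values, exact agreement is not expected. The function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary test."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = PLR / NLR = (sens * spec) / ((1 - sens) * (1 - spec))."""
    plr, nlr = likelihood_ratios(sensitivity, specificity)
    return plr / nlr


# Rounded pooled estimates reported in the text.
plr, nlr = likelihood_ratios(0.78, 0.76)
dor = diagnostic_odds_ratio(0.78, 0.76)
print(round(plr, 2), round(nlr, 2), round(dor, 1))
```

With these inputs the DOR falls between 11 and 12, which is in line with the reported value of 12 and well inside its interval of 8 to 17.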
Subgroup analysis was performed based on modality and showed no significant difference between the two modalities: the relative sensitivity for the MRI vs. CT comparison was 0.9 (sensitivities of 0.7 and 0.8, respectively; 0.8-1.0; p = 0.27), and the relative specificity for the MRI vs. CT comparison was 1.0 (specificities of 0.79 and 0.73, respectively; 0.9-1.1; p = 0.5). The total accuracy of datasets including MRI was 0.76, and the total accuracy of the studies including CT was also 0.76. Figure 5 presents the SROC curve of the studies based on the two modalities. A lesser degree of variability was seen for results pertaining to datasets that used CT imaging.
Summary findings for a pre-test HPV-positive prevalence of 59% are presented in Figure 6.
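To make such a summary concrete, the expected classification counts in a hypothetical cohort can be derived from the prevalence and the pooled sensitivity and specificity. The sketch below assumes a cohort of 100 patients, as in the Figure 6 caption, and uses the rounded pooled estimates reported in the text; the function name and dictionary keys are illustrative, and the resulting counts are expectations, not observed data.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts for a cohort of size n (fractional counts allowed)."""
    positives = n * prevalence
    negatives = n - positives
    tp = positives * sensitivity
    fn = positives - tp
    tn = negatives * specificity
    fp = negatives - tn
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp}


# Rounded values from the text: prevalence 0.59, sensitivity 0.78, specificity 0.76.
cells = expected_counts(100, 0.59, 0.78, 0.76)
ppv = cells["tp"] / (cells["tp"] + cells["fp"])  # positive predictive value
npv = cells["tn"] / (cells["tn"] + cells["fn"])  # negative predictive value
print({k: round(v, 1) for k, v in cells.items()}, round(ppv, 2), round(npv, 2))
```

The positive predictive value obtained this way is the same quantity as the post-test probability after a positive result, so it offers a cross-check on the Fagan nomogram.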

Heterogeneity Assessment
The I² statistic showed that heterogeneity for sensitivity and specificity was medium (I² = 56.9% and 55.4%, respectively).

Clinical Utility
Figure 7 depicts the Fagan nomogram. Using a radiomics model generated on cross-sectional imaging would increase the post-test probability to 83% from 58%, with a positive likelihood ratio of 3.0, when the test result was positive. When the test result was negative, the post-test probability decreased to 29%, with a negative likelihood ratio of 0.3.
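The arithmetic behind a Fagan nomogram is the odds form of Bayes' theorem: pre-test odds multiplied by the likelihood ratio give the post-test odds. The following is a minimal sketch using the rounded values quoted above; the function name is illustrative. Because the likelihood ratios in the text are rounded to one decimal place, the results only approximate the reported probabilities, which were presumably derived from unrounded model estimates.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Rounded inputs from the text: pre-test probability 0.58, PLR 3.0, NLR 0.3.
after_positive = post_test_probability(0.58, 3.0)
after_negative = post_test_probability(0.58, 0.3)
print(round(after_positive, 2), round(after_negative, 2))
```

With these rounded inputs, the probability after a negative result rounds to the reported 29%, while the probability after a positive result comes out slightly lower than the reported 83%.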

Discussion
In the present review, we present meta-analytic evidence regarding the possible role of radiomics features extracted from CT and MR in determining the status of HPV in oropharyngeal squamous cell carcinoma. Our results show that the combined sensitivity and specificity of texture features equaled 78% and 76%, respectively, with studies using CT imaging showing less variability in their results. Interestingly, the following descriptive characteristics and their quantitative representative texture features were consistently observed to differentiate between HPV-positive and -negative lesions: HPV-positive lesions were more homogeneous, spherical, and smaller, contained fewer clusters of areas with low HU (probably due to less tissue necrosis) [45], and showed lower ADC values.

Methodologically, most studies included patients from a single center for the development of their model, did not perform external validation, and acquired low radiomics quality scores. Furthermore, there are methodologic nuances that further limit the extent to which the results could be generalized to different clinical settings, one crucial issue being the lack of utilization of Image Biomarker Standardization Initiative (IBSI)-approved extraction tools and IBSI-approved nomenclature for texture features of interest [46,47]. This is of particular importance since the next step in the implementation of radiomics in this population is the employment of established radiomics classifiers on external datasets and validation of competing classifiers, both of which could benefit greatly from established IBSI performance benchmarks [29,48,49]. For example, some of the studies included in our meta-analysis utilized cases from an already established dataset on the Cancer Imaging Archive, which could serve as an opportunity for collaborative model development and external benchmarking of trained models [50,51]. Another potential factor that could limit the generalizability of the results included in the meta-analysis and introduce heterogeneity is the difference in patient inclusion criteria, especially the inclusion of patients in different stages of OPSCC. HPV-positive stage III-IV OPSCC lesions may have more histologic and anatomic similarity to HPV-negative lesions compared to lower-stage HPV-positive OPSCCs. This particular variation in patient inclusion may result in overly optimistic results in studies including a larger cohort of patients with earlier-stage OPSCC. This is of particular importance as most of the included studies in the meta-analysis did not report data regarding the stages of the included tumors, increasing their potential risk of bias in patient selection. Furthermore, of the 14 studies included in the meta-analysis, 13 exclusively focused on OPSCC, while a single study did not specify the exact origin of the lesions included, though the majority of its patient cohort consisted of patients with OPSCC (more than 75%) [41].
HPV is a well-established biomarker of prognosis. Routine HPV testing for oropharynx cancers is indeed recommended in NCCN guidelines, and clinical trials also incorporate HPV status, with de-intensification trials focusing on HPV-positive patients [8]. Furthermore, the NCCN and updated American Joint Committee on Cancer guidelines consider them separate clinical conditions. This growing clinical importance of determining the status of HPV in HNSCC has led multi-disciplinary researchers to develop and validate surrogate markers that can readily determine high-risk infection [52]. One of the most promising developments has been the introduction of the NavDX® liquid biopsy for the detection of circulating tumor-modified HPV DNA [53]. The lack of consensus regarding the definition of a clear cut-off for a positive immunohistochemical (IHC) test and concerns regarding varying sensitivity and specificity rates of this method arising from technical considerations have further necessitated a revisit of diagnostic approaches [54]. Other possible supplements to IHC have been the utilization of polymerase chain reaction (PCR) and in situ hybridization to detect HPV DNA [52,55,56]. However, these new methods may face barriers to feasibility and wide-scale application [13].
Since medical imaging of HNSCC and OPSCC is an essential part of staging the tumor and determining subsequent medical or surgical treatment plans, radiomics texture features offer a potential solution to characterizing the HPV status of these tumors. Our results show that the pooled sensitivity and specificity of radiomics features (77.2% and 76.3%) were below those of the currently established methods mentioned above, as IHC has shown a sensitivity and specificity close to 90% [57]. Though the current evidence does not suggest that radiomics features can substitute for routine para-clinical practices such as IHC, it provides insight regarding the additive value of radiomics features to these established diagnostic methods. Furthermore, artificial intelligence algorithms continue to improve rapidly, and, as more cases of HPV-positive and -negative cancers are accumulated, it is likely that radiomics features will become more reliable and accurate. This may be true not only for HPV status but also for prognostication and predicting response to treatment without regard to the HPV status of the lesion [58]. Future investigations should leverage the existing publicly available datasets and classifiers to develop, train, and externally validate new features [59]. These endeavors should also take into account clinical variables such as tumor grading and staging in developing predictive models and investigate the possible associations between characteristics such as blood types (recognized as risk factors for hypopharyngeal and oral cavity cancers) and the clinical course of HNSCCs [60,61]. Future studies should also focus on increasing the interpretability of models by introducing feature maps and SHAP summary plots [62].

Conclusions
The present meta-analysis showed a combined sensitivity and specificity of 77.2% and 76%, respectively, for radiomics features in determining HPV status in OPSCC. The accuracy of radiomics features extracted from imaging is lower than that of the already established para-clinical IHC methods. However, since CT and MR imaging are essential in the work-up of most OPSCCs, collaborative learning models may uncover untapped potential for radiomics features in the determination of HPV status, prediction of treatment response, and other important factors in the diagnosis and therapy of OPSCC.

Figure 1. PRISMA diagram of the study, showing the included studies.

Figure 2. (A) Funnel plot with pseudo 95% confidence interval and (B) Deeks' funnel plot of asymmetry of the included studies. The two plots show that although there was some heterogeneity witnessed on the funnel plot, there was no significant publication bias in the included articles.

Figure 3. Forest plot showing the individual sensitivity of the included studies and their respective datasets in the meta-analysis [20-24,31-38].

Figure 4. Forest plot showing the individual specificity of the included studies and their respective datasets in the meta-analysis [20-24,31-38].

Figure 5. SROC curve of the studies included in the meta-analysis based on their utilized modality. The red line and the circles show studies using CT, and the green lines and triangles show studies using MRI. CT imaging showed a lesser degree of variation in diagnostic accuracy results compared to MR imaging.

Figure 6. Summary findings of the meta-analysis, based on a pre-test prevalence of 58% and a hypothetical sample size of 100.

Diagnostics 2024, 14, x FOR PEER REVIEW

Figure 7. Fagan nomogram depicting the clinical utility of radiomics features in determining HPV status in the datasets included in the meta-analysis.

Table 1. Characteristics of the articles included in the systematic review. Bold articles are those that are also included in the meta-analysis. Bold feature extraction software are those adherent to IBSI nomenclature.