MRI for Differentiation between HPV-Positive and HPV-Negative Oropharyngeal Squamous Cell Carcinoma: A Systematic Review

Simple Summary
Human papillomavirus-positive (HPV+) oropharyngeal squamous cell carcinoma (OPSCC) has a different disease course compared to HPV-negative (HPV−) OPSCC. This systematic review aims to investigate whether magnetic resonance imaging (MRI) can discriminate between HPV+ and HPV− OPSCC or predict HPV status in OPSCC patients. Our results show that parameters derived from structural MRI and diffusion-weighted MRI are able to discriminate between HPV+ and HPV− cases and predict HPV status with reasonable accuracy. Other MRI sequences have yet to prove their added value for the discrimination and prediction of HPV status in OPSCC patients. Machine learning studies that compared predictive models with and without clinical variables found that performance improved significantly when clinical variables were included in the model. Before the clinical implementation of MRI for HPV status determination, larger studies with external model validation using independent datasets are needed.

Abstract
Human papillomavirus (HPV) is an important risk factor for oropharyngeal squamous cell carcinoma (OPSCC). HPV-positive (HPV+) cases are associated with a different pathophysiology, microstructure, and prognosis compared to HPV-negative (HPV−) cases. This review aimed to investigate the potential of magnetic resonance imaging (MRI) to discriminate between HPV+ and HPV− tumours and predict HPV status in OPSCC patients. A systematic literature search was performed on 15 December 2022 on EMBASE, MEDLINE ALL, Web of Science, and Cochrane according to PRISMA guidelines. Twenty-eight studies (n = 2634 patients) were included. Five, nineteen, and seven studies investigated structural MRI (e.g., T1-, T2-weighted), diffusion-weighted MRI, and other sequences, respectively. Three out of four studies found that HPV+ tumours were significantly smaller in size, and their lymph node metastases were more cystic in structure than HPV− ones. Eleven out of thirteen studies found that the mean apparent diffusion coefficient was significantly higher in HPV− than HPV+ primary tumours. Other sequences need further investigation. Fourteen studies used MRI to predict HPV status using clinical, radiological, and radiomics features. The reported area under the curve (AUC) values ranged between 0.697 and 0.944. MRI can potentially be used to find differences between HPV+ and HPV− OPSCC patients and predict HPV status with reasonable accuracy. Larger studies with external model validation using independent datasets are needed before clinical implementation.


Introduction
Human papillomavirus (HPV) has been recognized as one of the most important risk factors for oropharyngeal squamous cell carcinoma (OPSCC) [1,2]. HPV-positive (HPV+) cases are associated with a different pathophysiology, microstructure, and prognosis compared to HPV-negative (HPV−) cases [1,2]. Moreover, patients with HPV+ tumours generally respond better to radiation treatment and have lower mortality rates than patients with HPV− tumours [1,3]. From a histopathological perspective, HPV− OPSCC are more often keratinizing and are more likely to show maturing squamous differentiation [4]. In fact, the differences between HPV+ and HPV− tumours are so pronounced that they are regarded as separate disease classes with different TNM staging [5,6]. In spite of these differences, both HPV− and HPV+ patients receive the same treatment [5,7,8]. The current curative standard of treatment for locally advanced OPSCC is an intense chemoradiation regimen that causes considerable side effects [9,10]. As the prevalence of HPV+ OPSCC increases, it becomes crucial to investigate the possibility of treating HPV+ tumours with a less aggressive approach to reduce treatment-related toxicity [5,7,11].
Because of the differences in pathophysiology, prognosis, and treatment response, it is essential to determine the HPV status of the tumour before the start of treatment. The current gold standard method to determine HPV status is HPV polymerase chain reaction (PCR) on biopsied material [12,13]. Although PCR for HPV determination is accurate, it is also costly, time-consuming, and invasive [12,13]. Because of these disadvantages, p16 immunohistochemistry is often applied as a surrogate marker, as it is more accessible, less expensive, and faster to perform [14]. However, p16 immunohistochemistry has a false positive rate of about 20% and requires an invasive biopsy as well [12,15,16].
As an alternative to invasive tests, imaging modalities have been considered to determine HPV status. Compared to invasive tests, magnetic resonance imaging (MRI) has several advantages: it is non-invasive, which lowers the treatment burden, and it is applied routinely in the diagnostic work-up and radiotherapy treatment planning of OPSCC, which means that the workflow only needs to be minimally adjusted [17,18]. MRI therefore has the potential to provide an independent assessment of HPV status with a negligible addition of time and cost. Hence, there is increasing interest in investigating whether MRI can differentiate between HPV+ and HPV− tumours [19–23]. Apart from the structural MRI sequences (e.g., T1-, T2-weighted), other sequences such as dynamic contrast-enhanced (DCE) imaging and diffusion-weighted imaging (DWI) are also worth investigating, as these sequences provide information about the differences in microstructure and tissue perfusion between HPV+ and HPV− tumours [2,24–29]. Moreover, applications such as radiomics and machine learning are also interesting to investigate, as these techniques can be applied to find features through histogram or textural analysis that would not have been uncovered otherwise [21,22,30–32].
Although there is growing interest in the usage of MRI to determine HPV status in OPSCC patients, a comprehensive systematic review that compiles all relevant literature has not been published yet. Therefore, the aim of this study was to report on the literature that uses MRI to discriminate between HPV+ and HPV− cases and predict HPV status in patients with OPSCC.

Search Strategy
A computerized search of the EMBASE, MEDLINE ALL, Web of Science, and Cochrane databases was conducted on 15 December 2022 to identify original articles published up to December 2022. This review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and has not been registered [33]. The search terms were "OPSCC", "MRI", "HPV", and their synonyms (the full search strategy for reproducing the search can be found in Supplementary Data S1).

Eligibility Criteria
The search results were screened, and studies were selected only if they focused on OPSCC patients (primary tumour or locoregional lymph nodes), used MRI as an imaging modality, and either reported differences between OPSCC patients with and without HPV infection or aimed to predict HPV status. Studies were excluded if the study population consisted of fewer than twenty patients, the study was not published in English, or the study was not performed on humans. Letters, editorials, conference abstracts, meta-analyses, consensus statements, guidelines, and (systematic) review articles were also excluded.

Study Selection
Title and abstract screening was performed using EndNote 20 (Clarivate Analytics, Philadelphia, PA, USA) by two independent reviewers (I.L. and L.C.). Afterwards, full-text screening was performed using the eligibility criteria.
I.L. and L.C. independently assessed the methodological quality of the included studies using the QUADAS-2 tool, which assesses studies on possible bias from patient selection, index test, reference standard, and flow and timing [34]. Any discrepancies in the screening process and quality assessment were resolved through discussion between the reviewers.

Data Extraction and Analysis
The following data were extracted from the selected studies:

1. Study characteristics: first author, year of publication, studied subsite(s), study design, total patient number, number of HPV+ patients, number of HPV− patients, and HPV determination method (e.g., PCR, p16).

3. Analysis methods: study design for HPV status (descriptive/predictive) and, if predictive, type of model, type of validation, area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and significant variables.
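The performance measures extracted for the predictive studies can be illustrated with a minimal Python sketch. It assumes binary labels (1 = HPV+, 0 = HPV−) and a model score per patient; the function names, the 0.5 decision threshold, and the toy data are hypothetical and are not taken from any included study. The AUC is computed here as the probability that a randomly chosen HPV+ case receives a higher score than a randomly chosen HPV− case, which is one common way to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = HPV+, 0 = HPV-)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of HPV+ cases detected
        "specificity": tn / (tn + fp),  # fraction of HPV- cases correctly ruled out
    }


def auc(y_true, scores):
    """AUC as the fraction of (HPV+, HPV-) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients with a model score each.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = summary_metrics(labels, predicted)
auc_value = auc(labels, scores)
print(metrics, auc_value)
```

Note that accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes ranking performance over all thresholds, which is presumably why it is the measure most often compared across studies.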

Search Results
In total, we retrieved 395 articles, of which 145 were duplicates. After the first screening of the 250 unique articles, 55 records were assessed for eligibility. Finally, 28 studies met the eligibility criteria and were included in our study (Figure 1).

Study Characteristics
Twenty-eight studies with a total of 2634 patients (1571 HPV+ cases, 899 HPV− cases), conducted between 2014 and 2022, were included in this review. Fourteen out of twenty-eight studies used MRI to find differences between HPV+ and HPV− patients
Machine learning was used in fourteen papers [21–23,26,27,31,32,37,40,42,44–46,48]. Out of the articles using machine learning, nine papers only used radiomics and radiological variables derived from MRI images [21,22,26,31,32,37,40,44,46], and five papers additionally used clinical features for the prediction of HPV status [23,27,42,45,48]. Moreover, one article investigated radiomics without using machine learning [19]. None of the models from the studies using MRI to predict HPV status were validated with an independent or external dataset. Logistic regression was the most frequently used classifier (n = 7) [21–23,27,31,32,45]. Park et al. and Suh et al. compared several classifiers on their datasets [21,32]. Marzi et al. also compared different classifiers and built their final model using naive Bayes [48]. Overall, the lowest reported AUC value was 0.697 using a multivariate linear regression model, with an accompanying sensitivity and specificity of 0.769 and 0.733, respectively [44]. The highest reported AUC value was 0.944 using a multivariate logistic regression model, with an accompanying sensitivity and specificity of 0.833 and 0.926, respectively [45]. The models with the highest AUC values all included clinical features such as smoking, tumour sublocation, and alcohol intake (Table 1). Four studies compared the use of clinical variables with radiological variables, and all four found that a combination of clinical and radiological variables outperformed either category on its own [23,31,45,48].
Giannitto et al. investigated radiomics in structural MRI without machine learning and found the 10th and 90th percentile of the T1w intensity histogram to be significantly different between HPV+ and HPV− tumours (16.54 vs. 14.74, p = 0.03 and 161.02 vs. 161.57, p = 0.03, respectively) [19].
As for predictive studies, Sohn et al. and Park et al. used radiomics in combination with machine learning to predict HPV status using structural MRI, reporting AUC values of 0.74 and 0.83 [21,22].First-order features like skewness, 10th and 90th percentile in the intensity histograms, and second-order features like busyness and complexity were found to be useful features for discernment between HPV+ and HPV− tumours [21,22].
Five studies used DW-MRI to predict HPV status, with reported AUC values ranging from 0.77 to 0.944 [27,32,45,46,48]. Of the features derived from DW-MRI, two studies used D_t and one study used the ADC mean from primary tumours in their model to predict HPV status [27,45,46]. Two studies investigating radiomics in MRI found that the useful features included histogram parameters from ADC sequences, such as the entropy and homogeneity, as well as the inverse difference moment of the lymph node ADC, a measure of homogeneity [46,48].
Ahn et al. explored the application of pseudo-continuous arterial spin labelling (PCASL), where magnetically labelled flowing blood in the carotids serves as an endogenous contrast agent, eliminating the need for an injected contrast agent [40,54]. The image histogram overall standard deviation and 95th percentile of tumour blood flow (TBF95) were significantly different between HPV+ and HPV− cases (27.8 ± 8.7 vs. 37.7 ± 9.0 mL/100 g/min, p = 0.001 and 111.7 vs. 147.3 mL/100 g/min, p = 0.004, respectively) [40]. Ahn et al. used the standard deviation and 95th percentile of tumour blood flow from PCASL to predict HPV status, which resulted in an AUC of 0.805 after leave-one-out cross-validation tests [40].
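As an illustration of leave-one-out cross-validation, the following Python sketch scores each case with a model fitted on all remaining cases and then computes the AUC over the held-out scores. It is a sketch under stated assumptions, not the method of Ahn et al.: the nearest-centroid scoring rule is a deliberately simple stand-in for the classifier, the function names are hypothetical, and the two-feature toy data (standard deviation and 95th percentile of tumour blood flow) are invented for the example.

```python
import math


def centroid(rows):
    """Mean of each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_score(train_x, train_y, x):
    """Higher score = closer to the HPV+ centroid than to the HPV- centroid."""
    pos = centroid([r for r, y in zip(train_x, train_y) if y == 1])
    neg = centroid([r for r, y in zip(train_x, train_y) if y == 0])
    return math.dist(x, neg) - math.dist(x, pos)


def leave_one_out_scores(features, labels):
    """Score every case with a model fitted on all the other cases."""
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]  # drop the held-out case
        train_y = labels[:i] + labels[i + 1:]
        scores.append(nearest_centroid_score(train_x, train_y, features[i]))
    return scores


def auc(labels, scores):
    """Fraction of (HPV+, HPV-) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: [TBF standard deviation, TBF 95th percentile] per tumour.
features = [[25, 105], [30, 115], [28, 110], [38, 150], [36, 145], [40, 140]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = HPV+, 0 = HPV-

loo_scores = leave_one_out_scores(features, labels)
loo_auc = auc(labels, loo_scores)
print(loo_auc)
```

Because every case is scored by a model that never saw it, the resulting AUC is less optimistic than one computed on the training data, but it remains an internal validation and does not replace testing on an independent dataset.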
Fujima et al. investigated the use of amide proton transfer (APT), with the hypothesis that tumours have high cellularity and may exhibit elevated APT values [37]. In their analysis, the APT coefficient of variation was significantly lower in the HPV+ group than in the HPV− group (0.43 ± 0.04 vs. 0.48 ± 0.04, p = 0.004), though the APT mean and standard deviation were not significantly different between the two groups (p = 0.82 and p = 0.11, respectively) [37].

Discussion
In this systematic review, we aimed to report on all available literature that used MRI to find differences between HPV+ and HPV− cases and predict HPV status in OPSCC patients. For this purpose, we investigated structural sequences, DW-MRI sequences, and other MRI sequences. Our results indicate that parameters derived from structural MRI and DW-MRI show the potential to discriminate between HPV+ and HPV− tumours and predict the HPV status. On the other hand, other MRI sequences such as DCE-MRI or ASL have yet to prove their added value to the differentiation of HPV status in OPSCC patients. Overall, none of the predictive models found have been applied in clinical practice yet.
Concerning structural MRI sequences, the findings were focused on macroscopic differences, such as the size, necrosis, and ulceration of the primary tumour or locoregional lymph nodes. As these studies used structural sequences acquired in clinical routine, the overall number of participants was highest in this category, making the findings more generalizable than the findings from other MRI sequences. However, few studies explored predictive models based exclusively on structural MRI.
In terms of findings regarding DW-MRI, ADC mean was found to be significantly different between HPV+ and HPV− tumours. Despite the heterogeneity in HPV+ and HPV− group size, used b-values, delineation method, and the absolute difference in ADC mean, almost all studies that reported the ADC mean found that it was significantly higher in the HPV− group compared to the HPV+ group. However, there were large differences in the ADC mean values found between studies (Figure 3), which could be caused by the aforementioned heterogeneity of the methods. This underlines the importance of homogenizing DWI protocols across different institutes and studies. A recent study optimized the choice of b-values and identified the set that maximizes DWI parameter accuracy relative to scan time [55]. This set of b-values could be a good starting point to further homogenize DWI protocols across institutes. The two studies that did not find a significant difference in ADC mean did not have any notable differences in sample size (approximately 40 participants), HPV+/HPV− ratio, HPV determination method, used b-values, tumour subsites, or statistical tests compared to the other DW-MRI studies [28,49].
Of note, Chan et al. reported ADC values a thousand times larger than the other studies, which we attributed to a reporting error and corrected for [42]. As Lenoir et al. used different combinations of b-values to calculate the ADC mean for HPV+ and HPV−, we decided to report the ADC mean calculated from the b-values of 0 and 1000 s/mm², as these were most often used in the other studies [29]. Using different combinations and different numbers of b-values, Lenoir et al. found different levels of significance, with most combinations not yielding significant differences, highlighting the importance of using the most suitable b-values in ADC calculation [29,55]. While the findings indicate that ADC mean measures tend to be significantly lower in HPV+ tumours, the diverse range of ADC mean values across studies poses challenges in establishing a clinically meaningful threshold.
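To make the dependence of ADC on the chosen b-values concrete, the following Python sketch computes a two-point ADC under the commonly used mono-exponential signal model S(b) = S0 · exp(−b · ADC). This model and the function name are assumptions made for illustration; the included studies may have used other fitting approaches or more than two b-values, and the signal values below are hypothetical.

```python
import math


def adc_two_point(signal_low, signal_high, b_low=0.0, b_high=1000.0):
    """Two-point ADC in mm^2/s from signals measured at two b-values (s/mm^2).

    Assumes mono-exponential decay: S(b) = S0 * exp(-b * ADC).
    """
    if signal_low <= 0 or signal_high <= 0:
        raise ValueError("signals must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(signal_low / signal_high) / (b_high - b_low)


# Hypothetical voxel: simulate the signal for a known ADC, then recover it.
true_adc = 1.0e-3                       # mm^2/s, illustrative value only
s_b0 = 500.0                            # signal at b = 0 s/mm^2
s_b1000 = s_b0 * math.exp(-1000.0 * true_adc)

estimate = adc_two_point(s_b0, s_b1000)
print(f"{estimate:.6f} mm^2/s")
```

In real tissue the decay is not perfectly mono-exponential, so a different pair of b-values passed to the same function generally yields a different ADC, which is one reason why values are difficult to compare across studies.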
Several studies investigated the potential added value of other MRI sequences, such as DCE-MRI, PCASL, and APT [26–28,35,37,40,47]. Though there are variables that differed significantly between HPV+ and HPV− groups, further investigation is needed in larger groups before the added value of these sequences in HPV determination can be confirmed.
Machine learning studies that compared predictive models with and without clinical variables found that performance improved significantly when clinical variables were included in the model [31,48]. Amongst machine learning studies, the choice of classifiers varied, with multivariable logistic regression being the most frequently applied method [20,22,23,27,31,32,42,45]. This type of classifier performs well on smaller datasets and handles complex data better than multivariate linear regression models but is also more prone to overfitting [32,44,45,56]. Marzi et al. compared several classifiers and found that naive Bayes performed best due to its robustness in the presence of missing and noisy data and its ability to work well on small sample sizes [48,56]. Overall, validation on larger and external datasets is needed for all models before implementation in the clinic.
The clinically significant radiomics features were found to be consistent with clinical and histopathological data of both HPV+ and HPV− tumours. Bos et al. found differences in the sphericity and maximum 2D diameter, meaning that HPV+ tumours were rounder and smaller, respectively, than HPV− tumours as seen in structural MRI sequences [31,57]. Also, the higher histological homogeneity of HPV+ compared to HPV− tumours could explain the differences in the textural features, such as in the busyness and complexity [2,31].
Furthermore, the differences we found in DW-MRI are also reflected in histopathology. HPV+ tumours are more often poorly differentiated, which in turn leads to higher cell attenuation and an increased nuclear-to-cytoplasmic ratio and decreased extracellular space [58–60]. As ADC is a measure of the diffusion of water molecules within tissue, it can be expected that ADC is lower in the HPV+ group than in the HPV− group due to the aforementioned higher cellularity [42]. The histogram-based features of skewness and kurtosis were mentioned as useful features in several articles [2,21,22,26,31,45]. This corresponds with histological findings of HPV+ tumours being organized in homogeneous clusters with little interstitial space, leading to a higher kurtosis in ADC distribution as there is a homogeneous tumour matrix [2,61]. HPV+ tumours tend to have more outliers, since there is a small number of cells with high ADC values and a larger number of cells that are more densely packed with low ADC values [2]. This causes a higher skewness in the distribution of ADC values [2]. On the other hand, HPV− tumours show a higher heterogeneity caused by a higher prevalence of keratin pearls, intratumoural necrosis, haemorrhage, and other factors [2]. Therefore, the distribution tends to be more Gaussian, leading to a lower skewness [2,62].
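The histogram features discussed here can be made concrete with a short Python sketch. It uses the standard moment-based definitions of skewness and excess kurtosis (population form, without bias correction); radiomics software may apply slightly different conventions, and the function names and voxel values below are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment: positive when a few high values form a right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a Gaussian distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical voxel values (arbitrary units), not measured data.
symmetric = [1, 2, 3, 4, 5]    # evenly spread around the mean
right_tailed = [1, 1, 1, 1, 6]  # mostly low values with one high outlier

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

The right-tailed example mirrors the pattern described for HPV+ tumours: many densely packed low values with a small number of high outliers produce a clearly positive skewness, whereas the symmetric example has a skewness of zero.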
To the best of our knowledge, two other systematic reviews have been published on the use of imaging to differentiate between HPV+ and HPV− cases in head and neck carcinomas. However, these reviews had a different scope, as they focused on texture analysis for HPV status determination using MRI, CT, and PET or only on average ADC values from DW-MRI, respectively [62,63].
The findings of our study should be considered alongside its limitations. One of the important limitations is that there was a high heterogeneity in the parameters used for DW-MRI scans. The chosen b-values varied across studies, which is expected to introduce bias in the results [29,64]. Moreover, diffusion time, which can affect ADC values, was not reported in any of the papers. In addition, the different types of fitting lead to different ADC values, whilst the different segmentation models across studies introduced more heterogeneity. Because of the high heterogeneity, we were not able to perform a pooled analysis.
Furthermore, it should be noted that we included studies that investigated OPSCC, even if other head and neck subsites were also included in the studies. Although those study populations still consisted largely of OPSCC, it is possible that their results were affected by the other subsites, and thus, this should be considered as a limitation.
Moreover, it is possible that publication bias has played a role in the included studies, as each study found at least one parameter that was different between HPV+ and HPV− tumours, or one useful predictive parameter for HPV status. This may have led to an overestimation of the effectiveness of using MRI to find differences between HPV+ and HPV− tumours or predict HPV status.
Lastly, the quality of a systematic review is inherently limited by the quality of the included studies. Even though most studies had a low risk of bias, we found that the quality of the studies varied across the included papers. This should be taken into consideration when interpreting the results. None of the studies made use of an independent dataset for testing, thus limiting the generalizability of the findings.

Conclusions
In conclusion, our results indicate that parameters derived from structural MRI and DW-MRI show the potential to discriminate between HPV+ and HPV− tumours and predict the HPV status. However, due to the heterogeneity in the methods of studies with similar aims and an overall lack of external validation, further research is needed. At the current time, acquisition methods like DCE-MRI or ASL have yet to prove their added value to the differentiation of HPV status in OPSCC patients. So far, no predictive models have been introduced in clinical practice or used as a replacement for invasive histopathological determination methods, though using such a model alongside p16 immunohistochemistry could be considered to overcome its pitfalls, such as its false positive rate. Before introduction in clinical practice, validation on external, independent datasets is needed.

Figure 1. Flowchart of study selection, consistent with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [33].

Table 2. Study characteristics of included studies that aim to predict HPV status. Acc = accuracy; ADC = apparent diffusion coefficient; APT = amide proton transfer;

Table 3. Main feature for all feature categories. ADC = apparent diffusion coefficient; D_t = tissue diffusion coefficient; DW-MRI = diffusion-weighted magnetic resonance imaging; MRI = magnetic resonance imaging.
* Mean unless specified otherwise. ** Standard deviation unless specified otherwise. *** Calculated from b-values 0 and 1000 s/mm² for generalization, as these were the b-values that were most often used.